How to download a website from web.archive.org on Windows or Linux
Introduction
Plenty of software exists to download websites and browse them offline, which is useful for keeping a piece of documentation or other interesting content. But if the site is gone and only available on web.archive.org, downloading it can be tricky. Some companies offer to download web.archive.org content for you, but their services are not free. Fortunately, hartator released a nice project on GitHub named wayback-machine-downloader. It is available here: https://github.com/hartator/wayback-machine-downloader.
Process
Please note that wayback-machine-downloader natively works only on Linux.
Windows
If you are on Linux, skip directly to the next section.
You need a Linux Bash emulator to run wayback-machine-downloader. Personally, I advise using Cygwin Portable, which you can get from the link below.
https://ci.appveyor.com/project/MachinaCore/cygwinportable/build/artifacts
- Download and unzip the archive.
- Install Cygwin Portable in C:\CygwinPortable.
- Launch CygwinPortable.exe; CygwinConfig will run automatically.
- Then install Ruby and Git through CygwinConfig.
- Open the Cygwin Bash shell.
Linux
- If you are on Linux, download and install Ruby and Git.
- Open a Bash shell.
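On either platform, it can help to confirm that the required tools are reachable from the shell before going further. The snippet below is only a sketch: `check_tools` is a hypothetical helper name introduced here, not something provided by wayback-machine-downloader.

```shell
#!/usr/bin/env bash
# check_tools: hypothetical helper (not part of wayback-machine-downloader).
# Prints every command that is not found on PATH and
# returns a non-zero status if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_tools ruby git gem` prints nothing when all three are installed, and lists the missing ones otherwise.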
Download from web.archive.org
    git clone https://github.com/hartator/wayback-machine-downloader
    gem install wayback_machine_downloader
    cd wayback-machine-downloader/bin
Then, to download http://www.exemple.org:
    ./wayback_machine_downloader http://www.exemple.org --concurrency 20
For example, with satanion.dk (it was a very good Counter-Strike website):
    ./wayback_machine_downloader http://www.satanion.dk --concurrency 20
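If you have several archived sites to fetch, the same command can be wrapped in a loop. This is a minimal sketch, not something the project provides: `download_sites` and the `WMD` variable are hypothetical names introduced here, and the only option used is the `--concurrency` flag from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; download_sites and WMD are illustrative names only.
# WMD is the command to run; it defaults to the script in the current directory.
WMD="${WMD:-./wayback_machine_downloader}"

# Run the downloader once per URL passed as an argument,
# stopping at the first download that fails.
download_sites() {
  local url
  for url in "$@"; do
    echo "Downloading $url"
    "$WMD" "$url" --concurrency 20 || return 1
  done
}
```

For instance, `download_sites http://www.satanion.dk http://www.exemple.org` would fetch both sites one after the other.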
And that's it!
Thanks to hartator for this very good tool!