There may come a time when you want to back up a website on your own VPS. Perhaps you want to mirror, archive or preserve a website in its entirety. Luckily, Httrack is an app that lets you do this quite easily.
In this tutorial, we’ll use an Ubuntu 16.04 image. Depending on how much data you’ll be backing up, make sure your VPS has enough disk space to hold the mirror.
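If you’re not sure how much room you have, a quick check along these lines can help. This is only a sketch: the `free_mb` helper is a name made up for this example, and `/tmp` is assumed as the target simply because it is the first output directory used later on.

```shell
#!/bin/sh
# free_mb DIR: print the free space, in megabytes, on the filesystem holding DIR.
# (free_mb is a hypothetical helper for this tutorial, not part of Httrack.)
free_mb() {
    # df -P keeps each filesystem on one line; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Assumed target directory; pass another path as the first argument.
target_dir="${1:-/tmp}"
echo "Free space under $target_dir: $(free_mb "$target_dir") MB"
```

Compare the number against the rough size of the site you plan to mirror before you start.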
At the end of the tutorial, you’ll be able to mirror a website and publish it publicly using Apache. Let’s go ahead and get started.
> sudo apt-get update
Once your repository is updated, you’re ready to kick off the Httrack installer.
> sudo apt-get install httrack -y
Congrats, you’ve installed Httrack! Let’s test it out to make sure it works.
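Before mirroring anything, you may want to confirm that the binary is actually on your PATH. The sketch below assumes the package installs a command named `httrack` and that it accepts `--version`; `have_cmd` is just a helper name chosen for this example.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME can be found on the PATH.
# (have_cmd is a hypothetical helper for this tutorial.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd httrack; then
    # Assumption: the installed build supports --version.
    httrack --version
else
    echo "httrack is not on PATH; re-run the apt-get install step"
fi
```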
The following command will mirror Ubuntu.com, starting from its homepage:
> httrack "https://www.ubuntu.com/" -O "/tmp/www.ubuntu.com/"
The -O switch sets the directory where the mirrored site is written. In the above example, we’re simply putting the contents of the archived website in the /tmp directory. The real power of Httrack shows when we put this data in our Apache HTML directory instead, because we can then view the copied website in a browser.
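Once a mirror run finishes, it can be worth glancing at what landed in the output directory. The sketch below assumes Httrack wrote its usual `index.html` and `hts-log.txt` files at the top of that directory; `summarize_mirror` and the `MIRROR_DIR` variable are names invented for this example.

```shell
#!/bin/sh
# summarize_mirror DIR: report how many files a mirror directory holds and
# whether the top-level index.html and hts-log.txt are present.
# (summarize_mirror is a hypothetical helper for this tutorial.)
summarize_mirror() {
    dir="$1"
    [ -d "$dir" ] || { echo "no such directory: $dir"; return 1; }
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "files: $files"
    for name in index.html hts-log.txt; do
        if [ -f "$dir/$name" ]; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}

# Assumed location: the /tmp output directory from the command above.
summarize_mirror "${MIRROR_DIR:-/tmp/www.ubuntu.com}" || true
```

If the log file is there, reading through it is a quick way to spot pages that failed to download.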
Let’s go ahead and try this out.
Let’s go back to our SSH session and kick off the Apache install.
> sudo apt-get install apache2 -y
The next command is a precautionary step to ensure that the proper ports are open on your server:
> sudo ufw allow in "Apache Full"
To make sure that this worked, go to http://<YOURIPADDRESS> in your browser; you should see the Apache default page.
With this command, we’ll mirror the Ubuntu site into the default Apache directory. Because /var/www/html is owned by root by default, the command is run with sudo:
> sudo httrack "https://www.ubuntu.com/" -O "/var/www/html/www.ubuntu.com"
To test this out, go to:
http://<YOURIPADDRESS>/www.ubuntu.com/
If everything worked, your browser will show the mirrored copy of the site, served from your own VPS.
There may be situations where it’s prudent to mirror a website through a proxy. Here’s how to do this:
> sudo httrack "https://www.ubuntu.com/" -O "/var/www/html/www.ubuntu.com" -P <user>:<pass>@<proxy>:<port>
Replace the <user>, <pass>, <proxy> and <port> placeholders in the above command with your proxy server’s information.
Tip: You can find a list of updated proxy servers at ProxyNova.com. If your proxy server doesn’t require a username and password, you can simply delete the @ sign and everything before it, back to the -P switch.
Your command would look like this:
> sudo httrack "https://www.ubuntu.com/" -O "/var/www/html/www.ubuntu.com" -P <proxy>:<port>
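If you switch between proxies with and without credentials, a small helper can build the -P value for you. The following is a rough sketch: `proxy_arg` and `run` are names made up for this example, `proxy.example.com` and port `8080` are placeholders, and the script only prints the command unless you set `DRY_RUN=0`.

```shell
#!/bin/sh
# run CMD...: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# proxy_arg HOST PORT [USER PASS]: build the value passed to -P.
# (proxy_arg is a hypothetical helper for this tutorial.)
proxy_arg() {
    host="$1"
    port="$2"
    user="${3:-}"
    pass="${4:-}"
    if [ -n "$user" ]; then
        printf '%s:%s@%s:%s\n' "$user" "$pass" "$host" "$port"
    else
        printf '%s:%s\n' "$host" "$port"
    fi
}

# proxy.example.com:8080 is a placeholder, not a real proxy server.
# Writing to /var/www/html for real would also need root privileges.
run httrack "https://www.ubuntu.com/" -O "/var/www/html/www.ubuntu.com" \
    -P "$(proxy_arg proxy.example.com 8080)"
```

Pass a username and password as the third and fourth arguments to `proxy_arg` to get the authenticated form instead.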
On the Httrack website, you’ll find a complete “User’s Guide” that was written by Fred Cohen. This guide will give you the ins and outs of the Httrack app. For example, with the commands listed in the User’s Guide, you’ll be able to throttle the rate at which you grab web pages and use specific parameters to grab exactly what you need to mirror.
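As a taste of what those options look like, here is a sketch of a gentler mirror run. It assumes the -A (maximum transfer rate in bytes per second), -c (number of simultaneous connections) and -r (mirror depth) switches described in the Httrack documentation; the `polite_mirror` and `run` helpers and the default values are invented for this example, and nothing is downloaded unless you set `DRY_RUN=0`.

```shell
#!/bin/sh
# run CMD...: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# polite_mirror URL OUTPUT_DIR [MAX_RATE_BYTES_PER_SEC] [DEPTH]
# (polite_mirror is a hypothetical helper for this tutorial.)
polite_mirror() {
    url="$1"
    out="$2"
    rate="${3:-25000}"   # -A: cap the transfer rate (bytes per second)
    depth="${4:-3}"      # -r: how many links deep to follow
    # -c2 limits the run to two simultaneous connections.
    run httrack "$url" -O "$out" "-A$rate" -c2 "-r$depth"
}

polite_mirror "https://www.ubuntu.com/" "/tmp/www.ubuntu.com"
```

The defaults here are arbitrary starting points; check the User’s Guide for the exact meaning of each switch before relying on them.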
Have you mirrored a website with Httrack? Tell us about your experiences with Httrack in the comments section below.