Creating Long-Term Backups with Amazon Glacier on Linux

If you haven’t heard about Amazon Glacier already, it’s definitely something to be excited about. Amazon Glacier is a service that makes it extremely affordable to store large amounts of data in the cloud for the long term. Uploads are stored immediately, but a retrieval request takes roughly 4 hours before your data is available for download again. Let’s back up a ton of files on Linux to Glacier.

For my purposes, I wanted to back up my entire music library, which is about 35 gigabytes of data. Because Glacier doesn’t store folder structures, we’ll need to create archives of what we’d like to store. We’ll do that and split our archive into 200MB blocks for convenience. Glacier accepts archives far larger than this (up to 40TB each), but it’s much easier to retry a 200MB upload than a 35GB one. Plus, small pieces make it easier to script our uploads in a way that lets us validate that each one completed properly. To archive our music library, we’ll use tar and split, both standard Linux utilities. I’ll be applying GZip compression to my tar archive, though this is not strictly necessary or even beneficial in our case, since most music files are already compressed with audio-specific codecs and GZip will shrink them very little.

$ BACKUP_TIME="$(date +%Y%m%d%H%M%S)"
$ tar cvzf - ~/Music | split --bytes=200MB - "music.backup.$BACKUP_TIME.tar.gz."

What this will do is create a tar.gz archive split every 200MB into files looking like "music.backup.20130120120503.tar.gz.aa". The timestamp is generated before we do any real work, and the two-letter suffix increments with each piece, so you’ll have files ending in aa, ab, ac, etc. If you’re as paranoid as you should be, you’d now encrypt these archives using a GPG key, but that’s outside the scope of this tutorial for now.
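Before uploading anything, it’s worth checking that the pieces really do reassemble into a working archive. The following is a minimal sketch rather than part of my original workflow: it builds a small throwaway Music folder in a temporary directory (the file names, the 200K chunk size, and the SHA256SUMS file are all assumptions made for the demo), runs the same tar and split pipeline, records a checksum per piece, and then restores the archive by concatenating the pieces in order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so the sketch does not touch real data.
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for ~/Music: a few small files of random bytes (hypothetical names).
mkdir -p Music
for n in 1 2 3; do
    head -c 300000 /dev/urandom > "Music/track$n.bin"
done

# Same pipeline as the real backup, with a tiny chunk size so that
# several pieces are produced from less than 1MB of data.
prefix="music.backup.demo.tar.gz."
tar czf - Music | split --bytes=200K - "$prefix"

# Record a checksum per piece; these can be re-checked after a later download.
sha256sum "$prefix"* > SHA256SUMS
sha256sum --check --quiet SHA256SUMS

# Reassemble: the shell expands the glob in sorted order (aa, ab, ac, ...),
# so cat rebuilds the original stream, which tar then unpacks.
mkdir restore
cat "$prefix"* | tar xzf - -C restore

# Confirm the restored tree matches the source.
diff -r Music restore/Music && echo "restore OK"
```

For the real backup, the restore step would be the same cat and tar pipeline, run against the pieces once they have been downloaded from Glacier.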

Now that we have our data, we’ll use a Python command-line tool called glacier-cmd to upload our files and perform other operations on our Glacier vaults. Let’s get it installed:

$ sudo apt-get install python-setuptools git
$ git clone https://github.com/uskudnik/amazon-glacier-cmd-interface.git
$ cd amazon-glacier-cmd-interface
$ sudo python setup.py install

Everything should now be installed for us to start our backup process. All we need to do before we start uploading is set up our configuration file and create our vault. First, create a file at ~/.glacier-cmd with the following contents:

[aws]
access_key=YOUR_AWS_ACCESS_KEY
access_secret=YOUR_AWS_ACCESS_SECRET

[glacier]
region=YOUR_AWS_REGION
logfile=~/.glacier-cmd.log
loglevel=INFO
output=print

You’ll need to fill in your AWS access key and secret key, and set the AWS region. I’m in us-west-1, but you may wish to store your data in a different region. As a reader points out in the comments, newer versions of glacier-cmd rename access_secret to secret_key.

Last step before uploading: create our Glacier vault. Choose a really awesome vault name (it only needs to be unique within your account and region) and create it like so:

$ glacier-cmd mkvault "my-super-ridiculously-awesome-longterm-backup-solution"

Provided that it completes properly, we’re in business. Now, let’s get onto the uploading. Since I’ve split my file into 200MB chunks, I’d like to upload them one at a time, moving completed uploads into another folder. For this, I’ve devised a pretty quick find/while loop to find my backup files, and run a series of commands on them.

$ mkdir -p "Completed Backups"
$ find . -maxdepth 1 -type f -name "music.backup.*.tar.gz.*" | sort | while IFS= read -r file ; do
    echo "Uploading $(basename "$file") to Amazon Glacier."
    glacier-cmd upload --description "$(basename "$file")" \
        my-super-ridiculously-awesome-longterm-backup-solution "$file" && \
        mv "$file" "Completed Backups/"
done

I use a find command to locate all files matching the "music.backup.*.tar.gz.*" pattern in the current directory only, and sort the results so the pieces upload in order. Each found file is piped to the while loop, which makes it available as the $file variable. Before each file is uploaded, I echo a status message to let me know which file I’m currently uploading. I then do the actual upload using the glacier-cmd we installed before, setting the description of each uploaded file to its filename. If the upload completes without an issue, I then move the uploaded file into the “Completed Backups” folder (which has to exist before the loop runs) so that I know the file was uploaded successfully. Anything still sitting in the current directory afterwards failed to upload, and re-running the loop retries only those files.
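To rehearse this loop without contacting AWS, here is a small sketch that swaps the real upload for a stand-in. Everything in it is an assumption made for the demo: fake_upload is a hypothetical function, not part of glacier-cmd, and the three empty chunk files exist only to show that a failed upload stays put while successful ones get moved.

```shell
#!/usr/bin/env bash
set -u

# Work in a throwaway directory so the rehearsal does not touch real backups.
workdir="$(mktemp -d)"
cd "$workdir"

# Three empty stand-in chunks (hypothetical names following the article's pattern).
touch music.backup.demo.tar.gz.aa music.backup.demo.tar.gz.ab music.backup.demo.tar.gz.ac
mkdir -p "Completed Backups"

# Hypothetical stand-in for "glacier-cmd upload": fails for the chunk ending in ".ab".
fake_upload() {
    case "$1" in
        *.ab) return 1 ;;
        *)    return 0 ;;
    esac
}

find . -maxdepth 1 -type f -name "music.backup.*.tar.gz.*" | sort | while IFS= read -r file; do
    if fake_upload "$file"; then
        # Only a successful upload gets moved out of the way.
        mv "$file" "Completed Backups/"
    else
        echo "Upload failed, leaving in place: $(basename "$file")" >&2
    fi
done

# Lists the chunks that were treated as uploaded.
ls "Completed Backups"
```

Because find only looks at the current directory, running the loop a second time would pick up just the chunk that failed, which is the behaviour the real loop relies on for retries.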

After your upload completes in a few days, you can use other commands of the glacier-cmd utility to query the status of your vault. Remember that inventory requests take about 4 hours or so, so don’t expect to get a directory listing back right away. Welcome to long-term backup with Glacier!

5 thoughts on “Creating Long-Term Backups with Amazon Glacier on Linux”

  1. Nice blog post.

    I wanted to have a practical example that illustrates the usage of Glacier which I found here :-) .

    If I were you, however, I would create a longer script that makes groups of MP3s that are almost 200MB in total and then makes a tarball out of each group. Because if you get corruption in one of the archives, your whole MP3 collection is gone. Just my 2 cents.

  2. That’s for sure. Another thing to add would definitely be file-encryption, too. I don’t want someone looking at my data ;)

    I might just have to do another tutorial on this to fill these missing gaps. Getting tarballs of only 200MB would be kind of difficult to do, I wouldn’t know how to do that without getting too complicated in the script. I’d probably have to write a Python script which would find the files and group them in 200MB chunks.

    Is there a way to group them in 200MB chunks without making it one big, long, split tarball?

  3. Thanks, very useful.

    To keep it updated, the new version changes the name of “access_secret” to “secret_key”

  4. Hello,

    We are proud to present Glaciate Archive.
    It’s the only browser based Multi-Platform Amazon Glacier client (service) that has built in search,
    metadata management, retrieval calculator, pausable downloads, automatically updating inventory list, upload/download notifications and last but not least user management with the possibility of setting credit limits per user.

    You can check it out at http://www.glaciatearchive.com and request your demo account

    _
    Tonis Leissoo
    Glaciate Archive
