Creating Long-Term Backups with Amazon Glacier on Linux

If you haven’t heard about Amazon Glacier yet, it’s definitely something to be excited about. Amazon Glacier is a service that makes it extremely affordable to store gigabytes upon gigabytes of data in the cloud for the long term. Your data is stored immediately, but retrieval requests take at least 4 hours before your data is available for download again. Let’s back up a ton of files on Linux to Glacier.

For my purposes, I wanted to back up my entire music library, which is about 35 gigabytes of data. Because Amazon doesn’t store folder structures, we’ll need to create archives of what we’d like to store. We’ll do that and split the archive into 200MB blocks for convenience. I’m not currently aware of any per-file maximum size on Glacier, but it’s much easier to retry a 200MB upload than a 35GB one. Plus, smaller pieces make it much easier to script our uploads in a way that lets us validate that each upload completed properly. To package our music library, we’ll use tar and split, standard Linux utilities. I’ll be applying gzip compression to my tar archive, though this is not strictly necessary or even beneficial in our case, since most music files are already compressed with far more effective codecs than general-purpose compression methods.

$ BACKUP_TIME="$(date +%Y%m%d%H%M%S)"
$ tar cvzf - ~/Music | split --bytes=200MB - "music.backup.$BACKUP_TIME.tar.gz."

What this will do is create a tar.gz archive split every 200MB into files named like "music.backup.20130120120503.tar.gz.aa". The date is generated before we do any real work, and the two-letter suffix increments with each piece, so you’ll have archives ending in aa, ab, ac, etc. If you’re as paranoid as you should be, you’d now encrypt these archives using a GPG key, but that’s outside the scope of this tutorial for now.
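Before uploading anything, it’s worth rehearsing the restore path: the pieces rejoin with a simple cat in alphabetical order. Here’s a self-contained sketch using a throwaway scratch directory (not your real library) that splits a tiny archive and verifies the listing:

```shell
# Rehearse the restore on scratch data: tar + split, then rejoin and list.
mkdir -p /tmp/glacier-demo/Music
echo "not really music" > /tmp/glacier-demo/Music/song.mp3
cd /tmp/glacier-demo
tar czf - Music | split --bytes=1MB - "music.backup.demo.tar.gz."
# Concatenating the pieces in alphabetical order reassembles the stream;
# a clean listing means no piece is missing or out of order.
cat music.backup.demo.tar.gz.* | tar tzf -
```

If the listing fails, a piece is missing or corrupt, and it’s far better to learn that now than after a 4-hour Glacier retrieval.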

Now that we have our data, we’ll use a Python script library called glacier-cmd to upload our files and perform other operations on our Glacier vaults. Let’s get that library installed:

$ sudo apt-get install python-setuptools git
$ git clone git://
$ cd amazon-glacier-cmd-interface
$ sudo python setup.py install

Everything should now be installed for us to be able to start our backup process. All we need to do before we start uploading is to set up our configuration file and create our vault. First, create a file at ~/.glacier-cmd. The exact key names have varied between versions of the tool (see the comments below for the current ones), but it should look something like this:

[aws]
access_key=YOUR_AWS_ACCESS_KEY
secret_key=YOUR_AWS_SECRET_KEY

[glacier]
region=us-west-1

You’ll need to supply your AWS access/secret keys to the proper variables and configure the AWS region. I’m in us-west-1, but you may wish to store your information in a different region.

Last step before uploading: create our Glacier vault. Choose a really awesome globally-unique vault name and create it like so:

$ glacier-cmd mkvault "my-super-ridiculously-awesome-longterm-backup-solution"

Provided that it completes properly, we’re in business. Now let’s get on to the uploading. Since I’ve split my file into 200MB chunks, I’d like to upload them one at a time, moving completed uploads into another folder. For this, I’ve devised a pretty quick find/while loop that locates my backup files and runs a series of commands on each one.

$ mkdir -p "Completed Backups"
$ find . -maxdepth 1 -type f -name "music.backup.*.tar.gz.*" | sort | while read file ; do
    echo "Uploading $(basename "$file") to Amazon Glacier."
    glacier-cmd upload --description "$(basename "$file")" \
        my-super-ridiculously-awesome-longterm-backup-solution "$file" && \
        mv "$file" "Completed Backups"
done

I use a find command to locate all files matching the "music.backup.*.tar.gz.*" pattern in the current directory only. Each found file is passed to the while loop, which makes it available as the $file variable. Before each upload, I echo a status message to let me know which file is currently being uploaded. I then do the actual upload using the glacier-cmd tool we installed before, setting the description of each uploaded file to its actual filename. If the upload completes without an issue, I move the uploaded file into the “Completed Backups” folder so that I know the file was uploaded successfully.
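One cheap addition to the loop above (my own habit, not part of glacier-cmd) is writing a checksum manifest before anything is uploaded; after a future retrieval, sha256sum can replay it to prove that every piece came back intact:

```shell
# Record a checksum for each split piece before uploading. Keep the
# manifest somewhere safe (not only in Glacier); after a retrieval, run
#   sha256sum -c music.backup.manifest.sha256
# to verify every downloaded piece against what was originally uploaded.
sha256sum music.backup.*.tar.gz.* > music.backup.manifest.sha256
```

The manifest is tiny compared to the archives, so it’s also a good candidate for the uploaded-file descriptions or a local notes folder.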

After your upload completes, which may take a few days for a large library, you can use other glacier-cmd commands to query the status of your vault. Remember that inventory retrievals take about 4 hours, so don’t expect to get a directory listing back quickly. Welcome to long-term backups with Glacier!

28 thoughts on “Creating Long-Term Backups with Amazon Glacier on Linux”

  1. Nice blog post.

    I wanted a practical example that illustrates the usage of Glacier, which I found here :-).

    If I were you, however, I would create a longer script that makes groups of mp3s totalling almost 200MB each and then makes a tarball out of each group. Because if you get corruption in one of the archives, your whole mp3 collection is gone. Just my 2 cents.

  2. That’s for sure. Another thing to add would definitely be file encryption, too. I don’t want someone looking at my data ;)

    I might just have to do another tutorial to fill these missing gaps. Getting tarballs of only 200MB would be kind of difficult; I wouldn’t know how to do that without the script getting too complicated. I’d probably have to write a Python script to find the files and group them into 200MB chunks.

    Is there a way to group them in 200MB chunks without making it one big, long, split tarball?

  3. Thanks, very useful.

    To keep it updated, the new version changes the name of “access_secret” to “secret_key”

  4. Hello,

    We are proud to present Glaciate Archive.
    It’s the only browser-based, multi-platform Amazon Glacier client (service) with built-in search,
    metadata management, a retrieval calculator, pausable downloads, an automatically updating inventory list, upload/download notifications and, last but not least, user management with the option of setting credit limits per user.

    You can check it out at and request your demo account

    Tonis Leissoo
    Glaciate Archive

  5. I’m using version a2d5763 of glacier-cmd from Tue Mar 5 2013 and "access_key" and "access_secret" do not work anymore. You get "glacier-cmd: error: argument --aws-secret-key is required".
    Using "aws-secret-key" and "aws-access-key" in ~/.glacier-cmd works.

  6. For a performance increase, try pigz instead of gzip.

    tar cvf - /mnt/s10_1 | pigz -4 | split -a 4 --bytes=200M - "s10_1.$BACKUP_TIME.tar.gz."

    I also used 'split -a 4' because of the amount of data I was splitting.

    For installing on RH based systems, try:

    # yum install python-setuptools git
    # pip-2.6 install boto --upgrade
    # git clone git://
    # cd amazon-glacier-cmd-interface
    # python setup.py install

    Also, as per Marc above, .glacier-cmd needs to look like this now:

    aws-access-key=YOUR_ACCESS_KEY
    aws-secret-key=YOUR_SECRET_KEY
    region=us-east-1 (or whatever)

  7. Hi,

    I am using this tool but I am not able to upload data into Glacier.

    I am also not getting any error.


    1. I have made the tar file
    2. Vault created
    3. Then ran the script as you guided:

    find . -maxdepth 1 -type f -name "files.tar*" | sort | while read file ; do
        echo "Uploading $(basename "$file") to Amazon Glacier."
        glacier-cmd upload --description "$(basename "$file")" vaultname "$file" && mv "$file" "Completed Backups"
    done

    Please help

  8. Thanks for this, but alas, nothing happened for me: all I got was a '>' and a flashing cursor for ages. I don't think I understand exactly what to do with your series of commands; cutting and pasting them didn't work, nor did typing it all out on one line. Oh well.
    What I would like to know, and can't seem to find with a search, is how do I remove the '$ BACKUP_TIME="$(date +%Y%m%d%H%M%S)"' reference from my system? It most likely does no harm, but I just want to remove it if possible.

    • There’s not much to do to remove the BACKUP_TIME variable; it only persists for your current shell session. This article was written mostly from a tutorial point of view, targeted at people already comfortable with Linux scripting and automation.

  9. I wrote some Python code to keep file integrity instead of splitting the tar with split. Archives will likely come out smaller than 200MB since I only total the input sizes, not the compressed output. I look forward to any comments:

    #!/usr/bin/env python
    import logging
    import argparse
    import tarfile
    import os
    import os.path as pa

    def parseInputs():
        parser = argparse.ArgumentParser()
        parser.add_argument("-d", "--directory", help="Directory containing files to compress", default="music", required=True)
        parser.add_argument("-s", "--size", help="Maximum input size per archive in megabytes (MB)", default=200, type=int)
        parser.add_argument("-o", "--output", help="The output filename template", default="compressed")
        parser.add_argument("-v", "--verbose", help="Turn verbose on. (1-5, 5 is least verbose)", default=1, type=int)
        args = parser.parse_args()
        return (args.verbose, args.size, args.directory, args.output)

    class tarlimit:
        """Creates multiple tar files from a given input directory,
        limiting the size of each tar file to "size" MB. Additionally,
        it breaks archives along file boundaries, which is safer should
        a corruption occur than using the tar | split method. Result
        filenames are auto-incremented starting at 0000.
        NOTE: this will always deliver a size less than "size" MB,
        since it solely calculates the input file size and assumes
        compression will deliver a smaller size.
        Possible decompression method:
        for ii in *.bz2; do tar xvf $ii; done
        """

        loggingLevels = {1: logging.DEBUG,
                         2: logging.INFO,
                         3: logging.WARNING,
                         4: logging.ERROR,
                         5: logging.CRITICAL}

        def __init__(self, verboseLevel, size, directory, output):
            # debug(), info(), warning(), error(), critical()
            logging.basicConfig()
            self.logger = logging.getLogger("tarlimit")
            if verboseLevel in self.loggingLevels:
                self.logger.setLevel(self.loggingLevels[verboseLevel])
            else:
                self.logger.critical("Incorrect logging level specified!")
            self.directory = directory
            self.size = size
            self.outputfiletemplate = output
            self.logger.debug("Using root directory {0}".format(self.directory))

        def run(self):
            """Get directory structure and start adding sizes up."""
            directorylist = os.walk(self.directory, onerror=self.exitgracefully)
            sumsize = 0
            numberofiterations = 0
            filenametocompress = self._createfilename(numberofiterations)
            self.logger.info("Using compressed file: {0}".format(filenametocompress))
            tar = tarfile.open(filenametocompress, "w:bz2")  # open first archive
            for root, dirs, files in directorylist:
                totalpathnames = (pa.join(root, name) for name in files)  # note: generator, not list
                for filetoadd in totalpathnames:
                    sumsize += pa.getsize(filetoadd)
                    self.logger.debug("Adding file {0}".format(filetoadd))
                    if round(sumsize / 1.0e6) > self.size:
                        self.logger.debug("Input size is {0} MB".format(round(sumsize / 1.0e6)))
                        tar.close()
                        numberofiterations += 1
                        sumsize = 0
                        filenametocompress = self._createfilename(numberofiterations)
                        tar = tarfile.open(filenametocompress, "w:bz2")  # open next archive
                        self.logger.info("Using compressed file: {0}".format(filenametocompress))
                    tar.add(filetoadd)  # write the file out
            tar.close()
            return numberofiterations

        def _createfilename(self, iter):
            return "{0}{1:04d}.tar.bz2".format(self.outputfiletemplate, iter)

        def exitgracefully(self, errorname):
            self.logger.debug("Exiting with error {0}".format(errorname))
            self.logger.debug("Error filename {0}".format(errorname.filename))
            return 0

    if __name__ == "__main__":
        verboseLevel, numberOfBytes, filename, outputtmplt = parseInputs()
        psid = tarlimit(verboseLevel, numberOfBytes, filename, outputtmplt)
        dat = psid.run()
        print("output {0} files.".format(dat))


  10. Pingback: Amazon AWS Glacier backup og filarkiv løsning er markedets billigste.

  11. Hi, I am not able to proceed with the Python install on a CentOS distro. I have tried on more than three servers, and on all of them it ends with the error below. Any help would be appreciated.

    Python 2.4.3 (#1, Jan 9 2013, 06:47:03)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2

    {'__file__':setup_script, '__name__':'__main__'}
    File "", line 45
    with open("README.rst") as f:
    SyntaxError: invalid syntax

    • Forget it, I think the problem was the Python version; I used /usr/bin/python2.6 and got rid of the error. Hope it's installed now; I'll let you folks know if I need any more help.


  14. Thank you for your blog post.
    I have been looking into Amazon Glacier with excitement as a way to back up my NAS server from Linux. I have not trusted cloud storage, but planned to use ecryptfs with rsync.
    I was attracted by the affordability, as my data does not change much and my current archive is around 90 GB.
    Thanks to you, I will look for another solution, because I had thought that directory structure was honoured.
    Perhaps Amazon S3, but I doubt it.

  15. Very nice and I will almost certainly adapt this along with some of the suggestions in earlier comments for my own needs. Three points, though:

    First, if you are going to encrypt each split tarball with a GPG key (ideally one generated for that purpose), then you'd be better served by dropping the gzip step, since GPG compresses by default (supporting zlib, bzip2 and zip). You can easily run that with the --encrypt-files option to encrypt the lot of them to filename.tar.NN.gpg after the archives are created and split, then later use --decrypt-files to batch-decrypt them before rejoining the archive into a tarball.
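    A minimal sketch of that batch workflow, assuming a recipient key for you@example.com (a placeholder address) already exists in the keyring:

    ```shell
    # Batch-encrypt every split piece to piece-name.gpg; gpg applies its
    # own compression, so gzip in the tar pipeline becomes redundant.
    gpg --recipient you@example.com --encrypt-files music.backup.*.tar.gz.*

    # Later, batch-decrypt before rejoining the archive:
    gpg --decrypt-files music.backup.*.tar.gz.*.gpg
    ```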

    Second, if file and directory structure is considered essential (possibly including utf-8 file names and so on), then an alternative to tar would be to create a virtual volume (e.g. a VirtualBox VDI) and prepare that for your needs. This would let you take advantage of whole-volume encryption (e.g. LUKS with cryptsetup) as well as an archive with an integrated filesystem of your choice. Obviously this adds time and effort (some users may find this mitigated by using TrueCrypt-style containers or whichever alternative they've opted for), but for some types of data it may be worth it: I wouldn't use it for a music backup, but I would for certain personal or financial documents. There would still be the issue of managing retrieval costs and access time if necessary, but at a few cents per GB for essential data, that's up to each individual to determine. I'm thinking about the sort of things which would be devastating for me to lose; in that situation I would spend hundreds of times that amount to get them back, which pretty much answers the question for me.

    Finally, I’d probably include a step which automatically generated a db (sqlite3 is fine for this or even just a spreadsheet or .csv file) with the contents of each file, relevant access data (i.e. AWS identifiers) and so on, then encrypt that too and put it somewhere a little more accessible (e.g. a private S3 bucket as well as a local copy). That should avoid much of the trouble with worrying about that data which Amazon expects users to manage.

  16. What happens when you add new music to your collection? Do you have to upload the entire collection of tar.gz files again? Do you have a solution for uploading diffs?
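    One low-tech answer to the question above (my own sketch, not from the post): have tar archive only files changed since the last run, producing a new, dated set of pieces to upload alongside the originals. The cutoff date below is a hard-coded placeholder for state you would track yourself, e.g. in a small file updated after each backup.

    ```shell
    # GNU tar's --newer-mtime skips files not modified since the cutoff,
    # so only additions and changes end up in the new incremental pieces.
    LAST_RUN="2013-01-20 12:05:03"
    BACKUP_TIME="$(date +%Y%m%d%H%M%S)"
    tar czf - --newer-mtime="$LAST_RUN" ~/Music \
        | split --bytes=200MB - "music.backup.$BACKUP_TIME.incr.tar.gz."
    ```

    Restoring then means extracting the full sets in chronological order, letting newer incremental archives overwrite older files.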

  17. I got inspired from your blog post and did my own solution to the problem (fun project for those late hours of the night).

    Hope you don’t mind me sharing what I did on your thread since this seems to have become a focus point for most of us glacier users ;-)

    The tool supports par2, encryption, signature and splitting archives into segments (but not splitting individual files). It does incremental backup and can even be used without glacier before you commit 100% to the concept.

    I use it for mail backup as well as photos, home videos and documents.

    Let me know what you think and feel free to contribute :-) and thanks for being the spark that got me going :-D

  18. Hello! I get a lot of errors, like this:
    File "/usr/bin/glacier", line 4, in
    __import__('pkg_resources').run_script('boto==2.43.0', 'glacier')
    File "build/bdist.linux-x86_64/egg/", line 517, in run_script
    File "build/bdist.linux-x86_64/egg/", line 1436, in run_script
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/EGG-INFO/scripts/glacier", line 161, in
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/EGG-INFO/scripts/glacier", line 149, in main
    list_vaults(region, access_key, secret_key)
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/EGG-INFO/scripts/glacier", line 98, in list_vaults
    layer2 = connect(region, access_key = access_key, secret_key = secret_key)
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/EGG-INFO/scripts/glacier", line 90, in connect
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/boto/glacier/", line 41, in connect_to_region
    return region.connect(**kw_params)
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/boto/", line 187, in connect
    return self.connection_cls(region=self, **kw_params)
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/boto/glacier/", line 38, in __init__
    self.layer1 = Layer1(*args, **kwargs)
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/boto/glacier/", line 98, in __init__
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/boto/", line 569, in __init__
    host, config, self.provider, self._required_auth_capability())
    File "/usr/lib/python2.7/site-packages/boto-2.43.0-py2.7.egg/boto/", line 993, in get_auth_handler
    'Check your credentials' % (len(names), str(names)))
    boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
    What should I do?

  19. Pingback: My backup methods – Coding Buzz: Blog
