NAME

Net::Amazon::Glacier - An implementation of the full Amazon Glacier RESTful 2012-06-01 API.

VERSION

Version 0.15

SYNOPSIS

Amazon Glacier is Amazon's long-term storage service and can be used to store cold archives with a novel pricing scheme. This module implements the full Amazon Glacier RESTful API, version 2012-06-01 (current at the time of writing). It can be used to manage Glacier vaults, upload archives in a single request or as multipart uploads of up to 40,000 GB per archive, and download them whole or in ranges.

A short example:

        use Net::Amazon::Glacier;

        my $glacier = Net::Amazon::Glacier->new(
                'eu-west-1',
                'AKIMYACCOUNTID',
                'MYSECRET',
        );

        my $vault = 'a_vault';

        my @vaults = $glacier->list_vaults();

        if ( $glacier->create_vault( $vault ) ) {

                if ( my $archive_id = $glacier->upload_archive( $vault, './archive.7z' ) ) {

                        my $job_id = $glacier->initiate_job( $vault, $archive_id );

                        # Jobs generally take about 4 hours to complete
                        my $job_description = $glacier->describe_job( $vault, $job_id );

                        # For a better way to wait for completion, see
                        # http://docs.aws.amazon.com/amazonglacier/latest/dev/api-initiate-job-post.html
                        while ( $job_description->{'StatusCode'} ne 'Succeeded' ) {
                                sleep 15 * 60; # poll every 15 minutes
                                $job_description = $glacier->describe_job( $vault, $job_id );
                        }

                        my $archive_bytes = $glacier->get_job_output( $vault, $job_id );

                        # Jobs live as completed jobs for "a period", according to
                        # http://docs.aws.amazon.com/amazonglacier/latest/dev/api-jobs-get.html
                        my @jobs = $glacier->list_jobs( $vault );

                        # As of 2013-02-09 jobs are blindly created even if a job for the same archive_id and Range exists.
                        # Keep $archive_ids, reuse the expensive job resource, and remember 4 hours.
                        foreach my $job ( @jobs ) {
                                next unless $job->{ArchiveId} eq $archive_id;
                                my $archive_bytes = $glacier->get_job_output( $vault, $job->{JobId} );
                        }

                }

        }

The functions are intended to closely reflect Amazon's Glacier API. Please see Amazon's API reference for documentation of the functions: http://docs.amazonwebservices.com/amazonglacier/latest/dev/amazon-glacier-api.html.

CONSTRUCTOR

new( $region, $access_key_id, $secret )

VAULT OPERATIONS

create_vault( $vault_name )

Creates a vault with the specified name. Returns true on success, croaks on failure.

Create Vault (PUT vault)

delete_vault( $vault_name )

Deletes the specified vault. Returns true on success, croaks on failure.

Delete Vault (DELETE vault)

describe_vault( $vault_name )

Fetches information about the specified vault.

Returns a hash reference with the keys described by http://docs.amazonwebservices.com/amazonglacier/latest/dev/api-vault-get.html.

Croaks on failure.

Describe Vault (GET vault)

list_vaults

Lists the vaults. Returns an array with all vaults. Amazon Glacier List Vaults (GET vaults).

A call to list_vaults can result in many calls to the Amazon API at a rate of 1 per 1,000 vaults in existence. Calls to List Vaults in the API are free.

Croaks on failure.
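
The paging behaviour described above can be pictured with a small sketch. It is not the module's implementation: fetch_all, the items and marker keys, and the stub page source are all hypothetical, standing in for one List Vaults request per page of up to 1,000 vaults.

```perl
use strict;
use warnings;

# Hypothetical helper: keep requesting pages until no marker is returned.
# $fetch_page receives the previous marker (undef on the first call) and
# returns { items => [...], marker => $next_or_undef }.
sub fetch_all {
    my ($fetch_page) = @_;
    my ( @items, $marker );
    do {
        my $page = $fetch_page->($marker);
        push @items, @{ $page->{items} };
        $marker = $page->{marker};
    } while ( defined $marker );
    return @items;
}

# Stub page source standing in for the remote API: 2,500 fake vault names
# served 1,000 at a time.
my @all_vaults = map { "vault_$_" } 1 .. 2500;
my $calls      = 0;
my $stub       = sub {
    my ($marker) = @_;
    my $start = defined $marker ? $marker : 0;
    my $end   = $start + 999;
    $end = $#all_vaults if $end > $#all_vaults;
    $calls++;
    return {
        items  => [ @all_vaults[ $start .. $end ] ],
        marker => $end < $#all_vaults ? $end + 1 : undef,
    };
};

my @vaults = fetch_all($stub);
printf "%d vaults in %d calls\n", scalar @vaults, $calls;
```

With 2,500 stub vaults the loop makes three calls, which matches the "1 per 1,000" rate mentioned above.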

set_vault_notifications( $vault_name, $sns_topic, $events )

Sets vault notifications for a given vault.

An SNS Topic to send notifications to must be provided. The SNS Topic must grant permission to the vault to be allowed to publish notifications to the topic.

An array ref to a list of events must be provided. Valid events are ArchiveRetrievalCompleted and InventoryRetrievalCompleted.

Returns true on success, croaks on failure.

Set Vault Notification Configuration (PUT notification-configuration).
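
Because only two event names are accepted, it can be convenient to check an event list before sending it. The following is a minimal sketch of such a check; unknown_events is a hypothetical helper and is not provided by Net::Amazon::Glacier.

```perl
use strict;
use warnings;

# The two event names listed in this documentation.
my %VALID_EVENT = map { $_ => 1 }
    qw( ArchiveRetrievalCompleted InventoryRetrievalCompleted );

# Hypothetical helper: return the entries of the array ref $events that
# are not valid event names (an empty list means the array ref is fine).
sub unknown_events {
    my ($events) = @_;
    return grep { !$VALID_EVENT{$_} } @{$events};
}
```

An empty result means the array ref can be passed on as $events unchanged.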

get_vault_notifications( $vault_name )

Gets vault notifications status for a given vault.

Returns a hash with an 'SNSTopic' and an array of 'Events' on success, croaks on failure.

Get Vault Notifications (GET notification-configuration).

delete_vault_notifications( $vault_name )

Deletes vault notifications for a given vault.

Returns true on success, croaks on failure.

Delete Vault Notifications (DELETE notification-configuration).

ARCHIVE OPERATIONS

upload_archive( $vault_name, $archive_path, [ $description ] )

Uploads an archive to the specified vault. $archive_path is the local path to any file smaller than 4GB. For larger files, see MULTIPART UPLOAD OPERATIONS.

An archive description of up to 1024 printable ASCII characters can be supplied.
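
The description limit can be checked locally before uploading. This is a minimal sketch, assuming "printable ASCII" means the range 0x20 to 0x7E as in Amazon's API reference; is_valid_description is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when $description is at most 1,024
# characters long and every character is printable ASCII (0x20-0x7E).
sub is_valid_description {
    my ($description) = @_;
    return 0 unless defined $description;
    return $description =~ /\A[\x20-\x7E]{0,1024}\z/ ? 1 : 0;
}
```

Control characters such as tabs and newlines, and any non-ASCII characters, make the check fail.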

Returns the Amazon-generated archive ID on success, or false on failure.

Upload Archive (POST archive)

upload_archive_from_ref( $vault_name, $ref, [ $description ] )

DEPRECATED at birth. Will be dropped in next version. A more robust upload_archive will support file paths, refs, code refs, filehandles and more.

In the meantime...

Like upload_archive, but takes a reference to your data instead of the path to a file. For data greater than 4GB, see multi-part upload. An archive description of up to 1024 printable ASCII characters can be supplied. Returns the Amazon-generated archive ID on success, or false on failure.

delete_archive( $vault_name, $archive_id )

Issues a request to delete a file from Glacier. $archive_id is the ID you received either when you uploaded the file originally or from an inventory.

Delete Archive (DELETE archive)

MULTIPART UPLOAD OPERATIONS

Amazon requires this method for files larger than 4GB, and recommends it for files larger than 100MB.

Uploading Large Archives in Parts (Multipart Upload)

SYNOPSIS

        use Net::Amazon::Glacier;

        my $glacier = Net::Amazon::Glacier->new(
                'eu-west-1',
                'AKIMYACCOUNTID',
                'MYSECRET',
        );

        my $vault       = 'a_vault';
        my $description = 'A multipart archive';
        my $filename    = 'a_file.bin';

        my $part_size = $glacier->calculate_multipart_upload_partsize( -s $filename );

        my $upload_id = $glacier->multipart_upload_init( $vault, $part_size, $description );

        open ( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
        binmode $fh;

        my $part_index = 0;
        my $read_bytes;
        my $part;
        my $parts_hash = []; # to store partial tree hashes for the complete method

        # Upload parts of $filename
        do {
                $read_bytes = read ( $fh, $part, $part_size );
                $parts_hash->[$part_index] = $glacier->multipart_upload_upload_part( $vault, $upload_id, $part_size, $part_index, \$part );
        } while ( ( $read_bytes == $part_size ) && $parts_hash->[$part_index++] =~ /^[0-9a-f]{64}$/ );
        close ( $fh );

        my $archive_size = $part_size * ( $part_index ) + $read_bytes;

        # Capture archive id or error code
        my $archive_id = $glacier->multipart_upload_complete( $vault, $upload_id, $parts_hash, $archive_size  );

        # Check if we have a valid $archive_id
        unless ( $archive_id =~ /^[a-zA-Z0-9_\-]{10,}$/ ) {
                # abort partial failed upload
                # could also store upload_id and continue later
                $glacier->multipart_upload_abort( $vault, $upload_id );
        }

        # Other useful methods
        # Get an array ref with incomplete multipart uploads
        my $upload_list = $glacier->multipart_upload_list_uploads( $vault );

        # Get an array ref with uploaded parts for a multipart upload
        my $upload_parts = $glacier->multipart_upload_list_parts( $vault, $upload_id );

calculate_multipart_upload_partsize ( $archive_size )

Calculates the part size needed to upload files of $archive_size.

$archive_size is the maximum expected archive size.

Returns the smallest possible part size to upload an archive of size $archive_size, or 0 when the file cannot be uploaded in parts (i.e. >39TB).
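
To make the calculation concrete, here is a minimal sketch of how such a part size could be derived. It is not the module's code. It assumes the constraints from Amazon's multipart upload documentation: a part size is 1 MiB multiplied by a power of two, at most 4 GiB, and an upload has at most 10,000 parts. smallest_part_size is a hypothetical name.

```perl
use strict;
use warnings;

use constant MIB           => 1024 * 1024;
use constant MAX_PARTS     => 10_000;
use constant MAX_PART_SIZE => 4096 * MIB;    # 4 GiB

# Hypothetical helper: smallest allowed part size (in bytes) that covers
# $archive_size bytes in at most MAX_PARTS parts, or 0 if none does.
sub smallest_part_size {
    my ($archive_size) = @_;
    my $part_size = MIB;
    while ( $part_size <= MAX_PART_SIZE ) {
        return $part_size if $part_size * MAX_PARTS >= $archive_size;
        $part_size *= 2;    # next power-of-two multiple of 1 MiB
    }
    return 0;               # too large even with 4 GiB parts
}
```

Choosing the smallest workable size keeps individual retries cheap, while doubling guarantees the result stays a valid power-of-two multiple of 1 MiB.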

multipart_upload_init( $vault_name, $part_size, [ $description ] )

Initiates a multipart upload. $part_size should be carefully calculated to avoid dead ends as documented in the API. Use calculate_multipart_upload_partsize.

Returns a multipart upload id that should be used while adding parts to the online archive that is being constructed.

Multipart upload ids are valid until multipart_upload_abort is called or until 24 hours after the last archive-related activity is registered. After that period the id should not be expected to remain valid.

Initiate Multipart Upload (POST multipart-uploads).

multipart_upload_upload_part( $vault_name, $multipart_upload_id, $part_size, $part_index, $part )

Uploads a certain range of a multipart upload.

$part_size must be the same size supplied to multipart_upload_init for a given multipart upload.

$part_index is the index, within a file split into N chunks of $part_size bytes, of the chunk whose data is passed in $part.

$part must be a reference to a string or a filehandle, and must be exactly the part size supplied to multipart_upload_init unless it is the last part, which can be any non-zero size.

Absolute maximum online archive size is 4GB*10000, or slightly over 39TB.

Uploading Large Archives in Parts (Multipart Upload) Quick Facts

Returns the uploaded part's tree hash, which should be stored in an array ref to be passed to multipart_upload_complete.

Upload Part (PUT uploadID).

multipart_upload_complete( $vault_name, $multipart_upload_id, $tree_hash_array_ref, $archive_size )

Signals completion of multipart upload.

$tree_hash_array_ref must be an ordered list (same order as the final assembled online archive, as opposed to upload order) of partial tree hashes as returned by multipart_upload_upload_part.

$archive_size is provided at completion, rather than beforehand, so that all parts can be checked to make up the archive while still allowing archive streaming, i.e. uploading archives of unknown size. Beware of dead ends when choosing the part size. Use calculate_multipart_upload_partsize to select a part size that will work.

On success, returns an archive id that can be used later to request a job to retrieve the archive. Returns 0 on failure.

On failure, multipart_upload_list_parts can be used to determine the missing parts or recover the partial tree hashes. After completing the missing parts and recalculating the correct archive tree hash, multipart_upload_complete can be called again with a successful result.
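
For readers unfamiliar with tree hashes, the following sketch shows the general scheme from Amazon's checksum documentation: SHA-256 over each 1 MiB chunk, then adjacent digests hashed together in pairs until one remains. It is an illustration only, not the module's implementation; reduce_digests and tree_hash_hex are hypothetical names, and the handling of empty input is an assumption made so the function always returns a value.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256);

use constant MIB => 1024 * 1024;

# Combine binary SHA-256 digests pairwise, level by level, until a single
# digest remains. An unpaired digest is carried up to the next level.
sub reduce_digests {
    my @level = @_;
    while ( @level > 1 ) {
        my @next;
        while ( my @pair = splice( @level, 0, 2 ) ) {
            push @next, @pair == 2 ? sha256( $pair[0] . $pair[1] ) : $pair[0];
        }
        @level = @next;
    }
    return $level[0];
}

# Hypothetical helper: hex tree hash of the string referenced by $data_ref.
sub tree_hash_hex {
    my ($data_ref) = @_;
    my @leaves;
    for ( my $offset = 0 ; $offset < length $$data_ref ; $offset += MIB ) {
        push @leaves, sha256( substr( $$data_ref, $offset, MIB ) );
    }
    # Assumption: treat empty input as a single empty chunk.
    @leaves = ( sha256('') ) unless @leaves;
    return unpack( 'H*', reduce_digests(@leaves) );
}
```

For data of 1 MiB or less the tree hash is simply the SHA-256 of the data. The same pairwise reduction can be applied to hex-decoded part hashes when recomputing an archive hash, provided the parts are listed in archive order.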

Complete Multipart Upload (POST uploadID).

multipart_upload_abort( $vault_name, $multipart_upload_id )

Aborts multipart upload releasing the id and related online resources of a partially uploaded archive.

Abort Multipart Upload (DELETE uploadID).

multipart_upload_list_parts ( $vault_name, $multipart_upload_id )

Returns an array ref with information on all uploaded parts of the (probably partially uploaded) online archive.

Useful to recover file part tree hashes and complete a broken multipart upload.

List Parts (GET uploadID)

A call to multipart_upload_list_parts can result in many calls to the Amazon API at a rate of 1 per 1,000 uploaded parts. Calls to List Parts in the API are free.

multipart_upload_list_uploads( $vault_name )

Returns an array ref with information on all incomplete multipart uploads. Useful to recover multipart upload ids.

List Multipart Uploads (GET multipart-uploads)

A call to multipart_upload_list_uploads can result in many calls to the Amazon API at a rate of 1 per 1,000 incomplete multipart uploads. Calls to List Multipart Uploads in the API are free.

JOB OPERATIONS

initiate_archive_retrieval( $vault_name, $archive_id, [ $description, $sns_topic ] )

Initiates an archive retrieval job. $archive_id is an ID previously retrieved from Amazon Glacier.

A job description of up to 1,024 printable ASCII characters may be supplied. Net::Amazon::Glacier does its best to enforce this restriction. When unsure, send the string and watch for Carp messages.

An SNS Topic to send notifications to upon job completion may also be supplied.

Initiate a Job (POST jobs).

initiate_inventory_retrieval( $vault_name, $format, [ $description, $sns_topic ] )

Initiates an inventory retrieval job. $format is either CSV or JSON.

A job description of up to 1,024 printable ASCII characters may be supplied. Net::Amazon::Glacier does its best to enforce this restriction. When unsure, send the string and watch for Carp messages.

An SNS Topic to send notifications to upon job completion may also be supplied.

Initiate a Job (POST jobs).

initiate_job( $vault_name, $archive_id, [ $description, $sns_topic ] )

Effectively calls initiate_inventory_retrieval.

Exists for the sole purpose of implementing the Amazon Glacier Developer Guide (API Version 2012-06-01) nomenclature.

Initiate a Job (POST jobs).

describe_job( $vault_name, $job_id )

Retrieves a hashref with information about the requested JobID.

Amazon Glacier Describe Job (GET JobID).
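
Since jobs take hours, callers usually poll describe_job until the job finishes. The sketch below shows one way to structure such a loop so that it also stops on failure and after a bounded number of polls. wait_for_job is a hypothetical helper; it takes a code ref so it can be tried without network access, and it relies on the StatusCode values InProgress, Succeeded and Failed from Amazon's API reference.

```perl
use strict;
use warnings;

# Hypothetical helper: call $describe (a code ref returning a job
# description hashref) up to $max_polls times, sleeping $interval seconds
# between polls. Returns the description once StatusCode is 'Succeeded',
# dies if it is 'Failed', and returns nothing if polls run out.
sub wait_for_job {
    my ( $describe, $interval, $max_polls ) = @_;
    for my $attempt ( 1 .. $max_polls ) {
        my $job = $describe->();
        return $job if $job->{StatusCode} eq 'Succeeded';
        die "Job failed\n" if $job->{StatusCode} eq 'Failed';
        sleep $interval if $attempt < $max_polls;
    }
    return;
}
```

With this module the code ref would typically be sub { $glacier->describe_job( $vault, $job_id ) }, with an interval of several minutes. An SNS notification, where configured, avoids polling altogether.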

get_job_output( $vault_name, $job_id, [ $range ] )

Retrieves the output of a job, returns a binary blob. Optional range parameter is passed as an HTTP header. Amazon Glacier Get Job Output (GET output).

If you pass a range parameter, you're going to want the tree-hash for your chunk. That will be returned in an additional return value, so collect it like this:

        my ( $bytes, $tree_hash ) = $glacier->get_job_output( $vault, $job_id, $range );
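
When downloading a large archive in several ranged requests, the ranges have to be computed first. The sketch below splits an archive into fixed-size chunks and formats each as an HTTP Range value. byte_ranges is a hypothetical helper, and the exact string this module expects in $range (with or without the "bytes=" prefix) is an assumption that should be checked against the module source. Using a chunk size that is a power-of-two multiple of 1 MiB is likewise assumed here to keep the ranges aligned for tree-hash purposes.

```perl
use strict;
use warnings;

# Hypothetical helper: list of "bytes=start-end" strings covering
# $total bytes in chunks of $chunk bytes (the last chunk may be shorter).
sub byte_ranges {
    my ( $total, $chunk ) = @_;
    my @ranges;
    for ( my $start = 0 ; $start < $total ; $start += $chunk ) {
        my $end = $start + $chunk - 1;
        $end = $total - 1 if $end > $total - 1;    # clamp the final range
        push @ranges, "bytes=$start-$end";
    }
    return @ranges;
}
```

Each returned string would then be passed as the optional third argument of get_job_output, one request per range.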

list_jobs( $vault_name )

Returns an array with information about all recently completed jobs for the specified vault. Amazon Glacier List Jobs (GET jobs).

A call to list_jobs can result in many calls to the Amazon API at a rate of 1 per 1,000 recently completed jobs in existence. Calls to List Jobs in the API are free.

ROADMAP

  • Online tests.

  • Implement a "simple" interface along the lines of

                    use Net::Amazon::Glacier;
    
                    # Bless and upload something
                    my $glacier = Net::Amazon::Glacier->new( $region, $aws_key, $aws_secret, $metadata_store );
    
                    # Upload intelligently, i.e. in resumable parts, split very big files.
                    $glacier->simple->upload( $path || $scalar_ref || $some_fh );
    
                    # Support automatic archive_id to some description conversion
                    # Ask for a job when first called, return while it is not ready,
                    # return content when ready.
                    $glacier->simple->download( $archive_id || 'description', [ $ranges ] );
    
                    # Request download and spawn something, wait and execute $some_code_ref
                    # when content ready.
                    $glacier->simple->download_wait( $archive_id || 'description' , $some_code_ref, [ $ranges ] );
    
                    # Delete online archive
                    $glacier->simple->delete( $archive_id || 'description' );

  • Implement a simple command line interface with access to the simple interface.

                    glacier new us-east-1 AAIKSAKS... sdoasdod... /metadata/file
                    glacier upload /some/file
                    glacier download /some/file (this would spawn a daemon waiting for download)
                    glacier ls

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Net::Amazon::Glacier

BUGS

Please report any bugs or feature requests to bug-net-amazon-glacier at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Net-Amazon-Glacier. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SEE ALSO

See also Victor Efimov's MT::AWS::Glacier, an application for AWS Glacier synchronization. It is available at https://github.com/vsespb/mt-aws-glacier.

AUTHORS

Originally written by Tim Nordenfur, <tim at gurka.se>. Maintained by Gonzalo Barco <gbarco uy at gmail com, no spaces>. Support for job operations was contributed by Ted Reed at IMVU. Support for many file operations and multipart uploads was contributed by Gonzalo Barco. Bugs, suggestions and fixes were contributed by Victor Efimov and Kevin Goess.

LICENSE AND COPYRIGHT

Copyright 2012 Tim Nordenfur.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.