AWS Services - Glacier

What is Amazon S3 Glacier?
Amazon S3 Glacier is a low-cost cloud storage service for data archiving and long-term backup. It is optimized for data that is infrequently accessed and for which retrieval times of minutes to hours are acceptable.
Key features:
- Extremely low storage cost compared to S3 Standard
- Designed for long-term retention (years/decades)
- Retrieval options:
- Expedited (1–5 minutes, higher cost)
- Standard (3–5 hours)
- Bulk (5–12 hours, lowest cost)
- Use cases:
- Compliance and regulatory archives
- Backups
- Digital preservation
Creating a Glacier Vault
Using AWS Console (GUI)
- Go to AWS Management Console → S3 Glacier.
- Click Create Vault.
- Choose a region.
- Enter a vault name (e.g.,
myvault). - Optionally configure notifications or tags.
- Click Create Vault.
Using AWS CLI
aws glacier create-vault --account-id - --vault-name myvault
Uploading Data to Glacier
Unlike S3 Standard storage, Glacier does not allow direct object upload from the console. You must use:
- AWS CLI
- AWS SDKs
- Lifecycle policies (transitioning S3 objects into Glacier)
Example: Multipart Upload with AWS CLI
1. Create a Large File (3 MB)
dd if=/dev/urandom of=largefile bs=3145728 count=1
2. Split File into 1 MB Chunks
split --bytes=1048576 --verbose largefile chunk
Produces: chunkaa, chunkab, chunkac.
3. Initiate Multipart Upload
aws glacier initiate-multipart-upload --account-id - --archive-description "multipart upload test" --part-size 1048576
--vault-name myvault
Response:
{
"location": "...",
"uploadId": "ssW0Kx..."
}
4. Assign Upload ID and Upload Parts
UPLOADID="ssW0Kx..."
aws glacier upload-multipart-part
--upload-id $UPLOADID --body chunkaa --range 'bytes 0-1048575/*' --account-id - --vault-name myvault
aws glacier upload-multipart-part --upload-id $UPLOADID --body chunkab --range 'bytes 1048576-2097151/*' --account-id -
--vault-name myvault
aws glacier upload-multipart-part --upload-id $UPLOADID --body chunkac --range 'bytes 2097152-3145727/*' --account-id -
--vault-name myvault
5. Generate Checksums for Each Part
openssl dgst -sha256 -binary chunkaa > hash1
openssl dgst -sha256 -binary chunkab > hash2
openssl dgst -sha256 -binary chunkac > hash3
6. Combine Hashes
cat hash1 hash2 > hash12
openssl dgst -sha256 -binary hash12 > hash12hash
cat hash12hash hash3 > hash123
openssl dgst -sha256 hash123
Result:
SHA256(hash123)= 9628195f...
Assign to variable:
TREEHASH=9628195f...
7. Complete Multipart Upload
aws glacier complete-multipart-upload --checksum $TREEHASH --archive-size 3145728 --upload-id $UPLOADID --account-id -
--vault-name myvault
Retrieving Data from Glacier
Data in Glacier cannot be instantly retrieved. You must initiate a retrieval job.
Example:
aws glacier initiate-job --account-id - --vault-name myvault --job-parameters '{"Type": "archive-retrieval", "
ArchiveId": "ARCHIVE_ID"}'
Then check job status:
aws glacier describe-job --account-id - --vault-name myvault --job-id JOB_ID
Once completed, download the archive:
aws glacier get-job-output --account-id - --vault-name myvault --job-id JOB_ID output_file.txt
Lifecycle Policies with S3 + Glacier
Often, instead of directly uploading to Glacier, organizations:
- Store objects in S3 Standard first
- Define a lifecycle policy to transition them into S3 Glacier or S3 Glacier Deep Archive after a number of days
Example use case:
- Keep logs in S3 Standard for 30 days
- Move them to Glacier for 1 year
- Finally, move them to Glacier Deep Archive for long-term retention
Additional Notes
- Glacier Vault Lock: Provides compliance controls (e.g., write-once-read-many, WORM).
- Pricing: Very cheap storage, but retrieval and early deletion (less than 90 days) have additional costs.
- Alternatives: For infrequently accessed but still “hot” data, consider S3 Glacier Instant Retrieval or S3 Glacier Flexible Retrieval storage classes instead of pure Glacier vaults.