AWS Simple Storage Service (S3)

This is part of a blog series giving a high-level overview of the different services examined on the AWS Solutions Architect Associate exam.

S3 Summary

  • Storage service that is highly scalable, secure and performant
  • It is OBJECT-BASED storage (suitable for files, but not suitable for installing an operating system on)
  • Storage is unlimited, but each individual object can be from 0 bytes to 5 TB in size
  • S3 is a UNIVERSAL NAMESPACE, so bucket names need to be globally unique. This is because the bucket name becomes part of a web address (DNS name), e.g. https://bucketname.s3…
  • By default, newly created buckets are private, but you can make them public if needed; for example, you would need to make a bucket public to use it for static website hosting.
  • When a file is uploaded to S3 successfully, you receive an HTTP 200 status code.
  • You can turn on MFA Delete to protect against accidental deletes
  • Bucket policies are bucket wide
  • Access Control Lists can be for individual files
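To make the naming rules concrete, here is a minimal Python sketch (standard library only) that checks a simplified subset of the bucket naming rules and builds a virtual-hosted-style URL of the form https://<bucket>.s3.<region>.amazonaws.com/<key>, where the bucket name sits inside the DNS name. The helpers `is_valid_bucket_name` and `object_url` are hypothetical, and the checks are deliberately incomplete: AWS enforces further rules not covered here.

```python
import ipaddress
import re
from urllib.parse import quote

# Simplified subset of the bucket naming rules (assumption: AWS enforces more,
# e.g. reserved prefixes and suffixes, which are not checked here).
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Basic check: 3-63 chars of lowercase letters, digits, dots and hyphens."""
    if not _BUCKET_NAME.fullmatch(name):
        return False  # wrong length, bad characters, or bad first/last character
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not shaped like an IP address, so the name is acceptable
    return False  # names formatted as an IP address are rejected


def object_url(bucket: str, region: str, key: str) -> str:
    """Virtual-hosted-style URL: the bucket name becomes part of the DNS name."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


print(object_url("my-example-bucket", "eu-west-1", "photos/cat.jpg"))
# https://my-example-bucket.s3.eu-west-1.amazonaws.com/photos/cat.jpg
```

Because the name ends up in a hostname shared by every AWS customer, two accounts cannot both own "my-example-bucket", which is why the namespace is universal.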

An S3 object is made up of:

  1. Key → Name of the object
  2. Value → The data
  3. Version ID
  4. Metadata
  5. Sub-resources (Access Control Lists & Torrent)

S3 Consistency

  • Delivers strong read-after-write consistency for PUT and DELETE requests on objects, for both new objects and updates to existing objects.
  • This means once there is a successful write, overwrite or delete — the next read request automatically receives the latest version of the object.

S3 Availability and Durability

S3 Standard is designed for 99.99% availability (the S3 Standard SLA itself commits to 99.9%)

S3 is designed for 99.999999999% (11 nines) durability
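A quick sanity check on what 11 nines means. Assuming durability is read as the annual probability that any single object survives (a simplification), the expected number of objects lost per year is the object count multiplied by (1 − durability). This sketch uses exact fractions to avoid floating-point rounding; `expected_objects_lost_per_year` and `years_per_lost_object` are hypothetical helpers.

```python
from fractions import Fraction

# 99.999999999% written exactly: 1 - 0.000000001% = 1 - 10**-11
DURABILITY = 1 - Fraction(1, 10**11)


def expected_objects_lost_per_year(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the designed durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """On average, how many years pass before a single object is lost."""
    return 1 / expected_objects_lost_per_year(object_count)


print(years_per_lost_object(10_000_000))  # 10000
```

With 10,000,000 objects that works out to one lost object every 10,000 years on average, which is the figure AWS quotes in its S3 FAQ.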

S3 Tiered Storage (Storage Classes)

S3 Standard

  • 99.99% Availability
  • 99.999999999% (11 nines) durability
  • Objects are stored across multiple devices in multiple facilities (Availability Zones), and the class is designed to sustain the concurrent loss of 2 facilities.
  • Use case: General purpose storage for any type of frequently used data.
  • Latency of 1–4 milliseconds

S3 Standard-IA (Infrequent Access)

  • For data that is not accessed very frequently, but needs to be retrieved rapidly when it is accessed.
  • Cheaper to store than S3 Standard, but you are charged a per-GB retrieval fee
  • Latency of 1–4 milliseconds

S3 One Zone-IA

  • 99.999999999% durability (but within a single Availability Zone only) and only 99.5% availability
  • Low cost option for data that is not accessed frequently and does not require the resilience of being stored across multiple availability zones
  • Use case: re-creatable infrequently accessed data that needs milliseconds access.
  • Latency of 1–4 milliseconds

S3 Intelligent Tiering

  • Optimises costs as it can automatically move objects to the most cost effective access tier, without performance impact or operational overhead.
  • Use case: automatic cost savings for data with unknown or changing access patterns
  • Latency of 1–4 milliseconds

S3 Glacier (now named S3 Glacier Flexible Retrieval)

  • For data archiving — it is secure, durable and low cost.
  • Retrieval times configurable from minutes to hours.
  • Latency of minutes to hours

S3 Glacier Deep Archive

  • AWS S3’s lowest cost storage class, used for archiving but has a slow retrieval time.
  • Latency of hours

S3 Pricing

In S3 you pay for the following things:

  • Storage
  • Requests
  • Storage Management Pricing
  • Data Transfer Pricing
  • Transfer Acceleration
  • Cross Region Replication Pricing

Saving Costs

  • S3 Standard is the most expensive
  • S3 Intelligent-Tiering charges the same storage price as S3 Standard for its frequent access tier. However, it can also move objects into its cheaper infrequent access tier, so you could save money!
  • The catch: it adds a monthly monitoring and automation charge per object, which adds up if you have a lot of objects.
  • S3 Glacier Deep Archive is the cheapest.
  • If you are going to use S3 Standard, consider S3 Intelligent-Tiering instead, unless you have thousands or millions of objects.

Cross Region Replication

  • If you have this turned on, for example from us-east-1 to eu-west-1, anything uploaded to the bucket in us-east-1 is automatically replicated to the bucket in eu-west-1.
  • Cross Region Replication REQUIRES versioning to be ENABLED on both SOURCE & DESTINATION bucket.
  • You can have this enabled for the entire bucket or just for specific prefixes
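  • Objects already in the bucket are not replicated automatically when you enable replication; only new and updated objects are (existing objects can be copied with S3 Batch Replication).
  • Delete markers are NOT replicated by default (delete marker replication can be turned on), and deletions of specific object versions are never replicated.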
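A replication rule is a configuration document attached to the source bucket. The sketch below builds one as a Python dict in the shape the PutBucketReplication API expects (for example, passed as `ReplicationConfiguration=` to boto3's `put_bucket_replication`). The role ARN, bucket names and the `replication_config` helper are hypothetical placeholders, and the sketch assumes both buckets already have versioning enabled and the role has the required permissions.

```python
import json


def replication_config(role_arn: str, dest_bucket: str, prefix: str = "",
                       replicate_delete_markers: bool = False) -> dict:
    """ReplicationConfiguration with a single rule (PutBucketReplication shape)."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects for you
        "Rules": [
            {
                "ID": "replicate-to-" + dest_bucket,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # "" means the entire bucket
                # Delete markers are only replicated if you opt in here.
                "DeleteMarkerReplication": {
                    "Status": "Enabled" if replicate_delete_markers else "Disabled"
                },
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::111122223333:role/hypothetical-replication-role",
    "my-eu-west-1-copy",  # hypothetical destination bucket
    prefix="logs/",
)
print(json.dumps(config, indent=2))
```

Scoping the rule with a prefix is how you replicate "just specific prefixes" rather than the whole bucket.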

Sharing S3 buckets Across Accounts

If you have two accounts within the same organisation, you can use any of these to share an S3 bucket between them:

  • Bucket policy & IAM: applies to the entire bucket, programmatic access only
  • Bucket ACLs & IAM: can apply to individual objects, programmatic access only
  • Cross-account IAM roles: programmatic and console access
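For the bucket policy option, the owning account attaches a JSON policy naming the other account as a principal. Here is a minimal sketch that builds such a policy for read-only access; the account ID, bucket name and `cross_account_read_policy` helper are hypothetical. This is only the bucket-policy half: the other account must still grant its own IAM users or roles matching permissions in an IAM policy.

```python
import json


def cross_account_read_policy(bucket: str, other_account_id: str) -> dict:
    """Bucket policy letting another AWS account list the bucket and read objects."""
    principal = {"AWS": f"arn:aws:iam::{other_account_id}:root"}  # the whole account
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
            },
            {
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            },
        ],
    }


print(json.dumps(cross_account_read_policy("shared-team-bucket", "111122223333"), indent=2))
```

Note the two resource ARNs: listing is granted on the bucket itself, while reading is granted on the objects inside it, which is why a bucket policy naturally covers the whole bucket.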

S3 Transfer Acceleration

  • Uses CloudFront’s globally distributed edge locations to enable fast and secure transfers of files over long distances between your client and S3.
  • As data arrives at an edge location, it is routed to S3 over an optimised network path.
  • Really speeds up uploads, as users upload their data to the nearest edge location rather than directly to the S3 bucket.

S3 Security

Access Control lists

Can grant basic read and write permissions at an object level (not just whole bucket)

For example: use if there is a file in a bucket you don’t want everyone to have access to.

Bucket policy

This works at the bucket level, not the individual file level. It applies to the whole bucket!

S3 Pre-Signed URLs

  • Used to secure content so that only people you authorise are able to access it.
  • Different from CloudFront signed URLs
  • Use when not using CloudFront and people have direct access to S3
  • Requests made with the URL are issued as the IAM user who created the pre-signed URL (same permissions)
  • An S3 pre-signed URL has a LIMITED LIFETIME
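In practice you would let an SDK generate the URL (for example boto3's `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=3600)`). To show what the URL actually carries, here is a standard-library sketch of AWS Signature Version 4 query-string signing for a GET. It assumes long-term IAM user access keys (no session token) and a regional virtual-hosted endpoint; `presign_get_url` is a hypothetical helper, not an AWS API, and the example credentials are the fake ones from the AWS documentation.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(access_key, secret_key, region, bucket, key,
                    expires_in=3600, now=None):
    """Sketch of a SigV4 query-string (pre-signed) URL for GET on one object."""
    if not 1 <= expires_in <= 604800:
        raise ValueError("SigV4 pre-signed URLs are valid for at most 7 days")
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    host = f"{bucket}.s3.{region}.amazonaws.com"
    canonical_uri = "/" + quote(key, safe="/")

    # Signing parameters travel in the query string, URI-encoded and sorted by name.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),  # the URL's limited lifetime, in seconds
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        f"host:{host}\n",    # canonical headers block ends with a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # payload hash placeholder used by pre-signed URLs
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # The signing key is derived from the creator's secret key, which is why
    # anyone holding the URL acts with the creator's permissions.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), datestamp)
    k_signing = _hmac_sha256(_hmac_sha256(_hmac_sha256(k_date, region), "s3"),
                             "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"


print(presign_get_url("AKIAIOSFODNN7EXAMPLE",
                      "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                      "us-east-1", "examplebucket", "test.txt", expires_in=86400))
```

The secret key never appears in the URL; S3 recomputes the signature from the query parameters, so changing the expiry or the key invalidates the link.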

Encryption

  1. Encryption in Transit: encrypting network traffic using SSL/TLS
  2. Encryption at Rest (Server Side): happens server side, encrypting the data that is stored. Can be achieved with:
    • S3 Managed Keys (SSE-S3): AWS manages the keys
    • AWS Key Management Service (SSE-KMS): AWS & you manage the keys together
    • Customer Provided Keys (SSE-C): you give AWS your own keys, which you manage
  3. Encryption at Rest (Client Side): you encrypt the object yourself before you upload it
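Server-side encryption is requested per upload with HTTP request headers (and since January 2023, S3 applies SSE-S3 to new objects by default). Here is a minimal sketch of the headers for each server-side option; `sse_headers` is a hypothetical helper, and a KMS key ID, if passed, is a placeholder you would replace with a real key ARN.

```python
import base64
import hashlib
from typing import Optional


def sse_headers(mode: str, kms_key_id: Optional[str] = None,
                customer_key: Optional[bytes] = None) -> dict:
    """Request headers asking S3 to encrypt an uploaded object at rest."""
    if mode == "SSE-S3":
        # S3 creates and manages the keys (AES-256).
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # omit to use the AWS managed key for S3
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        # You send your own 256-bit key with every request; S3 does not keep it,
        # so losing the key means losing access to the object.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-S3"))  # {'x-amz-server-side-encryption': 'AES256'}
```

Client-side encryption needs no headers at all: S3 simply stores bytes that were already encrypted before upload.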

S3 Versioning

  • Acts as a backup tool that stores all versions of an object (including all writes, and even deletes)
  • Once enabled on your bucket it cannot be disabled, only suspended
  • It can be integrated with lifecycle rules
  • If you mark a single file as public and then upload a new version of it, the new version is private
  • The size of your S3 bucket is the sum of all files and all versions of those files
  • If you delete a file, it still shows up in the version history with a delete marker on top of it, and its earlier versions can be restored.
  • Supports MFA Delete
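The behaviour above can be sketched as a version stack per key. This is a toy, in-memory model written for illustration (the `ToyVersionedBucket` class is hypothetical, and real S3 version IDs are opaque strings, not integers); it shows why a plain delete is recoverable and why every version counts towards the bucket size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: int        # real S3 version IDs are opaque strings
    data: Optional[bytes]  # None marks a delete marker


@dataclass
class ToyVersionedBucket:
    """In-memory toy model of a versioning-enabled bucket (not the real service)."""
    versions: Dict[str, List[Version]] = field(default_factory=dict)  # oldest first
    next_id: int = 1

    def _append(self, key: str, data: Optional[bytes]) -> int:
        self.versions.setdefault(key, []).append(Version(self.next_id, data))
        self.next_id += 1
        return self.next_id - 1

    def put(self, key: str, data: bytes) -> int:
        return self._append(key, data)  # a write adds a new version on top

    def delete(self, key: str) -> int:
        return self._append(key, None)  # a plain delete only adds a delete marker

    def get(self, key: str, version_id: Optional[int] = None) -> bytes:
        stack = self.versions.get(key, [])
        if version_id is None:
            if not stack or stack[-1].data is None:
                raise KeyError(key)  # no object, or current version is a delete marker
            return stack[-1].data
        for version in stack:
            if version.version_id == version_id and version.data is not None:
                return version.data
        raise KeyError(f"{key} (version {version_id})")

    def size_bytes(self) -> int:
        # Every stored version counts towards the bucket size, not just the latest.
        return sum(len(v.data) for stack in self.versions.values()
                   for v in stack if v.data is not None)


bucket = ToyVersionedBucket()
first = bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final!")
bucket.delete("report.txt")
print(bucket.get("report.txt", version_id=first))  # b'draft'
print(bucket.size_bytes())  # 11
```

After the delete, a normal read fails because the delete marker is the current version, yet both earlier versions are still stored and billed.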

S3 Object Lock

  • Helps block objects from being deleted or modified for a custom-defined retention period or indefinitely.
  • Stores objects using a Write Once, Read Many (WORM) model.
  • Lock protection is maintained regardless of storage class and throughout the S3 Lifecycle transitions between storage classes.
  • Can be used to meet regulatory requirements as an extra layer of protection

Retention Period → protects an object version for a fixed amount of time. Once it expires, the object version can be overwritten or deleted, unless there is a LEGAL HOLD placed on it.

S3 has two retention modes:

  1. Governance Mode → Users can't overwrite or delete a locked object version, or alter its lock settings, without special permissions, but users can be granted those permissions (e.g. s3:BypassGovernanceRetention).
  2. Compliance Mode → A protected object version can't be overwritten or deleted by ANY user, including the root user, during its retention period.

Legal Hold → Prevents object version from being overwritten or deleted. It doesn’t have a retention period, it is in effect until removed
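The interaction between the two retention modes and legal holds can be summarised as a small decision function. This is a simplified model written for illustration (`LockState` and `can_delete_version` are hypothetical, not an AWS API); it ignores details such as extending retention periods.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LockState:
    mode: Optional[str] = None               # "GOVERNANCE", "COMPLIANCE" or None
    retain_until: Optional[datetime] = None  # end of the retention period
    legal_hold: bool = False                 # on/off, with no expiry date


def can_delete_version(lock: LockState, now: datetime,
                       has_bypass_permission: bool = False) -> bool:
    """Simplified rules for permanently deleting a locked object version."""
    if lock.legal_hold:
        return False  # a legal hold blocks deletion until it is removed
    if lock.retain_until is None or now >= lock.retain_until:
        return True   # no retention period, or it has expired
    if lock.mode == "GOVERNANCE":
        # Only users granted s3:BypassGovernanceRetention (who also send the
        # x-amz-bypass-governance-retention header) may bypass the lock.
        return has_bypass_permission
    return False      # COMPLIANCE: nobody, not even the root user


until = datetime(2030, 1, 1, tzinfo=timezone.utc)
today = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(can_delete_version(LockState("GOVERNANCE", until), today, True))  # True
print(can_delete_version(LockState("COMPLIANCE", until), today, True))  # False
```

The legal hold check comes first because it applies regardless of mode or retention date.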

Glacier Vault Lock → lets you enforce compliance controls on individual S3 Glacier vaults using a vault lock policy.

Lifecycle rules

Used to manage your objects, automate transitions to tiered storage and expire objects. Can be used in conjunction with versioning

Example use case:

You can use a lifecycle rule to move files to Glacier 30 days after creation, and then to Glacier Deep Archive after another 60 days.

Then you can also set up an expiry after 365 days to delete the files.
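The example rule above can be written as a lifecycle configuration. This sketch builds it as a Python dict in the shape the PutBucketLifecycleConfiguration API expects (for example, passed as `LifecycleConfiguration=` to boto3's `put_bucket_lifecycle_configuration`); the `archive_then_expire` helper and rule ID are hypothetical. One detail worth knowing: lifecycle "Days" are counted from object creation, so "another 60 days" becomes day 90.

```python
import json


def archive_then_expire(prefix: str = "", to_glacier_after: int = 30,
                        then_deep_archive_after: int = 60,
                        expire_after: int = 365) -> dict:
    """Lifecycle configuration for the example rule (PutBucketLifecycleConfiguration shape)."""
    # "Days" is always counted from object creation, not from the previous
    # transition, so the Deep Archive transition happens on day 30 + 60 = 90.
    deep_archive_day = to_glacier_after + then_deep_archive_after
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": prefix},  # "" applies the rule to every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": to_glacier_after, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_day, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": expire_after},  # delete after a year
            }
        ]
    }


print(json.dumps(archive_then_expire(), indent=2))
```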

S3 Performance

S3 has very low latency (milliseconds to the first byte for most storage classes).

Performance limitations

  • SSE-KMS can slow down performance, as S3 has to call the KMS GenerateDataKey API when you upload a file and the KMS Decrypt API when you download it.
  • These calls count against KMS's per-second request quota (set per Region), which could throttle high-throughput workloads.

Improving Performance

  • An S3 prefix is the part of the object key between the bucket name and the file name. S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so you get better performance by spreading your reads across different prefixes.
  • Multipart uploads are recommended for files over 100 MB (and required for files over 5 GB): the file is split into parts that are uploaded in parallel.
  • For downloads, the equivalent is S3 Byte-Range Fetches: they parallelise downloads by requesting specific byte ranges, which speeds up downloads and also lets you download only part of an object.
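Both techniques start by splitting the object into byte ranges. Here is a minimal sketch of that split; the 8 MiB default part size is an assumption (it matches the AWS CLI's default multipart chunk size), `byte_ranges` and `range_header` are hypothetical helpers, and the 5 MiB minimum part size and 10,000-part maximum are the multipart upload limits (byte-range fetches do not need the minimum, but sharing one splitter keeps the sketch simple).

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB   # every multipart part except the last must be >= 5 MiB
MAX_PARTS = 10_000   # a multipart upload can have at most 10,000 parts


def byte_ranges(total_size: int, part_size: int = 8 * MIB) -> list:
    """Split an object into inclusive (start, end) byte ranges.

    Each range can be one part of a multipart upload, or one parallel
    byte-range fetch via an HTTP Range header.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART)
    # Grow the part size if needed so we stay within the 10,000-part limit.
    while (total_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    return [(start, min(start + part_size, total_size) - 1)
            for start in range(0, total_size, part_size)]


def range_header(start: int, end: int) -> dict:
    """HTTP Range header for one byte-range fetch (both ends inclusive)."""
    return {"Range": f"bytes={start}-{end}"}


ranges = byte_ranges(20 * MIB)
print(len(ranges))                # 3
print(range_header(*ranges[-1]))  # {'Range': 'bytes=16777216-20971519'}
```

Each range can then be handed to a separate thread or connection; fetching only the first range is also how you read just the start of a large object.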

S3 Select

Enables applications to retrieve only a subset of data from an object by using simple SQL. This saves money on data transfer and increases speed.
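An S3 Select call is essentially a SQL expression plus a description of the object's format. This sketch builds the request arguments for a CSV object with a header row; `select_request`, the bucket, key and SQL are hypothetical, and note that AWS has since closed S3 Select to new customers (existing customers can keep using it).

```python
def select_request(bucket: str, key: str, sql: str) -> dict:
    """Arguments for an S3 Select query over a CSV object that has a header row."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,  # S3 runs this server-side and returns only matching rows
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # refer to columns by header name
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


request = select_request(
    "my-example-bucket", "sales/2023.csv",  # hypothetical bucket and key
    "SELECT s.region, s.total FROM S3Object s WHERE CAST(s.total AS INT) > 1000",
)
# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("s3").select_object_content(**request)
print(request["Expression"])
```

Only the matching rows travel over the network, which is where the data transfer saving comes from.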

AWS Storage Gateway

  • Is a hybrid cloud storage service for connecting on-premises software applications with cloud based storage.
  • Allows your on-premises environment to access virtually unlimited cloud storage.
  • Can be downloaded as a Virtual Machine Image and installed in your datacenter.
  • Has low latency as it caches data in the local VM or gateway hardware appliance.

3 types of Storage Gateways:

  • File Gateway
  • Volume Gateway
  • Tape Gateway

1. File Gateway

  • Stores files as objects directly in S3
  • Uses standard file protocols: NFS & SMB
  • Common use case is on-premises backup to the cloud

2. Volume Gateway

  • Presents your applications with disk volumes using the iSCSI block protocol
  • Stores/manages on-premises data in S3
  • Lets you take point-in-time snapshots (e.g. with AWS Backup), which are stored in AWS as Amazon EBS snapshots (only changed blocks are captured)

Types of Volume Gateways:

  • Volume Gateway (Stored Volumes): stores your primary data locally, so there is low latency to the entire dataset, and asynchronously backs up that data to S3.
  • Volume Gateway (Cached Volumes): uses S3 as your primary storage while retaining frequently accessed data locally, minimising the need to scale your on-premises infrastructure.

3. Tape Gateway

  • Durable, cost effective archiving
  • A way of replacing physical tapes with a virtual tape interface in AWS without changing existing backup workflows

Software Engineer