AWS S3

General Overview

  • Object storage service for scalable, durable data storage.
  • Designed for 99.999999999% (11 9's) durability; availability ranges from 99.5% to 99.99% depending on the storage class.
  • Unlimited storage; pay for usage (storage, requests, data transfer).
  • Buckets live in a single Region; Multi-Region Access Points provide a global endpoint. Integrates with AWS services (EC2, Lambda, etc.).
  • Data Model:
    • Bucket – top-level container (unique name, global namespace)
    • Object – file + metadata
    • Key – full path to object within a bucket
  • General-purpose buckets have a flat namespace with no real directories; prefixes and the "/" delimiter only emulate folders.
  • Maximum object size is 5TB. A single PUT can upload at most 5GB; larger objects must use multipart upload.
  • Objects can have key-value pairs of Metadata and can have key-value tags (useful for security/lifecycles)
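
Because keys are flat strings, folder-like browsing is just grouping by prefix and delimiter. The sketch below imitates that grouping in plain Python. `list_level` is a hypothetical helper written for illustration (it is not an S3 API call), and the sample keys are made up.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat keys into (objects, common_prefixes) under a prefix."""
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts like a "folder".
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common)


keys = ["logs/2024/app.log", "logs/2025/app.log", "logs/readme.txt", "index.html"]
print(list_level(keys))           # (['index.html'], ['logs/'])
print(list_level(keys, "logs/"))  # (['logs/readme.txt'], ['logs/2024/', 'logs/2025/'])
```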

Storage Classes

| Class | Use Case | Durability | Availability | Retrieval Time | Minimum Storage Duration | Retrieval Fee |
|---|---|---|---|---|---|---|
| S3 Standard | Frequent access | 11 9s | 99.99% | Instant | None | No |
| S3 Intelligent-Tiering | Unknown access patterns | 11 9s | 99.9–99.99% | Instant | None | No |
| S3 Standard-IA | Infrequent access | 11 9s | 99.9% | Instant | 30 days | Yes * |
| S3 One Zone-IA | Non-critical infrequent data | 11 9s | 99.5% | Instant | 30 days | Yes |
| S3 Glacier Instant Retrieval | Rarely accessed, quick retrieval | 11 9s | 99.9% | Milliseconds | 90 days | Yes |
| S3 Glacier Flexible Retrieval | Archive w/ minutes–hours access | 11 9s | 99.99% | Minutes–hours | 90 days | Yes |
| S3 Glacier Deep Archive | Long-term cold storage | 11 9s | 99.99% | Hours (12h typical) | 180 days | Yes |

* : Retrieval is priced per GB.
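
The minimum storage duration matters because an object deleted or transitioned earlier is still billed for the remaining days. The following is a rough sketch of that arithmetic using the durations from the table above; `billable_days` is a hypothetical helper, and actual prices are deliberately left out.

```python
# Minimum storage durations (days), copied from the table above.
MIN_STORAGE_DAYS = {
    "S3 Standard": 0,
    "S3 Intelligent-Tiering": 0,
    "S3 Standard-IA": 30,
    "S3 One Zone-IA": 30,
    "S3 Glacier Instant Retrieval": 90,
    "S3 Glacier Flexible Retrieval": 90,
    "S3 Glacier Deep Archive": 180,
}


def billable_days(storage_class, days_stored):
    """Days of storage charged for an object kept for days_stored days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billable_days("S3 Standard-IA", 10))            # 30
print(billable_days("S3 Glacier Deep Archive", 200))  # 200
```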

Glacier Retrieval Options

| Tier | Flexible Retrieval | Deep Archive |
|---|---|---|
| Expedited | 1-5 minutes | N/A |
| Standard | 3-5 hours | 12 hours |
| Bulk | 5-12 hours | 48 hours |

Versioning

Lifecycle rule actions

Transition rule actions

  • (R1) Transition current versions of objects between storage classes.
    • Storage class transitions (Target storage class).
    • Days after object creation.
  • (R2) Transition noncurrent versions of objects between storage classes.
    • Storage class transitions.
    • Days after objects become noncurrent.
    • Number of newer versions to retain.

Deletion/Expiration rule actions

  • (R3) Expire current versions of objects.
    • Days after object creation
  • (R4) Permanently delete noncurrent versions of objects.
    • Days after objects become noncurrent
    • Number of newer versions to retain - Optional
  • (R5) Delete expired object delete markers or incomplete multipart uploads.
    • Delete expired object delete markers
    • Delete incomplete multipart uploads
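
As an illustration, rules R1 to R5 can be expressed as a lifecycle configuration document. The sketch below only builds the dictionary. The rule IDs, day counts, and target storage classes are arbitrary assumptions, and `build_lifecycle_configuration` is a hypothetical helper.

```python
def build_lifecycle_configuration(prefix=""):
    """Return a lifecycle configuration covering actions R1-R5."""
    return {
        "Rules": [
            {
                "ID": "tier-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # R1: transition current versions N days after creation.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # R2: transition noncurrent versions, keeping 3 newer versions as-is.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER",
                     "NewerNoncurrentVersions": 3},
                ],
                # R3: expire current versions.
                "Expiration": {"Days": 365},
                # R4: permanently delete noncurrent versions.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180,
                                                "NewerNoncurrentVersions": 3},
            },
            {
                "ID": "cleanup",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # R5: ExpiredObjectDeleteMarker cannot share an Expiration
                # element with Days, so it lives in its own rule.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    }


config = build_lifecycle_configuration("logs/")
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)` expects; the call itself is omitted so the sketch runs without AWS credentials.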

Object deletion in a versioned bucket.

  • Deleting an object with Show versions off is a soft delete: a delete marker is created and becomes the current version, shadowing all other versions.
  • Deleting an object with Show versions on permanently deletes the chosen version; if the current version is deleted, the latest noncurrent version becomes current.
  • Promoting an old version is not supported. To bring an old version back, copy it over the object, which creates a new current version with the old content.
  • The lifecycle expiration action (R3) creates a delete marker, which becomes the current version.
  • The Expiration rule (R3) only applies to actual object versions, not delete markers.
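
The delete semantics above are easier to follow with a toy model. The class below is a simplified in-memory simulation of a single key in a versioning-enabled bucket. It is not an S3 client, and the `v1`, `v2`, ... version IDs are invented for readability.

```python
import itertools


class VersionedKey:
    """Toy model of one key in a versioned bucket (oldest version first)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.versions = []  # the last entry is the current version

    def _add(self, body, delete_marker):
        vid = f"v{next(self._ids)}"
        self.versions.append({"id": vid, "body": body, "delete_marker": delete_marker})
        return vid

    def put(self, body):
        return self._add(body, delete_marker=False)

    def delete(self, version_id=None):
        if version_id is None:
            # Soft delete: a delete marker becomes the current version.
            return self._add(None, delete_marker=True)
        # Permanent delete of one specific version (or delete marker).
        self.versions = [v for v in self.versions if v["id"] != version_id]
        return version_id

    def get(self):
        if not self.versions or self.versions[-1]["delete_marker"]:
            return None  # S3 would answer 404 Not Found
        return self.versions[-1]["body"]

    def restore(self, version_id):
        # No promotion: copying the old content creates a new current version.
        old = next(v for v in self.versions if v["id"] == version_id)
        return self.put(old["body"])


key = VersionedKey()
key.put("first")           # v1
key.put("second")          # v2
marker = key.delete()      # v3, a delete marker
hidden = key.get()         # None: the marker shadows v1 and v2
key.delete(marker)         # removing the marker makes v2 current again
after_undelete = key.get() # "second"
key.restore("v1")          # v4 carries the content of v1
after_restore = key.get()  # "first"
```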

Replication

Cross-Region Replication (CRR) vs Same-Region Replication (SRR)

| Feature | Details |
|---|---|
| Prerequisites | Versioning enabled on both source and destination |
| Replication scope | All objects, prefix, or tags |
| What's replicated | New objects after enabling, metadata, ACLs, tags |
| Not replicated | Existing objects (need S3 Batch Replication), lifecycle actions, objects in Glacier/Deep Archive |
| Delete behavior | Delete markers can be replicated (optional); deletes of specific versions are not replicated |
| Replication Time Control (RTC) | 99.99% of objects within 15 minutes (SLA-backed) |
| Batch Replication | Replicates existing objects and retries failed replications |

Two-way replication

  • Enable bidirectional replication between buckets
  • Prevents replication loops automatically
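
A replication rule bundles the options from the table: scope filter, delete marker behavior, destination, and optionally RTC. The sketch below assembles such a configuration as a plain dictionary. The rule ID, role ARN, and bucket ARN are placeholders, and `build_replication_configuration` is a hypothetical helper.

```python
def build_replication_configuration(role_arn, destination_bucket_arn, prefix="",
                                    replicate_delete_markers=False, enable_rtc=False):
    """Return a single-rule replication configuration."""
    destination = {"Bucket": destination_bucket_arn}
    if enable_rtc:
        # RTC uses a fixed 15-minute target and requires metrics to be enabled.
        destination["ReplicationTime"] = {"Status": "Enabled", "Time": {"Minutes": 15}}
        destination["Metrics"] = {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
    rule = {
        "ID": "replication-rule-1",
        "Priority": 1,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # an empty prefix means all objects
        "DeleteMarkerReplication": {
            "Status": "Enabled" if replicate_delete_markers else "Disabled"
        },
        "Destination": destination,
    }
    return {"Role": role_arn, "Rules": [rule]}


replication = build_replication_configuration(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",  # placeholder
    destination_bucket_arn="arn:aws:s3:::example-destination-bucket",    # placeholder
    prefix="reports/",
    replicate_delete_markers=True,
    enable_rtc=True,
)
```

With boto3 this dictionary would be passed as `ReplicationConfiguration` to `put_bucket_replication`; for two-way replication, a mirrored configuration is applied to the other bucket.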

Security

Encryption at Rest

| Type | Key Management | Notes |
|---|---|---|
| SSE-S3 | AWS managed (AES-256) | No performance impact |
| SSE-KMS | AWS KMS keys | KMS API limits apply |
| SSE-C | Customer-provided keys | Customer manages keys |
| Client-side | Encrypt before upload | Customer responsibility |

  • Bucket default encryption: Applied to new objects without specified encryption
  • Enforce encryption: Use bucket policy to deny unencrypted uploads
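
One common way to enforce encryption is a bucket policy that denies `PutObject` requests lacking the expected server-side encryption header. The sketch below generates such a policy for SSE-KMS. The bucket name is a placeholder and `deny_unencrypted_uploads_policy` is a hypothetical helper, so adapt the condition if SSE-S3 (`AES256`) is the intended mode.

```python
import json


def deny_unencrypted_uploads_policy(bucket):
    """Bucket policy that rejects uploads not requesting SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUploadsWithoutSSEKMS",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Matches when the header is absent or has another value.
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            }
        ],
    }


encryption_policy = deny_unencrypted_uploads_policy("example-bucket")
print(json.dumps(encryption_policy, indent=2))
```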

Encryption in Transit

  • SSL/TLS (HTTPS) endpoints available
  • Enforce with bucket policy: aws:SecureTransport condition
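
The `aws:SecureTransport` condition is typically used in a Deny statement that matches any request made without TLS. A minimal sketch follows, with a placeholder bucket name and a hypothetical helper `deny_insecure_transport_policy`.

```python
import json


def deny_insecure_transport_policy(bucket):
    """Bucket policy that rejects every request not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


tls_policy = deny_insecure_transport_policy("example-bucket")
print(json.dumps(tls_policy, indent=2))
```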

Access Control

Priority order: Explicit DENY → Explicit ALLOW → Implicit DENY
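
The priority order can be read as a three-step decision. The function below is a deliberately simplified sketch: it matches actions by exact string and ignores principals, resources, wildcards, and conditions, all of which real policy evaluation takes into account. The statement format is invented for this example.

```python
def evaluate(statements, action):
    """Apply explicit Deny > explicit Allow > implicit Deny to one action."""
    effects = {s["Effect"] for s in statements if action in s["Actions"]}
    if "Deny" in effects:
        return "Deny (explicit)"
    if "Allow" in effects:
        return "Allow"
    return "Deny (implicit)"


statements = [
    {"Effect": "Allow", "Actions": ["s3:GetObject", "s3:PutObject"]},
    {"Effect": "Deny", "Actions": ["s3:PutObject"]},
]
print(evaluate(statements, "s3:GetObject"))     # Allow
print(evaluate(statements, "s3:PutObject"))     # Deny (explicit)
print(evaluate(statements, "s3:DeleteObject"))  # Deny (implicit)
```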

| Method | Scope | Use Case |
|---|---|---|
| IAM Policies | User/role level | Control who can access S3 |
| Bucket Policies | Bucket level | Cross-account, public access, IP restrictions |
| ACLs (legacy) | Bucket/object level | Simple permissions (avoid for new implementations) |
| Access Points | Subset of bucket | Simplify permissions for shared datasets |
| Presigned URLs | Object level | Temporary access without credentials |

Block Public Access (BPA)

  • Four settings: Block public ACLs, Ignore public ACLs, Block public policies, Restrict public buckets
  • Applied at account or bucket level
  • Overrides bucket policies and ACLs
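
The four settings map one-to-one onto the fields of a public access block configuration. The sketch below just builds that configuration with everything enabled; `block_all_public_access` is a hypothetical helper name.

```python
def block_all_public_access():
    """Public access block configuration with all four settings enabled."""
    return {
        "BlockPublicAcls": True,        # Block public ACLs
        "IgnorePublicAcls": True,       # Ignore public ACLs
        "BlockPublicPolicy": True,      # Block public policies
        "RestrictPublicBuckets": True,  # Restrict public buckets
    }


bpa = block_all_public_access()
```

With boto3, the result would be passed as `PublicAccessBlockConfiguration` to the S3 client's `put_public_access_block` for a single bucket.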

S3 Access Points

  • Named network endpoints with dedicated policies
  • Each access point has own DNS name
  • Supports VPC-only access
  • Simplifies managing access for shared datasets
  • Can restrict to specific VPC/VPCE

Event Notifications

Destinations: SNS, SQS, Lambda, EventBridge

Events:

  • Object created (PUT, POST, COPY, CompleteMultipartUpload)
  • Object deleted, restored
  • Replication events
  • Lifecycle events
  • Intelligent-Tiering changes

EventBridge advantages:

  • Advanced filtering (JSON rules)
  • Multiple destinations
  • Archive, replay events
  • 18+ AWS service targets
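
As a sketch of how destinations and events fit together, the snippet below builds a notification configuration that sends object-created events under a prefix to an SQS queue and also turns on EventBridge delivery. The queue ARN, prefix, and suffix are placeholders, and `build_notification_configuration` is a hypothetical helper.

```python
def build_notification_configuration(queue_arn, prefix="", suffix=""):
    """Notify an SQS queue on object creation and enable EventBridge."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    queue_config = {
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*"],  # PUT, POST, COPY, CompleteMultipartUpload
    }
    if rules:
        queue_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {
        "QueueConfigurations": [queue_config],
        # An empty EventBridgeConfiguration enables delivery to EventBridge.
        "EventBridgeConfiguration": {},
    }


notification = build_notification_configuration(
    "arn:aws:sqs:us-east-1:111122223333:example-queue",  # placeholder ARN
    prefix="uploads/",
    suffix=".jpg",
)
```

With boto3 this would be passed as `NotificationConfiguration` to `put_bucket_notification_configuration`.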

S3 Directory Buckets

  • New bucket type optimized for high performance
  • Used with S3 Express One Zone storage class
  • Single-digit millisecond latency
  • Up to 100GB/s throughput per bucket
  • Consistent hashing for predictable performance
  • Different naming: bucket-name--azid--x-s3
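
The naming pattern can be made concrete with a tiny helper. This sketch only assembles and checks the suffix; it does not validate every bucket naming rule, and the base name and Availability Zone ID in the example are made up.

```python
def directory_bucket_name(base_name, az_id):
    """Build a directory bucket name of the form base-name--azid--x-s3."""
    return f"{base_name}--{az_id}--x-s3"


def is_directory_bucket_name(name):
    """Rough check that a name follows the base--azid--x-s3 pattern."""
    parts = name.split("--")
    return len(parts) == 3 and parts[2] == "x-s3" and all(parts[:2])


name = directory_bucket_name("my-data", "usw2-az1")
print(name)                            # my-data--usw2-az1--x-s3
print(is_directory_bucket_name(name))  # True
```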

Performance

Multipart Upload

  • Required for objects > 5GB
  • Recommended for objects > 100MB
  • Parts: 1 to 10,000 per upload, each 5MB-5GB (the last part may be smaller than 5MB)
  • Benefits: Parallel uploads, pause/resume, start before knowing final size
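
The part limits determine the smallest usable part size for a given object. Below is a minimal sketch of that calculation, assuming the limits are binary units (MiB/GiB) and using an arbitrary 8 MiB default; `plan_parts` is a hypothetical helper.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB   # minimum size of every part except the last
MAX_PART = 5 * GIB   # maximum size of any part
MAX_PARTS = 10_000   # maximum number of parts per upload


def plan_parts(object_size, preferred_part_size=8 * MIB):
    """Return (part_size, part_count) that respects the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that still fits the object into 10,000 parts.
    needed = -(-object_size // MAX_PARTS)  # integer ceiling division
    part_size = max(preferred_part_size, MIN_PART, needed)
    if part_size > MAX_PART:
        raise ValueError("object is too large for a multipart upload")
    part_count = -(-object_size // part_size)
    return part_size, part_count


print(plan_parts(100 * MIB))  # (8388608, 13)
size, count = plan_parts(1024 * GIB)
print(size >= MIN_PART, count <= MAX_PARTS)  # True True
```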

Transfer Acceleration

  • Uses CloudFront edge locations
  • URL: bucket-name.s3-accelerate.amazonaws.com
  • Up to 50-500% faster for global users
  • Additional cost per GB
  • Test speed: AWS provides comparison tool

Performance Baseline

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix
  • 5,500 GET/HEAD requests per second per prefix
  • No limit on prefixes per bucket
  • Spread objects across prefixes for higher throughput
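
Since the limits apply per prefix, the aggregate ceiling grows with the number of prefixes in use. The sketch below shows that arithmetic and one possible way to spread keys over a fixed number of hash-derived prefixes. The sharding scheme is an illustrative assumption, not an AWS-prescribed layout.

```python
import hashlib

WRITES_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE per second
READS_PER_PREFIX = 5_500   # GET/HEAD per second


def aggregate_limits(prefix_count):
    """Baseline (writes, reads) per second across prefix_count prefixes."""
    return prefix_count * WRITES_PER_PREFIX, prefix_count * READS_PER_PREFIX


def sharded_key(key, shards=16):
    """Prepend a stable two-digit hex shard prefix derived from the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02x}/{key}"


print(aggregate_limits(10))  # (35000, 55000)
print(sharded_key("photo.jpg"))
```

The trade-off is that hashed prefixes make listing objects in their natural order harder, so this kind of layout only pays off for request-heavy workloads.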

Byte-Range Fetches

  • Request specific byte ranges of object
  • Parallelize downloads
  • Resilient to network failures (retry smaller range)
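
To parallelize a download, the object is split into inclusive byte ranges, each sent as an HTTP `Range` header. A minimal sketch of the range computation follows; `byte_ranges` is a hypothetical helper, and the sizes in the example are tiny only to keep the output readable.

```python
def byte_ranges(object_size, chunk_size):
    """Return HTTP Range header values covering the whole object."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    return [
        # Range ends are inclusive, hence the "- 1".
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]


print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each value can be passed as the `Range` argument of boto3's `get_object`, one request per worker; a failed request then only needs its own range retried.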

S3 Select & Glacier Select

  • Retrieve subset of data using SQL
  • Filter at S3 side (up to 400% faster, 80% cheaper)
  • Works with CSV, JSON, Parquet
  • Supports compression (GZIP, BZIP2)