AWS Databases
This is part of a blog series giving a high-level overview of the different services examined on the AWS Solutions Architect Associate exam. To view the whole series, click here.
Relational Database Service (RDS)
- Allows you to create and scale relational databases in the cloud
- RDS runs on virtual machines (can’t log in to the OS or SSH in)
- AWS handles admin tasks for you like hardware provisioning, patching & backups.
- RDS is not serverless (the one exception is Aurora Serverless)
- Allows you to control network access to your database
- Offers encryption at rest — done with KMS (data stored, automated backups, read replicas and snapshots all encrypted)
Supported AWS Relational Database Platforms
- Aurora
- PostgreSQL
- MySQL
- SQL Server
- Oracle
- MariaDB
RDS Main Features
Multi-AZ Recovery
- Have a primary database and a standby copy in a different AZ. If you lose the primary, AWS detects the failure and automatically updates the DNS to point at the standby.
- Used for DISASTER RECOVERY, it doesn’t improve performance.
Read Replicas
- Every time you write to the primary database, the change is replicated asynchronously to the read replica.
- If you lose the primary database there is no automatic failover. You need to promote a replica and point your application at its endpoint yourself.
- IMPROVES PERFORMANCE
- Used for scaling
- Automatic backups must be turned on
- Up to 5 read replicas per database (AWS has since raised this limit to 15 for some engines, so check the current quota)
- It is possible to have read replicas of read replicas - but this can introduce latency.
- Each read replica has its own DNS endpoint
- Read replicas can themselves be Multi-AZ
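A minimal sketch of how an application might use read replicas for scaling: writes go to the primary endpoint and reads are spread round-robin across the replica endpoints. The `EndpointRouter` class and the hostnames are hypothetical and are not part of any AWS SDK.

```python
from itertools import cycle


class EndpointRouter:
    """Route writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads if there are no replicas.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        """Return the endpoint a statement should be sent to."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.primary


# Hypothetical endpoints, for illustration only.
router = EndpointRouter(
    primary="mydb.example.internal",
    replicas=["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"],
)
```

Because each replica has its own endpoint, the application (or a proxy in front of it) has to make this routing decision itself. Reads sent to a replica may be slightly behind the primary.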
RDS Backups
Automated Backups
- Allows you to recover your database to any point in time within the specified retention period (Max 35 days)
- Takes daily snapshots and stores transaction logs
- When recovering, AWS restores the most recent daily backup and then applies the transaction logs up to the point in time you asked for
- Enabled by default
- Backup data is stored in S3
- You may experience elevated latency while the backup is being taken
- Backups are deleted once you remove the original RDS instances
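To make the point-in-time idea concrete, here is a rough sketch of the arithmetic involved. It is not how RDS is implemented; the function names, dates and snapshot schedule are all made up for illustration.

```python
from datetime import datetime, timedelta


def earliest_restorable(now, retention_days):
    """Oldest point in time still covered by the retention period."""
    if not 0 < retention_days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    return now - timedelta(days=retention_days)


def pick_snapshot(snapshot_times, target):
    """Choose the most recent daily snapshot taken at or before the target.

    The transaction logs would then be replayed from that snapshot up to
    the target time.
    """
    candidates = [t for t in snapshot_times if t <= target]
    if not candidates:
        raise ValueError("no snapshot precedes the requested time")
    return max(candidates)


now = datetime(2024, 1, 10, 12, 0)
snapshots = [datetime(2024, 1, d, 3, 0) for d in range(5, 11)]  # daily at 03:00
target = datetime(2024, 1, 8, 17, 30)
base = pick_snapshot(snapshots, target)  # snapshot to restore before replaying logs
```

Here the restore would start from the 8 January 03:00 snapshot and replay logs forward to 17:30 on the same day.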
Database Snapshot
- User-initiated: you must take them manually
- Stored until you explicitly delete them, even after you delete the original RDS instance they are still persisted. However this is not the case with automated backups.
Data Warehousing
- Creates a central place for data and information to be analysed
- Can consolidate data from multiple sources
- Used with business intelligence tools, typically by business analysts and data scientists/engineers
- Used to pull very large, complex datasets, usually so that management can run queries on the data
- Redshift is AWS’s data warehouse solution
Redshift
- Powerful data warehouse that can query petabytes of data in the warehouse and exabytes of data in your S3 data lake.
- Can work with structured or semi-structured data
- Can save query results directly back into your S3 data lake
- Can be single node or multi node
- Has column compression — compress columns instead of rows because of similar data.
- Automated snapshots are enabled by default with a one-day retention period (maximum 35 days)
- Only Redshift can delete these automated snapshots; you can’t delete them manually.
- Pricing — compute node hours, backups and data transfer
- Encrypted in transit using SSL
- Encrypted at rest using KMS or HSM
- Only available in one AZ
- Can restore to a new AZ
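The column compression point above is easier to see with a small example. The sketch below uses run-length encoding as a simple stand-in for a compression scheme. The table and its values are invented, and this is not a description of what Redshift does internally.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical table stored row by row: (order_id, country, status).
rows = [
    (1, "IE", "shipped"),
    (2, "IE", "shipped"),
    (3, "IE", "pending"),
    (4, "UK", "pending"),
    (5, "UK", "pending"),
]

# The same data laid out column by column.
order_ids, countries, statuses = (list(col) for col in zip(*rows))

encoded_countries = run_length_encode(countries)
encoded_statuses = run_length_encode(statuses)
```

Within a row, neighbouring values (an id, a country, a status) have nothing in common, so there is little to compress. Within a column, neighbouring values are often identical, so five country values shrink to two pairs.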
ElastiCache
- Allows you to deploy, operate & scale in-memory data stores in the cloud.
- Improves the performance of web applications, as it allows you to retrieve data fast from memory with high throughput and low latency.
- Fully managed: AWS handles hardware provisioning, software patching, setup etc.
- Scalable
- There are two types of in-memory caching engines:
- Memcached: designed for simplicity, so use it when you need the simplest model possible.
- Redis: works for a wide range of use cases and supports Multi-AZ. You can also back up and restore Redis.
Services capable of caching
- CloudFront
- API Gateway
- ElastiCache
- DynamoDB Accelerator (DAX)
Caching is a balancing act between up-to-date accurate information and latency.
The further up in your architecture you cache the better, e.g. cache at the CloudFront level rather than waiting until the request reaches the DB level.
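The balancing act between freshness and latency can be sketched with a tiny time-to-live (TTL) cache. This is a generic illustration, not the API of ElastiCache or any other AWS service. `TTLCache` and `loader` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, time it was cached)

    def get_or_load(self, key, loader):
        """Return a fresh cached value, or call loader(key) and cache it."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # cache hit: fast, but possibly stale
        value = loader(key)  # cache miss: slow trip to the database
        self._store[key] = (value, now)
        return value
```

A longer TTL means more hits and lower latency, but a greater chance of serving out-of-date data. A shorter TTL gives the opposite trade-off.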
DynamoDB
- Fast flexible NoSQL database
- Allows for storage of large text and binary, but there is a limit of 400KB item size
- Delivers single digit millisecond latency at any scale
- Fully managed serverless database — no servers to provision, patch, or manage.
- Stored on SSD Storage
- Spread across 3 geographically distinct datacenters
- DynamoDB supports eventually consistent and strongly consistent reads (eventual consistency is the default)
- Streams → time ordered sequence of item level modifications in a table (stored up to 24 hours)
Eventual Consistency (best read performance) → Consistency across all copies of the data is usually reached within a second. The response might not reflect the results of a just-completed write operation, but if you repeat the read request after a short time it should return the updated data.
Strong Consistency → Returns the latest data. Results should reflect all writes that received a successful response prior to that read!
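The difference between the two read modes can be shown with a toy model. This is only a sketch of the behaviour described above, not how DynamoDB is built. The `ToyTable` class is hypothetical, and its `consistent_read` flag is named after the `ConsistentRead` parameter on DynamoDB read requests.

```python
class ToyTable:
    """Toy model of a table with one leader copy and lagging replicas."""

    def __init__(self, replica_count=2):
        self.leader = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet copied to the replicas

    def put(self, key, value):
        """Acknowledge the write once the leader has it."""
        self.leader[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Copy all pending writes to every replica."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()

    def get(self, key, consistent_read=False):
        """Strong reads use the leader; eventual reads use a replica."""
        source = self.leader if consistent_read else self.replicas[0]
        return source.get(key)
```

An eventually consistent `get` issued straight after a `put` can miss the new value until `replicate` has run, whereas a strongly consistent `get` always sees it.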
Global Tables
- Fully managed, multi-active & multi-region database
- Replicate your DynamoDB tables across selected regions
- Used for globally distributed apps
- Based on DynamoDB streams
- Can be used for Disaster Recovery or high availability
Security in DynamoDB
- Encryption at rest using KMS
- Can use Site-to-Site VPN, Direct Connect, and IAM policies and roles
- Can implement fine-grained access control
- Can monitor with CloudWatch and CloudTrail
DynamoDB Accelerator (DAX)
- Managed, highly available in memory cache for DynamoDB
- Up to a 10x performance improvement
- Request time reduced to microseconds
- DAX manages all in-memory acceleration, so you don’t need to manage things like cache invalidation
- Compatible with DynamoDB API calls
Aurora
- MySQL & PostgreSQL compatible relational database.
- Provides up to 5x the throughput of standard MySQL
- Provides up to 3x the throughput of standard PostgreSQL
- Distributed, fault-tolerant, self-healing storage system
- 2 copies of your data are kept in each Availability Zone (AZ), across a minimum of 3 AZs, giving 6 copies in total.
- Can handle the loss of up to 2 copies without affecting write availability.
- Can handle the loss of up to 3 copies of data without affecting read availability.
- Automated backups always enabled — doesn’t impact performance.
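The fault-tolerance numbers above boil down to simple quorum arithmetic. In this sketch the quorum sizes are inferred from the bullets: losing 2 of 6 copies leaves 4, which is still enough to write, and losing 3 leaves 3, which is still enough to read. The constant and function names are illustrative.

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # copies that must be available to accept a write
READ_QUORUM = 3    # copies that must be available to serve a read


def can_write(copies_lost):
    """True if enough copies survive to accept writes."""
    return TOTAL_COPIES - copies_lost >= WRITE_QUORUM


def can_read(copies_lost):
    """True if enough copies survive to serve reads."""
    return TOTAL_COPIES - copies_lost >= READ_QUORUM
```

Losing a whole AZ removes 2 copies, so writes continue. Losing an AZ plus one more copy removes 3, which stops writes but still allows reads.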
Aurora Serverless
- On-demand, autoscaling configuration of Aurora
- Automatically starts up, shuts down, and scales based on app needs
- A simple, cost-effective option for infrequent, intermittent or unpredictable workloads
- You pay only for the database capacity you use while it is active.
Database migration service (DMS)
- Migrates one database to another (on-premises to cloud, cloud to on-premises, or cloud to cloud)
- Runs replication software
- The source database stays fully operational during the migration
Types of migrations
- Supports homogeneous migrations: identical engines, e.g. Oracle to Oracle
- Supports heterogeneous migrations: different engines, e.g. SQL Server to Aurora. If you do this you will need to use the Schema Conversion Tool (SCT)
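The rule of thumb above can be written as a small decision helper. This is a deliberately simplified sketch: `migration_plan` is a hypothetical function, and comparing engine names is an assumption that ignores compatible pairs such as MySQL and Aurora MySQL.

```python
def migration_plan(source_engine, target_engine):
    """Classify a migration and say whether schema conversion is needed."""
    same_engine = source_engine.strip().lower() == target_engine.strip().lower()
    return {
        "type": "homogeneous" if same_engine else "heterogeneous",
        "needs_schema_conversion": not same_engine,
    }
```

For example, `migration_plan("SQL Server", "Aurora")` flags the migration as heterogeneous and marks schema conversion as required, while `migration_plan("Oracle", "Oracle")` does not.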
Elastic MapReduce (EMR)
- Big data platform for processing large amounts of data
- Run petabyte scale analysis
- AWS claims its Spark runtime is up to 3x faster than standard Apache Spark
- Makes it easy to set up, operate, & scale big data environments
- Workloads run on clusters of EC2 instances called nodes
- Different software components are installed in each node
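- Data is typically stored in S3 or on the cluster itself
- Log files are stored on the master node by default and can be archived to S3 at 5-minute intervals. This can only be configured when the cluster is created!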
Node Types
- Master Node → Manages cluster, tracks subtasks and monitors health.
- Core Node → Has software components to run tasks & store data.
- Task Node → Has software component, only runs tasks, can’t store data.