Backup strategies for Amazon DynamoDB
When discussing databases, one of the most critical questions is, "How will we back up and restore our data?" Backups are at the heart of any disaster recovery strategy, which is primarily governed by two targets: the Recovery Point Objective (RPO), the maximum acceptable amount of data loss measured in time, and the Recovery Time Objective (RTO), the maximum acceptable time to restore service. A good backup strategy meets your needs with minimal administration, does not disrupt business operations, and is cost-effective. In this article, the authors review the different backup strategies you can use with Amazon DynamoDB and the best use cases for each.
Recovering DynamoDB to a point in time
DynamoDB Point-in-Time Recovery (PITR) is a fully managed continuous backup feature built into DynamoDB. When enabled, PITR allows a table to be restored to any point in time within the last 35 days, with per-second granularity. PITR backups are system-level and stored in an AWS-managed account. An unauthorized user cannot delete these backups even if your account is taken over. You can enable PITR from the AWS Management Console, the AWS SDKs, or the AWS Command Line Interface (AWS CLI), as in the following example:
aws dynamodb update-continuous-backups --table-name <SOURCE-TABLE> --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
Note that once enabled, PITR is not retroactive. The earliest available restore point is the time when PITR was activated.
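To see which window PITR currently covers for a table, one option is to query the table's continuous backup description. The following is a minimal sketch, not a definitive implementation: the function name `pitr_window` is hypothetical, and it assumes the AWS CLI is installed and configured with permission to call `describe-continuous-backups`.

```shell
# Print the earliest and latest restorable times for a table.
# $1 = table name
# pitr_window is a hypothetical helper name chosen for this sketch.
pitr_window() {
  aws dynamodb describe-continuous-backups \
    --table-name "$1" \
    --query 'ContinuousBackupsDescription.PointInTimeRecoveryDescription.[EarliestRestorableDateTime,LatestRestorableDateTime]' \
    --output text
}
```

Calling `pitr_window <SOURCE-TABLE>` shortly after enabling PITR should show an earliest restorable time close to the moment of activation.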
Although you can restore a table to a point in time using the DynamoDB console or the AWS CLI, the restore always creates a new table, and some settings of the source table are not automatically applied to it. These include auto scaling policies, stream settings, and Time to Live (TTL) settings. Refer to Automate update of table settings on the restored Amazon DynamoDB table for an event-driven solution that automatically applies Amazon DynamoDB table settings to the restored table using AWS CloudFormation templates.
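For a manual restore from the AWS CLI, the steps might look like the following sketch. The function names `restore_to_time` and `reapply_ttl` are hypothetical, as are any table and attribute names you pass in; only TTL is reapplied here, and auto scaling and streams would need similar follow-up calls.

```shell
# Restore a table to a given timestamp. The restore creates a NEW table.
# $1 = source table, $2 = target table, $3 = restore time (for example, an ISO 8601 timestamp)
restore_to_time() {
  aws dynamodb restore-table-to-point-in-time \
    --source-table-name "$1" \
    --target-table-name "$2" \
    --restore-date-time "$3"
}

# Re-enable TTL on the restored table, because TTL settings are not carried over.
# $1 = restored table name, $2 = name of the TTL attribute used by the source table
reapply_ttl() {
  aws dynamodb update-time-to-live \
    --table-name "$1" \
    --time-to-live-specification "Enabled=true,AttributeName=$2"
}
```

The restore runs asynchronously, so wait until the new table is active (for example, with `aws dynamodb wait table-exists --table-name <TARGET-TABLE>`) before calling `reapply_ttl`. To restore to the most recent restorable time instead of a specific timestamp, the CLI also accepts `--use-latest-restorable-time` in place of `--restore-date-time`.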
PITR significantly reduces the risk of data loss and is well suited to workloads that are sensitive to it, because it protects against accidental writes and deletes. Its per-second granularity makes it easier to meet strict RPOs. With alternatives such as scheduled backups, you can recover only to the last backup point, which can mean many hours of data loss. For more on PITR, see Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR).
DynamoDB on-demand backups
On-demand backups allow you to instruct DynamoDB to initiate backups of an entire table without affecting table performance. You can restore individual backups to new tables using the service APIs or the console. When restoring to a new table, DynamoDB allows you to retain or change global secondary indexes (GSI), local secondary indexes (LSI), or encryption settings.
To create an on-demand backup, you can use the console, the AWS SDK for the programming language of your choice, or the AWS CLI. Below is a basic example of the AWS CLI command to create an on-demand backup:
aws dynamodb create-backup --table-name <SOURCE-TABLE> --backup-name <BACKUP-NAME>
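To complement the create command, the following sketch lists a table's existing backups and restores one of them to a new table. The function names `list_backup_arns` and `restore_from_backup` are hypothetical and chosen only for illustration.

```shell
# List the ARNs of the backups that exist for a table.
# $1 = table name
list_backup_arns() {
  aws dynamodb list-backups \
    --table-name "$1" \
    --query 'BackupSummaries[].BackupArn' \
    --output text
}

# Restore a backup to a NEW table.
# $1 = target table name, $2 = backup ARN (for example, one printed by list_backup_arns)
restore_from_backup() {
  aws dynamodb restore-table-from-backup \
    --target-table-name "$1" \
    --backup-arn "$2"
}
```

As with a point-in-time restore, the target table must not already exist, and the restore completes asynchronously.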
Many workloads require scheduled backups at a specific time each day or week, which, at first glance, may not appear to be a feature of on-demand backups. For scheduled DynamoDB backups, you can use AWS Backup, a fully managed and centralized data protection service, to create and manage backup schedules for your tables. AWS Backup also supports cross-account backups, which provide additional protection by letting you copy backups to other AWS accounts. In addition, if you have backups that need to be kept for a long time due to compliance requirements, you can use cold storage to reduce costs.
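As an illustration of a scheduled backup with long retention, the sketch below prints an AWS Backup plan document with one daily rule. The function name `backup_plan_json`, the rule name, and the schedule are assumptions made for this example; the vault must already exist, and whether cold storage is available for DynamoDB backups in your account may depend on your AWS Backup settings, so check the AWS Backup documentation before relying on it.

```shell
# Print an AWS Backup plan document (JSON) with a single daily rule.
# $1 = plan name, $2 = backup vault name,
# $3 = days before moving to cold storage, $4 = days before deletion
# backup_plan_json is a hypothetical helper; the rule name and schedule are illustrative.
backup_plan_json() {
  cat <<EOF
{
  "BackupPlanName": "$1",
  "Rules": [
    {
      "RuleName": "daily",
      "TargetBackupVaultName": "$2",
      "ScheduleExpression": "cron(0 5 ? * * *)",
      "Lifecycle": {
        "MoveToColdStorageAfterDays": $3,
        "DeleteAfterDays": $4
      }
    }
  ]
}
EOF
}
```

A plan could then be created with `aws backup create-backup-plan --backup-plan "$(backup_plan_json <PLAN-NAME> <VAULT-NAME> 30 2555)"`, where 2555 days is roughly seven years (7 × 365). AWS Backup requires the deletion age to be at least 90 days greater than the cold storage transition age. Tables are assigned to the plan separately, for example with `aws backup create-backup-selection`.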
Below are some scenarios that fit well with on-demand backup:
- Compliance requires you to store data for more than 35 days, for example, storing data for seven years as required by the US Securities and Exchange Commission.
- You need to copy a table between AWS accounts, such as to a development or test account.
- You want to copy or move your data to another AWS Region.
How do you choose between PITR and on-demand backups?
Let's consider a few different DynamoDB workloads:
- Workload 1 - a DynamoDB table that powers a web application, with an RPO of 45 minutes and a retention requirement of 30 days.
- Workload 2 - a DynamoDB table that powers a financial services application with an RPO of 15 minutes and a data retention compliance requirement of 7 years.
- Workload 3 - a DynamoDB table that powers a research application. The data in the table is immutable because it serves as a lookup table to support the simulation runs performed by the research application. Because the data never changes, there is no RPO, but recreating the data is expensive, so a backup is still required.
For the first workload, PITR can meet all backup requirements. Because PITR has per-second granularity and a 35-day retention period, the requirements of a 45-minute RPO and a 30-day retention period are easily met. The second workload is slightly more complicated. Enabling PITR meets the RPO requirement but not the 7-year retention requirement. Here, you can combine AWS Backup with PITR: use AWS Backup to schedule on-demand backups and keep them for seven years, moving them to cold storage to save on costs. The third workload can be backed up with a single on-demand backup because the data is immutable.
You can extract some general guidelines from these examples:
- For most tables, especially those with a low RPO requirement, use PITR.
- If you have a low RPO but need to keep copies for more than 35 days, use on-demand backups from AWS Backup in combination with PITR.
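The reasoning applied to the three workloads can be written down as a small decision helper. This is only a sketch of the cases discussed in this article, not a general rule: the function name `choose_backup` and its output labels are hypothetical, and the 35-day threshold is the PITR retention window.

```shell
# Suggest a backup approach from the two requirements used in the examples.
# $1 = RPO in minutes, or "none" for immutable data with no RPO
# $2 = required retention in days
choose_backup() {
  if [ "$1" = "none" ]; then
    # Immutable data: a single on-demand backup is enough.
    echo "on-demand"
  elif [ "$2" -le 35 ]; then
    # Retention fits inside the PITR window.
    echo "pitr"
  else
    # Retention exceeds 35 days: add scheduled backups through AWS Backup.
    echo "pitr+aws-backup"
  fi
}
```

For the three workloads, `choose_backup 45 30` suggests PITR alone, `choose_backup 15 2555` suggests PITR combined with AWS Backup, and `choose_backup none 0` suggests a single on-demand backup.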
Conclusion
In this article, you have learned about the different methods of backing up DynamoDB tables to help you meet your backup and compliance requirements. To learn more, see Working with On-Demand Backup and Restore and Working with point-in-time recovery.
The authors encourage you to use this article as a starting point to evaluate your DynamoDB backup strategy and invite you to leave questions or comments in the comments section.