Cloud & Server Experts

Proactive insights from the Amazon DevOps Guru for RDS

May 31, 2023

Introducing the new Amazon DevOps guru for RDS functionality: Proactive Insights. DevOps Guru for RDS is a fully managed machine learning (ML)-based service that uses data collected by RDS Performance Insights to detect and alert customers to unusual behavior in Amazon Aurora databases. Since its release, DevOps Guru for RDS has provided customers with information to respond to performance issues and take corrective action quickly. Now, Proactive Insights adds recommendations for operational issues that can prevent potential problems in the future.

Proactive Insights requires no additional configuration for customers already using DevOps Guru for RDS for both Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition.

The following are sample use cases of operational issues currently available for Proactive Insights, with more insights and reflections to come over time:

Long InnoDB History for Aurora MySQL-Compatible engines - triggered when the length of the InnoDB history list becomes very large.
Temporary tables created on disk for Aurora MySQL-Compatible engines - triggered when the ratio of temporary tables created to all temporary tables exceeds a threshold.
Idle In Transaction for Aurora PostgreSQL-Compatible engines - triggered when sessions connected to the database are not actively working but may be blocking database resources.

To get started, go to the Amazon DevOps Guru Dashboard, where you can see a summary of the overall system status, including current proactive insights. In the screenshot below, the number three indicates three current proactive insights. Select this number to see a list of relevant Proactive Insights, including RDS or other Proactive Insights supported by Amazon DevOps Guru.

Proactive insights from the Amazon DevOps Guru for RDS

Current issues (including reactive and proactive insights) are also highlighted in your database instance on the Database list page in the Amazon RDS console.

The authors discuss DevOps Guru use cases for RDS Proactive Insights in detail in the following sections.

Long InnoDB History for Aurora MySQL-Compatible engines

The InnoDB history list is a global list of rollback logs for approved transactions. MySQL uses the history list to clean up records and log pages when transactions no longer require history. If the InnoDB history list becomes too long, indicating a large number of old versions of rows, queries, and even the database shutdown process, it can become slower.

DevOps Guru for RDS now detects when the history list length exceeds 1 million records and warns users to close (by approval or rollback) any unnecessary long-running transactions before triggering database changes that entail a shutdown (this includes restarting and updating the database version).

From the DevOps Guru console, go to Insights, select Proactive, and then select 'RDS InnoDB History List Length Anomalous' Proactive Insight with the current state. You will notice that Proactive Insights provides 'Insight overview', 'Metrics' and 'Recommendations'.

The Insight overview provides the basic information for this insight. In the case in question, the list of row change histories has increased significantly, which affects the performance of queries and session closures.

Proactive insights from the Amazon DevOps Guru for RDS

The Metrics panel provides a graphical representation of the history list and timeline's length, allowing it to be correlated with any unusual application actions that may have occurred in this window.

Proactive insights from the Amazon DevOps Guru for RDS

The Recommendations section suggests actions that can be taken to mitigate this problem before it becomes a larger one. The rationale for each recommendation is also provided in the "Why does DevOps Guru recommend this?" section.

Proactive insights from the Amazon DevOps Guru for RDS

Temporary tables created on disk for Aurora MySQL-compatible engines

Sometimes, it is necessary for a MySQL database to create an internal temporary table during query processing. The internal temporary table can be stored in memory and processed by the TempTable, MEMORY engine, or stored on disk by the InnoDB memory engine. Increasing the number of temporary tables created on disk instead of in memory can impact database performance.

DevOps Guru for RDS now monitors the rate at which the database creates temporary tables and the percentage of those temporary tables that use the disk. When these values exceed the recommended levels over time, DevOps Guru for RDS provides insight into the situation before it becomes critical.

From the DevOps Guru console, go to Insights, select Proactive, and then select "RDS Temporary Tables On Disk Anomalous" Proactive Insight with status in progress. You will notice that this proactive insight provides 'information overview', 'metrics' and 'recommendations'.

The Insight overview provides essential information on the topic. In the case in question, more than 58% of all temporary tables created per second were using disk, with a consistent rate of two temporary tables being created on disk every second, indicating a decline in query performance.

Proactive insights from the Amazon DevOps Guru for RDS

The Metrics panel shows a graphical representation of the information specific to this insight. You will see the evolution of the number of temporary tables created on disk per second, the percentage of temporary tables on disk (out of the total number of temporary tables created in the database), and the overall rate at which temporary tables are created (per second).

Proactive insights from the Amazon DevOps Guru for RDS

The Recommendations section suggests actions to avoid this situation, if possible, such as dropping BLOB and TEXT data types, tuning the database parameters tmp_table_size and max_heap_table_size, reducing the dataset, indexing columns, and more.

Proactive insights from the Amazon DevOps Guru for RDS

An additional explanation of this use case can be found by clicking the "View troubleshooting document" link.

Idle In Transaction for Aurora PostgreSQL-Compatible engines

A connection idle in a transaction for too long can affect performance by holding up locks, blocking other queries or preventing VACUUM (including autovacuum) from cleaning up dead rows. The PostgreSQL database requires periodic maintenance, known as 'vacuuming'. Autovacuum in PostgreSQL automates the execution of VACUUM and ANALYZE commands. This process collects table statistics and removes dead rows. The lack of vacuuming hurts database performance. It leads to increased load on tables and indexes (disk space that has been used by a table or index and is available for reuse by the database but has not been reclaimed), leads to outdated statistics, and can even end up in transaction bypass (when the number of unique transaction IDs reaches a maximum of around two billion).

DevOps Guru for RDS monitors the time sessions spend in the Aurora PostgreSQL database's idle state in the transaction state and initially generates a warning notification and then an alert notification if the idle state in the transaction state continues (current thresholds are 1800 seconds for the warning and 3600 seconds for the alert).

From the DevOps Guru console, go to Insights, select Proactive, and then select "RDS Idle In Transaction Max Time Anomalous" Proactive Insight with status pending. You will notice that Proactive Insights provides 'Information Overview', 'Metrics' and 'Recommendations'.

In your case, the connection was in an 'Idle in Transaction' state for more than 1,800 seconds, which may have affected database performance.

Proactive insights from the Amazon DevOps Guru for RDS

The Metrics panel displays the start time of long-running 'idle in transaction' connections.

Proactive insights from the Amazon DevOps Guru for RDS

As with the other observations, recommended actions are listed, and a link to the troubleshooting document is provided for even more information on this use case.

Proactive insights from the Amazon DevOps Guru for RDS

Conclusions

With Proactive Insights, DevOpsGuru for RDS enhances its capabilities to help you monitor your databases, alerting you to potential operational issues before they become more advanced. To get started, you'll need to ensure you've enabled Performance Insights on the database instances you want to monitor, and you'll also need to make sure and confirm that DevOps Guru can monitor those instances (for example, by enabling it at the account level, by monitoring specific CloudFormation stacks or using AWS tags for specific Aurora resources). The Proactive Insights service is available in all regions supported by the DevOps Guru for RDS service. To learn more about Proactive Insights, join AWS developers on 15 March or 12 April for a free hands-on immersion day (available in three time zones).

Hostersi Sp. z o.o.

Jankowicka 7
44-200 Rybnik
Poland

VAT ID (NIP): PL6423007308
REGON: 240692928
National Court Register (KRS): 275333