amazon-rds – MySQL on AWS RDS at 100% CPU on some queries

I have a managed MySQL instance (5.7.19) in AWS. In general, things work quite well. I sit at around 4% CPU usage constantly, and well below the IOPS limits for my burstable instance (t2.micro). However, if I run a query that does not use an index, against a table that has probably been evicted from RAM and sits on disk, MySQL will "hang" for about a minute. Read IOPS increase, but usually not enough to dip into my burst credit balance. The CPU pegs at 100% until the query completes. Other connections from the normally running service queue up (I start to see more than 60 connections), and many of them eventually time out.

Here is an example query that stalled the database for almost a minute:

SELECT * FROM mydb.PurchaseDatabase WHERE Time BETWEEN '2018-11-20 00:00:00' AND '2018-11-23 00:00:00' AND ItemStatus = 0 AND ItemID = 'exampleitem';
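
For reference, prefixing the query with EXPLAIN confirms whether an index can be used; with no index covering these columns I would expect type = ALL, i.e. a full table scan. Just a diagnostic sketch using the same query:

-- "type: ALL" in the output means a full table scan, forcing cold pages off disk
EXPLAIN SELECT * FROM mydb.PurchaseDatabase WHERE Time BETWEEN '2018-11-20 00:00:00' AND '2018-11-23 00:00:00' AND ItemStatus = 0 AND ItemID = 'exampleitem';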

Here are the metrics of the RDS panel when I made this query:

[Image: RDS dashboard metrics]

If I run the query a second time, it completes almost instantly (presumably the table is now in RAM after the recent query against it). Similar queries are also fast again (0.173 seconds, for example). I enabled the slow query log as well as the error log, and ran the query again a day later with the same ~30-second delay (the table had presumably been paged out of RAM again). However, nothing was written to the slow query log. I checked the error logs and can see messages like this whenever I run the slow queries:
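
As a sanity check on why nothing showed up, the relevant settings can be inspected from any session (on RDS they are ultimately controlled by the DB parameter group). Note that when log_output is TABLE, which I believe is the RDS default, slow queries land in the mysql.slow_log table rather than a file:

SHOW VARIABLES LIKE 'slow_query_log';   -- must be ON for anything to be recorded
SHOW VARIABLES LIKE 'long_query_time'; -- queries faster than this (seconds) are not logged
SHOW VARIABLES LIKE 'log_output';      -- FILE or TABLE
SELECT start_time, query_time, sql_text FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;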

2018-11-28T06:21:05.498947Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 37072ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)

I think this could be another symptom of the underlying problem, which is that my instance has trouble reading from and writing to disk. I am using SSD-backed storage, and my burst balance on the EBS volume is not affected by these slow queries; I have plenty of credits before and after the queries.
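
A t2.micro only has 1 GiB of RAM, so the InnoDB buffer pool is small and a cold table has to come off EBS. As a rough check (a sketch only; the status counters are cumulative since startup), the pool size and the disk-read miss rate can be inspected like this:

SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical page reads
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that missed the pool and hit disk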

Then, foolishly, I decided to try to help the database by purging old records. I ran a delete like this:

DELETE FROM mydb.PurchaseDatabase WHERE Time BETWEEN '2018-01-01 00:00:00' AND '2018-07-31 00:00:00' AND ItemStatus = 0;

This affected about 50k of the table's 190k rows. The query 'returned' to MySQL Workbench in 0.505 seconds, but actually locked up the database for almost 8 minutes! During this time, the RDS instance could not even write to the logs or to CloudWatch.
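
In hindsight, one common way to keep a large delete from monopolizing a small instance is to remove rows in batches; a sketch (the batch size of 5000 is arbitrary, and the statement must be re-run until it affects 0 rows):

DELETE FROM mydb.PurchaseDatabase WHERE Time BETWEEN '2018-01-01 00:00:00' AND '2018-07-31 00:00:00' AND ItemStatus = 0 LIMIT 5000;
-- repeat (e.g. from application code) until ROW_COUNT() = 0; each batch commits and releases its locks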

[Image: CloudWatch is unable to collect metrics during the slow query]

It took 8 minutes to release about 6 MB of rows from the database (the CPU is 100% assigned during this time). I'm way below the CPU usage in general, and the IOPS for the size of my instance. T2.micro really is not able to handle this type of workload? Is there anything I can do to better control what is happening? I also tried to write performance records, but they actually failed to write during this 8 minute idle time, so I can never see the problem.
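
For what it's worth, if a second connection can still be established during one of these stalls, these statements show which queries and transactions are stuck (all standard MySQL 5.7, nothing RDS-specific):

SHOW FULL PROCESSLIST;
SELECT trx_id, trx_state, trx_started, trx_rows_locked, trx_query FROM information_schema.INNODB_TRX;
SHOW ENGINE INNODB STATUS;  -- includes a SEMAPHORES section like the warning below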

After this stall, the error log contained this warning:

2018-11-28T18:35:59.497627Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 47281889883904 has waited at srv0srv.cc line 1982 for 250.00 seconds the semaphore:
X-lock on RW-latch at 0x2b00a8fcf258 created in file dict0dict.cc line 1184
a writer (thread id 47281896187648) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /local/mysql-5.7.19.R1/storage/innobase/dict/dict0stats.cc line 2375

Note: this stalls the entire database, not just the PurchaseDatabase table. The connection queue fills with unserviced queries until, eventually, the queue is full and no more connections are accepted. The old connections eventually time out.
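
The dict0stats.cc frame in that warning makes me suspect (this is an assumption on my part, not something the logs state outright) that InnoDB was recalculating persistent table statistics, which by default happens automatically once roughly 10% of a table's rows change, as a 50k-row delete would. The relevant knobs, if anyone can confirm this reading:

SHOW VARIABLES LIKE 'innodb_stats_persistent';   -- ON by default in 5.7
SHOW VARIABLES LIKE 'innodb_stats_auto_recalc';  -- triggers recalculation after large changes
ALTER TABLE mydb.PurchaseDatabase STATS_AUTO_RECALC = 0;  -- per-table opt-out, if this is the trigger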

I am guessing this is some kind of EBS/RDS interaction, but I cannot see how I am supposed to be able to burst to 3,000 IOPS when I cannot even manage a 30 IOPS read burst. Any suggestions would be much appreciated, since I am concerned these problems may start to appear during the normal workload, and I do not understand the root cause.

In this example, PurchaseDatabase was created with the following statement:

CREATE TABLE PurchaseDatabase (ID BIGINT, TitleID VARCHAR(127), TransactionID BIGINT UNSIGNED, SteamID BIGINT UNSIGNED, State TINYINT UNSIGNED, Time DATETIME, TimeCreated DATETIME, ItemID VARCHAR(127), Quantity INT UNSIGNED, Price INT UNSIGNED, PRIMARY KEY (ID, TitleID));
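
For completeness, a composite index matching that WHERE clause would let the earlier SELECT avoid the full scan. A sketch (the index name is mine; note the queries above say ItemStatus while this DDL says State, so the column name would need to match whichever is actually in the table):

-- equality columns first (ItemID, State), range column (Time) last
ALTER TABLE PurchaseDatabase ADD INDEX idx_item_state_time (ItemID, State, Time);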

Also keep in mind that this problem is not specific to the PurchaseDatabase table: a query against any of my tables can cause this same problem.