I have a large table with over 60 million rows, which has relational data inside of it. I need to update each of these rows to the newest date, of the newest member, including members that that member is a part of. The problem I’m having is my queries are too slow, especially when I need to run them in a loop until everything is updated.
Example:
id |
type |
refid |
reftype |
timestamp |
1 |
Team |
1 |
Person |
2021-04-03 |
1 |
Team |
2 |
Person |
2021-03-31 |
2 |
Team |
2 |
Person |
2021-04-01 |
2 |
Team |
3 |
Person |
2021-04-02 |
1 |
Group |
1 |
Team |
2021-04-03 |
2 |
Group |
2 |
Team |
2021-04-05 |
3 |
Team |
4 |
Person |
2021-03-31 |
In the above data, all rows except the last one would be updated to “2021-04-05”, because Group 2 contains Team 2, and Team 2 contains Person 2, which is also a member of Team 1, which is a member of Group 1. Same thing would have happened if Person 2 was the newest date. So you can see the confusion and the massive growth of possible connections. In the end I just need to expire anything over a week.
Currently I run a loop of the two queries below until both return 0, but this can take a ton of passes and take over 10+ minutes each time for the second query. I have combined indexes, but It seems the need to process the sheer 16GB data size takes a while. Am I missing something, or is there a better way? Appreciate any help!
update table_name
left outer JOIN
table_name AS t2 ON t2.refid = table_name.refid AND t2.reftype = table_name.reftype AND t2.TIMESTAMP >= DATE('2021-04-01')
set table_name.timestamp = t2.timestamp
WHERE
t2.timestamp IS NOT NULL
AND date(t2.timestamp) > date(table_name.timestamp)
AND table_name.TIMESTAMP < DATE('2021-04-01');
followed by
update table_name
left outer JOIN
table_name AS t2 ON t2.id = table_name.id AND t2.type = table_name.type AND t2.TIMESTAMP >= DATE('2021-04-01')
set table_name.timestamp = t2.timestamp
WHERE
t2.timestamp IS NOT NULL
AND date(t2.timestamp) > date(table_name.timestamp)
AND table_name.TIMESTAMP < DATE('2021-04-01');