This improves the performance for raw data deletion on the log tables further after already making partially suer the index is used in https://github.com/matomo-org/matomo/pull/14840
Before this change, when we delete raw data from log tables, queries like these would be executed:
SELECT idvisit FROM `log_visit` WHERE idvisit > 290 AND visit_last_action_time < '2019-06-02 03:04:05' AND idsite IN (1,5,7) ORDER BY idvisit ASC LIMIT 1000 SELECT idvisit FROM `log_visit` WHERE idvisit > 1290 AND visit_last_action_time < '2019-06-02 03:04:05' AND idsite IN (1,5,7) ORDER BY idvisit ASC LIMIT 1000 Where idvisit > 2290 ... ...
However, this means for raw data deletion, MySQL needs to look at each visit within that time range , read the idvisit, store them in memory or tmp table, and order them afterwards. That's not quite efficient when we are deleting because we could simply always execute the same query:
SELECT idvisit FROM `log_visit` WHERE visit_last_action_time < '2019-06-02 03:04:05' AND idsite IN (1,5,7) LIMIT 1000
Now Mysql can just randomly look only at 1000 visits which is fast because of the
idsite, visit_last_action_time index instead of looking at potentially many millions of visits and sorting them etc. It's a lot more efficient and reduces IO quite a bit. Especially considering we're executing this query VERY often when there are millions of visits to be deleted where we looked say 10,000 times over all visits in that time range just to delete 10M visits (10,000 times executing the query to delete 1000 visits each time).
forAllLogs() method is also called from
VisitorGeolocator where it is not deleting data and therefore it actually needs the
idvisit > ? ORDER BY idvisit logic as it otherwise can't make sure to apply the callback logic to each method.
Did my best to keep this logic difference somewhat simple. Would otherwise need two different methods for this but that doesn't make things better in the end.