This further improves the performance of raw data deletion from the log tables, building on #14840, which already partially made sure the index is used.
Before this change, when deleting raw data from the log tables, queries like these were executed:
However, this means that for raw data deletion, MySQL needs to look at each visit within that time range, read the `idvisit`, store the values in memory or in a temporary table, and sort them afterwards. That's not very efficient when we are deleting, because we could simply execute the same query every time:
Now MySQL only has to look at any 1000 matching visits, which is fast thanks to the `idsite, visit_last_action_time` index, instead of looking at potentially many millions of visits and sorting them. It's a lot more efficient and reduces IO considerably. This matters especially because we execute this query VERY often: to delete, say, 10M visits, we run it 10,000 times (1,000 visits per execution), and before this change each of those executions looked at all visits in that time range.
I noticed the `forAllLogs()` method is also called from `VisitorGeolocator`, where it does not delete data. There it actually needs the `idvisit > ? ORDER BY idvisit` logic, as it otherwise can't make sure the callback is applied to each row. I did my best to keep this difference in logic simple. Otherwise we would need two different methods, but that wouldn't make things better in the end.
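The two modes of `forAllLogs()` can be sketched the same way. This is again a hypothetical Python/SQLite illustration, not the actual PHP method: `for_all_logs` and its `will_delete` flag are invented names for the idea that a caller which deletes rows can re-run the same query, while a non-deleting caller such as the geolocation update needs the `idvisit > ? ORDER BY idvisit` cursor.

```python
import sqlite3
from typing import Callable, List


def for_all_logs(conn: sqlite3.Connection,
                 callback: Callable[[List[int]], None],
                 will_delete: bool, batch: int = 1000) -> int:
    """Hypothetical: call `callback` with batches of idvisit values and
    return the number of batches processed.

    will_delete=True: the callback removes each batch, so the same unordered
    query can simply be re-run until it returns nothing.
    will_delete=False: rows stay in the table, so a cursor
    (idvisit > ? ORDER BY idvisit) is needed to move past rows already seen.
    """
    last_id, calls = 0, 0
    while True:
        if will_delete:
            sql, params = "SELECT idvisit FROM log_visit LIMIT ?", (batch,)
        else:
            sql = ("SELECT idvisit FROM log_visit WHERE idvisit > ? "
                   "ORDER BY idvisit LIMIT ?")
            params = (last_id, batch)
        ids = [row[0] for row in conn.execute(sql, params)]
        if not ids:
            return calls
        callback(ids)
        calls += 1
        last_id = ids[-1]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, location TEXT)")
    conn.executemany("INSERT INTO log_visit (idvisit) VALUES (?)",
                     [(i,) for i in range(1, 6)])

    # Non-deleting callback, similar in spirit to updating visit locations.
    def update_location(ids: List[int]) -> None:
        conn.executemany("UPDATE log_visit SET location = 'xx' WHERE idvisit = ?",
                         [(i,) for i in ids])

    print(for_all_logs(conn, update_location, will_delete=False, batch=2))  # 3

    # Deleting callback: each batch disappears, so no cursor is required.
    def delete_rows(ids: List[int]) -> None:
        conn.executemany("DELETE FROM log_visit WHERE idvisit = ?",
                         [(i,) for i in ids])

    for_all_logs(conn, delete_rows, will_delete=True, batch=2)
    print(conn.execute("SELECT COUNT(*) FROM log_visit").fetchone()[0])  # 0
```

Passing `will_delete=True` with a callback that does not actually delete rows would loop forever on the same batch, which is why a non-deleting caller has to keep the cursor-based query.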