New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow query from purgeInvalidatedArchiveTable (query is 3-10min long on bigger instances) #6508
Comments
Currently we purge invalidated data by looking over each archive table: https://github.com/piwik/piwik/blob/master/core/DataAccess/ArchivePurger.php#L28-29 Instead maybe we could keep track of which dates were invalidated, and only run the query on tables for the months where some data was invalidated? |
Do we have a sample SQL to actually test this? Maybe it can be split in two queries to make it faster? Not sure how many archives are usually invalidated... I think this is only executed daily so 3-10min sounds actually ok maybe. All this 'done' stuff seems to be a bit weird anyway. Maybe it makes sense to refactor it at some point. |
The reason it's so slow is that it takes 3-10min per monthly archive table. So if a user has 5 years with PIwik, it will take 60 * 5min = 5 hours. I'd suggest to keep track of which months were invalidated (eg. in option table) and only run the query on tables for the months where some data was invalidated? |
different question: In this query why is there a check if there is also a OK or temporary archive? Invalidated sounds to me as in "it is invalid" meaning it can be removed no matter what. If needed it will be reprocessed. I presume there is a reason for this as it wouldn't be there otherwise. Maybe we can at least fix the naming as invalid sounds confusing in case an invalid archive is still used. |
Ok sorry the code does not make this clear enough. basically users will import logs and at the end of log import it marks archives as invalidated. Now imagine the scheduled task runs after log import and before the next Now I'm wondering how to best solve this issue... thinking. |
…hiveInvalidator to include the API CoreAdminHome.invalidateArchivedReports and InvalidatedReports to store the info of which report was invalidated
The purge query is here: https://github.com/piwik/piwik/blob/master/core/DataAccess/Model.php#L25-25 Assumptions
Notes
I started doing refactoring and I think I know how to fix it. I've self assigned issue! |
…able and also only for the websites IDs that were invalidated
The goal of this issue is to make the query in purgeInvalidatedArchiveTable faster.
In Piwik 2.8.0 we're seeing that the nice SQL query introduced in #6116 is sometimes a bit slow.
This query is triggered for example when log import is used.
Maybe we could make this query faster?
Mysql slow query log report:
The text was updated successfully, but these errors were encountered: