purge all old archives regardless of done value #15800
Conversation
. ArchiveWriter::DONE_OK_TEMPORARY . ")
AND idsite IN (" . implode(',', $idSites) . ")
GROUP BY idsite, date1, date2, period, name";
AND `value` NOT IN (" . ArchiveWriter::DONE_ERROR . ")
btw @diosmosis out of curiosity, do you maybe know why we don't delete an ERROR archive?
I think it might be deleted somewhere else... I guess we can get rid of everything?
Merged it for now @diosmosis. @mattab, do you maybe remember? I reckon it's otherwise not too important.
Oh, I know what it's for: this is the SELECT, so we're looking for archives that are usable and deleting everything older. If we SELECT for DONE_ERROR archives, then the latest might be DONE_ERROR and we'd end up deleting the usable archives. We could include DONE_ERROR here, but we'd have to do a little extra processing below.
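The logic described above can be sketched roughly as follows. This is a simplified, hypothetical model (not Matomo's actual code): the function, row shape, and flag values are all illustrative; in Matomo the equivalent work is done in SQL against the archive tables.

```python
# Example done-flag values, illustrative only (see ArchiveWriter for the real constants)
DONE_OK, DONE_ERROR, DONE_OK_TEMPORARY = 1, 2, 3

def archives_to_purge(rows):
    """rows: list of (idarchive, ts_archived, value) for one
    (idsite, date1, date2, period, name) group.
    Returns the idarchives that are safe to purge: everything strictly
    older than the newest *usable* archive, regardless of its own value."""
    usable = [r for r in rows if r[2] in (DONE_OK, DONE_OK_TEMPORARY)]
    if not usable:
        # No usable archive in this group: nothing to anchor the purge on
        return []
    newest_usable_ts = max(r[1] for r in usable)
    return [r[0] for r in rows if r[1] < newest_usable_ts]
```

This also shows the pitfall mentioned above: if DONE_ERROR rows were counted as "usable", a recent failed archive could become the anchor and the actual usable archives behind it would be purged.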
I think they are currently deleted in another task via Model::getTemporaryArchivesOlderThan().
makes sense, thanks 👍
@diosmosis I just ran the task for us and I think it didn't clean up anything, maybe because there are no reports marked as to be invalidated; it seems it would only remove the ones that were invalidated. Not sure if this is maybe still an issue? But I suppose then the only solution would be to iterate over all archive tables for this, which we maybe don't want as it could be quite resource intensive? Not sure if we're doing that already. Maybe it would help to invalidate all previous archives for the same site, date & period when we write a new OK archive?
@tsteur The fix is to run the command manually; otherwise we'd have to somehow add one date per existing archive table to ArchivesToPurgeDistributedList. Could do that in an update?
I guess in the purge task we can always do today/yesterday; that would solve the biggest source of dupes.
That sounds good to do it in an update 👍 If I see this right, in our DB it is only March and April that seem impacted by this, so we would only need to add two dates for these months maybe? Currently executing this command to potentially fix it for now for us:
👍 that should work and be quite simple. I guess today would be enough. Yesterday could be done maybe if it belongs to a different month? As
👍
* Avoid possible error subtable already exists but not loaded (#15779)
* Make sure to always set JSON_PIWIK to native JSON when possible (#15785)
  * make sure to always set JSON_PIWIK to native JSON when possible
  * rebuilt piwik.js
* Force POST for bulk requests, fix alwaysUseSendBeacon not respected for bulk requests (#15784)
  * Force POST for bulk requests, fix alwaysUseSendBeacon not respected for bulk requests
  * rebuilt piwik.js
* Make sure to clean up tracking failures before sending email notification (#15798). Feedback from a customer: e.g. the daily `cleanupTrackingFailures()` action might only be executed after the weekly `notifyTrackingFailures`, therefore we should try to clean up failures first and then check if any are left. Avoids the case where a user opens the email hours after receiving it and there are no tracking failures reported anymore. This could still happen, but it's a bit less likely.
* 3.13.5-b1
* Faster segmented suggested values when browser archiving is disabled (#15786)
  * Faster segmented suggested values when browser archiving is disabled
  * make sure no segment is set
  * remove wrong var type
  * fix/add tests
  * add more segment values
  * detect if we should flatten or not
  * add docs
* Fix problem when comparing segments or opening popovers (#15809), refs #15805
* purge all old archives regardless of done value (#15800)
  * purge all old archives regardless of done value, we only care about the newest usable one
  * Fix test and start on new one.
  * Add coverage for change in tests.
  * there is no longer an inner join so should not need the idsite check
* Add more parameters to the computeNbUnique event (#15808)
* 3.13.5-b2
* One click update in two parts so new code is loaded for second. (#15770)
  * One click update in two parts so new code is loaded for second.
  * remove no longer needed code

Co-authored-by: Thomas Steur <tsteur@users.noreply.github.com>
Co-authored-by: Matthieu Aubry <mattab@users.noreply.github.com>
Co-authored-by: Stefan Giehl <stefan@matomo.org>
* purge all old archives regardless of done value, we only care about the newest usable one
* Fix test and start on new one.
* Add coverage for change in tests.
* there is no longer an inner join so should not need the idsite check