
purge all old archives regardless of done value #15800

Merged: 4 commits merged into 3.x-dev from archive-purging-bug on Apr 14, 2020

Conversation

diosmosis (Member)

we only care about the newest usable one

@diosmosis diosmosis added the Needs Review PRs that need a code review label Apr 9, 2020
@diosmosis diosmosis added this to the 3.13.5 milestone Apr 9, 2020
. ArchiveWriter::DONE_OK_TEMPORARY . ")
              AND `value` NOT IN (" . ArchiveWriter::DONE_ERROR . ")
              AND idsite IN (" . implode(',', $idSites) . ")
         GROUP BY idsite, date1, date2, period, name";
Member

btw @diosmosis out of curiosity, do you maybe know why we don't delete an ERROR archive?

Member Author

I think it might be deleted somewhere else... I guess we can get rid of everything?

Member

Merged it for now @diosmosis. @mattab do you maybe remember? I reckon it's otherwise not too important.

Member Author

Oh, I know what it's for: this is the SELECT, so we're looking for archives that are usable and deleting everything older. If we SELECT for DONE_ERROR archives, then the latest might be DONE_ERROR and we'd end up deleting the usable archives. We could include DONE_ERROR here, but we'd have to do a little extra processing below.

I think they are currently deleted in another task via Model::getTemporaryArchivesOlderThan().
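The selection logic described above (keep only the newest usable archive per site/date/period group, purge everything older, and exclude DONE_ERROR archives from the SELECT so an errored "latest" archive can't shadow a usable one) can be sketched roughly as follows. This is a hedged Python sketch, not Matomo's actual PHP implementation; the function name and the numeric done values are illustrative:

```python
from collections import defaultdict

# Illustrative done-flag values, not necessarily Matomo's real constants.
DONE_OK, DONE_ERROR, DONE_OK_TEMPORARY = 1, 2, 3

def archives_to_purge(rows):
    """rows: (idarchive, idsite, date1, date2, period, value) tuples,
    newest first within each group (as with GROUP_CONCAT ... ORDER BY
    ts_archived DESC). Returns the idarchives to delete."""
    groups = defaultdict(list)
    for idarchive, idsite, date1, date2, period, value in rows:
        if value == DONE_ERROR:
            continue  # errored archives are excluded from the SELECT
        groups[(idsite, date1, date2, period)].append(idarchive)
    to_purge = []
    for archives in groups.values():
        # archives are newest-first: keep the newest usable one,
        # purge everything older regardless of its done value
        to_purge.extend(archives[1:])
    return to_purge
```

As the comment above notes, DONE_ERROR archives skipped here would still be cleaned up elsewhere (e.g. via Model::getTemporaryArchivesOlderThan()).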

Member

makes sense, thanks 👍

@tsteur tsteur merged commit 3c2456a into 3.x-dev Apr 14, 2020
@tsteur tsteur deleted the archive-purging-bug branch April 14, 2020 21:47

tsteur commented Apr 15, 2020

@diosmosis I just ran the task for us and I think it didn't clean up anything, maybe because there are no reports marked to be invalidated:

[screenshot omitted]

and it seems it would only remove the ones that were invalidated:

[screenshot omitted]

Not sure if this is maybe still an issue? But I suppose then the only solution would be to iterate over all archive tables for this, which we maybe don't want as it could be quite resource intensive?

Not sure we're doing it already. Maybe it would help to invalidate all previous archives for the same site, date & period when we write a new OK archive?

@diosmosis (Member Author)

@tsteur The fix is to run the command manually; otherwise we'd have to somehow add one date per existing archive table to ArchivesToPurgeDistributedList. Could do that in an update?
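Adding one date per existing archive table would roughly mean mapping table names like `archive_numeric_2020_03` to one date per month. A hedged Python sketch of that mapping, with an illustrative function name (the real update would feed these into ArchivesToPurgeDistributedList in PHP):

```python
import re

def dates_for_archive_tables(table_names):
    """Map archive table names like 'archive_numeric_2020_03' or
    'archive_blob_2020_03' to one 'YYYY-MM-01' date per month,
    suitable as input for a purge list."""
    dates = set()
    for name in table_names:
        m = re.match(r"archive_(?:numeric|blob)_(\d{4})_(\d{2})$", name)
        if m:
            dates.add(f"{m.group(1)}-{m.group(2)}-01")
    return sorted(dates)
```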

@diosmosis (Member Author)

Maybe it would help to invalidate all previous archives for the same site, date & period when we write a new OK archive?

I guess in the purge task, we can always do today/yesterday, that would solve the biggest source of dupes.


tsteur commented Apr 15, 2020

That sounds good to do it in an update 👍 If I see this right, in our DB it is only March and April that seem impacted, so we would maybe only need to add two dates for these months?

Currently executing this command to potentially fix it for now for us:

console core:purge-old-archive-data "2020-04-15" "2020-03-15" --exclude-ranges --skip-optimize-tables --include-year-archives


tsteur commented Apr 15, 2020

I guess in the purge task, we can always do today/yesterday, that would solve the biggest source of dupes.

👍 that should work and be quite simple. I guess today would be enough; yesterday could maybe be done if it belongs to a different month, as purgeInvalidatedArchivesFrom looks at the entire month anyway.
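The idea above boils down to: always cover today's month, and add yesterday's month only when yesterday falls in a different month (since the purge looks at whole months anyway). A small sketch of that date logic, assuming an illustrative function name rather than the actual task code:

```python
from datetime import date, timedelta

def months_to_purge(today):
    """Return the month starts the daily purge task should cover:
    today's month, plus yesterday's month only if yesterday falls
    in a different (i.e. previous) month."""
    yesterday = today - timedelta(days=1)
    months = [today.replace(day=1)]
    if (yesterday.year, yesterday.month) != (today.year, today.month):
        months.append(yesterday.replace(day=1))
    return months
```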

@diosmosis (Member Author)

I guess today would be enough. Yesterday could be done maybe if it belongs to a different month?

👍

diosmosis added a commit that referenced this pull request Apr 16, 2020
* Avoid possible error subtable already exists but not loaded (#15779)

* Make sure to always set JSON_PIWIK to native JSON when possible (#15785)

* make sure to always set JSON_PIWIK to native JSON when possible

* rebuilt piwik.js

* Force POST for bulk requests, fix alwaysUseSendBeacon not respected for bulk requests (#15784)

* Force POST for bulk requests, fix alwaysUseSendBeacon not respected for bulk requests

* rebuilt piwik.js

* Make sure to clean up tracking failures before sending email notification (#15798)

Feedback from a customer. E.g. the daily `cleanupTrackingFailures()` action might only be executed after the weekly `notifyTrackingFailures`, therefore we should try to clean up failures first and then check if any are left. This avoids the case where a user opens the email hours after receiving it and by then there are no tracking failures reported. It could still happen, but it's a bit less likely.

* 3.13.5-b1

* Faster segmented suggested values when browser archiving is disabled (#15786)

* Faster segmented suggested values when browser archiving is disabled

* make sure no segment is set

* remove wrong var type

* fix/add tests

* add more segment values

* detect if we should flatten or not

* add docs

* Fix problem when comparing segments or opening popovers (#15809)

refs #15805

* purge all old archives regardless of done value (#15800)

* purge all old archives regardless of done value, we only care about the newest usable one

* Fix test and start on new one.

* Add coverage for change in tests.

* there is no longer an inner join so should not need the idsite check

* Add more parameters to the computeNbUnique event (#15808)

* 3.13.5-b2

* One click update in two parts so new code is loaded for second. (#15770)

* One click update in two parts so new code is loaded for second.

* remove no longer needed code

Co-authored-by: Thomas Steur <tsteur@users.noreply.github.com>
Co-authored-by: Matthieu Aubry <mattab@users.noreply.github.com>
Co-authored-by: Stefan Giehl <stefan@matomo.org>
jonasgrilleres pushed a commit to 1024pix/pix-analytics that referenced this pull request Sep 22, 2020
* purge all old archives regardless of done value, we only care about the newest usable one

* Fix test and start on new one.

* Add coverage for change in tests.

* there is no longer an inner join so should not need the idsite check
jbuget pushed a commit to 1024pix/pix-analytics that referenced this pull request Sep 26, 2020 (same commit messages as above)