@diosmosis opened this Pull Request on September 8th 2020 Member

@tsteur here is the extra check. I didn't add a check when inserting since we're using LOAD DATA INFILE, so removing duplicates is complicated without a UNIQUE index.
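
To illustrate why (just a sketch, not part of this PR; the column list for the hypothetical unique key is an assumption): with a UNIQUE key, MySQL could drop duplicates during the bulk load itself, while without one the check has to happen later, when invalidations are consumed.

```php
<?php
// Sketch only, not part of this PR. Assumes a duplicate invalidation is one with the
// same (idsite, period, date1, date2, name) -- the exact column list is an assumption.
use Piwik\Common;
use Piwik\Db;

$table = Common::prefixTable('archive_invalidations');

// With a UNIQUE key, the IGNORE modifier of LOAD DATA INFILE would silently drop
// duplicate rows server-side during the bulk load:
Db::exec("ALTER TABLE $table
          ADD UNIQUE KEY uniq_invalidation (idsite, period, date1, date2, name)");

Db::exec("LOAD DATA LOCAL INFILE '/tmp/invalidations.csv'
          IGNORE INTO TABLE $table
          FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'
          (idsite, period, date1, date2, name, ts_invalidated)");

// Without that key there is nothing for IGNORE to match on, so duplicates have to be
// skipped/removed later, when the queue consumer picks an invalidation to process.
```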

You can look at the queue consumer test to verify.

@tsteur commented on September 13th 2020 Member

@diosmosis generally looks good to merge. Can you look at the failing tests before merging?

There's a test failing because now the output of the archiving changes.

Not sure if https://travis-ci.org/github/matomo-org/matomo/jobs/726843740#L717 is related to this change

@diosmosis commented on September 13th 2020 Member

Not sure if https://travis-ci.org/github/matomo-org/matomo/jobs/726843740#L717 is related to this change

That's random; we should probably change how we assert there, since we just need to check that it's different from before. Will look into it.

@tsteur commented on September 14th 2020 Member

btw, I was just checking the invalidations on the demo. There seem to be heaps of invalidations at the same second. It's idSite 60, which is a roll-up.

This is for year periods:
[screenshot of year-period invalidation rows]

Not sure if that's a different issue or not. It should be mostly fixed by this PR, but ideally it wouldn't create that many entries in the first place. That's demo, not demo2.

This is for month periods:
[screenshot of month-period invalidation rows]

@diosmosis commented on September 14th 2020 Member

@tsteur that's strange... not sure what would invalidate so much... I'll debug core:archive on demo.

@diosmosis commented on September 14th 2020 Member

can't reproduce on demo by executing core:archive manually... will try something else

@diosmosis commented on September 14th 2020 Member

I think this has something to do with invalidating today, then invalidating yesterday. There are two duplicate week-period invalidations with the same timestamp, and six duplicate month-period invalidations, also with the same timestamp. Not sure why it's two weeks and six months though...

@diosmosis commented on September 15th 2020 Member

@tsteur finally found the cause of the issue: it's due to RollUpReporting handling the Archiving.getIdSitesToMarkArchivesAsInvalidated event. The same roll-up site gets invalidated again for every child site that's invalidated. Not sure what the best fix is yet.
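
Roughly the shape of what's happening (an illustrative sketch only, not the actual RollUpReporting code; the event handler signature and the helper are assumptions):

```php
<?php
// Illustrative sketch of the handler shape only, not the actual RollUpReporting code.
// The event handler signature and the getParentRollUpIds() helper are assumptions.
class RollUpReporting extends \Piwik\Plugin
{
    public function registerEvents()
    {
        return [
            'Archiving.getIdSitesToMarkArchivesAsInvalidated' => 'addParentRollUps',
        ];
    }

    public function addParentRollUps(&$idSites)
    {
        // iterate over a snapshot so appending below doesn't affect the loop
        foreach (array_values($idSites) as $idSite) {
            foreach ($this->getParentRollUpIds($idSite) as $rollUpId) {
                // This runs once per invalidate() call, so the same roll-up id (e.g.
                // idSite 60 on demo) gets appended again for every child site that is
                // invalidated, producing one duplicate invalidation row per child.
                $idSites[] = $rollUpId;
            }
        }
    }

    /** Hypothetical helper: would look up the roll-up sites that contain $idSite. */
    private function getParentRollUpIds($idSite)
    {
        return []; // stub for the sketch
    }
}
```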

@tsteur commented on September 15th 2020 Member

@diosmosis ok thanks. I thought I had a fix for this but it didn't quite work out (in the rollupchildreninvalidate branch in the roll-ups repo). Generally, if a site had new traffic and it is part of a roll-up, then we also have to invalidate the roll-ups; in all other cases there should be no need to invalidate the parent roll-up.

The only way I can think of to fix this right now is to hope that all duplicate entries would eventually be removed by this PR, since we'd fetch the highest invalidation ID and skip and remove the lower IDs. But this would only happen if another archiving job runs while archiving this one. Before inserting the invalidations, could we maybe at least remove duplicates, if that's an issue? Don't know if these would be inserted all at once or over different bulk inserts from different invalidations.
[screenshot of duplicate invalidation rows]
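
Something along these lines is what I mean by removing duplicates before inserting (just a sketch; the row format is an assumption, and of course it only helps if the duplicates end up in the same bulk insert):

```php
<?php
// Sketch of de-duplicating the rows right before the bulk insert. Assumes each row is
// an associative array with these keys (the exact row format is an assumption).
function removeDuplicateInvalidations(array $rows)
{
    $unique = [];
    foreach ($rows as $row) {
        $key = implode('|', [$row['idsite'], $row['period'], $row['date1'], $row['date2'], $row['name']]);
        if (!isset($unique[$key])) {
            // keep only the first occurrence of each (idsite, period, date1, date2, name)
            $unique[$key] = $row;
        }
    }
    return array_values($unique);
}
```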

@diosmosis commented on September 16th 2020 Member

But this would only happen if another archiving job runs while archiving this one.

This shouldn't be the case? When we get the highest ID we start it, and once it's successfully started we remove all lower identical invalidations, all within the same archiving run.
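
Roughly, as a sketch (column names reflect the archive_invalidations table as I understand it; the real queue consumer has more logic around status handling and locking):

```php
<?php
// Sketch of the "take the highest ID, then drop lower duplicates" step in the consumer.
use Piwik\Common;
use Piwik\Db;

$table = Common::prefixTable('archive_invalidations');

// example values: a year invalidation for the roll-up site from the screenshots
$idSite = 60;
$period = 4; // 4 = year
$date1  = '2020-01-01';
$date2  = '2020-12-31';
$name   = 'done';

// 1. pick the duplicate with the highest ID for this site/period/date range/archive name
$idInvalidation = Db::fetchOne(
    "SELECT MAX(idinvalidation) FROM $table
      WHERE idsite = ? AND period = ? AND date1 = ? AND date2 = ? AND name = ?",
    [$idSite, $period, $date1, $date2, $name]
);

// 2. ... start archiving for that invalidation ...

// 3. once it has started successfully, remove the lower identical invalidations so they
//    are never processed (all within the same core:archive run)
Db::query(
    "DELETE FROM $table
      WHERE idsite = ? AND period = ? AND date1 = ? AND date2 = ? AND name = ?
        AND idinvalidation < ?",
    [$idSite, $period, $date1, $date2, $name, $idInvalidation]
);
```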

@diosmosis commented on September 16th 2020 Member

Don't know if these would be inserted all at once or over different bulk inserts from different invalidations

There's an event in the invalidate method, so on each separate invalidate call demo adds the same idSite, resulting in the same periods being invalidated again. We could do a complex query before the batch insert to check whether every row we're about to insert is already present, but that would make the method even more complex... it would be easy to test though, so I can add it.
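
Something like this is the kind of query I mean (just a sketch; assumes the rows are plain associative arrays with these columns):

```php
<?php
// Sketch of filtering out invalidations that already exist before the batch insert.
// The row format (associative arrays with these keys) is an assumption.
use Piwik\Common;
use Piwik\Db;

function filterAlreadyQueuedInvalidations(array $rowsToInsert)
{
    if (empty($rowsToInsert)) {
        return $rowsToInsert;
    }

    $table = Common::prefixTable('archive_invalidations');

    $conditions = [];
    $bind = [];
    foreach ($rowsToInsert as $row) {
        $conditions[] = '(idsite = ? AND period = ? AND date1 = ? AND date2 = ? AND name = ?)';
        array_push($bind, $row['idsite'], $row['period'], $row['date1'], $row['date2'], $row['name']);
    }

    $existing = Db::fetchAll(
        "SELECT idsite, period, date1, date2, name FROM $table WHERE " . implode(' OR ', $conditions),
        $bind
    );

    // index the existing rows by the same key we use for the new rows
    $existingKeys = [];
    foreach ($existing as $row) {
        $key = implode('|', [$row['idsite'], $row['period'], $row['date1'], $row['date2'], $row['name']]);
        $existingKeys[$key] = true;
    }

    // keep only the rows that are not already queued
    return array_values(array_filter($rowsToInsert, function ($row) use ($existingKeys) {
        $key = implode('|', [$row['idsite'], $row['period'], $row['date1'], $row['date2'], $row['name']]);
        return !isset($existingKeys[$key]);
    }));
}
```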

@tsteur commented on September 16th 2020 Member

👍 The query shouldn't be needed. I was only thinking of the case where it's already in memory in PHP.

@diosmosis commented on September 16th 2020 Member

I was only thinking of the case where it's already in memory in PHP.

Ah I see... we could use the transient cache with a TTL (since core:archive would otherwise just keep using the cache). It would also make debugging a bit cleaner, so I can see why it would be useful.
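
Roughly what I have in mind, sketched with a plain static array rather than the actual cache classes (the key format and the 300s TTL are arbitrary choices for the example):

```php
<?php
// Sketch of remembering recently-inserted invalidations in memory with a TTL, so a
// long-running core:archive process doesn't suppress legitimate re-invalidations
// forever. Deliberately uses a plain static array instead of Matomo's cache classes.
class RecentInvalidationsCache
{
    const TTL = 300; // seconds; arbitrary for the example

    /** @var array key => unix timestamp of insert */
    private static $inserted = [];

    public static function wasRecentlyInserted($idSite, $period, $date1, $date2, $name)
    {
        self::evictExpired();
        return isset(self::$inserted[self::key($idSite, $period, $date1, $date2, $name)]);
    }

    public static function markInserted($idSite, $period, $date1, $date2, $name)
    {
        self::$inserted[self::key($idSite, $period, $date1, $date2, $name)] = time();
    }

    private static function evictExpired()
    {
        $now = time();
        foreach (self::$inserted as $key => $ts) {
            if ($now - $ts > self::TTL) {
                unset(self::$inserted[$key]);
            }
        }
    }

    private static function key($idSite, $period, $date1, $date2, $name)
    {
        return implode('|', [$idSite, $period, $date1, $date2, $name]);
    }
}
```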

This Pull Request was closed on September 20th 2020