Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependent archives don't get invalidated when invalidating reports making them show outdated data #18772

Open
tsteur opened this issue Feb 10, 2022 · 6 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Data Integrity & Accuracy
Milestone

Comments

@tsteur
Copy link
Member

tsteur commented Feb 10, 2022

refs DEV-2418 for more details.

refs #18773 which might fix this issue as well depending how it is fixed. If #18773 fixes it by fixing the invalidation, then this is also fixed.

I noticed when archiving a week, then in https://github.com/matomo-org/matomo/blob/4.7.1/core/Archive.php#L629-L643 we archive up to 2 days into the future. That means if today is 2022-02-09, then when archiving this week we also archive 2022-02-10 and 2022-02-11. I noticed this in the archive tables:

image

image

In above screenshots we can see that the archive was created eg on 2nd Feb for a date in the future.

That means we create archives for in the future and of course these archives will have 0 data.

I assume that these archives in the future are also created for regular archives, not only dependent archives. However, I believe it works for these regular archives because they are invalidated again regularly and therefore the data is re-archived one or two days later. The logic for invalidating archives currently doesn't know which dependent archives exist with what segments. Meaning I'm assuming these get never invalidated.

I'm meaning calls like https://github.com/matomo-org/matomo/blob/4.7.1/plugins/Goals/Archiver.php#L124-L127
$this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::NEW_VISITOR_SEGMENT);. These calls also happen in non-core plugins specifically MediaAnalytics for example.

The goals of this issue is when invalidating any period (including any older period), then the dependent archives are also invalidated. Meaning if invalidating eg the "All Visits" segment, then it should also invalidate archives from dependent archives where it uses segments like media_spent_time>0 and therefore archives for the flag of donec276dccc46ea23c1bffdd046127f41d6.% needs to be invalidated too.

Maybe we need to have some event in something like https://github.com/matomo-org/matomo/blob/4.7.1/core/DataAccess/Model.php#L119 to also trigger archives for segments that are used by processDependentArchive.

@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label Feb 10, 2022
@tsteur tsteur added this to the 4.9.0 milestone Feb 10, 2022
@tsteur tsteur modified the milestones: 4.9.0, 4.10.0 Mar 2, 2022
@justinvelluppillai
Copy link
Contributor

@bx80 can you confirm whether this would be fixed by #18790 ?

@sgiehl
Copy link
Member

sgiehl commented Apr 12, 2022

@justinvelluppillai that shouldn't have anything to do with #18790

@justinvelluppillai
Copy link
Contributor

thanks @sgiehl . Will leave this open for now then.

@mattab mattab modified the milestones: Backlog (Help wanted), Impact Backlog, 4.12.5 Nov 11, 2022
@justinvelluppillai justinvelluppillai modified the milestones: 4.12.5, 4.13.2 Nov 28, 2022
@mattab mattab modified the milestones: 4.13.2, 5.1.0 Dec 5, 2022
@mattab mattab added the 5.1.0 label Jan 4, 2023
@mattab mattab removed the 5.1.0 label Aug 4, 2023
@Stan-vw
Copy link
Contributor

Stan-vw commented Sep 28, 2023

@tsteur @sgiehl The ticket is pretty hard to read so apologies if I missed something. I understand that we're archiving a number of reports for future data, hence we're wasting CPU on reports that just show zeroes. Am I correct in understanding we would save up a lot of CPU with this, or would this hardly make a difference?

Let me know if I'm missing any other value that would be added by working on this ticket.

I'm just wondering if we need to do this in 5.1.0 or whether it should simply be pushed much further, let's say 5.5.0 or such.

@sgiehl
Copy link
Member

sgiehl commented Sep 28, 2023

I would need to think that through and check the code to be able to say may. But it might possibly be already fixed with #18773. So maybe it would for now be enough to plan in some time to check if that issue is still reproducible or if it is already fixed.

@tsteur
Copy link
Member Author

tsteur commented Sep 28, 2023

The bug would need to be confirmed. If confirmed, the problem is that we're at times showing 0 data when there actually is data. And at times we may be showing outdated/inaccurate data. That applies to reports that rely on dependent data like "Goals new visitors / returning visitors data" and some media analytics metrics (eg it might fix PG-599).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Data Integrity & Accuracy
Projects
None yet
Development

No branches or pull requests

5 participants