Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide more documentation on some archiving behaviour #16955

Closed
tsteur opened this issue Dec 15, 2020 · 11 comments
Closed

Provide more documentation on some archiving behaviour #16955

tsteur opened this issue Dec 15, 2020 · 11 comments
Assignees
Labels
c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Milestone

Comments

@tsteur
Copy link
Member

tsteur commented Dec 15, 2020

@diosmosis just looking at some DB performance insights where a lot of new options are showing up in the top queries.

Then looked where they are used and wasn't really understanding what they do and whether they even need to be cached in an option.

Might be good to see if we can make some tweaks or remove some code and also maybe add some inline comments explaining what happens as I tried to understand it and wasn't sure what's happening there.

@tsteur tsteur added the c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. label Dec 15, 2020
@tsteur tsteur added this to the 4.0.x milestone Dec 15, 2020
@diosmosis diosmosis self-assigned this Dec 15, 2020
@diosmosis
Copy link
Member

@diosmosis
Copy link
Member

Also maybe the transient cache here is not needed as maybe the Option class itself caches it already?

It is, but this is done for the entire process and core:archive runs for a long time. The cache entry has a ttl if you look at where it's set: https://github.com/matomo-org/matomo/blob/4.x-dev/core/CronArchive.php#L1042

I'm not entirely sure where to document this.

@diosmosis
Copy link
Member

Seeing also this min visit time for site option in https://github.com/matomo-org/matomo/blob/4.x-dev/core/ArchiveProcessor/Loader.php#L427-L442 . Not sure why this one is needed in https://github.com/matomo-org/matomo/blob/4.x-dev/core/ArchiveProcessor/Loader.php#L413-L422 and how that logic works?

Like we could just query the $this->rawLogDao->hasSiteVisitsBetweenTimeframe directly to see if there's been any visit during the archive date range?

This was added a while ago, it was part of an optimization, but I can't remember the exact reason for it... The time it takes to run ArchiveCronTest doesn't change w/ this code there/not-there, so I'll remove it.

@diosmosis
Copy link
Member

@tsteur noticing now the last invalidation time needs to be per site here... or perhaps we need to use the last core archive finish time now (or start time?), since for a global segment we'd need to invalidate for every site, but that wouldn't happen if we update the last invalidation time like this... actually it might if we do it per site...

@tsteur
Copy link
Member Author

tsteur commented Dec 18, 2020

@diosmosis not sure I understand https://github.com/matomo-org/developer-documentation/pull/402/files#diff-89a35a798a82b03c6fd7dbc734fb0494f586e285c2cb55e3785d50b350061d21R138-R139

wouldn't we simply invalidate back a few months (depending on config) since segment creation / update date?

Last invalidation time would be usually today or yesterday?

And in the code in https://github.com/matomo-org/matomo/blob/4.x-dev/core/CronArchive/SegmentArchiving.php#L157-L171 it seems to be responsible to simply skip invalidating when eg we last archived today and segment was last edited yesterday.

I wonder if this could even cause issues when we skip archiving data for segments for today and meanwhile lastInvalidationTime is today and then it would skip it.

wouldn't we still always want to check the processNewSegmentsFrom setting in https://github.com/matomo-org/matomo/blob/4.x-dev/core/CronArchive/SegmentArchiving.php#L172-L215 ?

Sorry I'm still not quite understanding this one.

@diosmosis
Copy link
Member

wouldn't we simply invalidate back a few months (depending on config) since segment creation / update date?

We need be able to detect when the segment is new/updated, and of course we don't want to end up invalidating in the past every time we encounter the segment.

IIRC this is what we did before the refactor.

I wonder if this could even cause issues when we skip archiving data for segments for today and meanwhile lastInvalidationTime is today and then it would skip it.

This could be a problem. As said above, we need to do the invalidation once per create/update, and we need to know when we've done it. Currently it's "has the segment been created/changed after the last time we invalidated? if yes, then we haven't invalidated in the past for it and need to".

wouldn't we still always want to check the processNewSegmentsFrom setting in https://github.com/matomo-org/matomo/blob/4.x-dev/core/CronArchive/SegmentArchiving.php#L172-L215 ?

This is still checked? This is how far back we go, it's unrelated to checking whether we've done it before.

@tsteur
Copy link
Member Author

tsteur commented Dec 18, 2020

This could be a problem. As said above, we need to do the invalidation once per create/update, and we need to know when we've done it. Currently it's "has the segment been created/changed after the last time we invalidated? if yes, then we haven't invalidated in the past for it and need to".

Now I get it 👍 can we maybe similar to custom reports etc invalidate when segment is created or updated? Maybe this way we can get rid of all this and it would always work independent of any race condition and what segments have been invalidated etc

@diosmosis
Copy link
Member

Now I get it 👍 can we maybe similar to custom reports etc invalidate when segment is created or updated?

This is a MUCH better idea. I'll make a PR.

@tsteur
Copy link
Member Author

tsteur commented Dec 18, 2020

👍

@mattab mattab modified the milestones: 4.0.x, 4.1.0, 4.2.0 Dec 21, 2020
@mattab mattab modified the milestones: 4.2.0, 4.3.0 Feb 22, 2021
@mattab mattab modified the milestones: 4.3.0, 4.4.0 May 26, 2021
@mattab mattab modified the milestones: 4.4.0, 4.5.0 Jul 28, 2021
@tsteur
Copy link
Member Author

tsteur commented Aug 20, 2021

@diosmosis I reckon this one we can close?

@diosmosis
Copy link
Member

Yes, will close. Don't think I wrote everything down, but it should be enough to get someone started on the system.

@justinvelluppillai justinvelluppillai added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

No branches or pull requests

4 participants