Archiver should skip creating a new "temporary" archive when there was no new visit/traffic since the last "temporary archive" #13387

Closed
mattab opened this issue Sep 5, 2018 · 16 comments · Fixed by #14937

Comments

@mattab
Member

mattab commented Sep 5, 2018

Challenge: make archiving faster when there are hundreds of websites

Solution:

  • We only want to trigger archiving for sites that had at least one tracking request since the last archiving run. This only helps when there are many sites with 0 visits; it does not help when there are many sites with low traffic.
  • Applies to any archive for today / this week / this month / this year, or a custom date range that includes today.

This should probably be implemented in two places.

  • In the core archiver, while preparing to create the archive: whenever there has been no visit since the last archive timestamp, we should recognise that the last processed temporary archive is still valid and skip creating a new one.
  • In core:archive, we would also be smart and automatically skip websites, and all their segments, whenever there was no new visit. (We should skip websites and their segments only when we are sure that they were all successfully processed.)
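
The check in the first bullet can be sketched as follows. This is a minimal illustration in Python, not Matomo's PHP code; `should_rearchive`, `last_archive_ts` and `last_visit_ts` are hypothetical names chosen for the example.

```python
from datetime import datetime
from typing import Optional


def should_rearchive(last_archive_ts: Optional[datetime],
                     last_visit_ts: Optional[datetime]) -> bool:
    """Return True when a new archive is needed for a period that includes today."""
    if last_archive_ts is None:
        # Nothing has been archived yet for this period, so always archive.
        return True
    if last_visit_ts is None:
        # No visit has ever been recorded: the existing archive is still valid.
        return False
    # Re-archive only if a visit arrived after the last archive was built.
    return last_visit_ts > last_archive_ts
```

For example, with an archive built at 10:00 and the latest visit at 09:30, the function returns `False` and the existing archive would be reused.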

from #5922 (comment)

@mattab mattab added the c: Performance For when we could improve the performance / speed of Matomo. label Sep 5, 2018
@mattab mattab added this to the 3.7.0 milestone Sep 5, 2018
@mattab mattab modified the milestones: 3.7.0, 3.8.0 Oct 8, 2018
@diosmosis diosmosis self-assigned this Feb 10, 2019
@diosmosis
Member

@mattab / @tsteur:

  • in core:archive we would also be smart and automatically skip websites, and all segments, whenever there was no new visit.

Would this cause problems for plugins that force archiving even if no visits?

@mattab
Member Author

mattab commented Feb 12, 2019

  • in core:archive we would also be smart and automatically skip websites, and all segments, whenever there was no new visit.

Would this cause problems for plugins that force archiving even if no visits?

When plugins force archiving in this way, they have good reasons to do so, so their forcing would override any logic we add around skipping websites, segments, etc.

@diosmosis
Member

diosmosis commented Feb 12, 2019

@mattab I think the first bullet point actually solves that as well, doesn't it? I.e., if the archiver sends requests for segments, we'll see there are no visits and avoid archiving. Adding this logic to CronArchive.php while keeping some logic in PluginsArchiver.php for plugins that force archiving would be rather difficult, I think.

@diosmosis
Member

@mattab can you reply to my last comment?

@mattab
Member Author

mattab commented Feb 18, 2019

The first bullet point (core archiver) fixes part of the problem, which may be enough for now, but I thought there would be a lot of improvement hidden in the second bullet point (core:archive / CronArchive). For example, imagine a Matomo with 1,000 sites and 10 global segments, so 10,000 site/segment combinations. If we only do the first bullet point, we still need to send 10,000 requests * 5 periods = 50K requests, which would take a long time, possibly hours. What do you think? (Not sure if my numbers are correct.)
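
The estimate above can be checked with a few lines of arithmetic. The site, segment and period counts come from the comment; the per-request latency and the assumption that requests run one after another are hypothetical, used only to get an order of magnitude.

```python
sites = 1_000
global_segments = 10
periods = 5  # figure used in the comment above

# One archiving request per site/segment combination and period.
site_segment_combinations = sites * global_segments
requests = site_segment_combinations * periods

# Hypothetical: 0.5 s per request, processed sequentially.
seconds_per_request = 0.5
hours = requests * seconds_per_request / 3600

print(requests)         # 50000
print(round(hours, 1))  # 6.9
```

Even when every request only discovers that there is nothing to do, the total stays in the range of hours under these assumptions, which is the motivation for skipping the requests in CronArchive rather than inside each request.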

@diosmosis
Member

The numbers seem right, but since we don't know whether a plugin will force archiving, we can't really skip those requests, not without taking the forcing out of the core archiving logic and putting it in CronArchive. I think that would be non-trivial.

@diosmosis
Member

Maybe @tsteur has an idea or some thoughts?

@tsteur
Member

tsteur commented Feb 24, 2019

Maybe it could be implemented only when type != rollup? Not sure whether other plugins / website types force it...

Otherwise, ideally we would just set a flag for each site when starting or finishing archiving, like option.hadVisitsSinceLastArchiving = false, and as soon as there is a request we would set option.hadVisitsSinceLastArchiving = true. I don't think we have such a flag currently, as visits may be imported in the past etc.

Otherwise, can you mention some plugins that force archiving? Could they listen to / reuse the same event in CronArchive? We would need to look at how those plugins do it and why.
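
A rough sketch of the per-site flag idea, in Python for illustration only. The class and method names are hypothetical; Matomo would presumably store such a flag in its options table rather than in memory. The sketch clears the flag when archiving starts, one of the two choices mentioned above, so that a request arriving while archiving is running is not lost.

```python
class SiteActivityFlags:
    """Tracks, per site, whether a tracking request arrived since the last archiving run."""

    def __init__(self):
        self._had_visits = {}  # id_site -> bool

    def on_tracking_request(self, id_site):
        # Any tracking request marks the site as needing archiving.
        self._had_visits[id_site] = True

    def on_archiving_started(self, id_site):
        # Clear at start: requests arriving during the run set the flag again.
        self._had_visits[id_site] = False

    def needs_archiving(self, id_site):
        # Unknown sites default to True so that nothing is skipped by mistake.
        return self._had_visits.get(id_site, True)


flags = SiteActivityFlags()
flags.on_archiving_started(1)
print(flags.needs_archiving(1))  # False
flags.on_tracking_request(1)
print(flags.needs_archiving(1))  # True
```

A single boolean like this says nothing about which dates received data, so it would not by itself cover visits imported for past dates, the caveat raised above.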

@diosmosis
Member

@tsteur I think the event that needs to be respected is Archiving.getIdSitesToArchiveWhenNoVisits. I was able to implement the first bullet in a PR, but the second seems harder since we would want to avoid sending requests in CronArchive when there are no visits. However, looking at the event, maybe we can post it in CronArchive too...
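
One way to picture posting the event in CronArchive: the set of sites to process is the union of sites with new visits and sites that plugins force. This Python sketch is an assumption about the shape of the logic only. The event name `Archiving.getIdSitesToArchiveWhenNoVisits` comes from this thread; `sites_to_archive` and its parameters are hypothetical, and `forced_site_ids` stands in for whatever IDs the event handlers would supply.

```python
def sites_to_archive(all_site_ids, sites_with_new_visits, forced_site_ids):
    """Return the sites that still need an archiving request, in their original order."""
    # A site is kept if it had new visits OR a plugin forces archiving for it.
    keep = set(sites_with_new_visits) | set(forced_site_ids)
    return [id_site for id_site in all_site_ids if id_site in keep]


# Site 2 had visits, site 4 is forced by a plugin, sites 1 and 3 are skipped.
print(sites_to_archive([1, 2, 3, 4], [2], [4]))  # [2, 4]
```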

@tsteur
Member

tsteur commented Feb 24, 2019

Posting it again may help 👍 Not sure what we added this event for, though.

@diosmosis
Member

I don't think I was here for it, but I thought it was for something like SEO or importing stats from somewhere else, where we'd want to archive but not depend on visits in Matomo.

@tsteur
Member

tsteur commented Feb 25, 2019

No idea, I can't see it in any of our plugins... also can't find it in an issue or changelog

@diosmosis
Member

@mattab / @tsteur I implemented the change in the code and now think it might be a pointless addition. It looks like the check is already in CronArchive.php: https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L1261-L1292

This code checks whether there have been visits since the last successful archiving time, which is pretty close to the time of the latest archive for today. I think the only case where it saves time is if archiving fails but some archives were still created. What do you think, still useful? You can see my changes here: 23b5a18

@mattab mattab modified the milestones: 3.9.0, 3.10.0 Mar 18, 2019
@mattab mattab modified the milestones: 3.10.0, 3.11.0 May 28, 2019
@mattab mattab modified the milestones: 3.11.0, 3.12.0 Jul 23, 2019
@diosmosis diosmosis removed their assignment Sep 30, 2019
@diosmosis
Member

Refs #14639

@diosmosis
Member

@tsteur / @mattab now that temporary archives no longer exist, this issue is about changing CronArchive to skip archiving if there has been no visit since the oldest archive for that date (unless forced through a CLI option). Is that correct?
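
The revised scope could look roughly like this. It is a Python sketch under stated assumptions: `skip_archiving`, `archive_timestamps` and `force` are hypothetical names, and the thread does not say what the CLI option is called.

```python
from datetime import datetime


def skip_archiving(archive_timestamps, last_visit_ts, force=False):
    """Return True when archiving for a date can be skipped."""
    if force:
        # Forced through the (unnamed) CLI option: never skip.
        return False
    if not archive_timestamps:
        # No archive exists yet for that date, so it has to be built.
        return False
    if last_visit_ts is None:
        # No visit at all: the existing archives are still valid.
        return True
    # Compare against the OLDEST archive for the date, as described above.
    return last_visit_ts <= min(archive_timestamps)
```

Comparing against the oldest archive is the conservative choice: a visit that arrived after any of the existing archives for the date was built triggers a new run.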

@tsteur
Member

tsteur commented Oct 1, 2019

yes 👍

@diosmosis diosmosis self-assigned this Oct 2, 2019
@mattab mattab modified the milestones: 3.12.0, 3.13.0 Oct 22, 2019
@mattab mattab modified the milestones: 3.13.0, 3.13.1 Nov 27, 2019