@mattab opened this Issue on September 5th 2018 Member

Challenge: make archiving faster when there are hundreds of websites

Solution:

  • We only want to trigger archiving for sites that had at least one tracking request since the last archiving run. This only helps when there are many sites with 0 visits; it does not help when many sites have low traffic.
  • Applies to any archive for today / this week / this month / this year, or a custom date range that includes today.

This should probably be implemented in two places.

  • in core archiver, while preparing to create the archive, whenever there is no visit since the last archive timestamp, we should realise the last processed temporary archive is still valid, and skip creating a new one
  • in core:archive we would also be smart and automatically skip websites, and all segments, whenever there was no new visit. (we would need to skip websites and all segments only when we are sure that they were all successfully processed.)
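
The skip decision described in the two bullets above could be sketched roughly as follows. This is only an illustration in Python, not Matomo code: `SiteState`, its fields, and `can_skip_archiving` are hypothetical names, and the `plugin_forces_archiving` flag is an assumption about how a forced archive might be signalled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteState:
    # All fields are hypothetical stand-ins, not Matomo's actual schema.
    last_visit_ts: Optional[int]    # time of the most recent tracking request, None if never tracked
    last_archive_ts: Optional[int]  # time the last temporary archive finished, None if none exists
    last_run_complete: bool         # True only if the site and all its segments were archived successfully


def can_skip_archiving(site: SiteState, plugin_forces_archiving: bool = False) -> bool:
    """Return True when the last temporary archive can be treated as still valid."""
    if plugin_forces_archiving:
        # Assumption: a plugin that forces archiving always wins over skipping.
        return False
    if site.last_archive_ts is None or not site.last_run_complete:
        # Nothing valid to reuse, or we are not sure everything was processed.
        return False
    if site.last_visit_ts is None:
        # No tracking request has ever been recorded for this site.
        return True
    # Strict comparison: a visit in the same second as the archive is re-archived to be safe.
    return site.last_visit_ts < site.last_archive_ts
```

The check is deliberately conservative: any doubt about the previous run, or any visit at or after the last archive timestamp, results in archiving again.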

from https://github.com/matomo-org/matomo/issues/5922#issuecomment-90352741

@diosmosis commented on February 12th 2019 Member

@mattab / @tsteur:

  • in core:archive we would also be smart and automatically skip websites, and all segments, whenever there was no new visit.

Would this cause problems for plugins that force archiving even if no visits?

@mattab commented on February 12th 2019 Member

  • in core:archive we would also be smart and automatically skip websites, and all segments, whenever there was no new visit.

"Would this cause problems for plugins that force archiving even if no visits?"

When plugins force archiving in this way, they would have good reasons to do so, so their forcing would override any logic we add around skipping websites/segments/etc.

@diosmosis commented on February 12th 2019 Member

@mattab I think the first bullet point actually solves that as well, doesn't it? I.e., if the archiver sends requests for segments, we'll see there are no visits and avoid archiving. Adding this logic to CronArchive.php while keeping some logic in PluginsArchiver.php for plugins that force archiving would be rather difficult, I think.

@diosmosis commented on February 18th 2019 Member

@mattab can you reply to my last comment?

@mattab commented on February 18th 2019 Member

The first bullet point (core archiver) fixes part of the problem, which may be enough for now, but I thought there would be a lot of improvements hidden in the second bullet point (core:archive / CronArchive). For example, imagine a Matomo with 1,000 sites and 10 global segments, so 10,000 segments. If we only do the first bullet point, we still need to send 10,000 requests * 5 periods = 50K requests, which would take a long time, possibly hours? What do you think (not sure if my numbers are correct)?
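
To make the arithmetic above concrete, here is a small Python sketch. The function name and parameters are made up for illustration, and the figure of 50 sites with new visits is a purely hypothetical example, not a measurement. Like the estimate in the comment, it counts only segment requests.

```python
from typing import Optional


def archive_request_count(sites: int, segments_per_site: int, periods: int,
                          sites_with_new_visits: Optional[int] = None) -> int:
    """Number of segment archiving requests core:archive would send.

    If sites_with_new_visits is given, sites without new visits are assumed
    to be skipped entirely (the second bullet point of the proposal).
    """
    active_sites = sites if sites_with_new_visits is None else sites_with_new_visits
    return active_sites * segments_per_site * periods


# Numbers from the comment: 1,000 sites, 10 global segments, 5 periods.
print(archive_request_count(1000, 10, 5))  # 50000

# Hypothetical: only 50 of the 1,000 sites had a visit since the last run.
print(archive_request_count(1000, 10, 5, sites_with_new_visits=50))  # 2500
```

With only the first bullet point implemented, all 50,000 requests are still sent and each one has to discover on its own that there is nothing to do.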

@diosmosis commented on February 18th 2019 Member

The numbers seem right, but since we don't know if a plugin will force archiving, we can't really skip those requests... not w/o taking the forcing outside of the core archiving logic and putting it in CronArchive. I think that would be rather non-trivial.
