@tsteur opened this Issue on March 1st 2020 Member

refs https://github.com/matomo-org/matomo/pull/15616

While #15616 fixes some invalidation issues when a report is viewed from the UI, this issue is about the CronArchiver itself, which invalidates reports here: https://github.com/matomo-org/matomo/blob/3.13.3/core/CronArchive.php#L986

This means we're invalidating all reports every time a site or segment is archived.

  • We potentially invalidate the same reports over and over again, every few seconds or minutes (depending on how long it takes to finish one archive).
  • Each invalidation also invalidates the tracker cache.
  • The tracker cache is then recreated by multiple tracking requests that come in at the same time.
  • On high-traffic sites, multiple tracking requests then invalidate the report again: https://github.com/matomo-org/matomo/blob/3.x-dev/core/Archive/ArchiveInvalidator.php#L117-L119
  • As soon as one archive finishes in the CronArchiver, the same thing happens again when it starts archiving the next period, segment, or site. It's even worse if multiple archivers run at the same time and do these things in parallel.

We've probably always had this issue for Matomo installations that were tracking requests for previous days. But it became more visible with recent optimisations in CronArchiver, such as not launching the archive request for sites that had no tracking request since the last archive. This resulted in the need to mark today's archives as done immediately and to invalidate them as soon as a tracking request comes in. Before this, tracking requests for today did not cause any invalidation, but now they do. A few other archiving improvements also appear to contribute to these side effects.

Not sure how we can improve things. We could call $this->invalidateArchivedReportsForSitesThatNeedToBeArchivedAgain() at most once every time_before_PERIOD_archive_considered_outdated seconds, but this might not be a proper fix.
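
To make the throttling idea concrete, here is a minimal sketch in Python rather than Matomo's PHP. The class name `ThrottledInvalidator`, its parameters, and the injected clock are hypothetical and not part of Matomo; the callback stands in for the invalidation call mentioned above, and `min_interval` for the configured number of seconds.

```python
import time
from typing import Callable, Optional


class ThrottledInvalidator:
    """Runs an invalidation callback at most once per `min_interval` seconds.

    Hypothetical helper for illustration only; not Matomo code.
    """

    def __init__(
        self,
        invalidate: Callable[[], None],
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._invalidate = invalidate
        self._min_interval = min_interval
        self._clock = clock
        self._last_run: Optional[float] = None  # None until the first run

    def maybe_invalidate(self) -> bool:
        """Invalidate if enough time has passed; return True if it ran."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval:
            return False  # too soon since the last invalidation, skip it
        self._invalidate()
        self._last_run = now
        return True
```

This sketch keeps the timestamp in process memory, so it would only throttle a single archiver process. With several archivers running in parallel, the last-run time would presumably have to live in shared storage, which is one reason this may not be a proper fix.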

Ideally, we'd simply no longer invalidate reports for today. This would make archiving slower for someone with thousands of sites where many sites have no traffic. That's quite an edge case, though, and may not justify these problems.
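
As a rough illustration of skipping today, the sketch below (Python, not Matomo's PHP) filters the current day out of a list of dates before they are handed to an invalidation step. The function name `dates_to_invalidate` and its parameters are hypothetical, and the caller is assumed to supply `today` in whatever timezone applies to the site.

```python
from datetime import date
from typing import Iterable, List


def dates_to_invalidate(candidates: Iterable[date], today: date) -> List[date]:
    """Return the candidate dates, leaving out today's date.

    Hypothetical helper for illustration only; not Matomo code.
    """
    return [d for d in candidates if d != today]
```

Under this approach, tracking requests for previous days would still trigger invalidation, while requests for the current day would not.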

@diosmosis commented on March 11th 2020 Member

This should be fixed by #15616

This Issue was closed on March 11th 2020