@tsteur opened this Issue on November 6th 2019 Member

We are using the lastRunKey https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L280-L284 to determine when an archiver was last run for day or period.

We basically:

  1. $success = trigger archiving for day/periods
  2. if ($success) { $lastRunKey=time(); }

On the next archiver run we create the date parameter when archiving based on lastRunKey=time(). However, the problem is that in step 1 potentially not all periods were actually archived. Many could have been skipped because we are for example already archiving a different period (when we want to archive for example month, but a week period is currently being processed, than we skip month but AFAIK still return $success = true).

This could lead to randomly some periods being skipped in some edge conditions. Like it might use &period=day&date=last2 when it was actually last finished correctly only 3 days ago and then maybe not fully process 3 days ago.

I haven't actually debugged this, but seeing the code there I very much would expect this behaviour and this was introduced by recent changes we added this/last year where we started skipping some periods for example if another period is in progress. We might skip entire segments etc.

From what I see we already return $success= false here https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L1006-L1008 which should prevent the lastRunKey from being written, but this seems not yet implemented for long running segment archives when dynamically aborting a request shortly before it is supposed to be executed in https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L1013-L1017

This code is triggered when

  • Say you have 20 segments
  • Each segment takes 1 hour to archive
  • We archive max 1 segment at once for example (or 2 or 3 ...)
  • This means we execute segment 1
  • 1 hour later we want to launch the next segment archive
  • We need to check if some other period is in progress/segment and shortly before we launch that segment abort that request... currently we would still return $success=true AFAIK when it should be $success=false

A possible problem is that the more segments you have for example, the more likely it is that at some point a segment or period will be skipped, therefore $success = false and therefore on every archive run we do not update the lastRunKey leading eventually to things like last100 or last200 making archiving very slow.

I reckon best be to remove all logic around lastRunKey ideally but not sure what better logic to use and what is feasible.

This is also related to https://github.com/matomo-org/matomo/pull/14937/files#r343304194 and https://github.com/matomo-org/matomo/issues/15117

instead of detecting one lastProcessedTime for all periods and dates, ideally we would detect the lastProcessedTime on demand shortly before starting the archiving of a period/segment... by default we would use last3 or so (or whatever the default is) and if any periods were invalidated since then, we would need to adjust behaviour and use better date or launch multiple archive queries. Instead of last52 we could eg use last3 and then another query where we specifically archive the invalidated date.

I know I'm not explaining it well, it's a difficult topic. Happy to provide more insights.

Powered by GitHub Issue Mirror