We are using the lastRunKey https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L280-L284 to determine when an archiver was last run for day or period.
On the next archiver run we create the
date parameter when archiving based on
lastRunKey=time(). However, the problem is that in step 1 potentially not all periods were actually archived. Many could have been skipped because we are for example already archiving a different period (when we want to archive for example month, but a week period is currently being processed, than we skip month but AFAIK still return $success = true).
This could lead to randomly some periods being skipped in some edge conditions. Like it might use
&period=day&date=last2 when it was actually last finished correctly only 3 days ago and then maybe not fully process 3 days ago.
I haven't actually debugged this, but seeing the code there I very much would expect this behaviour and this was introduced by recent changes we added this/last year where we started skipping some periods for example if another period is in progress. We might skip entire segments etc.
From what I see we already return
$success= false here https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L1006-L1008 which should prevent the lastRunKey from being written, but this seems not yet implemented for long running segment archives when dynamically aborting a request shortly before it is supposed to be executed in https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L1013-L1017
This code is triggered when
A possible problem is that the more segments you have for example, the more likely it is that at some point a segment or period will be skipped, therefore
$success = false and therefore on every archive run we do not update the
lastRunKey leading eventually to things like
last200 making archiving very slow.
I reckon best be to remove all logic around
lastRunKey ideally but not sure what better logic to use and what is feasible.
This is also related to https://github.com/matomo-org/matomo/pull/14937/files#r343304194 and https://github.com/matomo-org/matomo/issues/15117
instead of detecting one
lastProcessedTime for all periods and dates, ideally we would detect the
lastProcessedTime on demand shortly before starting the archiving of a period/segment... by default we would use
last3 or so (or whatever the default is) and if any periods were invalidated since then, we would need to adjust behaviour and use better date or launch multiple archive queries. Instead of last52 we could eg use
last3 and then another query where we specifically archive the invalidated date.
I know I'm not explaining it well, it's a difficult topic. Happy to provide more insights.