By default `core:archive` uses `SharedSiteIds`, which can be used by multiple workers: it holds a queue of the idSites that still need to be processed. As soon as this queue has been reset, we want existing `core:archive` commands to finish the next time `getNextSiteId` is called. This way no single `core:archive` command runs forever and potentially runs into memory issues etc.
So far we have two detections for when the queue was reset:

1. `core:archive` resets the list of all idSites and there are 50 sites in the queue now, more than this run started with (this means there must have been a reset).
2. `core:archive` is running hourly, there is only one site in Matomo (so the queue always holds only that one site), and it takes more than one hour to archive this site. What happens: archiver A initialises `SharedSiteIds` with that one idSite, then it processes the first site so the queue is empty, then another `core:archive` run might set `SharedSiteIds` to that same idSite again. It's a bit of an edge case, and it would probably take a race condition, as the next archiver should technically empty the queue as well.
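To make the two points above concrete, here is a hypothetical Python sketch (Matomo's real `SharedSiteIds` is PHP and keeps the queue in the database; the class and method names are illustrative only). The size check fires when the queue grew, while the single-site scenario shows where it stays silent:

```python
# Illustrative sketch, not Matomo's actual implementation.
class SharedSiteIds:
    def __init__(self, site_ids):
        self.queue = list(site_ids)
        self.initial_size = len(site_ids)

    def reset(self, site_ids):
        # Another core:archive run re-initialises the shared queue.
        self.queue = list(site_ids)

    def get_next_site_id(self):
        # Consuming can only shrink the queue. If it now holds more sites
        # than this run started with, another run must have reset it, so
        # return None and let this core:archive command finish.
        if len(self.queue) > self.initial_size:
            return None
        return self.queue.pop(0) if self.queue else None
```

With a single site the reset is invisible to this size check: the queue goes from one site to empty and back to one site, so the same idSite can be handed out twice in one run.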
Just to be safe, I thought it might be good to check that `getNextSiteId` doesn't return a siteId it has already returned before, because that would mean we have already archived that site.
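A hypothetical sketch of that safeguard (again illustrative Python, not Matomo's PHP API): remember every siteId already handed out in this run and stop as soon as one repeats:

```python
# Illustrative sketch of the "don't return a siteId twice" guard.
class DedupingQueue:
    def __init__(self, site_ids):
        self.queue = list(site_ids)
        self.already_returned = set()

    def reset(self, site_ids):
        # Another core:archive run re-initialises the shared queue.
        self.queue = list(site_ids)

    def get_next_site_id(self):
        if not self.queue:
            return None
        site_id = self.queue.pop(0)
        if site_id in self.already_returned:
            # We archived this site earlier in this run, so the queue must
            # have been reset: finish instead of archiving it twice.
            return None
        self.already_returned.add(site_id)
        return site_id
```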
This wouldn't work if the next site, for some reason (maybe due to multiple archivers running), isn't in the already-processed queue, correct? What about setting an option with the timestamp of when the queue is created/reset, then checking that this hasn't changed when `getNextSiteId()` is called?
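A hypothetical sketch of that timestamp suggestion (Matomo would persist the value, e.g. as an option in the database; here a counter stands in for the wall-clock timestamp so the example is deterministic, and all names are illustrative):

```python
import itertools

# Stand-in for a wall-clock timestamp, to keep the sketch deterministic.
_clock = itertools.count(1)

class TimestampedQueue:
    def __init__(self, site_ids):
        self.queue = list(site_ids)
        self.created_at = next(_clock)  # stamped when the queue is created

    def reset(self, site_ids):
        self.queue = list(site_ids)
        self.created_at = next(_clock)  # every reset bumps the timestamp

class Archiver:
    def __init__(self, shared_queue):
        self.shared_queue = shared_queue
        self.started_with = shared_queue.created_at  # remembered at startup

    def get_next_site_id(self):
        # If the stored timestamp changed, the queue was reset after this
        # archiver started: finish now, even if the queue looks identical.
        if self.shared_queue.created_at != self.started_with:
            return None
        q = self.shared_queue.queue
        return q.pop(0) if q else None
```

Unlike a size check, this would also catch the single-site case, because the comparison is against when the queue was created rather than what it contains.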
@tsteur merged after this small tweak: https://github.com/matomo-org/matomo/pull/16769/commits/0bba9602a76765b67fc269b9c7c49f34891e7553