By default `core:archive` uses `SharedSiteIds`, which is shared by multiple workers. It holds a queue of the idSites that need to be processed. As soon as this queue has been reset, we want existing `core:archive` commands to finish the next time `getNextSiteId` is called. This is so that no single `core:archive` command runs forever, potentially running into memory issues etc.

So far we have two detections for when the queue was reset:

1. Another `core:archive` resets the list of all idSites and there are 50 sites in the queue now, more than were left before. This means there must have been a reset.
2. `core:archive` is running hourly, there is only one site in Matomo (so the queue always has only one site), and it takes > 1 hour to archive this site. What happens: archiver A inits `SharedSiteIds` with `idSites=[1]`, then it processes the first site so the queue is `[]`, then another `core:archive` might set it to `sharedSiteIds=[1]` again. It's a bit of an edge case and there would probably need to be some race condition involved, as the next archiver should technically empty the queue as well.

Just to be safe, I thought it might be good to check that `getNextSiteId` doesn't return a siteId that it already returned before, because that would mean we have already archived that site.
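The "don't return a siteId twice" check could be sketched roughly like this. This is Python pseudocode, not Matomo's actual PHP implementation; the class and method names only mirror the discussion:

```python
# Sketch: a shared site-ID queue where getNextSiteId() stops a worker once
# it sees a siteId it has already returned, which implies the queue was
# reset by another core:archive run (hypothetical, illustrative names).

class SharedSiteIds:
    def __init__(self, site_ids):
        self.queue = list(site_ids)  # shared queue of idSites to archive
        self.processed = set()       # idSites this worker already returned

    def get_next_site_id(self):
        """Return the next idSite, or None when this worker should finish."""
        if not self.queue:
            return None
        site_id = self.queue.pop(0)
        if site_id in self.processed:
            # We already archived this site, so the queue must have been
            # reset by another core:archive run: stop this worker.
            return None
        self.processed.add(site_id)
        return site_id
```

In the one-site scenario above: worker A gets siteId 1, another run refills the queue with `[1]`, and A's next call returns `None` instead of archiving the same site again.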
This wouldn't work if the next site, for some reason (maybe due to multiple archivers running), isn't in the already-processed list, correct? What about setting an option w/ the timestamp when the queue is created/reset, then checking that this hasn't changed when `getNextSiteId()` is called?
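The timestamp idea could be sketched roughly like this. Again Python pseudocode, not Matomo's actual PHP; `OptionStore` and the option key names are made up for illustration:

```python
import time

class OptionStore:
    """Stand-in for Matomo's shared option table (key/value storage)."""
    def __init__(self):
        self.values = {}
    def set(self, key, value):
        self.values[key] = value
    def get(self, key):
        return self.values.get(key)

class SharedSiteIds:
    # Hypothetical option keys, for illustration only.
    QUEUE_KEY = 'SharedSiteIdsToArchive'
    RESET_KEY = 'SharedSiteIdsQueueResetTime'

    def __init__(self, options, site_ids):
        self.options = options
        self.options.set(self.QUEUE_KEY, list(site_ids))
        # Record when this queue was created/reset.
        self.options.set(self.RESET_KEY, time.monotonic_ns())
        # Remember the timestamp this worker started with.
        self.started_with = self.options.get(self.RESET_KEY)

    def get_next_site_id(self):
        if self.options.get(self.RESET_KEY) != self.started_with:
            # Another core:archive run reset the queue: stop this worker.
            return None
        queue = self.options.get(self.QUEUE_KEY)
        if not queue:
            return None
        return queue.pop(0)
```

A fresh timestamp on every reset means a worker notices the reset on its very next `getNextSiteId()` call even when the queue contents look identical, which covers the one-site edge case without tracking already-processed siteIds.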
@diosmosis changed the PR 👍
@tsteur merged after this small tweak: https://github.com/matomo-org/matomo/pull/16769/commits/0bba9602a76765b67fc269b9c7c49f34891e7553