New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core:archive Better detection for end of sharedSiteId queue #16769
Conversation
This wouldn't work if the next site, for some reason (maybe due to multiple archivers running), isn't in the already processed queue, correct? What about setting an option w/ the timestamp when the queue is created/reset, then checking that this hasn't changed getNextSiteId() is called? |
@diosmosis changed the PR 👍 |
$this->currentSiteId = null; | ||
$this->done = true; | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe log here? Maybe a debug log or an info, since it seems like it might be useful to have in CLI output...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it's needed as we would simply stop once iterated over all sites? Not sure how to add easily...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean a quick log, like, The shared site ID queue was reset, stopping.
(or whatever actually happens if not stopping). Might be useful when looking through the logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis added a debug log message. It should usually not be that important since it's a regular/normal thing maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to use LoggerInterface instead, but otherwise looks good
|
||
public function setQueueWasReset() | ||
{ | ||
Option::set($this->optionName . self::KEY_TIMESTAMP, floor(microtime(true) * 1000)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an issue, but curious why microtime is used instead of just time()? Something to do with server configurations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because time would cause maybe concurrency issues not sure. It's a common practice that some users launch multiple archivers at the same second so just wanted to have it more precise just to be safe. Probably wouldn't cause an issue to use time()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, left a couple minor comments
Description:
By default core:archive uses
SharedSiteIds
which is used by multiple workers. For example it has a queue of several idSites that need to be processed. As soon as this queue has been reset we want existingcore:archive
commands to finish next timegetNextSiteId
is called. This is to not have one core:archive command running forever potentially running into memory issues etc.So far we have two detections when queue was reset
core:archive
resets the list of all idSites and there are50
sites in the queue now. This means there must have been a reset)core:archive
is running hourly, and there is only one site in Matomo (so the queue has always only one site) and it takes > 1 hour to archive this site. What happens archiver A inits sharedSiteIds withidSites=[1]
then it processed first siteso queue is []
, then othercore:archive
might set it tosharedSiteIds=[1]
again. It's bit edge case and there probably would need to be some race conditions as the next archiver should technically empty the queue as well.Just to be safe thought it might be good to check that
getNextSiteId
doesn't return a siteId that it already returned before. Because this would mean we have already archived that site.Review