Make sure core:archive terminates when all initial websites have been processed #12716
Conversation
    if ($this->isMaintenanceModeEnabled()) {
        $this->logger->info("Archiving will stop now because maintenance mode is enabled");
        return;
    }

    $idSite = $this->websites->getNextSiteId();
    $numWebsitesArchived++;
Please note this does not 100% fix it. For example, say there are 50 sites in the initial list that are left to be archived. Now two archivers are active at the same time and each archives 25 sites. This means $numWebsitesArchived = 25, $numWebsitesScheduled = 50, and even though the sharedArchiveSiteId has been finished and will be reset, we will still run for 25 more sites than we should. Not sure how to avoid this. Was thinking of setting

    $numWebsitesScheduled = min($numWebsitesScheduled, count($this->websites->getAllSiteIdsToArchive()))

but then we would stop processing too early.
I suppose something like this would help:

    $numWebsitesLeft = count($this->websites->getAllSiteIdsToArchive()) + 1; // +1 for the current, already removed site
    if ($numWebsitesLeft < $numWebsitesScheduled) {
        $numWebsitesScheduled = $numWebsitesLeft;
        $numWebsitesArchived = 1;
    }

That way, if the number of remaining sites was reduced and not reset, it may "auto-correct" itself. But not 100% sure if that works?
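To make the proposal above concrete, here is a minimal Python simulation (mine, not Matomo code) of one worker applying that correction. It assumes, for simplicity, that a second archiver removes one site per iteration and that a fresh 50-site batch is scheduled as soon as the shared queue empties; the names are illustrative only.

```python
from collections import deque

def simulate(num_scheduled, apply_fix):
    """One worker archives from a shared 50-site queue while a simulated
    concurrent worker also removes one site per step. When the queue
    empties, one fresh 50-site batch is scheduled."""
    queue = deque(range(1, 51))
    total_archived = 0
    num_archived = 0
    refilled = False
    while num_archived < num_scheduled:
        if not queue:                        # batch finished -> next batch starts
            if refilled:
                break
            queue = deque(range(51, 101))
            refilled = True
        queue.popleft()                      # this worker archives one site
        num_archived += 1
        total_archived += 1
        if queue:
            queue.popleft()                  # the concurrent worker archives one too
        if apply_fix:
            # Proposed correction: shrink the schedule if the queue drained
            # faster than this worker's own archive counter.
            num_left = len(queue) + 1        # +1 for the site just removed
            if num_left < num_scheduled:
                num_scheduled = num_left
                num_archived = 1
    return total_archived
```

Under these assumptions, without the correction the worker runs 25 sites into the next batch (50 total); with it, the worker stops at the batch boundary after 25.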
Thought about this, and I'm not sure it works... If more sites get added to the queue, then $numWebsitesScheduled = $numWebsitesLeft; might make it process more sites, right? E.g., as in your example with two archivers processing 50 sites: when both are at 3 left and 25 more get added, numWebsitesLeft will become 28 instead of 25. But it'll start to go down on the next iteration, I guess. Hmm. I don't know if this will work or if it will fail :)
Oh, here's another idea that may or may not work: what if there was a special 'null' entry in the queue that would signal that this batch is done and this core:archive should stop? I guess that would also stop all future jobs too...
Or, for yet another random idea: what if the SharedSiteIds option had the current hour in the option name, so that once the core:archive instances finish that specific queue, they quit? I guess if one job ran for 5 hours because of an idsite in that queue, the rest of those sites would end up waiting. Maybe if a core:archive finishes one queue but there are still queues for previous hours, it could start working on those. Eventually those processes will stop.
Idk, just throwing out ideas.
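The 'null'-entry idea above can be sketched in a few lines of Python (a hypothetical illustration, not anything in this PR; `BATCH_DONE` and `archive_until_sentinel` are names I made up):

```python
from collections import deque

BATCH_DONE = None  # hypothetical sentinel marking the end of a batch

def archive_until_sentinel(queue):
    """Process site IDs from the shared queue until the batch-done
    sentinel is reached, then stop this worker."""
    processed = []
    while queue:
        site = queue.popleft()
        if site is BATCH_DONE:
            break                  # this batch is finished; stop here
        processed.append(site)
    return processed

queue = deque([1, 2, 3, BATCH_DONE, 4, 5])  # 4 and 5 belong to a later batch
first_worker = archive_until_sentinel(queue)
```

The drawback discussed above shows up directly: any worker that hits the sentinel quits, even if sites were appended for it afterwards, so later jobs could be cut short too.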
$numWebsitesScheduled = $numWebsitesLeft; might make it process more sites, right?

Nah, it shouldn't, because it would only be updated when there are fewer sites left, unless I'm not seeing it right.

Oh, here's another idea that may or may not work: what if there was a special 'null' entry in the queue that would signal that this batch is done and this core:archive should stop? I guess that would also stop all future jobs too...

Yep, exactly.

Or, for yet another random idea: what if the SharedSiteIds option had the current hour in the option name, so that once the core:archive instances finish that specific queue, they quit?

That's not quite working either, because you might run an archive command not just once per hour but even more often. And sometimes the first sites take very long, in which case we need to make sure another job an hour later starts working on the next idsite.
This is currently not a huge problem, and we wouldn't even need to add another fix for it at all. But it would be good to let core:archive tasks finish sooner rather than later.
Nah, it shouldn't, because it would only be updated when there are fewer sites left, unless I'm not seeing it right.

It would be updated when there are fewer than the original (i.e., 50), but not fewer than the current remaining jobs at the end (i.e., 25). Although it happens on every iteration, right? That could get weird if jobs keep getting added... I don't think I can predict what would happen then :)
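Working the numbers from the earlier example through one iteration of the proposed correction (a sketch with illustrative Python names, assuming the worker's schedule was already corrected down to 25) shows why newly added sites do not bump the schedule back up:

```python
from collections import deque

queue = deque([48, 49, 50])   # 3 sites left from the original 50-site batch
num_scheduled = 25            # already corrected down by earlier iterations
num_archived = 1

queue.extend(range(100, 125))  # 25 new sites get added to the shared queue

queue.popleft()                # archive one more site
num_archived += 1
num_left = len(queue) + 1      # 27 remaining + the one just removed = 28

# 28 is not < 25, so the correction leaves num_scheduled untouched:
schedule_updated = num_left < num_scheduled
```

So the schedule only ever shrinks; the new sites are left for whichever job is scheduled next, matching the "only updated when there are fewer sites left" reading above.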
FYI @mattab @diosmosis I added another patch to this PR to detect this, and I implemented the patch in SharedSiteIds to additionally detect when the shared site IDs have been processed.
… processed (matomo-org#12716)
* Make sure core:archive terminates when all initial websites have been processed
* Update CronArchive.php
* better detection of when the initial queue has been processed