
Make sure core:archive terminates when all initial websites have been processed #12716

Merged
merged 3 commits into 3.x-dev from cronarchiveinitialwebsites on Apr 23, 2018

Conversation

@tsteur (Member) commented Apr 11, 2018

No description provided.

@tsteur tsteur added this to the 3.5.0 milestone Apr 11, 2018
if ($this->isMaintenanceModeEnabled()) {
    $this->logger->info("Archiving will stop now because maintenance mode is enabled");
    return;
}

$idSite = $this->websites->getNextSiteId();
$numWebsitesArchived++;
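For readers without the surrounding diff: a minimal, self-contained sketch of the kind of shared-queue worker loop this snippet belongs to (the function name and the plain-array queue are hypothetical; the real code uses CronArchive and SharedSiteIds):

```php
<?php
// Hypothetical sketch: each core:archive process pulls the next site id
// from a shared queue. The question in this PR is when such a loop should
// terminate, given that other archivers may drain the same queue.
function runArchiveLoop(array &$sharedQueue, $numWebsitesScheduled)
{
    $numWebsitesArchived = 0;
    while ($numWebsitesArchived < $numWebsitesScheduled) {
        $idSite = array_shift($sharedQueue); // stands in for getNextSiteId()
        if ($idSite === null) {
            break; // queue drained, possibly by another archiver
        }
        $numWebsitesArchived++;
    }
    return $numWebsitesArchived;
}

// Example: 3 sites left in the queue, but 5 were scheduled for this process.
$queue = [1, 2, 3];
echo runArchiveLoop($queue, 5); // 3 — stops when the queue is empty
```

The sketch only illustrates the counters being discussed below; the real loop also archives each site and handles maintenance mode as shown in the diff above.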
@tsteur (Member Author) commented Apr 11, 2018
Please note this does not fix it 100%. For example, say there are 50 sites left to be archived in the initial list, and two archivers are active at the same time, each archiving 25 sites. This means $numWebsitesArchived = 25 and $numWebsitesScheduled = 50, and even though the sharedArchiveSiteId has been finished and will be reset, each process will still run for 25 more sites than it should. Not sure how to avoid this. Was thinking of setting

    $numWebsitesScheduled = min($numWebsitesScheduled, count($this->websites->getAllSiteIdsToArchive()));

but then we would stop processing too early.

I suppose what would help is something like:

    $numWebsitesLeft = count($this->websites->getAllSiteIdsToArchive()) + 1; // +1 for the current, already removed site
    if ($numWebsitesLeft < $numWebsitesScheduled) {
        $numWebsitesScheduled = $numWebsitesLeft;
        $numWebsitesArchived = 1;
    }

That way, if the number of remaining sites was reduced and not reset, it may "auto-correct" itself, but I'm not 100% sure that works.
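As a standalone sketch of the auto-correction idea above (the helper function is hypothetical; in the actual code these counters live inside CronArchive):

```php
<?php
// Hypothetical sketch of the proposed auto-correction: if the shared queue
// holds fewer sites than this process still expects to archive, shrink the
// scheduled count so the process terminates sooner.
// $numWebsitesLeft counts the remaining queue plus the current site (+1).
function correctScheduledCount($numWebsitesLeft, $numWebsitesScheduled, $numWebsitesArchived)
{
    if ($numWebsitesLeft < $numWebsitesScheduled) {
        // Another archiver drained part of the queue; adopt the smaller number.
        $numWebsitesScheduled = $numWebsitesLeft;
        $numWebsitesArchived  = 1;
    }
    return [$numWebsitesScheduled, $numWebsitesArchived];
}

// Example from the comment: 50 scheduled, but a second archiver took half,
// so only 25 remain (24 queued + 1 currently being processed).
list($scheduled, $archived) = correctScheduledCount(25, 50, 26);
// $scheduled is now 25 and $archived is reset to 1, so the loop
// condition ($archived < $scheduled) ends after the remaining sites.
```

As the replies below point out, this heuristic can misbehave if new sites are appended to the shared queue while archivers are running.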

@diosmosis (Member) commented

Thought about this, and I'm not sure it works... If more sites get added to the queue, then $numWebsitesScheduled = $numWebsitesLeft; might make it process more sites, right? E.g., if it's like your example with two archivers processing 50 sites, and when both are at 3 left, 25 more get added, numWebsitesLeft will become 28 instead of 25. But it'll start to go down on the next iteration, I guess. Hmm. I don't know if this will work or if it will fail :)

Oh, here's another idea that may or may not work: what if there was a special 'null' entry in the queue that would signal that this batch was done and this core:archive should stop? I guess that would also stop all future jobs too...

Or, for yet another random idea: what if the SharedSiteIds option had the current hour in the option name, so that once the core:archive instances finish that specific queue, they quit? I guess if one job ran for 5 hours because of an idSite in that queue, the rest of those sites would end up waiting. Maybe if a core:archive finishes one queue but there are still queues from previous hours, it can start working on those. Eventually those processes will stop.

Idk, just throwing out ideas.

@tsteur (Member Author) commented

> $numWebsitesScheduled = $numWebsitesLeft; might make it process more sites, right?

Nah, it shouldn't, because it would only be updated when there are fewer sites left, unless I'm not seeing it right.

> Oh, here's another idea that may or may not work, what if there was a special 'null' entry in the queue that would signal that this batch was done and this core:archive should stop? I guess that would also stop all future jobs too...

Yep, exactly.

> Or, for yet another random idea, what if the SharedSiteIds option had the current hour in the option name, so that once the core:archive instances finish that specific queue, they quit?

That doesn't quite work either, because you might run the archive command more often than once per hour. And sometimes the first sites take very long, in which case we need to make sure another job an hour later starts working on the next idSite.

This is currently not a huge problem, and we wouldn't even need to add another fix for it at all. But it would be good to let core:archive tasks finish sooner rather than later.

@diosmosis (Member) commented Apr 12, 2018

> Nah, it shouldn't, because it would only be updated when there are fewer sites left, unless I'm not seeing it right.

It would be updated when there are fewer than the original (i.e. 50), but not fewer than the current remaining jobs at the end (i.e. 25). Although it happens on every iteration, right? That could get weird if jobs keep getting added... I don't think I can predict what would happen then :)

@tsteur (Member Author) commented

FYI @mattab @diosmosis: I added another patch to this PR to detect this, and on top of that I implemented the patch in SharedSiteIds to detect when the shared site IDs have been processed.

@mattab mattab merged commit 9bed295 into 3.x-dev Apr 23, 2018
@sgiehl sgiehl deleted the cronarchiveinitialwebsites branch April 23, 2018 10:08
InfinityVoid pushed a commit to InfinityVoid/matomo that referenced this pull request Oct 11, 2018
… processed (matomo-org#12716)

* Make sure core:archive terminates when all initial websites have been processed

* Update CronArchive.php

* better detection of when the initial queue has been processed

3 participants