Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core:archive Better detection for end of sharedSiteId queue #16769

Merged
merged 8 commits into from Dec 14, 2020

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Nov 22, 2020

Description:

By default core:archive uses SharedSiteIds which is used by multiple workers. For example it has a queue of several idSites that need to be processed. As soon as this queue has been reset we want existing core:archive commands to finish next time getNextSiteId is called. This is to not have one core:archive command running forever potentially running into memory issues etc.

So far we have two detections when queue was reset

  • When the queue is empty https://github.com/matomo-org/matomo/blob/4.0.0-rc6/core/CronArchive/SharedSiteIds.php#L171-L174
  • When suddenly more sites are in the queue than before https://github.com/matomo-org/matomo/blob/4.0.0-rc6/core/CronArchive/SharedSiteIds.php#L176-L180 (this works eg when there are only 1 sites in the queue, then a different core:archive resets the list of all idSites and there are 50 sites in the queue now. This means there must have been a reset)
  • Above check does not work though when eg core:archive is running hourly, and there is only one site in Matomo (so the queue has always only one site) and it takes > 1 hour to archive this site. What happens archiver A inits sharedSiteIds with idSites=[1] then it processed first site so queue is [], then other core:archive might set it to sharedSiteIds=[1] again. It's bit edge case and there probably would need to be some race conditions as the next archiver should technically empty the queue as well.

Just to be safe thought it might be good to check that getNextSiteId doesn't return a siteId that it already returned before. Because this would mean we have already archived that site.

Review

  • Functional review done
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed

@tsteur tsteur added the Needs Review PRs that need a code review label Nov 22, 2020
@tsteur tsteur added this to the 4.1.0 milestone Nov 22, 2020
@diosmosis
Copy link
Member

This wouldn't work if the next site, for some reason (maybe due to multiple archivers running), isn't in the already processed queue, correct? What about setting an option w/ the timestamp when the queue is created/reset, then checking that this hasn't changed getNextSiteId() is called?

@tsteur
Copy link
Member Author

tsteur commented Nov 23, 2020

@diosmosis changed the PR 👍

$this->currentSiteId = null;
$this->done = true;
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe log here? Maybe a debug log or an info, since it seems like it might be useful to have in CLI output...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's needed as we would simply stop once iterated over all sites? Not sure how to add easily...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean a quick log, like, The shared site ID queue was reset, stopping. (or whatever actually happens if not stopping). Might be useful when looking through the logs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis added a debug log message. It should usually not be that important since it's a regular/normal thing maybe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to use LoggerInterface instead, but otherwise looks good


public function setQueueWasReset()
{
Option::set($this->optionName . self::KEY_TIMESTAMP, floor(microtime(true) * 1000));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue, but curious why microtime is used instead of just time()? Something to do with server configurations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because time would cause maybe concurrency issues not sure. It's a common practice that some users launch multiple archivers at the same second so just wanted to have it more precise just to be safe. Probably wouldn't cause an issue to use time()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 👍

Copy link
Member

@diosmosis diosmosis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left a couple minor comments

@diosmosis
Copy link
Member

@tsteur merged after this small tweak: 0bba960

@diosmosis diosmosis merged commit abf3475 into 4.x-dev Dec 14, 2020
@diosmosis diosmosis deleted the detectemptyqueue branch December 14, 2020 02:06
@mattab mattab changed the title Better detection for end of sharedSiteId queue core:archive Better detection for end of sharedSiteId queue Dec 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants