@tsteur opened this Pull Request on November 22nd 2020 Member

Description:

By default core:archive uses SharedSiteIds which is used by multiple workers. For example it has a queue of several idSites that need to be processed. As soon as this queue has been reset we want existing core:archive commands to finish next time getNextSiteId is called. This is to not have one core:archive command running forever potentially running into memory issues etc.

So far we have two detections when queue was reset

  • When the queue is empty https://github.com/matomo-org/matomo/blob/4.0.0-rc6/core/CronArchive/SharedSiteIds.php#L171-L174
  • When suddenly more sites are in the queue than before https://github.com/matomo-org/matomo/blob/4.0.0-rc6/core/CronArchive/SharedSiteIds.php#L176-L180 (this works eg when there are only 1 sites in the queue, then a different core:archive resets the list of all idSites and there are 50 sites in the queue now. This means there must have been a reset)
  • Above check does not work though when eg core:archive is running hourly, and there is only one site in Matomo (so the queue has always only one site) and it takes > 1 hour to archive this site. What happens archiver A inits sharedSiteIds with idSites=[1] then it processed first site so queue is [], then other core:archive might set it to sharedSiteIds=[1] again. It's bit edge case and there probably would need to be some race conditions as the next archiver should technically empty the queue as well.

Just to be safe thought it might be good to check that getNextSiteId doesn't return a siteId that it already returned before. Because this would mean we have already archived that site.

Review

  • [ ] Functional review done
  • [ ] Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • [ ] Security review done see checklist
  • [ ] Code review done
  • [ ] Tests were added if useful/possible
  • [ ] Reviewed for breaking changes
  • [ ] Developer changelog updated if needed
  • [ ] Documentation added if needed
  • [ ] Existing documentation updated if needed
@diosmosis commented on November 23rd 2020 Member

This wouldn't work if the next site, for some reason (maybe due to multiple archivers running), isn't in the already processed queue, correct? What about setting an option w/ the timestamp when the queue is created/reset, then checking that this hasn't changed getNextSiteId() is called?

@tsteur commented on November 23rd 2020 Member

@diosmosis changed the PR 👍

@diosmosis commented on December 14th 2020 Member
This Pull Request was closed on December 14th 2020
Powered by GitHub Issue Mirror