Scheduled tasks: Always read timetable from the database and not from memory #17849
While reviewing another issue, I noticed that we apparently always read the cached option entry for the scheduled tasks timetable. This happens in https://github.com/matomo-org/matomo/blob/4.4.1/core/Scheduler/Scheduler.php#L105-L115
Because it is normal to have two or more archivers running in parallel, it is not uncommon for several archivers to execute the task runner at the same time. Each of them fetches the timetable (the entries describing which scheduled tasks to execute when) and then keeps working on its own, possibly different, version of it. Because executing all tasks can take a long time (from seconds up to hours), there is a high risk that some tasks are executed multiple times unless we always read the timetable from the database. This will cause quite a few additional queries but should reduce some concurrency issues.
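To illustrate the problem described above, here is a minimal, language-agnostic sketch in Python. It is not Matomo code: the dict standing in for the database, the runner names, and both function names are assumptions made for illustration only. It compares runners that each work from a timetable copy loaded once with runners that re-read the shared timetable before every task.

```python
# Hypothetical model: `db` is a dict standing in for the timetable stored
# in the database, mapping task name -> timestamp when the task is next due.

def run_with_snapshot(db, runners, now):
    """Every runner loads the timetable once and works from that copy."""
    executed = []
    # All runners start at roughly the same time and load their own copy.
    snapshots = [dict(db) for _ in runners]
    for runner, snapshot in zip(runners, snapshots):
        for task, due in snapshot.items():
            if due <= now:
                executed.append((runner, task))
                db[task] = now + 3600  # reschedule in the shared store
    return executed


def run_with_reread(db, runners, now):
    """Every runner re-reads the shared timetable before each task."""
    executed = []
    for runner in runners:
        for task in list(db):
            if db[task] <= now:  # fresh read from the shared store
                executed.append((runner, task))
                db[task] = now + 3600
    return executed
```

With two runners and two due tasks, the snapshot variant executes each task twice, while the re-read variant executes each task once. The re-read variant is still not atomic (another runner could act between the check and the reschedule), so it narrows the window for duplicates rather than closing it.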
There was already code intended to always read the DB value again. However, `Option::get` would always first return a cached result from memory and not fetch from the DB again. Basically, this is what it currently looks like:
The task runner logic is still far from being thread safe, but this should improve it quite a bit.
The consequence of all this is a lot of added load, because several tasks may be executed multiple times. For example, some scheduled reports or custom alerts may be sent multiple times (I remember seeing such reports).
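The caching behaviour behind this can be modelled with a small Python sketch. This is an illustrative model only, not Matomo's implementation: the `OptionStore` class and its `get`, `clear_cached`, and `get_fresh` methods are hypothetical names, and a dict stands in for the database.

```python
class OptionStore:
    """Hypothetical option store with an in-process (in-memory) cache."""

    def __init__(self, db):
        self._db = db      # shared dict standing in for the database
        self._cache = {}   # per-process cache, never refreshed on its own

    def get(self, name):
        # The first read hits the "database"; later reads return the cached value.
        if name not in self._cache:
            self._cache[name] = self._db.get(name)
        return self._cache[name]

    def clear_cached(self, name):
        # Drop the cached entry so the next get() reads the database again.
        self._cache.pop(name, None)

    def get_fresh(self, name):
        # Always bypass the in-memory copy.
        self.clear_cached(name)
        return self.get(name)
```

In this model, if another process updates the stored timetable after the first `get`, further `get` calls keep returning the stale value, whereas `get_fresh` returns the updated one at the cost of one extra read per call.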