Handle scheduled task failures #18335

bx80 · 2021-11-17T03:45:35Z

Description:

Fixes #17938 (partial)

Catches and logs exceptions that occur when running scheduled tasks

Review

bx80 · 2021-11-17T04:12:48Z

This PR currently just addresses problem 3 of the three problems listed in #17938 by logging any exceptions.

@tsteur Do we want to just add this logging for now to help determine priority or look for a solution for rescheduling/retrying failed tasks?

As previously noted in the issue, it seems like there would need to be some indication passed from the failed task as to whether to reschedule or not (eg. if an email has been sent then don't reschedule again) and perhaps additionally some sort of retry counter option to avoid getting stuck in a pointless reschedule/fail loop. (reschedule x times, then give up)

So for example:
GeoIP download, no diskspace. Task indicates that it can be rescheduled, task is run three times but disk is still full. Done.
Notification task, error after sending some notification. Task indicates that it should not be rescheduled. Done.
Report fails because of cache clear. Task indicates that it can be rescheduled, second run works. Done.

Perhaps we could start by adding a mechanism to return a reschedule=true/false indication when tasks encounter an exception and then have the scheduler exception handler use this to reschedule if required. Initially we could just have the email reports task check for exceptions that are worth retrying (eg. the 'No such file or directory' error) and then pass back reschedule=true. If this works well then other scheduled tasks could also use the same mechanism as needed.

What do you think?

tsteur · 2021-11-17T22:12:51Z

Perhaps we could start by adding a mechanism to return a reschedule=true/false indication when tasks encounter an exception and then have the scheduler exception handler use this to reschedule if required.

@bx80 I think that makes sense and maybe it's not too complicated to implement? Then we could decide on a per task level whether we want to reschedule or not if it fails.

bx80 · 2021-11-19T03:06:37Z

I've added a fairly simple rescheduling mechanism, for it to work the task must catch an exception and re-throw it with the error code 1. The scheduler execute task exception handler will check if the exception has error code 1 and if so will set a flag to reschedule the task for a retry. The task will then be rescheduled for one hour into the future by the main schedule loop.

To prevent infinite retry loops a new option 'TaskScheduler.retryList' has been added, it is a simple list of task names and the number of times they have been scheduled for a retry, when this count reaches 3 then the failing task will no longer be rescheduled one hour into the future and should fall back to it's normal next scheduled time.

The scheduled reports 'sendReport' method has been adjusted to catch any exceptions thrown on the generateReport() method and re-throw with error code = 1. So failures to generate the report should now be retried, but failures in the send part of the process will not be retried.

I'm not sure if a one hour bump and three retries are the best values to use, happy to change them to whatever makes the most sense.

github-actions · 2021-11-27T01:45:07Z

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

github-actions · 2022-01-08T01:50:13Z

This PR was last updated more than one month ago, maybe it's time to close it. Please check if there is anything we still can do or close this PR. ping @matomo-org/core-reviewers

core/Scheduler/Scheduler.php

sgiehl

Left some comments.

@tsteur would you mind having a quick look if that solution fits. I think it should work fine, but we would need to go through some our tasks afterwards and check if we would need to throw the RetryableException somewhere else as well.

core/Scheduler/Scheduler.php

plugins/ScheduledReports/API.php

tsteur

Haven't tested it but looked through the code and left a few comments.

Also few things here:

Any chance we can add a few (basic) tests for the timetable methods like incrementCounter etc?
Can we also document this new exception in https://developer.matomo.org/guides/scheduled-tasks ?
we'll also want to create a follow up issue to implement this in custom alerts https://github.com/matomo-org/plugin-CustomAlerts/issues/new to retry these as well should there be any exception.

core/Scheduler/Scheduler.php

core/Scheduler/Timetable.php

bx80 · 2022-01-25T02:31:35Z

I've added a new integration test for the various timetable retry methods and improved the option loading/reloading code as suggested.

There is a new paragraph submitted for the scheduled task developer guide matomo-org/developer-documentation#605 and a new issue added to implement this in the CustomAlerts plugin.

bx80 · 2022-01-27T22:24:56Z

All tests are fixed, this should now be ready for a final review.

sgiehl · 2022-01-31T16:06:59Z

@bx80 you could also try adding some integration tests for the retry mechanism. Like creating some kind of test task that does nothing but throwing a retry exception and test that the system behaves correctly with rescheduling and aborting if retry limit was reached. Otherwise this PR already looks fine.

… are scheduled for retry

core/Scheduler/Schedule/SpecificTime.php

Catch and log errors for scheduled tasks

002839b

bx80 added the Easier debugging For issues that make troubleshooting issues for developers easier. label Nov 17, 2021

bx80 added this to the 4.7.0 milestone Nov 17, 2021

bx80 self-assigned this Nov 17, 2021

Added retry on exception mechanism for scheduled tasks

1d553d6

bx80 added Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. labels Nov 19, 2021

github-actions bot added the Stale The label used by the Close Stale Issues action label Nov 27, 2021

github-actions bot added the Stale for long The label used by the Close Stale Issues action label Jan 8, 2022

justinvelluppillai requested a review from sgiehl January 9, 2022 20:30

sgiehl reviewed Jan 10, 2022

View reviewed changes

core/Scheduler/Scheduler.php Outdated Show resolved Hide resolved

bx80 removed Needs Review PRs that need a code review Stale The label used by the Close Stale Issues action Stale for long The label used by the Close Stale Issues action labels Jan 11, 2022

bx80 added 3 commits January 17, 2022 11:11

Merge branch '4.x-dev' into m-17938-scheduled-task-exceptions

a5d34a5

Replace error code with custom exception

a3fc8d8

Fix/workaround for SchedulerTest mock breaking retry list option loading

f229521

sgiehl modified the milestones: 4.7.0, 4.8.0 Jan 18, 2022

sgiehl requested changes Jan 18, 2022

View reviewed changes

core/Scheduler/Scheduler.php Outdated Show resolved Hide resolved

plugins/ScheduledReports/API.php Show resolved Hide resolved

Log warning instead of info if a task has failed three times

4bfe80f

sgiehl requested a review from tsteur January 19, 2022 16:23

bx80 added the Needs Review PRs that need a code review label Jan 19, 2022

tsteur requested changes Jan 20, 2022

View reviewed changes

core/Scheduler/Scheduler.php Show resolved Hide resolved

core/Scheduler/Scheduler.php Show resolved Hide resolved

core/Scheduler/Timetable.php Show resolved Hide resolved

core/Scheduler/Timetable.php Show resolved Hide resolved

core/Scheduler/Timetable.php Outdated Show resolved Hide resolved

Added basic tests, minor code improvements

4626c28

bx80 mentioned this pull request Jan 25, 2022

Schedule a retry for any alerts that fail matomo-org/plugin-CustomAlerts#107

Open

bx80 added 2 commits January 26, 2022 15:28

Test fix

f0ecc9c

Fix for test

1134f49

Added integration test to check that only tasks that throw exceptions…

697a893

… are scheduled for retry

bx80 merged commit f20c4cb into 4.x-dev Feb 1, 2022

bx80 deleted the m-17938-scheduled-task-exceptions branch February 1, 2022 23:19

sgiehl reviewed Feb 2, 2022

View reviewed changes

core/Scheduler/Schedule/SpecificTime.php Show resolved Hide resolved

sgiehl mentioned this pull request Apr 5, 2022

Scheduler Error Missing parameter displayFormat #19047

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle scheduled task failures #18335

Handle scheduled task failures #18335

bx80 commented Nov 17, 2021

bx80 commented Nov 17, 2021

tsteur commented Nov 17, 2021

bx80 commented Nov 19, 2021

github-actions bot commented Nov 27, 2021

github-actions bot commented Jan 8, 2022

sgiehl left a comment

tsteur left a comment

bx80 commented Jan 25, 2022

bx80 commented Jan 27, 2022

sgiehl commented Jan 31, 2022

Handle scheduled task failures #18335

Handle scheduled task failures #18335

Conversation

bx80 commented Nov 17, 2021

Description:

Review

bx80 commented Nov 17, 2021

tsteur commented Nov 17, 2021

bx80 commented Nov 19, 2021

github-actions bot commented Nov 27, 2021

github-actions bot commented Jan 8, 2022

sgiehl left a comment

Choose a reason for hiding this comment

tsteur left a comment

Choose a reason for hiding this comment

bx80 commented Jan 25, 2022

bx80 commented Jan 27, 2022

sgiehl commented Jan 31, 2022