New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle scheduled task failures #18335
Conversation
This PR currently just addresses problem 3 of the three problems listed in #17938 by logging any exceptions. @tsteur Do we want to just add this logging for now to help determine priority or look for a solution for rescheduling/retrying failed tasks? As previously noted in the issue, it seems like there would need to be some indication passed from the failed task as to whether to reschedule or not (eg. if an email has been sent then don't reschedule again) and perhaps additionally some sort of retry counter option to avoid getting stuck in a pointless reschedule/fail loop. (reschedule x times, then give up) So for example: Perhaps we could start by adding a mechanism to return a reschedule=true/false indication when tasks encounter an exception and then have the scheduler exception handler use this to reschedule if required. Initially we could just have the email reports task check for exceptions that are worth retrying (eg. the 'No such file or directory' error) and then pass back reschedule=true. If this works well then other scheduled tasks could also use the same mechanism as needed. What do you think? |
@bx80 I think that makes sense and maybe it's not too complicated to implement? Then we could decide on a per task level whether we want to reschedule or not if it fails. |
I've added a fairly simple rescheduling mechanism, for it to work the task must catch an exception and re-throw it with the error code 1. The scheduler execute task exception handler will check if the exception has error code 1 and if so will set a flag to reschedule the task for a retry. The task will then be rescheduled for one hour into the future by the main schedule loop. To prevent infinite retry loops a new option 'TaskScheduler.retryList' has been added, it is a simple list of task names and the number of times they have been scheduled for a retry, when this count reaches 3 then the failing task will no longer be rescheduled one hour into the future and should fall back to it's normal next scheduled time. The scheduled reports 'sendReport' method has been adjusted to catch any exceptions thrown on the I'm not sure if a one hour bump and three retries are the best values to use, happy to change them to whatever makes the most sense. |
This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers |
This PR was last updated more than one month ago, maybe it's time to close it. Please check if there is anything we still can do or close this PR. ping @matomo-org/core-reviewers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments.
@tsteur would you mind having a quick look if that solution fits. I think it should work fine, but we would need to go through some our tasks afterwards and check if we would need to throw the RetryableException
somewhere else as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tested it but looked through the code and left a few comments.
Also few things here:
- Any chance we can add a few (basic) tests for the timetable methods like
incrementCounter
etc? - Can we also document this new exception in https://developer.matomo.org/guides/scheduled-tasks ?
- we'll also want to create a follow up issue to implement this in custom alerts https://github.com/matomo-org/plugin-CustomAlerts/issues/new to retry these as well should there be any exception.
I've added a new integration test for the various timetable retry methods and improved the option loading/reloading code as suggested. There is a new paragraph submitted for the scheduled task developer guide matomo-org/developer-documentation#605 and a new issue added to implement this in the CustomAlerts plugin. |
All tests are fixed, this should now be ready for a final review. |
@bx80 you could also try adding some integration tests for the retry mechanism. Like creating some kind of test task that does nothing but throwing a retry exception and test that the system behaves correctly with rescheduling and aborting if retry limit was reached. Otherwise this PR already looks fine. |
… are scheduled for retry
Description:
Fixes #17938 (partial)
Catches and logs exceptions that occur when running scheduled tasks
Review