New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better detect failed archiving jobs. #16701
Conversation
Note: working on a test |
core/DataAccess/Model.php
Outdated
@@ -786,10 +788,35 @@ public function isArchiveAlreadyInProgress($invalidatedArchive) | |||
$bind[] = $invalidatedArchive['report']; | |||
} | |||
|
|||
$sql = "SELECT MAX(idinvalidation) FROM `$table` WHERE idsite = ? AND date1 = ? AND date2 = ? AND `period` = ? AND `name` = ? AND status = 1 AND $reportClause"; | |||
$sql = "SELECT MAX(idinvalidation) as idinvalidation, ts_invalidated FROM `$table` WHERE idsite = ? AND date1 = ? AND date2 = ? AND `period` = ? AND `name` = ? AND status = 1 AND $reportClause"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it's possible or make sense but do we also need a max or min on the ts_invalidated
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a mistake I fixed locally, should be a LIMIT 1 query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
core/DataAccess/Model.php
Outdated
]); | ||
|
||
// delete the in progress invalidation since we assume it errored | ||
$this->deleteInvalidations($inProgressInvalidation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random thought... might not make any sense. If we assume it failed, would we rather set it to 0
so it will be rearchived again in the hope it works and doesn't fail the next time? Although I suppose this can get complicated as we'd probably then also would maybe need to invalidate any parent period in case they were finished in the meantime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parent periods shouldn't start if an intersecting lower period is there so this should be ok. The period might just keep failing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what to do best here. I'm tempted to say we set the status 0
and better maybe fail repeatedly than having potentially wrong data? Problem obviously being it wouldn't archive any other data until this is resolved although I think if newer invalidations are added, and they are for newer periods, then these ones would be maybe processed eventually because we prefer more recent dates AFAIK. Might not be thinking this fully through though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsteur Actually, this is in the check for an existing in progress archive, so that would mean there already is another invalidation identical to this one.
I guess we can also check in CoreArchive.php for in progress invalidations that are over a day old and change the status on them before we actually start any archiving. What do you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I guess here in isArchiveAlreadyInProgress
if the same invalidation wants to start, and there is one in progress for 1 day already then we can delete it.
There would be still some that would never be detected. It could be in CronArchive.php or could be a task to check for these? Could in this case maybe only use this logic and undo the change in isArchiveAlreadyInProgress
as having a general check should help? Ideally we just don't execute the query looking for in progress archive invalidations older than 1 day too often. Maybe only once hourly or once on cronarchive start or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There would be still some that would never be detected. It could be in CronArchive.php or could be a task to check for these? Could in this case maybe only use this logic and undo the change in isArchiveAlreadyInProgress as having a general check should help? Ideally we just don't execute the query looking for in progress archive invalidations older than 1 day too often. Maybe only once hourly or once on cronarchive start or so.
Yes, this was what I mentioned in the second paragraph. I'll go w/ this.
…ay before starting archiving.
aeca99e
to
a649cf0
Compare
@tsteur updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. The only thing I'm scared of is that an archive run that takes longer than a day would be basically always reset to 0
and always run again but suppose one day shouldn't happen. I wouldn't be surprised if it happens though eg when a monthly archive is triggered, and then it notices some days are outdated, and it also rearchives several days and then weeks etc. Even though we triggered month, I think it might still archive individual days when needed not sure...
Should we change it to maybe 30 hours instead? Still not perfect but I think that would make things bit safer and less likely to run into that issue.
Actually... after more thinking this might not be needed to set it to 30 hours. Because: If it was to try to start that archive again, and it is still in progress, and archiving through CLI is used then it would notice this one is already running. And next time the archive is triggered it should finish in less than 24 hours considering the individual days and weeks within that month would be eventually archived.
core/DataAccess/Model.php
Outdated
public function resetFailedArchivingJobs() | ||
{ | ||
$table = Common::prefixTable('archive_invalidations'); | ||
$sql = "UPDATE $table SET status = ? WHERE status = ? AND ts_invalidated < ?"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis could we add a test for this? and might be also good to document this in https://developer.matomo.org/guides/archiving ?
core/DataAccess/Model.php
Outdated
@@ -632,7 +632,7 @@ public function startArchive($invalidation) | |||
$table = Common::prefixTable('archive_invalidations'); | |||
|
|||
// set archive value to in progress if not set already | |||
$statement = Db::query("UPDATE `$table` SET `status` = ? WHERE idinvalidation = ? AND status = ?", [ | |||
$statement = Db::query("UPDATE `$table` SET `status` = ? AND ts_invalidated = NOW() WHERE idinvalidation = ? AND status = ?", [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would almost rather need a different field for this like ts_start
? although maybe not too important. Just thought it might make some troubleshooting bit more difficult but I think it's actually fine as it would only affect invalidations that are in progress
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a quick change, I can add an extra column if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no big preference. Leaving this up to you. Not sure how much work it is. I reckon it may be useful for debugging purposes in the future to understand things better but may not be needed
this shouldn't technically happen since we order the periods by day => month and don't allow simultaneously archiving intersecting periods. there might be some edge cases where it does however, like someone deletes an archive right as a month archive starts, but it would be rare I think. best way to avoid this would be to check the Lock before resetting an archive, not sure if that would be worth doing. |
I'm thinking this might still happen. When requesting a month period then it has to archive within the same request a missing day archive etc. |
It might happen but not to the extent that it would end up taking so much longer, I think? Maybe something happens to cause some days to be deleted while archiving a month, but not the entire month? Maybe if there's a purge but then we shouldn't be archiving that data... |
@tsteur updated |
Description:
Refs #16689
Better detection of in progress invalidations: if an invalidation's status is set to in progress, but the invalidation was created over a day ago. (We also update the ts_invalidated time to now() when starting an archive).
CC @tsteur
Review