Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better detect failed archiving jobs. #16701

Merged
merged 2 commits into from Nov 12, 2020
Merged

Better detect failed archiving jobs. #16701

merged 2 commits into from Nov 12, 2020

Conversation

diosmosis
Copy link
Member

Description:

Refs #16689

Better detection of in progress invalidations: if an invalidation's status is set to in progress, but the invalidation was created over a day ago. (We also update the ts_invalidated time to now() when starting an archive).

CC @tsteur

Review

  • Functional review done
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed

@diosmosis diosmosis added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels Nov 11, 2020
@diosmosis diosmosis added this to the 4.0.0-RC milestone Nov 11, 2020
@diosmosis
Copy link
Member Author

Note: working on a test

@@ -786,10 +788,35 @@ public function isArchiveAlreadyInProgress($invalidatedArchive)
$bind[] = $invalidatedArchive['report'];
}

$sql = "SELECT MAX(idinvalidation) FROM `$table` WHERE idsite = ? AND date1 = ? AND date2 = ? AND `period` = ? AND `name` = ? AND status = 1 AND $reportClause";
$sql = "SELECT MAX(idinvalidation) as idinvalidation, ts_invalidated FROM `$table` WHERE idsite = ? AND date1 = ? AND date2 = ? AND `period` = ? AND `name` = ? AND status = 1 AND $reportClause";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it's possible or make sense but do we also need a max or min on the ts_invalidated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a mistake I fixed locally, should be a LIMIT 1 query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

]);

// delete the in progress invalidation since we assume it errored
$this->deleteInvalidations($inProgressInvalidation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random thought... might not make any sense. If we assume it failed, would we rather set it to 0 so it will be rearchived again in the hope it works and doesn't fail the next time? Although I suppose this can get complicated as we'd probably then also would maybe need to invalidate any parent period in case they were finished in the meantime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parent periods shouldn't start if an intersecting lower period is there so this should be ok. The period might just keep failing though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do best here. I'm tempted to say we set the status 0 and better maybe fail repeatedly than having potentially wrong data? Problem obviously being it wouldn't archive any other data until this is resolved although I think if newer invalidations are added, and they are for newer periods, then these ones would be maybe processed eventually because we prefer more recent dates AFAIK. Might not be thinking this fully through though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsteur Actually, this is in the check for an existing in progress archive, so that would mean there already is another invalidation identical to this one.

I guess we can also check in CoreArchive.php for in progress invalidations that are over a day old and change the status on them before we actually start any archiving. What do you think about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I guess here in isArchiveAlreadyInProgress if the same invalidation wants to start, and there is one in progress for 1 day already then we can delete it.

There would be still some that would never be detected. It could be in CronArchive.php or could be a task to check for these? Could in this case maybe only use this logic and undo the change in isArchiveAlreadyInProgress as having a general check should help? Ideally we just don't execute the query looking for in progress archive invalidations older than 1 day too often. Maybe only once hourly or once on cronarchive start or so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would be still some that would never be detected. It could be in CronArchive.php or could be a task to check for these? Could in this case maybe only use this logic and undo the change in isArchiveAlreadyInProgress as having a general check should help? Ideally we just don't execute the query looking for in progress archive invalidations older than 1 day too often. Maybe only once hourly or once on cronarchive start or so.

Yes, this was what I mentioned in the second paragraph. I'll go w/ this.

@diosmosis
Copy link
Member Author

@tsteur updated

Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. The only thing I'm scared of is that an archive run that takes longer than a day would be basically always reset to 0 and always run again but suppose one day shouldn't happen. I wouldn't be surprised if it happens though eg when a monthly archive is triggered, and then it notices some days are outdated, and it also rearchives several days and then weeks etc. Even though we triggered month, I think it might still archive individual days when needed not sure...

Should we change it to maybe 30 hours instead? Still not perfect but I think that would make things bit safer and less likely to run into that issue.

Actually... after more thinking this might not be needed to set it to 30 hours. Because: If it was to try to start that archive again, and it is still in progress, and archiving through CLI is used then it would notice this one is already running. And next time the archive is triggered it should finish in less than 24 hours considering the individual days and weeks within that month would be eventually archived.

public function resetFailedArchivingJobs()
{
$table = Common::prefixTable('archive_invalidations');
$sql = "UPDATE $table SET status = ? WHERE status = ? AND ts_invalidated < ?";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis could we add a test for this? and might be also good to document this in https://developer.matomo.org/guides/archiving ?

@@ -632,7 +632,7 @@ public function startArchive($invalidation)
$table = Common::prefixTable('archive_invalidations');

// set archive value to in progress if not set already
$statement = Db::query("UPDATE `$table` SET `status` = ? WHERE idinvalidation = ? AND status = ?", [
$statement = Db::query("UPDATE `$table` SET `status` = ? AND ts_invalidated = NOW() WHERE idinvalidation = ? AND status = ?", [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would almost rather need a different field for this like ts_start? although maybe not too important. Just thought it might make some troubleshooting bit more difficult but I think it's actually fine as it would only affect invalidations that are in progress

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a quick change, I can add an extra column if needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no big preference. Leaving this up to you. Not sure how much work it is. I reckon it may be useful for debugging purposes in the future to understand things better but may not be needed

@diosmosis
Copy link
Member Author

looks good to me. The only thing I'm scared of is that an archive run that takes longer than a day would be basically always reset to 0 and always run again but suppose one day shouldn't happen. I wouldn't be surprised if it happens though eg when a monthly archive is triggered, and then it notices some days are outdated, and it also rearchives several days and then weeks etc. Even though we triggered month, I think it might still archive individual days when needed not sure...

this shouldn't technically happen since we order the periods by day => month and don't allow simultaneously archiving intersecting periods. there might be some edge cases where it does however, like someone deletes an archive right as a month archive starts, but it would be rare I think. best way to avoid this would be to check the Lock before resetting an archive, not sure if that would be worth doing.

@tsteur
Copy link
Member

tsteur commented Nov 11, 2020

this shouldn't technically happen since we order the periods by day => month and don't allow simultaneously archiving intersecting periods

I'm thinking this might still happen. When requesting a month period then it has to archive within the same request a missing day archive etc.

@diosmosis
Copy link
Member Author

It might happen but not to the extent that it would end up taking so much longer, I think? Maybe something happens to cause some days to be deleted while archiving a month, but not the entire month? Maybe if there's a purge but then we shouldn't be archiving that data...

@diosmosis
Copy link
Member Author

@tsteur updated

@tsteur tsteur merged commit 0e7530e into 4.x-dev Nov 12, 2020
@tsteur tsteur deleted the status-check-tweak branch November 12, 2020 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants