Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If archive_invalidations is in inconsistent state, fix as getting next archive to process. #16886

Merged
merged 7 commits into from Dec 8, 2020

Conversation

diosmosis
Copy link
Member

Description:

In case the archive_invalidations table ends up in an inconsistent state, make sure we process invalidations higher than what we find.

cc @tsteur

Review

  • Functional review done
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed

@diosmosis diosmosis added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels Dec 5, 2020
@diosmosis diosmosis added this to the 4.0.3 milestone Dec 5, 2020
$bind[] = $archiveToProcess['report'];
}

$sql = "SELECT DISTINCT period FROM `$table` WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw is the distinct actually needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do an array_unique as well, was trying to reduce the amount of data selected since there could potentially be a lot of invalidations for a site

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good should be fine to keep distinct for now

@tsteur tsteur requested a review from sgiehl December 6, 2020 20:58
@tsteur
Copy link
Member

tsteur commented Dec 6, 2020

@sgiehl be great if you could review this tomorrow. We'd probably need to merge this at the same time as #16843 to fully avoid too many duplicates.

@@ -278,6 +280,8 @@ public function getNextArchivesToProcess()

$this->logger->debug("Processing invalidation: $invalidationDesc.");

$this->repairInvalidationsIfNeeded($invalidatedArchive);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis any thought about doing this in CronArchive::launchArchivingFor() after it finished successfully? Technically we don't need to schedule the higher period if it's not successful and also it could prevent a race condition where the day archive takes many hours and by the time it's finished the higher period does not exist anymore if I see this right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any thought about doing this in CronArchive::launchArchivingFor() after it finished successfully?

I figured it would be better to do it here, since when we invalidate through ArchiveInvalidator, we always add all higher periods. So at the start, we always have the whole list of invalidations.

Technically we don't need to schedule the higher period if it's not successful and also it could prevent a race condition where the day archive takes many hours and by the time it's finished the higher period does not exist anymore if I see this right?

The only case where the higher period would not exist anymore is if another archiver was processing the same site and picked it up, but this could happen w/o this repairing logic... In that case we'd have to do the intersecting period logic w/ a query on archive_invalidations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So at the start, we always have the whole list of invalidations.

you mean then we can batch insert it? I reckon ideally we do it after finishing the archive just to make sure there can't be a race condition or any other issues I would say. It be fine if it's just one insert at a time I would say (we could always tweak when needed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsteur no I mean, in the invalidations table, normally we have all the higher periods before we start archiving, so doing the repair logic here mimics that behavior.

We can still do it after finishing an archive (though there will be less test coverage then).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be great to do it after finishing the archive. Could keep the method and just call it from CronArchive instead of QueueConsumer @diosmosis ? It's really just to avoid random edge cases etc which will happen

}

// archive is for week that is over two months, we don't need to care about the month
if ($label == 'month'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we don't need to care about this case? There might be still issues eg when someone imports data in the past etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the week will be on two months, so it won't actually be used in archiving either of the parent months.

There could be the case where there was a day that was archived, then for some reason the month/year disappear, and it's just the week. Then I guess we would need new month/year re-archives, but wouldn't be able to.

It seemed a waste to have to re-archive both months, but I can add it if necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it now 👍 We only need to archive the correct month

@diosmosis
Copy link
Member Author

@tsteur updated, there are still tests for it, but some of the effects seen in the queue consumer tests are no longer there


$higherPeriods = Db::fetchAll($sql, $bind);
$higherPeriods = array_column($higherPeriods, 'period');
$higherPeriods = array_flip($higherPeriods);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not important but could have done instead below an in_array($id, $higherPeriods) . not too important though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, thought this was a bit faster, but for an array w/ 3 items at most, probably not

'date1' => $period->getDateStart()->getDatetime(),
'date2' => $period->getDateEnd()->getDatetime(),
'period' => $id,
'ts_invalidated' => $archiveToProcess['ts_invalidated'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have maybe the current date? Just to be more clear when it was inserted. Or is it so it's included to be archived in this archive run maybe? (as we maybe do somewhere and ts_invalidated < idSiteStartTime)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's done that way so we process it in this batch. (getNextInvalidatedArchive does the ts_invalidated < idSiteStartTime)

$bind[] = $archiveToProcess['report'];
}

$sql = "SELECT DISTINCT period FROM `$table` WHERE idsite = ? AND name = ? AND period > ? AND ? >= date1 AND date2 >= ? $reportClause";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering. Do we here also need to include and status = 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis worked for me. Only need to maybe have a look at https://github.com/matomo-org/matomo/pull/16886/files#r537995820 as maybe the query needs to have to check status

@tsteur
Copy link
Member

tsteur commented Dec 8, 2020

Otherwise good to merge once the other comment re the where clause is resolved and tests pass

@diosmosis diosmosis merged commit dc3f9db into 4.x-dev Dec 8, 2020
@diosmosis diosmosis deleted the fix-inconsistent-table branch December 8, 2020 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants