New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow date ranges to be passed to core:invalidate-report-data #14450
Conversation
Quick feedback:
|
|
Am working on integration tests for invalidating all overlapping date ranges when "all" is passed. |
Ready for review/merge |
'1.2014-10-15.2014-10-20.5.done3736b708e4d20cfc10610e816a1b2341', | ||
), | ||
), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change one of these tests to have $cascadeDown = true
just to test that behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are actually all passing $cascadeDown = true
but this doesn't do anything as the range gets split down into its individual dates before the cascading is applied. You can get a cascade-like behaviour though by passing --periods=all
to the command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed it didn't do anything, but I think it's important for our coverage to be complete to test that it does nothing in the case of ranges. Otherwise there's a small section of the functionality that isn't tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've chnaged a few of the tests to $cascadeDown = false
$allDates[] = $date; | ||
} | ||
$dates = $allDates; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm getting this right (and I probably am not, the invalidation code always confuses me), the reason we check for all subperiods is because ranges can span multiple months? In this case could we do addMonth(1)
or something to check less dates? I think this might result in slightly more readable SQL, though that may not be too important in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for catching smaller subperiods that lie within the month. If we just check one day each month we'll miss the ones that don't include that day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I could be seeing it incorrectly, but I was thinking of using date pairs like 2019-01-01,2019-01-31
so we'd check if there's any range affecting that whole month, rather than individual days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed this yesterday with @mattab and we have decided that although it would make the SQL easier to read and possibly be more efficient too, it's not worth investing the time as we don't expect that the query will be run very often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries.
just tested and debugged to see how it works. I might have actually misunderstood and not quite sure which behaviour is expected. So it seems right now when using eg diff --git a/core/Archive/ArchiveInvalidator.php b/core/Archive/ArchiveInvalidator.php
index 1f03a07dda..f3dd0a68c2 100644
--- a/core/Archive/ArchiveInvalidator.php
+++ b/core/Archive/ArchiveInvalidator.php
@@ -232,11 +232,8 @@ class ArchiveInvalidator
// Get the individual dates for a range, we need to check all of them
if ($periodType === 'range') {
- $allDates = array();
- for ($date = $dates[0]; $date->getTimestamp() <= $dates[1]->getTimestamp(); $date = $date->addDay(1)) {
- $allDates[] = $date;
- }
- $dates = $allDates;
+ $periodsToInvalidate[] = Period\Factory::build($periodType, $dates[0]->toString() . ',' . $dates[1]->toString());
+ return $periodsToInvalidate;
}
foreach ($dates as $date) { @mattab not quite sure what is supposed to happen? |
@mattab still need your input here |
@tsteur i would have also expected what you propose... @katebutler do you maybe remember the idea behind the current PR implementation? |
…validate all archived ranges that overlap the argument range, else we invalidate exact match only
Logic has been re-implemented so that:
|
core/DataAccess/Model.php
Outdated
foreach ($ranges as $range) { | ||
// Ranges in the DB match if their date2 is after the start of the search range and date2 is before the end | ||
$dateConditions[] = "(date2 >= ? AND date1 <= ?)"; | ||
$bind[] = $range->getDateStart()->toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I execute ./console core:invalidate-report-data --dates=2019-02-03,2019-02-04
I get an error
Uncaught exception: Error: Call to a member function getDateStart() on array in core/DataAccess/Model.php:178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also when using ./console core:invalidate-report-data --dates=2019-02-03,2019-02-10 --periods=all
core/DataAccess/Model.php
Outdated
|
||
foreach ($ranges as $range) { | ||
// Ranges in the DB match if their date2 is after the start of the search range and date2 is before the end | ||
$dateConditions[] = "(date2 >= ? AND date1 <= ?)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@katebutler not sure if I'm understanding this correctly.
I have here this query generated:
UPDATE piwik_archive_numeric_2012_04 SET value = 4 WHERE name LIKE 'done%'
AND idsite IN (1)
AND (period = 5 AND ((date2 >= ? AND date1 <= ?)))
and the binds are ['2019-02-03', '2019-02-04']
. Does date1
and date2
maybe need to be switched? The comment mentions twice date2
so that's also not quite clear. I called it like this:
./console core:invalidate-report-data --dates=2019-02-03,2019-02-04 --periods=range
Then I would expect date ranges like these to be updated:
2019-01-04,2019-03-03
2019-02-03,2019-03-03
2019-01-04,2019-02-03
2019-01-04,2019-02-04
but with this query I don't think this works? I might be seeing it wrong though. Or maybe we want it to behave differently? @mattab ?
core/DataAccess/Model.php
Outdated
* @return \Zend_Db_Statement | ||
* @throws Exception | ||
*/ | ||
public function updateAllRangeArchivesAsInvalidated($archiveTable, $idSites, $ranges, Segment $segment = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the method updateArchiveAsInvalidated
https://github.com/matomo-org/matomo/blob/3.x-dev/core/DataAccess/Model.php#L108-L119 I think this is actually updating it as expected using (date1 <= ? AND ? <= date2)
? I wonder is this method even needed? Not sure how it behaves different to the other method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually realising there is a new if
in the mentioned method that only deletes an exact range when --periods=range
is set. Not sure we want that behaviour. I reckon we can remove that if
and restore old behaviour and then we can also use the updateArchiveAsInvalidated
method again. I reckon it might be fine to use same behaviour for when using --periods=range
and when using --periods=all
. Currently it seems to be using different behaviour when using --periods=range
where it only deletes the exact given range but it should maybe delete all ranges it is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @mattab not exactly sure what behaviour we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using --periods=all. Currently it seems to be using different behaviour when using --periods=range where it only deletes the exact given range but it should maybe delete all ranges it is present.
👍 when --periods=all
then all ranges should be invalidated.
$dateConditions[] = "(date1 <= ? AND ? <= date2)"; | ||
$bind[] = $date; | ||
$bind[] = $date; | ||
if ($periodType == Period\Range::PERIOD_ID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the if
I mean below. I reckon we could restore old behaviour and delete all ranges it affects when using --period=range
. I understand when using --period=range
the problem is that it wouldn't iterate over all archive tables. For this we could always call markArchivesOverlappingRangeAsInvalidated()
for the ranges and instead of updateAllRangeArchivesAsInvalidated
call updateArchiveAsInvalidated
.
@katebutler lets discuss details tmrw. Also waiting to here from @mattab the comments. I reckon though it be good to let it behave the same way when |
It's hard to explain what I meant so quickly made 3-4 changes... lets discuss later and see what makes sense... I meant something like this: https://github.com/matomo-org/matomo/compare/14178_2?expand=1 see my last commit df13cbe where we basically handle period range separately. (didn't test it at all)... there are still some things that would need to be adjusted and restored maybe such as https://github.com/matomo-org/matomo/compare/14178_2?expand=1#diff-27aba6f51c03fb57f660eeb0b2a18484L234 maybe... seems to be making the whole PR easier though |
…ods=all --dates=2019-01-02
Fixes #14178