Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow date ranges to be passed to core:invalidate-report-data #14450

Merged
merged 15 commits into from Oct 16, 2019
Merged

Conversation

katebutler
Copy link

Fixes #14178

@katebutler katebutler added the Needs Review PRs that need a code review label May 15, 2019
@katebutler katebutler added this to the 3.11.0 milestone May 15, 2019
@mattab
Copy link
Member

mattab commented May 16, 2019

Quick feedback:

  • would have expected the line unset($result[4]); // remove 'range' period to be removed as well, so that when you don't specify --periods it will also invalidate all ranges overlapping the invalidated date.

@mattab
Copy link
Member

mattab commented May 16, 2019

  • can you add some more test cases in
    // range period, one site, cascade = true
    array(
    array(1),
    array('2015-01-02', '2015-03-05'),
    'range',
    null,
    true,
    array(
    '2015_01' => array(
    '1.2015-01-01.2015-01-10.5.done.VisitsSummary',
    ),
    '2015_03' => array(
    '1.2015-03-04.2015-03-05.5.done.VisitsSummary',
    '1.2015-03-05.2015-03-10.5.done3736b708e4d20cfc10610e816a1b2341.UserCountry',
    ),
    ),
    ),
    maybe

@katebutler
Copy link
Author

Am working on integration tests for invalidating all overlapping date ranges when "all" is passed.

@katebutler
Copy link
Author

Ready for review/merge

'1.2014-10-15.2014-10-20.5.done3736b708e4d20cfc10610e816a1b2341',
),
),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change one of these tests to have $cascadeDown = true just to test that behavior?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are actually all passing $cascadeDown = true but this doesn't do anything as the range gets split down into its individual dates before the cascading is applied. You can get a cascade-like behaviour though by passing --periods=all to the command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed it didn't do anything, but I think it's important for our coverage to be complete to test that it does nothing in the case of ranges. Otherwise there's a small section of the functionality that isn't tested.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've chnaged a few of the tests to $cascadeDown = false

$allDates[] = $date;
}
$dates = $allDates;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm getting this right (and I probably am not, the invalidation code always confuses me), the reason we check for all subperiods is because ranges can span multiple months? In this case could we do addMonth(1) or something to check less dates? I think this might result in slightly more readable SQL, though that may not be too important in the long run.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for catching smaller subperiods that lie within the month. If we just check one day each month we'll miss the ones that don't include that day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I could be seeing it incorrectly, but I was thinking of using date pairs like 2019-01-01,2019-01-31 so we'd check if there's any range affecting that whole month, rather than individual days.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this yesterday with @mattab and we have decided that although it would make the SQL easier to read and possibly be more efficient too, it's not worth investing the time as we don't expect that the query will be run very often.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries.

@tsteur
Copy link
Member

tsteur commented Jul 17, 2019

@katebutler @mattab

just tested and debugged to see how it works. I might have actually misunderstood and not quite sure which behaviour is expected.

So it seems right now when using eg ./console core:invalidate-report-data --periods=range --dates=2018-01-02,2018-02-02 it will invalidate the range period for each individual date. So it would invalidate period=range,date=2018-01-02,2018-01-02;period=range,date=2018-01-03,2018-01-03;.... period=range,date=2018-02-02,2018-02-02 but I would have now maybe expected it invalidates the exact date range where period=range, date=2018-01-02,2018-02-02. So basically something like this code:

diff --git a/core/Archive/ArchiveInvalidator.php b/core/Archive/ArchiveInvalidator.php
index 1f03a07dda..f3dd0a68c2 100644
--- a/core/Archive/ArchiveInvalidator.php
+++ b/core/Archive/ArchiveInvalidator.php
@@ -232,11 +232,8 @@ class ArchiveInvalidator
 
         // Get the individual dates for a range, we need to check all of them
         if ($periodType === 'range') {
-            $allDates = array();
-            for ($date = $dates[0]; $date->getTimestamp() <= $dates[1]->getTimestamp(); $date = $date->addDay(1)) {
-                $allDates[] = $date;
-            }
-            $dates = $allDates;
+            $periodsToInvalidate[] = Period\Factory::build($periodType, $dates[0]->toString() . ',' . $dates[1]->toString());
+            return $periodsToInvalidate;
         }
 
         foreach ($dates as $date) {

@mattab not quite sure what is supposed to happen?

@mattab mattab modified the milestones: 3.11.0, 3.12.0 Jul 23, 2019
@tsteur
Copy link
Member

tsteur commented Jul 29, 2019

@mattab still need your input here

@mattab
Copy link
Member

mattab commented Aug 9, 2019

@tsteur i would have also expected what you propose... @katebutler do you maybe remember the idea behind the current PR implementation?

@katebutler
Copy link
Author

Logic has been re-implemented so that:

  • If the range period type is passed, only archives whose date ranges exactly match the input range(s) will be invalidated. This should be more intuitive to the user, as we are only invalidating exactly what we've been asked to.
  • If the all period type is passed, all archives whose date ranges overlap the input range(s) will be invalidated. This is the correct behaviour when the user has recently removed some data (e.g. for GDPR compliance) and needs all archives that may contain that data to be invalidated.

foreach ($ranges as $range) {
// Ranges in the DB match if their date2 is after the start of the search range and date2 is before the end
$dateConditions[] = "(date2 >= ? AND date1 <= ?)";
$bind[] = $range->getDateStart()->toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I execute ./console core:invalidate-report-data --dates=2019-02-03,2019-02-04 I get an error

Uncaught exception: Error: Call to a member function getDateStart() on array in core/DataAccess/Model.php:178

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also when using ./console core:invalidate-report-data --dates=2019-02-03,2019-02-10 --periods=all


foreach ($ranges as $range) {
// Ranges in the DB match if their date2 is after the start of the search range and date2 is before the end
$dateConditions[] = "(date2 >= ? AND date1 <= ?)";
Copy link
Member

@tsteur tsteur Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katebutler not sure if I'm understanding this correctly.

I have here this query generated:

UPDATE piwik_archive_numeric_2012_04 SET value = 4 WHERE name LIKE 'done%'
                   AND idsite IN (1)
                   AND (period = 5 AND ((date2 >= ? AND date1 <= ?)))

and the binds are ['2019-02-03', '2019-02-04']. Does date1 and date2 maybe need to be switched? The comment mentions twice date2 so that's also not quite clear. I called it like this:

./console core:invalidate-report-data --dates=2019-02-03,2019-02-04 --periods=range

Then I would expect date ranges like these to be updated:

2019-01-04,2019-03-03
2019-02-03,2019-03-03
2019-01-04,2019-02-03
2019-01-04,2019-02-04

but with this query I don't think this works? I might be seeing it wrong though. Or maybe we want it to behave differently? @mattab ?

* @return \Zend_Db_Statement
* @throws Exception
*/
public function updateAllRangeArchivesAsInvalidated($archiveTable, $idSites, $ranges, Segment $segment = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the method updateArchiveAsInvalidated https://github.com/matomo-org/matomo/blob/3.x-dev/core/DataAccess/Model.php#L108-L119 I think this is actually updating it as expected using (date1 <= ? AND ? <= date2)? I wonder is this method even needed? Not sure how it behaves different to the other method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually realising there is a new if in the mentioned method that only deletes an exact range when --periods=range is set. Not sure we want that behaviour. I reckon we can remove that if and restore old behaviour and then we can also use the updateArchiveAsInvalidated method again. I reckon it might be fine to use same behaviour for when using --periods=range and when using --periods=all. Currently it seems to be using different behaviour when using --periods=range where it only deletes the exact given range but it should maybe delete all ranges it is present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @mattab not exactly sure what behaviour we want

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using --periods=all. Currently it seems to be using different behaviour when using --periods=range where it only deletes the exact given range but it should maybe delete all ranges it is present.

👍 when --periods=all then all ranges should be invalidated.

$dateConditions[] = "(date1 <= ? AND ? <= date2)";
$bind[] = $date;
$bind[] = $date;
if ($periodType == Period\Range::PERIOD_ID) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the if I mean below. I reckon we could restore old behaviour and delete all ranges it affects when using --period=range. I understand when using --period=range the problem is that it wouldn't iterate over all archive tables. For this we could always call markArchivesOverlappingRangeAsInvalidated() for the ranges and instead of updateAllRangeArchivesAsInvalidated call updateArchiveAsInvalidated.

@tsteur
Copy link
Member

tsteur commented Oct 14, 2019

@katebutler lets discuss details tmrw. Also waiting to here from @mattab the comments. I reckon though it be good to let it behave the same way when --periods=range is used and when --periods=all. We don't really want it to behave differently.

@tsteur
Copy link
Member

tsteur commented Oct 14, 2019

It's hard to explain what I meant so quickly made 3-4 changes... lets discuss later and see what makes sense... I meant something like this:

https://github.com/matomo-org/matomo/compare/14178_2?expand=1

see my last commit df13cbe where we basically handle period range separately. (didn't test it at all)... there are still some things that would need to be adjusted and restored maybe such as https://github.com/matomo-org/matomo/compare/14178_2?expand=1#diff-27aba6f51c03fb57f660eeb0b2a18484L234 maybe... seems to be making the whole PR easier though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In core:invalidate-report-data add support for invalidating date range
4 participants