@flamisz opened this Pull Request on March 16th 2021 Contributor

Description:

fixes #17336

Review

  • [ ] Functional review done
  • [ ] Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
  • [ ] Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • [ ] Security review done see checklist
  • [ ] Code review done
  • [ ] Tests were added if useful/possible
  • [ ] Reviewed for breaking changes
  • [ ] Developer changelog updated if needed
  • [ ] Documentation added if needed
  • [ ] Existing documentation updated if needed
@flamisz commented on March 16th 2021 Contributor

Hi @diosmosis, @tsteur and I had a quick chat this morning, and this is what figured in the end.

I tested locally and I see it does return true when it's expected, but I don't know all the cases/edge cases to see if it really solves the bug you mentioned in your comment.

Can you please check it?

Thanks

@diosmosis commented on March 17th 2021 Member

@flamisz I see two possible issues w/ this solution (though it's on the right track):

  1. the process_new_segments_from value can be set to something like lastN or last update time (rather than segment creation time), which means the earliest value to archive from when rearchiving historical data can change. So we might rearchive 6 months in the past today (October), then next month (April) the user needs to invalidate and re-archive October, but that will be outside the range allowed by this code, and nothing will happen.
  2. it's possible for users to want old data processed for the segment that is before the date that is computed from process_new_segments_from. this solution would not allow users to do that at all.

@tsteur there might need to be some discussion on a good solution as it seems like it may be complicated to solve this.

@flamisz commented on March 18th 2021 Contributor

It looks like we need more discussions on this issue. I pause working on it until you'll have more time to dig deeper into it.

@tsteur commented on March 18th 2021 Member

@diosmosis great point. This makes me think it can't have worked in Matomo 3 either and we might need to lower the priority for this. Out of the box do you otherwise have any quick idea of how to fix this without the need of a big refactoring? Like ideally we'd know whether it was maybe forced to archive this segment for a range before segment creation but we can't really know (although maybe we could check the archive_invalidation table assuming an entry would exist there for the day etc in that case).

@diosmosis commented on March 18th 2021 Member

@tsteur

Like ideally we'd know whether it was maybe forced to archive this segment for a range before segment creation but we can't really know (although maybe we could check the archive_invalidation table assuming an entry would exist there for the day etc in that case).

The only way I can think of is to have a new column on archive_invalidations like is_rearchiving. Then in applyScheduledRearchiving() and related methods, we save that column value as 1. If we see that in core:archive, we can attach a parameter to archiving requests isReArchiving=1, then we can detect it in an archiving request and handle every rearchiving case (or we can alternatively just do it for segments, that would be a bit more work though).

I can't think of a way to do it w/o a new column. I guess we could also just stick it on the name column value, but that would make the code a bit weirder imo.

@tsteur commented on March 19th 2021 Member

thanks @diosmosis just double checking so something like this wouldn't work?

diff --git a/core/ArchiveProcessor/Loader.php b/core/ArchiveProcessor/Loader.php
index 74adf31215..dc2e8918a7 100644
--- a/core/ArchiveProcessor/Loader.php
+++ b/core/ArchiveProcessor/Loader.php
@@ -384,6 +384,11 @@ class Loader
             if ($segmentInfo) {
                 $segmentArchiveStartDate = $segmentArchiving->getReArchiveSegmentStartDate($segmentInfo);
                 if ($segmentArchiveStartDate->isLater($periodEnd)) {
+
+                    $invalidator = StaticContainer::get(ArchiveInvalidator::class);
+                    // hasSegmentArchiveInvalid doesn't exist yet but would check if it's currently exists in archive_invalidation table since we would have an entry there when it's actually forced
+                    if (!$invalidator->hasSegmentArchiveInvalid($idSite, $segmentHash, $plugin, $date1, $date2)) {
+                        return true;
+                    }
                 }
             }

Assuming that when someone invalidates eg a week, we also add the entries for each day to archive_invalidations? Not sure if that's the case though.

@diosmosis commented on March 19th 2021 Member

@tsteur I'm not sure that approach will work (if I understand it correctly). When checking a year, for example, the archiver should have already handled months, weeks days in the year, so they would have been removed from the invalidations table.

@tsteur commented on March 19th 2021 Member

Which be good that they are removed meaning it wouldn't launch archiving anyway in that case? as the days etc are already archived and it would only aggregate the existing data?

@diosmosis commented on March 19th 2021 Member

@tsteur thinking out loud:

so there are three cases, one where we archive correctly where date range > than process_new_segments_from's date:

  • archiver has day, week, month invalidations and processes them
  • the day, week, month invalidations get removed
  • the year archive starts, tries to get month archives
  • all months are > date range, so we don't skip
  • all months are already archived so we use them

this case works.

next case, we archive correctly where some of date range < process_new_segments_from's date:

  • archiver has day, week, month invalidations and processes them
  • some day/week/month invalidations are before process_new_segments_from date, they get skipped
  • year archive is initialized, is not entirely before process_new_segments_from date, gets archived
  • months inside get skipped or aggregated because they are either before the date or already archived

This is what we want.

third case, user invalidates date range before process_new_segments_from's date, because they want that data rearchived (not automated use case):

  • archiver has day, week, month invalidations and processes them
  • day/week/month invalidations are before process_new_segments_from date, they get skipped
  • year archive is initialized, is entirely before process_new_segments_from date, does not get archived
  • months inside get skipped or aggregated because they are either before the date or already archived
  • full set of dates is not archived because it is before process_new_segments_from date

This is not what we want, because the user specifically invalidated this date range, it wasn't automated. Checking the invalidation table isn't a way to differentiate between user initiated invalidations and scheduled rearchiving.

there are probably some edge cases when archives fail for some reason, that might be an entirely different issue.

@tsteur commented on March 22nd 2021 Member

third case, user invalidates date range before process_new_segments_from's date, because they want that data rearchived (not automated use case):

I suppose you mean an actual custom date range like 2020-02-02,2020-02-04? When a date range is being invalidated, do we put the individual days and weeks etc into the invalidations table?

Checking the invalidation table isn't a way to differentiate between user initiated invalidations and scheduled rearchiving.

Because of the setting there should be no invalidations from scheduled rearchiving so I thought it is maybe not needed to differentiate between user initiated? Basically, my thoughts (and hope) was that we would have entries in invalidations table before the segment creation time only when it was triggered by a user and then we could force the archiving maybe?

@diosmosis commented on March 22nd 2021 Member

@tsteur

I suppose you mean an actual custom date range like 2020-02-02,2020-02-04? When a date range is being invalidated, do we put the individual days and weeks etc into the invalidations table?

No, sorry I didn't mean that. I just meant something invalidated by the user and not rearchived.

Basically, my thoughts (and hope) was that we would have entries in invalidations table before the segment creation time only when it was triggered by a user and then we could force the archiving maybe?

These invalidations are added right before archiving for a site, so they will always be there before archiving (and are removed after archiving). Sorry, I'm not entirely sure what you're getting at. The day archives will get removed after the day is archived, then the week will start archiving, but this is true of rearchiving AND of user initiated invalidation. There's no difference at the moment, which sort of the problem. It would be fairly trivial to signify it though w/ an extra db column on the invalidation table. Or we could hack something into the name column.

@mattab commented on March 24th 2021 Member

Quickly wanted to check in on how this PR is progressing and what would be the next step(s)? (sorry haven't read the whole discussion :see_no_evil: )

(asking because this PR is the first "stage" of hoping to fix https://innocraft.atlassian.net/jira/software/projects/L3/boards/20?selectedIssue=L3-46 which is not currently a major issue it seems (as customer hasn't complained) but is nevertheless quite important issue (fyi they are an enterprise customer). also it generates many emails).
Thanks,

@tsteur commented on March 25th 2021 Member

The day archives will get removed after the day is archived, then the week will start archiving, but this is true of rearchiving AND of user initiated invalidation.

@diosmosis can you maybe explain what you mean by this is true of rearchiving? I'm not quite sure what you mean with rearchiving here?

@mattab this one is quite complex and it's not a regression AFAIK and it'll take a while longer. Maybe there's another workaround meanwhile

@diosmosis commented on March 25th 2021 Member

@tsteur can you maybe explain what you mean by this is true of rearchiving? I'm not quite sure what you mean with rearchiving here?

I mean there's no difference between "user invalidates archive, it gets processed by core:archive" and "system schedules rearchiving, system invalidates archives, they get processed by core:archive". There's no way to tell those invalidations apart. BUT in this issue, we want to apply the archive limiting ONLY to rearchiving segments. If a user invalidates in the past before process_new_segments_from, we want the data to be rearchived.

So we need some way to tell the invalidations apart, whether they were created via user invalidation or via system invalidation.

If a user tracks data in the past before process_new_segments_from, we also want that to archive as well.

@mattab commented on March 25th 2021 Member

@tsteur afaik the other workaround might be to implement rankingQuery in custom dimensions archiving (what @diosmosis had suggested in the L3)

@diosmosis commented on March 25th 2021 Member

@mattab I can work on that as part of the L3

@tsteur commented on March 26th 2021 Member

If a user tracks data in the past before process_new_segments_from, we also want that to archive as well.

FYI for segments we actually wouldn't want to archive it if a user tracks data for a period that is eg before segment creation date. We'd only want to archive it if user specifically invalidates it. If we could have a check to not create the invalidations for such a segment when data is tracked in the past, then it would probably work maybe?

@diosmosis commented on March 26th 2021 Member

@tsteur does that apply to all possible values of process_new_segments_from or just creation date? ie, if it is set to last6, then two months pass, and the user tracks data from 7 months ago (which was after when the segment was created), we archive or not?

And since we can sometimes archive in the past when creating a segment (ie, if set to last6, and we create it in Nov, then we rearchive from May - Nov), should we still allow tracking data in May and rearchiving?

And would it not apply if process_new_segments_from is set to beginning_of_time?\

We'd only want to archive it if user specifically invalidates it.

Even with this, we'd still need to differentiate between user created invalidation and system created invalidation.

@tsteur commented on March 29th 2021 Member

@tsteur does that apply to all possible values of process_new_segments_from or just creation date? ie, if it is set to last6, then two months pass, and the user tracks data from 7 months ago (which was after when the segment was created), we archive or not?

We would archive the last 6 months before creation date (according to the documentation in the global ini config). But we wouldn't archive the reports 7 months before creation date if someone was to track data into this.

And since we can sometimes archive in the past when creating a segment (ie, if set to last6, and we create it in Nov, then we rearchive from May - Nov), should we still allow tracking data in May and rearchiving?

We would still allow tracking because it's required for All Visits but we wouldn't rearchive the data for this segment if it is before the 6 months. We would only do it if someone was to manually invalidate the data maybe to trigger the archiving this way.

@diosmosis commented on March 29th 2021 Member

@tsteur

We would only do it if someone was to manually invalidate the data maybe to trigger the archiving this way.

This is the problem I'm trying to describe: there is no way currently to tell between an invalidation that was manually invalidated and an invalidation that was created through reArchiveReport.

@tsteur commented on March 29th 2021 Member

I guess my thinking is that reArchiveReport would automatically always correctly respect the configured process_new_segments_from setting so we wouldn't need to think about it. Eg when you create or edit a segment, it should never invalidate an archive before the configured date. Eg if you create a segment today, and configure last10, then it would only invalidate the last 10 days but nothing before that date (day 11 and previous days). If we can trust that this works properly I'm thinking it should be no problem unless scheduleReArchiving is called by the system without respecting the configured date. Not sure if the system would create them somewhere and not respect the configuration?

@diosmosis commented on March 29th 2021 Member

@tsteur

guess my thinking is that reArchiveReport would automatically always correctly respect the configured process_new_segments_from setting so we wouldn't need to think about it.

Well the issue isn't that it will invalidate more than is necessary, it already only invalidates what it needs to (eg, last6, invaldates the last 6 months, but ALSO the year 2021 and 2020). This ends up triggering archiving of missing child archives, so reArchiveReport can't really stop child archives for those invalidations from being initiated, unless we add an extra column to the table and mark the invalidations in some way.

@diosmosis commented on March 29th 2021 Member

@tsteur another option if not adding a column is having new statuses, instead of just in progress/queued, maybe in progress/queued/queued rearchive. Then when we encounter queued rearchive, we can send a query param telling the core archiver not to archive child archives if not already present (or not to archive anything outside the segment date limits). This would require changing queries that check for status.

@tsteur commented on April 7th 2021 Member

Technically we could add a new column but trying to avoid making anything possibly more complicated etc then it needs to be.

Well the issue isn't that it will invalidate more than is necessary, it already only invalidates what it needs to (eg, last6, invaldates the last 6 months, but ALSO the year 2021 and 2020). This ends up triggering archiving of missing child archives, so reArchiveReport can't really stop child archives for those invalidations from being initiated, unless we add an extra column to the table and mark the invalidations in some way.

I'm really thinking that be generally find if it triggers year 2021 and 2020 as by then it would have already triggered the days, weeks, and months that are allowed to be archived. I'll try to show what I mean below otherwise might need to just try and see if it works or go on a call to chat so we're sure we're meaning the same thing.

I'm thinking eg of these flows:

process_new_segments_from=last91. Segment is created on April 30th 2021. It means it is allowed to archive data from January 29th 2021. And we'd create these archive_invalidations:

  • day, 29th Jan 2021
  • day, 30th Jan 2021
  • day, 31st Jan 2021
  • month Jan 2021
  • day, 1st Feb 2021
  • day, 2nd Feb 2021
  • day, 3rd Feb 2021
  • week, 28th Jan-3rd Feb 2021
  • day, 4th Feb 2021
  • ...
  • month Feb 2021
  • ...
  • month, March 2021
  • ...
  • month, April 2021
  • year April 2021

When it then tries to archive any day before 29th Jan 2021 we wouldn't ever archive that period since no entry in archive_invalidations exist and it is before process_new_segments_from=last91.

When we then trigger the archive of day 29th Jan 2021, it would allow this since that entry exist.

If it was to trigger the archive of year 2021 before any of the other months or weeks or days were processed for some reasons, then it would still allow the archiving of these subperiods because they exist in the archive_invalidations table.

If it was to trigger the archive of year 2021 and all the other archive_invalidations were already processed and all the entries don't exist in the table anymore then it would correctly not launch the archiving of these subperiods (since it can't find a matching entry in the table) which be fine because would have been already archived anyway and there is no need to archive them again.

If then later a tracking request comes for 22nd Jan 2021, then we wouldn't invalidate anything for this segment because it is before process_new_segments_from=last91 (Jan 29th 2021).

If then later a tracking request comes for 31st Jan 2021, we would invalidate the following entries:

  • day, 31st Jan 2021
  • month Jan 2021
  • day, 1st Feb 2021
  • day, 2nd Feb 2021
  • day, 3rd Feb 2021
  • week, 28th Jan-3rd Feb 2021
  • month Jan 2021
  • year 2021

and the same flow would work again because it would only process these days.

If a user was to invalidate manually any date, that be fine too and we'd allow to invalidate any date (even before Jan 29th) and the system would correctly archive only the ones specifically that were invalidated as the archive_invalidations would exist.

I might be missing some things but not sure. Hoping my thinking might be bit more clear maybe?

Sorry for the late response.

@diosmosis commented on April 7th 2021 Member

@tsteur haven't read the comment, but just fyi, last91 would be in months not days

@diosmosis commented on April 7th 2021 Member

@tsteur

If a user was to invalidate manually any date, that be fine too and we'd allow to invalidate any date (even before Jan 29th) and the system would correctly archive only the ones specifically that were invalidated as the archive_invalidations would exist.

This is where I think it doesn't work. If the user invalidates before Jan 29th, say they invalidate the day Dec 1, 2020. The day period is added to archive_invalidations, as well as the week, month and year.

When we archive them we would archive the day first. I think in your solution we check for whether there are any child period invalidations in the table before last91? There aren't for the day, so we archive it. Then we remove the entry from archive_invalidations.

Then we try to archive the week. But there are no day periods before last91 for the segment, so it never archives. Instead we assume we don't have to rearchive, and remove the archive_invalidation row. Same thing happens for month and year.

Does this make sense or would this happen a different way?

@tsteur commented on April 7th 2021 Member

@diosmosis process_new_segments_from=last91 is supposed to be days, not months just fyi.

I added a commit that roughly shows what I was thinking. I didn't actually test it and think it would maybe need to be adjusted slightly to get maybe correct date in timezone and the right done flags (might have to test multiple done flags not sure).

When we archive them we would archive the day first. I think in your solution we check for whether there are any child period invalidations in the table before last91? There aren't for the day, so we archive it. Then we remove the entry from archive_invalidations. Then we try to archive the week. But there are no day periods before last91 for the segment, so it never archives. Instead we assume we don't have to rearchive, and remove the archive_invalidation row. Same thing happens for month and year.

We would correctly archive the day first. We wouldn't check if there are any child period invalidations in the table. Only if the same period/date exists in the invalidations table then we'd allow the archiving. We would only archive the days and weeks and months that were requested and wouldn't archive any other ones. Maybe check the PR to see what I'm thinking.

@diosmosis commented on April 8th 2021 Member

process_new_segments_from=last91 is supposed to be days, not months just fyi.

You're right, I was thinking of the rearchive lastN INI config.

Only if the same period/date exists in the invalidations table then we'd allow the archiving. We would only archive the days and weeks and months that were requested and wouldn't archive any other ones. Maybe check the PR to see what I'm thinking.

Ok, I see, so if year is invalidated, but not a month in the year, we can tell the month shouldn't be invalidated, at the time of invalidating the month (rather than checking whether we should launch archiving for child periods when archiving the year). It makes sense to me now.

@tsteur commented on April 13th 2021 Member

@flamisz be great to continue with this PR. I started adding some code for it but will probably need to explain what needs to be done here as a next step. Basically, if someone was to invalidate reports for a segment for dates where it would usually not be allowed, then we'd still want to archive these reports. See https://matomo.org/faq/how-to/faq_155/ re how to invalidate.

Step 1

Basically, let's assume you create a segment on April 13th 2021 and you have set to only allow archiving from segment creation time. Then any date before April 13th 2021 we would usually not archive. However, if someone was to call something like this command below then we would want to archive the data after all:

./console core:invalidate-report-data --dates=2021-01-01,2021-03-15 --sites=1 --segment=3

This command would invalidate segment ID 3 for site 1 for the mentioned dates and should create entries in the archive_invalidations table. If in https://github.com/matomo-org/matomo/pull/17351/files#diff-3ad7e38a2ff2d49f9ac9f268bf0ce70db9b1a924be38e5802816acc0ae343d32R388-R409 we then find an entry for this segment in archive_invalidations table then we would allow the archiving of that period.

Step 2

We'd also need to adjust https://github.com/matomo-org/matomo/blob/4.x-dev/core/CronArchive.php#L897 and only invalidate a segment automatically if we're allowed to based on the process_new_segments_from setting. Currently, AFAIK this code path is triggered when you track for example a tracking request for 10 days in the past. Say for April 3rd 2021. But archiving the specific segment is only allowed from April 13th 2021. Then when running ./console core:archive currently we would be invalidating the reports for this segment (even though it is before April 13th 2021) and we would then be archiving this report. This should however not be the case so we only want to invalidate reports there when allowed. Something like the code below (which I didn't test):

diff --git a/core/CronArchive.php b/core/CronArchive.php
index 0475067ce9..2430fbcc8a 100644
--- a/core/CronArchive.php
+++ b/core/CronArchive.php
@@ -890,6 +890,17 @@ class CronArchive
                 if ($this->isThereExistingValidPeriod($params)) {
                     $this->logger->debug('  Found usable archive for {archive}, skipping invalidation.', ['archive' => $params]);
                 } else {
+                    $segmentArchiving = StaticContainer::get(SegmentArchiving::class);
+                    $segmentInfo = $segmentArchiving->findSegmentForHash($segmentDefinition, $idSite);
+                    $periodEnd = $params->getPeriod()->getDateEnd();
+                    if ($segmentInfo) {
+                        $segmentArchiveStartDate = $segmentArchiving->getReArchiveSegmentStartDate($segmentInfo);
+
+                        if ($segmentArchiveStartDate->isLater($periodEnd)) {
+                            // the system is not allowed to invalidate reports for this period automatically, only a user can specifically invalidate 
+                            continue;
+                        }
+                    }
                     $this->getApiToInvalidateArchivedReport()->invalidateArchivedReports($idSite, $date, $period, urlencode($segmentDefinition),
                         $cascadeDown = false, $_forceInvalidateNonexistant);
                 }
@flamisz commented on April 15th 2021 Contributor

@tsteur I added your code to the cronarchive (with a little modification) and tested it. It looks fine.

Now we should write tests. Any thoughts on that? Or I can check what we already have and adjust them/add more where needed.

@tsteur commented on April 15th 2021 Member

Regarding test not 100% sure. Thought initially to maybe make invalidateWithSegments public and then test it but it may be better to test the use case so even if implementation changes we make sure that things work and also that it actually works what we want to achieve.

Basically, a system test could look like this:

  • set these settings
    • process_new_segments_from="segment_creation_time"
    • browser_archiving_disabled_enforce=1
    • and disable browser archiving (eg enable_browser_archiving_triggering=0 and enable_general_settings_admin=0)
  • Create site and segment (say April 16th 2021)
  • Track a request for a date say 10 days and 1 day before segment creation time, on creation time and 1 day after creation time. The requests would need to match the segment so eg the segment could be like pageUrl contains "e" and then track a request where the page url always contains e
  • Trigger a cron archive (there's some way in the tests to do this, or some other tests do it don't remember how exactly, maybe @diosmosis knows)
  • Then request a report like VisitsSummary.get API with a day period and pass a range like 2021-04-05,2021-04-26.

I think the test should then only show data for reports since creation date for the created segment but not any before that.

This be a rough idea and should hopefully work. not sure if I missed something.

@flamisz commented on April 15th 2021 Contributor

Currently we have failing tests and as @tsteur told me, if during the test we use the default process_new_segments_from = "beginning_of_time" config, we should get the same result.

I checked and we indeed use the default setting during the failing test.

So I investigated, why the test could fail.
It's failing because of the modifications in the CronArchive.php:

if ($segmentInfo) {
    $segmentArchiveStartDate = $segmentArchiving->getReArchiveSegmentStartDate($segmentInfo);

    if ($segmentArchiveStartDate !== null && $segmentArchiveStartDate->isLater($periodEnd)) {
        // the system is not allowed to invalidate reports for this period
        // automatically, only a user can specifically invalidate
        continue;
    }
}

It can happen that the $segmentArchiveStartDate->isLater($peridEnd) is true.

So I checked how we calculate that in the getReArchiveSegmentStartDate() method:

https://github.com/matomo-org/matomo/blob/6e3f165ab16f49969b7393ca752aad51e180bf0d/core/CronArchive/SegmentArchiving.php#L144-L166

The logic there is not "compatible" with the current logic we try to achieve, because we don't go back to the site creation date. We have this hardcoded 7 years DEFAULT_BEGINNING_OF_TIME_LAST_N_YEARS and every time we pick the higher date number. First, we go back 7 years, then check if this is earlier than the site creation, if not, we don't go back in time to the site creation date.

Hard to explain properly, but hopefully, you can understand it from the code.

I just shared it, because you probably understand more, why we have this logic here. Probably it's working and worked as it expected, so I have to modify the logic in the cronarchive.

Any thoughts on this @tsteur and @diosmosis?
👨‍💻

@flamisz commented on April 15th 2021 Contributor

Does that hardcoded 7 mean, we don't want to process segments back to more than 7 years? Because if that is the case, the current logic would do that, so the test result has to be changed.

@tsteur commented on April 18th 2021 Member

@flamisz @diosmosis is there much left to do? @flamisz not sure if your question has been answered yet? Regarding the 7 years yes I think it means that we don't archive data before 7 years. We'd like to have this in the next release ideally.

@flamisz commented on April 18th 2021 Contributor

My question not answered yet. Previously the CronArchive did not respect the the getReArchiveSegmentStartDate(), and in our tests we have very old dates, so the tests are not correct now.
If we want to respect the 7 years, we either modify the dates in the tests or modify the expected results.

Of course modifying the date will cause issue in the future, or we can use relative dates in these tests.

Either way it's not finished yet. Which one is the preferred solution?

@diosmosis commented on April 18th 2021 Member

@flamisz I'm not 100% sure what the issue is, but the last N years number can be overridden w/ the --force-date-last-n parameter. We use it in ArchiveCronTest::runArchivePhpCron() for example. Does that help?

@flamisz commented on April 18th 2021 Contributor

I'm not 100% sure what the issue is, but the last N years number can be overridden w/ the --force-date-last-n parameter. We use it in ArchiveCronTest::runArchivePhpCron() for example. Does that help?

@diosmosis thanks, I checked that, and we indeed set to 10 during the tests, but that number doesn't get into the constructor of the SegmentArchiving class.

https://github.com/matomo-org/matomo/blob/3969bb967fb6c82196394bbf0320a5c8a4f71ff7/core/CronArchive/SegmentArchiving.php#L70-L81

We create that class during the fixture setup. So maybe this is a problem? That number should be the same we setup with that parameter? What we setup with that parameter is the $dateLastForced here:

https://github.com/matomo-org/matomo/blob/3969bb967fb6c82196394bbf0320a5c8a4f71ff7/core/CronArchive.php#L147

If I modify that constant by hand before the test, it doesn't fail.
During the test I can see the $dateLastForced is what we setup with that parameter, but in the SegmentArchiving is still the 7.

@flamisz commented on April 18th 2021 Contributor

OK, I see what can be the problem, get back to you soon.

@diosmosis commented on April 19th 2021 Member
@diosmosis commented on April 19th 2021 Member

@flamisz actually I see another issue now, segment invalidation is done now in SegmentEditor API, so the param from the CLI can't be transferred there. It should probably be deprecated. We could use a new INI config to override that or just a DI property (though that's actually currently possible, so wouldn't need much of a change)

cc @tsteur

@tsteur commented on April 19th 2021 Member

Could use an ini setting and deprecate it indeed. That's unrelated though to this PR? Generally would there be otherwise any harm to simply set it to say last 10 years and not have it configurable? Actually, if people want the last 10 years or so they could also just set the setting to last3600 or so. I think it might not need a new setting and the default of 7 years is quite good.

@flamisz commented on April 22nd 2021 Contributor

Hi @diosmosis, can you please have the last look at this PR? Tests are looking good, I check those failing UI tests.

EDIT: those UI tests are solved with the latest commit.

@flamisz commented on April 27th 2021 Contributor

My latest commit contains the code is bringing into account requested report.
I wrote a couple of tests, but I couldn't test in my environment, because I don't know how to use this feature properly.
Can you please check the code @diosmosis when you are back? And maybe a couple of words about how can I test it in my environment? Thanks

@tsteur commented on May 3rd 2021 Member

@diosmosis this is the last PR we need to get merged for 4.3 release. When you have time be great to have another look

@diosmosis commented on May 5th 2021 Member

@flamisz @tsteur left two new comments

@flamisz commented on May 5th 2021 Contributor

@flamisz @tsteur left two new comments

Hi @diosmosis, I can see only one, I updated the pr based on that recommendation

@diosmosis commented on May 5th 2021 Member
@flamisz commented on May 5th 2021 Contributor

@flamisz this is the other one: #17351 (comment)

thanks @diosmosis I see. Yeah I didn't address that, totally forget. I look into it today.

@diosmosis commented on May 7th 2021 Member

@flamisz some tests are failing now, looks like a quick fix, can you make an update? I'll merge afterwards.

@flamisz commented on May 7th 2021 Contributor

@diosmosis it's been fixed now

This Pull Request was closed on May 7th 2021
Powered by GitHub Issue Mirror