Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make archiving process to respect process_new_segments_from settings #17351

Merged
merged 17 commits into from May 7, 2021

Conversation

flamisz
Copy link
Contributor

@flamisz flamisz commented Mar 16, 2021

Description:

fixes #17336

Review

  • Functional review done
  • Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed

@flamisz
Copy link
Contributor Author

flamisz commented Mar 16, 2021

Hi @diosmosis, @tsteur and I had a quick chat this morning, and this is what figured in the end.

I tested locally and I see it does return true when it's expected, but I don't know all the cases/edge cases to see if it really solves the bug you mentioned in your comment.

Can you please check it?

Thanks

@flamisz flamisz added this to the 4.3.0 milestone Mar 16, 2021
@diosmosis
Copy link
Member

@flamisz I see two possible issues w/ this solution (though it's on the right track):

  1. the process_new_segments_from value can be set to something like lastN or last update time (rather than segment creation time), which means the earliest value to archive from when rearchiving historical data can change. So we might rearchive 6 months in the past today (October), then next month (April) the user needs to invalidate and re-archive October, but that will be outside the range allowed by this code, and nothing will happen.
  2. it's possible for users to want old data processed for the segment that is before the date that is computed from process_new_segments_from. this solution would not allow users to do that at all.

@tsteur there might need to be some discussion on a good solution as it seems like it may be complicated to solve this.

@flamisz
Copy link
Contributor Author

flamisz commented Mar 18, 2021

It looks like we need more discussions on this issue. I pause working on it until you'll have more time to dig deeper into it.

@flamisz flamisz changed the title update canSkipThisArchive in Loader Make archiving process to respect process_new_segments_from settings Mar 18, 2021
@tsteur
Copy link
Member

tsteur commented Mar 18, 2021

@diosmosis great point. This makes me think it can't have worked in Matomo 3 either and we might need to lower the priority for this. Out of the box do you otherwise have any quick idea of how to fix this without the need of a big refactoring? Like ideally we'd know whether it was maybe forced to archive this segment for a range before segment creation but we can't really know (although maybe we could check the archive_invalidation table assuming an entry would exist there for the day etc in that case).

@diosmosis
Copy link
Member

@tsteur

Like ideally we'd know whether it was maybe forced to archive this segment for a range before segment creation but we can't really know (although maybe we could check the archive_invalidation table assuming an entry would exist there for the day etc in that case).

The only way I can think of is to have a new column on archive_invalidations like is_rearchiving. Then in applyScheduledRearchiving() and related methods, we save that column value as 1. If we see that in core:archive, we can attach a parameter to archiving requests isReArchiving=1, then we can detect it in an archiving request and handle every rearchiving case (or we can alternatively just do it for segments, that would be a bit more work though).

I can't think of a way to do it w/o a new column. I guess we could also just stick it on the name column value, but that would make the code a bit weirder imo.

@tsteur
Copy link
Member

tsteur commented Mar 19, 2021

thanks @diosmosis just double checking so something like this wouldn't work?

diff --git a/core/ArchiveProcessor/Loader.php b/core/ArchiveProcessor/Loader.php
index 74adf31215..dc2e8918a7 100644
--- a/core/ArchiveProcessor/Loader.php
+++ b/core/ArchiveProcessor/Loader.php
@@ -384,6 +384,11 @@ class Loader
             if ($segmentInfo) {
                 $segmentArchiveStartDate = $segmentArchiving->getReArchiveSegmentStartDate($segmentInfo);
                 if ($segmentArchiveStartDate->isLater($periodEnd)) {
+
+                    $invalidator = StaticContainer::get(ArchiveInvalidator::class);
+                    // hasSegmentArchiveInvalid doesn't exist yet but would check if it's currently exists in archive_invalidation table since we would have an entry there when it's actually forced
+                    if (!$invalidator->hasSegmentArchiveInvalid($idSite, $segmentHash, $plugin, $date1, $date2)) {
+                        return true;
+                    }
                 }
             }

Assuming that when someone invalidates eg a week, we also add the entries for each day to archive_invalidations? Not sure if that's the case though.

@diosmosis
Copy link
Member

@tsteur I'm not sure that approach will work (if I understand it correctly). When checking a year, for example, the archiver should have already handled months, weeks days in the year, so they would have been removed from the invalidations table.

@tsteur
Copy link
Member

tsteur commented Mar 19, 2021

Which be good that they are removed meaning it wouldn't launch archiving anyway in that case? as the days etc are already archived and it would only aggregate the existing data?

@diosmosis
Copy link
Member

@tsteur thinking out loud:

so there are three cases, one where we archive correctly where date range > than process_new_segments_from's date:

  • archiver has day, week, month invalidations and processes them
  • the day, week, month invalidations get removed
  • the year archive starts, tries to get month archives
  • all months are > date range, so we don't skip
  • all months are already archived so we use them

this case works.

next case, we archive correctly where some of date range < process_new_segments_from's date:

  • archiver has day, week, month invalidations and processes them
  • some day/week/month invalidations are before process_new_segments_from date, they get skipped
  • year archive is initialized, is not entirely before process_new_segments_from date, gets archived
  • months inside get skipped or aggregated because they are either before the date or already archived

This is what we want.

third case, user invalidates date range before process_new_segments_from's date, because they want that data rearchived (not automated use case):

  • archiver has day, week, month invalidations and processes them
  • day/week/month invalidations are before process_new_segments_from date, they get skipped
  • year archive is initialized, is entirely before process_new_segments_from date, does not get archived
  • months inside get skipped or aggregated because they are either before the date or already archived
  • full set of dates is not archived because it is before process_new_segments_from date

This is not what we want, because the user specifically invalidated this date range, it wasn't automated. Checking the invalidation table isn't a way to differentiate between user initiated invalidations and scheduled rearchiving.

there are probably some edge cases when archives fail for some reason, that might be an entirely different issue.

@tsteur
Copy link
Member

tsteur commented Mar 22, 2021

third case, user invalidates date range before process_new_segments_from's date, because they want that data rearchived (not automated use case):

I suppose you mean an actual custom date range like 2020-02-02,2020-02-04? When a date range is being invalidated, do we put the individual days and weeks etc into the invalidations table?

Checking the invalidation table isn't a way to differentiate between user initiated invalidations and scheduled rearchiving.

Because of the setting there should be no invalidations from scheduled rearchiving so I thought it is maybe not needed to differentiate between user initiated? Basically, my thoughts (and hope) was that we would have entries in invalidations table before the segment creation time only when it was triggered by a user and then we could force the archiving maybe?

@diosmosis
Copy link
Member

diosmosis commented Mar 22, 2021

@tsteur

I suppose you mean an actual custom date range like 2020-02-02,2020-02-04? When a date range is being invalidated, do we put the individual days and weeks etc into the invalidations table?

No, sorry I didn't mean that. I just meant something invalidated by the user and not rearchived.

Basically, my thoughts (and hope) was that we would have entries in invalidations table before the segment creation time only when it was triggered by a user and then we could force the archiving maybe?

These invalidations are added right before archiving for a site, so they will always be there before archiving (and are removed after archiving). Sorry, I'm not entirely sure what you're getting at. The day archives will get removed after the day is archived, then the week will start archiving, but this is true of rearchiving AND of user initiated invalidation. There's no difference at the moment, which sort of the problem. It would be fairly trivial to signify it though w/ an extra db column on the invalidation table. Or we could hack something into the name column.

@mattab
Copy link
Member

mattab commented Mar 24, 2021

Quickly wanted to check in on how this PR is progressing and what would be the next step(s)? (sorry haven't read the whole discussion 🙈 )

(asking because this PR is the first "stage" of hoping to fix https://innocraft.atlassian.net/jira/software/projects/L3/boards/20?selectedIssue=L3-46 which is not currently a major issue it seems (as customer hasn't complained) but is nevertheless quite important issue (fyi they are an enterprise customer). also it generates many emails).
Thanks,

@tsteur
Copy link
Member

tsteur commented Mar 25, 2021

The day archives will get removed after the day is archived, then the week will start archiving, but this is true of rearchiving AND of user initiated invalidation.

@diosmosis can you maybe explain what you mean by this is true of rearchiving? I'm not quite sure what you mean with rearchiving here?

@mattab this one is quite complex and it's not a regression AFAIK and it'll take a while longer. Maybe there's another workaround meanwhile

@diosmosis
Copy link
Member

@tsteur can you maybe explain what you mean by this is true of rearchiving? I'm not quite sure what you mean with rearchiving here?

I mean there's no difference between "user invalidates archive, it gets processed by core:archive" and "system schedules rearchiving, system invalidates archives, they get processed by core:archive". There's no way to tell those invalidations apart. BUT in this issue, we want to apply the archive limiting ONLY to rearchiving segments. If a user invalidates in the past before process_new_segments_from, we want the data to be rearchived.

So we need some way to tell the invalidations apart, whether they were created via user invalidation or via system invalidation.

If a user tracks data in the past before process_new_segments_from, we also want that to archive as well.

@mattab
Copy link
Member

mattab commented Mar 25, 2021

@tsteur afaik the other workaround might be to implement rankingQuery in custom dimensions archiving (what @diosmosis had suggested in the L3)

@diosmosis
Copy link
Member

@mattab I can work on that as part of the L3

@tsteur
Copy link
Member

tsteur commented Mar 26, 2021

If a user tracks data in the past before process_new_segments_from, we also want that to archive as well.

FYI for segments we actually wouldn't want to archive it if a user tracks data for a period that is eg before segment creation date. We'd only want to archive it if user specifically invalidates it. If we could have a check to not create the invalidations for such a segment when data is tracked in the past, then it would probably work maybe?

@diosmosis
Copy link
Member

diosmosis commented Mar 26, 2021

@tsteur does that apply to all possible values of process_new_segments_from or just creation date? ie, if it is set to last6, then two months pass, and the user tracks data from 7 months ago (which was after when the segment was created), we archive or not?

And since we can sometimes archive in the past when creating a segment (ie, if set to last6, and we create it in Nov, then we rearchive from May - Nov), should we still allow tracking data in May and rearchiving?

And would it not apply if process_new_segments_from is set to beginning_of_time?\

We'd only want to archive it if user specifically invalidates it.

Even with this, we'd still need to differentiate between user created invalidation and system created invalidation.

@tsteur
Copy link
Member

tsteur commented Mar 29, 2021

@tsteur does that apply to all possible values of process_new_segments_from or just creation date? ie, if it is set to last6, then two months pass, and the user tracks data from 7 months ago (which was after when the segment was created), we archive or not?

We would archive the last 6 months before creation date (according to the documentation in the global ini config). But we wouldn't archive the reports 7 months before creation date if someone was to track data into this.

And since we can sometimes archive in the past when creating a segment (ie, if set to last6, and we create it in Nov, then we rearchive from May - Nov), should we still allow tracking data in May and rearchiving?

We would still allow tracking because it's required for All Visits but we wouldn't rearchive the data for this segment if it is before the 6 months. We would only do it if someone was to manually invalidate the data maybe to trigger the archiving this way.

@diosmosis
Copy link
Member

@tsteur

We would only do it if someone was to manually invalidate the data maybe to trigger the archiving this way.

This is the problem I'm trying to describe: there is no way currently to tell between an invalidation that was manually invalidated and an invalidation that was created through reArchiveReport.

@tsteur
Copy link
Member

tsteur commented Mar 29, 2021

I guess my thinking is that reArchiveReport would automatically always correctly respect the configured process_new_segments_from setting so we wouldn't need to think about it. Eg when you create or edit a segment, it should never invalidate an archive before the configured date. Eg if you create a segment today, and configure last10, then it would only invalidate the last 10 days but nothing before that date (day 11 and previous days). If we can trust that this works properly I'm thinking it should be no problem unless scheduleReArchiving is called by the system without respecting the configured date. Not sure if the system would create them somewhere and not respect the configuration?

@diosmosis
Copy link
Member

@tsteur

guess my thinking is that reArchiveReport would automatically always correctly respect the configured process_new_segments_from setting so we wouldn't need to think about it.

Well the issue isn't that it will invalidate more than is necessary, it already only invalidates what it needs to (eg, last6, invaldates the last 6 months, but ALSO the year 2021 and 2020). This ends up triggering archiving of missing child archives, so reArchiveReport can't really stop child archives for those invalidations from being initiated, unless we add an extra column to the table and mark the invalidations in some way.

@diosmosis
Copy link
Member

diosmosis commented Mar 29, 2021

@tsteur another option if not adding a column is having new statuses, instead of just in progress/queued, maybe in progress/queued/queued rearchive. Then when we encounter queued rearchive, we can send a query param telling the core archiver not to archive child archives if not already present (or not to archive anything outside the segment date limits). This would require changing queries that check for status.

@tsteur
Copy link
Member

tsteur commented Apr 7, 2021

Technically we could add a new column but trying to avoid making anything possibly more complicated etc then it needs to be.

Well the issue isn't that it will invalidate more than is necessary, it already only invalidates what it needs to (eg, last6, invaldates the last 6 months, but ALSO the year 2021 and 2020). This ends up triggering archiving of missing child archives, so reArchiveReport can't really stop child archives for those invalidations from being initiated, unless we add an extra column to the table and mark the invalidations in some way.

I'm really thinking that be generally find if it triggers year 2021 and 2020 as by then it would have already triggered the days, weeks, and months that are allowed to be archived. I'll try to show what I mean below otherwise might need to just try and see if it works or go on a call to chat so we're sure we're meaning the same thing.

I'm thinking eg of these flows:

process_new_segments_from=last91. Segment is created on April 30th 2021. It means it is allowed to archive data from January 29th 2021. And we'd create these archive_invalidations:

  • day, 29th Jan 2021
  • day, 30th Jan 2021
  • day, 31st Jan 2021
  • month Jan 2021
  • day, 1st Feb 2021
  • day, 2nd Feb 2021
  • day, 3rd Feb 2021
  • week, 28th Jan-3rd Feb 2021
  • day, 4th Feb 2021
  • ...
  • month Feb 2021
  • ...
  • month, March 2021
  • ...
  • month, April 2021
  • year April 2021

When it then tries to archive any day before 29th Jan 2021 we wouldn't ever archive that period since no entry in archive_invalidations exist and it is before process_new_segments_from=last91.

When we then trigger the archive of day 29th Jan 2021, it would allow this since that entry exist.

If it was to trigger the archive of year 2021 before any of the other months or weeks or days were processed for some reasons, then it would still allow the archiving of these subperiods because they exist in the archive_invalidations table.

If it was to trigger the archive of year 2021 and all the other archive_invalidations were already processed and all the entries don't exist in the table anymore then it would correctly not launch the archiving of these subperiods (since it can't find a matching entry in the table) which be fine because would have been already archived anyway and there is no need to archive them again.

If then later a tracking request comes for 22nd Jan 2021, then we wouldn't invalidate anything for this segment because it is before process_new_segments_from=last91 (Jan 29th 2021).

If then later a tracking request comes for 31st Jan 2021, we would invalidate the following entries:

  • day, 31st Jan 2021
  • month Jan 2021
  • day, 1st Feb 2021
  • day, 2nd Feb 2021
  • day, 3rd Feb 2021
  • week, 28th Jan-3rd Feb 2021
  • month Jan 2021
  • year 2021

and the same flow would work again because it would only process these days.

If a user was to invalidate manually any date, that be fine too and we'd allow to invalidate any date (even before Jan 29th) and the system would correctly archive only the ones specifically that were invalidated as the archive_invalidations would exist.

I might be missing some things but not sure. Hoping my thinking might be bit more clear maybe?

Sorry for the late response.

@diosmosis
Copy link
Member

@tsteur haven't read the comment, but just fyi, last91 would be in months not days

@diosmosis
Copy link
Member

@tsteur

If a user was to invalidate manually any date, that be fine too and we'd allow to invalidate any date (even before Jan 29th) and the system would correctly archive only the ones specifically that were invalidated as the archive_invalidations would exist.

This is where I think it doesn't work. If the user invalidates before Jan 29th, say they invalidate the day Dec 1, 2020. The day period is added to archive_invalidations, as well as the week, month and year.

When we archive them we would archive the day first. I think in your solution we check for whether there are any child period invalidations in the table before last91? There aren't for the day, so we archive it. Then we remove the entry from archive_invalidations.

Then we try to archive the week. But there are no day periods before last91 for the segment, so it never archives. Instead we assume we don't have to rearchive, and remove the archive_invalidation row. Same thing happens for month and year.

Does this make sense or would this happen a different way?

@tsteur
Copy link
Member

tsteur commented Apr 7, 2021

@diosmosis process_new_segments_from=last91 is supposed to be days, not months just fyi.

I added a commit that roughly shows what I was thinking. I didn't actually test it and think it would maybe need to be adjusted slightly to get maybe correct date in timezone and the right done flags (might have to test multiple done flags not sure).

When we archive them we would archive the day first. I think in your solution we check for whether there are any child period invalidations in the table before last91? There aren't for the day, so we archive it. Then we remove the entry from archive_invalidations. Then we try to archive the week. But there are no day periods before last91 for the segment, so it never archives. Instead we assume we don't have to rearchive, and remove the archive_invalidation row. Same thing happens for month and year.

We would correctly archive the day first. We wouldn't check if there are any child period invalidations in the table. Only if the same period/date exists in the invalidations table then we'd allow the archiving. We would only archive the days and weeks and months that were requested and wouldn't archive any other ones. Maybe check the PR to see what I'm thinking.

@diosmosis
Copy link
Member

process_new_segments_from=last91 is supposed to be days, not months just fyi.

You're right, I was thinking of the rearchive lastN INI config.

Only if the same period/date exists in the invalidations table then we'd allow the archiving. We would only archive the days and weeks and months that were requested and wouldn't archive any other ones. Maybe check the PR to see what I'm thinking.

Ok, I see, so if year is invalidated, but not a month in the year, we can tell the month shouldn't be invalidated, at the time of invalidating the month (rather than checking whether we should launch archiving for child periods when archiving the year). It makes sense to me now.

@flamisz flamisz added the Needs Review PRs that need a code review label Apr 20, 2021
@flamisz
Copy link
Contributor Author

flamisz commented Apr 22, 2021

Hi @diosmosis, can you please have the last look at this PR? Tests are looking good, I check those failing UI tests.

EDIT: those UI tests are solved with the latest commit.

Copy link
Member

@diosmosis diosmosis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments

core/ArchiveProcessor/Loader.php Show resolved Hide resolved
core/ArchiveProcessor/Loader.php Outdated Show resolved Hide resolved
);

$table = Common::prefixTable('archive_invalidations');
$sql = "SELECT idinvalidation FROM `$table` WHERE idsite = ? AND date1 = ? AND date2 = ? AND `period` = ? AND `name` = ?";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query could have a LIMIT 1 in case there are invalidations w/ the same params, but different $reports.

EDIT: actually, if we get a request to do a whole archive, but there is only an invalidation w/ a report value set, we should probably still skip. It might be needed to take into account any requested report (cc @tsteur).

Copy link
Contributor Author

@flamisz flamisz Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we turn off the browser triggered archiving, when we load a report it will archive for that report, only for that one, am I right? Could it be happen somehow this gets into the invalidation table? Because if it can, we don't want to skip in this case, do we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we turn off the browser triggered archiving, when we load a report it will archive for that report, only for that one, am I right?

I'm not sure what you mean. The 'archive only specific report' functionality is only triggered through core:archive and if there is a 'report' value set in archive_invalidations. This only happens if we reArchiveReport for a specific report. It never gets triggered via browser triggered archiving.

The issue I'm thinking of is if there is a request the archive, eg, period = day, date = 2020-01-02, idSite = 2, segment = blahblahblah, plugin = Plugin and the only item in the invalidation table is:

idsite = 2, period = day, date1 = 2020-01-02, date2 = 2020-01-02, name = done . md5(blahblahblah).Plugin, report = specificreport

In this case if everything else about the period/segment is true (it's outside of the process_new_segment_from range), we don't want to archive, because it would archive more then the specific report (it would do the whole plugin).

If we receive another request where report = specificreport as well, the we do want to archive because it's in the table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I was confused. I thought about segment archiving in browser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't modified the reports part but added the limit 1.

I'm going to add the specific report case and see how it goes with tests.

core/CronArchive.php Show resolved Hide resolved
// automatically, only a user can specifically invalidate
continue;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code (the function calls and ifs) looks very similar to the above code. Can it be moved to a method to reduce duplication? Maybe a method in the SegmentArchiving class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flamisz this wasn't addressed, did you discuss it w/ @tsteur or something?

Copy link
Contributor Author

@flamisz flamisz May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any useful way to introduce a new function, but refactored both files, mostly get rid off temporary variables.

Copy link
Member

@diosmosis diosmosis May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flamisz I was thinking something like:

SegmentArchiving.php

public function getSegmentReArchiveDateIfPastEndOfArchivingPeriod($params)
{
    $segmentInfo = $this->findSegmentForHash($params->getSegment()->getHash(), $params->getSite()->getId());

        if (!$segmentInfo) {
            return false;
        }

        $segmentArchiveStartDate = $this->getReArchiveSegmentStartDate($segmentInfo);

    if ($segmentArchiveStartDate !==null && $segmentArchiveStartDate->isLater($params->getPeriod()->getDateEnd()->getEndOfDay())) {
        return $segmentArchiveDate;
    }

    return false;
}

then in Loader.php:

    public function canSkipArchiveForSegment()
    {
        /** @var SegmentArchiving */
        $segmentArchiving = StaticContainer::get(SegmentArchiving::class);
        $segmentReArchiveDate = $segmentArchiving->getSegmentReArchiveDateIfPastEndOfArchivingPeriod($this->params);

        if (!empty($segmentReArchiveDate)) {
            $doneFlag = Rules::getDoneStringFlagFor(
                [$params->getSite()->getId()],
                $params->getSegment(),
                $params->getPeriod()->getLabel(),
                $params->getRequestedPlugin()
            );

            // if there is no invalidation where the report is null, we can skip
            // if we have invalidations for the period and name, but only for a specific reports, we can skip
            // if the report is not null we only want to rearchive if we have invalidation for that report
            // if we don't find invalidation for that report, we can skip
            return !$this->dataAccessModel->hasInvalidationForPeriodAndName($idSite, $params->getPeriod(), $doneFlag, $params->getArchiveOnlyReport());
        }

        return false;
    }

and above in CronArchive.php:

$segmentReArchiveDate = $segmentArchiving->getSegmentReArchiveDateIfPastEndOfArchivingPeriod($this->params);
if (!empty($segmentReArchiveDate)) {
    // the system is not allowed to invalidate reports for this period
    // automatically, only a user can specifically invalidate
    continue;
}

Maybe this doesn't work or it's not a good idea for another reason, I'll just take your word for it since it's not a huge deal.

@flamisz flamisz removed the Needs Review PRs that need a code review label Apr 23, 2021
@flamisz
Copy link
Contributor Author

flamisz commented Apr 27, 2021

My latest commit contains the code is bringing into account requested report.
I wrote a couple of tests, but I couldn't test in my environment, because I don't know how to use this feature properly.
Can you please check the code @diosmosis when you are back? And maybe a couple of words about how can I test it in my environment? Thanks

@flamisz flamisz added the Needs Review PRs that need a code review label Apr 27, 2021
@tsteur
Copy link
Member

tsteur commented May 3, 2021

@diosmosis this is the last PR we need to get merged for 4.3 release. When you have time be great to have another look

// if we don't find invalidation for that report, we can skip
if ($params->getArchiveOnlyReport() && !$this->dataAccessModel->hasInvalidationForPeriodAndName($idSite, $params->getPeriod(), $doneFlag, $params->getArchiveOnlyReport())) {
return true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the two ifs above be simplified to just:

return !$this->dataAccessModel->hasInvalidationForPeriodAndName($idSite, $params->getPeriod(), $doneFlag, $params->getArchiveOnlyReport());

?

Since archive only report will be empty when it's not set, we can just pass it as is, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. To be honest, I'm not sure, can't remember exactly, but probably you are right. Needs to be tested to make sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought and walked through it, it should work the way you recommended, I updated the PR.

@diosmosis
Copy link
Member

@flamisz @tsteur left two new comments

@flamisz
Copy link
Contributor Author

flamisz commented May 5, 2021

@flamisz @tsteur left two new comments

Hi @diosmosis, I can see only one, I updated the pr based on that recommendation

@diosmosis
Copy link
Member

@flamisz this is the other one: #17351 (comment)

@flamisz
Copy link
Contributor Author

flamisz commented May 5, 2021

@flamisz this is the other one: #17351 (comment)

thanks @diosmosis I see. Yeah I didn't address that, totally forget. I look into it today.

@diosmosis
Copy link
Member

@flamisz some tests are failing now, looks like a quick fix, can you make an update? I'll merge afterwards.

@flamisz
Copy link
Contributor Author

flamisz commented May 7, 2021

@diosmosis it's been fixed now

@diosmosis diosmosis merged commit 1fcc105 into 4.x-dev May 7, 2021
@diosmosis diosmosis deleted the 17336-segment-archiving branch May 7, 2021 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting process_new_segments_from not respected when archiving bigger periods
4 participants