@tsteur opened this Issue on February 10th 2022 Member

Refs DEV-2418; see there for more details.

When archiving a week, the code in https://github.com/matomo-org/matomo/blob/4.7.1/core/Archive.php#L629-L643 archives up to 2 days into the future. That means if today is 2022-02-09, archiving this week also archives 2022-02-10 and 2022-02-11. I noticed this in the archive tables:


In the screenshots above we can see that the archive was created, e.g. on 2nd Feb, for a date in the future.

That means we create archives for dates in the future, and of course these archives contain no data.
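
To make the date arithmetic concrete, here is a minimal Python sketch of the behaviour described above. It is not the Matomo code (that is PHP in core/Archive.php); the function names and the `lookahead_days` parameter are assumptions made for illustration, and the model simply allows every day of the current week up to today plus two days.

```python
from __future__ import annotations

from datetime import date, timedelta


def days_archived_for_week(today: date, lookahead_days: int = 2) -> list[date]:
    """Days of the week containing `today` that would be archived, assuming
    any day up to `today + lookahead_days` is allowed (hypothetical model)."""
    monday = today - timedelta(days=today.weekday())
    cutoff = today + timedelta(days=lookahead_days)
    week = [monday + timedelta(days=i) for i in range(7)]
    return [d for d in week if d <= cutoff]


def future_days(today: date, lookahead_days: int = 2) -> list[date]:
    """Subset of the archived days that lie after `today` (empty archives)."""
    return [d for d in days_archived_for_week(today, lookahead_days) if d > today]


if __name__ == "__main__":
    # With today = 2022-02-09 this lists 2022-02-10 and 2022-02-11.
    for d in future_days(date(2022, 2, 9)):
        print(d.isoformat())
```

With `lookahead_days=0` the same call returns no future days, which is the effect of simply not archiving dates in the future.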

I assume that these future archives are also created for regular archives, not only dependent archives. However, I believe it works for regular archives because they are invalidated again regularly, so the data is re-archived one or two days later. The logic for invalidating archives currently doesn't know which dependent archives exist with which segments, so I assume these never get invalidated.

I mean calls like the one in https://github.com/matomo-org/matomo/blob/4.7.1/plugins/Goals/Archiver.php#L124-L127 :
$this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::NEW_VISITOR_SEGMENT);
These calls also happen in non-core plugins, for example MediaAnalytics. As a result, the media map always shows 0 data, except in the rare case where some data already happens to be there.

The goal of this issue is that the Media Analytics map works correctly again (see DEV-2418 for all the details). We can see the problem e.g. on demo.

I'm not sure whether we still need to allow archiving any date in the future, or why this is there in https://github.com/matomo-org/matomo/blob/4.7.1/core/Archive.php#L629-L643 . This code goes back 9 years. There are likely a few ways to fix this:

  1. Maybe we don't need to archive data for dates in the future at all; then the issue would be fixed.
  2. Or maybe we need some event in something like https://github.com/matomo-org/matomo/blob/4.7.1/core/DataAccess/Model.php#L119 to also trigger archives for segments that are used by processDependentArchive.
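
The second option could look roughly like the sketch below. This is a hypothetical Python model, not Matomo code: the `DEPENDENT_SEGMENTS` registry, the `Invalidation` tuple and `expand_invalidation` are invented names, and the segment string is a placeholder for whatever `VisitFrequencyAPI::NEW_VISITOR_SEGMENT` resolves to. The only idea it illustrates is that invalidating an archive would also invalidate the segment archives that plugins request through processDependentArchive.

```python
from __future__ import annotations

from typing import NamedTuple


class Invalidation(NamedTuple):
    id_site: int
    date: str
    period: str
    segment: str  # "" stands for "no segment"


# Hypothetical registry: plugin name -> segments it passes to
# processDependentArchive. The segment value is a placeholder.
DEPENDENT_SEGMENTS: dict[str, list[str]] = {
    "Goals": ["<new-visitor-segment>"],
}


def expand_invalidation(inv: Invalidation) -> list[Invalidation]:
    """Return `inv` plus one invalidation per registered dependent segment."""
    expanded = [inv]
    for segments in DEPENDENT_SEGMENTS.values():
        for segment in segments:
            dependent = inv._replace(segment=segment)
            if dependent not in expanded:
                expanded.append(dependent)
    return expanded
```

A plugin such as MediaAnalytics would, under this assumption, register its own segments in the same registry so that its dependent archives are re-archived together with the regular ones.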

I've also created a related issue: "invalidation for dependent archives doesn't work" in #18772. Depending on how this issue is fixed, the other issue will be fixed too. I separated them because this one might be easier to fix by no longer archiving data in the future.

@heurteph-ei commented on March 7th 2022

We just encountered a similar issue (please confirm the similarity, or should I create another ticket?):

We use a segment based on visit dimension data.
On Friday, March 4th 2022:

  • We had data (number of visits, of page views, of hits) for the whole of January 2022.
  • I don't know why, but we did not have data for week 2022-01-31..2022-02-06.
  • We had data for the other weeks of February (2022-02-07..2022-02-13, 2022-02-14..2022-02-20, 2022-02-21..2022-02-27).
  • We had data for the last day of February (2022-02-28) and for other single days of the first week of March.

First problem: the data (displayed in custom reports) for the month of February is equal to the sum of week 2022-02-21..2022-02-27 and the single day 2022-02-28...

Then I decided to invalidate the month of February and also the last week of January (2022-01-31..2022-02-06):

  1. /?module=API&method=CoreAdminHome.invalidateArchivedReports&idSites=6&dates=2022-02-01&cascadeDown=1&period=month
  2. /?module=API&method=CoreAdminHome.invalidateArchivedReports&idSites=6&dates=2022-01-31&cascadeDown=1&period=week
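
For reference, the two requests above can also be built programmatically. This is a minimal Python sketch using only the standard library; the query parameters are exactly the ones shown in the list, while the helper name is an assumption of this example, and the base URL and authentication token are deliberately left out.

```python
from urllib.parse import urlencode


def invalidate_query(id_site: int, date: str, period: str,
                     cascade_down: bool = True) -> str:
    """Build the query string for CoreAdminHome.invalidateArchivedReports."""
    params = {
        "module": "API",
        "method": "CoreAdminHome.invalidateArchivedReports",
        "idSites": id_site,
        "dates": date,
        "cascadeDown": int(cascade_down),
        "period": period,
    }
    return "/?" + urlencode(params)


if __name__ == "__main__":
    print(invalidate_query(6, "2022-02-01", "month"))
    print(invalidate_query(6, "2022-01-31", "week"))
```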

Now the second problem:
We now have good data (number of visits, of page views, of hits) for the month of February 🥇 and the last week of January 🥇, but the data for the whole month of January is wrong (equal to the last day of January 😭).

@heurteph-ei commented on March 9th 2022

@bx80 do you think my problem will be solved by your PR?

@bx80 commented on March 27th 2022 Contributor

Hi @heurteph-ei, thanks for reaching out, sorry about the delay getting back to you.

Unfortunately I don't think this PR will resolve your problem. It fixes an issue where blank archives were being created for future time periods, which could potentially result in single day archives missing data, but if you've invalidated and regenerated archives for the affected time periods then that would have been corrected.

Did you try invalidating the whole month of January?

If you're still experiencing the problem then it'd be great if you could create a separate ticket so we can look into this further.

This Issue was closed on March 8th 2022