Dependent archives show only partial data or no report data #18773

Closed
tsteur opened this issue Feb 10, 2022 · 3 comments · Fixed by #18790

Comments

@tsteur
Member

tsteur commented Feb 10, 2022

refs DEV-2418 for more details.

When archiving a week, the code in https://github.com/matomo-org/matomo/blob/4.7.1/core/Archive.php#L629-L643 archives up to 2 days into the future. That means if today is 2022-02-09, then archiving this week also archives 2022-02-10 and 2022-02-11. I noticed this in the archive tables:

[Two screenshots of the archive tables]

In the screenshots above we can see that the archive was created e.g. on 2nd Feb for a date in the future.

That means we create archives for dates in the future, and of course these archives will have 0 data.
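To make the date arithmetic concrete, here is a minimal sketch in Python (not Matomo's PHP code). It assumes a Monday-to-Sunday week and a hard cap of "today + 2 days", as described above. The function name and the constant are hypothetical and only serve the illustration.

```python
from datetime import date, timedelta

# Assumption: the "up to 2 days into the future" behaviour described above.
FUTURE_DAYS_ALLOWED = 2


def days_archived_for_week(today):
    """Return the days of the Monday-to-Sunday week containing `today`
    that fall on or before today + FUTURE_DAYS_ALLOWED."""
    monday = today - timedelta(days=today.weekday())
    cutoff = today + timedelta(days=FUTURE_DAYS_ALLOWED)
    week = [monday + timedelta(days=i) for i in range(7)]
    return [d for d in week if d <= cutoff]


today = date(2022, 2, 9)
archived = days_archived_for_week(today)
future = [d for d in archived if d > today]
print([d.isoformat() for d in future])  # ['2022-02-10', '2022-02-11']
```

The two dates printed are the ones that would get an archive with 0 data in this scenario.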

I assume that these future archives are also created for regular archives, not only dependent archives. However, I believe it works for regular archives because they are invalidated again regularly, so the data is re-archived one or two days later. The logic for invalidating archives currently doesn't know which dependent archives exist with which segments, so I'm assuming these never get invalidated.

I mean calls like https://github.com/matomo-org/matomo/blob/4.7.1/plugins/Goals/Archiver.php#L124-L127
$this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::NEW_VISITOR_SEGMENT);. These calls also happen in non-core plugins, for example MediaAnalytics. As a result, the media map always shows 0 data, except in the rare case where some data already happens to be there.

I'm not sure whether we still need to allow archiving any date in the future, or why this is there in https://github.com/matomo-org/matomo/blob/4.7.1/core/Archive.php#L629-L643 . This code goes back 9 years.

The goal of this issue is that the Media Analytics map works correctly again (see DEV-2418 for all the details). We can see this e.g. on the demo.

There are likely a few ways to fix this:

  1. Maybe we don't need to archive data for future dates; then the issue would be fixed.
  2. Or maybe we need to have some event in something like https://github.com/matomo-org/matomo/blob/4.7.1/core/DataAccess/Model.php#L119 to also trigger archives for segments that are used by processDependentArchive.
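As a rough illustration of the first option only, the idea is to clamp the end of the period to today before deciding which days get archived. This is a Python sketch under that assumption, not the change that was eventually merged; `days_to_archive` is a hypothetical helper.

```python
from datetime import date, timedelta


def days_to_archive(period_start, period_end, today):
    """Return the days of the period that are not in the future.

    Hypothetical helper: the period end is clamped to `today`, so no
    empty archive would be created for a day that has not happened yet.
    """
    last_day = min(period_end, today)
    count = (last_day - period_start).days + 1
    return [period_start + timedelta(days=i) for i in range(max(count, 0))]


# Week 2022-02-07..2022-02-13, archived on 2022-02-09.
week_days = days_to_archive(date(2022, 2, 7), date(2022, 2, 13), date(2022, 2, 9))
print([d.isoformat() for d in week_days])  # ['2022-02-07', '2022-02-08', '2022-02-09']
```

A period that lies entirely in the future yields an empty list here, so nothing would be archived for it.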

I've also created a related issue, "invalidation for dependent archives doesn't work", in #18772 . Depending on how this issue is fixed, the other issue will be fixed too. I separated them because this one might be easier to fix, by no longer archiving data in the future.

@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label Feb 10, 2022
@tsteur tsteur added this to the 4.8.0 milestone Feb 10, 2022
@bx80 bx80 self-assigned this Feb 14, 2022
@justinvelluppillai justinvelluppillai modified the milestones: 4.8.0, 4.9.0 Mar 1, 2022
@tsteur tsteur added the Critical Indicates the severity of an issue is very critical and the issue has a very high priority. label Mar 2, 2022
@heurteph-ei

We just encountered a similar issue (please confirm the similarity, or should I create another ticket?):

We use a segment based on visit dimension data.
On Friday, March the 4th 2022:

  • We had data (number of visits, of page views, of hits) for whole January 2022.
  • I don't know why, but we did not have data for week 2022-01-31..2022-02-06.
  • We had data for the other weeks of February (2022-02-07..2022-02-13, 2022-02-14..2022-02-20, 2022-02-21..2022-02-27).
  • We had data for the last day of February (2022-02-28) and for other single days of the first week of March.

First problem: the data (displayed in custom reports) for the month of February is equal to the sum of week 2022-02-21..2022-02-27 and the single day 2022-02-28...

Then I decided to invalidate the month of February and also the last week of January (2022-01-31..2022-02-06):

  1. /?module=API&method=CoreAdminHome.invalidateArchivedReports&idSites=6&dates=2022-02-01&cascadeDown=1&period=month
  2. /?module=API&method=CoreAdminHome.invalidateArchivedReports&idSites=6&dates=2022-01-31&cascadeDown=1&period=week
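For anyone who wants to script the same two calls, here is a small Python sketch that only builds the URLs from the parameters shown above. The base URL is a placeholder, `invalidate_url` is a hypothetical helper, and authentication is left out because the original URLs do not show it.

```python
from urllib.parse import urlencode

# Placeholder host, not taken from the report above.
BASE_URL = "https://matomo.example.org"


def invalidate_url(id_site, first_day, period):
    """Build an invalidation URL with the same parameters as the calls above."""
    params = {
        "module": "API",
        "method": "CoreAdminHome.invalidateArchivedReports",
        "idSites": id_site,
        "dates": first_day,
        "cascadeDown": 1,
        "period": period,
    }
    return BASE_URL + "/?" + urlencode(params)


month_url = invalidate_url(6, "2022-02-01", "month")
week_url = invalidate_url(6, "2022-01-31", "week")
print(month_url)
print(week_url)
```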

Now the second problem:
While we now have good data (number of visits, of page views, of hits) for the month of February 🥇 and the last week of January 🥇, the data for the whole month of January is wrong (equal to the last day of January 😭).
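One thing that is easy to check is that the invalidated week 2022-01-31..2022-02-06 spans two calendar months, and 2022-01-31 is its only day in January. The Python sketch below only verifies that date overlap; it does not establish the cause of the wrong January figures, and `months_touched` is a hypothetical helper.

```python
from datetime import date, timedelta


def months_touched(week_start):
    """Return the calendar months (YYYY-MM) covered by the 7 days from `week_start`."""
    days = [week_start + timedelta(days=i) for i in range(7)]
    return sorted({d.strftime("%Y-%m") for d in days})


touched = months_touched(date(2022, 1, 31))
print(touched)  # ['2022-01', '2022-02']
```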

@heurteph-ei

@bx80 do you think my problem will be solved by your PR?

@bx80
Contributor

bx80 commented Mar 27, 2022

Hi @heurteph-ei, thanks for reaching out, sorry about the delay getting back to you.

Unfortunately I don't think this PR will resolve your problem. It fixes an issue where blank archives were being created for future time periods, which could potentially result in single day archives missing data, but if you've invalidated and regenerated archives for the affected time periods then that would have been corrected.

Did you try invalidating the whole month of January?

If you're still experiencing the problem then it'd be great if you could create a separate ticket so we can look into this further.

@justinvelluppillai justinvelluppillai added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Apr 11, 2022