@sgiehl opened this Pull Request on November 19th 2021 Member

Description:

fixes #18283

Note: The test changes are kind of expected. We had a couple of fixtures where ecommerce data was tracked, even though it actually wasn't enabled for that site.

For some Fixtures I have activate the ecommerce flag, while for others I kept it off on purpose. That way we can see that there are null records archived, even though some data would actually be tracked and available.

Review

@sgiehl commented on November 22nd 2021 Member

@tsteur won't $this->getProcessor()->processDependentArchive also (only) call the goals archiver, but with a segment? It should skip doing any queries in that case as well, but create empty archives. Thought it would be good to have empty ones there as well, to ensure they won't be created later maybe.

@tsteur commented on November 23rd 2021 Member

@sgiehl AFAIK it would still try to resolve the segment because every time we process dependent archives it's also trying to resolve the number of visits we had for that segment even before we go into the goals archiver

image

@tsteur commented on November 23rd 2021 Member

The only way would be to initialise a new archive manually for that segment and writing empty entries and finishing the archive without even going in there

@sgiehl commented on November 23rd 2021 Member

Haven't yet debugged that through, but won't those "VisitsSummary" numbers be archived by the VisitFrequency archiver nevertheless. So it should at least be no additional "work" to do 🤔

@tsteur commented on November 23rd 2021 Member

It's something general the archiver does for any kind of archive. Even if it was a say customreports specific archive. I'm not sure if we can check for any other archives there as they may be outdated or be invalidated soon after this or something. Also would partially depend on the order in which plugins archive in the same archive etc.

@sgiehl commented on November 23rd 2021 Member

@tsteur added some code to skip the dependent archives if there are no goals and ecommerce isn't used. I guess that's the simplest solution for now even though that might mean that those archives might be archived later when that changes.

@tsteur commented on November 24th 2021 Member

@sgiehl we might need to do the same for the processDependentArchive calls in aggregateMultipleReports? I'm not entirely sure what Matomo does here but usual behaviour when Matomo tries to aggregate data, and it doesn't have the data for the subperiods, then it would launch the archiving for these subperiods. Aka it might try to launch the day archive for each dependent archive. I haven't double checked but wanted to check if you tested that case maybe?

@sgiehl commented on November 24th 2021 Member

@tsteur Haven't tested that. But as there won't be a done flag for that segment, I'm pretty sure the day periods would then be processed. Just pushed an update so it will be skipped as well

@tsteur commented on November 30th 2021 Member

@sgiehl I've just done some testing what happens when previously we were archiving days/weeks/months.

Then we add a goal.

Then it goes into the aggregateMultipleReports for a week or month.

It will then launch a processDependentArchive.

It will notice there's no such archive there yet.

It will initiate a new launch of archiving for each day. The parameters are then set again to root request and onlyArchiveRequestedPlugin is set to false meaning it will archive all plugins by the looks for that segment instead of only the goal specific one. That makes it potentially worse. It probably also means that we might sometimes launch the entire day of archiving for a dependent archive if the weeks or month being processed before a day which may be especially bad for browser archiving (in general, not just for this issue).

The call stack is quite long.

Screenshot 2021-11-30 at 1 34 42 PM
Screenshot 2021-11-30 at 1 35 54 PM

Even worse it will then launch processDependentArchive again for that day, and because self::$ARCHIVE_DEPENDENT is not set to false, it will eg combine the returning visit with the new visit segment and archive this too.

image

We could probably work around it by writing empty archives somewhat similar to this when no goal is configured:

image

But it makes me think maybe there is a problem with processDependentArchive overall as even if the processDependentArchive launches archiving for missing periods, then it should not suddenly lose these archive process parameters
image

How can we make sure that we wouldn't suddenly launch archiving for all plugins in that case and to make sure we only archive data we need? or maybe there are other ideas/fixes?

Powered by GitHub Issue Mirror