@sgiehl opened this Pull Request on November 19th 2021 Member

Description:

fixes #18283

Note: The test changes are kind of expected. We had a couple of fixtures where ecommerce data was tracked, even though it actually wasn't enabled for that site.

For some Fixtures I have activate the ecommerce flag, while for others I kept it off on purpose. That way we can see that there are null records archived, even though some data would actually be tracked and available.

Review

@sgiehl commented on November 22nd 2021 Member

@tsteur won't $this->getProcessor()->processDependentArchive also (only) call the goals archiver, but with a segment? It should skip doing any queries in that case as well, but create empty archives. Thought it would be good to have empty ones there as well, to ensure they won't be created later maybe.

@tsteur commented on November 23rd 2021 Member

@sgiehl AFAIK it would still try to resolve the segment because every time we process dependent archives it's also trying to resolve the number of visits we had for that segment even before we go into the goals archiver

image

@tsteur commented on November 23rd 2021 Member

The only way would be to initialise a new archive manually for that segment and writing empty entries and finishing the archive without even going in there

@sgiehl commented on November 23rd 2021 Member

Haven't yet debugged that through, but won't those "VisitsSummary" numbers be archived by the VisitFrequency archiver nevertheless. So it should at least be no additional "work" to do 🤔

@tsteur commented on November 23rd 2021 Member

It's something general the archiver does for any kind of archive. Even if it was a say customreports specific archive. I'm not sure if we can check for any other archives there as they may be outdated or be invalidated soon after this or something. Also would partially depend on the order in which plugins archive in the same archive etc.

@sgiehl commented on November 23rd 2021 Member

@tsteur added some code to skip the dependent archives if there are no goals and ecommerce isn't used. I guess that's the simplest solution for now even though that might mean that those archives might be archived later when that changes.

@tsteur commented on November 24th 2021 Member

@sgiehl we might need to do the same for the processDependentArchive calls in aggregateMultipleReports? I'm not entirely sure what Matomo does here but usual behaviour when Matomo tries to aggregate data, and it doesn't have the data for the subperiods, then it would launch the archiving for these subperiods. Aka it might try to launch the day archive for each dependent archive. I haven't double checked but wanted to check if you tested that case maybe?

@sgiehl commented on November 24th 2021 Member

@tsteur Haven't tested that. But as there won't be a done flag for that segment, I'm pretty sure the day periods would then be processed. Just pushed an update so it will be skipped as well

@tsteur commented on November 30th 2021 Member

@sgiehl I've just done some testing what happens when previously we were archiving days/weeks/months.

Then we add a goal.

Then it goes into the aggregateMultipleReports for a week or month.

It will then launch a processDependentArchive.

It will notice there's no such archive there yet.

It will initiate a new launch of archiving for each day. The parameters are then set again to root request and onlyArchiveRequestedPlugin is set to false meaning it will archive all plugins by the looks for that segment instead of only the goal specific one. That makes it potentially worse. It probably also means that we might sometimes launch the entire day of archiving for a dependent archive if the weeks or month being processed before a day which may be especially bad for browser archiving (in general, not just for this issue).

The call stack is quite long.

Screenshot 2021-11-30 at 1 34 42 PM
Screenshot 2021-11-30 at 1 35 54 PM

Even worse it will then launch processDependentArchive again for that day, and because self::$ARCHIVE_DEPENDENT is not set to false, it will eg combine the returning visit with the new visit segment and archive this too.

image

We could probably work around it by writing empty archives somewhat similar to this when no goal is configured:

image

But it makes me think maybe there is a problem with processDependentArchive overall as even if the processDependentArchive launches archiving for missing periods, then it should not suddenly lose these archive process parameters
image

How can we make sure that we wouldn't suddenly launch archiving for all plugins in that case and to make sure we only archive data we need? or maybe there are other ideas/fixes?

@github-actions[bot] commented on December 7th 2021 Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@sgiehl commented on December 21st 2021 Member

@tsteur I tried working further on that. But actually I did not yet find a solution that would really work.
What we actually tried to avoid is processing the VisitsSummary metrics when the dependent segment metrics for Goals are processed, when it is actually not needed.
Tried inserting an empty archive (so there is a done flag for goals), as well as changing the code, so VisitsSummary won't be processed at all if only one plugin is requested. In theory that works. Running the archiving only creates archives and done flags for the Goals plugin. The VisitsSummary metrics aren't archived, at least not in the Goals plugin.
The VisitFrequency plugin will always archive the VisitsSummary metrics from the same segments. So that will be done in most cases nevertheless.
But that actually does not really solve the problem. Because as soon as the report Goals.get is requested all those VisitsSummary archives would be processed, which is caused by:
https://github.com/matomo-org/matomo/blob/d5997bd1dd1218c09d8369b880a732af2c69e685/plugins/Goals/API.php#L472-L500
The getMetrics API tries to fetch the numeric record for nb_visits, which is a VisitsSummary metric...

The only useful solution I still would see is to process the dependent archives for goals in all cases (like it was in the beginning). The day archiver inserts empty archives and skips all queries. That for sure will also cause the core metrics being processed (which VisitsFrequency would do nevertheless).

~Nevertheless I've tried to do some other changes that might prevent that the requested plugin is lost during the archiving process.~ Had to revert that, as it broke almost all tests.

@tsteur commented on December 22nd 2021 Member

👍 we can keep it basically to as before. The only other thing would be that if no goals and no ecommerce is configured, to manually create these archives
image

Not sure if it's worth looking at this or not? I'm generally thinking it could make archiving quite a bit faster when we don't need to do the processDependentArchives as segments are always a bit slow to resolve to create this temporary table etc. But it's also not the end of the world if we just keep it now as is in the PR and always process the dependent archives.

@sgiehl commented on December 22nd 2021 Member

@tsteur I tried that, but it led to strange behavior when the VisitFrequency plugin was disabled.
Running the normal archiving created an empty done flag for Goals only. So far so good, but once the report Goals.get was requested an additional VisitsSummary done flag for those segment appeared, without having any additional data. Didn't debug that through, but guess that might cause other issues if that segment would be used later for other purpose, as the core metrics might stay empty.

@github-actions[bot] commented on December 30th 2021 Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions[bot] commented on January 6th 2022 Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@sgiehl commented on January 11th 2022 Member

@tsteur I've updated the tests. From my side everything should be done. Let me know if we can merge or if anything else needs to be adjusted.

@tsteur commented on January 11th 2022 Member

sounds good to merge @sgiehl 👍

This Pull Request was closed on January 11th 2022
Powered by GitHub Issue Mirror