Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid any archiving queries in Goals Archiver if no goals or ecommerce are active #18354

Merged
merged 24 commits into from Jan 11, 2022

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented Nov 19, 2021

Description:

fixes #18283

Note: The test changes are kind of expected. We had a couple of fixtures where ecommerce data was tracked, even though it actually wasn't enabled for that site.

For some Fixtures I have activate the ecommerce flag, while for others I kept it off on purpose. That way we can see that there are null records archived, even though some data would actually be tracked and available.

Review

@sgiehl sgiehl added this to the 4.7.0 milestone Nov 19, 2021
@sgiehl sgiehl added Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. labels Nov 19, 2021
@sgiehl sgiehl marked this pull request as ready for review November 19, 2021 10:27
Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgiehl I suppose it won't cause any issue if we don't implement similar logic in aggregate archives? I had this done initially in https://github.com/matomo-org/matomo/compare/improvegoalarchive?expand=1 but maybe it keeps code simpler if we don't do it like it did. Just not entirely sure if there could be downsides like if you request certain days and it doesn't find that archive, whether it would maybe try to archive it. But I think it won't as long as there is a done flag there. Just double checking.

The biggest benefit of this issue is to not having to go into $this->getProcessor()->processDependentArchive when there is no goals and archive. I think that's currently still the case. Could we ideally not call it when there are no goals and no ecommerce?

This saves us having to create a temporary table for the new_visit and returning visit segment and not having to fetch these visitors (which can be very slow)

plugins/Goals/Archiver.php Show resolved Hide resolved
@sgiehl
Copy link
Member Author

sgiehl commented Nov 22, 2021

@tsteur won't $this->getProcessor()->processDependentArchive also (only) call the goals archiver, but with a segment? It should skip doing any queries in that case as well, but create empty archives. Thought it would be good to have empty ones there as well, to ensure they won't be created later maybe.

@tsteur
Copy link
Member

tsteur commented Nov 23, 2021

@sgiehl AFAIK it would still try to resolve the segment because every time we process dependent archives it's also trying to resolve the number of visits we had for that segment even before we go into the goals archiver

image

@tsteur
Copy link
Member

tsteur commented Nov 23, 2021

The only way would be to initialise a new archive manually for that segment and writing empty entries and finishing the archive without even going in there

@sgiehl
Copy link
Member Author

sgiehl commented Nov 23, 2021

Haven't yet debugged that through, but won't those "VisitsSummary" numbers be archived by the VisitFrequency archiver nevertheless. So it should at least be no additional "work" to do 🤔

@tsteur
Copy link
Member

tsteur commented Nov 23, 2021

It's something general the archiver does for any kind of archive. Even if it was a say customreports specific archive. I'm not sure if we can check for any other archives there as they may be outdated or be invalidated soon after this or something. Also would partially depend on the order in which plugins archive in the same archive etc.

@sgiehl
Copy link
Member Author

sgiehl commented Nov 23, 2021

@tsteur added some code to skip the dependent archives if there are no goals and ecommerce isn't used. I guess that's the simplest solution for now even though that might mean that those archives might be archived later when that changes.

@tsteur
Copy link
Member

tsteur commented Nov 24, 2021

@sgiehl we might need to do the same for the processDependentArchive calls in aggregateMultipleReports? I'm not entirely sure what Matomo does here but usual behaviour when Matomo tries to aggregate data, and it doesn't have the data for the subperiods, then it would launch the archiving for these subperiods. Aka it might try to launch the day archive for each dependent archive. I haven't double checked but wanted to check if you tested that case maybe?

@sgiehl
Copy link
Member Author

sgiehl commented Nov 24, 2021

@tsteur Haven't tested that. But as there won't be a done flag for that segment, I'm pretty sure the day periods would then be processed. Just pushed an update so it will be skipped as well

@@ -181,28 +177,12 @@ public function test_pluginOnlyArchivingDoesNotRelaunchChildArchives_whenReusing
],
[
'idarchive' => '3',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this files are caused by the fact, that the dependent archives are no longer been processed when no goals are available and ecommerce is disabled. Therefor the Goals archives (for segments) disappeared. The VisitsSummary archives are still present, as they are archived by the VisitFrequency archiver.

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Dec 7, 2021
@sgiehl sgiehl added Do not close PRs with this label won't be marked as stale by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Dec 7, 2021
@sgiehl
Copy link
Member Author

sgiehl commented Dec 21, 2021

@tsteur I tried working further on that. But actually I did not yet find a solution that would really work.
What we actually tried to avoid is processing the VisitsSummary metrics when the dependent segment metrics for Goals are processed, when it is actually not needed.
Tried inserting an empty archive (so there is a done flag for goals), as well as changing the code, so VisitsSummary won't be processed at all if only one plugin is requested. In theory that works. Running the archiving only creates archives and done flags for the Goals plugin. The VisitsSummary metrics aren't archived, at least not in the Goals plugin.
The VisitFrequency plugin will always archive the VisitsSummary metrics from the same segments. So that will be done in most cases nevertheless.
But that actually does not really solve the problem. Because as soon as the report Goals.get is requested all those VisitsSummary archives would be processed, which is caused by:

$segments = array(
'' => false,
'_new_visit' => VisitFrequencyAPI::NEW_VISITOR_SEGMENT,
'_returning_visit' => VisitFrequencyAPI::RETURNING_VISITOR_SEGMENT
);
foreach ($segments as $appendToMetricName => $predefinedSegment) {
if (!empty($predefinedSegment)) {
// we are disabling the archiving of these segments as the archiver archives them already using
// archiveProcessDependend logic. Otherwise we would eg archive reports that we don't need:
// userid=5;visitorType%3D%3Dnew;visitorType%3D%3Dreturning%2CvisitorType%3D%3DreturningCustomer
// userid=5;visitorType%3D%3Dreturning%2CvisitorType%3D%3DreturningCustomer;visitorType%3D%3Dnew;
// it would also archive dependends for these segments that we already combined here and then combine
// segments again when archiving dependends
Archiver::$ARCHIVE_DEPENDENT = false;
}
$segmentToUse = $this->appendSegment($segment, $predefinedSegment);
/** @var DataTable|DataTable\Map $tableSegmented */
$tableSegmented = Request::processRequest('Goals.getMetrics', array(
'segment' => $segmentToUse,
'idSite' => $idSite,
'period' => $period,
'date' => $date,
'idGoal' => $idGoal,
'columns' => $columns,
'showAllGoalSpecificMetrics' => $showAllGoalSpecificMetrics,
'format_metrics' => !empty($compare) ? 0 : Common::getRequestVar('format_metrics', 'bc'),
), $default = []);

The getMetrics API tries to fetch the numeric record for nb_visits, which is a VisitsSummary metric...

The only useful solution I still would see is to process the dependent archives for goals in all cases (like it was in the beginning). The day archiver inserts empty archives and skips all queries. That for sure will also cause the core metrics being processed (which VisitsFrequency would do nevertheless).

Nevertheless I've tried to do some other changes that might prevent that the requested plugin is lost during the archiving process. Had to revert that, as it broke almost all tests.

@tsteur
Copy link
Member

tsteur commented Dec 22, 2021

👍 we can keep it basically to as before. The only other thing would be that if no goals and no ecommerce is configured, to manually create these archives
image

Not sure if it's worth looking at this or not? I'm generally thinking it could make archiving quite a bit faster when we don't need to do the processDependentArchives as segments are always a bit slow to resolve to create this temporary table etc. But it's also not the end of the world if we just keep it now as is in the PR and always process the dependent archives.

@sgiehl
Copy link
Member Author

sgiehl commented Dec 22, 2021

@tsteur I tried that, but it led to strange behavior when the VisitFrequency plugin was disabled.
Running the normal archiving created an empty done flag for Goals only. So far so good, but once the report Goals.get was requested an additional VisitsSummary done flag for those segment appeared, without having any additional data. Didn't debug that through, but guess that might cause other issues if that segment would be used later for other purpose, as the core metrics might stay empty.

Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. We wouldn't want to create any such archive record for the VisitsSummary plugin either as we would be writing 0 entries probably when there could be visits etc.

Left only one comment to look at. Not sure we need to do anything there. Then good to merge 👍


/*
* Archive General Goal metrics
*/
$goalIdsToSum = GoalManager::getGoalIds($this->getProcessor()->getParams()->getSite()->getId());

//Ecommerce
$goalIdsToSum[] = GoalManager::IDGOAL_ORDER;
$goalIdsToSum[] = GoalManager::IDGOAL_CART; //bug here if idgoal=1
$goalIdsToSum = array_merge($goalIdsToSum, $this->getEcommerceIdGoals());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we only add the ecommerceGoalIds if ecommerce is enabled? and then $this->getProcessor()->aggregateNumericMetrics($fieldsToSum); we could run only if $fieldsToSum is not empty? $this->getProcessor()->aggregateNumericMetrics($fieldsToSum);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a check for ecommerce plugin. But the $goalIdsToSum array actually always contains the entry false, for processing the overview. Not sure if we should skip that one if no other goal is processed or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we could always keep the false 👍

@github-actions
Copy link
Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added Stale The label used by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Dec 30, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2022

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added Stale The label used by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Jan 6, 2022
@sgiehl
Copy link
Member Author

sgiehl commented Jan 11, 2022

@tsteur I've updated the tests. From my side everything should be done. Let me know if we can merge or if anything else needs to be adjusted.

@tsteur
Copy link
Member

tsteur commented Jan 11, 2022

sounds good to merge @sgiehl 👍

@sgiehl sgiehl merged commit 0e5a50e into 4.x-dev Jan 11, 2022
@sgiehl sgiehl deleted the skipgoalarchiving branch January 11, 2022 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Do not close PRs with this label won't be marked as stale by the Close Stale Issues action Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't archive anything in Goals unless a goal is configured or ecommerce is used
2 participants