Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segment table logic might falsely remove a where statement #17525

Closed
wants to merge 2 commits into from

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented May 6, 2021

Description:

I noticed a problem in the log aggregator that affects especially non-core plugins that aren't tracking. For example A/B testing were where clauses ignored when a segment is used.

For example when calling

$select = 'log_abtesting.idvariation as label, count(distinct log_abtesting.idvisitor) as uniqueVisitors, log_abtesting.entered as entered';
$from = ['log_abtesting'];
$where = 'log_abtesting.server_time >= ?
				AND log_abtesting.server_time <= ?
				AND log_abtesting.idsite IN (?) AND log_abtesting.idexperiment = 1';
$groupBy = 'log_abtesting.idvariation, log_abtesting.entered';
$this->logAggregator->generateQuery($select, $from, $where, $groupBy, $orderBy='');

Then it would generate below query:

SELECT /* segmenthash cc1bf84e0798d633ba96c7e24983cb07 */ /* sites 1 */ /* 2020-08-27,2021-05-06 */
				log_abtesting.idvariation as label, count(distinct log_abtesting.idvisitor) as uniqueVisitors, log_abtesting.entered as entered
			FROM
				matomo_logtmpsegmentad7af0eba5c424ef32a77fe905923136 AS logtmpsegmentad7af0eba5c424ef32a77fe905923136 INNER JOIN matomo_log_abtesting AS log_abtesting ON log_abtesting.idvisit = logtmpsegmentad7af0eba5c424ef32a77fe905923136.idvisit
			WHERE
				log_abtesting.idexperiment = 1
			GROUP BY
				log_abtesting.idvariation, log_abtesting.entered

as you notice the where clause is completely gone! And it wasn't applied in the table matomo_logtmpsegmentad7af0eba5c424ef32a77fe905923136 because we create the tmp table only once during the entire archiving and there is only this where applied: https://github.com/matomo-org/matomo/blob/4.3.0-b3/core/DataAccess/LogAggregator.php#L360 which is

log_visit.visit_last_action_time>= ? AND log_visit.visit_last_action_time<= ? AND log_visit.idsite IN (...)"

The where part of the actual query gets lost in https://github.com/matomo-org/matomo/blob/4.3.0-b3/core/DataAccess/LogAggregator.php#L382-L391 because it removes the where part thinking it is already applied in the logtmpsegment table but this is not the case.

With this PR it actually results in the correct query:

SELECT /* segmenthash cc1bf84e0798d633ba96c7e24983cb07 */ /* sites 1 */ /* 2020-08-27,2021-05-06 */
				log_abtesting.idvariation as label, count(distinct log_abtesting.idvisitor) as uniqueVisitors, log_abtesting.entered as entered
			FROM
				matomo_logtmpsegmentad7af0eba5c424ef32a77fe905923136 AS logtmpsegmentad7af0eba5c424ef32a77fe905923136 INNER JOIN matomo_log_abtesting AS log_abtesting ON log_abtesting.idvisit = logtmpsegmentad7af0eba5c424ef32a77fe905923136.idvisit
			WHERE
				log_abtesting.server_time >= ?
				AND log_abtesting.server_time <= ?
				AND log_abtesting.idsite IN (?) AND log_abtesting.idexperiment = 1
			GROUP BY
				log_abtesting.idvariation, log_abtesting.entered

I'm not 100% sure why no tests are failing now in core itself. Either they always included log_visit table and where condition or the where condition started differently. It is definitely an issue for A/B testing and from what I can see also other premium features.

Review

  • Functional review done
  • Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed

@tsteur tsteur added the Needs Review PRs that need a code review label May 6, 2021
@tsteur tsteur added this to the 4.3.0 milestone May 6, 2021
$bind = array();
break;
}
if (strpos($where, $segmentWhere) === 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we basically only want to remove log_visit.visit_last_action_time>= ? AND log_visit.visit_last_action_time<= ? AND log_visit.idsite IN (...)" from a potential where statement (because it is already applied in original statement) but not any other where clause.

@tsteur tsteur added Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. and removed Needs Review PRs that need a code review labels May 6, 2021
@tsteur tsteur closed this May 6, 2021
@tsteur tsteur deleted the logvisitwhere branch May 6, 2021 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant