Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mismatched visitors (Overview / Visitor Log) when segment is applied #14963

Merged
merged 7 commits into from Oct 14, 2019

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Oct 7, 2019

fix #13861

also refs #9200 and now we no longer need workaround in #10024

Tried various different solutions until I realised we can simplify this actually all quite a lot by no longer using any subselects at all and likely also making regular visitor logs without segment a bit faster with some logic I added recently.

Now we basically have no more subqueries in the visitor log (similar to archiving yay). Instead we apply the group by, order, and limit directly. More on this in the comments...

You can see in the tests how this simplifies queries. When segments were applied we had before 3 subqueries (or one query with 2 nested inner queries), and when no segment was applied we had 2 subqueries. Now there's always only one query. It also removes a group by when no segment is applied.

@tsteur tsteur added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels Oct 7, 2019
@tsteur tsteur added this to the 3.12.0 milestone Oct 7, 2019
$groupBy, $orderBy, $limitAndOffset);
try {
if ($forceGroupBy && $groupBy) {
$this->segmentQueryBuilder->forceInnerGroupBySubselect(LogQueryBuilder::FORCE_INNER_GROUP_BY_NO_SUBSELECT);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a feature I added recently... as we're using a group by idvisit we basically don't ever need to use a subselect...

// this $innerLimit is a workaround (see https://github.com/piwik/piwik/issues/9200#issuecomment-183641293)
$innerLimit = $limit;
if (!$segment->isEmpty()) {
$innerLimit = $limit * 10;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we no longer need that workaround since things should work correctly now

// Group by idvisit so that a given visit appears only once, useful when for example:
// 1) when a visitor converts 2 goals
// 2) when an Action Segment is used, the inner query will return one row per action, but we want one row per visit
$sql = "
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subquery was actually never needed when no segment was applied yet we had it there... and instead of having the group by in the outer query, we now have only one query and the group by applied there.

$expectedSql = ' SELECT sub.* FROM
(
SELECT log_visit.*
$expectedSql = ' SELECT log_visit.*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's how it simplifies the queries when no segment is applied. No more subquery, and we don't even need a group by anymore.

ORDER BY sub.visit_last_action_time DESC
LIMIT 100
';
$expectedSql = ' SELECT log_visit.*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can see how it simplifies those 3 queries into one.

@@ -78,6 +78,19 @@ public function getApiForTesting()
)
);

$apiToTest[] = array(array('Live.getLastVisitsDetails'),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this test to ensure it works besides the already existing tests that generates the SQL where we can see in the SQL query that it will now work

@tsteur
Copy link
Member Author

tsteur commented Oct 7, 2019

@sgiehl or @mattab if someone could have a review that be great? @mattab be good you can confirm it should be fine from a performance perspective. From what I see it should be fine... or there is at least no other way to have segmenting fully work with the visitor log. In regards to your comment in #13861 (comment) we basically only added this to the visitor log and only when segmenting. Considering we remove a few subselects it might not even be any slower or even faster?

@mattab
Copy link
Member

mattab commented Oct 14, 2019

From performance perspective it should be fine 👍
Haven't reviewed the exact code in detail, but overall the SQL change looks great and I can't think of a way this could create a problem.

@tsteur tsteur merged commit 148e049 into 3.x-dev Oct 14, 2019
@tsteur tsteur deleted the 13861 branch October 14, 2019 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mismatched visitors (Overview / Visitor Log) when segment is applied
2 participants