Mismatched visitors (Overview / Visitor Log) when segment is applied #13861

Closed
linoacarvalho opened this issue Dec 14, 2018 · 14 comments · Fixed by #14963
Labels
Bug (For errors / faults / flaws / inconsistencies etc.), c: Performance (For when we could improve the performance / speed of Matomo.)

Comments

@linoacarvalho

linoacarvalho commented Dec 14, 2018

I have created a basic segment to show this unexpected behavior.

[Image 1: the segment definition]

When I apply the segment shown in the image above, I expect to see 5 visits in the Visitor Log. But when the segment is applied, only 3 visitors appear in the Visitor Log, as shown in the image below.

[Image 2: the Visitor Log with the segment applied]

I have observed this behavior with more complex segments, but I can also reproduce it with basic segments when I use the Action URL or Page URL options.

Am I doing something wrong, or is this really a bug? I'm using Matomo 3.7.0, but this also occurs in release 3.5.1.

Best regards.

@tsteur
Member

tsteur commented Dec 14, 2018

Could be timezone-related, maybe (#13799, #13787)? @diosmosis any thoughts?

@linoacarvalho
Author

linoacarvalho commented Dec 17, 2018

More information for this issue:

  • I imported the logs with Log Analytics tool (Python script).
  • This mismatch does not occur if the Action URL or Page URL options aren't used in the segment (but I need them). Because of this, I don't think it is timezone-related.
  • Sometimes the Visitor Log shows "more" visitors than the Overview. In this case, the visits in the Visitor Log appear duplicated (or repeated even more often) when I click Next.

@diosmosis
Member

I tried to reproduce this locally, with and without a different timezone, but it all works for me.

@linoacarvalho if you try a larger period, like month or year, are the results accurate? And if you look at each day within the week (assuming there's data for more than one day), are all the days inaccurate or just some? And just to be certain, if Provider isn't a part of the segment, the same inaccuracy is still present?

@linoacarvalho
Author

@diosmosis With a larger period, the results aren't accurate either. If I look at each day within the week, only a few days are accurate. Provider was only used to generate the screenshots with small results; with or without it, the accuracy doesn't change. In my tests, the inaccuracy appears only when I use Action URL or Page URL as part of the segment.

@tsteur
Member

tsteur commented Dec 18, 2018

What happens when you increase the limit at the bottom right from 10 to, say, 100? I've also tried to reproduce it, but it works fine for me here.

@diosmosis
Member

One other suggestion if increasing the limit doesn't help: if you compare one of the days that is accurate vs one of the days that is inaccurate, do you see any noticeable differences? Does one day, eg, have visits near midnight while the other doesn't? Or maybe one day has visits that span multiple days?

@linoacarvalho
Author

linoacarvalho commented Dec 19, 2018

@tsteur In my tests, I used a segment with 22 visitors in the Overview. Here is what happens when I change what you suggested:

  • With the limit set to 10 (bottom right), the first page of the Visitor Log shows 8 visitors. The pages reached by clicking "Next" or "Previous" repeat parts of the same set of visits: one page shows 8 visitors, another 10, another 7, another 2, until I get the message "There is no data for this report". Summed over all pages up to that message, the visitors add up to 170.
  • With the limit set to 100, the first page of the Visitor Log shows all 22 visitors. Even so, if I keep clicking "Next" until I get "There is no data for this report", the visitors across all pages add up to 40.
  • Apparently, if I keep only the unique visits across all pages of the Visitor Log, I get the 22 visits.
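One way to see how per-page totals can exceed the Overview count, while de-duplicating across pages recovers it, is to page over joined rows (one per matching pageview) instead of over visits. The sketch below is a hypothetical pure-Python model, not Matomo's paging code: the visit IDs, pageview counts and rows_per_page=10 are made up for illustration.

```python
# Hypothetical joined result, newest first: one entry per matching pageview.
# Visit "A" has 25 matching pageviews, "B" has 12, "C".."F" one each.
rows = ["A"] * 25 + ["B"] * 12 + ["C", "D", "E", "F"]


def visits_on_page(rows, page, rows_per_page=10):
    """Page over joined rows (not visits), then de-duplicate within that page only."""
    chunk = rows[page * rows_per_page:(page + 1) * rows_per_page]
    return list(dict.fromkeys(chunk))  # unique visits, first-seen order


pages = []
n = 0
while True:
    visits = visits_on_page(rows, n)
    if not visits:  # analogous to "There is no data for this report"
        break
    pages.append(visits)
    n += 1

print([len(p) for p in pages])              # [1, 1, 2, 4, 1]
print(sum(len(p) for p in pages))           # 9 "visitors" summed over pages
print(len({v for p in pages for v in p}))   # 6 unique visits
```

Visit A spans three pages, so summing the per-page counts overstates the total, while collecting unique visits across pages gives back the true six, which mirrors the observation that filtering unique visits yields the 22.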

@diosmosis I don't see any noticeable differences; the problem occurs with or without visits near midnight. However, some days may have visits that span multiple days.

In further tests, I found a case with 526 visitors in the Overview but only one visitor on the first page of the Visitor Log (with the limit set to 10 at the bottom right). Only when the limit is raised to at least 250 are two visitors shown.

Again, if I don't use Page URL as part of the segment, this behavior doesn't occur and everything is correct.

Could the Log Analytics tool (Python script) be importing some garbage? Or could there be many similar URLs matched by the Action/Page URL condition of the segment?

@tsteur
Member

tsteur commented Dec 19, 2018

Do I understand correctly that when you select a limit of 100, the Visitor Log shows the 22 visits that are also shown in the visits overview, so the number is correct? I do see that you have the problem with the paging. This might be due to #9200 and #10024. I reckon it might just be a duplicate of those issues?

@linoacarvalho
Author

@tsteur For the case I tested yesterday, that is correct. In another case I tested today, with 441 visitors in the Overview, the Visitor Log shows 1 visitor with the limit set to 10 and 8 visitors with the limit set to 500.

I can confirm that I get the same behavior using CustomVariable as part of the segment, without Action URL or Page URL. It might be a duplicate of those issues, because if I add '&filter_limit=-1' to the end of the URL, all visitors are shown in the Visitor Log. However, it differs from the description of issue #9200 in the following ways:

  • I'm not getting fewer visitors in the Visitor Log than in the Overview (maybe I misunderstood this before). The problem is that a page of the Visitor Log does not show all visitors (or as many as the limit set at the bottom right), even when that limit is greater than the total number of visitors shown in the Overview.
  • When I click Next, I never get a page showing no data.
  • The number of visitors, summed over all pages shown in the Visitor Log, is often greater than in the Overview.

I had seen issue #9200 before, but thought that problem didn't apply to version 3; I assumed it was an old issue related to version 2 of Matomo. Do you think this is the same problem?
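For reference, a request like the one described above (the Visitor Log data with filter_limit=-1 appended) can be built as below. This is a sketch that assumes the Reporting API method Live.getLastVisitsDetails and the standard module, idSite, period, date, segment, format, filter_limit and token_auth parameters; the base URL, the segment value "pageUrl=@contact" and the token are placeholders.

```python
from urllib.parse import urlencode


def visitor_log_url(base, id_site, period, date, segment, token_auth):
    """Build a Live.getLastVisitsDetails request that returns every visit."""
    params = {
        "module": "API",
        "method": "Live.getLastVisitsDetails",
        "idSite": id_site,
        "period": period,
        "date": date,
        "segment": segment,       # URL-encoded by urlencode below
        "format": "JSON",
        "filter_limit": -1,       # no paging, so no page boundaries to get wrong
        "token_auth": token_auth,
    }
    return f"{base}/index.php?{urlencode(params)}"


# Placeholder values for illustration only.
url = visitor_log_url("https://matomo.example.org", 1, "week", "2018-12-14",
                      "pageUrl=@contact", "anonymous")
print(url)
```

Fetching everything in one response sidesteps the paging problem, at the cost of a potentially large result for long periods.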

Thanks for your help and time.

@tsteur
Member

tsteur commented Dec 20, 2018

I was going to close the issue and reference to #9200 but I think I can see what happens. First of all... unfortunately #9200 is still an issue in Matomo 3 :(

Then, instead of just trying to reproduce the issue, I should have debugged it and looked at the query it generates; I would likely have understood it much faster, sorry about that! I think you're absolutely right that it is a different issue. I think it happens when visitors have a lot of pageviews, or even just a few... In my test instance most visitors had only one or two pageviews, so it wasn't noticeable.

The generated query is, e.g.:

    SELECT sub.* FROM (
        /* idSegments = [44] */
        SELECT
            log_inner.*
        FROM
            (
                SELECT
                    log_visit.*
                FROM
                    piwik_log_visit AS log_visit
                    LEFT JOIN piwik_log_link_visit_action AS log_link_visit_action
                        ON log_link_visit_action.idvisit = log_visit.idvisit
                WHERE
                    ( log_visit.idsite in (?)
                    AND log_visit.visit_last_action_time >= ?
                    AND log_visit.visit_last_action_time <= ? )
                    AND
                    ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', ?, '%') AND type = 1 )) ) )
                ORDER BY
                    idsite DESC, visit_last_action_time DESC LIMIT 0, 110
            ) AS log_inner
        ORDER BY
            idsite DESC, visit_last_action_time DESC
    ) AS sub
    GROUP BY sub.idvisit
    ORDER BY sub.visit_last_action_time DESC
    LIMIT 11

I think the inner query likely generates one row per matching pageview of a visitor, and the LIMIT 0, 110 is applied to those rows before they are grouped by visit. So if, e.g., 2 visitors had 55 pageviews each, the first page would show only 2 visitors once the rows are grouped by visitor.
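To make the mechanism concrete, here is a minimal SQLite reproduction of the pattern in the query above: a LIMIT applied to joined rows (one per matching pageview) before the outer GROUP BY sub.idvisit. It assumes a toy schema with only the columns the query needs, table names without the piwik_ prefix, and a hard-coded URL match (idaction_url = 1) instead of the LIKE subselect; it is not Matomo's actual schema or code.

```python
import sqlite3


def build_db():
    """Toy copy of the two tables involved (only the columns the query needs)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE log_visit (
            idvisit INTEGER PRIMARY KEY,
            idsite INTEGER,
            visit_last_action_time TEXT
        );
        CREATE TABLE log_link_visit_action (
            idlink_va INTEGER PRIMARY KEY,
            idvisit INTEGER,
            idaction_url INTEGER
        );
    """)
    for idvisit in range(1, 23):
        # Visit 1 is the most recent, visit 22 the oldest.
        ts = f"2018-12-14 {23 - idvisit:02d}:00:00"
        conn.execute("INSERT INTO log_visit VALUES (?, 1, ?)", (idvisit, ts))
        # Visits 1 and 2 have 55 matching pageviews each, the others just one.
        pageviews = 55 if idvisit <= 2 else 1
        conn.executemany(
            "INSERT INTO log_link_visit_action (idvisit, idaction_url) VALUES (?, 1)",
            [(idvisit,)] * pageviews,
        )
    return conn


# Same shape as the generated query: LIMIT on joined rows, GROUP BY only outside.
BUGGY_SQL = """
    SELECT sub.* FROM (
        SELECT log_visit.*
        FROM log_visit
        LEFT JOIN log_link_visit_action AS lva ON lva.idvisit = log_visit.idvisit
        WHERE lva.idaction_url IN (1)
        ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
        LIMIT 110
    ) AS sub
    GROUP BY sub.idvisit
    ORDER BY sub.visit_last_action_time DESC
    LIMIT 11
"""

conn = build_db()
page = conn.execute(BUGGY_SQL).fetchall()
total = conn.execute(
    "SELECT COUNT(DISTINCT idvisit) FROM log_link_visit_action WHERE idaction_url = 1"
).fetchone()[0]
print(len(page), total)  # 2 22
```

All 110 inner rows belong to the two busiest visits, so the outer GROUP BY can only return two visitors even though 22 visits match.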

I just checked and can reproduce this in the DB:
[Image: the reproduction in the database]

The inner query should have an additional GROUP BY log_visit.idvisit. I tried to tweak the inner query to group by log_visit.idvisit, but the group by wasn't applied. Debugging it, I noticed this is caused by https://github.com/matomo-org/matomo/blob/3.8.0-b5/core/DataAccess/LogQueryBuilder.php#L170-L172, and adding the group by there could be a performance issue. I don't think it would be a big performance issue in this case, but I'm not sure. The group by is certainly needed there with any non-visit segment.
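A minimal sketch of the effect of an inner GROUP BY log_visit.idvisit, using the same kind of toy SQLite schema (simplified tables, hard-coded URL match, made-up data). Because rows are collapsed to one per visit before LIMIT/OFFSET, each page holds whole visits, and in this sketch no outer wrapper query is needed. It does not model Matomo's LogQueryBuilder or its performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, idsite INTEGER,
                            visit_last_action_time TEXT);
    CREATE TABLE log_link_visit_action (idlink_va INTEGER PRIMARY KEY,
                                        idvisit INTEGER, idaction_url INTEGER);
""")
for idvisit in range(1, 23):
    # Visit 1 is the most recent; visits 1 and 2 have 55 matching pageviews each.
    conn.execute("INSERT INTO log_visit VALUES (?, 1, ?)",
                 (idvisit, f"2018-12-14 {23 - idvisit:02d}:00:00"))
    conn.executemany(
        "INSERT INTO log_link_visit_action (idvisit, idaction_url) VALUES (?, 1)",
        [(idvisit,)] * (55 if idvisit <= 2 else 1))

# Grouping inside the query collapses the join back to one row per visit
# before LIMIT/OFFSET are applied, so paging counts visits, not pageviews.
FIXED_SQL = """
    SELECT log_visit.*
    FROM log_visit
    LEFT JOIN log_link_visit_action AS lva ON lva.idvisit = log_visit.idvisit
    WHERE lva.idaction_url IN (1)
    GROUP BY log_visit.idvisit
    ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
    LIMIT ? OFFSET ?
"""


def fetch_page(number, size=10):
    """Return one page of visits for the toy segment."""
    return conn.execute(FIXED_SQL, (size, number * size)).fetchall()


pages = [fetch_page(n) for n in range(3)]
print([len(p) for p in pages])  # [10, 10, 2]
```

Every visit appears exactly once across the three pages, so the page totals add up to the 22 matching visits.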

@mattab any thoughts re performance? I see you added it in 2014.

Moving the issue into 3.10 for now.

tsteur added the Bug label Dec 20, 2018
tsteur added this to the 3.10.0 milestone Dec 20, 2018
@mattab
Member

mattab commented Jan 24, 2019

> Don't think it would be a big performance issue in this case but not sure.
> The group by is certainly needed there with any non-visit segment.

If the group by is needed, I reckon we must add it back.
But is it possible to do so only in this particular case, somehow? I.e., we wouldn't want to regress performance more widely than necessary.

@tsteur
Member

tsteur commented Jan 24, 2019

> But, is it possible to do so only in this particular case, somehow?

That might be possible by setting $queryBuilder->forceInnerGroupBySubselect().

tsteur added the c: Performance label May 12, 2019
@tsteur
Member

tsteur commented May 12, 2019

FYI: I looked into this again. We would basically need to set groupBy = log_visit.idvisit here: https://github.com/matomo-org/matomo/blob/3.10.0-b1/plugins/Live/Model.php#L281 and, at the same time, we could then remove the outer query and the innerLimit workaround from #9200 (https://github.com/matomo-org/matomo/blob/3.10.0-b1/plugins/Live/Model.php#L293-L297).

We would set the groupBy only when there is a segment defined.
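The single-query shape described above can be sketched as a small string builder. This is a hypothetical Python illustration, not Matomo's PHP query builder: build_visitor_log_sql and its segment_where parameter are invented names, the date and site filters are copied from the generated query earlier in the thread, and in this toy version any segment is assumed to need the action join.

```python
def build_visitor_log_sql(segment_where=None, limit=10, offset=0):
    """Hypothetical sketch: one query, GROUP BY idvisit only when a segment is set.

    There is no outer wrapper query and no inflated inner limit: LIMIT
    applies directly to visits because the join is grouped back per visit.
    """
    clauses = [
        "log_visit.idsite IN (?)",
        "log_visit.visit_last_action_time >= ?",
        "log_visit.visit_last_action_time <= ?",
    ]
    parts = ["SELECT log_visit.*", "FROM log_visit AS log_visit"]
    if segment_where:
        # Toy assumption: every segment needs the per-action table.
        parts.append(
            "LEFT JOIN log_link_visit_action AS log_link_visit_action "
            "ON log_link_visit_action.idvisit = log_visit.idvisit"
        )
        clauses.append(f"( {segment_where} )")
    parts.append("WHERE " + " AND ".join(clauses))
    if segment_where:
        # Collapse one-row-per-action back to one row per visit before paging.
        parts.append("GROUP BY log_visit.idvisit")
    parts.append("ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC")
    parts.append(f"LIMIT {int(offset)}, {int(limit)}")  # MySQL "offset, count" form
    return "\n".join(parts)


print(build_visitor_log_sql("log_link_visit_action.idaction_url IN (?)"))
```

Without a segment the query stays as cheap as before; the GROUP BY cost is paid only when the join can multiply rows.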

The problem is that with a monthly or yearly period, or a long range period, this can obviously get very slow. Maybe for monthly and yearly periods we would need to fetch 1000 times more rows here (https://github.com/matomo-org/matomo/blob/3.10.0-b1/plugins/Live/Model.php#L296)? Ideally we would get rid of this, though, as the paging will always be broken; but maybe there is no alternative, as a partial result might be better than no result.

Or, for a yearly period, could we split it into up to 52 individual queries, one per week, and stop querying as soon as we have enough results? If someone requests 250 visitors, in most cases we would issue only one query, for the first week (assuming there were 250 visitors in that week). If someone fetches unlimited visitors (filter_limit=-1), we would issue 52 queries (if a segment is applied). Depending on the segment, though, we might always need to issue 52 queries for period=year, e.g. if the segment is IP- or userId-related.
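The week-by-week idea can be sketched as follows. It is a hypothetical Python illustration: weekly_ranges, fetch_visits and the query_week callback are made-up names, the year is cut into 52 chunks of 7 days with the oldest chunk absorbing the leftover days, and the newest week is queried first.

```python
from datetime import date, timedelta


def weekly_ranges(year, chunks=52):
    """Yield (start, end) date pairs covering `year`, newest chunk first.

    Each chunk is 7 days except the oldest, which absorbs the leftover days.
    """
    first = date(year, 1, 1)
    end = date(year, 12, 31)
    for i in range(chunks):
        start = first if i == chunks - 1 else end - timedelta(days=6)
        yield start, end
        end = start - timedelta(days=1)


def fetch_visits(year, wanted, query_week):
    """Run a hypothetical per-week query until `wanted` visits are collected.

    wanted=None stands for "unlimited" (filter_limit=-1): every chunk is queried.
    Returns the visits and the number of queries issued.
    """
    visits, queries = [], 0
    for start, end in weekly_ranges(year):
        queries += 1
        visits.extend(query_week(start, end))
        if wanted is not None and len(visits) >= wanted:
            return visits[:wanted], queries
    return visits, queries


# Stand-in per-week queries with made-up volumes.
def busy_week(start, end):
    return [(start, i) for i in range(300)]


def quiet_week(start, end):
    return [(start, i) for i in range(10)]


first_page, q1 = fetch_visits(2018, 250, busy_week)
everything, q2 = fetch_visits(2018, None, quiet_week)
print(len(first_page), q1)   # 250 1
print(len(everything), q2)   # 520 52
```

The early return is what keeps the common case to a single query; this sketch does not model segments (such as IP- or userId-based ones) that would force all 52 queries regardless.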

For now I'll mark #9200 as a duplicate.

@tsteur
Member

tsteur commented May 12, 2019

We may in the end not really be able to fix it, or we may decide to fix it only for the day and week periods, which may be the best option.
