I have created a basic segment to show this unexpected behavior.
When I apply the segment as shown in the image above, I expect to see 5 visits in the Visitor Log. But when the segment is applied, there are only 3 visitors in the Visitor Log, as shown in the image below.
I have observed this behavior in more complex segments, but I can reproduce it in basic segments when I use the options Action URL or Page URL.
Am I doing something wrong, or is this really a bug? I'm using Matomo 3.7.0, but this also occurs in release 3.5.1.
More information for this issue:
Tried to reproduce this locally, with and without a different timezone, but it's all working for me.
@linoacarvalho if you try a larger period, like month or year, are the results accurate? And if you look at each day within the week (assuming there's data for more than one day), are all the days inaccurate or just some? And just to be certain, if Provider isn't a part of the segment, the same inaccuracy is still present?
@diosmosis with a larger period the results aren't accurate either. If I look at each day within the week, only a few days are accurate. The Provider was only used to generate the images with small results; including or excluding it doesn't change the accuracy. In my tests, however, I observed that the inaccuracy is present only if I use Action URL or Page URL as part of the segment.
What happens when you increase the limit in bottom right from 10 to say 100? I've also tried to reproduce but works fine for me here.
One other suggestion if increasing the limit doesn't help: if you compare one of the days that is accurate vs one of the days that is inaccurate, do you see any noticeable differences? Does one day, eg, have visits near midnight while the other doesn't? Or maybe one day has visits that span multiple days?
@tsteur in my tests, I used a segment with 22 visitors in the overview. See what happens when I change what you suggested:
@diosmosis I don't see any noticeable differences; the problem occurs with or without visits near midnight. However, maybe some days have visits that span multiple days.
In further tests, I got a situation with 526 visitors in the overview and only one visitor on the first page of the Visitor Log (with the limit set to 10 in the bottom right). When the limit is raised to at least 250, two visitors are shown.
Again, if I don't use Page URL as part of a segment, this behavior doesn't occur and everything is perfect.
Maybe the Log Analytics tool (Python script) is importing some garbage? Maybe there are many similar URLs matched by the Action/Page URL of a segment?
Do I see this right that when you select the limit 100, then you see the 22 visits in the visits log which is also shown in the visits overview and the number is correct? Of course I see you have the problem with the paging. This might be due to https://github.com/matomo-org/matomo/issues/9200 and https://github.com/matomo-org/matomo/pull/10024 . I reckon it might be just a duplicate of those issues?
@tsteur for the case used in yesterday's tests, this is correct. In another case that I tested today, with 441 visitors in the overview, I get 1 visitor with the limit set to 10 in the bottom right and 8 visitors with the limit set to 500 in the Visitor Log.
I can confirm that I get the same behaviour using CustomVariable as part of a segment, without using Action or Page URL. It might be a duplicate of those issues, because if I add '&filter_limit=-1' to the end of the URL, all visitors are shown in the Visitor Log. However, it differs from the description of issue #9200 in the following ways:
I had seen issue #9200 before, but thought that problem wasn't applicable to version 3. I assumed it was an old issue related to version 2 of Matomo. Do you think this is the same problem?
Thanks for your help and time.
Instead of just reproducing the issue, I should have debugged it and looked at the query it generates; I would likely have understood it much faster, sorry about that! I think you're absolutely right that it is a different issue. I think it happens when the visitors have a lot of pageviews, or even quite a few. In my test instance most visitors had only one or two pageviews, so it wasn't noticeable.
The query generated is eg
SELECT sub.* FROM (
    /* idSegments =  */
    SELECT log_inner.* FROM (
        SELECT log_visit.*
        FROM piwik_log_visit AS log_visit
        LEFT JOIN piwik_log_link_visit_action AS log_link_visit_action
            ON log_link_visit_action.idvisit = log_visit.idvisit
        WHERE ( log_visit.idsite in (?)
            AND log_visit.visit_last_action_time >= ?
            AND log_visit.visit_last_action_time <= ? )
          AND ( ( log_link_visit_action.idaction_url IN (
                    SELECT idaction FROM piwik_log_action
                    WHERE ( name LIKE CONCAT('%', ?, '%') AND type = 1 )) ) )
        ORDER BY idsite DESC, visit_last_action_time DESC
        LIMIT 0, 110
    ) AS log_inner
    ORDER BY idsite DESC, visit_last_action_time DESC
) AS sub
GROUP BY sub.idvisit
ORDER BY sub.visit_last_action_time DESC
LIMIT 11
I think the inner query likely generates one row per pageview a visitor had. So if, e.g., 2 visitors had 55 pageviews each, those two visitors alone would use up the inner LIMIT of 110 rows, and it would only show 2 visitors on the first page once the rows are grouped by visit.
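To make the effect easier to see, here is a minimal sketch using Python's built-in sqlite3 module rather than Matomo's MySQL schema. The two tables are heavily simplified stand-ins (only the columns needed for the demo, no `piwik_` prefix), and the function names are made up for illustration. It compares applying the row limit before grouping (as in the query above) with grouping by `idvisit` before limiting.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory stand-in for the visit and action tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, visit_last_action_time INTEGER);
        CREATE TABLE log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);
    """)
    # Visits 1 and 2 are the most recent and have 55 matching pageviews each;
    # visits 3..22 are older and have a single matching pageview each.
    for idvisit in range(1, 23):
        conn.execute("INSERT INTO log_visit VALUES (?, ?)", (idvisit, 1000 - idvisit))
        pageviews = 55 if idvisit <= 2 else 1
        conn.executemany(
            "INSERT INTO log_link_visit_action VALUES (?, 1)",
            [(idvisit,)] * pageviews,
        )
    return conn


# Same shape as the generated query: limit the joined rows first, group afterwards.
LIMIT_THEN_GROUP = """
    SELECT sub.idvisit FROM (
        SELECT log_visit.* FROM log_visit
        LEFT JOIN log_link_visit_action AS a ON a.idvisit = log_visit.idvisit
        WHERE a.idaction_url = 1
        ORDER BY log_visit.visit_last_action_time DESC
        LIMIT 110
    ) AS sub
    GROUP BY sub.idvisit
    ORDER BY sub.visit_last_action_time DESC
    LIMIT 11
"""

# Alternative: collapse to one row per visit before the limit is applied.
GROUP_THEN_LIMIT = """
    SELECT log_visit.idvisit FROM log_visit
    LEFT JOIN log_link_visit_action AS a ON a.idvisit = log_visit.idvisit
    WHERE a.idaction_url = 1
    GROUP BY log_visit.idvisit
    ORDER BY log_visit.visit_last_action_time DESC
    LIMIT 11
"""


def visits_limit_then_group(conn):
    return [row[0] for row in conn.execute(LIMIT_THEN_GROUP)]


def visits_group_then_limit(conn):
    return [row[0] for row in conn.execute(GROUP_THEN_LIMIT)]


if __name__ == "__main__":
    conn = build_db()
    print(len(visits_limit_then_group(conn)))  # 2 visits
    print(len(visits_group_then_limit(conn)))  # 11 visits
```

With 22 matching visits in the toy data, the first form returns only the two pageview-heavy visits, while the second fills the page of 11.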
I just checked and can reproduce this in the DB:
The inner query should have an additional GROUP BY log_visit.idvisit. I've tried to tweak the inner query to group by log_visit.idvisit but it wasn't applied. Debugged it and noticed it is caused by https://github.com/matomo-org/matomo/blob/3.8.0-b5/core/DataAccess/LogQueryBuilder.php#L170-L172 which could be a performance related issue if we add it. Don't think it would be a big performance issue in this case but not sure. The group by is certainly needed there with any non-visit segment.
@mattab any thoughts re performance? I see you added it in 2014.
Moving the issue into 3.10 for now.
Don't think it would be a big performance issue in this case but not sure.
The group by is certainly needed there with any non-visit segment.
If the group by is needed, reckon we must add it back.
But, is it possible to do so only in this particular case, somehow? I.e., we wouldn't want to regress performance more widely than necessary.
But, is it possible to do so only in this particular case, somehow?
that might be possible by setting
FYI: Looked into this again. So we basically would need to set groupBy = log_visit.idvisit here: https://github.com/matomo-org/matomo/blob/3.10.0-b1/plugins/Live/Model.php#L281 and at the same time we could then remove the outer query and the innerLimit workaround from #9200 (https://github.com/matomo-org/matomo/blob/3.10.0-b1/plugins/Live/Model.php#L293-L297)
We would set the groupBy only when there is a segment defined.
The problem is when using a monthly or yearly period, or a long range period, then it can obviously get very slow. Maybe for monthly and yearly periods we would need to fetch 1000 times more rows here https://github.com/matomo-org/matomo/blob/3.10.0-b1/plugins/Live/Model.php#L296? Ideally we get rid of this though as the paging will always be broken but maybe there is no alternative as a partial result might be better than no result.
Or if there is a yearly period, could we split it down into up to 52 individual queries for each week and stop querying more data as soon as we have enough results? So if someone requests 250 visitors, in most cases we only issue one query for the first week (assuming there were 250 visitors in the first week). If someone was to fetch unlimited visitors (filter_limit=-1), then we would issue 52 queries (if a segment is applied). Depending on the segment we might need to always issue 52 queries though when period=year if the segment is eg
For now I'll mark https://github.com/matomo-org/matomo/issues/9200 as a duplicate.
We may in the end not really be able to fix it. Or we decide to fix it only for day and week periods, which may be the best option.
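For reference, the week-by-week idea mentioned above could look roughly like the following sketch. It is plain Python, not Matomo code: `weekly_windows`, `fetch_visits_chunked` and the `fetch_window` callback are hypothetical names, and the callback is assumed to run the segmented, grouped query for one date window and return at most the requested number of visits, newest first. A `limit` of `None` stands in for `filter_limit=-1`.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Optional, Tuple


def weekly_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (from, to) windows of up to 7 days covering [start, end], newest first."""
    window_end = end
    while window_end >= start:
        window_start = max(start, window_end - timedelta(days=6))
        yield window_start, window_end
        window_end = window_start - timedelta(days=1)


def fetch_visits_chunked(
    fetch_window: Callable[[date, date, Optional[int]], List],
    start: date,
    end: date,
    limit: Optional[int],
) -> Tuple[List, int]:
    """Query one week at a time and stop as soon as `limit` visits were collected.

    Returns the collected visits and the number of queries that were issued.
    """
    visits: List = []
    queries = 0
    for window_start, window_end in weekly_windows(start, end):
        remaining = None if limit is None else limit - len(visits)
        visits.extend(fetch_window(window_start, window_end, remaining))
        queries += 1
        if limit is not None and len(visits) >= limit:
            break  # enough results, no need to look at older weeks
    return visits, queries


if __name__ == "__main__":
    # Fake data source for the demo: every week has 300 matching visits.
    def fake_fetch(window_start, window_end, max_rows):
        rows = [(window_start, i) for i in range(300)]
        return rows if max_rows is None else rows[:max_rows]

    _, queries = fetch_visits_chunked(fake_fetch, date(2018, 1, 1), date(2018, 12, 30), 250)
    print(queries)  # 1 query is enough when the newest week already has 250 visits
```

This only helps when recent weeks contain enough matching visits; a rare segment or an unlimited request still walks through every week of the period.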