@linoacarvalho opened this Issue on December 14th 2018

I have created a basic segment to show this unexpected behavior.

[image 1]

When I apply the segment as shown in the image above, I expect to see 5 visits in the Visitor Log. But when the segment is applied, there are only 3 visitors in the Visitor Log, as shown in the image below.

[image 2]

I have observed this behavior in more complex segments, but I can reproduce it in basic segments when I use the options Action URL or Page URL.

Am I doing something wrong, or is this really a bug? I'm using Matomo 3.7.0, but this also occurs in release 3.5.1.

Best regards.

@tsteur commented on December 14th 2018 Member

@linoacarvalho commented on December 17th 2018

More information for this issue:

  • I imported the logs with the Log Analytics tool (Python script).
  • The mismatch does not occur if the Action URL or Page URL options aren't used in the segment (but I need them). Because of this, I don't think it is timezone related.
  • Sometimes the Visitor Log shows "more" visitors than the Overview. In this case the visits in the Visitor Log (when I click Next) are shown duplicated (or repeated even more often).

@diosmosis commented on December 18th 2018 Member

Tried to reproduce this locally, w/ & w/o different timezone, but it's all working for me.

@linoacarvalho if you try a larger period, like month or year, are the results accurate? And if you look at each day within the week (assuming there's data for more than one day), are all the days inaccurate or just some? And just to be certain, if Provider isn't a part of the segment, the same inaccuracy is still present?

@linoacarvalho commented on December 18th 2018

@diosmosis with a larger period the results aren't accurate either. If I look at each day within the week, only a few days are accurate. The Provider was only used to generate the images with small result sets; the inaccuracy is the same with or without it. In my tests, the inaccuracy is present only if I use Action URL or Page URL as part of the segment.

@tsteur commented on December 18th 2018 Member

What happens when you increase the limit in the bottom right from 10 to, say, 100? I've also tried to reproduce this, but it works fine for me here.

@diosmosis commented on December 19th 2018 Member

One other suggestion if increasing the limit doesn't help: if you compare one of the days that is accurate vs one of the days that is inaccurate, do you see any noticeable differences? Does one day, eg, have visits near midnight while the other doesn't? Or maybe one day has visits that span multiple days?

@linoacarvalho commented on December 19th 2018

@tsteur in my tests, I used a segment with 22 visitors in the overview. Here is what happens when I change what you suggested:

  • With the limit set to 10 in the bottom right, 8 visitors are shown on the first page of the Visitor Log. The visits shown when I click "Next" or "Previous" are repeated parts of the same set of visits. One page shows 8 visitors, another 10, another 7, another 2, until I get the message "There is no data for this report". If I keep clicking "Next" until that message is shown, the sum of visitors is 170.
  • With the limit set to 100 in the bottom right, 22 visitors are shown on the first page of the Visitor Log. Even though all visitors are shown on the first page, if I click "Next" (until I get "There is no data for this report"), the sum of visitors is 40.
  • Apparently, if I keep only the unique visits shown across all pages of the Visitor Log, I see the 22 visits.

@diosmosis I don't see any noticeable differences; the problem occurs with or without visits near midnight. However, maybe some days have visits that span multiple days.

In further tests, I got a situation with 526 visitors in the overview and only one visitor on the first page of the Visitor Log (with the limit set to 10 in the bottom right). When the limit is raised to at least 250, two visitors are shown.

Again, if I don't use Page URL as part of a segment, this behavior doesn't occur and everything is perfect.

Maybe the Log Analytics tool (Python script) is importing some garbage? Maybe there are many similar URLs matched by the Action/Page URL condition of the segment?

@tsteur commented on December 19th 2018 Member

Do I understand correctly that when you select the limit 100, you see all 22 visits in the Visitor Log, which matches the number shown in the visits overview? Of course I see you have the problem with the paging. This might be due to https://github.com/matomo-org/matomo/issues/9200 and https://github.com/matomo-org/matomo/pull/10024 . I reckon it might just be a duplicate of those issues?

@linoacarvalho commented on December 20th 2018

@tsteur for the case used in yesterday's tests, this is correct. In another case that I tested today, with 441 visitors in the overview, I get 1 visitor in the Visitor Log with the limit set to 10 in the bottom right and 8 visitors with the limit set to 500.

I confirm that I get the same behaviour using CustomVariable as part of the segment, without using Action or Page URL. It might be a duplicate of those issues, because if I add '&filter_limit=-1' to the end of the URL, all visitors are shown in the Visitor Log. But it differs from the description of issue #9200 in the following ways:

  • I'm not getting fewer visitors in the Visitor Log than in the overview (maybe I didn't understand this before). The issue is that a page of the Visitor Log does not show all visitors (or as many as the limit specified in the bottom right), even if that limit is greater than the total number of visitors shown in the overview.
  • When I click Next, I never get a page showing no data.
  • The number of visitors, counted as the sum over all pages shown in the Visitor Log, is often greater than in the overview.
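For reference, the filter_limit=-1 workaround mentioned above can also be applied to a reporting API request. The following Python sketch only builds such a URL; the host, the token, the segment value, and the helper name visitor_log_api_url are hypothetical, and it assumes the Live.getLastVisitsDetails API method is the one of interest.

```python
from urllib.parse import urlencode


def visitor_log_api_url(base_url, id_site, segment, token_auth):
    """Build a visitor log API URL that asks for all rows (filter_limit=-1).

    Everything passed in here is a placeholder: base_url, token_auth and
    the segment string are assumptions for illustration only.
    """
    params = {
        "module": "API",
        "method": "Live.getLastVisitsDetails",
        "idSite": id_site,
        "period": "day",
        "date": "yesterday",
        "format": "JSON",
        "segment": segment,       # e.g. "pageUrl=@example" (URL contains "example")
        "filter_limit": -1,       # -1 disables the row limit
        "token_auth": token_auth,
    }
    # urlencode percent-encodes the segment so "=" and "@" survive transport.
    return f"{base_url}/index.php?{urlencode(params)}"


url = visitor_log_api_url(
    "https://matomo.example.org", 1, "pageUrl=@example", "HYPOTHETICAL_TOKEN"
)
print(url)
```

Requesting all rows avoids the paging symptoms but can be slow on large sites, so it is a diagnostic aid rather than a fix.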

I had seen issue #9200 before, but thought that problem wasn't applicable to version 3. I assumed it was an old issue related to version 2 of Matomo. Do you think this is the same problem?

Thanks for your help and time.

@tsteur commented on December 20th 2018 Member

I was going to close the issue and reference #9200, but I think I can see what happens. First of all... unfortunately #9200 is still an issue in Matomo 3 :(

Then, instead of just reproducing the issue, I should have debugged it and looked at the query it generates, and I would likely have understood it much faster, sorry about that! I think you're absolutely right that it is a different issue. I think it happens when the visitors have a lot of pageviews, or at least quite a few... In my test instance most visitors had only one or two pageviews, so it wasn't noticeable.

The query generated is eg


    SELECT sub.* FROM (
        /* idSegments = [44] */
        SELECT
            log_inner.*
        FROM (
            SELECT
                log_visit.*
            FROM
                piwik_log_visit AS log_visit
                LEFT JOIN piwik_log_link_visit_action AS log_link_visit_action
                    ON log_link_visit_action.idvisit = log_visit.idvisit
            WHERE
                ( log_visit.idsite in (?)
                AND log_visit.visit_last_action_time >= ?
                AND log_visit.visit_last_action_time <= ? )
                AND
                ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', ?, '%') AND type = 1 )) ) )
            ORDER BY
                idsite DESC, visit_last_action_time DESC
            LIMIT 0, 110
        ) AS log_inner
        ORDER BY
            idsite DESC, visit_last_action_time DESC
    ) AS sub
    GROUP BY sub.idvisit
    ORDER BY sub.visit_last_action_time DESC
    LIMIT 11

I think the inner query likely generates one row per pageview a visitor had. So if eg 2 visitors had 55 pageviews each, then it would only show 2 visitors on the first page when it is grouped by visitor.
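To illustrate the point above, here is a minimal Python sketch that uses an in-memory SQLite database instead of Matomo's MySQL schema. The table and column names mirror the query but are heavily simplified, and the helper first_page_visits is hypothetical; it only shows how a row LIMIT applied before grouping shrinks the first page.

```python
import sqlite3


def first_page_visits(inner_limit, page_size, pageviews_per_visit, n_visits=5):
    """Return the idvisit values on page one when LIMIT runs before GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, visit_last_action_time INTEGER)"
    )
    con.execute("CREATE TABLE log_link_visit_action (idvisit INTEGER, idaction_url INTEGER)")
    for v in range(1, n_visits + 1):
        con.execute("INSERT INTO log_visit VALUES (?, ?)", (v, 1000 + v))
        # Every pageview of every visit matches the segment (idaction_url = 1).
        con.executemany(
            "INSERT INTO log_link_visit_action VALUES (?, 1)", [(v,)] * pageviews_per_visit
        )
    rows = con.execute(
        """
        SELECT sub.idvisit FROM (
            SELECT log_visit.*
            FROM log_visit
            LEFT JOIN log_link_visit_action AS a ON a.idvisit = log_visit.idvisit
            WHERE a.idaction_url = 1
            ORDER BY log_visit.visit_last_action_time DESC
            LIMIT ?              -- counts joined rows (one per pageview), not visits
        ) AS sub
        GROUP BY sub.idvisit
        ORDER BY MAX(sub.visit_last_action_time) DESC
        LIMIT ?
        """,
        (inner_limit, page_size),
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


many_pageviews = first_page_visits(110, 11, pageviews_per_visit=55)
single_pageview = first_page_visits(110, 11, pageviews_per_visit=1)
print(len(many_pageviews), len(single_pageview))
```

With five matching visits of 55 pageviews each, the inner limit of 110 rows is used up by two visits, so only two visit IDs come back; with one pageview per visit all five are returned.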

I just checked and can reproduce this in the DB:
[image]

The inner query should have an additional GROUP BY log_visit.idvisit. I've tried to tweak the inner query to group by log_visit.idvisit, but it wasn't applied. I debugged it and noticed this is caused by https://github.com/matomo-org/matomo/blob/3.8.0-b5/core/DataAccess/LogQueryBuilder.php#L170-L172 , and adding the group by there could be a performance-related issue. I don't think it would be a big performance issue in this case, but I'm not sure. The group by is certainly needed there with any non-visit segment.
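As a rough check of the proposed change, the following Python sketch runs a simplified version of the query with the extra inner GROUP BY against an in-memory SQLite database. It is not Matomo's query builder output: the schema is reduced to two tiny tables, the segment condition is a plain equality, and the function name visits_with_inner_group_by is hypothetical.

```python
import sqlite3

# Simplified query: same shape as the generated one, plus the inner GROUP BY.
QUERY_WITH_INNER_GROUP_BY = """
    SELECT sub.idvisit FROM (
        SELECT log_visit.*
        FROM log_visit
        LEFT JOIN log_link_visit_action AS a ON a.idvisit = log_visit.idvisit
        WHERE a.idaction_url = 1
        GROUP BY log_visit.idvisit      -- collapse to one row per visit first
        ORDER BY log_visit.visit_last_action_time DESC
        LIMIT 110                       -- now counts visits, not pageviews
    ) AS sub
    GROUP BY sub.idvisit
    ORDER BY MAX(sub.visit_last_action_time) DESC
    LIMIT 11
"""


def visits_with_inner_group_by(n_visits=5, pageviews_per_visit=55):
    """Return the idvisit values on page one when the inner query is grouped."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, visit_last_action_time INTEGER);
        CREATE TABLE log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);
        """
    )
    for v in range(1, n_visits + 1):
        con.execute("INSERT INTO log_visit VALUES (?, ?)", (v, 1000 + v))
        con.executemany(
            "INSERT INTO log_link_visit_action VALUES (?, 1)", [(v,)] * pageviews_per_visit
        )
    rows = con.execute(QUERY_WITH_INNER_GROUP_BY).fetchall()
    con.close()
    return [r[0] for r in rows]


grouped_page = visits_with_inner_group_by()
print(grouped_page)
```

In this toy setup all five visits appear on the first page even though each has 55 pageviews. The sketch says nothing about the performance cost on a real log table, which is the open question here.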

@mattab any thoughts re performance? I see you added it in 2014.

Moving the issue into 3.10 for now.

@mattab commented on January 24th 2019 Member

> Don't think it would be a big performance issue in this case but not sure.
> The group by is certainly needed there with any non-visit segment.

If the group by is needed, reckon we must add it back.
But, is it possible to do so only in this particular case, somehow? Ie. wouldn't want to regress performance wider than necessary..

@tsteur commented on January 24th 2019 Member

> But, is it possible to do so only in this particular case, somehow?

that might be possible by setting $queryBuilder->forceInnerGroupBySubselect()
