API response may include more unique visitors than actions #9357

Closed
mattab opened this issue Dec 7, 2015 · 3 comments
Labels: Bug, Major

Comments

@mattab
Member

mattab commented Dec 7, 2015

It has been reported at least once that an API request returned more "Unique visitors" than "Actions". This is unexpected, as we should always have: nb_uniq_visitors <= nb_visits <= nb_actions

Here is the API response that was reported at least once:

[nb_uniq_visitors] => 2
[nb_users] => 0
[nb_visits] => 1
[nb_actions] => 1
[nb_visits_converted] => 0
[bounce_count] => 1
[sum_visit_length] => 0
[max_actions] => 1
[bounce_rate] => 100%
[nb_actions_per_visit] => 1
[avg_time_on_site] => 0

(request was method=VisitsSummary.get&idSite=10435&period=range&date=2015-11-16,2015-12-03)
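
The expected ordering can be checked mechanically against a response. The following is a minimal sketch, assuming the response has already been parsed into a plain dict; `violated_invariants` is a hypothetical helper written for illustration and is not part of the Matomo API:

```python
def violated_invariants(row):
    """Return the metric pairs that break nb_uniq_visitors <= nb_visits <= nb_actions."""
    chain = ["nb_uniq_visitors", "nb_visits", "nb_actions"]
    violations = []
    # Compare each metric with the next one in the expected ordering.
    for smaller, larger in zip(chain, chain[1:]):
        if row[smaller] > row[larger]:
            violations.append((smaller, larger))
    return violations


# Values copied from the reported response above.
reported = {"nb_uniq_visitors": 2, "nb_visits": 1, "nb_actions": 1}
print(violated_invariants(reported))  # [('nb_uniq_visitors', 'nb_visits')]
```

A check like this could be run over stored API responses to find other affected sites or date ranges, which may help with reproducing the issue.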

Questions:

  • How can we reproduce this issue?
  • What is the root cause?

The only possible explanation we have come up with so far:

  • Imagine that the period=range request includes today's data, and that the "Unique visitors processing" archiving query runs a few seconds after the standard metrics processing. If a new visitor arrives in the few seconds right after the standard metrics processing and right before the unique visitors processing, we could get nb_uniq_visitors = 2 while nb_visits = 1.
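
The suspected race can be illustrated with a toy model. This is only a sketch under stated assumptions: the in-memory `log`, the `Visit` class and both query functions are hypothetical stand-ins, not the real archiver or its SQL queries. It shows how two queries that read the same growing log at different moments can produce the reported numbers:

```python
from dataclasses import dataclass


@dataclass
class Visit:
    visitor_id: str
    actions: int
    logged_at: int  # seconds on an arbitrary clock


def standard_metrics(log, now):
    # Stand-in for the standard metrics query: sees visits logged up to `now`.
    seen = [v for v in log if v.logged_at <= now]
    return {"nb_visits": len(seen), "nb_actions": sum(v.actions for v in seen)}


def unique_visitors(log, now):
    # Stand-in for the separate unique visitors query.
    return len({v.visitor_id for v in log if v.logged_at <= now})


# Visitor "b" arrives between the two queries.
log = [Visit("a", 1, logged_at=100), Visit("b", 1, logged_at=103)]

row = standard_metrics(log, now=102)                     # runs first, sees only "a"
row["nb_uniq_visitors"] = unique_visitors(log, now=105)  # runs later, sees "a" and "b"
print(row)  # {'nb_visits': 1, 'nb_actions': 1, 'nb_uniq_visitors': 2}
```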
@mattab added the Bug and Waiting for user feedback labels Dec 7, 2015
@mattab added this to the Short term milestone Dec 23, 2015
@mattab modified the milestones: 2.16.0, Mid term Jan 7, 2016
@mattab removed the Waiting for user feedback label Jan 7, 2016
@mattab
Member Author

mattab commented Jan 7, 2016

I still cannot reproduce the issue but it was reported again:

[nb_uniq_visitors] => 11
[nb_users] => 0
[nb_visits] => 9
[nb_actions] => 18
[nb_visits_converted] => 0
[bounce_count] => 4
[sum_visit_length] => 4367
[max_actions] => 4
[bounce_rate] => 44%
[nb_actions_per_visit] => 2
[avg_time_on_site] => 485

One solution for this likely edge-case issue would be to do: nb_uniq_visitors = min(nb_uniq_visitors, nb_visits).
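
As a rough sketch of that clamp, assuming the metrics row is available as a plain dict (`clamp_unique_visitors` is a hypothetical name, not an existing Matomo function):

```python
def clamp_unique_visitors(row):
    """Return a copy of the row with nb_uniq_visitors capped at nb_visits."""
    fixed = dict(row)  # leave the caller's row untouched
    fixed["nb_uniq_visitors"] = min(row["nb_uniq_visitors"], row["nb_visits"])
    return fixed


# Values copied from the second reported response above.
second_report = {"nb_uniq_visitors": 11, "nb_visits": 9, "nb_actions": 18}
print(clamp_unique_visitors(second_report))
# {'nb_uniq_visitors': 9, 'nb_visits': 9, 'nb_actions': 18}
```

Note that this only adjusts the reported number. If the timing explanation is correct, the two underlying queries would still be reading different data.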

@quba
Contributor

quba commented Jan 18, 2016

@tsteur, @mattab,

Personally, I don't like this approach to resolving the issue. It may work as a quick fix, but in the long term it looks like a hack.

Here are my ideas:

  1. (preferred but risky) After starting the archiving process, save the current timestamp and calculate ALL reports for the given site ID based on this timestamp. This ensures that all reports are consistent and based on the same data set. Of course there will be some issues with visit_last_action_time (we may report fewer visits for today), but this is still better than random differences caused by new visits arriving during the archiving process.
  2. Calculate the number of uniques at the beginning of the archiving process for each period.
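
The first idea can be sketched as follows. This is a toy model under stated assumptions, not archiver code: the log is a hypothetical list of `(visitor_id, actions, last_action_time)` tuples and `archive_site` is an illustrative name. The point is that every metric is derived from the same snapshot, taken at one fixed cutoff:

```python
import time


def archive_site(log, cutoff=None):
    """Compute all metrics from one snapshot of the log taken at `cutoff`."""
    if cutoff is None:
        cutoff = time.time()  # saved once, when archiving starts
    # Every metric below reads the same filtered data set.
    snapshot = [v for v in log if v[2] <= cutoff]
    return {
        "nb_uniq_visitors": len({v[0] for v in snapshot}),
        "nb_visits": len(snapshot),
        "nb_actions": sum(v[1] for v in snapshot),
    }


# Visitor "b" shows up at t=103, after archiving started at t=102.
log = [("a", 1, 100), ("b", 1, 103)]
print(archive_site(log, cutoff=102))
# {'nb_uniq_visitors': 1, 'nb_visits': 1, 'nb_actions': 1}
```

In this model, a visit whose last action falls after the cutoff is left out entirely, which corresponds to the trade-off of reporting fewer visits for today.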

@tsteur
Member

tsteur commented Jan 18, 2016

This might be something for the archiver refactoring #7470, where we can think about such cases. Right now it's not really doable, since we need to rewrite the archiver first (and possibly also allow different backends like Elasticsearch etc.).

  1. If we do this first, we have pretty much the same problem. The only solution would probably be to save the last action time and always use a fixed one.

@mattab added the Major label Jan 29, 2016

3 participants