Time spent on page calculation is buggy #9198

Open
tsteur opened this issue Nov 11, 2015 · 17 comments
Labels
Bug (For errors / faults / flaws / inconsistencies etc.), c: Data Integrity & Accuracy, Major (Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.)

@tsteur
Member

tsteur commented Nov 11, 2015

  1. Avg time spent on a page is calculated by dividing the sum of all time_spent_ref_action by the number of visits nb_visits. Not all visits have a time_spent_ref_action, though. Instead, sum_time_spent should be divided by something like nb_hits_with_time_spent.
  2. In the tracker we calculate time_spent_ref_action incorrectly. It is calculated as the difference between currentTimestamp and visit_last_action_time, but visit_last_action_time is updated on any tracking call, that is, on every hit and not only on pageviews.

To make it a bit clearer, let's say there are the following tracking calls:

  • pageview
  • event
  • pageview

The time spent for the first pageview is calculated as the time difference between the event and the first pageview, not between the two pageviews. This means that for many common scenarios where one triggers a pageview and then an event, site search, content impression, etc., the time spent information is not accurate.
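The three-call scenario above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the tracker code: the `Hit` class, both function names and the timestamps are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    kind: str       # "pageview" or "event" (hypothetical labels)
    timestamp: int  # seconds since the start of the visit


def time_spent_since_last_hit(hits):
    """Behaviour described in the issue: any following hit, whatever its
    type, ends the time counted for the preceding pageview."""
    spent = {}
    for prev, cur in zip(hits, hits[1:]):
        if prev.kind == "pageview":
            spent[prev.timestamp] = cur.timestamp - prev.timestamp
    return spent


def time_spent_until_next_pageview(hits):
    """Expected behaviour: a pageview lasts until the next pageview."""
    pageviews = [h for h in hits if h.kind == "pageview"]
    return {
        p.timestamp: n.timestamp - p.timestamp
        for p, n in zip(pageviews, pageviews[1:])
    }


# pageview at 0s, event at 5s, next pageview at 60s
hits = [Hit("pageview", 0), Hit("event", 5), Hit("pageview", 60)]

buggy = time_spent_since_last_hit(hits)
expected = time_spent_until_next_pageview(hits)

print(buggy)     # {0: 5}  -> the event cuts the first pageview short
print(expected)  # {0: 60} -> time between the two pageviews
```

Results are keyed by the pageview's timestamp purely to keep the sketch short.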

@tsteur added the Bug label Nov 11, 2015
@hpvd

hpvd commented Nov 12, 2015

Ooh, good find. That's quite important!

@SR-mkuhn

@tsteur
Member Author

tsteur commented Nov 15, 2015

I think the issue I mentioned here has been an issue for a long time and not only since the last update. From which Piwik version did you update? I presume the problem you are describing might actually be a different one.

@mattab
Member

mattab commented Nov 18, 2015

Marking this issue as duplicate of #9199 - which was renamed to include in its scope this bug

Edit: re-opened this issue as it may be easier to fix this one rather than #9199

See also: #9539

@mattab closed this as completed Nov 18, 2015
@mattab added the duplicate label Nov 18, 2015
@SR-mkuhn

@tsteur: from 2.14.3 to 2.15.0.
And this is just one of 800 sites tracked in one Piwik instance. (It affects the other 799 too.)

@SR-mkuhn

The main question is: which count is correct?

@tsteur
Member Author

tsteur commented Nov 22, 2015

@SR-mkuhn I think this particular calculation has been buggy for quite a while, not only since the last update.

@tsteur
Member Author

tsteur commented Nov 22, 2015

@SR-mkuhn maybe create a new issue for your problem and describe it there

@mattab reopened this Nov 23, 2015
@mattab removed the duplicate label Nov 23, 2015
@mattab added this to the Short term milestone Nov 25, 2015
@sebastianpiskorski
Contributor

I've also encountered this issue recently, and I've found that there is a problem with the metric calculation:

Time spent on site is defined as sum_time_spent and calculated as SUM():
https://github.com/piwik/plugin-CustomDimensions/blob/master/Archiver.php#L164

The SUM() function in SQL databases skips records containing NULL values. Later, the average time spent on page is calculated by dividing this sum by the number of hits nb_hits ( https://github.com/piwik/piwik/blob/master/plugins/Actions/Columns/Metrics/AverageTimeOnPage.php#L39 ), which is calculated as COUNT(*) ( https://github.com/piwik/piwik/blob/master/plugins/Actions/Archiver.php#L359 ).

The problem is that COUNT(*) counts all rows, even those containing a NULL value, so the average isn't a true average at all. A solution would be to use SUM(COALESCE(sum_time_spent, 0)), which treats NULL values as 0, or to introduce nb_hits_with_time_spent as COUNT(sum_time_spent) and then divide by it, as @tsteur said.
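The NULL handling described above can be checked with a small, self-contained sketch using Python's built-in sqlite3 module. The `hits` table, its columns and the values are invented for illustration and are not Matomo's schema; standard SQL aggregate behaviour is assumed to match what the archiver's database does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: four hits on the same page, two without a time value.
conn.execute("CREATE TABLE hits (idaction INTEGER, time_spent INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [(1, 30), (1, 90), (1, None), (1, None)],
)

total, total_coalesced, n_all, n_with_time = conn.execute(
    """
    SELECT SUM(time_spent),               -- skips NULL rows
           SUM(COALESCE(time_spent, 0)),  -- treats NULL as 0
           COUNT(*),                      -- counts every row
           COUNT(time_spent)              -- counts only non-NULL rows
    FROM hits
    """
).fetchone()

avg_over_all_hits = total / n_all         # 120 / 4
avg_over_timed_hits = total / n_with_time  # 120 / 2

print(total, total_coalesced, n_all, n_with_time)  # 120 120 4 2
print(avg_over_all_hits, avg_over_timed_hits)      # 30.0 60.0
```

In this toy data the COALESCE variant yields the same sum as plain SUM(), so the difference between the two averages comes entirely from the denominator.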

@petecroaker

I’ve just encountered the same issue. Checking the Visitor Profile, I can see that for interactions on a page with lots of tracking events, the page view gets a minimal time, whereas the events are given the times between each interaction. As such, a user could be interacting with a page for a minute or more, triggering numerous events, but the page dwell time would still be close to zero. Surely this is a major bug. It means that any page on which subsequent events occur will have an incorrect dwell time.

@mattab added the Major label Jun 6, 2017
@mattab
Member

mattab commented Jun 6, 2017

Would agree this bug is quite major as it is causing the Time on page to be wrong, for any page tracking events.

@hpvd

hpvd commented Jun 6, 2017

it is causing the Time on page to be wrong, for any page tracking events.

Possibly there is a strong relationship to:
we have no "time on url": Piwik handels events as leaving page (at least in visitor log) #11546

@mgloss

mgloss commented Nov 29, 2018

Has this been solved already? We had Piwik 2.something and are now in the process of upgrading to Matomo 3.7, and I am wondering whether it will be correct. I have checked the previous data in the database. You can easily see it when you filter for one specific idvisit in piwik_log_link_action_table: every event closes the time on the pageview. Especially if you are using events like formSeen, bannerImpression etc., you will understand that this is not correct. The time spent attributed to some events is also quite funny. Thanks in advance for pointing me to any other related issue.

@tsteur
Member Author

tsteur commented Nov 29, 2018

As the issue is still open I don't think anything has been solved here yet AFAIK. @mattab that might be indeed quite important to fix the time on page.

@hatdio

hatdio commented Jun 26, 2019

I also encountered this issue getting wrong time on page.
Is this planned by anyone?
I know it's in the backlog, but it's the older of the - only - two issues labeled as Major + Bug

@rennyeb

rennyeb commented May 20, 2022

I'm being hit by this problem, too - my "AVG. TIME ON PAGE" numbers are coming out as near-zero due to events on the page.

How do we get this bug prioritised for fixing, please?

In case it's helpful to anyone, in my local Matomo deployment I unashamedly hacked my ./plugins/Actions/Archiver.php file and commented-out the line which restricts by getWhereClauseActionIsNotEvent:

    /**
     * Time per action
     */
    protected function archiveDayActionsTime($rankingQueryLimit)
    {
        $rankingQuery = false;
        if ($rankingQueryLimit > 0) {
            $rankingQuery = new RankingQuery($rankingQueryLimit);
            $rankingQuery->addLabelColumn('idaction');
            $rankingQuery->addColumn(PiwikMetrics::INDEX_PAGE_SUM_TIME_SPENT, 'sum');
            $rankingQuery->partitionResultIntoMultipleGroups('type', array_keys($this->actionsTablesByType));

            $extraSelects = "log_action.type, log_action.name, count(*) as `" . PiwikMetrics::INDEX_PAGE_NB_HITS . "`,";
            $from = array(
                "log_link_visit_action",
                array(
                    "table"  => "log_action",
                    "joinOn" => "log_link_visit_action.%s = log_action.idaction"
                )
            );
            $orderBy = "`" . PiwikMetrics::INDEX_PAGE_NB_HITS . "` DESC, log_action.name ASC";
        } else {
            $extraSelects = false;
            $from = "log_link_visit_action";
            $orderBy = false;
        }

        $select = "log_link_visit_action.%s as idaction, $extraSelects
                sum(log_link_visit_action.time_spent_ref_action) as `" . PiwikMetrics::INDEX_PAGE_SUM_TIME_SPENT . "`";

        $where = $this->getLogAggregator()->getWhereStatement('log_link_visit_action', 'server_time');
        $where .= " AND log_link_visit_action.time_spent_ref_action > 0
                 AND log_link_visit_action.%s > 0"
//            . $this->getWhereClauseActionIsNotEvent() //include time spent in events as well
;

Informally, this worked for my use case - I haven't given any thought about whether this is a robust solution.
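The effect of dropping that restriction can be illustrated with a toy aggregation in Python. This is a sketch under loud assumptions, not a model of the real archiver: it assumes each row stores the time since the previous hit and references the page the visitor was on, and the tuple layout, the `sum_time_by_ref` helper and the numbers are all hypothetical.

```python
# Hypothetical rows for one visit: the visitor stays on /a for 60 seconds,
# fires two events, then moves on to another page.
# (ref_page, hit_type, time_spent_ref_action)
rows = [
    ("/a", "event", 40),     # first event, 40s after the pageview of /a
    ("/a", "event", 19),     # second event, 19s after the first one
    ("/a", "pageview", 1),   # next pageview, 1s after the second event
]


def sum_time_by_ref(rows, include_events):
    """Sum time_spent_ref_action per referenced page, optionally skipping
    rows whose own hit type is an event (the commented-out restriction)."""
    totals = {}
    for ref_page, hit_type, time_spent in rows:
        if hit_type == "event" and not include_events:
            continue
        if time_spent <= 0:
            continue
        totals[ref_page] = totals.get(ref_page, 0) + time_spent
    return totals


with_restriction = sum_time_by_ref(rows, include_events=False)
without_restriction = sum_time_by_ref(rows, include_events=True)

print(with_restriction)     # {'/a': 1}
print(without_restriction)  # {'/a': 60}
```

Under these assumptions, including the event rows recovers the full dwell time on the page, which matches the near-zero numbers reported above when events fire shortly before the next pageview.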

@desertking

How can this issue still be open after 7 years? I was wondering what could explain the big gap between GA3 (Universal) data and this one, and found out that this has been discussed a few times. Is there any other workaround in the code to ignore the users who spent 0 time in the "avg time on page" row?
