Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong data in visitors overview over time #15170

Closed
kiliandr opened this issue Nov 18, 2019 · 18 comments · Fixed by #15272
Closed

Wrong data in visitors overview over time #15170

kiliandr opened this issue Nov 18, 2019 · 18 comments · Fixed by #15272
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Milestone

Comments

@kiliandr
Copy link

In Matomo 3.12.0 a bug appeared that i have described here:
https://forum.matomo.org/t/visitors-overview-not-updating/34804/7

@tsteur
Copy link
Member

tsteur commented Nov 18, 2019

@kiliandr check your webserver (and PHP) error logs if there's any error in there. And I recommend you echo the output of the cron into a log file as well to see if there are ever any errors or so.

Otherwise from our side everything seems fine and not sure there's much we can do. Be good to check for errors and double check if you have any custom configuration in config/config.ini.php

@betarobot
Copy link

betarobot commented Nov 21, 2019

Not sure if it is connected, but I also see some odd calculation after updating to 3.12.0.
More here: https://forum.matomo.org/t/iframe-widget-showing-wrong-results/34954

No errors in logs, cron runs smooth.

@kiliandr
Copy link
Author

@tsteur thanks. It is a standard configuration on a standard webhoster package.
if i export the data as .csv it seems to be correct. The panel still shows me wrong data.

@tsteur
Copy link
Member

tsteur commented Nov 27, 2019

Any chance you can send us access to your Matomo install together with some instructions on where/how to reproduce this issue to hello at matomo.org ?

@tsteur
Copy link
Member

tsteur commented Nov 27, 2019

When you say visits over time do you sum eg the days manually? When you say visitors do you mean unique visitors or visits? Do you sum days, weeks or months?

@tsteur
Copy link
Member

tsteur commented Dec 5, 2019

Thanks for sending us the details @kiliandr

I can reproduce the issue and the sum of each individual day is indeed not the sum of the shown visits for month.

Since we have a few reports around this recently, I'm thinking this may be the issue:

https://github.com/matomo-org/matomo/blob/3.13.0/core/ArchiveProcessor/Rules.php#L298-L302

Since we removed temporary archives in 3.11 or 3.12, we need to always include them just like we did with temporary archives. Otherwise these archives won't be included in https://github.com/matomo-org/matomo/blob/3.13.0/core/DataAccess/ArchiveSelector.php#L160-L165

I suppose the logic will need to make sure we also fetch invalidated archives but always use the most recent archiveId and prefer a done archive if there is one.

This might be also related to #15226

Just a guess but reckon the issue could be around there.

@diosmosis any thoughts maybe? I'm not so much into this right now.

@tsteur tsteur added this to the 3.13.1 milestone Dec 5, 2019
@tsteur tsteur added Bug For errors / faults / flaws / inconsistencies etc. Regression Indicates a feature used to work in a certain way but it no longer does even though it should. labels Dec 5, 2019
@diosmosis diosmosis self-assigned this Dec 6, 2019
@diosmosis
Copy link
Member

@tsteur I tried to reproduce this in a test:

    public function test_multiPeriodArchiving_includesLatestUsableArchivesForSubperiods()
    {
        $idSite = Fixture::createWebsite('2012-02-02 00:00:00');

        // track across 3 days
        $tracker = Fixture::getTracker($idSite, '2019-10-10 03:44:45');
        $tracker->enableBulkTracking();
        $tracker->setUrl('http://example.com/page1.html');
        $this->assertTrue($tracker->doTrackPageView('page 1'));

        $tracker->setForceVisitDateTime('2019-10-11 12:02:30');
        $tracker->setUrl('http://example.com/page2.html');
        $this->assertTrue($tracker->doTrackPageView('page 2'));

        $tracker->setForceVisitDateTime('2019-10-12 12:02:30');
        $tracker->setUrl('http://example.com/page3.html');
        $this->assertTrue($tracker->doTrackPageView('page 3'));

        Fixture::checkBulkTrackingResponse($tracker->doBulkTrack());

        // create day archives
        API::getInstance()->get($idSite, 'day', '2019-10-10');
        API::getInstance()->get($idSite, 'day', '2019-10-11');
        API::getInstance()->get($idSite, 'day', '2019-10-12');

        // track another visit for one day & invalidate archive
        $tracker->setForceVisitDateTime('2019-10-11 16:02:30');
        $tracker->setUrl('http://example.com/page4.html');
        $this->assertTrue($tracker->doTrackPageView('page 4'));
        Fixture::checkBulkTrackingResponse($tracker->doBulkTrack());

        $cronArchive = new CronArchive();
        $cronArchive->invalidateArchivedReportsForSitesThatNeedToBeArchivedAgain();

        $invalidatedArchives = Db::fetchOne('SELECT COUNT(*) FROM ' . Common::prefixTable('archive_numeric_2019_10') . ' WHERE name LIKE "done%" AND value = ' . ArchiveWriter::DONE_INVALIDATED);
        $this->assertGreaterThan(0, $invalidatedArchives);

        // create week archive && check visits
        $result = API::getInstance()->get($idSite, 'week', '2019-10-10');
        $visits = $result->getFirstRow()->getColumn('nb_visits');
        $this->assertEquals(4, $visits);
    }

but it's passing for me w/ the current code. Can you provide some steps to how you reproduced?

I wonder if this is somehow a race condition where archives are invalidated while archiving is going on...

@tsteur
Copy link
Member

tsteur commented Dec 7, 2019

@diosmosis not sure how to reproduce but it looks like there could be various edge cases where it maybe doesn't work. Like

It is trying to archive week, and doesn't include today because it only has an invalid archive. Then because of some race conditions it might set the week as done when it is processing the previous week even though the last day within the week might only have an invalid archive so far for some reason. Maybe people select the week as a range and it gets archived as done or something.

In general, the invalid archives have basically become the temporary archives and we used to always look at them so it sounds to be like we should always also look at an invalid archive. But the code seems to not do this under some conditions.

@diosmosis
Copy link
Member

It is trying to archive week, and doesn't include today because it only has an invalid archive.

In this case it should initiate archiving for the invalidated day archive (that is what my test ended up doing).

In general, the invalid archives have basically become the temporary archives and we used to always look at them so it sounds to be like we should always also look at an invalid archive.

Strangely when I add that as a possible value and ran the test, I ended up getting a completely incorrect value... I'll keep looking.

@diosmosis
Copy link
Member

Ok, I think I found the race condition:

  1. day period gets archived
  2. week period starts archiving
  3. tracking visit comes in for today and day period is invalidated. week period isn't done archiving, so it doesn't get invalidated.
  4. week period finishes archiving. at this point there is an invalid day period and valid week period.
  5. month period gets archived, it uses a day period causing the day period to get re-archived, without the week being aware. the day period now has the new visit in 3., but the week period does not.

I was going to attempt a fix by making sure invalidated archives are used in ArchiveProcessor. However that's not the whole fix since we need to be able to invalidate in progress archives as well, so the week/month/etc. in this case eventually gets reprocessed. Or perhaps we don't perform actual invalidation of archives if archiving is on-going?

@tsteur
Copy link
Member

tsteur commented Dec 8, 2019

week period finishes archiving. at this point there is an invalid day period and valid week period.

It should invalidate that week period as well. I guess that's maybe why we had temporary archives in the first place. Do you have an idea to workaround it?

@tsteur
Copy link
Member

tsteur commented Dec 8, 2019

Also in general between archive start and archive end there might be only an invalidated archive there and no done archive? Like CronArchive starts, it invalidates an archive, then the tracker starts, and say 30 min later it writes the done archive (it may take a while for the archive to finish)?

@diosmosis
Copy link
Member

It should invalidate that week period as well. I guess that's maybe why we had temporary archives in the first place. Do you have an idea to workaround it?

Temp archives were around well before invalidated archives were. I suspect we just never ran into the race condition when implementing invalidating archives since we never had to think about it... Don't know how to work around it just yet.

Also in general between archive start and archive end there might be only an invalidated archive there and no done archive? Like CronArchive starts, it invalidates an archive, then the tracker starts, and say 30 min later it writes the done archive (it may take a while for the archive to finish)?

That's what I thought, if there's only one core:archive process then the invalidation should only occur once and not interfere with archiving. But looking at the archives on demo.matomo.org, there's a day archive that's invalidated w/ a timestamp similar to the week archive, and another day archive w/ DONE_OK 2 mins later that I think corresponds to the month archive. I'm not 100% sure how it happens, but I think something needs to be reworked in how invalidation is done.

@johsin18
Copy link
Contributor

Not sure my problem is related, but I have another inconsistent statistic. Matomo counts only 1285 visits for the week in Visits over Time (which was already over when viewing this), but the Visits by Day Of Week sum up to 1480.

Matomo wrong sum

@diosmosis
Copy link
Member

@johsin18 Does this happen with relatively recent data or just old data?

@johsin18
Copy link
Contributor

Well, as you can see it was the week of Nov 25th, 2019. I noticed the problem already the Monday after. I had the impression that it was caused by the Matomo update to 3.13.0, which I installed at some point in time in that week (released on 27th).

@diosmosis
Copy link
Member

@johsin18 just to confirm, you can't see the issue with old data, correct?

@johsin18
Copy link
Contributor

No, old data is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants