@kiliandr opened this Issue on November 18th 2019

In Matomo 3.12.0 a bug appeared that i have described here:
https://forum.matomo.org/t/visitors-overview-not-updating/34804/7

@tsteur commented on November 18th 2019 Member

@kiliandr check your webserver (and PHP) error logs if there's any error in there. And I recommend you echo the output of the cron into a log file as well to see if there are ever any errors or so.

Otherwise from our side everything seems fine and not sure there's much we can do. Be good to check for errors and double check if you have any custom configuration in config/config.ini.php

@betarobot commented on November 21st 2019

Not sure if it is connected, but I also see some odd calculation after updating to 3.12.0.
More here: https://forum.matomo.org/t/iframe-widget-showing-wrong-results/34954

No errors in logs, cron runs smooth.

@kiliandr commented on November 27th 2019

@tsteur thanks. It is a standard configuration on a standard webhoster package.
if i export the data as .csv it seems to be correct. The panel still shows me wrong data.

@tsteur commented on November 27th 2019 Member

Any chance you can send us access to your Matomo install together with some instructions on where/how to reproduce this issue to hello at matomo.org ?

@tsteur commented on November 27th 2019 Member

When you say visits over time do you sum eg the days manually? When you say visitors do you mean unique visitors or visits? Do you sum days, weeks or months?

@tsteur commented on December 5th 2019 Member

Thanks for sending us the details @kiliandr

I can reproduce the issue and the sum of each individual day is indeed not the sum of the shown visits for month.

Since we have a few reports around this recently, I'm thinking this may be the issue:

https://github.com/matomo-org/matomo/blob/3.13.0/core/ArchiveProcessor/Rules.php#L298-L302

Since we removed temporary archives in 3.11 or 3.12, we need to always include them just like we did with temporary archives. Otherwise these archives won't be included in https://github.com/matomo-org/matomo/blob/3.13.0/core/DataAccess/ArchiveSelector.php#L160-L165

I suppose the logic will need to make sure we also fetch invalidated archives but always use the most recent archiveId and prefer a done archive if there is one.

This might be also related to https://github.com/matomo-org/matomo/issues/15226

Just a guess but reckon the issue could be around there.

@diosmosis any thoughts maybe? I'm not so much into this right now.

@diosmosis commented on December 7th 2019 Member

@tsteur I tried to reproduce this in a test:

    public function test_multiPeriodArchiving_includesLatestUsableArchivesForSubperiods()
    {
        $idSite = Fixture::createWebsite('2012-02-02 00:00:00');

        // track across 3 days
        $tracker = Fixture::getTracker($idSite, '2019-10-10 03:44:45');
        $tracker->enableBulkTracking();
        $tracker->setUrl('http://example.com/page1.html');
        $this->assertTrue($tracker->doTrackPageView('page 1'));

        $tracker->setForceVisitDateTime('2019-10-11 12:02:30');
        $tracker->setUrl('http://example.com/page2.html');
        $this->assertTrue($tracker->doTrackPageView('page 2'));

        $tracker->setForceVisitDateTime('2019-10-12 12:02:30');
        $tracker->setUrl('http://example.com/page3.html');
        $this->assertTrue($tracker->doTrackPageView('page 3'));

        Fixture::checkBulkTrackingResponse($tracker->doBulkTrack());

        // create day archives
        API::getInstance()->get($idSite, 'day', '2019-10-10');
        API::getInstance()->get($idSite, 'day', '2019-10-11');
        API::getInstance()->get($idSite, 'day', '2019-10-12');

        // track another visit for one day & invalidate archive
        $tracker->setForceVisitDateTime('2019-10-11 16:02:30');
        $tracker->setUrl('http://example.com/page4.html');
        $this->assertTrue($tracker->doTrackPageView('page 4'));
        Fixture::checkBulkTrackingResponse($tracker->doBulkTrack());

        $cronArchive = new CronArchive();
        $cronArchive->invalidateArchivedReportsForSitesThatNeedToBeArchivedAgain();

        $invalidatedArchives = Db::fetchOne('SELECT COUNT(*) FROM ' . Common::prefixTable('archive_numeric_2019_10') . ' WHERE name LIKE "done%" AND value = ' . ArchiveWriter::DONE_INVALIDATED);
        $this->assertGreaterThan(0, $invalidatedArchives);

        // create week archive && check visits
        $result = API::getInstance()->get($idSite, 'week', '2019-10-10');
        $visits = $result->getFirstRow()->getColumn('nb_visits');
        $this->assertEquals(4, $visits);
    }

but it's passing for me w/ the current code. Can you provide some steps to how you reproduced?

I wonder if this is somehow a race condition where archives are invalidated while archiving is going on...

@tsteur commented on December 7th 2019 Member

@diosmosis not sure how to reproduce but it looks like there could be various edge cases where it maybe doesn't work. Like

It is trying to archive week, and doesn't include today because it only has an invalid archive. Then because of some race conditions it might set the week as done when it is processing the previous week even though the last day within the week might only have an invalid archive so far for some reason. Maybe people select the week as a range and it gets archived as done or something.

In general, the invalid archives have basically become the temporary archives and we used to always look at them so it sounds to be like we should always also look at an invalid archive. But the code seems to not do this under some conditions.

@diosmosis commented on December 7th 2019 Member

It is trying to archive week, and doesn't include today because it only has an invalid archive.

In this case it should initiate archiving for the invalidated day archive (that is what my test ended up doing).

In general, the invalid archives have basically become the temporary archives and we used to always look at them so it sounds to be like we should always also look at an invalid archive.

Strangely when I add that as a possible value and ran the test, I ended up getting a completely incorrect value... I'll keep looking.

@diosmosis commented on December 8th 2019 Member

Ok, I think I found the race condition:

  1. day period gets archived
  2. week period starts archiving
  3. tracking visit comes in for today and day period is invalidated. week period isn't done archiving, so it doesn't get invalidated.
  4. week period finishes archiving. at this point there is an invalid day period and valid week period.
  5. month period gets archived, it uses a day period causing the day period to get re-archived, without the week being aware. the day period now has the new visit in 3., but the week period does not.

I was going to attempt a fix by making sure invalidated archives are used in ArchiveProcessor. However that's not the whole fix since we need to be able to invalidate in progress archives as well, so the week/month/etc. in this case eventually gets reprocessed. Or perhaps we don't perform actual invalidation of archives if archiving is on-going?

@tsteur commented on December 8th 2019 Member

week period finishes archiving. at this point there is an invalid day period and valid week period.

It should invalidate that week period as well. I guess that's maybe why we had temporary archives in the first place. Do you have an idea to workaround it?

@tsteur commented on December 8th 2019 Member

Also in general between archive start and archive end there might be only an invalidated archive there and no done archive? Like CronArchive starts, it invalidates an archive, then the tracker starts, and say 30 min later it writes the done archive (it may take a while for the archive to finish)?

@diosmosis commented on December 8th 2019 Member

It should invalidate that week period as well. I guess that's maybe why we had temporary archives in the first place. Do you have an idea to workaround it?

Temp archives were around well before invalidated archives were. I suspect we just never ran into the race condition when implementing invalidating archives since we never had to think about it... Don't know how to work around it just yet.

Also in general between archive start and archive end there might be only an invalidated archive there and no done archive? Like CronArchive starts, it invalidates an archive, then the tracker starts, and say 30 min later it writes the done archive (it may take a while for the archive to finish)?

That's what I thought, if there's only one core:archive process then the invalidation should only occur once and not interfere with archiving. But looking at the archives on demo.matomo.org, there's a day archive that's invalidated w/ a timestamp similar to the week archive, and another day archive w/ DONE_OK 2 mins later that I think corresponds to the month archive. I'm not 100% sure how it happens, but I think something needs to be reworked in how invalidation is done.

@johsin18 commented on December 10th 2019 Contributor

Not sure my problem is related, but I have another inconsistent statistic. Matomo counts only 1285 visits for the week in Visits over Time (which was already over when viewing this), but the Visits by Day Of Week sum up to 1480.

Matomo wrong sum

@diosmosis commented on December 10th 2019 Member

@johsin18 Does this happen with relatively recent data or just old data?

@johsin18 commented on December 10th 2019 Contributor

Well, as you can see it was the week of Nov 25th, 2019. I noticed the problem already the Monday after. I had the impression that it was caused by the Matomo update to 3.13.0, which I installed at some point in time in that week (released on 27th).

@diosmosis commented on December 10th 2019 Member

@johsin18 just to confirm, you can't see the issue with old data, correct?

@johsin18 commented on December 12th 2019 Contributor

No, old data is correct.

Powered by GitHub Issue Mirror