@toredash opened this Issue on October 31st 2019 Contributor

Hi,

There seems to be an issue with auto archiving not picking up the correct date for when the last archive was completed successfully. This results in the pre-processing of period = day to be set to date = last52 rather than the actual last date/time.

Matomo version: 3.12.0
MySQL version: 5.6.10-log
PHP version: 7.3.11

First indication of this error was that all archive logs show period = day being set to date = last52:

07:38:17] 25966 Will pre-process for website id = 8, period = day, date = last52
07:48:39] 25966 Will pre-process for website id = 10, period = day, date = last52
08:00:09] 25966 Will pre-process for website id = 14, period = day, date = last52
09:00:36] 4592 Will pre-process for website id = 8, period = day, date = last52
09:00:40] 4592 Will pre-process for website id = 10, period = day, date = last52
09:49:25] 4592 Will pre-process for website id = 14, period = day, date = last52

Testing this a low-volume site was chosen, all segments / historical data was purged for that that site (14). Then normal archiving was forced for site 14:
```# ./console core:archive --skip-idsites 8,10,15,16 --force-idsites=14
13:52:14] 25562 ---------------------------
13:52:14] 25562 INIT
13:52:14] 25562 Running Matomo 3.12.0 as Super User
13:52:14] 25562 ---------------------------
13:52:14] 25562 NOTES
13:52:14] 25562 - Reports for today will be processed at most every 3500 seconds. You can change this value in Matomo UI > Settings > General Settings.
13:52:14] 25562 - Reports for the current week/month/year will be requested at most every 3600 seconds.
13:52:14] 25562 - Archiving was last executed without error 45s ago
13:52:14] 25562 - Will invalidate archived reports for 2019-10-31 for following websites ids: 10,8
13:52:14] 25562 - Will process 1 websites (--force-idsites)
13:52:14] 25562 ---------------------------
13:52:14] 25562 START
13:52:14] 25562 Starting Matomo reports archiving...
13:52:14] 25562 Will pre-process for website id = 14, period = day, date = last52
13:52:14] 25562 - pre-processing all visits
13:52:46] 25562 Archived website id = 14, period = day, 0 segments, 6583 visits in last 52 days, 62 visits today, Time elapsed: 32.086s
13:52:46] 25562 Will pre-process for website id = 14, period = week, date = last2
13:52:46] 25562 - pre-processing all visits
13:52:49] 25562 Archived website id = 14, period = week, 0 segments, 1074 visits in last 2 weeks, 365 visits this week, Time elapsed: 2.554s
13:52:49] 25562 Will pre-process for website id = 14, period = month, date = last2
13:52:49] 25562 - pre-processing all visits
13:53:27] 25562 Archived website id = 14, period = month, 0 segments, 8307 visits in last 2 months, 3045 visits this month, Time elapsed: 38.378s
13:53:27] 25562 Will pre-process for website id = 14, period = year, date = last2
13:53:27] 25562 - pre-processing all visits
13:54:55] 25562 Archived website id = 14, period = year, 0 segments, 29498 visits in last 2 years, 29498 visits this year, Time elapsed: 87.848s
13:54:55] 25562 Archived website id = 14, 4 API requests, Time elapsed: 160.888s [1/1 done]
13:54:55] 25562 Done archiving!
13:54:55] 25562 ---------------------------
13:54:55] 25562 SUMMARY
13:54:55] 25562 Total visits for today across archived websites: 62
13:54:55] 25562 Archived today's reports for 1 websites
13:54:55] 25562 Archived week/month/year for 1 websites
13:54:55] 25562 Skipped 6 websites
13:54:55] 25562 - 0 skipped because no new visit since the last script execution
13:54:55] 25562 - 0 skipped because existing daily reports are less than 3500 seconds old
13:54:55] 25562 - 0 skipped because existing week/month/year periods reports are less than 3600 seconds old
13:54:55] 25562 Total API requests: 4
13:54:55] 25562 done: 1/1 100%, 62 vtoday, 1 wtoday, 1 wperiods, 4 req, 160896 ms, no error
13:54:55] 25562 Time elapsed: 160.896s
13:54:55] 25562 ---------------------------
13:54:55] 25562 SCHEDULED TASKS
13:54:55] 25562 Starting Scheduled tasks...
13:54:55] 25562 done
13:54:55] 25562 ---------------------------


Then fake tracking data was added to the site, so that the archive job would start on next schedule.

Notice that on the next run, for site 14 the pre-process values for day: `period = day, date = last52`

./console core:archive --skip-idsites 8,10,15,16

14:06:15] 29730 ---------------------------
14:06:15] 29730 INIT
14:06:15] 29730 Running Matomo 3.12.0 as Super User
14:06:15] 29730 ---------------------------
14:06:15] 29730 NOTES
14:06:15] 29730 - Reports for today will be processed at most every 1 seconds. You can change this value in Matomo UI > Settings > General Settings.
14:06:15] 29730 - Reports for the current week/month/year will be requested at most every 3600 seconds.
14:06:15] 29730 - Archiving was last executed without error 2 min 51s ago
14:06:15] 29730 - Will invalidate archived reports for 2019-10-31 for following websites ids: 10,14,8
14:06:15] 29730 - Will process 4 other websites because some old data reports have been invalidated (eg. using the Log Import script or the InvalidateReports plugin) , IDs: 2,
10, 8, 14
14:06:16] 29730 ---------------------------
14:06:16] 29730 START
14:06:16] 29730 Starting Matomo reports archiving...
14:06:16] 29730 Skipped website id 8, found in --skip-idsites
14:06:16] 29730 - no new tracking data for website id 9 since 2019-10-31 14:03:24 UTC (since the last successful archiving)
14:06:16] 29730 Skipped website id 9 as archiving is not needed
14:06:16] 29730 Skipped website id 10, found in --skip-idsites
14:06:16] 29730 - no new tracking data for website id 12 since 2019-10-31 14:03:24 UTC (since the last successful archiving)
14:06:16] 29730 Skipped website id 12 as archiving is not needed
14:06:16] 29730 Old report was invalidated for website id 14
14:06:16] 29730 Will pre-process for website id = 14, period = day, date = last52
14:06:16] 29730 - pre-processing all visits
14:06:19] 29730 Archived website id = 14, period = day, 0 segments, 6599 visits in last 52 days, 78 visits today, Time elapsed: 3.331s
14:06:19] 29730 Will pre-process for website id = 14, period = week, date = last2
14:06:19] 29730 - pre-processing all visits
14:06:21] 29730 Archived website id = 14, period = week, 0 segments, 1090 visits in last 2 weeks, 381 visits this week, Time elapsed: 1.545s
14:06:21] 29730 Will pre-process for website id = 14, period = month, date = last2
14:06:21] 29730 - pre-processing all visits
14:06:24] 29730 Archived website id = 14, period = month, 0 segments, 8323 visits in last 2 months, 3061 visits this month, Time elapsed: 3.068s
14:06:24] 29730 Will pre-process for website id = 14, period = year, date = last2
14:06:24] 29730 - pre-processing all visits
14:06:26] 29730 Archived website id = 14, period = year, 0 segments, 29514 visits in last 2 years, 29514 visits this year, Time elapsed: 2.306s
14:06:26] 29730 Archived website id = 14, 4 API requests, Time elapsed: 10.303s [5/7 done]
14:06:26] 29730 Skipped website id 15, found in --skip-idsites
14:06:26] 29730 Skipped website id 16, found in --skip-idsites
14:06:26] 29730 Done archiving!
14:06:26] 29730 ---------------------------
14:06:26] 29730 SUMMARY
14:06:26] 29730 Total visits for today across archived websites: 78
14:06:26] 29730 Archived today's reports for 1 websites
14:06:26] 29730 Archived week/month/year for 1 websites
14:06:26] 29730 Skipped 6 websites
14:06:26] 29730 - 2 skipped because no new visit since the last script execution
14:06:26] 29730 - 0 skipped because existing daily reports are less than 1 seconds old
14:06:26] 29730 - 0 skipped because existing week/month/year periods reports are less than 3600 seconds old
14:06:26] 29730 Total API requests: 4
14:06:26] 29730 done: 1/7 14%, 78 vtoday, 1 wtoday, 1 wperiods, 4 req, 10611 ms, no error
14:06:26] 29730 Time elapsed: 10.611s
14:06:26] 29730 ---------------------------
14:06:26] 29730 SCHEDULED TASKS
14:06:26] 29730 Starting Scheduled tasks...
14:06:26] 29730 done
14:06:26] 29730 ---------------------------


The archive code indicates that last52 is the default value for days and months (https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L52-L54). There is also a function `getDateLastN` (https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L1617-L1641) that returns the expected `lastX` value. Same method also defines `last2` as the minimum value to return.

Looking at db table `matomo_options`, I find `'lastRunArchiveday_14', '1572530779', '0'` where the EPOCH value is `10/31/2019 @ 2:06pm (UTC)`, which matches the last archive run as stated above. It seems matomo is storing the correct completion value, but for some reason uses `last52` rather than `last2` which seems to be the correct value.

Running the archive job a third time for site 14, updates the `lastRunArchiveday_14` value in matomo_options, but still the archiver wants to proceed with a date value of `last52`:

./console core:archive --skip-idsites 8,10,15,16

...
14:16:55] 30305 Will pre-process for website id = 14, period = day, date = last52


I'm not good with PHP so I'm having trouble figuring out where there might be a bug, but I'm suspecting that this is the code that causes issue:
    if (empty($lastTimestampWebsiteProcessed)) {
        $creationDateFor = \Piwik\Site::getCreationDateFor($idSite);
        $lastTimestampWebsiteProcessed = strtotime($creationDateFor);
    }

https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L1625

Let me know if there is anything missing to help with resolving the issue
@mikkeschiren commented on November 2nd 2019

(edited, had an unrelated environment bug)

I hope this could get prioritized as an 3.12.1 release, it has an huge impact on our setup right now, and a lot of manual work is needed.

@tsteur commented on November 3rd 2019 Member

I reckon this could explain maybe some timeouts in the tests. 👍

@tsteur commented on November 3rd 2019 Member

I've only had a very quick look. I reckon it might be related to https://github.com/matomo-org/matomo/pull/14639/files#diff-9bca48e482124d44a3d910acc809194eR561-R567 and that we're now invalidating reports for today which was not the case before.

It seems as a result, the last process date is being ignored here and set to false: https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L869-L875

I don't know if I understand correctly but it seems to always fallback to 52 if there was some archive invalidated which kind of would have been a bug for quite a while cause it should rather take into consideration what date(s) has been invalidated etc.

Be good if someone could have a look.

@mikkeschiren commented on November 4th 2019

Yeah I have seen that last52 in some other cases also.

@toredash commented on November 4th 2019 Contributor

Our current workaround for this is to run the archive process with --force-date-last-n 2

Powered by GitHub Issue Mirror