Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto Archiving falls back to DEFAULT_DATE_LAST even if last archiving ran until completion #15086

Closed
toredash opened this issue Oct 31, 2019 · 5 comments · Fixed by #15154
Closed
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Milestone

Comments

@toredash
Copy link
Contributor

toredash commented Oct 31, 2019

Hi,

There seems to be an issue with auto archiving not picking up the correct date for when the last archive was completed successfully. This results in the pre-processing of period = day to be set to date = last52 rather than the actual last date/time.

Matomo version: 3.12.0
MySQL version: 5.6.10-log
PHP version: 7.3.11

First indication of this error was that all archive logs show period = day being set to date = last52:

07:38:17] 25966 Will pre-process for website id = 8, period = day, date = last52
07:48:39] 25966 Will pre-process for website id = 10, period = day, date = last52
08:00:09] 25966 Will pre-process for website id = 14, period = day, date = last52
09:00:36] 4592 Will pre-process for website id = 8, period = day, date = last52
09:00:40] 4592 Will pre-process for website id = 10, period = day, date = last52
09:49:25] 4592 Will pre-process for website id = 14, period = day, date = last52

Testing this a low-volume site was chosen, all segments / historical data was purged for that that site (14). Then normal archiving was forced for site 14:

13:52:14] 25562  ---------------------------
13:52:14] 25562  INIT
13:52:14] 25562  Running Matomo 3.12.0 as Super User
13:52:14] 25562  ---------------------------
13:52:14] 25562  NOTES
13:52:14] 25562  - Reports for today will be processed at most every 3500 seconds. You can change this value in Matomo UI > Settings > General Settings.
13:52:14] 25562  - Reports for the current week/month/year will be requested at most every 3600 seconds.
13:52:14] 25562  - Archiving was last executed without error 45s ago
13:52:14] 25562  - Will invalidate archived reports for 2019-10-31 for following websites ids: 10,8
13:52:14] 25562  - Will process 1 websites (--force-idsites)
13:52:14] 25562  ---------------------------
13:52:14] 25562  START
13:52:14] 25562  Starting Matomo reports archiving...
13:52:14] 25562  Will pre-process for website id = 14, period = day, date = last52
13:52:14] 25562  - pre-processing all visits
13:52:46] 25562  Archived website id = 14, period = day, 0 segments, 6583 visits in last 52 days, 62 visits today, Time elapsed: 32.086s
13:52:46] 25562  Will pre-process for website id = 14, period = week, date = last2
13:52:46] 25562  - pre-processing all visits
13:52:49] 25562  Archived website id = 14, period = week, 0 segments, 1074 visits in last 2 weeks, 365 visits this week, Time elapsed: 2.554s
13:52:49] 25562  Will pre-process for website id = 14, period = month, date = last2
13:52:49] 25562  - pre-processing all visits
13:53:27] 25562  Archived website id = 14, period = month, 0 segments, 8307 visits in last 2 months, 3045 visits this month, Time elapsed: 38.378s
13:53:27] 25562  Will pre-process for website id = 14, period = year, date = last2
13:53:27] 25562  - pre-processing all visits
13:54:55] 25562  Archived website id = 14, period = year, 0 segments, 29498 visits in last 2 years, 29498 visits this year, Time elapsed: 87.848s
13:54:55] 25562  Archived website id = 14, 4 API requests, Time elapsed: 160.888s [1/1 done]
13:54:55] 25562  Done archiving!
13:54:55] 25562  ---------------------------
13:54:55] 25562  SUMMARY
13:54:55] 25562  Total visits for today across archived websites: 62
13:54:55] 25562  Archived today's reports for 1 websites
13:54:55] 25562  Archived week/month/year for 1 websites
13:54:55] 25562  Skipped 6 websites
13:54:55] 25562  - 0 skipped because no new visit since the last script execution
13:54:55] 25562  - 0 skipped because existing daily reports are less than 3500 seconds old
13:54:55] 25562  - 0 skipped because existing week/month/year periods reports are less than 3600 seconds old
13:54:55] 25562  Total API requests: 4
13:54:55] 25562  done: 1/1 100%, 62 vtoday, 1 wtoday, 1 wperiods, 4 req, 160896 ms, no error
13:54:55] 25562  Time elapsed: 160.896s
13:54:55] 25562  ---------------------------
13:54:55] 25562  SCHEDULED TASKS
13:54:55] 25562  Starting Scheduled tasks... 
13:54:55] 25562  done
13:54:55] 25562  ---------------------------

Then fake tracking data was added to the site, so that the archive job would start on next schedule.

Notice that on the next run, for site 14 the pre-process values for day: period = day, date = last52

# ./console core:archive --skip-idsites 8,10,15,16 
14:06:15] 29730  ---------------------------
14:06:15] 29730  INIT
14:06:15] 29730  Running Matomo 3.12.0 as Super User
14:06:15] 29730  ---------------------------
14:06:15] 29730  NOTES
14:06:15] 29730  - Reports for today will be processed at most every 1 seconds. You can change this value in Matomo UI > Settings > General Settings.
14:06:15] 29730  - Reports for the current week/month/year will be requested at most every 3600 seconds.
14:06:15] 29730  - Archiving was last executed without error 2 min 51s ago
14:06:15] 29730  - Will invalidate archived reports for 2019-10-31 for following websites ids: 10,14,8
14:06:15] 29730  - Will process 4 other websites because some old data reports have been invalidated (eg. using the Log Import script or the InvalidateReports plugin) , IDs: 2,
 10, 8, 14
14:06:16] 29730  ---------------------------
14:06:16] 29730  START
14:06:16] 29730  Starting Matomo reports archiving...
14:06:16] 29730  Skipped website id 8, found in --skip-idsites 
14:06:16] 29730  - no new tracking data for website id 9 since 2019-10-31 14:03:24 UTC (since the last successful archiving)
14:06:16] 29730  Skipped website id 9 as archiving is not needed
14:06:16] 29730  Skipped website id 10, found in --skip-idsites 
14:06:16] 29730  - no new tracking data for website id 12 since 2019-10-31 14:03:24 UTC (since the last successful archiving)
14:06:16] 29730  Skipped website id 12 as archiving is not needed
14:06:16] 29730  Old report was invalidated for website id 14
14:06:16] 29730  **Will pre-process for website id = 14, period = day, date = last52**
14:06:16] 29730  - pre-processing all visits
14:06:19] 29730  Archived website id = 14, period = day, 0 segments, 6599 visits in last 52 days, 78 visits today, Time elapsed: 3.331s
14:06:19] 29730  Will pre-process for website id = 14, period = week, date = last2
14:06:19] 29730  - pre-processing all visits
14:06:21] 29730  Archived website id = 14, period = week, 0 segments, 1090 visits in last 2 weeks, 381 visits this week, Time elapsed: 1.545s
14:06:21] 29730  Will pre-process for website id = 14, period = month, date = last2
14:06:21] 29730  - pre-processing all visits
14:06:24] 29730  Archived website id = 14, period = month, 0 segments, 8323 visits in last 2 months, 3061 visits this month, Time elapsed: 3.068s
14:06:24] 29730  Will pre-process for website id = 14, period = year, date = last2
14:06:24] 29730  - pre-processing all visits
14:06:26] 29730  Archived website id = 14, period = year, 0 segments, 29514 visits in last 2 years, 29514 visits this year, Time elapsed: 2.306s
14:06:26] 29730  Archived website id = 14, 4 API requests, Time elapsed: 10.303s [5/7 done]
14:06:26] 29730  Skipped website id 15, found in --skip-idsites 
14:06:26] 29730  Skipped website id 16, found in --skip-idsites 
14:06:26] 29730  Done archiving!
14:06:26] 29730  ---------------------------
14:06:26] 29730  SUMMARY
14:06:26] 29730  Total visits for today across archived websites: 78
14:06:26] 29730  Archived today's reports for 1 websites
14:06:26] 29730  Archived week/month/year for 1 websites
14:06:26] 29730  Skipped 6 websites
14:06:26] 29730  - 2 skipped because no new visit since the last script execution
14:06:26] 29730  - 0 skipped because existing daily reports are less than 1 seconds old
14:06:26] 29730  - 0 skipped because existing week/month/year periods reports are less than 3600 seconds old
14:06:26] 29730  Total API requests: 4
14:06:26] 29730  done: 1/7 14%, 78 vtoday, 1 wtoday, 1 wperiods, 4 req, 10611 ms, no error
14:06:26] 29730  Time elapsed: 10.611s
14:06:26] 29730  ---------------------------
14:06:26] 29730  SCHEDULED TASKS
14:06:26] 29730  Starting Scheduled tasks... 
14:06:26] 29730  done
14:06:26] 29730  ---------------------------

The archive code indicates that last52 is the default value for days and months (https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L52-L54). There is also a function getDateLastN (https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L1617-L1641) that returns the expected lastX value. Same method also defines last2 as the minimum value to return.

Looking at db table matomo_options, I find 'lastRunArchiveday_14', '1572530779', '0' where the EPOCH value is 10/31/2019 @ 2:06pm (UTC), which matches the last archive run as stated above. It seems matomo is storing the correct completion value, but for some reason uses last52 rather than last2 which seems to be the correct value.

Running the archive job a third time for site 14, updates the lastRunArchiveday_14 value in matomo_options, but still the archiver wants to proceed with a date value of last52:

# ./console core:archive --skip-idsites 8,10,15,16 
...
 14:16:55] 30305  Will pre-process for website id = 14, period = day, date = last52

I'm not good with PHP so I'm having trouble figuring out where there might be a bug, but I'm suspecting that this is the code that causes issue:

        if (empty($lastTimestampWebsiteProcessed)) {
            $creationDateFor = \Piwik\Site::getCreationDateFor($idSite);
            $lastTimestampWebsiteProcessed = strtotime($creationDateFor);
        }

https://github.com/matomo-org/matomo/blob/3.x-dev/core/CronArchive.php#L1625

Let me know if there is anything missing to help with resolving the issue

@mattab mattab added this to the 3.13.0 milestone Nov 1, 2019
@mattab mattab added the Regression Indicates a feature used to work in a certain way but it no longer does even though it should. label Nov 1, 2019
@mikkeschiren
Copy link
Contributor

mikkeschiren commented Nov 2, 2019

(edited, had an unrelated environment bug)

I hope this could get prioritized as an 3.12.1 release, it has an huge impact on our setup right now, and a lot of manual work is needed.

@tsteur
Copy link
Member

tsteur commented Nov 3, 2019

I reckon this could explain maybe some timeouts in the tests. 👍

@tsteur
Copy link
Member

tsteur commented Nov 3, 2019

I've only had a very quick look. I reckon it might be related to https://github.com/matomo-org/matomo/pull/14639/files#diff-9bca48e482124d44a3d910acc809194eR561-R567 and that we're now invalidating reports for today which was not the case before.

It seems as a result, the last process date is being ignored here and set to false: https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L869-L875

I don't know if I understand correctly but it seems to always fallback to 52 if there was some archive invalidated which kind of would have been a bug for quite a while cause it should rather take into consideration what date(s) has been invalidated etc.

Be good if someone could have a look.

@mikkeschiren
Copy link
Contributor

Yeah I have seen that last52 in some other cases also.

@mattab mattab added Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo. labels Nov 4, 2019
@toredash
Copy link
Contributor Author

toredash commented Nov 4, 2019

Our current workaround for this is to run the archive process with --force-date-last-n 2

tsteur added a commit to matomo-org/matomo-for-wordpress that referenced this issue Nov 5, 2019
@katebutler katebutler assigned katebutler and unassigned katebutler Nov 8, 2019
@diosmosis diosmosis self-assigned this Nov 12, 2019
tsteur added a commit to matomo-org/matomo-for-wordpress that referenced this issue Nov 28, 2019
tsteur added a commit to matomo-org/matomo-for-wordpress that referenced this issue Nov 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants