New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite cron archiving process for easier maintenance and performance #15499
Conversation
…optimization for no visits for period + segment process from handling.
@tsteur it's needed to know when an archiving job was killed or was stopped for some other reason. in that case the archive row is set to DONE_IN_PROGRESS, but never gets finalized. So we use the lock + it's expiration time to know if we can restart it. Edit: it might be possible to remove the need if we have a start or expiration time in the archive and update that instead. Though that might just move the I/O from the lock table to the archive table. Edit 2: actually, maybe we can fix this by adding the invalidation table, can you see my comment in the other issue? |
@tsteur ready for another review, there is one test w/ a random failure (fails due to day change), and a couple minor TODO, but can be reviewed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good... left only a few comments @diosmosis
@@ -299,6 +299,21 @@ public function getTablesCreateSql() | |||
) ENGINE=$engine DEFAULT CHARSET=utf8 | |||
", | |||
|
|||
'archive_invalidations' => "CREATE TABLE `{$prefixTables}archive_invalidations` ( | |||
idinvalidation BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, | |||
idarchive INTEGER UNSIGNED NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw is it supposed to be nullable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is in case it's for an archive that does not exist. Eg, if we invalidate a day archive, but there is no week, month or year (because here are no visits for them or it hasn't been processed), we still want to process them. In this case we have to set idarchive to NULL.
core/Db/Schema/Mysql.php
Outdated
ts_invalidated DATETIME NULL, | ||
status TINYINT(1) UNSIGNED DEFAULT 0, | ||
PRIMARY KEY(idinvalidation), | ||
INDEX index_idsite_dates_period_name(idsite, date1, date2, period, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw be the index maybe better idsite,period,date1
eg as used in the getNextInvalidatedArchive ()
query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that might work, will have to check it's ok for ranges in case I query for individual invalidations... I don't think I do...
// + 1 Done flag per Plugin, for each "Last N" date | ||
'archive_numeric_2010_01' => 138, | ||
// for each "Last N" date that has data (just one date) | ||
'archive_numeric_2010_01' => 22, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just double checking @diosmosis do you maybe know why it were before so many entries? is it less now because we delete some archives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has a big optimization for sites/periods that have no visits (see PR description). If we see there will be no visits for an archive, we don't even initiate archiving, so no archive is created.
core/DataAccess/Model.php
Outdated
} | ||
|
||
$fields = ['idarchive', 'name', 'idsite', 'date1', 'date2', 'period', 'ts_invalidated']; | ||
Db\BatchInsert::tableInsertBatch('archive_invalidations', $fields, $dummyArchives); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi @diosmosis seems we need to pass the table with a prefixed name here... got this error:
Try #1: LOAD DATA INFILE : SQLSTATE[42S02]: Base table or view not found: 1146 Table 'piwik3.archive_invalidations' doesn't exist[42],
Try #2: LOAD DATA LOCAL INFILE : SQLSTATE[42000]: Syntax error or access violation: 1148 The used command is not allowed with this MySQL version[42000]
ERROR [2020-05-06 04:20:56] 5905 Uncaught exception: libs/Zend/Db/Statement/Pdo.php(234): SQLSTATE[HY093]: Invalid parameter number: parameter was not defined
SQLSTATE[HY093]: Invalid parameter number: parameter was not defined
#0 libs/Zend/Db/Statement.php(300): Zend_Db_Statement_Pdo->_execute(Array)
#1 libs/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array)
#2 libs/Zend/Db/Adapter/Pdo/Abstract.php(238): Zend_Db_Adapter_Abstract->query('INSERT IGNORE I...', Array)
#3 core/Db/Adapter/Pdo/Mysql.php(309): Zend_Db_Adapter_Pdo_Abstract->query('INSERT IGNORE I...', Array)
#4 core/Db.php(276): Piwik\Db\Adapter\Pdo\Mysql->query('INSERT IGNORE I...', Array)
#5 core/Db/BatchInsert.php(41): Piwik\Db::query('INSERT IGNORE I...', Array)
#6 core/Db/BatchInsert.php(133): Piwik\Db\BatchInsert::tableInsertBatchIterate('archive_invalid...', Array, Array)
#7 core/DataAccess/Model.php(202): Piwik\Db\BatchInsert::tableInsertBatch('archive_invalid...', Array, Array)
#8 core/Archive/ArchiveInvalidator.php(433): Piwik\DataAccess\Model->updateArchiveAsInvalidated('piwik_archive_n...', Array, Array, NULL, false)
#9 core/Archive/ArchiveInvalidator.php(285): Piwik\Archive\ArchiveInvalidator->markArchivesInvalidated(Array, Array, NULL, true, false)
#10 plugins/CoreAdminHome/API.php(163): Piwik\Archive\ArchiveInvalidator->markArchivesAsInvalidated(Array, Array, 'day', NULL, false, false)
#11 core/CronArchive.php(955): Piwik\Plugins\CoreAdminHome\API->invalidateArchivedReports(Array, Array, 'day')
#12 core/CronArchive.php(875): Piwik\CronArchive->invalidateRecentDate('yesterday')
#13 core/CronArchive.php(341): Piwik\CronArchive->invalidateArchivedReportsForSitesThatNeedToBeArchivedAgain()
#14 core/CronArchive.php(258): Piwik\CronArchive->run()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the prefix locally in my test but then still got the bound parameter wrong error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the content of $dummyArchives
which actually looked good:
array (
0 =>
array (
'idarchive' => '22',
'name' => 'done',
'idsite' => 1,
'date1' => '2020-05-04 00:00:00',
'date2' => '2020-05-04 00:00:00',
'period' => 1,
'ts_invalidated' => '2020-05-06 04:25:23',
),
1 =>
array (
'idarchive' => '21',
'name' => 'done',
'idsite' => 1,
'date1' => '2020-05-04 00:00:00',
'date2' => '2020-05-10 00:00:00',
'period' => 2,
'ts_invalidated' => '2020-05-06 04:25:23',
),
2 =>
array (
'idarchive' => '19',
'name' => 'done',
'idsite' => 1,
'date1' => '2020-05-01 00:00:00',
'date2' => '2020-05-31 00:00:00',
'period' => 3,
'ts_invalidated' => '2020-05-06 04:25:23',
),
)
resulting in Uncaught exception: parameter number: parameter was not defined
It generated this query:
INSERT IGNORE INTO piwik_archive_invalidations
(idarchive,name,idsite,date1,date2,period,ts_invalidated)
VALUES (?,?,?,?,?,?,?)
The bind value looks good so not quite sure what causes it
Not quite understanding it... Maybe you can reproduce when eg disabling [General] enable_load_data_infile = 1
. Otherwise must be something with my install maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, weird, will look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this in another PR: #15919
Works for me now @diosmosis and reckon we can merge if tests pass (currently they are all failing). BTW... I was thinking with this new table if we should change the logic in https://github.com/matomo-org/matomo/blob/3.13.5/core/Archive/ArchiveInvalidator.php#L76-L91 ? In The only thing I can think of, is that when many archive invalidations are queued eg because of a changed segment, then the tracker cache might be getting a bit big but this shouldn't be the case normally. Not sure it's clear what I mean? Any thoughts? |
If you mean query archive_invalidations in the tracker for rememberToInvalidateArchivedReportsLater, then it means it might make archive_invalidations a bottleneck? I guess it's hard to say how it would behave... the invalidations table will generally be smaller though (except when core:archive isn't called), so maybe it would be better... could try it in another PR? |
I meant calling it when populating the tracker cache... and in tracking mode we would check the tracker cache (not the DB directly). Pretty much like it works now just we don't look at |
Actually @diosmosis I realise this might not be best to do since we'd need to invalidate directly also the week, month, year and per tracking request create several invalidation entries... so maybe we keep the option table for now... and PR should be good to merge once tests pass |
New workflow
The current workflow for core:archive is to check if there are visits for a site, then launching archiving w/ lastN days/weeks/months/years. The new workflow is: invalidate archive, run core:archive and let it process the queue.
Relevant changes
Notes
Fixes #15117