During archiving we notice a lot of read queries causing CPU spikes on master DB:
SELECT `arc1` . `idarchive` , `arc1` . `value` , `arc1` . `name` , `arc1` . `ts_archived` , `arc1` . `date1` AS `startDate` , `arc2` . `value` AS `nb_visits` , `arc3` . `value` AS `nb_visits_converted` FROM `matomo_archive_numeric_2022_06` `arc1` LEFT JOIN `matomo_archive_numeric_2022_06` `arc2` ON `arc2` . `idarchive` = `arc1` . `idarchive` AND ( `arc2` . `name` = ? ) LEFT JOIN `matomo_archive_numeric_2022_06` `arc3` ON `arc3` . `idarchive` = `arc1` . `idarchive` AND ( `arc3` . `name` = ? ) WHE
We face performance issues toward the end of the month when a matomo_archive_numeric_2022_**
table is becoming too big probably.
The function that call the query: https://github.com/matomo-org/matomo/blob/296de5d46102cfcc7dddd322972edbc631bf2b54/core/DataAccess/Model.php#L436
$sqlQuery = "SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as " . ArchiveSelector::NB_VISITS_RECORD_LOOKED_UP . ", arc3.value as " . ArchiveSelector::NB_VISITS_CONVERTED_RECORD_LOOKED_UP . "
FROM $numericTable arc1
LEFT JOIN $numericTable arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = '" . ArchiveSelector::NB_VISITS_RECORD_LOOKED_UP . "')
LEFT JOIN $numericTable arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = '" . ArchiveSelector::NB_VISITS_CONVERTED_RECORD_LOOKED_UP . "')
WHERE arc1.idsite = ?
AND arc1.date1 = ?
AND arc1.date2 = ?
AND arc1.period = ?
AND ($sqlWhereArchiveName)
$timeStampWhere
AND arc1.ts_archived IS NOT NULL
ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC";
$results = Db::fetchAll($sqlQuery, $bindSQL);
Why does it query master DB and not a read replica? Thank you!
Hi @MorganFujimaka, I can't say for sure, but loading the list of archives from the master database is probably to avoid issues any with ongoing archiving processes and read replica lag.
How many records are typically in matomo_archive_numeric_2022_*
by the of the month?
Hi @bx80, thank you for the prompt response!
We have ~140 000 rows in the matomo_archive_numeric_2022_*
tables toward the end of the month. Most of the rows are for our main website, other websites are not impacted by performance issues. We have 30 funnels, 95 goals, and 3 segments for our main website (which increase greatly the number of rows).
Thanks @MorganFujimaka, these details will be really helpful for understanding how we can improve performance in this area :+1: