@MorganFujimaka opened this Issue on June 22nd 2022

During archiving we notice a lot of read queries causing CPU spikes on master DB:

SELECT `arc1` . `idarchive` , `arc1` . `value` , `arc1` . `name` , `arc1` . `ts_archived` , `arc1` . `date1` AS `startDate` , `arc2` . `value` AS `nb_visits` , `arc3` . `value` AS `nb_visits_converted` FROM `matomo_archive_numeric_2022_06` `arc1` LEFT JOIN `matomo_archive_numeric_2022_06` `arc2` ON `arc2` . `idarchive` = `arc1` . `idarchive` AND ( `arc2` . `name` = ? ) LEFT JOIN `matomo_archive_numeric_2022_06` `arc3` ON `arc3` . `idarchive` = `arc1` . `idarchive` AND ( `arc3` . `name` = ? ) WHE

We face performance issues toward the end of the month when a matomo_archive_numeric_2022_** table is becoming too big probably.

The function that call the query: https://github.com/matomo-org/matomo/blob/296de5d46102cfcc7dddd322972edbc631bf2b54/core/DataAccess/Model.php#L436

        $sqlQuery = "SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as " . ArchiveSelector::NB_VISITS_RECORD_LOOKED_UP . ", arc3.value as " . ArchiveSelector::NB_VISITS_CONVERTED_RECORD_LOOKED_UP . "
                     FROM $numericTable arc1
                     LEFT JOIN $numericTable arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = '" . ArchiveSelector::NB_VISITS_RECORD_LOOKED_UP . "')
                     LEFT JOIN $numericTable arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = '" . ArchiveSelector::NB_VISITS_CONVERTED_RECORD_LOOKED_UP . "')
                     WHERE arc1.idsite = ?
                         AND arc1.date1 = ?
                         AND arc1.date2 = ?
                         AND arc1.period = ?
                         AND ($sqlWhereArchiveName)
                         AND arc1.ts_archived IS NOT NULL
                     ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC";

        $results = Db::fetchAll($sqlQuery, $bindSQL);

Why does it query master DB and not a read replica? Thank you!

  • Matomo Version: 4.9.1
@bx80 commented on June 22nd 2022 Contributor

Hi @MorganFujimaka, I can't say for sure, but loading the list of archives from the master database is probably to avoid issues any with ongoing archiving processes and read replica lag.

How many records are typically in matomo_archive_numeric_2022_* by the of the month?

@MorganFujimaka commented on June 30th 2022

Hi @bx80, thank you for the prompt response!
We have ~140 000 rows in the matomo_archive_numeric_2022_* tables toward the end of the month. Most of the rows are for our main website, other websites are not impacted by performance issues. We have 30 funnels, 95 goals, and 3 segments for our main website (which increase greatly the number of rows).

@bx80 commented on July 4th 2022 Contributor

Thanks @MorganFujimaka, these details will be really helpful for understanding how we can improve performance in this area :+1:

Powered by GitHub Issue Mirror