Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Archiving process causes CPU spikes on Master DB #19384

Open
MorganFujimaka opened this issue Jun 22, 2022 · 3 comments
Open

Archiving process causes CPU spikes on Master DB #19384

MorganFujimaka opened this issue Jun 22, 2022 · 3 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo.

Comments

@MorganFujimaka
Copy link

MorganFujimaka commented Jun 22, 2022

During archiving we notice a lot of read queries causing CPU spikes on master DB:

SELECT `arc1` . `idarchive` , `arc1` . `value` , `arc1` . `name` , `arc1` . `ts_archived` , `arc1` . `date1` AS `startDate` , `arc2` . `value` AS `nb_visits` , `arc3` . `value` AS `nb_visits_converted` FROM `matomo_archive_numeric_2022_06` `arc1` LEFT JOIN `matomo_archive_numeric_2022_06` `arc2` ON `arc2` . `idarchive` = `arc1` . `idarchive` AND ( `arc2` . `name` = ? ) LEFT JOIN `matomo_archive_numeric_2022_06` `arc3` ON `arc3` . `idarchive` = `arc1` . `idarchive` AND ( `arc3` . `name` = ? ) WHE

We face performance issues toward the end of the month when a matomo_archive_numeric_2022_** table is becoming too big probably.

The function that call the query:

public function getArchiveIdAndVisits($numericTable, $idSite, $period, $dateStartIso, $dateEndIso, $minDatetimeIsoArchiveProcessedUTC,

        $sqlQuery = "SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as " . ArchiveSelector::NB_VISITS_RECORD_LOOKED_UP . ", arc3.value as " . ArchiveSelector::NB_VISITS_CONVERTED_RECORD_LOOKED_UP . "
                     FROM $numericTable arc1
                     LEFT JOIN $numericTable arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = '" . ArchiveSelector::NB_VISITS_RECORD_LOOKED_UP . "')
                     LEFT JOIN $numericTable arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = '" . ArchiveSelector::NB_VISITS_CONVERTED_RECORD_LOOKED_UP . "')
                     WHERE arc1.idsite = ?
                         AND arc1.date1 = ?
                         AND arc1.date2 = ?
                         AND arc1.period = ?
                         AND ($sqlWhereArchiveName)
                         $timeStampWhere
                         AND arc1.ts_archived IS NOT NULL
                     ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC";

        $results = Db::fetchAll($sqlQuery, $bindSQL);

Why does it query master DB and not a read replica? Thank you!

  • Matomo Version: 4.9.1
@MorganFujimaka MorganFujimaka added the Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. label Jun 22, 2022
@bx80
Copy link
Contributor

bx80 commented Jun 22, 2022

Hi @MorganFujimaka, I can't say for sure, but loading the list of archives from the master database is probably to avoid issues any with ongoing archiving processes and read replica lag.

How many records are typically in matomo_archive_numeric_2022_* by the of the month?

@bx80 bx80 added c: Performance For when we could improve the performance / speed of Matomo. and removed Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. labels Jun 22, 2022
@bx80 bx80 added this to the For Prioritization milestone Jun 22, 2022
@MorganFujimaka
Copy link
Author

MorganFujimaka commented Jun 30, 2022

Hi @bx80, thank you for the prompt response!
We have ~140 000 rows in the matomo_archive_numeric_2022_* tables toward the end of the month. Most of the rows are for our main website, other websites are not impacted by performance issues. We have 30 funnels, 95 goals, and 3 segments for our main website (which increase greatly the number of rows).

@bx80
Copy link
Contributor

bx80 commented Jul 4, 2022

Thanks @MorganFujimaka, these details will be really helpful for understanding how we can improve performance in this area 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo.
Projects
None yet
Development

No branches or pull requests

2 participants