A customer of mine has occasional performance problems in Matomo. The system hangs because all 150 database connections to MySQL are in use.
My analysis revealed that these problems could be related to the generation of segment reports, which put too much load on the MySQL DB.
To the background:
The customer has created about 40 segments. The reports are updated hourly by a cron job. Browser updates are disabled.
Analysis approach
From the analysis approach, we enabled the MySQL Slow Query Log and analyzed the outputs.
Of all slow SQL queries (defined as longer than 10sec), the statementSELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('?', '?', '?') AND type = ? )
was determined in 90 percent.
Analysis results
An identified problem is certainly that the customer uses the matching URL with "contains" instead of "is" far too often in the segment definition. We will optimize this.
But when analyzing the Slow Query Log, I noticed that the same SQL query was issued almost simultaneously up to 50 times, which always returned the same results.
Sending a query to the database from Matomo 50 times more or less at the same time is very unusual.
My question is whether this can be optimized by caching?
Important to know: This problem does not always occur. So we don't have the problem every hour when the cron job starts.
Here is an anonymous example of the slow queries during segment creation as a short excerpt from the MySQL Slow Query Log. This query was made a total of 50 times and required an average of 13 seconds per query.
For clarity: The logging takes place after the statement has been completed.
# Time: 220217 9:03:02
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 730998 Schema: d_piwik QC_hit: No
# Query_time: 12.612506 Lock_time: 0.000089 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084982;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# Time: 220217 9:03:03
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 731173 Schema: d_piwik QC_hit: No
# Query_time: 13.035878 Lock_time: 0.000073 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 731194 Schema: d_piwik QC_hit: No
# Query_time: 13.359759 Lock_time: 0.000089 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 731008 Schema: d_piwik QC_hit: No
# Query_time: 13.136281 Lock_time: 0.000043 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 730993 Schema: d_piwik QC_hit: No
# Query_time: 12.772465 Lock_time: 0.000071 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 731144 Schema: d_piwik QC_hit: No
# Query_time: 12.358530 Lock_time: 0.000057 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 731189 Schema: d_piwik QC_hit: No
# Query_time: 13.400541 Lock_time: 0.000086 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 730992 Schema: d_piwik QC_hit: No
# Query_time: 13.420816 Lock_time: 0.000084 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
# Time: 220217 9:03:04
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @ [127.0.0.1]
# Thread_id: 731150 Schema: d_piwik QC_hit: No
# Query_time: 13.856118 Lock_time: 0.000081 Rows_sent: 31 Rows_examined: 184510
# Rows_affected: 0 Bytes_sent: 440
SET timestamp=1645084984;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%') AND type = 1 );
Hi @utrautmann
Thanks for creating the issue. Segments using matches on URLs are known to be slow. There might be possibilities to improve that. Maybe @tsteur can help to give some advice.
@utrautmann you could try and see if enabling the enable_segments_subquery_cache
feature helps see https://github.com/matomo-org/matomo/blob/4.x-dev/config/global.ini.php#L228 helps. Generally these queries can be slow though. Above query makes me almost think it might be enabled already in which case you would maybe want to disable it.
I would not expect such a query to be executed though that often in parallel or in parallel at all unless you launch heaps of archivers and many of the segments check if the url contains the url.
page url contains ...the url
defined? @tsteur
I checked the parameters.
enable_segments_subquery_cache=1
enable_segments_cache = 0
The customer explicitly requested this setting (enable_segments_cache =0) to get the old behavior. Can that be a reason for the problems?
To your questions:
One further amendment:
The customer has now reduced the number of equal segments definitions and made them them available to all users.
The definitions have also been adjusted and the comparison parameters "starts with" or "is" are now used where possible. This brought significant improvements.
It's great on the segment reduction 👍
The settings above should be fine to keep if you specifically needed that behaviour that the customer required. I assume in this case it may be an issue of how many archivers run in parallel. There are options to limit the amount of archivers that run in parallel see https://matomo.org/docs/setup-auto-archiving/#help-for-corearchive-command
Eg --concurrent-archivers
and then also --concurrent-requests-per-website
. Setting for example --concurrent-requests-per-website=1
could improve the situation further.