@utrautmann opened this Issue on February 18th 2022

A customer of mine has occasional performance problems in Matomo. The system hangs because all 150 database connections to MySQL are in use.
My analysis revealed that these problems could be related to the generation of segment reports, which put too much load on the MySQL DB.

To the background:
The customer has created about 40 segments. The reports are updated hourly by a cron job. Browser updates are disabled.

Analysis approach
From the analysis approach, we enabled the MySQL Slow Query Log and analyzed the outputs.
Of all slow SQL queries (defined as longer than 10sec), the statement
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('?', '?', '?') AND type = ? )
was determined in 90 percent.

Analysis results
An identified problem is certainly that the customer uses the matching URL with "contains" instead of "is" far too often in the segment definition. We will optimize this.

But when analyzing the Slow Query Log, I noticed that the same SQL query was issued almost simultaneously up to 50 times, which always returned the same results.
Sending a query to the database from Matomo 50 times more or less at the same time is very unusual.

My question is whether this can be optimized by caching?

Important to know: This problem does not always occur. So we don't have the problem every hour when the cron job starts.

Here is an anonymous example of the slow queries during segment creation as a short excerpt from the MySQL Slow Query Log. This query was made a total of 50 times and required an average of 13 seconds per query.

For clarity: The logging takes place after the statement has been completed.

# Time: 220217  9:03:02
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 730998  Schema: d_piwik  QC_hit: No
# Query_time: 12.612506  Lock_time: 0.000089  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084982;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# Time: 220217  9:03:03
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 731173  Schema: d_piwik  QC_hit: No
# Query_time: 13.035878  Lock_time: 0.000073  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 731194  Schema: d_piwik  QC_hit: No
# Query_time: 13.359759  Lock_time: 0.000089  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 731008  Schema: d_piwik  QC_hit: No
# Query_time: 13.136281  Lock_time: 0.000043  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 730993  Schema: d_piwik  QC_hit: No
# Query_time: 12.772465  Lock_time: 0.000071  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 731144  Schema: d_piwik  QC_hit: No
# Query_time: 12.358530  Lock_time: 0.000057  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 731189  Schema: d_piwik  QC_hit: No
# Query_time: 13.400541  Lock_time: 0.000086  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 730992  Schema: d_piwik  QC_hit: No
# Query_time: 13.420816  Lock_time: 0.000084  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084983;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
# Time: 220217  9:03:04
# User<a class='mention' href='https://github.com/Host'>@Host</a>: d_piwik[d_piwik] @  [127.0.0.1]
# Thread_id: 731150  Schema: d_piwik  QC_hit: No
# Query_time: 13.856118  Lock_time: 0.000081  Rows_sent: 31  Rows_examined: 184510
# Rows_affected: 0  Bytes_sent: 440
SET timestamp=1645084984;
SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('%', 'domain.de/orientieren/meine-startposition/dual-studieren/', '%')  AND type = 1 );
@sgiehl commented on March 2nd 2022 Member

Hi @utrautmann
Thanks for creating the issue. Segments using matches on URLs are known to be slow. There might be possibilities to improve that. Maybe @tsteur can help to give some advice.

@tsteur commented on March 2nd 2022 Member

@utrautmann you could try and see if enabling the enable_segments_subquery_cache feature helps see https://github.com/matomo-org/matomo/blob/4.x-dev/config/global.ini.php#L228 helps. Generally these queries can be slow though. Above query makes me almost think it might be enabled already in which case you would maybe want to disable it.

I would not expect such a query to be executed though that often in parallel or in parallel at all unless you launch heaps of archivers and many of the segments check if the url contains the url.

  • Do you know if maybe many segments have this same page url contains ...the url defined?
  • Do you know if you launch heaps of archivers in parallel?
@utrautmann commented on March 3rd 2022

@tsteur
I checked the parameters.

enable_segments_subquery_cache=1
enable_segments_cache = 0

The customer explicitly requested this setting (enable_segments_cache =0) to get the old behavior. Can that be a reason for the problems?

To your questions:

  1. Yes, there were multiple segments from different users defined for the same URL. But no more than 3-4.
  2. I'll have to check it if multiple archivers are running in parallel

One further amendment:
The customer has now reduced the number of equal segments definitions and made them them available to all users.
The definitions have also been adjusted and the comparison parameters "starts with" or "is" are now used where possible. This brought significant improvements.

@tsteur commented on March 3rd 2022 Member

It's great on the segment reduction 👍

The settings above should be fine to keep if you specifically needed that behaviour that the customer required. I assume in this case it may be an issue of how many archivers run in parallel. There are options to limit the amount of archivers that run in parallel see https://matomo.org/docs/setup-auto-archiving/#help-for-corearchive-command

Eg --concurrent-archivers and then also --concurrent-requests-per-website. Setting for example --concurrent-requests-per-website=1 could improve the situation further.

Powered by GitHub Issue Mirror