Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segments containing log_action.name (for example page_url) contains are very slow #19581

Open
tsteur opened this issue Jul 28, 2022 · 1 comment
Labels
c: Performance For when we could improve the performance / speed of Matomo.

Comments

@tsteur
Copy link
Member

tsteur commented Jul 28, 2022

refs #8850 and #8867

When someone creates a segment like page_url contains '%...%' or page_title contains '%...%' or event_name contains '%...%' then these kind of queries are very slow as it requires a full table scan on log_action. They result in a where clause for example like this ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'njw', '%') AND type = 10 )) ) ). With a lot of data in log_action table these queries can be extremely slow, cause a lot of CPU usage etc.

As it's one of Matomo's biggest bottleneck, we have investigated a few times in the past whether we can optimise this query but couldn't find any solution.

In #8867 there was a config enable_segments_subquery_cache created but this doesn't help all that much as the cache might become outdated (it caches the data for 1 hour) and the cache re-usage would be limited during that hour as we already use temporary tables when archiving to only needing to execute this query once.

I'm now creating this issue again in case there are other ways to make this faster. Like maybe we can maintain MySQL tables for each such segment definition so that we not only write log_action entries in log_action but also eg in a table log_action_pageurl_contains_foo and we reuse the same log_action IDs etc in that table. This way we wouldn't need to run the contains query at all but could change it to join that table instead. This of course might cause more writes during tracking and not sure this is beneficial or worth it. There might be also other ideas.

@tsteur tsteur added the c: Performance For when we could improve the performance / speed of Matomo. label Jul 28, 2022
@sgiehl sgiehl added this to the For Prioritization milestone Jul 29, 2022
@bx80
Copy link
Contributor

bx80 commented Aug 1, 2022

It might not be too hard to test the performance benefits of using a segment <-> action link table, something like the following on a large dataset with a complex segment could be tested with and without using the link table to compare performance. The trade off would be efficiently matching new actions created by the tracker to existing segments. Perhaps if we could do this in code then it might just be a single read (get the segment definitions) and a single write (bulk insert into the link table).

-- Migration
CREATE TABLE log_link_segment_action (
  idaction    bigint(10) UNSIGNED,
  idsegment   bigint(10) UNSIGNED,
  PRIMARY KEY (idaction, idsegment)
);
-- Migration or when a new segment is added (the where clause is the segment definition)
INSERT INTO log_link_segment_action 
(SELECT idaction, 1 AS idsegement FROM log_action WHERE ( name LIKE CONCAT('%', 'njw', '%') AND type = 10 ));
// When a new log_action record is added by the tracker
$segments = Piwik\Plugins\SegmentEditor\AAPI\API::getInstance()->getAll(1);
foreach ($segments as $s) {
    if (matchSegmentInCode($s['definition], $action){
        // INSERT INTO log_link_segment_action (idaction, idsegment) VALUES ($s[id], $idaction);
    }
}
-- Analytic queries using join instead of expensive contains query
SELECT * FROM log_link_visit_action va
LEFT JOIN log_link_segment_action sa ON sa.idaction = va.idaction_url
WHERE sa.idsegment = 1;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo.
Projects
None yet
Development

No branches or pull requests

3 participants