New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Archiving performance: cache the slow segment subqueries that are using LIKE constraint #8850
Comments
Not easily do-able. and not sure that we want to try this... It's complex: a segment like |
A thought: one of the issue is that the subquery is slow because full text searching in the large table can be slow. To improve the speed of full text searching (ie. Contains, Does not contain) idea suggested by @diosmosis - maybe we could use a FULL TEXT index on the
(we should use Boolean full-text searches to go above 50% threshold, and tweak appropriately innodb_ft_cache_size and innodb_ft_total_cache_size to size the fulltext index enough for the log_action table) |
First thing to check when experiencing this issue is:
Fact:
In general, I didn't jump into implementing this solution because it has challenges
One use case where caching IDs won't workImagine you create a edge case segment "Page URL contains 'i'" on demo.piwik.org (a medium sized piwik) Creating a temp table with idaction has 1,255,519 results (took ~ 3 seconds).
Running the query re-using the temporary table (took 35 seconds:)
---> in this case, because there are 1.2 M ids, printing IDs in query would not work and exceed |
it's do-able but needs some consideration. Will take a look at the code and see if it can be implemented quickly |
PR work in progress: #8861 |
Follow up issue: Enable segment subquery cache (queries with Contains/Does not contain on log_action) #8867 Before closing this issue, must follow up and improve as per @diosmosis review: #8861 (diff) |
@mattab have you tested it with browser archiving? I received feedback that it doesn't help in that case. Is it possible that each concurrent report request is trying to run segment subquery to warmup the cache? Currently it helps for sure with CLI archiving but we need to check browser archiving as well. |
I tested it locally and segment query was executed only once. Would be nice to test how it works if this query is slow (e.g. takes longer than 2 sec). |
@quba it should work. if there is a problem, please get someone to write a good bug report here / re-open ticket |
@mattab Are you still going to apply my review? |
Yes in #8892 |
Hi,
if we are using segment with
CONTAINS
option using e.g. page URL, it's not possible to narrow down actions to a single ID. We have to execute a subquery that will get the list of IDs. In some cases it can be lots of IDs, even more than 100 000. Currently Piwik is executing aWHERE IN
clause with a subquery querying log_action table. Such queries are really slow when using MySQL/MariaDB.See:
https://www.percona.com/blog/2010/10/25/mysql-limitations-part-3-subqueries/
http://dba.stackexchange.com/questions/14565/mysql-subquery-slows-down-drastically-but-they-work-fine-independently/14581#14581
My proposal is to run such subqueries at the beginning of archiving process and cache them in a file. Therefore query for each report in Piwik will use the list of IDs instead of executing a subquery which may be really slow.
Here are some results for a subquery that returns more than 2000 action IDs:
FYI: I replaced real URL with an example.
Some calculations. In the best case we save here 21 seconds. Assuming that in Piwik we have 100 reports, it would speed up the archiving process by more than 30mins.
Some additional info:
Subquery cache is enabled.
The text was updated successfully, but these errors were encountered: