Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regressions since 2.9.2 #7440

Closed
sdosg opened this issue Mar 16, 2015 · 20 comments
Closed

Performance regressions since 2.9.2 #7440

sdosg opened this issue Mar 16, 2015 · 20 comments
Labels
answered For when a question was asked and we referred to forum or answered it. c: Performance For when we could improve the performance / speed of Matomo. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Milestone

Comments

@sdosg
Copy link

sdosg commented Mar 16, 2015

After upgrading to 2.12.b5 i have still big performance issues.

2.9.2: 740 req/s
2.11.2: 370 req/s
2.12.b5: 400 req/s

I can still see high cpu consumption by the php fcgi processes (~ 20-25% cpu p. process under load, with 2.9.2 it was ~10-15%).

see Issue 7416

@mattab
Copy link
Member

mattab commented Mar 16, 2015

Hi there,

The queries you pasted in this comment: #7416 (comment) - the SQL queries haven't changed recently (in months or years) so these should not be the cause of slowdown... unless of course it's possible that your mysql has reached some limits and maybe it starts swapping or other things. Do you have more information on what is slow? Can you maybe enable mysql slow queries?

@mattab
Copy link
Member

mattab commented Mar 16, 2015

Update to my previous comment: actually one of the SQL query has changed (in #6436) - as @diosmosis suggested, @sdosg could you please try this query:

SELECT idaction as idaction, type, name FROM piwik_log_action WHERE ( hash = CRC32('Gallery | Inkbunny, the Furry Art Community') AND name = 'Gallery | Inkbunny, the Furry Art Community' AND type = '4' )  OR ( hash = CRC32('inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5=') AND name = 'inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5=' AND type = '1' )  

How many rows are returned? (could you send us the full return rows set of this query?)

@GreenReaper
Copy link

Hi @mattab and @diosmosis - I think that query (from #7416) is for me, not sdosg, and the issue for me with it seems to be the GROUP BY. When I use EXPLAIN with the GROUP BY query it shows " Using where; Using temporary; Using filesort" as opposed to just "Using where".

With the query above, I get one row:

| idaction | type | name                
|        1 |    1 | inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5=

With the original query involving the GROUP BY I get two rows:

| idaction | type | name 
|     3665 |    1 | inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5= |
|        1 |    4 | Gallery | Inkbunny, the Furry Art Community

@GreenReaper
Copy link

Per MySQL documentation:

"By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well. If you include an explicit ORDER BY clause that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL."

Perhaps that is what is needed here? I don't know if you need them in a specific order. Note that using GROUP BY order for ORDER BY is also deprecated (because it can be a performance issue), so if you do need it, it should be specified.

If I add ORDER BY NULL after the GROUP BY clause, I get "using temporary" but not "using filesort" in explain.

@GreenReaper
Copy link

I tried adding ORDER BY NULL after the GROUP BY clause in:
public function getIdsAction($actionsNameAndType)
of core/Tracker/Model.php

For me, this does not have a significant impact the temporary table statistic (on disk or otherwise). I guess "filesort" doesn't actually mean "using files to do it". I suspect the tables come from other queries.

Using ORDER BY NULL to avoid the sort might still be a worthwhile optimization, of course - assuming you don't actually rely on the order anywhere - and it might be worth @sdosg trying it.

@diosmosis
Copy link
Member

Did you remove the MIN(idaction) and replace w/ idaction in the first query you executed?

@mattab
Copy link
Member

mattab commented Mar 17, 2015

(my bad, I now removed the MIN() in the query above - can you try again?)

@GreenReaper
Copy link

I tried it and it made no difference for me, at least in the number of temporary tables; maybe it would improve things for the original poster, or be a general improvement, though?

To be clear: I'm not facing the specific performance issues which sdosg reports; I just noticed temp disk tables where before there were none and I'm trying to eliminate them. Our two issues may not be related. (I should probably not have hopped onto his bug last time, sorry!) Unfortunately I'm not on Solaris/OS X or able to use MySQL 5.6 easily on Debian, or it might be easier to trace these tables down.

@diosmosis
Copy link
Member

I am looking for the number results so we can see if MySQL is traversing over a lot of rows, thus needing disk space. Can you run the query and post the number of rows returned? The second query was introduced due to a race condition that created duplicate actions. If there are many duplicate actions, the query would have to do more work. This seems unlikely, but it is something to check.

@GreenReaper
Copy link

I decided just to log everything for a few seconds by setting long_query_time = 0 and see what took time. It does not appear to be the query above which selects idaction. These examined 2, 4 or 0 rows.

Instead, the queries that were taking taking 0.1 or 0.2 seconds looked similar to the following. Many were examining 300 or 400 rows. They typically not take more than 0.005 seconds until they started examining 100-150 rows, although there were one or two exceptions. There were a lot of them.

# Query_time: 0.203814  Lock_time: 0.048132 Rows_sent: 1  Rows_examined: 116
SET timestamp=1426562315;
( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                0 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND config_id = '^Z?^\? D?' AND user_id IS NULL
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                UNION
                ( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                1 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND idvisitor = '^U\"?^^0^M?U'
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                ORDER BY priority DESC
                LIMIT 1;

The explain was the following:

| id | select_type  | table           | type  | possible_keys                                                             | key                          | key_len | ref         | rows | Extra                       |
+----+--------------+-----------------+-------+---------------------------------------------------------------------------+------------------------------+---------+-------------+------+-----------------------------+
|  1 | PRIMARY      | piwik_log_visit | range | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_config_datetime | 20      | NULL        |    1 | Using where                 |
|  2 | UNION        | piwik_log_visit | ref   | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_idvisitor       | 12      | const,const |    1 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2>      | ALL   | NULL                                                                      | NULL                         | NULL    | NULL        | NULL | Using filesort              |

While lock_time was relatively large in the example above, in general it was more like 0.0002 for such queries.

@GreenReaper
Copy link

The code in question (public function findVisitor() ) appears to be making two queries at once and then using a UNION on them because it prefers the result of one of them.

I think you might be trying to do too much in a single query. Maybe it would be better to break it out into two queries, and only execute the one involving config_id if the one involving idvisitor fails to return a result? This would avoid what might be an expensive UNION and resulting temporary table creation.

Generally available versions of MySQL appear to create a temporary table for all UNIONs. They only changed that in 5.7.3, and then only for UNION ALL (which could have been used here) and where there is no ORDER BY (there is here) and it is not the top-level query block of a SELECT (it is).

@GreenReaper
Copy link

The code in question has moved around a bit, but was created in commit 0abac3b

It therefore isn't a recent change. But it does seem to be taking some time!

@mattab
Copy link
Member

mattab commented Mar 17, 2015

It's not a recent change like you point out, and you mention you see a regression since 2.9.2. So in this particular issue we would like to focus on anything that may have changed since 2.9.2. otherwise we lose focus :-)

The problem is that I don't know what info we need from you to troubleshoot your issue... it's tricky.

@mattab
Copy link
Member

mattab commented Mar 17, 2015

Btw can you try disabling the Provider plugin if that's not disabled yet? it should help performance a lot in case DNS server is slow (which often it is).

we need to move this plugin to the Marketplace for this reason (it's slow) - covered in #5341

@GreenReaper
Copy link

I can certainly try that, and will do so. But I will open another issue for the UNION as it appears to have a very significant impact, and is perhaps easily fixed.

@mattab
Copy link
Member

mattab commented Mar 17, 2015

before creating the issue could you enable slow query log and confirm it is actually slow? AFAIK it's fast and mysql won't create TMP table on disk since it's very small result set and fits in RAM. From explain statement I don't see that it will create tmp table. I see using filesort but that does not mean that it will create a tmp table - see this blog post for more info: http://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/

@GreenReaper
Copy link

Yes, it is actually slow. :-) That is a 200ms query above, examining 100+ rows. The idvisitor query by itself is a 1ms-2ms query, while the config_id query is maybe 0.5ms; neither examines more than one row.

Changing it to just do one select (either idvisitor or config_id, it doesn't matter) eliminates half of the temporary tables (the ones which did not appear recently) and also removes all scan selects in my select types munin graph.

A big part of the slowdown may be in the idvisitor query, as documented later in the issue below. I think it's still worth getting rid of the UNION, not least because it usually means avoiding the config_id query.

@GreenReaper
Copy link

Now, as for the idaction query . . . the issue is not the MIN(), but the GROUP BY clause. It seems that grouping by name causes it to split to a temporary table (probably because it is a text field). If the query ends "GROUP BY type, hash" [and optionally "ORDER BY NULL"] then no temporary table is created.

This appears to have been introduced in e43ed30 to both getSelectQueryWhereNameContains() [core/Tracker/TableLogAction.php] and getIdsAction() [core/Tracker/Model.php], the latter called per-hit.

Commit 3e390e5 adds 'hash' to the GROUP BY but does not remove name. Perhaps it would be possible to use hash instead of name? This eliminates the temporary disk table per hit, as so:
mysql_tmp_tables-day-notext

@diosmosis
Copy link
Member

No, we can't group by hash only, since different names can have the same hash. We could select everything (w/ a sensible limit) and do the group by in PHP though... Will have to think about it.

Thanks for doing the research/debugging! I think you've found the issues at fault here.

@mattab mattab added the Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. label Mar 18, 2015
@mattab mattab changed the title Still bad performance with 2.12.b5 Performance regressions since 2.9.2 Mar 18, 2015
@mattab mattab added the c: Performance For when we could improve the performance / speed of Matomo. label Mar 18, 2015
@mattab mattab added this to the Short term milestone Mar 18, 2015
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Apr 7, 2015
@mattab mattab modified the milestones: Short term, Mid term Apr 7, 2015
@mattab
Copy link
Member

mattab commented Jun 18, 2015

Hi everyone,

We've made huge improvements to Performance in recent months. See all closed issues for an overview of the progress we've done: https://github.com/piwik/piwik/issues?q=label%3A%22c%3A+Performance%22+is%3Aclosed+sort%3Aupdated-desc

If you have further suggestions please open a new issue, we would love to hear. Performance is a top priority for us at Piwik 👍

@mattab mattab closed this as completed Jun 18, 2015
@mattab mattab added the answered For when a question was asked and we referred to forum or answered it. label Jun 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
answered For when a question was asked and we referred to forum or answered it. c: Performance For when we could improve the performance / speed of Matomo. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Projects
None yet
Development

No branches or pull requests

4 participants