Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in tracker for high traffic website #2944

Closed
anonymous-matomo-user opened this issue Feb 17, 2012 · 8 comments
Closed
Labels
Bug For errors / faults / flaws / inconsistencies etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Milestone

Comments

@anonymous-matomo-user
Copy link

i have also a problem with very high load.

on my mysql server i see lots of querys like the following one, and each one is running over 1 minute ....
SELECT idvisitor,
visit_last_action_time,
visit_first_action_time,
idvisit,
visit_exit_idaction_url,
visit_exit_idaction_name,
visitor_returning,
visitor_days_since_first,
visitor_days_since_order,
location_country,
location_continent,
referer_name,
referer_keyword,
referer_type,
case when idvisitor = 'W?O????' then 1 else 0 end AS priority,
visitor_count_visits,
visit_goal_buyer

            , custom_var_k1, custom_var_v1,
            custom_var_k2, custom_var_v2,
            custom_var_k3, custom_var_v3,
            custom_var_k4, custom_var_v4,
            custom_var_k5, custom_var_v5
            FROM piwik_log_visit WHERE visit_last_action_time >= '2012-02-16 13:39:17'
                AND idsite = '3' AND (idvisitor = 'W?O????' OR config_id = '^?j?_6?') 
            ORDER BY priority DESC, visit_last_action_time DESC
            LIMIT 1

i have broken it down to the fact that according to an explain there are thousands of estimated rows:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: piwik_log_visit
type: ref
possible_keys: index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor
key: index_idsite_datetime
key_len: 4
ref: const
rows: 145600
Extra: Using where; Using filesort

the complete piwik_log_visit table have 781971 entries in total

i played a little bit with the select statement and ended up with an interresting observation, after i told the select to ignore all indexes, the statement runs in 0,8 seconds "IGNORE INDEX (index_idsite_idvisitor,index_idsite_datetime,index_idsite_config_datetime)" which is a more acceptable time and got my server again up and running, but still is not optimal.

i don't understand why this query much better without indexes as with the given ones ...

I have found a changeset that might have changed the behavior to this: r5531

i think one of the problems is that with the OR in the WHERE part the indexes didn't match as good and the second is that for the ordering on priority the server will have examine each row for the case on idvisitors, which will take its time with that much estimatet rows ... :(

@mattab
Copy link
Member

mattab commented Feb 19, 2012

Thanks for the report. there is a "quick fix" in the forum thread at: http://forum.piwik.org/read.php?2,85727

But I'm not sure how to fix the issue keeping the current logic, while not issuing a slow SQL query.

@anonymous-matomo-user
Copy link
Author

thank you for the hotfix.
As i wrote, an IGNORE off the current indexes had worked for me ( the query time went under 1 second runtime ) and your query logic don't wasn't changed this way.
i'm sorry that i didn't post this diff on ticket create ... :(

--- ./core/Tracker/Visit.php 2012-02-15 03:42:31.000000000 +0100
+++ ./core/Tracker/Visit.php 2012-02-17 16:04:02.000000000 +0100
@@ -986,7 +986,7 @@
visit_goal_buyer
$selectCustomVariables
FROM ".Piwik_Common::prefixTable('log_visit').

  •                           " WHERE ".$where."
    
  •                           " WHERE ".$where." IGNORE INDEX (index_idsite_idvisitor,index_idsite_datetime,index_idsite_config_datetime)
                            ORDER BY priority DESC, visit_last_action_time DESC
                            LIMIT 1";
            $visitRow = Piwik_Tracker::getDatabase()->fetch($sql, $bindSql);
    

in my simple mind it should be a qestion of indexing ...
unfortunately i'm not as good with mysql and index logick, so i wasn't able to create an index that works better, otherwise i had postet it willingly :)

@mattab
Copy link
Member

mattab commented Feb 23, 2012

(In [5886]) Refs #2944
Applying the so called "hot fix" (since I can't find a better solution for now):

  • Removing the ORDER BY priority which was resulting in a filesort which is horribly slow
  • This solution is correct in terms of logic
  • However it is not "optimal", because we don't necessarily "reuse" the optimal past visit, but at least we won't create artificial visit. No need to reopen Regression: duplicated visits are created when IP is different every page view #2785
  • Now looking back only 1 hour instead of 24 hours. Less rows to look through.

Note: after the fix, it is still not perfect performance wise:

  • The query does AND (idvisitor = ? OR config_id = ?)
    which results in matching potentially thousands of rows, then mysql has to look through these and sort by time DESC.
    The sorting does not use the index since it can't.
  • Maybe we should issue a UNION query, the first one matching WHERE visitor_id = ?
    The second one matching WHERE config_id = ?
    I think that would work better, since both queries could each use an INDEX.

@mattab
Copy link
Member

mattab commented Feb 23, 2012

(In [5887]) Refs #2944
Fixing build: Since we now look back only 1 hour, stats will be slightly less accurate in terms of recognizing visitors.
In this integration test for example, there is no first party cookie support, so we are not able to match the visit 1 hour later to the previous visit, so the visitor appears as "new".
this is a reasonnable tradeoff for performance...
In this case if first party cookies were supported it would match the visitor and appear as "returning" since the "_idvc" parameter would be found (indicating number of visits)

@mattab
Copy link
Member

mattab commented Feb 24, 2012

(In [5892]) Fixes #2944

  • Now issuing the UNION query so that each subquery uses its own index, for fast and optimal performance, while also providing optimal visitor matching algorithm results.
  • Reverting [and 5887 and the "hot fix" in [5886]

PLEASE EVERYONE LISTENING would you mind testing the new "core/Tracker/Visit.php" file simply replacing your existing file?
You can grab it from: https://github.com/piwik/piwik/blob/master/core/Tracker/Visit.php

It would be great if you could confirm what my tests show, that the code is now very fast. I don't have access to an enormous Piwik as you might have, so your test is very appreciated!!
Thanks!

@mattab
Copy link
Member

mattab commented Feb 24, 2012

(In [5897]) Refs #2944
This sort is slow at least (showing up as slow query on the demo)... Still testing, I'm not 100% sure the code is working well

@mattab
Copy link
Member

mattab commented Feb 24, 2012

(In [5898]) Refs #2944

Aggressively only looking in the past for how long a visit can be. This is more efficient than looking further back.

Note: the build will fail, i will commit test fixes separately

@mattab
Copy link
Member

mattab commented Feb 24, 2012

It looks like it is working fine now.

@Ghosts can you please test the new file and confirm it is very fast on your setup too?

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Projects
None yet
Development

No branches or pull requests

2 participants