Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UNION query in findVisitor() is regularly 100ms+, involves temporary tables and scans #7454

Closed
GreenReaper opened this issue Mar 17, 2015 · 3 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo. duplicate For issues that already existed in our issue tracker and were reported previously. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Milestone

Comments

@GreenReaper
Copy link

As noted in #7440, commit 0abac3b introduced a UNION into findVisitor() within the visit code (now in core/Tracker/Model.php), which results in the creation of a temporary table on each call ("UNION queries use temporary tables.").

Separately, the idvisitor query normally takes 1-2ms and the config_id query takes around 0.5ms. The combined query is normally around 5ms. However, the number of rows examined may reach the hundreds, and it then starts taking significantly longer - up to 200ms on a dedicated machine.

The impact of these queries vary depending on their size and may not be visible on test or low-visit production websites.

This is what happened when I switched it to checking just the idvisitor part (see last hour):
mysql_tmp_tables-day-idvisitor
[The other temporary tables which are disk tables appeared more recently]
mysql_select_types-day

When I switched the code around to just use just the select on config_id, the decrease in activity remained. This suggests to me that it is the UNION of the two selects that is causing objects and scans.

An example of a slow query from the log (there are several 100ms+ ones per minute):

# Query_time: 0.203814  Lock_time: 0.048132 Rows_sent: 1  Rows_examined: 116
SET timestamp=1426562315;
( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                0 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND config_id = '^Z?^\? D?' AND user_id IS NULL
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                UNION
                ( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                1 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND idvisitor = '^U\"?^^0^M?U'
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                ORDER BY priority DESC
                LIMIT 1;

An EXPLAIN of the query (the last entry involves the temporary table created by the UNION):

| id | select_type  | table           | type  | possible_keys                                                             | key                          | key_len | ref         | rows | Extra                       |
+----+--------------+-----------------+-------+---------------------------------------------------------------------------+------------------------------+---------+-------------+------+-----------------------------+
|  1 | PRIMARY      | piwik_log_visit | range | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_config_datetime | 20      | NULL        |    1 | Using where                 |
|  2 | UNION        | piwik_log_visit | ref   | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_idvisitor       | 12      | const,const |    1 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2>      | ALL   | NULL                                                                      | NULL                         | NULL    | NULL        | NULL | Using filesort              |

The UNION is intended to check visits against both idvisitor and config_id and return the first if it is present, other than the second. I propose that two queries be performed - check idvisitor first, and if that does not return a result, do the config_id query.

@GreenReaper
Copy link
Author

This seems to work, although it should be checked thoroughly:

        // We do each query in turn so that each sql query uses its own INDEX
        // We don't use UNION because it would create a temporary table
        else {

            // will use INDEX index_idsite_idvisitor (idsite, idvisitor)
            $where     = ' AND idvisitor = ?';
            $bindSql[] = $idVisitor;
            $sqlVisitorId = "$select
                $from
                WHERE $whereCommon $where
                ORDER BY visit_last_action_time DESC
                LIMIT 1
            ";
            $visitRow = $this->getDb()->fetch($sqlVisitorId, $bindSql);

            if (!$visitRow) {

                // will use INDEX index_idsite_config_datetime (idsite, config_id, visit_last_action_time)
                array_pop($bindSql);
                $where       = ' AND config_id = ? AND user_id IS NULL ';
                $bindSql[]   = $configId;
                $sqlConfigId = "$select
                    $from
                    WHERE $whereCommon $where
                    ORDER BY visit_last_action_time DESC
                    LIMIT 1
                ";
                $visitRow = $this->getDb()->fetch($sqlConfigId, $bindSql);
            }
        }

        return $visitRow;

@GreenReaper
Copy link
Author

Eliminating the temporary tables is nice but doesn't completely eliminated the database time used. Separating these two queries has, however, helped identify where rows originate from - the idvisitor subquery. Here's a 34ms outlier example (from a 5.7Gb database, with MySQL 5.5.41 - the database is dedicated to piwik but is under memory pressure from other systems on the same machine):

# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 0.033887  Lock_time: 0.000255 Rows_sent: 0  Rows_examined: 442
SET timestamp=1426591134;
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches
                FROM piwik_log_visit
               WHERE visit_last_action_time >= '2015-03-17 10:48:54' AND visit_last_action_time <= '2015-03-17 11:48:54' AND idsite = '1'  AND idvisitor = '�Nʈ9lGx'
                ORDER BY visit_last_action_time DESC
                LIMIT 1;
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 0.001082  Lock_time: 0.000136 Rows_sent: 0  Rows_examined: 0
SET timestamp=1426591134;
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches
                  FROM piwik_log_visit
                  WHERE visit_last_action_time >= '2015-03-17 10:48:54' AND visit_last_action_time <= '2015-03-17 11:48:54' AND idsite = '1'  AND config_id = '1����Ko�' AND user_id IS NULL
                  ORDER BY visit_last_action_time DESC
                  LIMIT 1;

I should emphasize that most of the time we're still talking a 1-3ms query (which is maybe 0.5-1ms less without the config_id part). There are regular outliers (most likely because the database is not completely in memory), and when you're getting five to ten hits a second they add up.

@GreenReaper GreenReaper changed the title UNION in findVisitor() results in regular 100ms+ queries, temporary tables and scans UNION query in findVisitor() is regularly 100ms+, involves temporary tables and scans Mar 17, 2015
@mattab mattab added Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. c: Performance For when we could improve the performance / speed of Matomo. labels Apr 8, 2015
@mattab mattab added this to the Mid term milestone Apr 8, 2015
@mattab
Copy link
Member

mattab commented May 1, 2015

Hi @GreenReaper thanks for the report. We've opened a new issue as forgot there was this one: Tracker: Faster visitor recognition #7779

@mattab mattab closed this as completed May 1, 2015
@mattab mattab added the duplicate For issues that already existed in our issue tracker and were reported previously. label May 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. duplicate For issues that already existed in our issue tracker and were reported previously. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Projects
None yet
Development

No branches or pull requests

2 participants