New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UNION query in findVisitor() is regularly 100ms+, involves temporary tables and scans #7454
Comments
This seems to work, although it should be checked thoroughly: // We do each query in turn so that each sql query uses its own INDEX
// We don't use UNION because it would create a temporary table
else {
// will use INDEX index_idsite_idvisitor (idsite, idvisitor)
$where = ' AND idvisitor = ?';
$bindSql[] = $idVisitor;
$sqlVisitorId = "$select
$from
WHERE $whereCommon $where
ORDER BY visit_last_action_time DESC
LIMIT 1
";
$visitRow = $this->getDb()->fetch($sqlVisitorId, $bindSql);
if (!$visitRow) {
// will use INDEX index_idsite_config_datetime (idsite, config_id, visit_last_action_time)
array_pop($bindSql);
$where = ' AND config_id = ? AND user_id IS NULL ';
$bindSql[] = $configId;
$sqlConfigId = "$select
$from
WHERE $whereCommon $where
ORDER BY visit_last_action_time DESC
LIMIT 1
";
$visitRow = $this->getDb()->fetch($sqlConfigId, $bindSql);
}
}
return $visitRow; |
Eliminating the temporary tables is nice but doesn't completely eliminated the database time used. Separating these two queries has, however, helped identify where rows originate from - the idvisitor subquery. Here's a 34ms outlier example (from a 5.7Gb database, with MySQL 5.5.41 - the database is dedicated to piwik but is under memory pressure from other systems on the same machine):
I should emphasize that most of the time we're still talking a 1-3ms query (which is maybe 0.5-1ms less without the config_id part). There are regular outliers (most likely because the database is not completely in memory), and when you're getting five to ten hits a second they add up. |
Hi @GreenReaper thanks for the report. We've opened a new issue as forgot there was this one: Tracker: Faster visitor recognition #7779 |
As noted in #7440, commit 0abac3b introduced a UNION into findVisitor() within the visit code (now in core/Tracker/Model.php), which results in the creation of a temporary table on each call ("UNION queries use temporary tables.").
Separately, the idvisitor query normally takes 1-2ms and the config_id query takes around 0.5ms. The combined query is normally around 5ms. However, the number of rows examined may reach the hundreds, and it then starts taking significantly longer - up to 200ms on a dedicated machine.
The impact of these queries vary depending on their size and may not be visible on test or low-visit production websites.
This is what happened when I switched it to checking just the idvisitor part (see last hour):
[The other temporary tables which are disk tables appeared more recently]
When I switched the code around to just use just the select on config_id, the decrease in activity remained. This suggests to me that it is the UNION of the two selects that is causing objects and scans.
An example of a slow query from the log (there are several 100ms+ ones per minute):
An EXPLAIN of the query (the last entry involves the temporary table created by the UNION):
The UNION is intended to check visits against both idvisitor and config_id and return the first if it is present, other than the second. I propose that two queries be performed - check idvisitor first, and if that does not return a result, do the config_id query.
The text was updated successfully, but these errors were encountered: