@tsteur opened this Issue on July 10th 2021 Member

Seeing queries like these:


Since I think we mostly calculate unique visitors only for week|month|year|range periods, I wonder whether we actually need to create a temporary table for these metrics. It might be a lot faster to just query the unique numbers directly?

@tsteur commented on July 10th 2021 Member

I haven't tested this at all, and I'm not even sure it's the right place or whether it would be fine to do it there, but maybe we can do something like this (again, it might be the wrong place, or we might not want it):

diff --git a/core/ArchiveProcessor.php b/core/ArchiveProcessor.php
index b231d57eb4..11e00c17da 100644
--- a/core/ArchiveProcessor.php
+++ b/core/ArchiveProcessor.php
@@ -527,6 +527,7 @@ class ArchiveProcessor
     protected function computeNbUniques($metrics, $sites)
     {
         $logAggregator = $this->getLogAggregator();
+        $previous = $logAggregator->disallowUsageSegmentCache();
         $sitesBackup = $logAggregator->getSites();

@@ -534,6 +535,9 @@ class ArchiveProcessor
             $query = $logAggregator->queryVisitsByDimension(array(), false, array(), $metrics);
         } finally {
+            if ($previous) {
+                $logAggregator->allowUsageSegmentCache();
+            }
         }
         $data = $query->fetch();
         return $data;
diff --git a/core/DataAccess/LogAggregator.php b/core/DataAccess/LogAggregator.php
index e283af2691..6e2d6b0b17 100644
--- a/core/DataAccess/LogAggregator.php
+++ b/core/DataAccess/LogAggregator.php
@@ -265,6 +265,14 @@ class LogAggregator
         $this->allowUsageSegmentCache = true;
     }

+    public function disallowUsageSegmentCache()
+    {
+        $previous = $this->allowUsageSegmentCache;
+        $this->allowUsageSegmentCache = false;
+        return $previous;
+    }
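
For illustration, the save, disable, and restore idea behind the patch above can be sketched in isolation. This is only a minimal sketch: `SegmentCacheFlag`, `isSegmentCacheAllowed()` and `withoutSegmentCache()` are hypothetical names that model just the flag handling, not the real `LogAggregator`.

```php
<?php
// Hypothetical stand-in for LogAggregator: only the flag handling is modelled.
final class SegmentCacheFlag
{
    private bool $allowUsageSegmentCache = true;

    public function allowUsageSegmentCache(): void
    {
        $this->allowUsageSegmentCache = true;
    }

    // Disables the segment cache and returns the previous state,
    // mirroring the method added in the diff.
    public function disallowUsageSegmentCache(): bool
    {
        $previous = $this->allowUsageSegmentCache;
        $this->allowUsageSegmentCache = false;
        return $previous;
    }

    // Hypothetical accessor, added here only so the state can be inspected.
    public function isSegmentCacheAllowed(): bool
    {
        return $this->allowUsageSegmentCache;
    }
}

// Runs $query with the segment cache disabled, then restores the earlier
// state, even if $query throws. The cache is only re-enabled if it was
// enabled before the call.
function withoutSegmentCache(SegmentCacheFlag $aggregator, callable $query)
{
    $previous = $aggregator->disallowUsageSegmentCache();
    try {
        return $query();
    } finally {
        if ($previous) {
            $aggregator->allowUsageSegmentCache();
        }
    }
}
```

Restoring inside `finally` means a failing query cannot leave the cache disabled for later queries on the same aggregator.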

@diosmosis commented on July 11th 2021 Member

Might merit testing first. From testing hyperloglog, it seemed as if the DISTINCT handling was the cause of slow performance. (The queries there were slow and had no segment. I guess we could compare a bare DISTINCT query vs. a DISTINCT query with a segment.)

@tsteur commented on July 11th 2021 Member

Just FYI @diosmosis: in the end it might run pretty much the same query again anyway, because the temporary segment table would contain only the distinct idvisits, so it would basically do yet another DISTINCT on idvisitor etc.

@tsteur commented on July 28th 2021 Member

This particular query is actually coming from Cohorts. There is also the unique visitors and unique users query from core which is a slightly different query.

@tsteur commented on July 29th 2021 Member

Closing this issue as a wontfix. If it weren't for the Cohorts plugin, we would apply https://github.com/matomo-org/matomo/pull/17827. I've been debugging, and when Cohorts is installed, the same temporary segment table is reused, so creating it likely saves us time overall (the segment query can be very slow; I don't mean the distinct query).

Without Cohorts, we would apply this patch and make it a bit faster for On-Premise users. We can't have an if/else that depends on whether Cohorts is installed, because that would make test results quite unreliable: you would need to run all related tests both with and without the plugin. Also, keeping the temporary table will help if any other plugin ever uses unique data.

@diosmosis commented on July 29th 2021 Member

@tsteur if it helps, there could be a setting, set via DI, that disables the logic. Would that work?
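
A rough sketch of what such a setting might look like, assuming a plain array of DI-style definitions. The key name `uniques.useSegmentCache` and the helper `shouldUseSegmentCache()` are invented for illustration and are not existing Matomo settings.

```php
<?php
// Hypothetical DI-style definitions; the key name is invented for illustration.
$definitions = [
    'uniques.useSegmentCache' => true, // default: keep using the temporary table
];

// Decides whether the unique-visitor query should use the temporary segment
// table. Falls back to true when the key is not defined, so the current
// behaviour stays the default.
function shouldUseSegmentCache(array $definitions): bool
{
    return (bool) ($definitions['uniques.useSegmentCache'] ?? true);
}
```

Because the value comes from configuration rather than from whether a plugin is installed, tests could pin it explicitly to one value.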

This Issue was closed on July 29th 2021
