Do we need to create a temporary segment table for unique week|month|year periods? #17750

Closed
tsteur opened this issue Jul 10, 2021 · 6 comments
Labels: c: Performance, wontfix

@tsteur
Member

tsteur commented Jul 10, 2021

Seeing queries like these:

image

Since I think we mostly calculate unique visitors only for week|month|year|range periods, I wonder whether creating a temporary table is actually needed for these metrics. It might be a lot faster to just query the unique numbers directly.

tsteur added the c: Performance label Jul 10, 2021
tsteur added this to the 4.4.0 milestone Jul 10, 2021
@tsteur
Member Author

tsteur commented Jul 10, 2021

I haven't tested this at all, and I'm not sure this is the right place or whether it would be fine to do it here, but maybe we could do something like this (again, it might be the wrong place, or we might not want it):

diff --git a/core/ArchiveProcessor.php b/core/ArchiveProcessor.php
index b231d57eb4..11e00c17da 100644
--- a/core/ArchiveProcessor.php
+++ b/core/ArchiveProcessor.php
@@ -527,6 +527,7 @@ class ArchiveProcessor
     protected function computeNbUniques($metrics, $sites)
     {
         $logAggregator = $this->getLogAggregator();
+        $previous = $logAggregator->disallowUsageSegmentCache();
         $sitesBackup = $logAggregator->getSites();
 
         $logAggregator->setSites($sites);
@@ -534,6 +535,9 @@ class ArchiveProcessor
             $query = $logAggregator->queryVisitsByDimension(array(), false, array(), $metrics);
         } finally {
             $logAggregator->setSites($sitesBackup);
+            if ($previous) {
+                $logAggregator->allowUsageSegmentCache();
+            }
         }
         $data = $query->fetch();
         return $data;
diff --git a/core/DataAccess/LogAggregator.php b/core/DataAccess/LogAggregator.php
index e283af2691..6e2d6b0b17 100644
--- a/core/DataAccess/LogAggregator.php
+++ b/core/DataAccess/LogAggregator.php
@@ -265,6 +265,14 @@ class LogAggregator
         $this->allowUsageSegmentCache = true;
     }
 
+
+    public function disallowUsageSegmentCache()
+    {
+        $previous = $this->allowUsageSegmentCache;
+        $this->allowUsageSegmentCache = false;
+        return $previous;
+    }
+
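
For readers following the diff, the save-and-restore handling of the flag can be sketched in isolation. This is only an illustrative sketch under assumptions: `FakeLogAggregator` and `withoutSegmentCache()` are made-up names, not Matomo classes, and only the flag handling mirrors the patch above.

```php
<?php
// Hypothetical stand-in for the aggregator; only the flag handling mirrors the diff.
final class FakeLogAggregator
{
    private bool $allowUsageSegmentCache = true;

    public function allowUsageSegmentCache(): void
    {
        $this->allowUsageSegmentCache = true;
    }

    // Switches the segment cache off and reports whether it was on before.
    public function disallowUsageSegmentCache(): bool
    {
        $previous = $this->allowUsageSegmentCache;
        $this->allowUsageSegmentCache = false;
        return $previous;
    }

    public function isSegmentCacheAllowed(): bool
    {
        return $this->allowUsageSegmentCache;
    }
}

// Runs $query with the segment cache disabled, then restores the earlier state
// even if $query throws (hypothetical helper, not part of the patch).
function withoutSegmentCache(FakeLogAggregator $aggregator, callable $query)
{
    $previous = $aggregator->disallowUsageSegmentCache();
    try {
        return $query($aggregator);
    } finally {
        if ($previous) {
            $aggregator->allowUsageSegmentCache();
        }
    }
}
```

Returning the previous value from `disallowUsageSegmentCache()` means a caller that found the cache already disabled does not accidentally re-enable it afterwards.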

@diosmosis
Member

This might merit testing first. From testing HyperLogLog, it seemed as if the DISTINCT handling was the cause of the slow performance. (The queries there were slow and had no segment. I guess we could compare a bare DISTINCT query vs. a DISTINCT query with a segment.)

@tsteur
Member Author

tsteur commented Jul 11, 2021

Just FYI @diosmosis: in the end it might run pretty much the same query again anyway, since the temporary segment table would only contain the distinct idvisits, so it would basically do yet another DISTINCT on idvisitor etc.
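
A toy sketch of that point, using made-up in-memory rows rather than real log data. The row values, the segment condition, and both function names are assumptions for illustration only; the real work happens in SQL, not in PHP arrays.

```php
<?php
// Made-up visit rows: several visits can belong to the same visitor.
$logVisit = [
    ['idvisit' => 1, 'idvisitor' => 'a'],
    ['idvisit' => 2, 'idvisitor' => 'a'],
    ['idvisit' => 3, 'idvisitor' => 'b'],
    ['idvisit' => 4, 'idvisitor' => 'c'],
];

// Hypothetical segment condition: everything except visit 4 matches.
$matchesSegment = fn(array $row): bool => $row['idvisit'] !== 4;

// Direct approach: filter by segment and deduplicate visitors in one pass.
function countUniqueVisitorsDirect(array $rows, callable $matches): int
{
    $visitors = [];
    foreach ($rows as $row) {
        if ($matches($row)) {
            $visitors[$row['idvisitor']] = true;
        }
    }
    return count($visitors);
}

// Temp-table approach: first materialise the distinct idvisits,
// then still deduplicate idvisitor over the matching visits.
function countUniqueVisitorsViaTempTable(array $rows, callable $matches): int
{
    $tempTable = [];
    foreach ($rows as $row) {
        if ($matches($row)) {
            $tempTable[$row['idvisit']] = true;
        }
    }

    $visitors = [];
    foreach ($rows as $row) {
        if (isset($tempTable[$row['idvisit']])) {
            $visitors[$row['idvisitor']] = true;
        }
    }
    return count($visitors);
}
```

Both functions return the same count; the second one just adds a materialisation step before the visitor deduplication, which is the extra work being questioned here.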

mattab modified the milestones: 4.4.0, 4.5.0 Jul 28, 2021
@tsteur
Member Author

tsteur commented Jul 28, 2021

This particular query actually comes from Cohorts. There is also the unique visitors and unique users query from core, which is slightly different.

tsteur added the wontfix label Jul 29, 2021
@tsteur
Member Author

tsteur commented Jul 29, 2021

Closing this issue as wontfix. If the Cohorts plugin didn't exist, we would apply #17827. I've been debugging, and when Cohorts is installed, the temporary segment table that was already created gets reused, so creating it likely saves us time (because the segment query can be very slow; I don't mean the distinct query).

Without Cohorts we would apply this patch and make things a bit faster for On-Premise users. We can't have an if/else that depends on whether Cohorts is installed, because it would make test results quite unreliable: you would need to run all the related tests both with and without the plugin. Also, keeping the temporary table will help if any other plugin ever uses unique data.
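
The reuse argument can be sketched roughly as follows. `SegmentTableCache` is a hypothetical class, not Matomo code, and the array it stores merely stands in for a temporary table of idvisits; the sketch only shows that the slow segment query runs once when two consumers share the result.

```php
<?php
// Hypothetical cache standing in for the temporary segment table.
final class SegmentTableCache
{
    /** @var array<string, array> materialised idvisits per segment key */
    private array $tables = [];

    /** Counts how often the (slow) segment query actually ran. */
    public int $segmentQueryRuns = 0;

    public function getOrCreate(string $segmentKey, callable $runSegmentQuery): array
    {
        if (!array_key_exists($segmentKey, $this->tables)) {
            $this->segmentQueryRuns++;
            $this->tables[$segmentKey] = $runSegmentQuery();
        }
        return $this->tables[$segmentKey];
    }
}

// Stand-in for the slow segment query; returns made-up idvisits.
$slowSegmentQuery = fn(): array => [1, 2, 3];

$cache = new SegmentTableCache();
$forCore    = $cache->getOrCreate('segment-x', $slowSegmentQuery); // runs the query
$forCohorts = $cache->getOrCreate('segment-x', $slowSegmentQuery); // reuses the result
```

With only one consumer, the materialisation step is pure overhead; with two or more, the cost of the segment query is paid once and shared.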

tsteur closed this as completed Jul 29, 2021
@diosmosis
Member

@tsteur if it helps, there could be a setting, set via DI, to disable the logic. Would that work?
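
A minimal sketch of what such a switch could look like. The setting key and both functions are hypothetical, and a plain array stands in for the DI container; none of this is an existing Matomo setting or API.

```php
<?php
// Hypothetical setting name; Matomo does not ship a setting with this key.
const USE_SEGMENT_CACHE_FOR_UNIQUES = 'archiving.useSegmentCacheForUniques';

// Stand-in for a DI container: defaults first, environment overrides win.
function buildContainer(array $overrides = []): array
{
    $defaults = [USE_SEGMENT_CACHE_FOR_UNIQUES => true];
    return array_merge($defaults, $overrides);
}

// The archiving code would consult the setting instead of checking
// whether a particular plugin is installed.
function shouldUseSegmentCacheForUniques(array $container): bool
{
    return (bool) $container[USE_SEGMENT_CACHE_FOR_UNIQUES];
}
```

Because the behaviour would depend on an explicit setting rather than on which plugins happen to be installed, a test could pin the value it expects.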
