Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When requesting Date Range or Custom Segment, only archive the requested record #7573

Closed
mattab opened this issue Mar 30, 2015 · 7 comments
Closed
Assignees
Labels
c: Performance For when we could improve the performance / speed of Matomo. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Milestone

Comments

@mattab
Copy link
Member

mattab commented Mar 30, 2015

The goal of this issue is to make archiving faster for Date Range requests and Custom Segments requests. This follows up: #4768

Currently when a date range API query is issued, it will pre-process all reports for the given plugin being requested. For example when requesting the "Page URLs" report for a custom date range or segment, it will pre-process all Actions plugin reports including Page URLs, Page titles, Downloads, Outlinks.

In this issue we would like to improve the algorithm so that only the requested report (ie. a Blob Record) is archived.

When requesting for example the Page URL report, then it should only archive the page URL report in the actions plugin but no other reports within the same plugin. An exception to this is when a DB query is being used to query the data for multiple reports at once (we do this eg in Actions). Then we would also still archive all reports coming from that specific DB query. We wouldn't split up the DB queries into different queries.

If a report for goal ID =5 is being requested, then it should only archive this specific report for this specific goal but no other goal reports and no other reports for the same goal.

Similarly this should also work for other plugins like custom reports etc.

This may need to be done at the same time as #7470

@mattab mattab added Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. c: Performance For when we could improve the performance / speed of Matomo. labels Mar 30, 2015
@mattab mattab added this to the Short term milestone Mar 30, 2015
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Apr 7, 2015
@mattab mattab modified the milestones: Piwik 3.0.0, Short term Apr 8, 2015
@mattab mattab removed the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Oct 1, 2015
@mattab
Copy link
Member Author

mattab commented Jul 21, 2016

refs #9532

@tsteur
Copy link
Member

tsteur commented May 3, 2021

@mattab @diosmosis I think now that we have the ability to archive only requested reports (and support partial archives) maybe we could use this also for ranges (which are aggregated during browser archiving usually).

Background:
Say you request a custom report and you have 50 custom reports configured, it means it will aggregate the ranges for all 50 custom reports. If some of them need to process unique metrics it will do this also for all of them even though you might never look at the data.

Same applies to goals, forms, funnels, ...

Each plugin's archivers could support isRequestedReport() maybe and we'd need to set setArchiveOnlyReport() in the archive parameters.

It might not be the best idea though to reuse this logic and maybe would need to use a different logic like in https://github.com/matomo-org/matomo/blob/4.3.0-b3/core/Archive.php#L538 forward record names or so to the archiver so it will only archive requested record names. Benefit be plugin developers won't need to change anything when requesting a report and only would need to check if a specific record name was requested in the archiver.

@mattab eventually should look at this maybe. Also from the background that it would not only make ranges faster but it could maybe at the same time eventually also make browser archiving faster in general (any period). Even if we can't fix it this may be more of a major issue

@mattab
Copy link
Member Author

mattab commented May 5, 2021

@tsteur this would be a high value improvement for UI performance and reducing DB usage, significantly in some cases. Sounds good to schedule it in a 4.x.0 🚀

@tsteur tsteur added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label May 5, 2021
@tsteur
Copy link
Member

tsteur commented May 5, 2021

I was more meaning only adding a major issue but not scheduling it yet but added it for now to 4.5 milestone as it would be good to have but not 100% sure on it's impact yet (in some cases it be quite a big impact though). To be seen how easy it is to develop.

@tsteur
Copy link
Member

tsteur commented Aug 31, 2021

Below a patch that seems to improve the most tricky situations without needing huge changes. Same could work for other entities like Funnels. Reports with entities are the biggest performance problem since there could be like 100 goals vs eg an action archiver always has a static number of archives.

diff --git a/core/Archive.php b/core/Archive.php
index 6ef1ccb09a..7257af6e73 100644
--- a/core/Archive.php
+++ b/core/Archive.php
@@ -827,10 +827,7 @@ class Archive implements ArchiveQuery
     {
         $coreAdminHomeApi = API::getInstance();
 
-        $requestedReport = null;
-        if (SettingsServer::isArchivePhpTriggered()) {
-            $requestedReport = Common::getRequestVar('requestedReport', '', 'string');
-        }
+        $requestedReport = Common::getRequestVar('requestedReport', '', 'string');
 
         $periodString = $period->getRangeString();
         $periodDateStr = $period->getLabel() == 'range' ? $periodString : $period->getDateStart()->toString();
diff --git a/plugins/Goals/API.php b/plugins/Goals/API.php
index ad0bd4121f..e663a7e307 100644
--- a/plugins/Goals/API.php
+++ b/plugins/Goals/API.php
@@ -26,6 +26,7 @@ use Piwik\Plugin\ReportsProvider;
 use Piwik\Plugins\Goals\Columns\Metrics\GoalConversionRate;
 use Piwik\Segment;
 use Piwik\Segment\SegmentExpression;
+use Piwik\SettingsServer;
 use Piwik\Site;
 use Piwik\Tracker\Cache;
 use Piwik\Tracker\GoalManager;
@@ -528,6 +529,9 @@ class API extends \Piwik\Plugin\API
     public function getMetrics($idSite, $period, $date, $segment = false, $idGoal = false, $columns = array(), $showAllGoalSpecificMetrics = false)
     {
         Piwik::checkUserHasViewAccess($idSite);
+
+        $this->setRequestedReportIfNeeded($idGoal, $period);
+
         $archive = Archive::build($idSite, $period, $date, $segment);
 
         $showAllGoalSpecificMetrics = $showAllGoalSpecificMetrics && $idGoal === false;
@@ -582,6 +586,8 @@ class API extends \Piwik\Plugin\API
         }, $columnsToGet);
         $dataTable = $archive->getDataTableFromNumeric($inDbMetricNames);
 
+        $this->unsetRequestedReportIfNeeded();
+
         if (count($columnsToGet) > 0) {
             $newNameMapping = array_combine($inDbMetricNames, $columnsToGet);
         } else {
@@ -681,6 +687,22 @@ class API extends \Piwik\Plugin\API
         return $this->getNumeric($idSite, $period, $date, $segment, Archiver::getRecordName('revenue', $idGoal));
     }
 
+    private function setRequestedReportIfNeeded($idGoal, $period)
+    {
+        if (!empty($idGoal) && $idGoal > 0 && $period === 'range'
+            && !SettingsServer::isArchivePhpTriggered()
+            && !isset($_GET['requestedReport'])) { // don't overwrite requested report if it's already requested for a different plugin or so
+            $_GET['requestedReport'] = 'goalreport_' . $idGoal;
+        }
+    }
+
+    private function unsetRequestedReportIfNeeded()
+    {
+        if (isset($_GET['requestedReport'])) { // todo only unset if we set it above
+            unset($_GET['requestedReport']);
+        }
+    }
+
     /**
      * Utility method that retrieve an archived DataTable for a specific site, date range,
      * segment and goal. If not goal is specified, this method will retrieve and sum the
@@ -699,6 +721,8 @@ class API extends \Piwik\Plugin\API
     {
         Piwik::checkUserHasViewAccess($idSite);
 
+        $this->setRequestedReportIfNeeded($idGoal, $period);
+
         $archive = Archive::build($idSite, $period, $date, $segment);
 
         // check for the special goal ids
@@ -708,6 +732,7 @@ class API extends \Piwik\Plugin\API
         $dataTable = $archive->getDataTable(Archiver::getRecordName($recordName, $realGoalId), $idSubtable = null);
         $dataTable->queueFilter('ReplaceColumnNames');
 
+        $this->unsetRequestedReportIfNeeded();
         return $dataTable;
     }
 
diff --git a/plugins/Goals/Archiver.php b/plugins/Goals/Archiver.php
index dc32c68538..1d49eda376 100644
--- a/plugins/Goals/Archiver.php
+++ b/plugins/Goals/Archiver.php
@@ -468,6 +468,9 @@ class Archiver extends \Piwik\Plugin\Archiver
 
         $fieldsToSum = array();
         foreach ($goalIdsToSum as $goalId) {
+            if (!$this->isRequestedReport($goalId)) {
+               continue;
+            }
             $metricsToSum = Goals::getGoalColumns($goalId);
             foreach ($metricsToSum as $metricName) {
                 $fieldsToSum[] = self::getRecordName($metricName, $goalId);
@@ -478,6 +481,9 @@ class Archiver extends \Piwik\Plugin\Archiver
         $columnsAggregationOperation = null;
 
         foreach ($goalIdsToSum as $goalId) {
+            if (!$this->isRequestedReport($goalId)) {
+                continue;
+            }
             // sum up the visits to conversion data table & the days to conversion data table
             $this->getProcessor()->aggregateDataTableRecords(
                 array(self::getRecordName(self::VISITS_UNTIL_RECORD_NAME, $goalId),
@@ -505,4 +511,11 @@ class Archiver extends \Piwik\Plugin\Archiver
         $this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::NEW_VISITOR_SEGMENT);
         $this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::RETURNING_VISITOR_SEGMENT);
     }
+
+    protected function isRequestedReport(string $reportName)
+    {
+        $requestedReport = $this->getProcessor()->getParams()->getArchiveOnlyReport();
+
+        return empty($requestedReport) || $requestedReport == 'goalreport_' . $reportName;
+    }
 }

Not sure re possible side effects.

I don't think the requestedReport would be forwarded to the processDependentArchive and would possibly need to be implemented there.

@tsteur
Copy link
Member

tsteur commented Oct 21, 2021

In case we can't solve it easily with the current way the archivers work, then we would maybe wait until Matomo 5 to refactor the archiver as needed and change it properly.

@tsteur
Copy link
Member

tsteur commented Nov 6, 2022

We're seeing daily issues around this problem mostly for one account only though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Projects
None yet
Development

No branches or pull requests

6 participants