@mattab opened this Issue on March 30th 2015 Member

The goal of this issue is to make archiving faster for Date Range requests and Custom Segments requests. This follows up: #4768

Currently when a date range API query is issued, it will pre-process all reports for the given plugin being requested. For example when requesting the "Page URLs" report for a custom date range or segment, it will pre-process all Actions plugin reports including Page URLs, Page titles, Downloads, Outlinks.

In this issue we would like to improve the algorithm so that only the requested report (ie. a Blob Record) is archived.

This may need to be done at the same time as #7470

@mattab commented on July 21st 2016 Member
@tsteur commented on May 3rd 2021 Member

@mattab @diosmosis I think now that we have the ability to archive only requested reports (and support partial archives) maybe we could use this also for ranges (which are aggregated during browser archiving usually).

Background:
Say you request a custom report and you have 50 custom reports configured, it means it will aggregate the ranges for all 50 custom reports. If some of them need to process unique metrics it will do this also for all of them even though you might never look at the data.

Same applies to goals, forms, funnels, ...

Each plugin's archivers could support isRequestedReport() maybe and we'd need to set setArchiveOnlyReport() in the archive parameters.

It might not be the best idea though to reuse this logic and maybe would need to use a different logic like in https://github.com/matomo-org/matomo/blob/4.3.0-b3/core/Archive.php#L538 forward record names or so to the archiver so it will only archive requested record names. Benefit be plugin developers won't need to change anything when requesting a report and only would need to check if a specific record name was requested in the archiver.

@mattab eventually should look at this maybe. Also from the background that it would not only make ranges faster but it could maybe at the same time eventually also make browser archiving faster in general (any period). Even if we can't fix it this may be more of a major issue

@mattab commented on May 5th 2021 Member

@tsteur this would be a high value improvement for UI performance and reducing DB usage, significantly in some cases. Sounds good to schedule it in a 4.x.0 :rocket:

@tsteur commented on May 5th 2021 Member

I was more meaning only adding a major issue but not scheduling it yet but added it for now to 4.5 milestone as it would be good to have but not 100% sure on it's impact yet (in some cases it be quite a big impact though). To be seen how easy it is to develop.

@tsteur commented on August 31st 2021 Member

Below a patch that seems to improve the most tricky situations without needing huge changes. Same could work for other entities like Funnels. Reports with entities are the biggest performance problem since there could be like 100 goals vs eg an action archiver always has a static number of archives.

diff --git a/core/Archive.php b/core/Archive.php
index 6ef1ccb09a..7257af6e73 100644
--- a/core/Archive.php
+++ b/core/Archive.php
@@ -827,10 +827,7 @@ class Archive implements ArchiveQuery
     {
         $coreAdminHomeApi = API::getInstance();

-        $requestedReport = null;
-        if (SettingsServer::isArchivePhpTriggered()) {
-            $requestedReport = Common::getRequestVar('requestedReport', '', 'string');
-        }
+        $requestedReport = Common::getRequestVar('requestedReport', '', 'string');

         $periodString = $period->getRangeString();
         $periodDateStr = $period->getLabel() == 'range' ? $periodString : $period->getDateStart()->toString();
diff --git a/plugins/Goals/API.php b/plugins/Goals/API.php
index ad0bd4121f..e663a7e307 100644
--- a/plugins/Goals/API.php
+++ b/plugins/Goals/API.php
@@ -26,6 +26,7 @@ use Piwik\Plugin\ReportsProvider;
 use Piwik\Plugins\Goals\Columns\Metrics\GoalConversionRate;
 use Piwik\Segment;
 use Piwik\Segment\SegmentExpression;
+use Piwik\SettingsServer;
 use Piwik\Site;
 use Piwik\Tracker\Cache;
 use Piwik\Tracker\GoalManager;
@@ -528,6 +529,9 @@ class API extends \Piwik\Plugin\API
     public function getMetrics($idSite, $period, $date, $segment = false, $idGoal = false, $columns = array(), $showAllGoalSpecificMetrics = false)
     {
         Piwik::checkUserHasViewAccess($idSite);
+
+        $this->setRequestedReportIfNeeded($idGoal, $period);
+
         $archive = Archive::build($idSite, $period, $date, $segment);

         $showAllGoalSpecificMetrics = $showAllGoalSpecificMetrics && $idGoal === false;
@@ -582,6 +586,8 @@ class API extends \Piwik\Plugin\API
         }, $columnsToGet);
         $dataTable = $archive->getDataTableFromNumeric($inDbMetricNames);

+        $this->unsetRequestedReportIfNeeded();
+
         if (count($columnsToGet) > 0) {
             $newNameMapping = array_combine($inDbMetricNames, $columnsToGet);
         } else {
@@ -681,6 +687,22 @@ class API extends \Piwik\Plugin\API
         return $this->getNumeric($idSite, $period, $date, $segment, Archiver::getRecordName('revenue', $idGoal));
     }

+    private function setRequestedReportIfNeeded($idGoal, $period)
+    {
+        if (!empty($idGoal) && $idGoal > 0 && $period === 'range'
+            && !SettingsServer::isArchivePhpTriggered()
+            && !isset($_GET['requestedReport'])) { // don't overwrite requested report if it's already requested for a different plugin or so
+            $_GET['requestedReport'] = 'goalreport_' . $idGoal;
+        }
+    }
+
+    private function unsetRequestedReportIfNeeded()
+    {
+        if (isset($_GET['requestedReport'])) { // todo only unset if we set it above
+            unset($_GET['requestedReport']);
+        }
+    }
+
     /**
      * Utility method that retrieve an archived DataTable for a specific site, date range,
      * segment and goal. If not goal is specified, this method will retrieve and sum the
@@ -699,6 +721,8 @@ class API extends \Piwik\Plugin\API
     {
         Piwik::checkUserHasViewAccess($idSite);

+        $this->setRequestedReportIfNeeded($idGoal, $period);
+
         $archive = Archive::build($idSite, $period, $date, $segment);

         // check for the special goal ids
@@ -708,6 +732,7 @@ class API extends \Piwik\Plugin\API
         $dataTable = $archive->getDataTable(Archiver::getRecordName($recordName, $realGoalId), $idSubtable = null);
         $dataTable->queueFilter('ReplaceColumnNames');

+        $this->unsetRequestedReportIfNeeded();
         return $dataTable;
     }

diff --git a/plugins/Goals/Archiver.php b/plugins/Goals/Archiver.php
index dc32c68538..1d49eda376 100644
--- a/plugins/Goals/Archiver.php
+++ b/plugins/Goals/Archiver.php
@@ -468,6 +468,9 @@ class Archiver extends \Piwik\Plugin\Archiver

         $fieldsToSum = array();
         foreach ($goalIdsToSum as $goalId) {
+            if (!$this->isRequestedReport($goalId)) {
+               continue;
+            }
             $metricsToSum = Goals::getGoalColumns($goalId);
             foreach ($metricsToSum as $metricName) {
                 $fieldsToSum[] = self::getRecordName($metricName, $goalId);
@@ -478,6 +481,9 @@ class Archiver extends \Piwik\Plugin\Archiver
         $columnsAggregationOperation = null;

         foreach ($goalIdsToSum as $goalId) {
+            if (!$this->isRequestedReport($goalId)) {
+                continue;
+            }
             // sum up the visits to conversion data table & the days to conversion data table
             $this->getProcessor()->aggregateDataTableRecords(
                 array(self::getRecordName(self::VISITS_UNTIL_RECORD_NAME, $goalId),
@@ -505,4 +511,11 @@ class Archiver extends \Piwik\Plugin\Archiver
         $this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::NEW_VISITOR_SEGMENT);
         $this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::RETURNING_VISITOR_SEGMENT);
     }
+
+    protected function isRequestedReport(string $reportName)
+    {
+        $requestedReport = $this->getProcessor()->getParams()->getArchiveOnlyReport();
+
+        return empty($requestedReport) || $requestedReport == 'goalreport_' . $reportName;
+    }
 }

Not sure re possible side effects.

I don't think the requestedReport would be forwarded to the processDependentArchive and would possibly need to be implemented there.

Powered by GitHub Issue Mirror