@mattab opened this Issue on March 30th 2015 Member

The goal of this issue is to make archiving faster for Date Range requests and Custom Segments requests. This follows up: #4768

Currently when a date range API query is issued, it will pre-process all reports for the given plugin being requested. For example when requesting the "Page URLs" report for a custom date range or segment, it will pre-process all Actions plugin reports including Page URLs, Page titles, Downloads, Outlinks.

In this issue we would like to improve the algorithm so that only the requested report (ie. a Blob Record) is archived.

When requesting for example the Page URL report, then it should only archive the page URL report in the actions plugin but no other reports within the same plugin. An exception to this is when a DB query is being used to query the data for multiple reports at once (we do this eg in Actions). Then we would also still archive all reports coming from that specific DB query. We wouldn't split up the DB queries into different queries.

If a report for goal ID =5 is being requested, then it should only archive this specific report for this specific goal but no other goal reports and no other reports for the same goal.

Similarly this should also work for other plugins like custom reports etc.

This may need to be done at the same time as #7470

@mattab commented on July 21st 2016 Member
@tsteur commented on May 3rd 2021 Member

@mattab @diosmosis I think now that we have the ability to archive only requested reports (and support partial archives) maybe we could use this also for ranges (which are aggregated during browser archiving usually).

Say you request a custom report and you have 50 custom reports configured, it means it will aggregate the ranges for all 50 custom reports. If some of them need to process unique metrics it will do this also for all of them even though you might never look at the data.

Same applies to goals, forms, funnels, ...

Each plugin's archivers could support isRequestedReport() maybe and we'd need to set setArchiveOnlyReport() in the archive parameters.

It might not be the best idea though to reuse this logic and maybe would need to use a different logic like in https://github.com/matomo-org/matomo/blob/4.3.0-b3/core/Archive.php#L538 forward record names or so to the archiver so it will only archive requested record names. Benefit be plugin developers won't need to change anything when requesting a report and only would need to check if a specific record name was requested in the archiver.

@mattab eventually should look at this maybe. Also from the background that it would not only make ranges faster but it could maybe at the same time eventually also make browser archiving faster in general (any period). Even if we can't fix it this may be more of a major issue

@mattab commented on May 5th 2021 Member

@tsteur this would be a high value improvement for UI performance and reducing DB usage, significantly in some cases. Sounds good to schedule it in a 4.x.0 :rocket:

@tsteur commented on May 5th 2021 Member

I was more meaning only adding a major issue but not scheduling it yet but added it for now to 4.5 milestone as it would be good to have but not 100% sure on it's impact yet (in some cases it be quite a big impact though). To be seen how easy it is to develop.

@tsteur commented on August 31st 2021 Member

Below a patch that seems to improve the most tricky situations without needing huge changes. Same could work for other entities like Funnels. Reports with entities are the biggest performance problem since there could be like 100 goals vs eg an action archiver always has a static number of archives.

diff --git a/core/Archive.php b/core/Archive.php
index 6ef1ccb09a..7257af6e73 100644
--- a/core/Archive.php
+++ b/core/Archive.php
@@ -827,10 +827,7 @@ class Archive implements ArchiveQuery
         $coreAdminHomeApi = API::getInstance();

-        $requestedReport = null;
-        if (SettingsServer::isArchivePhpTriggered()) {
-            $requestedReport = Common::getRequestVar('requestedReport', '', 'string');
-        }
+        $requestedReport = Common::getRequestVar('requestedReport', '', 'string');

         $periodString = $period->getRangeString();
         $periodDateStr = $period->getLabel() == 'range' ? $periodString : $period->getDateStart()->toString();
diff --git a/plugins/Goals/API.php b/plugins/Goals/API.php
index ad0bd4121f..e663a7e307 100644
--- a/plugins/Goals/API.php
+++ b/plugins/Goals/API.php
@@ -26,6 +26,7 @@ use Piwik\Plugin\ReportsProvider;
 use Piwik\Plugins\Goals\Columns\Metrics\GoalConversionRate;
 use Piwik\Segment;
 use Piwik\Segment\SegmentExpression;
+use Piwik\SettingsServer;
 use Piwik\Site;
 use Piwik\Tracker\Cache;
 use Piwik\Tracker\GoalManager;
@@ -528,6 +529,9 @@ class API extends \Piwik\Plugin\API
     public function getMetrics($idSite, $period, $date, $segment = false, $idGoal = false, $columns = array(), $showAllGoalSpecificMetrics = false)
+        $this->setRequestedReportIfNeeded($idGoal, $period);
         $archive = Archive::build($idSite, $period, $date, $segment);

         $showAllGoalSpecificMetrics = $showAllGoalSpecificMetrics && $idGoal === false;
@@ -582,6 +586,8 @@ class API extends \Piwik\Plugin\API
         }, $columnsToGet);
         $dataTable = $archive->getDataTableFromNumeric($inDbMetricNames);

+        $this->unsetRequestedReportIfNeeded();
         if (count($columnsToGet) > 0) {
             $newNameMapping = array_combine($inDbMetricNames, $columnsToGet);
         } else {
@@ -681,6 +687,22 @@ class API extends \Piwik\Plugin\API
         return $this->getNumeric($idSite, $period, $date, $segment, Archiver::getRecordName('revenue', $idGoal));

+    private function setRequestedReportIfNeeded($idGoal, $period)
+    {
+        if (!empty($idGoal) && $idGoal > 0 && $period === 'range'
+            && !SettingsServer::isArchivePhpTriggered()
+            && !isset($_GET['requestedReport'])) { // don't overwrite requested report if it's already requested for a different plugin or so
+            $_GET['requestedReport'] = 'goalreport_' . $idGoal;
+        }
+    }
+    private function unsetRequestedReportIfNeeded()
+    {
+        if (isset($_GET['requestedReport'])) { // todo only unset if we set it above
+            unset($_GET['requestedReport']);
+        }
+    }
      * Utility method that retrieve an archived DataTable for a specific site, date range,
      * segment and goal. If not goal is specified, this method will retrieve and sum the
@@ -699,6 +721,8 @@ class API extends \Piwik\Plugin\API

+        $this->setRequestedReportIfNeeded($idGoal, $period);
         $archive = Archive::build($idSite, $period, $date, $segment);

         // check for the special goal ids
@@ -708,6 +732,7 @@ class API extends \Piwik\Plugin\API
         $dataTable = $archive->getDataTable(Archiver::getRecordName($recordName, $realGoalId), $idSubtable = null);

+        $this->unsetRequestedReportIfNeeded();
         return $dataTable;

diff --git a/plugins/Goals/Archiver.php b/plugins/Goals/Archiver.php
index dc32c68538..1d49eda376 100644
--- a/plugins/Goals/Archiver.php
+++ b/plugins/Goals/Archiver.php
@@ -468,6 +468,9 @@ class Archiver extends \Piwik\Plugin\Archiver

         $fieldsToSum = array();
         foreach ($goalIdsToSum as $goalId) {
+            if (!$this->isRequestedReport($goalId)) {
+               continue;
+            }
             $metricsToSum = Goals::getGoalColumns($goalId);
             foreach ($metricsToSum as $metricName) {
                 $fieldsToSum[] = self::getRecordName($metricName, $goalId);
@@ -478,6 +481,9 @@ class Archiver extends \Piwik\Plugin\Archiver
         $columnsAggregationOperation = null;

         foreach ($goalIdsToSum as $goalId) {
+            if (!$this->isRequestedReport($goalId)) {
+                continue;
+            }
             // sum up the visits to conversion data table & the days to conversion data table
                 array(self::getRecordName(self::VISITS_UNTIL_RECORD_NAME, $goalId),
@@ -505,4 +511,11 @@ class Archiver extends \Piwik\Plugin\Archiver
         $this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::NEW_VISITOR_SEGMENT);
         $this->getProcessor()->processDependentArchive('Goals', VisitFrequencyAPI::RETURNING_VISITOR_SEGMENT);
+    protected function isRequestedReport(string $reportName)
+    {
+        $requestedReport = $this->getProcessor()->getParams()->getArchiveOnlyReport();
+        return empty($requestedReport) || $requestedReport == 'goalreport_' . $reportName;
+    }

Not sure re possible side effects.

I don't think the requestedReport would be forwarded to the processDependentArchive and would possibly need to be implemented there.

@tsteur commented on October 21st 2021 Member

In case we can't solve it easily with the current way the archivers work, then we would maybe wait until Matomo 5 to refactor the archiver as needed and change it properly.

@tsteur commented on November 6th 2022 Member

We're seeing daily issues around this problem mostly for one account only though.

Powered by GitHub Issue Mirror