Compute processed metrics only when needed #7389
Labels
c: Performance
Task
A while ago the processed metrics were refactored and are now defined in classes, one class per processed metric. Each class defines how to compute its metric and how to format it.
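The "one class per processed metric" design can be sketched as follows. This is Python for illustration only (Matomo itself is written in PHP), and the names `ProcessedMetric`, `ActionsPerVisit`, `dependent_metrics`, `compute` and `format` are hypothetical, not Matomo's actual API. The point is that each class declares which stored columns it reads, how to derive its value from one row, and how to present that value.

```python
from abc import ABC, abstractmethod


class ProcessedMetric(ABC):
    """Hypothetical base class: one subclass per processed metric."""

    name: str = ""

    @abstractmethod
    def dependent_metrics(self) -> list[str]:
        """Names of the stored columns this metric is derived from."""

    @abstractmethod
    def compute(self, row: dict) -> float:
        """Derive the metric's raw value from a single row."""

    def format(self, value: float) -> str:
        """Default presentation: round to two decimals."""
        return f"{value:.2f}"


class ActionsPerVisit(ProcessedMetric):
    name = "nb_actions_per_visit"

    def dependent_metrics(self) -> list[str]:
        return ["nb_actions", "nb_visits"]

    def compute(self, row: dict) -> float:
        visits = row.get("nb_visits", 0)
        # Guard against division by zero for rows without visits.
        return row.get("nb_actions", 0) / visits if visits else 0.0


metric = ActionsPerVisit()
row = {"nb_visits": 4, "nb_actions": 10}
print(metric.format(metric.compute(row)))  # 2.50
```

Declaring the dependencies explicitly is what would let a caller know which stored columns must be current before a processed value can be trusted.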
These metrics are usually processed after the generic filters have been executed. This is good because it is fast: the generic filters have already limited/truncated the DataTable, so the processed metrics are computed only on the rows that are actually returned/displayed. Instead of e.g. 50k rows, we compute the processed metrics on only 100 rows, which can easily make a difference of more than 5 seconds per request.
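The ordering described above (filter and truncate first, then compute processed metrics) can be sketched like this. The functions `apply_generic_filters` and `add_processed_metrics` are hypothetical stand-ins, not Matomo functions, and the 50,000 rows are synthetic data mirroring the 50k-row figure in the text.

```python
def actions_per_visit(row: dict) -> float:
    visits = row["nb_visits"]
    return row["nb_actions"] / visits if visits else 0.0


def apply_generic_filters(rows: list[dict], sort_column: str, limit: int) -> list[dict]:
    """Hypothetical stand-in for the generic filters: sort descending, then truncate."""
    return sorted(rows, key=lambda r: r[sort_column], reverse=True)[:limit]


def add_processed_metrics(rows: list[dict]) -> int:
    """Compute nb_actions_per_visit in place; return how many rows were processed."""
    for row in rows:
        row["nb_actions_per_visit"] = actions_per_visit(row)
    return len(rows)


# 50,000 synthetic rows.
rows = [{"label": f"page{i}", "nb_visits": i % 97 + 1, "nb_actions": i % 13 + 1}
        for i in range(50_000)]

displayed = apply_generic_filters(rows, sort_column="nb_visits", limit=100)
computed = add_processed_metrics(displayed)
print(computed)  # 100
```

Because truncation happens first, the per-row computation runs 100 times rather than 50,000 times; this only works while the sort column is a stored metric.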
In some cases the metrics are currently processed earlier, for example when someone sorts a table by a processed metric or when the pivoting feature is used. In these cases the processed metric has to be computed before sorting, truncating and filtering; otherwise it would not be possible to sort by it. This means we sometimes compute all processed metrics for all 50k rows although they are not needed. There are currently three problems with that:
`nb_actions_per_visit` depends on `nb_actions` and `nb_visits`. Once the processed metric `nb_actions_per_visit` has been computed it is no longer updated, meaning we might return wrong values depending on the point at which this metric is computed. The metrics it depends on can be changed by e.g. `GroupBy`, `Truncate`, ... but also by other filters and custom logic, and changing an `nb_visits` or `nb_actions` value does not update `nb_actions_per_visit`. This can result in bugs: see e.g. this report, where you will notice that `avg_time_on_page` is wrong because of this: http://demo.piwik.org/index.php?module=API&method=Actions.getPageUrls&idSite=7&period=day&date=2015-03-08&format=JSON&token_auth=anonymous&filter_truncate=1&filter_sort_column=avg_time_on_page
We could split this ticket into three tickets. However, improving the performance by calculating only the needed processed metrics will also reduce how often these bugs occur, which is why I created only one issue. We also need to find a solution that actually fixes all of these problems, so it might be easier to work on all of them at once.
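The stale-value problem and the sort-by-processed-metric case described above can both be illustrated with an on-demand approach: instead of storing a processed value on the row, derive it from the row's current stored columns whenever it is read. This is a minimal Python sketch under that assumption; the `PROCESSED` registry, `get_value` and `sort_and_truncate` are hypothetical names, not Matomo's API, and the `nb_visits` increment merely simulates a later filter changing a dependency.

```python
from typing import Callable

# Hypothetical registry: processed-metric name -> function deriving it from stored columns.
PROCESSED: dict[str, Callable[[dict], float]] = {
    "nb_actions_per_visit": lambda r: r["nb_actions"] / r["nb_visits"] if r["nb_visits"] else 0.0,
}


def get_value(row: dict, column: str) -> float:
    """Return a stored column, or derive a processed one from the row's current values."""
    if column in PROCESSED:
        return PROCESSED[column](row)
    return row[column]


def sort_and_truncate(rows: list[dict], column: str, limit: int) -> list[dict]:
    """Sort by any column. Only the sort column is evaluated, and it is not stored."""
    return sorted(rows, key=lambda r: get_value(r, column), reverse=True)[:limit]


# Eager approach: the value is stored once and goes stale when its inputs change.
row = {"label": "/a", "nb_visits": 2, "nb_actions": 10}
row["nb_actions_per_visit"] = get_value(row, "nb_actions_per_visit")  # 10 / 2 = 5.0
row["nb_visits"] += 3  # simulates a later filter folding in more visits
stale = row["nb_actions_per_visit"]  # still 5.0

# On-demand approach: the value is derived from the current inputs when it is read.
fresh = get_value(row, "nb_actions_per_visit")  # 10 / 5 = 2.0

rows = [{"label": "/b", "nb_visits": 4, "nb_actions": 4},
        {"label": "/c", "nb_visits": 1, "nb_actions": 3}]
top = sort_and_truncate(rows, "nb_actions_per_visit", limit=1)
```

Sorting still has to evaluate the sort column for every row, but only that one processed metric is computed, and nothing is written back to the rows, so later changes to `nb_visits` or `nb_actions` cannot leave a stale value behind. The trade-off is that a value read repeatedly is recomputed each time unless some caching with invalidation is added.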