@tsteur opened this Issue on March 9th 2015 Member

A while ago the processed metrics were refactored and are now defined in classes. We have one class for each processed metric. That class defines how to compute the metric and how to format it.
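To make the "one class per processed metric" idea concrete, here is a minimal sketch. It is written in Python for brevity and is not the actual Piwik code; the class names `ProcessedMetric` and `ActionsPerVisit`, the `dependencies` attribute and the method names are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod


class ProcessedMetric(ABC):
    """Hypothetical base class: one subclass per processed metric."""

    name = ""
    dependencies = ()  # raw metrics this metric is derived from (assumed attribute)

    @abstractmethod
    def compute(self, row):
        """Return the metric value for a single row (a dict here)."""

    def format(self, value):
        """Return a display string for the value."""
        return str(value)


class ActionsPerVisit(ProcessedMetric):
    name = "nb_actions_per_visit"
    dependencies = ("nb_actions", "nb_visits")

    def compute(self, row):
        visits = row.get("nb_visits", 0)
        # Guard against division by zero for rows without visits.
        return row.get("nb_actions", 0) / visits if visits else 0.0

    def format(self, value):
        return f"{value:.1f}"
```

Declaring the dependencies next to the computation is what would later allow computing a single metric on demand, or detecting that a value has gone stale.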

These metrics are usually processed after the generic filters are executed. This is good because it is fast: the DataTable has already been limited/truncated by the generic filters, so we execute the processed metrics only on the rows that are actually returned / displayed. Instead of e.g. 50k rows we compute the processed metrics on only 100 rows, which can easily make a difference of > 5 seconds per request.
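A rough sketch of why the ordering matters, again in Python and not taken from the Piwik code base. The helper names `actions_per_visit` and `process_after_limit` are hypothetical, and the "generic filter" is reduced to a simple limit.

```python
def actions_per_visit(row):
    visits = row["nb_visits"]
    return row["nb_actions"] / visits if visits else 0.0


def process_after_limit(rows, limit):
    """Truncate first, then compute the processed metric on what is left."""
    kept = rows[:limit]
    for row in kept:
        row["nb_actions_per_visit"] = actions_per_visit(row)
    return kept


# 50k rows in the table, but only 100 are returned to the caller.
table = [
    {"label": f"page{i}", "nb_visits": i + 1, "nb_actions": 2 * (i + 1)}
    for i in range(50_000)
]
displayed = process_after_limit(table, limit=100)

# Only the displayed rows carry the processed metric; the other
# 49,900 rows were never touched.
computed = sum("nb_actions_per_visit" in row for row in table)
```

Here `computed` counts how many rows actually had the metric calculated, which equals the limit rather than the table size.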

In some cases the metrics are currently processed earlier, for example if someone sorts a table by a processed metric or if the pivoting feature is used. In this case we need to compute the processed metric before sorting, truncating and filtering, otherwise it would not be possible to sort by this metric. This means we sometimes compute all processed metrics for all 50k rows although this is not needed. There are currently three problems with that:

  • Although we might need only one processed metric, the logic always computes all metrics, which is slow. Instead we should compute only the one metric that is needed up front and all others after the generic filters are run => performance improvement
  • The processed metrics might be wrong if one of the metrics they depend on is changed after the processed metric was computed. For example, the metric nb_actions_per_visit depends on nb_actions and nb_visits. Once nb_actions_per_visit is computed it is no longer updated, so changing a nb_visits or nb_actions value afterwards does not update nb_actions_per_visit, and we might return wrong values depending on the point at which the metric was computed. Those dependent metrics can be changed e.g. by GroupBy, Truncate, ... but also by other filters and custom logic. This can result in bugs. See e.g. this report, where avg_time_on_page is wrong because of this: http://demo.piwik.org/index.php?module=API&method=Actions.getPageUrls&idSite=7&period=day&date=2015-03-08&format=JSON&token_auth=anonymous&filter_truncate=1&filter_sort_column=avg_time_on_page
  • For rows that are added after the metrics were initially processed, no processed metrics are calculated at all.
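The three problems above, and one possible direction for fixing them, can be sketched as follows. This is Python, not Piwik code, and it is only an illustration under assumptions: `truncate` is a hypothetical stand-in for the Truncate filter that sums only the raw metrics into an "Others" row, and the manual change to `nb_visits` stands in for any later filter or custom logic that modifies a dependency.

```python
def actions_per_visit(row):
    visits = row["nb_visits"]
    return row["nb_actions"] / visits if visits else 0.0


def truncate(rows, keep):
    """Hypothetical Truncate: keep `keep` rows, sum the rest into 'Others'."""
    head, tail = rows[:keep], rows[keep:]
    if not tail:
        return head
    others = {
        "label": "Others",
        "nb_visits": sum(r["nb_visits"] for r in tail),
        "nb_actions": sum(r["nb_actions"] for r in tail),
    }
    return head + [others]


rows = [
    {"label": "a", "nb_visits": 10, "nb_actions": 50},
    {"label": "b", "nb_visits": 4, "nb_actions": 8},
    {"label": "c", "nb_visits": 1, "nb_actions": 9},
]

# Step 1: before sorting, compute ONLY the metric that is sorted by.
for row in rows:
    row["nb_actions_per_visit"] = actions_per_visit(row)
rows.sort(key=lambda r: r["nb_actions_per_visit"], reverse=True)

# Step 2: filters run. Truncate adds a new row, and some later filter
# (simulated here by hand) changes a dependency of the first row.
result = truncate(rows, keep=1)
result[0]["nb_visits"] = 3

stale = result[0]["nb_actions_per_visit"]           # still the old value
missing = "nb_actions_per_visit" not in result[1]   # 'Others' has no value

# Step 3: after all filters, (re)compute processed metrics on the final rows.
for row in result:
    row["nb_actions_per_visit"] = actions_per_visit(row)
```

Recomputing in step 3 is cheap because it only touches the rows that survived the filters, and it covers both the stale value and the row that was added late. Whether recomputation or some form of dependency tracking is the better fit for the real code is an open design question.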

We could split this ticket into three tickets. However, improving the performance by calculating only the needed processed metrics will also reduce how often the bugs occur, which is why I created only one issue. We also need to find a solution that actually fixes all of these problems, so it might be easier to work on all of them at once.
