The goal of this ticket is to discuss how we could improve the speed and efficiency of the custom date range report aggregation. Currently, archiving custom date ranges is slow.
Example from an email sent today to firstname.lastname@example.org: on the All Websites dashboard, when the default URL is module=MultiSites&action=index&idSite=2&period=range&date=last30, the user experiences slowness because the data is not pre-processed and has to be processed when the page is requested. The user said it "takes forever to load, only 10 websites. Almost faster to open each one...", and this is one of many such reports.
Update Dec 2014: this was fixed in #6672.
@tsteur, this request, for example, is slow and took about 40 seconds to archive: http://demo.piwik.org/index.php?module=CoreHome&action=index&date=2013-03-05,2015-03-11&period=range&idSite=1#/module=Actions&action=menuGetPageUrls&date=2013-03-05,2015-03-11&period=range&idSite=1 Could we make this kind of large date range much faster?
For me, this request takes 2.3 seconds, and it should be even faster once #7409 is merged.
One thing I noticed, which took me a while to figure out, is that anyone who actually uses date ranges should disable browser archiving. Otherwise, the last day, week, month, or year of the range (depending on the range) is always re-archived. We might have to do this automatically, i.e. disable browser archiving for a subperiod when a range is requested and an archive for that subperiod already exists.
Note: once we pre-archive date ranges, this can become a problem, as it would always pre-archive the last year/month/week/day, since archiving would always be authorized.
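To make the subperiod behaviour described above concrete, here is a minimal Python sketch. It assumes a simplified decomposition of a range into whole calendar months plus single days (real range archiving may also use weeks and years), and the names `split_range` and `periods_to_process` are hypothetical, not Piwik's API. It only models the rule from this thread: missing subperiod archives are always built, and with browser archiving enabled the last subperiod is archived again even when an archive for it already exists.

```python
from datetime import date, timedelta


def split_range(start, end):
    """Split [start, end] into whole calendar months plus leftover days.

    Simplified assumption: weeks and years are not used as subperiods here.
    """
    periods = []
    day = start
    while day <= end:
        # First day of the following month (day 28 + 4 days is always next month).
        next_month = (day.replace(day=28) + timedelta(days=4)).replace(day=1)
        month_end = next_month - timedelta(days=1)
        if day.day == 1 and month_end <= end:
            periods.append(("month", day, month_end))
            day = next_month
        else:
            periods.append(("day", day, day))
            day += timedelta(days=1)
    return periods


def periods_to_process(periods, existing, browser_archiving):
    """Return the subperiods a range request would (re)archive.

    Hypothetical model: missing archives are always built, and with browser
    archiving enabled the last subperiod is rebuilt even if it already exists.
    """
    todo = [p for p in periods if p not in existing]
    last = periods[-1]
    if browser_archiving and last in existing:
        todo.append(last)
    return todo


periods = split_range(date(2015, 1, 20), date(2015, 3, 18))
existing = set(periods)  # pretend every subperiod is already archived
rebuilt = periods_to_process(periods, existing, browser_archiving=True)
skipped = periods_to_process(periods, existing, browser_archiving=False)
```

With all subperiod archives present, `rebuilt` still contains the last day of the range, while `skipped` is empty, which is the effect of disabling browser archiving for subperiods that already have an archive.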
A lot of improvements were made here. Next week we have to decide how we want to continue with this problem. It might make sense to make further improvements when working on #7470 (refactoring the Archiver). One idea, for example, was to build the range only for the requested record. This is not easy to add to the current implementation of the Archiver, but it would bring a significant improvement.
Another idea could be to sometimes subtract periods. E.g. if today is 2015-03-19 and one fetches 2014-12-20,yesterday (a range ending yesterday will very often be the case), then for the 2015 part of the range we could fetch the year 2015 and subtract the 2015-03-19 archive. The same applies if we have 10 months: instead of fetching 10 monthly archives, we could fetch 1 yearly archive and subtract 2 monthly archives. This is quite hard to implement, though.
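A minimal Python sketch of the subtraction idea, assuming a report is represented as a hypothetical `Counter` of hits per label (e.g. page URL). It only works for additive metrics: values such as unique visitors or max/min cannot be recovered by subtraction. The helper name `combine_archives` and the numbers below are invented for illustration. The 2015 example above would be `combine_archives(days_2014_12_20_to_31 + [year_2015], [day_2015_03_19])`; the usage shows the 10-month case.

```python
from collections import Counter


def combine_archives(add, subtract=()):
    """Sum the archives in `add`, then remove the archives in `subtract`.

    Only valid for additive metrics such as hits per page URL; unique visitor
    counts or max/min values cannot be recovered by subtraction.
    """
    total = Counter()
    for archive in add:
        total.update(archive)  # Counter.update adds counts
    for archive in subtract:
        total.subtract(archive)
    if any(n < 0 for n in total.values()):
        raise ValueError("subtracted archives are not contained in the added ones")
    # Counter.subtract keeps zero entries; drop rows that no longer have hits.
    return Counter({label: n for label, n in total.items() if n > 0})


# Hypothetical data: January to October = year archive minus November and December.
year_2014 = Counter({"/index": 1200, "/about": 300})
nov_2014 = Counter({"/index": 100, "/about": 20})
dec_2014 = Counter({"/index": 80, "/about": 30})
jan_to_oct = combine_archives([year_2014], [nov_2014, dec_2014])
print(jan_to_oct)  # Counter({'/index': 1020, '/about': 250})
```

This reads 3 archives instead of 10; the hard part the ticket mentions is deciding, per record and metric, when subtraction is valid.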
The easiest solution to implement, and probably also the fastest, would be to fetch only the requested recordName and only the requested first-level table or subtable. I tested it, and it is very fast and easy to implement. The problem is that it does not work with subtableIds: we'd have to address subtables by label, because subtableIds can only be generated when building the expanded table, and building the expanded table is again expensive in terms of time.
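A minimal Python sketch of the "requested record only" approach (the same idea mentioned for #7470), assuming a hypothetical archive layout: each subperiod archive maps a record name to rows keyed by label, and each row has a `hits` count and an optional nested `subtable`. The subtable is selected by a path of parent labels instead of a subtableId, since, as noted above, subtableIds are only available once the expanded table is built. `aggregate_record`, `page_urls`, and `referrers` are illustrative names, not Piwik's actual data structures.

```python
from collections import Counter


def aggregate_record(archives, record_name, label_path=()):
    """Sum one record, or one of its subtables, across subperiod archives.

    Only `record_name` is read from each archive; other records are ignored.
    `label_path` selects a subtable by its parent labels, e.g. ("blog",);
    an empty path returns the first-level table.
    """
    totals = Counter()
    for archive in archives:
        rows = archive.get(record_name, {})
        # Walk down to the requested subtable using labels, not subtable IDs.
        for label in label_path:
            row = rows.get(label)
            rows = row.get("subtable", {}) if row else {}
        for label, row in rows.items():
            totals[label] += row["hits"]
    return totals


day1 = {
    "page_urls": {
        "index": {"hits": 10},
        "blog": {"hits": 5, "subtable": {"post-1": {"hits": 3}, "post-2": {"hits": 2}}},
    },
    "referrers": {"search": {"hits": 7}},  # never read below
}
day2 = {"page_urls": {"blog": {"hits": 4, "subtable": {"post-1": {"hits": 4}}}}}

first_level = aggregate_record([day1, day2], "page_urls")
blog_subtable = aggregate_record([day1, day2], "page_urls", ("blog",))
```

Label paths stay stable across subperiods, so no expanded table is needed to locate a subtable; the trade-off is that a client holding only a subtableId would first need it mapped back to labels.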