Introduce additional cache layer for heavy API calls #7143

Closed

mgazdzik opened this issue Feb 5, 2015 · 8 comments
Labels
answered - For when a question was asked and we referred to forum or answered it.
c: Performance - For when we could improve the performance / speed of Matomo.
c: Usability - For issues that let users achieve a defined goal more effectively or efficiently.
Enhancement - For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

@mgazdzik
Contributor

mgazdzik commented Feb 5, 2015

For big instances some API calls are extremely slow (e.g. ranges, flattened Actions or Events reports, etc.). They may take a significant amount of time to process even when the data is pre-processed in archives.
To tackle this, there could be an additional caching layer above the archives, which would store this data and make it easily available to the end user.

Here are some ideas of what would be required to accomplish this:

  • a way to 'order' caching for a specific report (as there can be many varying params like limit, label filter, offset, flattening etc.),
  • a way to inform the user that their reports are ready for download,
  • storage to keep those reports,
  • a layer to look up cached reports,
  • a way to schedule recurring delivery of certain reports/requests (in many cases those requests recur on a weekly basis, or something like that); this could also include some kind of UI to schedule given requests for all sites, and/or all segments, and/or many different date ranges (i.e. last7/last10 etc.),
  • probably a management panel to define cache purging, to allow better control over cache size.

Please let me know what you think about it.
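A minimal sketch of the kind of lookup layer described above, assuming a simple key-value store; names like `ReportCache` and `report_cache_key` are hypothetical, not existing Matomo/Piwik code. The key is derived from the full set of request parameters, since any of them changes the report output:

```python
import hashlib
import json

# Illustrative sketch only: none of these names exist in Matomo/Piwik.
# The cache key has to capture every parameter that changes the report
# output (method, site, period, date, segment, flat, limit, offset, ...).
def report_cache_key(params):
    canonical = json.dumps(params, sort_keys=True)
    return "report:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class ReportCache:
    """Hypothetical lookup layer sitting above the archive tables."""

    def __init__(self, storage):
        self.storage = storage  # could be Redis, a DB table, flat files, ...

    def fetch(self, params):
        return self.storage.get(report_cache_key(params))

    def store(self, params, report):
        self.storage[report_cache_key(params)] = report

# Usage: a scheduled task pre-computes the 'ordered' reports and stores them;
# the API layer checks the cache before touching the archives.
cache = ReportCache({})
params = {"method": "Actions.getPageUrls", "idSite": 1,
          "period": "range", "date": "2015-01-01,2015-01-31",
          "flat": 1, "filter_limit": 100}
if cache.fetch(params) is None:
    cache.store(params, {"rows": []})  # placeholder for the real report data
```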

@mgazdzik added the Enhancement, c: Performance and c: Usability labels on Feb 5, 2015
@RMastop
Contributor

RMastop commented Feb 5, 2015

I get a lot of requests for reports on custom date ranges. In 'larger' environments they can be quite time- and resource-consuming. Being able to schedule a custom date range report and be notified when the data is ready would be of great use!
That would also come in handy when using segments: creating some segments on large instances will archive all historic data, which could make the archiving of other, more recent data fail or be postponed, while most of the time a user only wants the segment data for a certain period. But that is probably something to be added to the segment plugin, I guess.

@mattab
Member

mattab commented Feb 19, 2015

For big instances some API calls are extremely slow (e.g. ranges, flattened Actions or Events reports, etc.). They may take a significant amount of time to process even when the data is pre-processed in archives.

In general it would be good if we managed to solve the performance problems without using API-level caching where possible. For example, for large flat=1 requests, hopefully we can find a solution in #6763: maybe we could pre-process the flat=1 reports and archive them like other reports (just an idea we discussed today with @tsteur). If this doesn't work, I'm still hoping we can find other algorithmic solutions to this challenge.

Date range is another use case where the requests are time- and resource-intensive. I would love it if we could find a smart solution to make processing date ranges very fast, like they are in Google Analytics. I'm sure it's possible if we think about alternative implementations and profile the code better. Hopefully we can make progress in #6763 as well -- but that issue is specifically for sub-tables. Maybe we also have problems with "date range" requests that involve sub-tables? If so, we should create a separate issue specifically for date range requests which don't involve sub-tables.

@RMastop @mgazdzik what would help as a next step for this issue is if you could post here all HTTP requests that are "heavy" in your experience with clients and large environments.
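For reference, this is roughly the shape of request being discussed; the host, token and concrete values in the sketch are placeholder assumptions, not taken from any real instance:

```python
from urllib.parse import urlencode

# Purely illustrative: host and token_auth are placeholders; the parameter
# names (flat, period=range, filter_limit, ...) are the ones discussed above.
heavy_request = {
    "module": "API",
    "method": "Actions.getPageUrls",
    "idSite": 1,
    "period": "range",
    "date": "2015-01-01,2015-03-31",  # a long custom date range
    "flat": 1,                        # flattened report, expands all sub-tables
    "filter_limit": -1,               # no limit: the full (potentially huge) table
    "format": "JSON",
    "token_auth": "XYZ",
}
print("https://example.org/index.php?" + urlencode(heavy_request))
```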

@mattab added this to the Short term milestone on Feb 19, 2015
@mgazdzik
Contributor Author

I think that pre-processing flattened versions of all reports may cause a significant increase in archive table size (basically every report would be stored twice). Also, this won't solve the problem of custom date range processing.

My general thought was that maybe there could be an easy mechanism to pick which requests the user wants to have pre-processed, and therefore process only what is really needed instead of processing everything that can be. A single mechanism would also solve further problems (like processing a certain segment only for one plugin, instead of processing it for all).

After all, maybe it would not be a very big effort to prepare something like that, to provide initial functionality before the actual problems are resolved?
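A rough sketch of that idea, under the assumption of a simple user-maintained list of requests to pre-process; none of these names or structures exist in Piwik:

```python
import hashlib
import json

# Hypothetical: the user lists the exact requests worth pre-processing,
# instead of pre-archiving every possible flat/segment/date-range combination.
PREPROCESSED_REQUESTS = [
    {"method": "Actions.getPageUrls", "idSite": 1, "period": "week",
     "date": "last10", "flat": 1},
    {"method": "Events.getAction", "idSite": 1, "period": "range",
     "date": "2015-01-01,2015-03-31", "segment": "countryCode==pl"},
]

def cache_key(params):
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()

def warm_cache(cache, compute_report):
    # compute_report stands in for whatever actually builds a report from the
    # archive tables; only entries missing from the cache are recomputed.
    for params in PREPROCESSED_REQUESTS:
        key = cache_key(params)
        if key not in cache:
            cache[key] = compute_report(params)
```

A scheduled task (cron or the core archiver) could call warm_cache periodically, so only the requests people actually need are kept warm.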

@RMastop
Contributor

RMastop commented Mar 5, 2015

Hi @mattab, thanks for the follow-up on this issue.
The one I have the most problems with is this API call:
module=API&method=Live.getLastVisitsDetails
I do use segment, filter_limit and filter_offset in this call; however, the memory usage on these calls goes through the roof. 1 GB of memory is not enough to get these datasets.
We have, for instance, a site that has 120,000-ish visits, with thousands of unique pages.
We get the data in chunks of 1000 records and as long as the offset is not that high, everything is fine.
However, at some point the system tries to allocate more memory than is available, even when you set filter_limit to 1.
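For illustration, the chunked fetching pattern described above looks roughly like this; all concrete values (host, token, segment, dates) are placeholders, and only the API method and the filter_limit/filter_offset paging match the description:

```python
import requests  # assumes the third-party 'requests' package is available

BASE = "https://example.org/index.php"
COMMON = {
    "module": "API",
    "method": "Live.getLastVisitsDetails",
    "idSite": 1,
    "period": "range",
    "date": "2015-01-01,2015-03-05",
    "segment": "countryCode==nl",
    "format": "JSON",
    "token_auth": "XYZ",
}

def iter_visits(chunk=1000):
    """Page through the visit log in chunks of `chunk` records; the reported
    problem is that server-side memory use grows as filter_offset grows."""
    offset = 0
    while True:
        page = requests.get(BASE, params={**COMMON,
                                          "filter_limit": chunk,
                                          "filter_offset": offset}).json()
        if not page:
            break
        yield from page
        offset += chunk
```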

@mattab
Member

mattab commented Mar 5, 2015

Hi Richard,

Could you report this issue with details, URLs, etc. to Piwik support? It would be good to investigate and find out what is causing this memory usage. Hopefully this is something we can improve on its own (without having to introduce caching, which would not actually solve this problem).

Update: the issue is described in #7458.

@RMastop
Contributor

RMastop commented Mar 6, 2015

Thanks @mattab, email has been sent.

@mattab modified the milestones: Mid term, Short term on Mar 13, 2015
@mattab
Member

mattab commented Mar 24, 2015

This API performance issue will be fixed by making the Live API code more efficient: #7458

@mattab
Member

mattab commented Mar 24, 2015

For big instances some API calls are extremely slow (e.g. ranges, flattened Actions or Events reports, etc.).

Flattened performance was improved / fixed in 2.12.0 👍

If anyone experiences performance issues with a particular API, please report it here or create an issue with the details. (Ideally we would postpone implementing API caching until it is required.)

@mattab closed this as completed on Nov 26, 2015
@mattab added the answered label on Nov 26, 2015