Introduce additional cache layer for heavy API calls #7143

Closed

mgazdzik opened this issue Feb 5, 2015 · 8 comments
Labels
answered - For when a question was asked and we referred to forum or answered it.
c: Performance - For when we could improve the performance / speed of Matomo.
c: Usability - For issues that let users achieve a defined goal more effectively or efficiently.
Enhancement - For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

@mgazdzik
Contributor

mgazdzik commented Feb 5, 2015

For big instances some API calls are extremely slow (e.g. ranges, flattened Actions or Events reports, etc.). They may take a significant amount of time to process even when the data is pre-processed in archives.
To tackle this, there could be an additional caching layer above the archives, which would store this data and make it easily available to the end user.

Here are some ideas of what would be required to accomplish this:

  • a way to 'order' caching for a specific report (as there can be many varying params like limit, label filter, offset, flattening etc.),
  • a way to inform the user that their reports are ready for download,
  • storage to keep those reports,
  • a layer to look up cached reports,
  • a way to schedule recurring delivery of certain reports/requests (in many cases those requests recur on a weekly basis, or something like that); this could also include some kind of UI to schedule given requests for all sites, and/or all segments, and/or many different date ranges (i.e. last7/last10 etc.),
  • probably a management panel to define cache purging, to allow better control over cache size.

Please let me know what you think about it.
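A minimal sketch of the kind of lookup layer described above, assuming a simple key-value store; names like `ReportCache` and `report_cache_key` are hypothetical, not existing Matomo/Piwik code. The key is derived from the full set of request parameters, since any of them changes the report output:

```python
import hashlib
import json

# Illustrative sketch only: none of these names exist in Matomo/Piwik.
# The cache key has to capture every parameter that changes the report
# output (method, site, period, date, segment, flat, limit, offset, ...).
def report_cache_key(params):
    canonical = json.dumps(params, sort_keys=True)
    return "report:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class ReportCache:
    """Hypothetical lookup layer sitting above the archive tables."""

    def __init__(self, storage):
        self.storage = storage  # could be Redis, a DB table, flat files, ...

    def fetch(self, params):
        return self.storage.get(report_cache_key(params))

    def store(self, params, report):
        self.storage[report_cache_key(params)] = report

# Usage: a scheduled task pre-computes the 'ordered' reports and stores them;
# the API layer checks the cache before touching the archives.
cache = ReportCache({})
params = {"method": "Actions.getPageUrls", "idSite": 1,
          "period": "range", "date": "2015-01-01,2015-01-31",
          "flat": 1, "filter_limit": 100}
if cache.fetch(params) is None:
    cache.store(params, {"rows": []})  # placeholder for the real report data
```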

@mgazdzik added the Enhancement, c: Performance and c: Usability labels on Feb 5, 2015
@RMastop
Contributor

RMastop commented Feb 5, 2015

I get a lot of requests for reports on custom date ranges. In 'larger' environments they can be quite time- and resource-consuming. Being able to schedule a custom date range report and be notified when the data is ready would be of great use!
That would also come in handy when using segments: creating some segments on large instances will archive all historic data, which could make the archiving of other, more recent data fail or be postponed, while most of the time a user only wants the segment data for a certain period. But that is probably something to be added to the segment plugin, I guess.

@mattab
Member

mattab commented Feb 19, 2015

For big instances some API calls are extremely slow (e.g. ranges, flattened Actions or Events reports, etc.). They may take a significant amount of time to process even when the data is pre-processed in archives.

In general it would be good if we managed to solve the performance problems without using API-level caching where possible. For example, for large flat=1 requests, hopefully we can find a solution in #6763: maybe we could pre-process the flat=1 reports and archive them like other reports (just an idea we discussed today with @tsteur). If this doesn't work, I'm still hoping we can find other algorithmic solutions to this challenge.

Date range is another use case where the requests are time- and resource-intensive. I would love it if we could find a smart solution to make processing date ranges very fast, like they are in Google Analytics. I'm sure it's possible if we think about alternative implementations and profile the code better. Hopefully we can make progress in #6763 as well -- but that issue is specifically for sub-tables. Maybe we also have problems with "date range" requests that involve sub-tables? If so, we should create a separate issue specifically for date range requests which don't involve sub-tables.

@RMastop @mgazdzik what would help as a next step for this issue is if you could post here all HTTP requests that are "heavy" in your experience with clients and large environments.
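For reference, this is roughly the shape of request being discussed; the host, token and concrete values in the sketch are placeholder assumptions, not taken from any real instance:

```python
from urllib.parse import urlencode

# Purely illustrative: host and token_auth are placeholders; the parameter
# names (flat, period=range, filter_limit, ...) are the ones discussed above.
heavy_request = {
    "module": "API",
    "method": "Actions.getPageUrls",
    "idSite": 1,
    "period": "range",
    "date": "2015-01-01,2015-03-31",  # a long custom date range
    "flat": 1,                        # flattened report, expands all sub-tables
    "filter_limit": -1,               # no limit: the full (potentially huge) table
    "format": "JSON",
    "token_auth": "XYZ",
}
print("https://example.org/index.php?" + urlencode(heavy_request))
```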

@mattab added this to the Short term milestone on Feb 19, 2015
@mgazdzik
Contributor Author

I think that pre-processing flattened versions of all reports may cause a significant increase in archive table size (basically every report would be stored twice). Also, this won't solve the problem of custom date range processing.

My general thought was that maybe there could be an easy mechanism to pick which requests the user wants to have pre-processed, and therefore process only what is really needed instead of processing everything that can be. A single mechanism would also solve further problems (like processing a certain segment only for one plugin, instead of processing it for all).

After all, maybe it would not be a very big effort to prepare something like that, to provide initial functionality before the actual problems are resolved?
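A rough sketch of that idea, under the assumption of a simple user-maintained list of requests to pre-process; none of these names or structures exist in Piwik:

```python
import hashlib
import json

# Hypothetical: the user lists the exact requests worth pre-processing,
# instead of pre-archiving every possible flat/segment/date-range combination.
PREPROCESSED_REQUESTS = [
    {"method": "Actions.getPageUrls", "idSite": 1, "period": "week",
     "date": "last10", "flat": 1},
    {"method": "Events.getAction", "idSite": 1, "period": "range",
     "date": "2015-01-01,2015-03-31", "segment": "countryCode==pl"},
]

def cache_key(params):
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()

def warm_cache(cache, compute_report):
    # compute_report stands in for whatever actually builds a report from the
    # archive tables; only entries missing from the cache are recomputed.
    for params in PREPROCESSED_REQUESTS:
        key = cache_key(params)
        if key not in cache:
            cache[key] = compute_report(params)
```

A scheduled task (cron or the core archiver) could call warm_cache periodically, so only the requests people actually need are kept warm.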

@RMastop
Contributor

RMastop commented Mar 5, 2015

Hi @mattab, thanks for the follow-up on this issue.
The one I have the most problems with is this API call:
module=API&method=Live.getLastVisitsDetails
I do use segment, filter_limit and filter_offset in this call; however, the memory usage on these calls goes through the roof. 1 GB of memory is not enough to get these datasets.
We have, for instance, a site that has 120,000-ish visits, with thousands of unique pages.
We get the data in chunks of 1000 records and as long as the offset is not that high, everything is fine.
However, at some point the system tries to allocate more memory than is available, even when you set filter_limit to 1.
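For illustration, the chunked fetching pattern described above looks roughly like this; all concrete values (host, token, segment, dates) are placeholders, and only the API method and the filter_limit/filter_offset paging match the description:

```python
import requests  # assumes the third-party 'requests' package is available

BASE = "https://example.org/index.php"
COMMON = {
    "module": "API",
    "method": "Live.getLastVisitsDetails",
    "idSite": 1,
    "period": "range",
    "date": "2015-01-01,2015-03-05",
    "segment": "countryCode==nl",
    "format": "JSON",
    "token_auth": "XYZ",
}

def iter_visits(chunk=1000):
    """Page through the visit log in chunks of `chunk` records; the reported
    problem is that server-side memory use grows as filter_offset grows."""
    offset = 0
    while True:
        page = requests.get(BASE, params={**COMMON,
                                          "filter_limit": chunk,
                                          "filter_offset": offset}).json()
        if not page:
            break
        yield from page
        offset += chunk
```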

@mattab
Member

mattab commented Mar 5, 2015

Hi Richard,

Could you report this issue with details, URLs, etc. to Piwik support? It would be good to investigate and find out what is causing this memory usage. Hopefully this is something we can improve on its own (without having to introduce caching, which would not actually solve this problem).

Update: the issue is described in #7458.

@RMastop
Contributor

RMastop commented Mar 6, 2015

Thanks @mattab, email has been sent.

@mattab modified the milestones: Mid term, Short term on Mar 13, 2015
@mattab
Member

mattab commented Mar 24, 2015

This API performance issue will be fixed by making the Live API code more efficient: #7458

@mattab
Member

mattab commented Mar 24, 2015

For big instances some API calls are extremely slow (e.g. ranges, flattened Actions or Events reports, etc.).

Flattened performance was improved / fixed in 2.12.0 👍

If anyone experiences performance issues with a particular API, please report it here or create an issue with the details. (Ideally we would postpone implementing API caching until it is required.)

@mattab closed this as completed on Nov 26, 2015
@mattab added the answered label on Nov 26, 2015