Possible CronArchive (& core archiving) algorithm improvements #6754

Closed
diosmosis opened this issue Dec 1, 2014 · 2 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it.

@diosmosis
Member

There are some parts of the CronArchive algorithm that seem inefficient. We could probably optimize and simplify the algorithm with the following changes:

  • Instead of running archiving for lastN days/weeks/months/etc., determine the date range that needs archiving (from the last archiving time or from a command line option) and make only the archiving requests that range requires. It makes no sense to archive the last 50 days and then archive the last 2 years.
    • Based on the date range, schedule day archiving for every day in the range individually, then week, then month, then year. This way, period archiving will not itself initiate archiving for sub-periods, and every individual archiving request can be parallelized.
  • Implement bottom-up aggregation for period archiving (i.e., mimic incremental map reduce). Right now, when cron archive runs again, the year, month, week and day periods containing today are re-archived completely. For example, when archiving a year, the data for every month is accessed and aggregated in PHP. This is slow. Instead, we can take the new data for the current day and add it to the existing week, month and year archives. This may run into issues with report limiting features.
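
The two ideas above can be sketched roughly as follows. This is a minimal illustration in Python, not Matomo's code: `plan_archive_jobs`, `add_day_to_periods`, the Monday week start, and the in-memory dictionary of counters standing in for archive tables are all assumptions made for the example. It also only covers metrics that can simply be summed.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, timedelta

# Hypothetical helpers: map a day to the first day of its enclosing period.
# Weeks are assumed to start on Monday.
PERIOD_START = {
    "week": lambda d: d - timedelta(days=d.weekday()),
    "month": lambda d: d.replace(day=1),
    "year": lambda d: d.replace(month=1, day=1),
}


def plan_archive_jobs(start: date, end: date) -> list[tuple[str, date]]:
    """List each (period, first_day) job once: days, then weeks, months, years."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    jobs = [("day", d) for d in days]
    for period, first_day in PERIOD_START.items():
        # A set removes duplicates, so each enclosing period is scheduled once.
        jobs += [(period, s) for s in sorted({first_day(d) for d in days})]
    return jobs


def add_day_to_periods(archives: dict, day: date, new_metrics: Counter) -> None:
    """Add only the newly tracked metrics for `day` to every enclosing archive."""
    keys = [("day", day)] + [(p, f(day)) for p, f in PERIOD_START.items()]
    for key in keys:
        # Counter.update adds counts, so nothing already archived is re-read.
        archives.setdefault(key, Counter()).update(new_metrics)


# Example: plan jobs for a four-day range, then fold in new data for one day.
jobs = plan_archive_jobs(date(2014, 11, 28), date(2014, 12, 1))
archives: dict = {}
add_day_to_periods(archives, date(2014, 12, 1), Counter(visits=3))
```

In this sketch, jobs of the same period type do not depend on each other, so each group could be handed to parallel workers. The incremental step never re-reads sub-period data, which is the point of the bottom-up approach.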
@diosmosis diosmosis added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Dec 1, 2014
@mattab mattab added this to the Mid term milestone Dec 1, 2014
@noogen

noogen commented Jan 15, 2015

On a similar topic, I'm using 2.10.0 and the website UI is slow (1+ minutes) with an archive table of 6 million records. I traced it down to the daily archive blob being updated/invalidated to "done" every time the UI tries to read from the archive. Would anyone see an issue with moving one line of code in https://github.com/piwik/piwik/blob/master/core/Archive.php?

Commented out at line 561: $this->invalidatedReportsIfNeeded();
Added at line 864: $this->invalidatedReportsIfNeeded();

Should archives only be invalidated during archive prep? Or is there a better location to move this to? So far, website response is great and archiving still calls the invalidate method.

@mattab
Member

mattab commented Jan 15, 2015

Hi @noogen, can you please create a new bug report for your problem? It's a bug and we would like to fix it, but this issue is about something else. Thanks!

@mattab mattab closed this as completed Jul 8, 2016
@mattab mattab added wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it. c: Performance For when we could improve the performance / speed of Matomo. labels Jul 8, 2016