Blob table 4x bigger before doing purge-old-archive-data #11322

Closed
gaumondp opened this issue Feb 6, 2017 · 6 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo. duplicate For issues that already existed in our issue tracker and were reported previously.

Comments

@gaumondp

gaumondp commented Feb 6, 2017

I'm still wondering why this keeps happening. My January piwik_archive_blob_2017_01 table went from 31.9 GB to 800 MB (12 M rows to 161k rows) after running ./console core:purge-old-archive-data all

I'm well aware of ticket #7181 (Report archives have tripled in size), but with a database that went from 63 GB to 28.1 GB in total, I'm looking for a long-term solution.

Piwik 2.17.1 (can't upgrade to 3.x for a few months)
PHP 5.5.x, Apache 2.4.x, MySQL 5.5.x
All tests green in Piwik System Check.
Active non-Core plugins: Logviewer, PlatformReport, RestrictLaguageSelection, SecurityInfo and SimpleSysMon.

Here are the figures for my last 3 months, from the last time I ran purge-old-archive-data:

| Table | Size before / after | Rows before / after |
| --- | --- | --- |
| piwik_archive_blob_2017_02 | 1.9 GB / 417 MB | 501,726 / 379,490 |
| piwik_archive_blob_2017_01 | 31.9 GB / 800.5 MB | 12,179,895 / 161,802 |
| piwik_archive_blob_2016_12 | 171.8 MB / 85.6 MB | 82,220 / 14,695 |
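
To put the table in perspective, the before/after shrink factors can be computed directly from it. This is a minimal Python sketch using only the numbers posted above; it assumes 1 GB = 1024 MB, which the post does not state.

```python
# Before/after figures copied from the table above.
# Assumption: "GB" means 1024 MB; the post does not say which unit convention is used.
MB_PER_GB = 1024

tables = {
    "piwik_archive_blob_2017_02": {"size_mb": (1.9 * MB_PER_GB, 417.0), "rows": (501_726, 379_490)},
    "piwik_archive_blob_2017_01": {"size_mb": (31.9 * MB_PER_GB, 800.5), "rows": (12_179_895, 161_802)},
    "piwik_archive_blob_2016_12": {"size_mb": (171.8, 85.6), "rows": (82_220, 14_695)},
}


def shrink_factors(stats):
    """Return (size before / size after, rows before / rows after) for one table."""
    size_before, size_after = stats["size_mb"]
    rows_before, rows_after = stats["rows"]
    return size_before / size_after, rows_before / rows_after


for name, stats in tables.items():
    size_ratio, row_ratio = shrink_factors(stats)
    print(f"{name}: size ratio {size_ratio:.1f}x, row ratio {row_ratio:.1f}x")
```

With these figures, the January table shrinks by roughly 40x in size and about 75x in row count (just under 40x in size if 1 GB is taken as 1000 MB).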

There are no errors in my PHP or Apache logs, nor in Piwik.

@tsteur
Member

tsteur commented Feb 6, 2017

I think the results may be a bit bigger in January because yearly archives are stored there, among other things (I think). Also, I don't know how much data PlatformsReport archives, but that's quite a difference. I'm not familiar with #7181, but I'm wondering whether you have "browser archiving" enabled and/or how often your cron job runs?

@gaumondp
Author

gaumondp commented Feb 6, 2017

Thanks for answering, highly appreciated.

  1. Platform Report. I checked and I'm still running the latest version available. But why would such report data be "flushed" by purge-old-archive-data?

  2. Browser archiving has been OFF since 2013, and we run the cron job every 15 minutes. We get around 150,000 actions in total every day across 10 different sites.

  3. NEW: Config file differences from the default settings:
    datatable_archiving_maximum_rows_actions = 2000 (500 default)
    datatable_archiving_maximum_rows_events = 2000 (500 default)
    datatable_archiving_maximum_rows_subtable_actions = 2000 (100 default)
    enable_processing_unique_visitors_year = 1

My DB is on a separate MySQL server (4 cores, 12 GB of RAM), and nothing else runs on that server.
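
The overrides in item 3 can be compared against the defaults quoted there. This Python sketch parses a hypothetical config excerpt with `configparser` and reports how many times larger each row limit is; it assumes the keys sit in the `[General]` section, as in Piwik's `global.ini.php`, and uses the default values exactly as quoted in the comment.

```python
import configparser

# Hypothetical excerpt of config/config.ini.php holding the overrides from item 3.
# Assumption: these keys live in the [General] section.
OVERRIDES_INI = """
[General]
datatable_archiving_maximum_rows_actions = 2000
datatable_archiving_maximum_rows_events = 2000
datatable_archiving_maximum_rows_subtable_actions = 2000
enable_processing_unique_visitors_year = 1
"""

# Default row limits as quoted in the comment above.
DEFAULT_ROW_LIMITS = {
    "datatable_archiving_maximum_rows_actions": 500,
    "datatable_archiving_maximum_rows_events": 500,
    "datatable_archiving_maximum_rows_subtable_actions": 100,
}


def row_limit_multipliers(ini_text):
    """Return how many times larger each overridden row limit is than its default."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    general = parser["General"]
    return {
        key: general.getint(key) / default
        for key, default in DEFAULT_ROW_LIMITS.items()
        if key in general
    }


for key, factor in row_limit_multipliers(OVERRIDES_INI).items():
    print(f"{key}: {factor:g}x the default")
```

The subtable limit works out to 20x its default and the other two to 4x. Higher limits let archived Actions and Events reports keep more rows before truncation, so individual blobs can be larger; the thread itself does not measure how much this contributes.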

@mattab mattab added the c: Performance For when we could improve the performance / speed of Matomo. label Feb 21, 2017
@mattab
Member

mattab commented Feb 21, 2017

Looking at the code, it seems there is already a daily scheduled task which should have the same effect as calling core:purge-old-archive-data.

The daily scheduled task is defined here: https://github.com/piwik/piwik/blob/3.0.1/plugins/CoreAdminHome/Tasks.php#L41-L43

@gaumondp when you check your core:archive output logs for 1 or 2 days, do you see this scheduled task purgeOutdatedArchives being executed?
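
One quick way to answer this is to search the core:archive log for the task name. This is a minimal Python sketch; the log path is the one from the reporter's cron entry, and it assumes (not confirmed in this thread) that core:archive prints the task name `purgeOutdatedArchives` somewhere in its output when the task runs.

```python
from pathlib import Path

# Log path from the reporter's cron entry; adjust as needed.
LOG_PATH = Path("/logs/piwik-console-cron217-1.log")
# Method name from CoreAdminHome/Tasks.php. Assumption: core:archive mentions it
# in its output when the scheduled task runs; the exact log format is not confirmed.
TASK_NAME = "purgeOutdatedArchives"


def lines_mentioning(lines, needle):
    """Return (line number, line) pairs whose text contains needle, ignoring case."""
    needle = needle.lower()
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(encoding="utf-8", errors="replace") as log:
            hits = lines_mentioning(log, TASK_NAME)
        for n, line in hits:
            print(f"{n}: {line}")
        if not hits:
            print(f"No mention of {TASK_NAME} in {LOG_PATH}")
```

An empty result over a day or two of logs would be consistent with the task never running, though it could also mean the task runs without logging its name.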

@mattab mattab added this to the Priority Backlog (Help wanted) milestone Feb 21, 2017
@gaumondp
Author

Here is my current cron job:

*/15 * * * * /usr/bin/php /piwik/console core:archive --url=http://stats.site.com >> /logs/piwik-console-cron217-1.log

Looking at the last 2 days, I'm seeing an error message I hadn't noticed:

INFO [2017-02-21 14:45:01] Running Piwik 2.17.1 as Super User
INFO [2017-02-21 14:45:01] ---------------------------
INFO [2017-02-21 14:45:01] NOTES
INFO [2017-02-21 14:45:01] - Reports for today will be processed at most every 600 seconds. You can change this value in Piwik UI > Settings > General Settings.
INFO [2017-02-21 14:45:01] - Reports for the current week/month/year will be refreshed at most every 3600 seconds.
INFO [2017-02-21 14:45:01] - Archiving was last executed without error 72 days 3 hours ago
INFO [2017-02-21 14:45:01] - Will process 19 other websites because the last time they were archived was on a different day (in the website's timezone) , IDs: 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
INFO [2017-02-21 14:45:01] - Will process 1 other websites because some old data reports have been invalidated (eg. using the Log Import script) , IDs: 6
INFO [2017-02-21 14:45:01] ---------------------------
INFO [2017-02-21 14:45:01] START

In fact, I did erase all my piwik_archive_blob_2016_* and piwik_archive_numeric_2016_* tables, but the only website where I have a problem is siteId 6, and only for annual reports. Other reports are OK and consistent. The cron log shows 0 visits for annual reports ("There is no data for this report."):

INFO [2017-02-21 14:47:08] Archived website id = 6, period = year, 0 segments, 0 visits in last 7 years, 0 visits this year, Time elapsed: 51.443s
INFO [2017-02-21 14:47:08] Will pre-process for website id = 6, period = range, date = last7
INFO [2017-02-21 14:47:08] - pre-processing all visits
INFO [2017-02-21 14:47:09] Archived website id = 6, period = range, 0 segments, 56352 visits in last 7 ranges, 56352 visits this range, Time elapsed: 1.083s
INFO [2017-02-21 14:47:09] Archived website id = 6, 5 API requests, Time elapsed: 102.837s [4/19 done
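
With 19 sites in a run, spotting periods that archived zero visits by eye is tedious. This Python sketch pulls the per-period summary lines out of core:archive output with a regular expression and lists the (site, period) pairs with zero visits. The pattern is fitted only to the two summary lines in the excerpt above (Piwik 2.17.1) and is an assumption for any other version.

```python
import re

# Fitted to summary lines like the excerpt above, e.g.
# "Archived website id = 6, period = year, 0 segments, 0 visits in last 7 years, 0 visits this year, ..."
# Assumption: other Piwik versions may format these lines differently.
SUMMARY = re.compile(
    r"Archived website id = (?P<site>\d+), period = (?P<period>\w+), "
    r"(?P<segments>\d+) segments, (?P<visits_last>\d+) visits in last \d+ \w+, "
    r"(?P<visits_this>\d+) visits this \w+"
)


def archive_summaries(log_text):
    """Yield (site id, period, visits in last N periods, visits this period)."""
    for m in SUMMARY.finditer(log_text):
        yield int(m["site"]), m["period"], int(m["visits_last"]), int(m["visits_this"])


def empty_periods(log_text):
    """Return (site id, period) pairs that reported zero visits over the last N periods."""
    return [(site, period) for site, period, last, _ in archive_summaries(log_text) if last == 0]


EXCERPT = (
    "INFO [2017-02-21 14:47:08] Archived website id = 6, period = year, 0 segments, "
    "0 visits in last 7 years, 0 visits this year, Time elapsed: 51.443s "
    "INFO [2017-02-21 14:47:09] Archived website id = 6, period = range, 0 segments, "
    "56352 visits in last 7 ranges, 56352 visits this range, Time elapsed: 1.083s"
)

print(empty_periods(EXCERPT))  # [(6, 'year')]
```

Lines such as "Archived website id = 6, 5 API requests" are skipped because they carry no `period =` field.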

And today my report tables are at 13 GB for February and 4 GB for January, so it looks like purgeOutdatedArchives never runs.

Am I supposed to see anything about purgeOutdatedArchives in my cron log? Remember, I'm still using 2.17.1.

Thanks!

@mattab
Member

mattab commented Jun 9, 2017

> Remember, I'm still using 2.17.1.

Hi @gaumondp
Have you now upgraded to Piwik 3, and if so, do you still experience this issue? We would like to get to the bottom of this one if it still occurs.

@mattab
Member

mattab commented May 29, 2019

Marking as duplicate of #10439

@mattab mattab closed this as completed May 29, 2019
@mattab mattab added the duplicate For issues that already existed in our issue tracker and were reported previously. label May 29, 2019