After updating, the 2019_01 archive blob tables are gaining size #10439
Comments
Hi @WebNashr We fixed several issues related to purging old data in #7181 - it's possible there is still some issue left, but in your screenshot it is not obvious that there is an issue, as it could be normal to have such disk space used. How many visits/actions do you get per month? |
Hi
So it's not normal for me to have a database this big.
In the picture I marked the deleted tables, which are back and gaining size. Isn't it obvious that something is wrong? |
The tables are back because that's where piwik stores the report data. It's normal and needed for these tables to be there. How many visits/actions do you get per month? |
10,000 visits and 27,000 pageviews for 2,000 sub-sites, but as I mentioned above, my regular database size before the update was 120M. |
This is affecting a few users (already 3-4 duplicated tickets)... and customers too. And the fact that the DB is way bigger than normal will cause all sorts of issues (slow backups, increased costs...). -> Maybe the solution to this issue would be to run the console command In theory it's not needed because there is already a daily scheduled task which has a similar effect as calling |
If so, it would be good if this were a setting so we can disable it. |
People keep reporting this issue, recently by email, and now in the forums at https://forum.matomo.org/t/size-of-piwik-archive-blob-files-is-astronomical/32048/2 My current answer is:
Once people confirm this solves the problem, we could run this in Matomo (with a setting to disable it). |
@mattab could we please move this out of 3.13.0? We already run pretty much everything in tasks, and I don't see much that it does beyond that. It would be only a minor improvement, and it would be good not to change anything for now.
Not sure what you specifically refer to here. |
Someone tried the command but it didn't help them:
Did the command help anyone at all? Maybe it doesn't help and the issue is somewhere else. Also it seems to be a regression in 3.12.0 as we get quite a few new reports of people who specifically didn't have the issue before. |
It might be expected if some users have a date range as their default period to load. Otherwise it might be caused by #15086 |
FYI: On some of our users' DB we see a lot of archives in calendar week 44. A lot of them with invalidated flag. (value=4 vs OK archive having value=1) Up to 3.11 we used to have temporary archives which were deleted daily. In 3.12 we remove them and only have invalidated and done/OK archives. We still run a logic to delete no longer needed archives daily in the |
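To check whether a given archive table shows the pattern described above, one could tally archives by their flag value. The following is only a rough sketch: the values 1 (OK) and 4 (invalidated) come from the comment above, while the function names and the `(idarchive, flag)` row shape are assumptions made for illustration, not Matomo's actual API or schema.

```python
from collections import Counter
from typing import Iterable, Tuple

# Flag values as quoted in the comment above.
DONE_OK = 1
DONE_INVALIDATED = 4


def count_by_flag(rows: Iterable[Tuple[int, int]]) -> Counter:
    """Count archives per flag value.

    `rows` is a hypothetical iterable of (idarchive, flag_value) pairs,
    e.g. exported from the numeric archive table for one month.
    """
    return Counter(flag for _, flag in rows)


def invalidated_share(rows: Iterable[Tuple[int, int]]) -> float:
    """Return the fraction of archives carrying the invalidated flag."""
    counts = count_by_flag(rows)
    total = sum(counts.values())
    return counts[DONE_INVALIDATED] / total if total else 0.0


# Made-up sample data: one OK archive and three invalidated ones.
sample_rows = [(1, DONE_OK), (2, DONE_INVALIDATED), (3, DONE_INVALIDATED), (4, DONE_INVALIDATED)]
print(count_by_flag(sample_rows))
print(invalidated_share(sample_rows))
```

A high share of invalidated archives in a single month would match what is reported here for calendar week 44.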
We also noticed a massive CPU/load/network rise on our DB server(s) after the 3.11 to 3.12 migration (along with the heavy disk usage mentioned here) |
I noticed that the title has changed to highlight the 01_2019 blob. Although piwik_archive_blob_2019_01 is the largest blob for us, piwik_archive_blob_2019_11 is also far larger than expected at 4.7GB, with the next largest blob being 210MB |
Hey everyone, this commit appears to fix this issue: 3deb35f After applying the changes, run |
Do we need to run this command specifically, or will it be covered by the scheduled tasks routine? What are the consequences of running this command? Is this still the correct outcome: |
The scheduled task will not purge much of the accumulated extra archives, it will only purge what has been marked for purging since the last purge. So you would want to run that command to get rid of everything that doesn't need to be there.
It will delete every archive that is not valid and not the latest for the site/date/period/segment. So if there are two archives for 2019-10-23, idSite = 1, all visits segment, where one has ts_archived = 2019-10-23 10:00:00 and the other has ts_archived = 2019-10-24 12:02:00, it will keep the second and delete the first. If you are worried about it deleting archive data for purged raw data, you can also just run it on the tables that are too large, and ignore the rest. Or create backups. |
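The "keep only the latest archive" rule described above can be sketched as follows. This is an illustrative model under stated assumptions, not Matomo's implementation: the `Archive` class, its field names, and `archives_to_delete` are hypothetical, and the sketch covers only the "latest per site/date/period/segment wins" part of the rule.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Archive:
    # Hypothetical record shape, for illustration only.
    idarchive: int
    idsite: int
    date: str       # e.g. "2019-10-23"
    period: str     # e.g. "day"
    segment: str    # "" stands for the all-visits segment
    ts_archived: datetime


def archives_to_delete(archives: List[Archive]) -> List[int]:
    """Return ids of archives that are not the latest in their group."""
    latest: Dict[Tuple[int, str, str, str], Archive] = {}
    for archive in archives:
        key = (archive.idsite, archive.date, archive.period, archive.segment)
        current = latest.get(key)
        if current is None or archive.ts_archived > current.ts_archived:
            latest[key] = archive
    keep = {archive.idarchive for archive in latest.values()}
    return sorted(a.idarchive for a in archives if a.idarchive not in keep)


# The example from the comment above: two archives for the same
# site, date, period and segment, archived at different times.
example = [
    Archive(1, 1, "2019-10-23", "day", "", datetime(2019, 10, 23, 10, 0, 0)),
    Archive(2, 1, "2019-10-23", "day", "", datetime(2019, 10, 24, 12, 2, 0)),
]
print(archives_to_delete(example))  # [1]
```

Only the older archive (id 1) is selected for deletion; the one archived on 2019-10-24 is kept.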
We just published 3.13.0-b1, which should fix the issue. Could you try upgrading to the latest beta with https://matomo.org/faq/how-to-update/faq_159/ and let us know if the issue is fixed after 24 hours? Thanks |
Was the issue also the reason for the increased CPU load, directly or indirectly? |
@c-schmitz quite likely yes |
My total database size was 130 MB, but after the update it reached more than 1 GB overnight. Changing the "Delete old archived reports" settings and using "Purge DB now" had no effect.
Due to the database size limits of my hosting provider, I used phpMyAdmin to delete the piwik_archive_blob and piwik_archive_numeric tables, except those for the last and current months. The total database size dropped to 180 MB, more than before the update but acceptable.
Now, a day later, those deleted tables are back and gaining size, and my database has reached 1 GB.
Running "console core:purge-old-archive-data" and "console core:run-scheduled-tasks" didn't help at all.