Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After updating the 2019_01 archive blob tables are gaining size #10439

Closed
WebNashr opened this issue Aug 25, 2016 · 20 comments · Fixed by #15154
Closed

After updating the 2019_01 archive blob tables are gaining size #10439

WebNashr opened this issue Aug 25, 2016 · 20 comments · Fixed by #15154
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Milestone

Comments

@WebNashr
Copy link

My total database size was 130 MB but after the update it reached more than 1GB overnight. Changing the "Delete old archived reports" settings and "Purge DB now" didn't do a thing.

Due to database size limitations of my host provider I deleted the piwik_archive_blob and piwik_archive_numeric tables except for the last and current months, using phpMyAdmin. The total size of database became 180 MB, more than before the update, but acceptable.

Now a day later, those deleted tables are back and are gaining size! and my database size has reached 1 GB.

database usage administration piwik 1

Running "console core:purge-old-archive-data" and "console core:run-scheduled-tasks" didn't help at all.

@mattab
Copy link
Member

mattab commented Sep 27, 2016

Hi @WebNashr
Are you using latest Piwik version? or maybe it is a normal behavior that for each month you have between 4 and 20Mb of archived data...

We fixed several issues related to purging old data in #7181 - it's possible there is still some issue left, but in your screenshot it is not obvious that there is an issue, as it could be normal to have such disk space used.

How many visits/actions do you get per month?

@mattab mattab added the Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users. label Sep 27, 2016
@WebNashr
Copy link
Author

Hi
I'm using the latest version (2.16.2)

My total database size was 130 MB but after the update it reached more than 1GB overnight

So It's not normal for me to have a database this big.

I deleted the piwik_archive_blob and piwik_archive_numeric tables except for the last and current months, using phpMyAdmin. The total size of database became 180 MB, more than before the update, but acceptable. Now a day later, those deleted tables are back and are gaining size! and my database size has reached 1 GB.

In the picture I specified the deleted tables which are back and are gaining size. It's not obvious that there is something wrong?

@mattab
Copy link
Member

mattab commented Sep 27, 2016

Now a day later, those deleted tables are back and are gaining size!

The tables are back because that's where piwik stores the report data. It's normal and needed for these tables to be there.

How many visits/actions do you get per month?

@WebNashr
Copy link
Author

WebNashr commented Sep 28, 2016

10,000 visists and 27000 pageviews for 2000 sub-sites, but as I mentioned above my regular database size before the update was 120M.
As for the tables I had deleted the tables belonging to the past months and I have set the piwik to delete old archived reports.
In the "schedule old data deletion" setting I see "estimated database size after purge: 575.8 M" but when I press "purge db now" the db size does not change at all.

@mattab mattab modified the milestones: 3.0.0-rc, Mid term Nov 11, 2016
@mattab mattab added Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo. and removed Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users. labels May 29, 2019
@mattab
Copy link
Member

mattab commented May 29, 2019

This is affecting a few users (already 3-4 duplicated tickets)... and customers too. And the fact that the DB is way bigger than normal will cause all sorts of issues (slow backups, increased costs...).

-> Maybe the solution to this issue would be to run the console command ./console core:purge-old-archive-data all every week or every month as a scheduled task in Matomo?

In theory it's not needed because there is already a daily scheduled task which has a similar effect as calling core:purge-old-archive-data, the daily scheduled task is defined here: https://github.com/piwik/piwik/blob/3.0.1/plugins/CoreAdminHome/Tasks.php#L41-L43
But the console command ./console core:purge-old-archive-data all does more work and so maybe we should also call this command in a scheduled task?

@mattab mattab modified the milestones: Backlog (Help wanted), 3.12.0 May 29, 2019
@tsteur
Copy link
Member

tsteur commented May 29, 2019

But the console command ./console core:purge-old-archive-data all does more work and so maybe we should also call this command in a scheduled task?

If so, be good if this was a setting so we can disable it.

@mattab mattab changed the title After updating old archive tables are gaining size Run core:purge-old-archive-data regularly to prevent archive tables gaining size Oct 30, 2019
@tsteur tsteur modified the milestones: 3.13.0, 3.14.0 Nov 5, 2019
@mattab mattab modified the milestones: 3.14.0, 4.1.0 Nov 6, 2019
@mattab
Copy link
Member

mattab commented Nov 7, 2019

People keep reporting this issue, recently by email, and now in the forums at https://forum.matomo.org/t/size-of-piwik-archive-blob-files-is-astronomical/32048/2

My current answer is:

Can you please try to run the console command:
./console core:purge-old-archive-data all
and confirm if this helps?

Once people confirm this solves the problem we could run this in Matomo (and have setting to have it disabled).

@mattab mattab modified the milestones: 4.1.0, 3.13.0 Nov 7, 2019
@tsteur
Copy link
Member

tsteur commented Nov 8, 2019

@mattab could we please move this out of 3.13.0?

We run pretty much everything already in tasks and not seeing much that it otherwise doesn't do. Be only a minor improvement and be good to not change anything for now.

does more work and so maybe we should also call this command in a scheduled task?

Not sure what you specifically refer to here.

@mattab mattab changed the title Run core:purge-old-archive-data regularly to prevent archive tables gaining size Run core:purge-old-archive-data regularly to prevent archive blob tables gaining size Nov 11, 2019
@mattab mattab changed the title Run core:purge-old-archive-data regularly to prevent archive blob tables gaining size After updating the 2019_01 archive blob tables are gaining size Nov 11, 2019
@mattab
Copy link
Member

mattab commented Nov 11, 2019

Someone tried the command but it didn't help them:

we ran the command yesterday in the morning, but there was no significant effect. The file became bigger, but the increase was not as high as before.
To be honest, we expected the file to decrease in size significantly.
Before running ./console core:purge-old-archive-data all
41G Nov 6 09:52 piwik_archive_blob_2019_01.ibd
After running ./console core:purge-old-archive-data all
43G Nov 7 16:22 piwik_archive_blob_2019_01.ibd

Did the command help anyone at all? Maybe it doesn't help and the issue is somewhere else.

Also it seems to be a regression in 3.12.0 as we get quite a few new reports of people who specifically didn't have the issue before.

@tsteur
Copy link
Member

tsteur commented Nov 11, 2019

It might be expected if there are users having a range date as default period to load.

And it might be otherwise caused by #15086

@tsteur
Copy link
Member

tsteur commented Nov 12, 2019

FYI: On some of our users' DB we see a lot of archives in calendar week 44. A lot of them with invalidated flag. (value=4 vs OK archive having value=1)

image

Up to 3.11 we used to have temporary archives which were deleted daily. In 3.12 we remove them and only have invalidated and done/OK archives. We still run a logic to delete no longer needed archives daily in the purgeInvalidatedArchives task but I suppose this logic might not fully work anymore and needs to be updated. Not sure if there's maybe an issue with ArchivesToPurgeDistributedList.

@osantiano
Copy link

We also noticed a massive CPU/Load/Net raise on our DB server(s) after the 3.11 to 3.12 migration (also with the heavy HD usage mentioned here)

@mattab mattab added the Regression Indicates a feature used to work in a certain way but it no longer does even though it should. label Nov 13, 2019
@tobiasnteireho
Copy link

I noticed that the title has changed to highlight the 01_2019 blob and although the piwik_archive_blob_2019_01 is the largest blob for us the piwik_archive_blob_2019_11 is also far larger than expected at 4.7GB with the next largest blob being 210MB

@diosmosis
Copy link
Member

Hey everyone, this commit appears to fix this issue: 3deb35f

After applying the changes, run ./console core:purge-old-archive-data all. It may take a while based on how many rows it deletes.

@diosmosis diosmosis self-assigned this Nov 14, 2019
@toredash
Copy link
Contributor

Hey everyone, this commit appears to fix this issue: 3deb35f

After applying the changes, run ./console core:purge-old-archive-data all. It may take a while based on how many rows it deletes.

Do we need to run this command specifically, or will it be covered by the scheduled tasks routine ?

What are the consequences of running this command ? It this still the correct outcome:
1. purge any stale reports in the database as well as reports that were marked for deletion, or were for old date custom ranges, or reports that were invalidated.
https://matomo.org/docs/managing-your-databases-size/#purging-and-optimizing-the-database-using-a-console-command

@toredash
Copy link
Contributor

toredash commented Nov 16, 2019

Just wanted to chime in on the issues other have reported on;

After 3.12 got applied in our env. we have for the past 2 weeks seen a ~30% increase in CPU, compared to baseline from 3.11.

And as others have reported:
image

@diosmosis
Copy link
Member

Do we need to run this command specifically, or will it be covered by the scheduled tasks routine ?

The scheduled task will not purge much of the accumulated extra archives, it will only purge what has been marked for purging since the last purge. So you would want to run that command to get rid of everything that doesn't need to be there.

What are the consequences of running this command ?

It will delete every archive that is not valid and not the latest for the site/date/period/segment. So if there are two archives for 2019-10-23, idSite = 1, all visits segment, where one has ts_archived = 2019-10-23 10:00:00 and the other has ts_archived = 2019-10-24 12:02:00, it will keep the second and delete the first.

If you are worried about it deleting archive data for purged raw data, you can also just run it on the tables that are too large, and ignore the rest. Or create backups.

@mattab
Copy link
Member

mattab commented Nov 20, 2019

we just published 3.13.0-b1 which should fix the issue, could you try upgrade to latest beta with https://matomo.org/faq/how-to-update/faq_159/ and let us know if the issue is fixed after 24 hours?

Thanks

@c-schmitz
Copy link

Was the issue also the reason for the increased CPU load, directly or indirectly?

@tsteur
Copy link
Member

tsteur commented Nov 22, 2019

@c-schmitz quite likely yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants