Archive Blob Tables bloating in size #15930

Closed
James-Oakley opened this issue May 10, 2020 · 15 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc. c: Performance For when we could improve the performance / speed of Matomo.

@James-Oakley

This is a follow-up to #15831.

Since upgrading to 3.13.5, I was able to get my bloated archive tables back to their proper size by running ./console core:purge-old-archive-data all.

However, since then, the tables have begun to grow in size again.

Things are much better since 3.13.5. I was able to shrink the tables again by running the purge-old-archive-data command again. My largest archive blob table then shrank from 50,456 rows and 236 MiB back to 9,910 rows and 7.0 MiB. So I now have a way to solve the problem of the bloated tables (whereas, before 3.13.5, I had no way to shrink those tables).

However, they are surely not supposed to grow to the point that they need pruning with a manual command. The table referred to above is for January 2020, to which no new data is added, so I'd expect that archive table to remain the same size.

It may well be that the problem is actually at my end. Having read the documentation, I'm running a maintenance command once every 10 minutes:

/usr/bin/php ~/public_html/console core:archive --url=https{domain} >> {log-file}

Is there another command that, when Matomo is behaving correctly, needs running on schedule to keep things running smoothly? (I realise I could run purge-old-archive-data with cron, but is it correct behaviour for me to need to do so?)

Or are the tables still bloating when they shouldn't be?

@LaurentOnechai

Same issue on my side. After the 3.13.5 update, my database went back down to 25 MiB after having hit 410 MiB with version 3.13.4. Roughly 20 MiB had been its size for months before v3.13.4.
But after the expected shrink following the 3.13.5 update, it started to grow again by 9 MiB every day, and has now hit 112 MiB.

@tsteur tsteur added this to the 3.13.6 milestone May 11, 2020
@tsteur
Member

tsteur commented May 11, 2020

Putting it into 3.13.6 for now so we don't forget to have a look. It might be that, e.g., archives are now cleaned up only weekly whereas they used to be cleaned up daily (just a guess).

@daeuble

daeuble commented May 11, 2020

Same here on our 3.13.5 installation.

@panpomaly

After the archive got shrunk by the update to 3.13.5, we are also experiencing the same growth again as before.

But it seems to affect only the archives for the same months as before: January (year archives), March and April. Apart from the year archives, these are exactly the months when the installation ran Matomo 3.13.4. There is no abnormal growth for the months before that or for the month after updating (at least not yet).

@diosmosis
Member

Can someone experiencing the problem try to run the following command (from the matomo root directory):

./console --matomo-domain=mamp core:run-scheduled-tasks 'Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives'

and see if it shrinks the tables? And do you all have cron archiving setup?
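
For anyone trying this, one rough way to compare table sizes before and after is to query `information_schema.tables`. This is only a sketch, assuming a MySQL/MariaDB backend and archive tables whose names contain `archive_blob_`; the function name `archive_size_query` and the database name `matomo` are hypothetical placeholders.

```shell
#!/bin/sh
# Build a query that lists archive blob tables with row counts and size in MiB.
# Note: table_rows is only an estimate for InnoDB tables.
archive_size_query() {
    db_name="$1"   # hypothetical: the name of your Matomo database
    cat <<EOF
SELECT table_name, table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mib
FROM information_schema.tables
WHERE table_schema = '${db_name}'
  AND table_name LIKE '%archive_blob_%'
ORDER BY size_mib DESC;
EOF
}

# Example: print the query; pipe it to the mysql client to actually run it.
archive_size_query matomo
```

Running the query once before and once after the task makes it easy to see which monthly tables changed.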

@tassoman
Contributor

I have scheduled tasks running in my staging environment and a multi-gigabyte DB... Let me try the command line ahead of the scheduled run... 🤔

@James-Oakley
Author

> And do you all have cron archiving setup?

As per my op, yes - I posted the cron line I run every 10 minutes

> and see if it shrinks the tables?

Here's the output from running this:

INFO [2020-05-18 09:04:15] 17423  Scheduler: executing task Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives...
INFO [2020-05-18 09:04:15] 17423  Found 294 invalidated archives safe to delete in {prefix}_archive_numeric_2020_05.
INFO [2020-05-18 09:04:15] 17423  Scheduler: finished. Time elapsed: 0.072s
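
The task reports which table it touched, and only the 2020_05 numeric table appears here. A small sketch for extracting the table name and count from output like this, assuming the log line format shown above; the function name `summarize_purge_log` is hypothetical.

```shell
#!/bin/sh
# Read task log lines on stdin and print "<table> <count>" for every
# "Found N invalidated archives safe to delete in <table>." line.
summarize_purge_log() {
    sed -n 's/.*Found \([0-9][0-9]*\) invalidated archives safe to delete in \(.*\)\.$/\2 \1/p'
}

# Example using the log line from this comment:
printf '%s\n' 'INFO [2020-05-18 09:04:15] 17423  Found 294 invalidated archives safe to delete in {prefix}_archive_numeric_2020_05.' \
    | summarize_purge_log
# prints: {prefix}_archive_numeric_2020_05 294
```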

The table in my database that bloats the most is always 2020_01, and that didn't change size at all when I ran the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives command.

I then wanted to check whether the 2020_01 table actually needed shrinking at that point, so I ran purge-old-archive-data again. That did shrink it, so the problem was the task failing to shrink the table, rather than the table not needing to be shrunk.

Hopefully @tassoman can confirm soon with his larger setup.

@tassoman
Contributor

I started with a total database size of 41.3G. After a first run of core:purge-old-archive-data all, the total size was 39.8G, divided into:

  • Data: 36.4G
  • Indexes: 3.3G
  • Rows: 59,223,480

Then I ran the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives command, with the following output:

bash-4.2$ /opt/rh/rh-php73/root/bin/php /var/www/html/analytics/console core:run-scheduled-tasks 'Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives' --matomo-domain=test
INFO [2020-05-18 09:50:33] 31894  Scheduler: executing task Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives...
INFO [2020-05-18 09:50:33] 31894  Found 27 invalidated archives safe to delete in piwik_archive_numeric_2020_05.
INFO [2020-05-18 09:50:33] 31894  Scheduler: finished. Time elapsed: 0.239s
Time elapsed: 0.239s

After this, I ran the purge all command again, ending up with the same total size but fewer rows: 56,600,334
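
To put the two row counts in this comment side by side, here is a quick sketch of the arithmetic; the helper name `row_reduction` is hypothetical.

```shell
#!/bin/sh
# Report the absolute and relative drop between two row counts.
row_reduction() {
    before="$1"
    after="$2"
    awk -v b="$before" -v a="$after" \
        'BEGIN { printf "%d rows removed (%.1f%%)\n", b - a, (b - a) * 100 / b }'
}

# Row counts reported above: 59,223,480 before, 56,600,334 after.
row_reduction 59223480 56600334
# prints: 2623146 rows removed (4.4%)
```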

@diosmosis
Member

I suspect this change will solve the problem for the start of year table: https://github.com/matomo-org/matomo/pull/15963/files . Could someone give it a try and see if the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives reduces the size of the January table?

If other tables are also bloated and the task is not taking care of duplicates, then please let me know which months are bloated. I'll have to dig further to figure out why something like that would happen.

@James-Oakley
Author

> I suspect this change will solve the problem for the start of year table: https://github.com/matomo-org/matomo/pull/15963/files . Could someone give it a try and see if the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives reduces the size of the January table?

I applied that patch, and then Tasks.purgeInvalidatedArchives did shrink 2020_01.

But, I then ran purge-old-archive-data, and the table size in MiB dropped a further 30% (row count unchanged).

So the patch fixes why purgeInvalidatedArchives wasn't shrinking January, but it seems that purgeInvalidatedArchives is not fixing all of the residual bloating.

@diosmosis
Member

purge-old-archive-data will also optimize the tables, causing unused space to be released (this is done in a separate scheduled task that currently runs monthly). If the row count didn't change when running the command, then it means the patch worked as expected. Thanks for going through the trouble of testing!
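
The difference between deleting rows and releasing the space they occupied can be illustrated with MySQL's `OPTIMIZE TABLE` statement. This is a minimal sketch, assuming a MySQL/MariaDB backend; the helper name `optimize_statement` and the example table name (with the `piwik_` prefix) are hypothetical, and this is not necessarily how Matomo's own task is implemented.

```shell
#!/bin/sh
# Emit an OPTIMIZE TABLE statement for a single archive table.
optimize_statement() {
    table="$1"
    # Reject empty names and names with characters outside [A-Za-z0-9_],
    # so nothing unexpected ends up in the SQL.
    case "$table" in
        ''|*[!A-Za-z0-9_]*) echo "invalid table name: $table" >&2; return 1 ;;
    esac
    printf 'OPTIMIZE TABLE `%s`;\n' "$table"
}

# Example: print the statement; pipe it to the mysql client to run it.
optimize_statement piwik_archive_blob_2020_01
```

After such a statement, the row count stays the same while the reported size can drop, which matches the behaviour described above.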

@tsteur
Member

tsteur commented May 20, 2020

@diosmosis is this issue done with the merge of #15963?

@diosmosis
Member

@tsteur should be, unless non-January tables are also still growing. That would be due to a different cause.

@tsteur
Member

tsteur commented May 20, 2020

👍 I'll close this issue for now. If anyone still has the issue after the next update please comment and we'll be happy to look into it again. Thanks everyone for the feedback and letting us know.

@tsteur tsteur closed this as completed May 20, 2020
@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label May 20, 2020
@mattab mattab added the c: Performance For when we could improve the performance / speed of Matomo. label Jun 4, 2020
@MatomoForumNotifications

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/unable-to-reduce-size-of-bloated-of-matomo-archive-blob-2020-01-table-help/47726/1
