@James-Oakley opened this Issue on May 10th 2020

This is a follow-up to #15831.

After upgrading to 3.13.5, I was able to get my bloated archive tables back to their proper size by running ./console core:purge-old-archive-data all.

However, since then, the tables have begun to grow in size again.

Things are much better since 3.13.5. Running the purge-old-archive-data command a second time shrank the tables again: my largest archive blob table went from 50,456 rows and 236 MiB back to 9,910 rows and 7.0 MiB. So I now have a way to solve the problem of the bloated tables (whereas, before 3.13.5, I had no way to shrink them).
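To put these figures in proportion, here is a minimal shell sketch that computes the percentage reduction for the largest table mentioned above. The helper name pct_reduction is hypothetical and not part of Matomo; it simply does the arithmetic with awk.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matomo): percentage reduction from a
# "before" value to an "after" value, rounded to one decimal place.
pct_reduction() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Figures quoted above for the January 2020 table
echo "rows: $(pct_reduction 50456 9910)% fewer"   # rows: 80.4% fewer
echo "size: $(pct_reduction 236 7.0)% smaller"    # size: 97.0% smaller
```

In other words, the purge removed about four fifths of the rows but almost all of the disk space, which suggests most of the bloat was in large blob rows.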

However, they are surely not supposed to grow to the point that they need pruning with a manual command. The table referred to above is for January 2020, to which no new data is added, so I'd expect that archive table to remain the same size.

It may well be that the problem is actually at my end. Having read the documentation, I'm running a maintenance command once every 10 minutes:

/usr/bin/php ~/public_html/console core:archive --url=https{domain} >> {log-file}

Is there another command that, when Matomo is behaving correctly, needs running on schedule to keep things running smoothly? (I realise I could run purge-old-archive-data with cron, but is it correct behaviour for me to need to do so?)

Or are the tables still bloating when they shouldn't be?

@lolobu commented on May 10th 2020

Same issue on my side. After the 3.13.5 update, my database shrank back to 25 MiB, after having hit 410 MiB with version 3.13.4. Around 20 MiB had been more or less its size for months before v3.13.4.
But after the expected shrink following the 3.13.5 update, it started growing again by 9 MiB every day and has now reached 112 MiB.

@tsteur commented on May 11th 2020 Member

Putting this into 3.13.6 for now so we don't forget to have a look. It might be that, e.g., archives are now cleaned up only weekly whereas they used to be cleaned up daily (just a guess).

@daeuble commented on May 11th 2020

Same here on our 3.13.5 installation.

@panpomaly commented on May 15th 2020

After the archive shrank with the update to 3.13.5, we are also seeing the same growth again as before.

But it seems to affect only the archives for the same months as before: January (year archives), March and April. These are exactly the months during which the installation ran Matomo 3.13.4, plus the year archives. There is no abnormal growth (at least not yet) for the month after updating, nor for the months before.

@diosmosis commented on May 18th 2020 Member

Could someone experiencing the problem try running the following command (from the Matomo root directory):

./console --matomo-domain=mamp core:run-scheduled-tasks 'Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives'

and see if it shrinks the tables? And do you all have cron archiving set up?

@tassoman commented on May 18th 2020 Contributor

I have scheduled tasks running in my staging environment and a multi-gigabyte DB... Let me try the command line before the scheduler runs it... 🤔

@James-Oakley commented on May 18th 2020

> And do you all have cron archiving set up?

As per my OP, yes: I posted the cron line I run every 10 minutes.

> and see if it shrinks the tables?

Here's the output from running this:

INFO [2020-05-18 09:04:15] 17423  Scheduler: executing task Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives...
INFO [2020-05-18 09:04:15] 17423  Found 294 invalidated archives safe to delete in {prefix}_archive_numeric_2020_05.
INFO [2020-05-18 09:04:15] 17423  Scheduler: finished. Time elapsed: 0.072s

The table in my database that bloats the most is always 2020_01, and it didn't change size at all when I ran the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives command.
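The scheduler log only names tables in which invalidated archives were found, so one quick way to confirm that 2020_01 was never touched is to pull out the table/count pairs. A minimal shell sketch, assuming the "Found N invalidated archives safe to delete in TABLE." line format shown above; invalidated_counts is a hypothetical helper name, not a Matomo command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matomo): reads scheduler log lines on stdin
# and, for every "Found N invalidated archives safe to delete in TABLE." line,
# prints "TABLE N". All other lines are ignored.
invalidated_counts() {
  sed -n 's/.*Found \([0-9][0-9]*\) invalidated archives safe to delete in \([^ ]*\)\.$/\2 \1/p'
}

# The log lines posted above, fed in via a quoted here-document
invalidated_counts <<'EOF'
INFO [2020-05-18 09:04:15] 17423  Scheduler: executing task Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives...
INFO [2020-05-18 09:04:15] 17423  Found 294 invalidated archives safe to delete in {prefix}_archive_numeric_2020_05.
INFO [2020-05-18 09:04:15] 17423  Scheduler: finished. Time elapsed: 0.072s
EOF
# prints: {prefix}_archive_numeric_2020_05 294
```

Piping the output through grep '_2020_01 ' would then show at a glance whether the January table was among those the task cleaned.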

I then wanted to check whether the 2020_01 table needed shrinking at this point, so I ran purge-old-archive-data again. That did shrink it, so the problem was that the command wasn't shrinking the table, not that the table didn't need shrinking.

Hopefully @tassoman can confirm soon with his larger setup.

@tassoman commented on May 18th 2020 Contributor

I started with a total size of 41.3G. After a first run of core:purge-old-archive-data all, the total size was 39.8G, divided into:

  • Data: 36.4G
  • Indexes: 3.3G
  • Rows: 59,223,480

Then I ran the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives command, with the following output:

bash-4.2$ /opt/rh/rh-php73/root/bin/php /var/www/html/analytics/console core:run-scheduled-tasks 'Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives' --matomo-domain=test
INFO [2020-05-18 09:50:33] 31894  Scheduler: executing task Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives...
INFO [2020-05-18 09:50:33] 31894  Found 27 invalidated archives safe to delete in piwik_archive_numeric_2020_05.
INFO [2020-05-18 09:50:33] 31894  Scheduler: finished. Time elapsed: 0.239s
Time elapsed: 0.239s

After this, I ran the purge all command again and ended up with the same total size but fewer rows: 56,600,334.
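For scale, a minimal shell sketch of the row arithmetic above (row_drop is a hypothetical helper name, not part of Matomo). It shows the second purge removed about 2.6 million rows, roughly 4.4% of the total, even though the reported size did not change.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matomo): absolute and percentage drop
# between a "before" and an "after" row count.
row_drop() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "%d rows removed (%.1f%%)\n", before - after, (before - after) / before * 100
  }'
}

row_drop 59223480 56600334   # prints: 2623146 rows removed (4.4%)
```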

@diosmosis commented on May 18th 2020 Member

I suspect this change will solve the problem for the start of year table: https://github.com/matomo-org/matomo/pull/15963/files . Could someone give it a try and see if the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives reduces the size of the January table?

If other tables are also bloated and the task is not taking care of duplicates, then please let me know which months are bloated. I'll have to dig further to figure out why something like that would happen.

@James-Oakley commented on May 19th 2020

> I suspect this change will solve the problem for the start of year table: https://github.com/matomo-org/matomo/pull/15963/files . Could someone give it a try and see if the Piwik\Plugins\CoreAdminHome\Tasks.purgeInvalidatedArchives reduces the size of the January table?

I applied that patch, and then Tasks.purgeInvalidatedArchives did shrink 2020_01.

But when I then ran purge-old-archive-data, the table size in MiB dropped by a further 30% (row count unchanged).

So the patch fixes the reason purgeInvalidatedArchives wasn't shrinking the January table, but it seems that purgeInvalidatedArchives doesn't remove all of the residual bloat.

@diosmosis commented on May 19th 2020 Member

purge-old-archive-data also optimizes the tables, which releases unused space (this is done in a separate scheduled task that currently runs monthly). If the row count didn't change when running the command, then the patch worked as expected. Thanks for going to the trouble of testing!

@tsteur commented on May 20th 2020 Member

@diosmosis, with https://github.com/matomo-org/matomo/pull/15963 merged, is this issue done?

@diosmosis commented on May 20th 2020 Member

@tsteur It should be, unless non-January tables are also still growing. That would have a different cause.

@tsteur commented on May 20th 2020 Member

👍 I'll close this issue for now. If anyone still has the issue after the next update, please comment and we'll be happy to look into it again. Thanks, everyone, for the feedback and for letting us know.

This Issue was closed on May 20th 2020