Archive Blob Tables bloating in size #15831

Closed
James-Oakley opened this issue Apr 19, 2020 · 6 comments
Labels
  • Bug: For errors / faults / flaws / inconsistencies etc.
  • c: Performance: For when we could improve the performance / speed of Matomo.
  • duplicate: For issues that already existed in our issue tracker and were reported previously.
  • Regression: Indicates a feature used to work in a certain way but it no longer does even though it should.

Comments

@James-Oakley

This issue is similar to #10439, but that is reported as fixed as of 3.13.0-b1, and I'm running the latest 3.13.4 which postdates that. So I'm opening this here, rather than re-opening that, assuming it's not exactly the same. If someone wants to close this as a duplicate I'll happily re-open that issue instead.

I began getting alerts that my total database size was growing too big, and I was getting mysqldump backup failures. The table that keeps failing to dump is the 2020-01 archive blob table.
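To see which tables account for the growth, the sizes can be read from MySQL's information_schema. The following is a minimal sketch, not something taken from Matomo itself: it assumes the archive tables are named with a `matomo_` prefix followed by `archive_blob_` (the prefix is configurable, and older installs may use a different one), that the schema is called `matomo`, and that the query is run through a driver using `%s` placeholders, such as PyMySQL. The function only builds the query; it does not connect to anything.

```python
def archive_blob_size_query(schema, prefix="matomo_"):
    """Return (sql, params) listing archive blob tables, largest first.

    `schema` and `prefix` are assumptions about the local install.
    """
    # "_" is a single-character wildcard in LIKE, so escape it.
    pattern = (prefix + "archive_blob_").replace("_", r"\_") + "%"
    sql = (
        "SELECT table_name, table_rows, "
        "ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mib "
        "FROM information_schema.TABLES "
        "WHERE table_schema = %s AND table_name LIKE %s "
        "ORDER BY (data_length + index_length) DESC"
    )
    return sql, (schema, pattern)


sql, params = archive_blob_size_query("matomo")
print(sql)
print(params)
```

With a driver cursor this would be passed as `cursor.execute(sql, params)`. Note that `table_rows` is only an estimate for InnoDB tables, so it is better suited to spotting outliers than to exact counts.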

Following the comments in the issue I've just linked, I tried running ./console core:purge-old-archive-data all. The total database size shrank by about 1/3, but that 2020-01 blob table remains as large as ever.

So I looked at a backup I had managed to take of the database from 3 days ago.

  • 3 days ago, the table had 53,691 rows and was 162.1 MiB in size.
  • As of now, even after running purge-old-archive-data, the table has grown to 116,055 rows and 346.2 MiB.

So the table has doubled in size in 3 days. I don't know how the archiving works, but I'd expect the archive of January 2020 to be stable now. Fresh visits should mean the April 2020 table continues to grow, but surely the data within the January table should not be changing.
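The "doubled" claim can be checked with a little arithmetic on the two snapshots. This is only a sketch using the figures quoted above; the function name and the per-row calculation are illustrative and not part of any Matomo tooling.

```python
MIB = 1024 * 1024


def growth(before_rows, before_mib, after_rows, after_mib):
    """Compare two snapshots of one table."""
    return {
        "row_ratio": after_rows / before_rows,
        "size_ratio": after_mib / before_mib,
        "bytes_per_row_before": before_mib * MIB / before_rows,
        "bytes_per_row_after": after_mib * MIB / after_rows,
    }


# Figures quoted above for the 2020-01 archive blob table.
stats = growth(53_691, 162.1, 116_055, 346.2)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Both ratios come out slightly above 2, and the average row is about 3 KiB in both snapshots. That would be consistent with the growth coming from additional rows rather than from existing rows getting larger, although the numbers alone cannot confirm the cause.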

Oddly, March 2020 is also large at 59 MiB, but it has not grown over the past 3 days, so presumably that month simply had more traffic. April is larger still: already 201.4 MiB, up from 56.1 MiB 3 days ago. February, however, is just 10 MiB and isn't growing. So only certain months seem to be affected: January and April, but not February or March.

So this is part bug report and part support request.

  • Bug report: The database is growing, and fast, and the inability to take backups is a performance issue.
  • Support request: How do I get the bloated tables back down to the size they need to be? Even 162.1 MiB is probably far larger than necessary (a typical month's archive blob is under 10 MiB), and I know for a fact that 346 MiB is too big because the table was half that size 3 days ago. So what command do I run to remove the extraneous data in the table without losing any ability to analyse visits to the websites being tracked?
@sgiehl
Member

sgiehl commented Apr 19, 2020

Hi @James-Oakley. Thanks for creating the issue. We are already aware of that problem, and it should be fixed by #15800, which will be included in a new release coming in the next few days.

@sgiehl closed this as completed on Apr 19, 2020
@James-Oakley
Author

Thanks, @sgiehl. Will #15800 shrink old archive tables back to their proper size as well as ensuring they don't grow any further?

@sgiehl
Member

sgiehl commented Apr 19, 2020

Yes it should. If it doesn't work with the next release, please create a new issue for it.

@mattab added the Bug, c: Performance and Regression labels on Apr 28, 2020
@fvdm

fvdm commented Apr 28, 2020

You can force the clean up with console database:optimize-archive-tables 2020-04

For me the updater didn't shrink the archives, at least not right after the update. Given the table size of >4 GB was messing up a lot of things, including the core:archive cronjob that stacked up, I didn't want to wait for the scheduler.

@James-Oakley
Author

I ran ./console core:purge-old-archive-data all right after the upgrade, and the tables shrank right back to where they should be (1.5 GB -> 350 MB).
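For anyone wanting to script the two cleanup commands mentioned in this thread, here is a rough sketch. It uses only the commands and arguments quoted above; the helper names, the `dry_run` switch and the choice of months are assumptions made for illustration, and it presumes the script is run from the Matomo directory on a release that already contains the fix.

```python
import subprocess


def cleanup_commands(months, console="./console"):
    """Build argument lists for the cleanup commands quoted in this thread."""
    commands = [[console, "core:purge-old-archive-data", "all"]]
    for month in months:
        # One optimize call per month, e.g. "2020-04".
        commands.append([console, "database:optimize-archive-tables", month])
    return commands


def run_cleanup(months, console="./console", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in cleanup_commands(months, console):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run for the two months reported as bloated in this issue.
run_cleanup(["2020-01", "2020-04"])
```

Leaving `dry_run=True` only prints the commands, which makes it easy to review them before touching a large table.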

@fvdm

fvdm commented Apr 28, 2020

@James-Oakley oops I forgot a word in my sentence: 'it' referred to the updater. The optimizer worked fine.

@mattab added this to the Priority Backlog (Help wanted) milestone on Sep 28, 2020
@mattab added the duplicate label on Sep 28, 2020
4 participants