
Large blob size for 2020-01 #16211

Open
dave105010 opened this issue Jul 15, 2020 · 11 comments


dave105010 commented Jul 15, 2020

Hi,

I'm using Matomo 3.13.5 in production and 3.13.6 in development.

In production, our data jumped from 32 MB (December 2019) to 4.4 GB (January 2020), then dropped back to 33 MB (February 2020). (January 2019 was also 30-40 MB.)

That 4.4 GB of January data eventually broke our backup cycle (it took a couple of months): our data partition ran out of space, and our apps and the database crashed.

We were able to bring the database back up, but I would like to fix this properly.

I copied our production data to my dev environment to test ./console core:purge-old-archive-data january and see whether it works.

The command took 20-30 minutes to complete, and the piwik_archive_blob_2020_01 table went from 4.4 GB to 16 MB.

I wonder whether that command deleted any important information. I read in the documentation that Matomo stores annual statistics in the January table of every year, and I don't want to lose important data by running this command in production.

I've seen a lot of other issues opened about large blob sizes. I just wanted to confirm: is it safe to run ./console core:purge-old-archive-data january? What exactly does it delete?

Thanks

@tsteur, @sgiehl

sgiehl (Member) commented Jul 15, 2020

@Dave-Oz As long as you do not delete any of the log data, all archives can be rebuilt at any time. Nevertheless, purge-old-archive-data should only remove archives that aren't needed anymore.

dave105010 commented Jul 15, 2020

> @Dave-Oz As long as you do not delete any of the log data, all archives can be rebuilt at any time. Nevertheless, purge-old-archive-data should only remove archives that aren't needed anymore.

@sgiehl We have the "Regularly delete old raw data from the database" option enabled, set to 60 days in the settings.

Also, "Schedule old data deletion" is set to every week.

Does that mean Matomo was supposed to do that for January, but didn't for some reason (a bug in an older version)? So if I do it manually, it should be fine?

Also, we have a cron task that runs every couple of minutes with the command /matomo/console core:archive.

I'm just trying to understand the difference in that table before and after running purge-old-archive-data january. What did it get rid of, such that the size went from 4.4 GB to 16 MB?

Which table stores the log data? What is stored in piwik_archive_blob_2020_01? Do you mean that piwik_archive_blob_2020_01 can be rebuilt at any time from data already stored in other tables, without losing any information/metric data?

tsteur (Member) commented Jul 15, 2020

If you archive data every few minutes, then over time there will be a lot of outdated reports in the DB; some get deleted daily, some weekly, some monthly. If the size is an issue, I recommend running the archive command only every hour, for example.

The archive tables store the reports. You can find more info in https://developer.matomo.org/guides/how-piwik-works#data-model-processing-and-storage

> Does that mean Matomo was supposed to do that for January, but didn't for some reason (a bug in an older version)?

Yes, there was a bug in an older version which we fixed in an update, and the update should have triggered the cleanup to run eventually (but it might take a while until the cleanup task is executed the next time).
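For readers in the same situation, a minimal crontab sketch of the "only every hour" suggestion (for an /etc/cron.d style file; the /matomo path, the www-data user, the site URL, and the log path are assumptions, adjust to your install):

```
# Run the Matomo archiver once per hour instead of every few minutes,
# so fewer outdated report copies accumulate in the archive tables.
0 * * * * www-data /matomo/console core:archive --url=https://example.org/matomo/ > /var/log/matomo-archive.log 2>&1
```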

dave105010 commented Jul 15, 2020

@tsteur Our only issue is the size for January 2020. All the other months are 30-40 MB each; January 2020 is over 4 GB.
After I ran the command below, this was the output. Most of the time was spent purging the invalidated archives.

# ./console core:purge-old-archive-data january
Purging outdated archives for 2020_01...Done. [Time elapsed: 0.110s]
Purging invalidated archives for 2020_01...Done. [Time elapsed: 3254.697s]
Purging custom range archives for 2020_01...Done. [Time elapsed: 0.618s]
Optimizing archive tables...
Optimizing table piwik_archive_numeric_2020_01...Done. [Time elapsed: 12.164s]
Optimizing table piwik_archive_blob_2020_01...Done. [Time elapsed: 243.220s]

My only worry is: does this break the annual reports, since they are stored in the January 2020 table?

dave105010 commented

@tsteur Also, which cleanup task is executed? Can I run it manually? Is it equivalent to ./console core:purge-old-archive-data?

tsteur (Member) commented Jul 16, 2020

./console core:purge-old-archive-data january

should do 👍 as it will only delete unneeded data. You could also try

./console core:purge-old-archive-data --include-year-archives today

Then it does the current month and the 2020_01 table.

dave105010 commented

@tsteur Hi,
I did that. It's been 3-4 days and everything has been fine. However, 2020_01 has started to grow again: it was 16 MB when I first executed ./console core:purge-old-archive-data january, and now it has reached 200+ MB. I think the annual reports are causing this. Is there anything else I can do to stop it from growing into gigabytes again?

tsteur (Member) commented Jul 20, 2020

You could execute this task as a cron job, or run the archiver less often @Dave-Oz.

Running it every few minutes is quite often.
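A hedged sketch of what running the purge task as a cron job could look like, using the command suggested earlier in this thread (the /etc/cron.d style line, the /matomo path, the www-data user, and the 3 a.m. Sunday schedule are all assumptions):

```
# Once a week, purge outdated/invalidated archives for the current month's
# tables and the year table (e.g. 2020_01 for 2020 annual reports).
0 3 * * 0 www-data /matomo/console core:purge-old-archive-data --include-year-archives today
```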

dave105010 commented Aug 8, 2020

--

tsteur (Member) commented Aug 10, 2020

@unkn0wn-developer I see you removed your comment, so maybe it's all good?

dave105010 commented Aug 10, 2020

> @unkn0wn-developer I see you removed your comment, so maybe it's all good?

It's all good for now; I'm monitoring the usage every week or so. I wasn't using the ls command in MB/GB mode. I thought the backup size was 5 GB, but it was 500 MB; I counted the digits wrong in bytes... It was a Friday evening at work, so my brain was pretty much fried by the end of the day. 😅 Thanks for checking up on me though.
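The digit-counting mix-up above is easy to avoid by letting the tools print human-readable sizes. A small sketch using standard GNU coreutils (the sample file is created on the fly so the commands are self-contained):

```shell
#!/bin/sh
# Demonstrate human-readable size output so raw byte digits never need hand-counting.

# Create a sample file of exactly 500 MiB (sparse, so it is created instantly).
f=$(mktemp)
truncate -s 524288000 "$f"

# ls -lh prints "500M" instead of the raw 9-digit byte count.
ls -lh "$f"

# du with -h does the same for on-disk usage of files and directories.
du -h --apparent-size "$f"

# numfmt (GNU coreutils) converts a raw byte count on its own.
numfmt --to=iec 524288000   # prints 500M

rm -f "$f"
```

With `-h` output, mistaking 500 MB for 5 GB is no longer possible, since the unit suffix is printed explicitly.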
