@unkn0wn-developer opened this Issue on July 15th 2020

Hi,

I'm using Matomo 3.13.5 on production and 3.13.6 on development.

On production, our data jumped from 32 MB (December 2019) to 4.4 GB (January 2020), then dropped back to 33 MB (February 2020).
(January 2019 was also 30-40 MB.)

That 4.4 GB of January data eventually broke our backup cycle (it took a couple of months): our data partition ran out of space, and our apps and the database crashed.

We managed to bring the database back up, but I would like to fix the underlying issue.

I copied our production data to my dev environment to test ./console core:purge-old-archive-data january and see whether it works.

The command took 20-30 minutes to complete, and the piwik_archive_blob_2020_01 table went from 4.4 GB to 16 MB.

I wonder whether that command deleted any important information. I read in the documentation that Matomo stores annual statistics in the January table of each year. I don't want to lose any important data if I run this command on production.

I've seen several other issues opened about large blob table sizes.
I just want to confirm whether it is safe to run ./console core:purge-old-archive-data january.
What does it delete?

Thanks

@tsteur , @sgiehl

@sgiehl commented on July 15th 2020 Member

@dave-oz As long as you do not delete any of the log data all archives can be rebuilt at any time. Nevertheless purge-old-archive-data should only remove archives that aren't needed anymore.

@unkn0wn-developer commented on July 15th 2020

> @dave-oz As long as you do not delete any of the log data all archives can be rebuilt at any time. Nevertheless purge-old-archive-data should only remove archives that aren't needed anymore.

@sgiehl We have the "Regularly delete old raw data from the database" option enabled, set to 60 days in the settings.

Also, "Schedule old data deletion" is set to every week in the settings.

Does that mean Matomo was supposed to do that for January, but didn't for some reason (a bug in an older version)?
So if I do it manually, it should be fine?

Also, we have a cron task that runs every couple of minutes with the command /matomo/console core:archive.
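As an aside, a common pattern (not Matomo-specific) to keep such a frequent cron from piling up when one run takes longer than the interval is to wrap the command in flock. A minimal sketch, with an echo standing in for the real command:

```shell
# Hypothetical cron wrapper: flock ensures runs never overlap, so a slow
# archive pass cannot stack up behind an every-few-minutes schedule.
# Replace the echo with the real command, e.g. /matomo/console core:archive
flock -n /tmp/matomo-archive.lock -c "echo running archiver" \
  || echo "previous run still active, skipping"
```

The -n flag makes flock give up immediately instead of queueing behind the lock, so overlapping invocations simply skip.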

I'm just trying to understand the difference in that table before and after running purge-old-archive-data january.
What did it get rid of that shrank the table from 4.4 GB to 16 MB?

Which tables store the log data?
What is stored in piwik_archive_blob_2020_01? Do you mean that piwik_archive_blob_2020_01 can be rebuilt at any time from data already stored in other tables, without any loss of information or metric data?

@tsteur commented on July 15th 2020 Member

If you archive data every few minutes, then over time there will be a lot of outdated reports in the DB; some get deleted daily, some weekly, some monthly. If the size is an issue, I recommend running the archive command less often, for example only every hour.

The archive tables store the reports. You can find more info in https://developer.matomo.org/guides/how-piwik-works#data-model-processing-and-storage
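The hourly schedule suggested above could be sketched as a crontab entry like this (the path is taken from the cron command mentioned earlier in the thread; the output redirect is optional):

```
# Run the Matomo archiver once an hour instead of every few minutes
0 * * * * /matomo/console core:archive > /dev/null 2>&1
```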

> Does that mean Matomo was supposed to do that for January, but didn't for some reason (a bug in an older version)?

Yes, there was a bug in an older version which we fixed in an update. The update should have triggered the cleanup eventually (but it might take a while until the cleanup task is executed the next time).

@unkn0wn-developer commented on July 15th 2020

@tsteur Our only issue is the size for January 2020. All the other months are 30-40 MB each, while that one is over 4 GB.
This is the output after I ran the command below.
It spent a lot of time purging the invalidated archives.

# ./console core:purge-old-archive-data january
Purging outdated archives for 2020_01...Done. [Time elapsed: 0.110s]
Purging invalidated archives for 2020_01...Done. [Time elapsed: 3254.697s]
Purging custom range archives for 2020_01...Done. [Time elapsed: 0.618s]
Optimizing archive tables...
Optimizing table piwik_archive_numeric_2020_01...Done. [Time elapsed: 12.164s]
Optimizing table piwik_archive_blob_2020_01...Done. [Time elapsed: 243.220s]

My only worry is whether this breaks annual reports, since they are stored in the January 2020 table.

@unkn0wn-developer commented on July 15th 2020

@tsteur Also, which cleanup task is executed? Can I run it manually? Is it equivalent to ./console core:purge-old-archive-data ?

@tsteur commented on July 16th 2020 Member

./console core:purge-old-archive-data january

should do 👍 it will only delete unneeded data. You could also try

./console core:purge-old-archive-data --include-year-archives today

Then it does the current month and the 2020_01 table.

@unkn0wn-developer commented on July 20th 2020

@tsteur Hi,
I did that. It's been 3-4 days and everything has been fine, but 2020_01 has started to grow again. It was 16 MB when I first executed ./console core:purge-old-archive-data january, and now it has reached 200+ MB. I think annual reports are causing this. Is there anything else I can do to stop it from growing into gigabytes again?

@tsteur commented on July 20th 2020 Member

You could execute this task as a cron or run the archiver less often @dave-oz

Running it every few minutes is quite often.
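Executing the purge as a cron task, as suggested above, could look like this (the weekly timing is an assumption; the command and flag are the ones from earlier in the thread):

```
# Weekly purge of outdated/invalidated archives, including the year table
30 2 * * 0 /matomo/console core:purge-old-archive-data --include-year-archives today > /dev/null 2>&1
```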

@unkn0wn-developer commented on August 8th 2020

--

Powered by GitHub Issue Mirror