@garas opened this Issue on February 1st 2019

I have set:

  • Delete old aggregated report data after 12 months
  • Keep all data for: Yearly+Monthly reports
  • MySQL has innodb_file_per_table=ON.

After Matomo deleted old data, but left yearly/monthly reports, table size for piwik_archive_blob_2017_01 reported by phpMyAdmin reduced 150MB to 4MB. However actual .ibd size left 150MB.

After running OPTIMIZE piwik_archive_blob_2017_01, the .ibd file was rebuilt and now takes 4MB.

You can see data, index and free space by running

SELECT `TABLE_SCHEMA`, `TABLE_NAME`, `DATA_LENGTH`, `INDEX_LENGTH`, `DATA_FREE`
FROM `information_schema`.`TABLES`
WHERE `TABLE_SCHEMA` = 'piwikdb' AND `TABLE_NAME` LIKE '%archive_blob%'

Could Matomo run OPTIMIZE on archive tables after deleting much data?

In my case it reduced Matomo total DB space 4GB to 1GB.

@tsteur commented on February 1st 2019 Member

There is a command ./console database:optimize-archive-tables to do that.

We also run Optimize automatically sometimes but usually not when it is InnoDB unless it is MariaDB

@tsteur commented on February 1st 2019 Member

I recommend you set up a cronjob maybe to do it monthly for example.

@garas commented on February 1st 2019

I've tried that command. Options 'all', 'now', hidden lastN are not useful, I don't need re-optimize all or recent tables each month.

Last option is specify concrete months. But it is also hard generate month in shell script. But finally I found solution: php console database:optimize-archive-tables 'now - 14 months'. 13th month still has daily reports, 14th is the one I need, better keep not optimized data 1 month longer than forever.

However I still think Matomo could be run in this predictable (deleting daily, but keeping monthly or yearly reports) time when optimization is very useful. Maybe checkbox at Keep all data for.

@tsteur commented on February 3rd 2019 Member

I'm not sure if optimize table will work on all versions of MySQL 4.1+ (or 5.1+) with InnoDb (I know it's basically just rebuilding the table, but not sure what happens in older versions).

We could enable it for InnoDb for archive tables only. Log tables be too risky re downtime etc. Would for sure not enable it for log tables. In general, personally, I would disable enable_sql_optimize_queries = 0 in config and only run the command regularly. Might be good to recommend this some day or make it default (you don't want to lock log tables)... @mattab

Re the command: I think in general lastMonth, and first month of the year (because it stores yearly archives there) be useful. When running this regularly, it should be a lot more efficient than optimizing all archives tables all over again.

So it would be definitely good to add an option for first month of the year (last month should be doable with last1).

@tsteur commented on February 3rd 2019 Member

Put it into 3.10 as it would make it our optimize tables much more efficient by not optimizing all tables again and again.

@garas commented on February 3rd 2019

Yes, it should be enabled for archive tables only because of predictable high data savings and because of locking of archive tables would only disrupt Matomo dashboard, not tracking.

For predictable, I mean it could shrink tables ~35 times! (30 day + 4 weeks + 1 month ->1 month aggregated info. It is assuming 1 day takes about same amount of space as 1 week or 1 month). For January, it would be ~18x savings, as it will contain 1 year + 1 month.

After removing daily data, these tables usually won't be written anymore, so it is good time prepare for long term storage by running one last final optimization and this would remove need for optimizing all tables again and again without predicted outcome.

For log tables, locking is problem, also savings are unpredictable because sites traffic can increase/decrease, also space will be reused after deleting data. Optimization here is manual decision.

@katebutler commented on May 7th 2019 Member

If the dates option doesn't match any of the predefined values ("all", "now" or the lastN options) then it is passed to the DateFactory - this means that "january" will already work.

There's no guide for the optimize tables command on matomo.org at the moment, so perhaps we should write some so that we can document this properly.

@mattab commented on July 15th 2019 Member

For now created a short new section in
https://matomo.org/docs/managing-your-databases-size/#purging-and-optimizing-the-database-using-a-console-command

edited After reading https://github.com/matomo-org/matomo/issues/10439 also updated the doc to list another command as well, that may free DB space:
Screenshot from 2019-07-15 13-18-39

What do you think is the next step for this issue @katebutler ?

@garas commented on July 15th 2019

php console core:purge-old-archive-data already optimizes tables after purging.

But it just optimizes all dates independently if it was purged or not any data.
I should filter $dates to optimize only if ($archivePurger->purgeArchivesWithPeriodRange($date) > 0)).

https://github.com/matomo-org/matomo/blob/52f423008a761195b8316871d92131687b91f246/plugins/CoreAdminHome/Commands/PurgeOldArchiveData.php#L110-L124

@katebutler commented on July 15th 2019 Member

@mattab Thanks, I'll flesh out that doc a bit more and explain the options that are available, think that's all we need to do.

@mattab commented on October 22nd 2019 Member

Do you maybe have an update on this one @katebutler?

@katebutler commented on November 6th 2019 Member

Suggested addition to the https://matomo.org/docs/managing-your-databases-size/#purging-and-optimizing-the-database-using-a-console-command section - I don't have WP access so @tsteur can you please add it?

To reduce execution time, you can select specific date periods to run these scripts for. This means that you can target just the recent months, for which report data is likely to have changed. Note that data for annual reports is stored in the archives for January of the appropriate year, so it is a good idea to optimize these regularly as well, e.g.

./console database:optimize-archive-tables last2
./console database:optimize-archive-tables january

@tsteur commented on November 7th 2019 Member

Cheers, done.

This Issue was closed on November 7th 2019
Powered by GitHub Issue Mirror