@garas opened this Issue on February 1st 2019

I have set:

  • Delete old aggregated report data after 12 months
  • Keep all data for: Yearly+Monthly reports
  • MySQL has innodb_file_per_table=ON.

After Matomo deleted old data, but left yearly/monthly reports, table size for piwik_archive_blob_2017_01 reported by phpMyAdmin reduced 150MB to 4MB. However actual .ibd size left 150MB.

After running OPTIMIZE piwik_archive_blob_2017_01, the .ibd file was rebuilt and now takes 4MB.

You can see data, index and free space by running

SELECT `TABLE_SCHEMA`, `TABLE_NAME`, `DATA_LENGTH`, `INDEX_LENGTH`, `DATA_FREE`
FROM `information_schema`.`TABLES`
WHERE `TABLE_SCHEMA` = 'piwikdb' AND `TABLE_NAME` LIKE '%archive_blob%'

Could Matomo run OPTIMIZE on archive tables after deleting much data?

In my case it reduced Matomo total DB space 4GB to 1GB.

@tsteur commented on February 1st 2019 Member

There is a command ./console database:optimize-archive-tables to do that.

We also run Optimize automatically sometimes but usually not when it is InnoDB unless it is MariaDB

@tsteur commented on February 1st 2019 Member

I recommend you set up a cronjob maybe to do it monthly for example.

@garas commented on February 1st 2019

I've tried that command. Options 'all', 'now', hidden lastN are not useful, I don't need re-optimize all or recent tables each month.

Last option is specify concrete months. But it is also hard generate month in shell script. But finally I found solution: php console database:optimize-archive-tables 'now - 14 months'. 13th month still has daily reports, 14th is the one I need, better keep not optimized data 1 month longer than forever.

However I still think Matomo could be run in this predictable (deleting daily, but keeping monthly or yearly reports) time when optimization is very useful. Maybe checkbox at Keep all data for.

@tsteur commented on February 3rd 2019 Member

I'm not sure if optimize table will work on all versions of MySQL 4.1+ (or 5.1+) with InnoDb (I know it's basically just rebuilding the table, but not sure what happens in older versions).

We could enable it for InnoDb for archive tables only. Log tables be too risky re downtime etc. Would for sure not enable it for log tables. In general, personally, I would disable enable_sql_optimize_queries = 0 in config and only run the command regularly. Might be good to recommend this some day or make it default (you don't want to lock log tables)... @mattab

Re the command: I think in general lastMonth, and first month of the year (because it stores yearly archives there) be useful. When running this regularly, it should be a lot more efficient than optimizing all archives tables all over again.

So it would be definitely good to add an option for first month of the year (last month should be doable with last1).

@tsteur commented on February 3rd 2019 Member

Put it into 3.10 as it would make it our optimize tables much more efficient by not optimizing all tables again and again.

@garas commented on February 3rd 2019

Yes, it should be enabled for archive tables only because of predictable high data savings and because of locking of archive tables would only disrupt Matomo dashboard, not tracking.

For predictable, I mean it could shrink tables ~35 times! (30 day + 4 weeks + 1 month ->1 month aggregated info. It is assuming 1 day takes about same amount of space as 1 week or 1 month). For January, it would be ~18x savings, as it will contain 1 year + 1 month.

After removing daily data, these tables usually won't be written anymore, so it is good time prepare for long term storage by running one last final optimization and this would remove need for optimizing all tables again and again without predicted outcome.

For log tables, locking is problem, also savings are unpredictable because sites traffic can increase/decrease, also space will be reused after deleting data. Optimization here is manual decision.

Powered by GitHub Issue Mirror