I have set:
After Matomo deleted old data, but left yearly/monthly reports, table size for
piwik_archive_blob_2017_01 reported by phpMyAdmin reduced 150MB to 4MB. However actual
.ibd size left 150MB.
OPTIMIZE piwik_archive_blob_2017_01, the
.ibd file was rebuilt and now takes 4MB.
You can see data, index and free space by running
SELECT `TABLE_SCHEMA`, `TABLE_NAME`, `DATA_LENGTH`, `INDEX_LENGTH`, `DATA_FREE` FROM `information_schema`.`TABLES` WHERE `TABLE_SCHEMA` = 'piwikdb' AND `TABLE_NAME` LIKE '%archive_blob%'
Could Matomo run
OPTIMIZE on archive tables after deleting much data?
In my case it reduced Matomo total DB space 4GB to 1GB.
There is a command
./console database:optimize-archive-tables to do that.
We also run Optimize automatically sometimes but usually not when it is InnoDB unless it is MariaDB
I recommend you set up a cronjob maybe to do it monthly for example.
I've tried that command. Options 'all', 'now', hidden
lastN are not useful, I don't need re-optimize all or recent tables each month.
Last option is specify concrete months. But it is also hard generate month in shell script. But finally I found solution:
php console database:optimize-archive-tables 'now - 14 months'. 13th month still has daily reports, 14th is the one I need, better keep not optimized data 1 month longer than forever.
However I still think Matomo could be run in this predictable (deleting daily, but keeping monthly or yearly reports) time when optimization is very useful. Maybe checkbox at Keep all data for.
I'm not sure if
optimize table will work on all versions of MySQL 4.1+ (or 5.1+) with InnoDb (I know it's basically just rebuilding the table, but not sure what happens in older versions).
We could enable it for InnoDb for archive tables only. Log tables be too risky re downtime etc. Would for sure not enable it for log tables. In general, personally, I would disable
enable_sql_optimize_queries = 0 in config and only run the command regularly. Might be good to recommend this some day or make it default (you don't want to lock log tables)... @mattab
Re the command: I think in general
first month of the year (because it stores yearly archives there) be useful. When running this regularly, it should be a lot more efficient than optimizing all archives tables all over again.
So it would be definitely good to add an option for
first month of the year (last month should be doable with
Put it into 3.10 as it would make it our optimize tables much more efficient by not optimizing all tables again and again.
Yes, it should be enabled for archive tables only because of predictable high data savings and because of locking of archive tables would only disrupt Matomo dashboard, not tracking.
For predictable, I mean it could shrink tables ~35 times! (30 day + 4 weeks + 1 month ->1 month aggregated info. It is assuming 1 day takes about same amount of space as 1 week or 1 month). For January, it would be ~18x savings, as it will contain 1 year + 1 month.
After removing daily data, these tables usually won't be written anymore, so it is good time prepare for long term storage by running one last final optimization and this would remove need for optimizing all tables again and again without predicted outcome.
For log tables, locking is problem, also savings are unpredictable because sites traffic can increase/decrease, also space will be reused after deleting data. Optimization here is manual decision.
dates option doesn't match any of the predefined values ("all", "now" or the lastN options) then it is passed to the DateFactory - this means that "january" will already work.
There's no guide for the optimize tables command on matomo.org at the moment, so perhaps we should write some so that we can document this properly.
For now created a short new section in
edited After reading https://github.com/matomo-org/matomo/issues/10439 also updated the doc to list another command as well, that may free DB space:
What do you think is the next step for this issue @katebutler ?
php console core:purge-old-archive-data already optimizes tables after purging.
But it just optimizes all dates independently if it was purged or not any data.
I should filter
$dates to optimize only
if ($archivePurger->purgeArchivesWithPeriodRange($date) > 0)).