Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to archive january archive of this / last year #14064

Closed
garas opened this issue Feb 1, 2019 · 13 comments
Closed

Add option to archive january archive of this / last year #14064

garas opened this issue Feb 1, 2019 · 13 comments
Assignees
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Help wanted Beginner friendly issues or issues where we'd highly appreciate community's help and involvement. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Milestone

Comments

@garas
Copy link

garas commented Feb 1, 2019

I have set:

  • Delete old aggregated report data after 12 months
  • Keep all data for: Yearly+Monthly reports
  • MySQL has innodb_file_per_table=ON.

After Matomo deleted old data, but left yearly/monthly reports, table size for piwik_archive_blob_2017_01 reported by phpMyAdmin reduced 150MB to 4MB. However actual .ibd size left 150MB.

After running OPTIMIZE piwik_archive_blob_2017_01, the .ibd file was rebuilt and now takes 4MB.

You can see data, index and free space by running

SELECT `TABLE_SCHEMA`, `TABLE_NAME`, `DATA_LENGTH`, `INDEX_LENGTH`, `DATA_FREE`
FROM `information_schema`.`TABLES`
WHERE `TABLE_SCHEMA` = 'piwikdb' AND `TABLE_NAME` LIKE '%archive_blob%'

Could Matomo run OPTIMIZE on archive tables after deleting much data?

In my case it reduced Matomo total DB space 4GB to 1GB.

@tsteur
Copy link
Member

tsteur commented Feb 1, 2019

There is a command ./console database:optimize-archive-tables to do that.

We also run Optimize automatically sometimes but usually not when it is InnoDB unless it is MariaDB

@tsteur tsteur closed this as completed Feb 1, 2019
@tsteur tsteur added the answered For when a question was asked and we referred to forum or answered it. label Feb 1, 2019
@tsteur
Copy link
Member

tsteur commented Feb 1, 2019

I recommend you set up a cronjob maybe to do it monthly for example.

@garas
Copy link
Author

garas commented Feb 1, 2019

I've tried that command. Options 'all', 'now', hidden lastN are not useful, I don't need re-optimize all or recent tables each month.

Last option is specify concrete months. But it is also hard generate month in shell script. But finally I found solution: php console database:optimize-archive-tables 'now - 14 months'. 13th month still has daily reports, 14th is the one I need, better keep not optimized data 1 month longer than forever.

However I still think Matomo could be run in this predictable (deleting daily, but keeping monthly or yearly reports) time when optimization is very useful. Maybe checkbox at Keep all data for.

@tsteur
Copy link
Member

tsteur commented Feb 3, 2019

I'm not sure if optimize table will work on all versions of MySQL 4.1+ (or 5.1+) with InnoDb (I know it's basically just rebuilding the table, but not sure what happens in older versions).

We could enable it for InnoDb for archive tables only. Log tables be too risky re downtime etc. Would for sure not enable it for log tables. In general, personally, I would disable enable_sql_optimize_queries = 0 in config and only run the command regularly. Might be good to recommend this some day or make it default (you don't want to lock log tables)... @mattab

Re the command: I think in general lastMonth, and first month of the year (because it stores yearly archives there) be useful. When running this regularly, it should be a lot more efficient than optimizing all archives tables all over again.

So it would be definitely good to add an option for first month of the year (last month should be doable with last1).

@tsteur tsteur reopened this Feb 3, 2019
@tsteur tsteur added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Feb 3, 2019
@tsteur tsteur added this to the 3.10.0 milestone Feb 3, 2019
@tsteur tsteur changed the title Reclaim InnoDB disk space after deleting old data Add option to archive january archive of this and last year Feb 3, 2019
@tsteur tsteur changed the title Add option to archive january archive of this and last year Add option to archive january archive of this / last year Feb 3, 2019
@tsteur
Copy link
Member

tsteur commented Feb 3, 2019

Put it into 3.10 as it would make it our optimize tables much more efficient by not optimizing all tables again and again.

@tsteur tsteur removed the answered For when a question was asked and we referred to forum or answered it. label Feb 3, 2019
@tsteur tsteur modified the milestones: 3.10.0, Backlog (Help wanted) Feb 3, 2019
@tsteur tsteur added the Help wanted Beginner friendly issues or issues where we'd highly appreciate community's help and involvement. label Feb 3, 2019
@tsteur tsteur modified the milestones: Backlog (Help wanted), 3.10.0 Feb 3, 2019
@garas
Copy link
Author

garas commented Feb 3, 2019

Yes, it should be enabled for archive tables only because of predictable high data savings and because of locking of archive tables would only disrupt Matomo dashboard, not tracking.

For predictable, I mean it could shrink tables ~35 times! (30 day + 4 weeks + 1 month ->1 month aggregated info. It is assuming 1 day takes about same amount of space as 1 week or 1 month). For January, it would be ~18x savings, as it will contain 1 year + 1 month.

After removing daily data, these tables usually won't be written anymore, so it is good time prepare for long term storage by running one last final optimization and this would remove need for optimizing all tables again and again without predicted outcome.

For log tables, locking is problem, also savings are unpredictable because sites traffic can increase/decrease, also space will be reused after deleting data. Optimization here is manual decision.

@katebutler katebutler self-assigned this Mar 21, 2019
@katebutler
Copy link

If the dates option doesn't match any of the predefined values ("all", "now" or the lastN options) then it is passed to the DateFactory - this means that "january" will already work.

There's no guide for the optimize tables command on matomo.org at the moment, so perhaps we should write some so that we can document this properly.

@mattab
Copy link
Member

mattab commented Jul 15, 2019

For now created a short new section in
https://matomo.org/docs/managing-your-databases-size/#purging-and-optimizing-the-database-using-a-console-command

edited After reading #10439 also updated the doc to list another command as well, that may free DB space:
Screenshot from 2019-07-15 13-18-39

What do you think is the next step for this issue @katebutler ?

@garas
Copy link
Author

garas commented Jul 15, 2019

php console core:purge-old-archive-data already optimizes tables after purging.

But it just optimizes all dates independently if it was purged or not any data.
I should filter $dates to optimize only if ($archivePurger->purgeArchivesWithPeriodRange($date) > 0)).

} else {
foreach ($dates as $date) {
$message = sprintf("Purging custom range archives for %s...", $date->toString('Y_m'));
$this->performTimedPurging($output, $message, function () use ($date, $archivePurger) {
$archivePurger->purgeArchivesWithPeriodRange($date);
});
}
}
$skipOptimizeTables = $input->getOption('skip-optimize-tables');
if ($skipOptimizeTables) {
$output->writeln("Skipping OPTIMIZE TABLES.");
} else {
$this->optimizeArchiveTables($output, $dates, $input->getOption('force-optimize-tables'));
}

@katebutler
Copy link

@mattab Thanks, I'll flesh out that doc a bit more and explain the options that are available, think that's all we need to do.

@mattab mattab modified the milestones: 3.11.0, 3.12.0 Jul 23, 2019
@mattab mattab modified the milestones: 3.12.0, 3.13.0 Oct 22, 2019
@mattab
Copy link
Member

mattab commented Oct 22, 2019

Do you maybe have an update on this one @katebutler?

@katebutler
Copy link

Suggested addition to the https://matomo.org/docs/managing-your-databases-size/#purging-and-optimizing-the-database-using-a-console-command section - I don't have WP access so @tsteur can you please add it?

To reduce execution time, you can select specific date periods to run these scripts for. This means that you can target just the recent months, for which report data is likely to have changed. Note that data for annual reports is stored in the archives for January of the appropriate year, so it is a good idea to optimize these regularly as well, e.g.

./console database:optimize-archive-tables last2
./console database:optimize-archive-tables january

@tsteur
Copy link
Member

tsteur commented Nov 7, 2019

Cheers, done.

@tsteur tsteur closed this as completed Nov 7, 2019
@tsteur tsteur added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Help wanted Beginner friendly issues or issues where we'd highly appreciate community's help and involvement. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

No branches or pull requests

4 participants