@minusf opened this Issue on February 26th 2019

While migrating to a new DB server, the decision has been made to keep only old reports
and delete "raw logs" older than a certain period. This meant deleting around 800M records
of the total of 900M (around 120G).

At this size just enabling the "Regularly delete old raw data" option in the admin stalled all new
inserts and locked up the database for a considerable time. The ajax calls trying to determine
the number of rows and database size basically kill the instance until the queries can finish.

Not knowing how this option (which is called by different names accross the FAQ
and the documentation, some showing very old screenshots as well) is actually carried out,
another decision has been made to run the console tool.

While the console tool uses a default limit of 1000 for the long running delete job, having inserts
while this is running makes the whole process never ending loop of locked tables...
So instead of waiting for weeks to have this done in the background, stopped all new inserts
and let the delete job run without interference.

Unfortunately the command left to run overnight kept dying after processing around 2M
rows.

$ php -d max_execution_time=0 ./console core:delete-logs-data \
--dates=2014-01-01,2018-01-01 --optimize-tables
Preparing to delete all visits belonging to ALL websites between
2014-01-01 00:00:00 and 2018-01-01 00:00:00.
You are about to delete log data. This action cannot be undone,
are you sure you want to continue? (Y/N) y
........................................................................................

...

........................................................................................
.........PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted
(tried to allocate 12288 bytes) in
/var/www/html/piwik/libs/Zend/Db/Statement/Pdo.php on line 58

I know that i could simply increase the php memory limit with php -d memory_limit=...
but there seems to be a memory leak. Looping over row id's falling between a certain
date range should not eat memory like this.

@tsteur commented on February 26th 2019 Member

Is it any better when not doing --optimize-tables (that would take a long time likely)?

Apart from this code looks somewhat straight forward, not sure re the memory leak where it could be. 500MB isn't all that much though and I would try it at least with 1GB or 2GB which is like "nothing". If it exceeds 2GB we would possibly investigate at some point.

A try could be otherwise to specifically unset some of the variables and force garbage collection to happen from time to time.

@minusf commented on February 27th 2019

Optimization took nothing in the greater picture of removing 800 million rows
punctuated by aborted processes... This is the output from the last run (where memory
was already bumped to 1G, and that limit was hit in the previous run as well).

Successfully deleted 1547876 visits. Time elapsed: 1198.064s

Optimizing table log_visit... done. Time elapsed: 90.111s
Optimizing table log_link_visit_action... done. Time elapsed: 1602.128s
Optimizing table log_conversion... done. Time elapsed: 1.796s
Optimizing table log_conversion_item... done. Time elapsed: 1.080s
Optimizing table log_action... done. Time elapsed: 24.421s

Table optimization finished.

I think we will disagree that it is ok for a PHP console command to need gigabytes of RAM
to loop over some SELECTed rows and DELETE them...

@mattab commented on January 21st 2020 Member

we haven't heard this issue recently, and this code should be used in the scheduled task which deletes the old log (when it's enabled in the setting), so I reckon this now works fine? closing, but feel free to re-open if it's still an issue..

This Issue was closed on January 21st 2020
Powered by GitHub Issue Mirror