Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

running out of memory while deleting old logs #14141

Closed
minusf opened this issue Feb 26, 2019 · 3 comments
Closed

running out of memory while deleting old logs #14141

minusf opened this issue Feb 26, 2019 · 3 comments
Labels
worksforme The issue cannot be reproduced and things work as intended.

Comments

@minusf
Copy link

minusf commented Feb 26, 2019

While migrating to a new DB server, the decision has been made to keep only old reports
and delete "raw logs" older than a certain period. This meant deleting around 800M records
of the total of 900M (around 120G).

At this size just enabling the "Regularly delete old raw data" option in the admin stalled all new
inserts and locked up the database for a considerable time. The ajax calls trying to determine
the number of rows and database size basically kill the instance until the queries can finish.

Not knowing how this option (which is called by different names accross the FAQ
and the documentation, some showing very old screenshots as well) is actually carried out,
another decision has been made to run the console tool.

While the console tool uses a default limit of 1000 for the long running delete job, having inserts
while this is running makes the whole process never ending loop of locked tables...
So instead of waiting for weeks to have this done in the background, stopped all new inserts
and let the delete job run without interference.

Unfortunately the command left to run overnight kept dying after processing around 2M
rows.

$ php -d max_execution_time=0 ./console core:delete-logs-data \
--dates=2014-01-01,2018-01-01 --optimize-tables
Preparing to delete all visits belonging to ALL websites between
2014-01-01 00:00:00 and 2018-01-01 00:00:00.
You are about to delete log data. This action cannot be undone,
are you sure you want to continue? (Y/N) y
........................................................................................

...

........................................................................................
.........PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted
(tried to allocate 12288 bytes) in
/var/www/html/piwik/libs/Zend/Db/Statement/Pdo.php on line 58

I know that i could simply increase the php memory limit with php -d memory_limit=...
but there seems to be a memory leak. Looping over row id's falling between a certain
date range should not eat memory like this.

@tsteur
Copy link
Member

tsteur commented Feb 26, 2019

Is it any better when not doing --optimize-tables (that would take a long time likely)?

Apart from this code looks somewhat straight forward, not sure re the memory leak where it could be. 500MB isn't all that much though and I would try it at least with 1GB or 2GB which is like "nothing". If it exceeds 2GB we would possibly investigate at some point.

A try could be otherwise to specifically unset some of the variables and force garbage collection to happen from time to time.

@minusf
Copy link
Author

minusf commented Feb 27, 2019

Optimization took nothing in the greater picture of removing 800 million rows
punctuated by aborted processes... This is the output from the last run (where memory
was already bumped to 1G, and that limit was hit in the previous run as well).

Successfully deleted 1547876 visits. Time elapsed: 1198.064s

Optimizing table log_visit... done. Time elapsed: 90.111s
Optimizing table log_link_visit_action... done. Time elapsed: 1602.128s
Optimizing table log_conversion... done. Time elapsed: 1.796s
Optimizing table log_conversion_item... done. Time elapsed: 1.080s
Optimizing table log_action... done. Time elapsed: 24.421s

Table optimization finished.

I think we will disagree that it is ok for a PHP console command to need gigabytes of RAM
to loop over some SELECTed rows and DELETE them...

@mattab
Copy link
Member

mattab commented Jan 21, 2020

we haven't heard this issue recently, and this code should be used in the scheduled task which deletes the old log (when it's enabled in the setting), so I reckon this now works fine? closing, but feel free to re-open if it's still an issue..

@mattab mattab closed this as completed Jan 21, 2020
@mattab mattab added the worksforme The issue cannot be reproduced and things work as intended. label Jan 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
worksforme The issue cannot be reproduced and things work as intended.
Projects
None yet
Development

No branches or pull requests

3 participants