Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command core:delete-logs-data will not delete unused rows from log_action table #11189

Open
mattab opened this issue Jan 13, 2017 · 7 comments
Open
Labels
Bug For errors / faults / flaws / inconsistencies etc.

Comments

@mattab
Copy link
Member

mattab commented Jan 13, 2017

When trying to manage the database size one can:

  1. configure the user interface and configure Piwik to "Delete old visitor logs from the database". When enabled, a scheduled task will be executed at the end of core:archive command which will purge the old data as per the settings selected in the UI.

  2. or one can manually run a console command core:delete-logs-data to delete old visit data

One may expect that both 1) and 2) do the same thing. However 2) will not purge the un-used records from the log_action table. In one case reported to us, this table log_action was 170G big and we expected core:delete-logs-data to purge many Gb from this table.

The workaround for now is to use technique 1) and configure the "delete old logs data" using the Administration > Privacy > Delete old visitor logs. Use TasksTimetable plugin to see when the task will be executed.

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Feb 20, 2017
@mattab mattab added this to the Backlog (Help wanted) milestone Feb 20, 2017
@minusf
Copy link

minusf commented Feb 26, 2019

After trying both of these methods, piwik_log_visit does not contain the older records,
however piwik_log_link_visit_action still contains many millions of rows with older years in
server_time:

mysql> select count(*) from piwik_log_link_visit_action;
+-----------+
| count(*)  |
+-----------+
| 109677109 |
+-----------+
1 row in set (0.01 sec)

mysql> select count(*) from piwik_log_link_visit_action where year(server_time) < 2018;
+----------+
| count(*) |
+----------+
| 99228134 |
+----------+
1 row in set (22.45 sec)

Shouldn't these be removed as part of the cleanup?

@tsteur
Copy link
Member

tsteur commented Feb 26, 2019

Looking at the code It will delete them if there's still an entry for log_visit. Eg if you ever manually deleted entries from log_visit it wouldn't have been able to clean them up.

@minusf
Copy link

minusf commented Feb 26, 2019

What does it mean manually?
Raw SQL queries?
Running console core:delete-logs-data --dates=2014-01-01,2018-01-01?

And if Matomo will not/cannot remove them, can I remove them with raw queries?

@tsteur
Copy link
Member

tsteur commented Feb 26, 2019

Yes meant raw queries and yes you could remove them with raw queries.

@minusf
Copy link

minusf commented Feb 26, 2019

Even matomo's built-in facilities were not used to clean out old records and
I was the person on the team to enable "Regularly delete old raw data" in the first place,
so I heavily doubt raw queries were ever used to trim the database. Everything
was collected for years and the DB size ballooned to 126G...

I started with cleaning out the records through console yesterday and while
piwik_log_visit was cleared out, and piwik_log_link_visit_action shrunk
from 66G down to 7.7G, its bulk content still seems to be older records than
my retention period of 90 days.

What are my options? Should I delete them using raw queries?

@tsteur
Copy link
Member

tsteur commented Feb 26, 2019

Does it not delete them when you execute it again? It's otherwise hard to say what is happening without DB access and what entries are remaining. Are there eg entries < 2014? Just a random guess.

@minusf
Copy link

minusf commented Feb 26, 2019

Subsequent runs do nothing.

Preparing to delete all visits belonging to ALL websites between 2014-01-01 00:00:00 and 2018-01-01 00:00:00.
You are about to delete log data. This action cannot be undone, are you sure you want to continue? (Y/N) y

Successfully deleted 0 visits. Time elapsed: 10.697s

Also, I ran console to remove everything before 2018-01-01.
When that has finished after 2 days (see #14141), I have set the log retention period
in the admin to 90 days. The archive cronjob runs every hour, but so far it has not deleted
the rest of 2018 (up to decemberish).

I am confused about the feature and so far it seems unreliable.

INFO [2019-02-26 21:07:09] 16414  ---------------------------                                                                               
INFO [2019-02-26 21:07:09] 16414  SCHEDULED TASKS                                                                                           
INFO [2019-02-26 21:07:09] 16414  Starting Scheduled tasks...                                                                               
INFO [2019-02-26 21:07:09] 16414  Scheduler: executing task Piwik\Plugins\CustomPiwikJs\Tasks.updateTracker...                              
INFO [2019-02-26 21:07:09] 16414  Scheduler: finished. Time elapsed: 0.001s                                                                 
INFO [2019-02-26 21:07:09] 16414  Scheduler: executing task Piwik\Plugins\PrivacyManager\Tasks.deleteLogData...                             
INFO [2019-02-26 21:07:09] 16414  Scheduler: finished. Time elapsed: 0.003s                                                                 
INFO [2019-02-26 21:07:09] 16414  Scheduler: executing task Piwik\Plugins\PrivacyManager\Tasks.anonymizePastData...                         
INFO [2019-02-26 21:07:09] 16414  Scheduler: finished. Time elapsed: 0.000s                                                                 
INFO [2019-02-26 21:07:09] 16414  done                                                                                                      
INFO [2019-02-26 21:07:09] 16414  ---------------------------     

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

3 participants