Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete old log_* table data also without idvisit #17964

Merged
merged 7 commits into from Sep 13, 2021
Merged

Conversation

geekdenz
Copy link
Contributor

@geekdenz geekdenz commented Sep 4, 2021

fixes #16529

Description:

Reuse logic from DataSubject and refactor into core.

To test this manually, please see https://github.com/matomo-org/test-examples/blob/trackingformlogs/logdeleter/general/index.php . You can configure it with https://github.com/matomo-org/test-examples/blob/trackingformlogs/config/config.example.php .

Review

@geekdenz
Copy link
Contributor Author

geekdenz commented Sep 5, 2021

You'll need to apply this patch, otherwise it won't get into the code.

See https://github.com/matomo-org/test-examples/blob/trackingformlogs/logdeleter/general/index.php .

diff --git a/plugins/PrivacyManager/PrivacyManager.php b/plugins/PrivacyManager/PrivacyManager.php
index f11ff26b43..98055c9150 100644
--- a/plugins/PrivacyManager/PrivacyManager.php
+++ b/plugins/PrivacyManager/PrivacyManager.php
@@ -672,6 +672,7 @@ class PrivacyManager extends Plugin
         $deleteIntervalDays = $settings[$setting];
         $deleteIntervalSeconds = $this->getDeleteIntervalInSeconds($deleteIntervalDays);
 
+        return true;
         if ($lastDelete === false ||
             $lastDelete === '' ||
             ((int)$lastDelete + $deleteIntervalSeconds) <= time()

@geekdenz geekdenz marked this pull request as ready for review September 7, 2021 05:49
@geekdenz geekdenz added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels Sep 7, 2021
@sgiehl sgiehl added this to the 4.7.0 milestone Sep 7, 2021
@sgiehl
Copy link
Member

sgiehl commented Sep 7, 2021

@geekdenz Wouldn't it make sense to add/change some tests in LogDeleterTest to prove that the new logic works as expected. Ideally the new/changed tests would fail on 4.x-dev but work with the changes of this PR.

@tsteur
Copy link
Member

tsteur commented Sep 7, 2021

@sgiehl it's not really needed since the logic we use itself is already tested.

sgiehl
sgiehl previously approved these changes Sep 8, 2021
Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested it locally, seems to work as expected 👍

Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geekdenz it seems like maybe some other methods can now be removed as well like Rawlogdao::deleteFromLogTable?

The only other thought I have was that before we were doing idvisit IN(1,2,3,4,4) and now we use idvisit=1 or idvisit = 2 or idvisit = 3. I believe this is though the same for the query optimiser. Might be something to quickly double check / look into.

Also before we were doing idvisit in (1,2,3,4) and now we use placeholders using ?. This is a regression and we should change the below query in DataSubjects to not use placeholders. When hundreds or thousands of placeholders are used then MySQL can get very very slow. We learned this over the years. We don't need to bind the parameters there if we cast the value to an integer.

@geekdenz
Copy link
Contributor Author

geekdenz commented Sep 8, 2021

idvisit IN(1,2,3,4,4)

will be more efficient, though just by a constant factor (strlen('idvisit = 1 OR ') - strlen('1,')), even just because the text is shorter. The query optimiser might optimise this automatically though, but it adds another step for it. It'll add up with lots of IDs. I'll do it as it is not difficult.

@geekdenz
Copy link
Contributor Author

geekdenz commented Sep 9, 2021

OK. Double checked manually and got expected results:

mysql> SELECT distinct b.idvisit, c.visit_last_action_time, a.idlogform, a.idformview, a.field_name FROM matomo_log_form_field as a join matomo_log_form as b on a.idlogform=b.idlogform JOIN matomo_log_visit AS c ON c.idvisit=b.idvisit;
+---------+------------------------+-----------+------------+------------+
| idvisit | visit_last_action_time | idlogform | idformview | field_name |
+---------+------------------------+-----------+------------+------------+
|   10444 | 2020-08-01 00:00:00    |        24 | zaODNJ     | input1     |
|   10444 | 2020-08-01 00:00:00    |        24 | zaODNJ     | input2     |
|   10444 | 2020-08-01 00:00:00    |        24 | zaODNJ     | input3     |
|   10449 | 2020-08-01 00:00:00    |        28 | 9GlRNX     | input1     |
|   10449 | 2020-08-01 00:00:00    |        28 | 9GlRNX     | input2     |
|   10449 | 2020-08-01 00:00:00    |        28 | 9GlRNX     | input3     |
|   10443 | 2021-09-07 03:05:54    |        23 | ZDa9lV     | input1     |
|   10443 | 2021-09-07 03:05:54    |        23 | ZDa9lV     | input2     |
|   10443 | 2021-09-07 03:05:54    |        23 | ZDa9lV     | input3     |
+---------+------------------------+-----------+------------+------------+
9 rows in set (0.01 sec)

mysql> SELECT distinct b.idvisit, c.visit_last_action_time, a.idlogform, a.idformview, a.field_name FROM matomo_log_form_field as a join matomo_log_form as b on a.idlogform=b.idlogform JOIN matomo_log_visit AS c ON c.idvisit=b.idvisit;
+---------+------------------------+-----------+------------+------------+
| idvisit | visit_last_action_time | idlogform | idformview | field_name |
+---------+------------------------+-----------+------------+------------+
|   10443 | 2021-09-07 03:05:54    |        23 | ZDa9lV     | input1     |
|   10443 | 2021-09-07 03:05:54    |        23 | ZDa9lV     | input2     |
|   10443 | 2021-09-07 03:05:54    |        23 | ZDa9lV     | input3     |
+---------+------------------------+-----------+------------+------------+
3 rows in set (0.01 sec)

@tsteur
Copy link
Member

tsteur commented Sep 13, 2021

Fyi tweaked the test cases to make sure it works fine when idvisit and idvisit+idsite is mixed and also to know it works when multiple deletions are requested. Was the easiest way for me to test the generated queries work.

Screenshot 2021-09-13 at 3 04 19 PM

Screenshot 2021-09-13 at 3 20 57 PM

@tsteur
Copy link
Member

tsteur commented Sep 13, 2021

Generally tested for quite a while and looks all good to me. Waiting for automated tests to finish before merging 🎉

@tsteur tsteur merged commit 2fd2c0c into 4.x-dev Sep 13, 2021
@tsteur tsteur deleted the m-16529-retry branch September 13, 2021 04:42
@geekdenz
Copy link
Contributor Author

Thanks @tsteur for the added test and info!

@justinvelluppillai justinvelluppillai modified the milestones: 4.7.0, 4.5.0 Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Old data in log tables without idvisit column are not purged (affects plugins)
4 participants