@EreMaijala opened this Issue on December 4th 2018

The changes made in #13227 for deleteLogDataForDeletedSites seem to have caused a major performance issue. They result in a query that locks a large Piwik database for a long time, potentially several minutes. As far as I can see the queries will also never accomplish anything, since the where clause in the queries contains "site.idsite IS NULL" (created here: https://github.com/matomo-org/matomo/blob/70505b1ebb5584a8a859bcb9a54e0cb50d66677e/plugins/PrivacyManager/Model/DataSubjects.php#L408), and idsite in the site table or log_visit table cannot be null. The resulting queries look like this:

DELETE log_conversion FROM piwik_log_conversion log_conversion LEFT JOIN piwik_log_visit log_visit ON log_conversion.idvisit = log_visit.idvisit  LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'
DELETE log_conversion_item FROM piwik_log_conversion_item log_conversion_item LEFT JOIN piwik_log_visit log_visit ON log_conversion_item.idvisit = log_visit.idvisit  LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'
DELETE log_link_visit_action FROM piwik_log_link_visit_action log_link_visit_action LEFT JOIN piwik_log_visit log_visit ON log_link_visit_action.idvisit = log_visit.idvisit  LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'
DELETE log_visit FROM piwik_log_visit log_visit LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'

It seems that whatever these queries should be doing is not happening. Instead they're causing tracking to be blocked for a significant time. The query plan for the above queries shows how bad it is since it goes through the whole log_visit table:

+------+-------------+-----------+--------+---------------------------------------------------------------------------+---------+---------+------------------------+----------+-------------------------+
| id   | select_type | table     | type   | possible_keys                                                             | key     | key_len | ref                    | rows     | Extra                   |
+------+-------------+-----------+--------+---------------------------------------------------------------------------+---------+---------+------------------------+----------+-------------------------+
|    1 | SIMPLE      | log_visit | ALL    | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | NULL    | NULL    | NULL                   | 48810128 | Using where             |
|    1 | SIMPLE      | site      | eq_ref | PRIMARY                                                                   | PRIMARY | 4       | piwik.log_visit.idsite |        1 | Using where; Not exists |
+------+-------------+-----------+--------+---------------------------------------------------------------------------+---------+---------+------------------------+----------+-------------------------+
@tsteur commented on December 4th 2018 Member

This looks like a bug / regression indeed. Moving it therefore into 3.8.

The left join with site.idsite is null can't really work as you mention. Would need to be a right join.

Hopefully this will speed up the query and cause less trouble.

Powered by GitHub Issue Mirror