Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deleteLogDataForDeletedSites locks database for a long period, bogus query? #13800

Closed
EreMaijala opened this issue Dec 4, 2018 · 3 comments · Fixed by #13894
Closed

deleteLogDataForDeletedSites locks database for a long period, bogus query? #13800

EreMaijala opened this issue Dec 4, 2018 · 3 comments · Fixed by #13894
Assignees
Labels
Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Milestone

Comments

@EreMaijala
Copy link
Contributor

The changes made in #13227 for deleteLogDataForDeletedSites seem to have caused a major performance issue. They result in a query that locks a large Piwik database for a long time, potentially several minutes. As far as I can see the queries will also never accomplish anything, since the where clause in the queries contains "site.idsite IS NULL" (created here:

$where = "site.idsite IS NULL AND $tableToSelect.idsite <= ?";
), and idsite in the site table or log_visit table cannot be null. The resulting queries look like this:

DELETE log_conversion FROM piwik_log_conversion log_conversion LEFT JOIN piwik_log_visit log_visit ON log_conversion.idvisit = log_visit.idvisit  LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'
DELETE log_conversion_item FROM piwik_log_conversion_item log_conversion_item LEFT JOIN piwik_log_visit log_visit ON log_conversion_item.idvisit = log_visit.idvisit  LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'
DELETE log_link_visit_action FROM piwik_log_link_visit_action log_link_visit_action LEFT JOIN piwik_log_visit log_visit ON log_link_visit_action.idvisit = log_visit.idvisit  LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'
DELETE log_visit FROM piwik_log_visit log_visit LEFT JOIN piwik_site site ON site.idsite = log_visit.idsite WHERE site.idsite IS NULL AND log_visit.idsite <= '3953'

It seems that whatever these queries should be doing is not happening. Instead they're causing tracking to be blocked for a significant time. The query plan for the above queries shows how bad it is since it goes through the whole log_visit table:

+------+-------------+-----------+--------+---------------------------------------------------------------------------+---------+---------+------------------------+----------+-------------------------+
| id   | select_type | table     | type   | possible_keys                                                             | key     | key_len | ref                    | rows     | Extra                   |
+------+-------------+-----------+--------+---------------------------------------------------------------------------+---------+---------+------------------------+----------+-------------------------+
|    1 | SIMPLE      | log_visit | ALL    | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | NULL    | NULL    | NULL                   | 48810128 | Using where             |
|    1 | SIMPLE      | site      | eq_ref | PRIMARY                                                                   | PRIMARY | 4       | piwik.log_visit.idsite |        1 | Using where; Not exists |
+------+-------------+-----------+--------+---------------------------------------------------------------------------+---------+---------+------------------------+----------+-------------------------+
@tsteur tsteur added the Regression Indicates a feature used to work in a certain way but it no longer does even though it should. label Dec 4, 2018
@tsteur tsteur added this to the 3.8.0 milestone Dec 4, 2018
@tsteur
Copy link
Member

tsteur commented Dec 4, 2018

This looks like a bug / regression indeed. Moving it therefore into 3.8.

The left join with site.idsite is null can't really work as you mention. Would need to be a right join.

Hopefully this will speed up the query and cause less trouble.

@EreMaijala EreMaijala changed the title deleteLogDataForDeletedSites locks database for a long, bogus query? deleteLogDataForDeletedSites locks database for a long period, bogus query? Dec 7, 2018
@tsteur
Copy link
Member

tsteur commented Dec 21, 2018

I just tested and the query should work. I can see though how this maybe could affect performance and lock the table especially since it locks at all the data. I reckon more efficient might be to get a list of all existing sites, a list of all idsites used in log_visit, and remove the ones from log_visit with where idsite= ? where they no longer exist. I think we also have a method that doesn't delete all entries at once but does it in batches... however, I reckon if we delete by idSite, there wouldn't be a big LOCK cause it would only lock the rows that match that idSite and likely you would not even track data into it anymore for that site(cause it no longer exists and request would fail).

@tsteur
Copy link
Member

tsteur commented Dec 21, 2018

Ideally we would also only run it, when sites were actually deleted. Otherwise there would be lots of full table scans I suppose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Projects
None yet
2 participants