Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of delete query and make sure it doesn't lock existing tracking requests #13894

Merged
merged 2 commits into from Dec 22, 2018

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Dec 21, 2018

fix #13800

I'm thinking this should improve the performance of the delete queries:

  • Before it would run all the delete queries mentioned in deleteLogDataForDeletedSites locks database for a long period, bogus query? #13800 every week even if no site was deleted
  • Before it would potentially do a full table scan but now the where idsite in(...) should only lock rows belonging to that site. New tracking requests for those rows / idsite shouldn't come in cause the site no longer exists (would be discarded by tracker) so it shouldn't be a problem.

Initially I also implemented a check for each table that had idsite column to check if tracking requests for that site exists and only delete if the log table has tracking requests for that site, but it shouldn't be needed.

@tsteur tsteur added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels Dec 21, 2018
@tsteur tsteur added this to the 3.8.0 milestone Dec 21, 2018
$idSitesLogConversion = $this->getDistinctIdSitesInTable('log_conversion', $maxIdSite);
$idSitesUsed = array_unique(array_merge($idSitesLogVisit, $idSitesLogVisitAction, $idSitesLogConversion));

$idSitesNoLongerExisting = array_diff($idSitesUsed, $allExistingIdSites);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible edge case that I haven't thought through too much, but what if a site hasn't tracked any visits yet, and this runs. It will be treated as a deleted site, correct? If tracking is installed while this process is under way, will it be a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't cause it would only return sites here that were tracked into and not exist in sites?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sites has ids, [1,2,3,4,5] and sites that are tracked has ids [1,2,3], then the array_diff for $idSitesNoLongerExisting would be [4,5], correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm seeing it wrong. idSitesUsed contains the list of all tracked siteIds... then the second part is the number of sites that actually exist (it should be the other way around from your example?) and then it would correctly return the deleted sites that were tracked into in the diff?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try and explain again (I might be seeing it wrong too). There are 5 sites total:

[1,2,3,4,5]

No site is deleted. Only three of the sites have tracked data, [1,2,3]. So when looking through log tables for idsites, we only get [1,2,3].

In this situation, we will try to delete log data for [4,5], which would be ok since there isn't any log data, except possibly if tracking begins during the deletion process. Again, I'm not sure if this is a problem ultimately, just mentioning it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis it would return [] not [4,5] see eg https://3v4l.org/Q4uci

array_diff returns all entries from the first array, that are not in the second array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, no issue then. 👍

@diosmosis diosmosis merged commit e03a642 into 3.x-dev Dec 22, 2018
@diosmosis diosmosis deleted the 13800 branch December 22, 2018 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

deleteLogDataForDeletedSites locks database for a long period, bogus query?
2 participants