Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuration of a shorter lock wait time in tracker mode (new INI setting innodb_lock_wait_timeout) #16092

Merged
merged 3 commits into from Jun 21, 2020

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Jun 19, 2020

MySQL wait locks can cause some performance issues on high traffic/concurrency loads. For example the following scenario:

  • There's a bulk request #1 updating various entries in a transaction
  • Another tracking request #2 eg for same visitor comes in
  • There may be a lock wait time for #2 because it's waiting for the transaction in #1 to finish.
  • By default, the tracking request may wait to get the lock for 30 to 50 seconds (MySQL default).
  • More and more requests might be coming in for the same visitor and suddenly you have say a few hundred or thousands requests waiting for 30 to 50 seconds and/or failing which could make an auto scaling system think the servers or instances aren't healthy and considers them all as unhealthy

To address this issue, the thought is to configure a shorter lock wait time so these locks aren't held for too long.

When an error 1205 Lock wait timeout exceeded; try restarting transaction happens, then MySQL rolls back the current statement. Because there might be multiple inserts or updates within one tracking request we can't really safely replay the entire tracking request (unless innodb_rollback_on_timeout=1 was enabled). This means for now the behaviour be as follows:

  • For a single tracking request we don't do anything and simply fail
  • For a request within the bulk tracking request we skip the request that had a lock wait time compared to previous behaviour where it would have failed all tracking requests and rolled back everything.

We need to see later if we maybe need to respond with a success tracking request when this happens but that's to be seen (to avoid having too many error requests making instances unhealthy). If any possible we don't want to do this.

A note on deadlocks

Thought I also quickly go into deadlocks in case someone thinks that's related which it isn't quite related.

This change does not affect deadlocks which may happen as part of bulk requests from the JS tracker. Deadlocks usually don't happen for log analytics or queued tracking because they don't send bulk requests in parallel for the same visitor but process them one after another.

Depending on the MySQL distribution (MySQL, MariaDB, RDS, Aurora, ...) or version these might be detected immediately without waiting for a few seconds. For some MySQL instances this behaviour depends on the MySQL innodb_deadlock_detect variable. FYI: Having this variable enabled can slow down performance on a high load MySQL.

When a deadlock happens, MySQL rolls back the entire transactions. This means the current behaviour is that the tracking request fails and nothing in the DB is changed. It's like the bulk request didn't happen.

We could eventually catch deadlock errors and try to reply them once. This would be probably a separate PR.

You can also workaround by configuring [Tracker]bulk_requests_use_transaction=0 in Matomo but then you need to expect more DB CPU usage and slower average response times cause more indexes are written etc.

How can a lock wait timeout happen?

  • session 1: start transaction
  • session 1: update site set name = '...' where idsite = 1
  • session 2: update site set name = '...' where idsite = 1

Because the session 1 is not committed yet, the session 2 needs to wait for the session 1 to finish before it can have the row lock to update the site.

How does a deadlock happen?

Basically deadlocks happen when there are two DB sessions and they both wait for a lock that the other one holds. Basically they both want the same locks but they get them in the opposite order like this:

  • session 1: start transaction
  • session 2: start transaction
  • session 1: update site set name = '...' where idsite = 1
  • session 2: update site set name = '...' where idsite = 2
  • session 1: update site set name = '...' where idsite = 2
  • session 2: update site set name = '...' where idsite = 1

Basically session 2 waits for the lock to update site 1 which session 1 holds, but session 1 waits at the same time for a lock to update site 2 which session 2 has.

This is my understanding of how things work. I might be wrong :)

@tsteur tsteur added the Needs Review PRs that need a code review label Jun 19, 2020
@tsteur tsteur added this to the 4.0.0 milestone Jun 19, 2020
@@ -60,6 +60,13 @@ public function process(Tracker $tracker, RequestSet $requestSet)
$invalidRequests[] = $index;
} catch (InvalidRequestParameterException $ex) {
$invalidRequests[] = $index;
} catch (Exception $e) {
if (stripos($e->getMessage(), 'Lock wait timeout exceeded')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit hard to create a test for this one... not sure it's needed?

@diosmosis diosmosis merged commit 77f607d into 4.x-dev Jun 21, 2020
@diosmosis diosmosis deleted the lockwaittime branch June 21, 2020 12:42
diosmosis pushed a commit that referenced this pull request Jun 25, 2020
* Allow configuration of a shorter lock wait time in tracker mode

* do not catch errors for single tracking request

* Update global.ini.php
@mattab mattab changed the title Allow configuration of a shorter lock wait time in tracker mode Allow configuration of a shorter lock wait time in tracker mode (new INI setting innodb_lock_wait_timeout) Sep 29, 2020
@mattab mattab added the c: Performance For when we could improve the performance / speed of Matomo. label Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants