New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option::set() INSERT ... ON DUPLICATE KEY can be slow #12550
Conversation
93e0d53
to
dd60fbd
Compare
A side-effect of this PR is that the SQL is database agnostic. |
$bind = array($name, $value, $autoLoad); | ||
|
||
Db::query($sql, $bind); | ||
} catch (\Exception $e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for catching exceptions here? We didn't used to do that before. Wondering if that might have any side effect always updating the all
array, even if the insert failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all
?
The proposed PR is no longer an atomic operation. So, the catch is for a potential race condition with concurrent requests to INSERT the same option name; i.e., only the first will succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some lines below the 'static' array is updated $this->all[$name] = $value;
. This will now be done, even if the second query fails to update.
Db::query($sql, $bind); | ||
} catch (\Exception $e) { | ||
} | ||
} | ||
|
||
$this->all[$name] = $value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was always the assumption that the query (INSERT...ON DUPLICATE) was successful. So, status quo behaviour for concurrent updates.
…EY UPDATE can be slow
I don't think introducing potential race conditions is a good idea. Given how many different ways Options are used, it seems like this could potentially have far reaching and hard to debug effects. Is there a reason I'm also not sure if Option setting is a bottleneck for Matomo. I wonder how much of an effect on performance this actually has? IMO if Option::set() really has that much of an effect on an individual request or command, then it's being called too many times. |
There are tradeoffs in every approach.
|
Like I said, there are always tradeoffs. For example, you're welcome to remove the |
It's not the exception silencing itself that worries me, but the possibility for race conditions in general. If two competing processes try to insert different values into the same row, one process will win out, the other will fail, and the value in the DB may not be what we expect. Options are used in a lot of different ways, some of which are in concurrent contexts (like an option that's used as a mutex), and it worries me that there could now be a potential race condition in something that's used so widely, where we're no longer sure of what data gets written to the row. And that, to me, means potential bugs that are hard to reproduce and hard to debug (especially w/o any sort of logs to dig through). As for whether it's important for the SQL to be database agnostic, that's a priority question & I can't answer that. |
Matomo doesn't have cache coherence for option settings now. There's already a race condition when you have two concurrent requests that both call |
The current process will use an out of date value, but the next process execution will use the value we may not expect it to use. |
Two ideas to make race conditions less of an issue here:
This still leaves whoever is doing ops w/ something additional to consider, and the option table will be less human readable, but it will reduce the amount of times this error can happen. I think an ideal fix would be creating a new type of component, like Note: this is just a proposal, not a review requirement. Waiting for other opinions. |
@diosmosis do you know where we do this? Might be worth at some point possibly to refactor this as nowadays options is often used just for like anything which it is maybe not supposed to be... We could also if really needed have another method like Being MySQL-specific is totally fine as Matomo doesn't work with other SQLs anyway (AFAIK). So if there is a solution that provides us a fast execution that is also atomic I'd be 👍 Otherwise as @robocoder points out |
@tsteur I think CronArchive might store a list or multiple lists in options during parallel execution of archiving? Sometimes PIDs are stored in options values to mark a command as being run by a specific process, etc. There might be some counters too. My knowledge is out of date so specifics escape me.
Actually thinking about it some more, I think the problem I was seeing was a phantom. I think I assumed the original SQL would throw on collision, but w/ ON DUPLICATE UPDATE, it of course wouldn't. So actually this PR is fine, and I will stop creating noise :) This does highlight a problem in that there aren't really any atomic primitives, but I've had to take concurrency into consideration fairly often in the recent days. Not sure why |
There are no special privileges for MySQL's GET_LOCK, but since it's a named lock in a global namespace, it would require a naming strategy for shared hosting users, and as a result, be at risk of DoS attack. It probably also requires MySQL 5.7.5 or newer, since older versions don't share locks across master-slave servers. |
Seems like the consensus is to merge, so merging. |
…EY UPDATE can be slow (matomo-org#12550)
https://www.percona.com/blog/2010/07/14/why-insert-on-duplicate-key-update-may-be-slow-by-incurring-disk-seeks/ explains:
REPLACE INTO
might be faster but the semantics are:Since
Option::set()
is largely called to update the value for an existing setting, we should try:It's not an atomic operation but overall, this approach strikes a good balance between performance and ACID.