@robocoder opened this Pull Request on February 12th 2018 Contributor

https://www.percona.com/blog/2010/07/14/why-insert-on-duplicate-key-update-may-be-slow-by-incurring-disk-seeks/ explains:

Instead, what MySQL does is the following:

  • call handler::write_row to attempt an insertion, if it succeeds, we are done
  • if handler::write_row returns an error indicating a duplicate key, outside of the handler, apply the necessary update to the row
  • call handler::update_row to apply the update
    ...
    So, the moral of the story is this. In MySQL, “insert … on duplicate key update” is slower than “replace into”.

REPLACE INTO might be faster but the semantics are:

INSERT;
if failure then
    DELETE;
    INSERT;
fi

Since Option::set() is largely called to update the value for an existing setting, we should try:

UPDATE;

if affected_rows === 0 then
    INSERT;
fi

It's not an atomic operation but overall, this approach strikes a good balance between performance and ACID.

@robocoder commented on February 15th 2018 Contributor

A side-effect of this PR is that the SQL is database agnostic.

@diosmosis commented on March 25th 2018 Member

I don't think introducing potential race conditions is a good idea. Given how many different ways Options are used, it seems like this could potentially have far reaching and hard to debug effects.

Is there a reason REPLACE INTO won't work here? If it's atomic & has the same effect as the original code, then it should provide the same guarantees as the original. Or am I missing something?

I'm also not sure if Option setting is a bottleneck for Matomo. I wonder how much of an effect on performance this actually has? IMO if Option::set() really has that much of an effect on an individual request or command, then it's being called too many times.

@mattab @tsteur @sgiehl do you guys have any opinions?

@robocoder commented on March 25th 2018 Contributor

There are tradeoffs in every approach. INSERT...ON DUPLICATE is simple but MySQL specific and has some performance problems as seen in #12400 and the linked blog post.

REPLACE INTO may be faster and atomic, but it isn't without its cons either. A snippet from the blog post's comments:

REPLACE always deletes the row, then inserts it with the new values. If you do this often enough you can end up fragmenting the innodb table space. With a fragmented table space things just get slower on rotational disks. Plus, you run the risk of running out of tablespace much faster than you would normally.

@robocoder commented on March 25th 2018 Contributor

Like I said, there are always tradeoffs. For example, you're welcome to remove the try...catch and change the INSERT to INSERT IGNORE, but then you'll be MySQL-specific again.

@diosmosis commented on March 25th 2018 Member

It's not the exception silencing itself that worries me, but the possibility for race conditions in general. If two competing processes try to insert different values into the same row, one process will win out, the other will fail, and the value in the DB may not be what we expect. Options are used in a lot of different ways, some of which are in concurrent contexts (like an option that's used as a mutex), and it worries me that there could now be a potential race condition in something that's used so widely, where we're no longer sure of what data gets written to the row. And that, to me, means potential bugs that are hard to reproduce and hard to debug (especially w/o any sort of logs to dig through).

As for whether it's important for the SQL to be database agnostic, that's a priority question & I can't answer that.

@robocoder commented on March 25th 2018 Contributor

Matomo doesn't have cache coherence for option settings now. There's already a race condition when you have two concurrent requests that both call Option::set(). Each request has its own cached value and neither re-reads from the database after writing, so one of them is going to run to completion with an out-of-date value.

@diosmosis commented on March 25th 2018 Member

The current process will use an out of date value, but the next process execution will use the value we may not expect it to use.

@diosmosis commented on March 27th 2018 Member

Two ideas to make race conditions less of an issue here:

  • Change the semantics of Option::delete() and Option::deleteLike() (& any other delete methods if they exist) to set rows to NULL instead of actually DELETE-ing. This makes it so the race condition will only happen once, if it happens.
    I think this is do-able since Option::get() will return false if the option doesn't exist. So we can just check if the value is NULL and return false.

  • In the try-catch, when an exception is thrown, check if it is due to the row already existing. If not, re-throw. If yes, check if the existing value is different from the value we're saving & log a message including the option name & both values with log level INFO. If someone is doing log aggregation w/ Matomo, this error will at least not get lost.

This still leaves whoever is doing ops w/ something additional to consider, and the option table will be less human readable, but it will reduce the amount of times this error can happen.

I think an ideal fix would be creating a new type of component, like Options, which defines the options plugins use. Then we can ensure they always exist and ensure plugin option names don't clash. But that would be a large breaking change...

Note: this is just a proposal, not a review requirement. Waiting for other opinions.

@tsteur commented on March 27th 2018 Owner

Options are used in a lot of different ways, some of which are in concurrent contexts (like an option that's used as a mutex)

@diosmosis do you know where we do this? Might be worth at some point possibly to refactor this as nowadays options is often used just for like anything which it is maybe not supposed to be...

We could also if really needed have another method like setAtomic() which uses the old behaviour and it would be fine if this one is a bit slower in some cases.

Being MySQL-specific is totally fine as Matomo doesn't work with other SQLs anyway (AFAIK). So if there is a solution that provides us a fast execution that is also atomic I'd be 👍

Otherwise as @robocoder points out Option is already not really atomic anyway as the values are cached for the whole request etc (unless you call clearOptionCache or so).

@diosmosis commented on March 27th 2018 Member

@diosmosis do you know where we do this? Might be worth at some point possibly to refactor this as nowadays options is often used just for like anything which it is maybe not supposed to be...

@tsteur I think CronArchive might store a list or multiple lists in options during parallel execution of archiving? Sometimes PIDs are stored in options values to mark a command as being run by a specific process, etc. There might be some counters too. My knowledge is out of date so specifics escape me.

Otherwise as @robocoder points out Option is already not really atomic anyway as the values are cached for the whole request etc (unless you call clearOptionCache or so).

Actually thinking about it some more, I think the problem I was seeing was a phantom. I think I assumed the original SQL would throw on collision, but w/ ON DUPLICATE UPDATE, it of course wouldn't. So actually this PR is fine, and I will stop creating noise :)

This does highlight a problem in that there aren't really any atomic primitives, but I've had to take concurrency into consideration fairly often in the recent days. Not sure why GET_LOCK() wasn't used, though I remember using in some situations.

@robocoder commented on March 27th 2018 Contributor

There are no special privileges for MySQL's GET_LOCK, but since it's a named lock in a global namespace, it would require a naming strategy for shared hosting users, and as a result, be at risk of DoS attack. It probably also requires MySQL 5.7.5 or newer, since older versions don't share locks across master-slave servers.

@diosmosis commented on April 2nd 2018 Member

Seems like the consensus is to merge, so merging.

This Pull Request was closed on April 2nd 2018
Powered by GitHub Issue Mirror