
allocateNewArchiveId: Cannot get named lock allocateNewArchiveId #6417

Closed
mattab opened this issue Oct 11, 2014 · 24 comments
Labels: Bug, Major
Milestone: Piwik 2.9.0
Assignee: tsteur

mattab (Member) commented Oct 11, 2014

There have been several reports of this error: allocateNewArchiveId: Cannot get named lock allocateNewArchiveId

2014-10-10 12:47:17 ERROR [2014-10-10 12:47:17] [6c725] Got invalid response from API request: index.php?module=API&method=API.get&idSite=1681&period=day&date=last2&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:95:"allocateNewArchiveId: Cannot get named lock allocateNewArchiveId.piwik_archive_numeric_2014_10.";}'

The goal of this issue is to investigate and make sure this bug does not occur again.

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Oct 11, 2014
@mattab mattab added this to the Piwik 2.9.0 milestone Oct 11, 2014
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Oct 11, 2014
tsteur (Member) commented Oct 15, 2014

I had a look, but I'm not sure how to fix it unless we get rid of the lock completely (for instance, by using other IDs for idarchive instead of our auto increment), as it seems the error can always happen somehow. But maybe I'm wrong.

Does anyone know whether it is possible to do GET_LOCK(0.1)? I haven't found anything about this. The docs say "seconds", but I'm not sure whether it would work with something like this. Update: I did a little test and the value seems to be converted to 0 seconds. I'm not sure whether there is another way to wait for a lock for something like 100ms; I couldn't find anything. I was expecting the second parameter to be only a timeout and the lock to be released immediately, but according to a check I did, this is not the case.

I had one GET_LOCK() and released it after 30 seconds. I had another GET_LOCK(..., 100), and although the first lock was released after 30 seconds, I got the lock after 100 with my PHP 5.3 / MySQL 5.5 setup.
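Since GET_LOCK() apparently truncates fractional timeouts to whole seconds, one possible workaround is to poll a non-blocking acquire in a loop. Below is only an illustrative sketch in Python, not Piwik code: an in-process lock stands in for MySQL's named lock, and `acquire(blocking=False)` plays the role of `GET_LOCK(name, 0)`.

```python
import threading
import time

def acquire_with_ms_timeout(lock, timeout_ms, poll_ms=10):
    """Poll a non-blocking acquire until timeout_ms elapses.

    Analogous to calling MySQL's GET_LOCK(name, 0) in a loop to emulate
    a sub-second wait, given that the timeout argument of GET_LOCK()
    only works in whole seconds here.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if lock.acquire(blocking=False):  # like GET_LOCK(name, 0) == 1
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000.0)

lock = threading.Lock()
lock.acquire()                             # another "connection" holds the lock
print(acquire_with_ms_timeout(lock, 100))  # False: gave up after ~100 ms
lock.release()
print(acquire_with_ms_timeout(lock, 100))  # True: acquired immediately
```

A real MySQL variant would issue `SELECT GET_LOCK(name, 0)` on each iteration and sleep between attempts, at the cost of extra round trips per retry.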

diosmosis (Member) commented:

Maybe we can use an atomic count to keep track of the max archive ID instead of using a lock? I.e., instead of getting a lock and doing a SELECT + INSERT, we could store the max idarchive in an archive row and do an update/select like this: http://stackoverflow.com/questions/15034136/select-updated-rows-in-mysql#answer-15034351 . This way we effectively reserve an ID and can safely insert afterwards. I think it might work.

tsteur (Member) commented Oct 16, 2014

I'm not sure how you would set the ID here, or rather how you would get the max idarchive?

Wouldn't this be the same as this?

 $insertSql = "INSERT INTO $numericTable "
    . " SELECT @var := IFNULL( MAX(idarchive), 0 ) + 1,
                        '" . $locked . "',
                        " . (int)$idSite . ",
                        '" . $date . "',
                        '" . $date . "',
                        0,
                        '" . $date . "',
                        0 "
    . " FROM $numericTable as tb1";
Db::get()->exec($insertSql);

$selectIdSql = "SELECT @var";
$id = Db::get()->fetchOne($selectIdSql, $locked);

The problem is that if two or more requests select MAX(idarchive) at the same time, we would again need a lock, since with InnoDB the whole table is not locked.

diosmosis (Member) commented:

My thought does not involve getting MAX(idarchive). The insertion happens in two steps:

  1. Reserve a new idarchive:

     Db::query("UPDATE archive_numeric SET value = value + 1 WHERE name = 'specialIdArchiveCount' AND @reservedId := value");
     $reservedId = Db::fetchOne("SELECT @reservedId");

  2. Use the new idarchive in the insert:

     Db::query("INSERT INTO archive_numeric (idarchive, ...) VALUES (?, ...)", array($reservedId));

Complications arise when the special archive doesn't exist, but I have not thought that far. An option could be used for each archive table as well.
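The reserve-then-insert idea above can be sketched with SQLite standing in for MySQL. All table and column names below are simplified placeholders, not Piwik's real schema, and a transaction provides the atomicity that the `@reservedId := value` trick provides in MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO sequence VALUES ('idarchive', 0)")
conn.execute("CREATE TABLE archive_numeric (idarchive INTEGER, name TEXT)")

def reserve_id(conn, seq_name):
    # Step 1: atomically bump the counter and read the reserved value.
    # The UPDATE and SELECT run in one transaction, so no other writer
    # can interleave between them.
    with conn:
        conn.execute(
            "UPDATE sequence SET value = value + 1 WHERE name = ?", (seq_name,)
        )
        return conn.execute(
            "SELECT value FROM sequence WHERE name = ?", (seq_name,)
        ).fetchone()[0]

# Step 2: insert using the reserved ID; no named lock is needed.
new_id = reserve_id(conn, "idarchive")
conn.execute(
    "INSERT INTO archive_numeric (idarchive, name) VALUES (?, ?)", (new_id, "done")
)
print(new_id)  # → 1
```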

tsteur (Member) commented Oct 16, 2014

Using another table for incrementing the ID would work. We did this in some other projects to be able to shard if needed; that's why we didn't use the auto_increment feature in MySQL. What we did there is basically the following:

Table consisting of column name and value.

UPDATE table SET value = LAST_INSERT_ID(value + :count) WHERE name = 'archive_numeric_2014_01';

followed by

SELECT LAST_INSERT_ID()

This works because the last insert ID is managed per connection. The :count parameter would allow you to optionally reserve multiple IDs in one step if needed.
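A minimal sketch of this sequence-table pattern, assuming a simplified schema. SQLite has no LAST_INSERT_ID(), so the value is read back inside the same transaction, which is equivalent for a single connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO sequence VALUES ('archive_numeric_2014_01', 0)")

def reserve_ids(conn, name, count):
    """Reserve `count` consecutive IDs in one step, mirroring
    UPDATE ... SET value = LAST_INSERT_ID(value + :count)."""
    with conn:
        conn.execute(
            "UPDATE sequence SET value = value + ? WHERE name = ?", (count, name)
        )
        last = conn.execute(
            "SELECT value FROM sequence WHERE name = ?", (name,)
        ).fetchone()[0]
    # The reserved block is (last - count, last].
    return list(range(last - count + 1, last + 1))

print(reserve_ids(conn, "archive_numeric_2014_01", 3))  # → [1, 2, 3]
print(reserve_ids(conn, "archive_numeric_2014_01", 2))  # → [4, 5]
```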

@mattab any opinions?

diosmosis (Member) commented:

Can you do it with one table or do you need one per archive numeric table?

diosmosis (Member) commented:

Ah ok, I saw the name part of the WHERE.

diosmosis (Member) commented:

If a new table is used, I'd like to request that name refer not to the table per se, but to the name of the atomic count, so it can be used in other ways. This would require being able to reset a count; I'm not sure whether that can be done with LAST_INSERT_ID(...) being used?

tsteur (Member) commented Oct 16, 2014

I'm not sure what you meant by your first sentence?

It should work with LAST_INSERT_ID(). I'm not sure whether there is any difference between using it and @var. I only posted it because we used it this way and it worked perfectly even under load.

diosmosis (Member) commented:

I mean instead of just using it for table IDs, use it as a table for MySQL-based semaphores (so name could be, for example, 'CronArchive.someDistributedCount').

tsteur (Member) commented Oct 16, 2014

Yeah, sure, this can be done out of the box I think. Basically this is what we would use it for with idarchive. Do you have a specific use case in mind? That might help us decide whether we want to create such a table or keep the lock.

diosmosis (Member) commented:

I've already implemented something similar (it uses the Option table and value = value + 1) for #5363 locally, so using the table for this would just mean less duplication. The class I've created is called Semaphore and is located in core/Concurrency (in case someone decides to implement the table in this ticket).

@mattab mattab added the Critical Indicates the severity of an issue is very critical and the issue has a very high priority. label Oct 23, 2014
tsteur (Member) commented Oct 30, 2014

@mattab what's the thought here? Is there actually a need to put both a Critical and a Major label here? I've started ignoring those labels, as they are set so often and, to me, sometimes "randomly".

@mattab mattab removed the Critical Indicates the severity of an issue is very critical and the issue has a very high priority. label Oct 30, 2014
mattab (Member, Author) commented Oct 30, 2014

@tsteur I meant to put only one label, not two.

It's a Major bug as we regularly have it on Cloud and Enterprise. If you have an idea how it could be solved please give it a try, it would be fantastic to get rid of this issue!

tsteur (Member) commented Oct 31, 2014

As discussed, we could get rid of at least one of those two locks by having a "counter" in MySQL. I can work on this.

I'm not sure whether we can get rid of the second one (I haven't really thought about it yet). Maybe it would already occur less often, or it might even be fixed by removing the first one (probably not).

mattab (Member, Author) commented Oct 31, 2014

> as discussed we could get rid of at least one of those two by having a "counter" in MySQL. Can work on this.

+1 it's a good start to get rid of one of the two locks.

> Not sure whether we can get rid of the second one (haven't really thought about this one yet).

Maybe you can think about it once you start removing the first lock. Would of course be nice to get rid of both locks if possible 👍

tsteur (Member) commented Nov 2, 2014

I am currently implementing the "sequence" table and came across a problem that I had already expected.

Imagine we have a new table to count the sequences. When updating Piwik, we'll have to "migrate" the data to this table, meaning we have to create several initial counter values such as archive_numeric_2014_10 => 4822 (name => value/currentMaxId).

Usually we would show the SQL for the migration during the update process, but the ID might have already changed by the time someone actually executes the update (either manually, via console, or via the UI). To work around this, we could simply add +10000 to the next archive ID, so even if some more IDs are created in the meantime, it would not fail.

The problem is that the code will already be updated and will want to use the new sequence table while the SQL update has not been executed yet. So no archives can be created during this time, meaning we also cannot migrate those sequence IDs in the background, as the table won't exist yet.

So we'll probably have to detect whether the sequence table is installed, and if so, migrate in the background (this should be very fast, as we only have to get the current max archive ID and insert this value into an empty table); otherwise, if the table is not installed yet, go with the old logic. Alternatively, we could say something like "when creating the next archive table, use the sequence table, otherwise still use the lock", although I'm not sure how to implement that. I need to think a bit more about it and will check which is easier to implement.
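The detect-and-fall-back idea could look roughly like the following sketch (hypothetical names and a simplified schema; SQLite's sqlite_master plays the role of a MySQL SHOW TABLES / information_schema check):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive_numeric_2014_10 (idarchive INTEGER)")
conn.execute("INSERT INTO archive_numeric_2014_10 VALUES (4822)")

def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None

def next_archive_id(conn, archive_table):
    if not table_exists(conn, "sequence"):
        return None  # old code path: acquire the named lock instead
    with conn:
        cur = conn.execute(
            "SELECT value FROM sequence WHERE name = ?", (archive_table,)
        )
        if cur.fetchone() is None:
            # Migrate the current max archive ID on the fly.
            max_id = conn.execute(
                f"SELECT IFNULL(MAX(idarchive), 0) FROM {archive_table}"
            ).fetchone()[0]
            conn.execute("INSERT INTO sequence VALUES (?, ?)", (archive_table, max_id))
        conn.execute(
            "UPDATE sequence SET value = value + 1 WHERE name = ?", (archive_table,)
        )
        return conn.execute(
            "SELECT value FROM sequence WHERE name = ?", (archive_table,)
        ).fetchone()[0]

print(next_archive_id(conn, "archive_numeric_2014_10"))  # → None (no sequence table yet)
conn.execute("CREATE TABLE sequence (name TEXT PRIMARY KEY, value INTEGER)")
print(next_archive_id(conn, "archive_numeric_2014_10"))  # → 4823
```

As noted above, two processes could attempt the on-the-fly migration at the same time; in a real implementation the sequence row insert would need to tolerate a duplicate-key error from the other process.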

tsteur (Member) commented Nov 2, 2014

Using the sequence table starting from the next archive table, and the lock for the current archive table, should work, but it might lead to issues if someone updates around the beginning of a month. Also, we could never get rid of the code that uses the lock, and it could lead to inconsistent IDs when the lock is used due to some DB failure even though the sequence should be used. ==> This is not an option. (I know this all sounds a bit confusing; simply ignore it if you do not understand.)

We could detect in the background whether the table exists and, if not, create it. We could probably also detect whether a sequence was initialized (meaning whether we have migrated the max archive ID to the sequence table) and, if not, migrate it on the fly. I don't think this would add any overhead, but it is possible that two processes migrate the max archive ID at the same time, which could maybe lead to errors again; I don't think so, though. I will probably go with this solution.

Otherwise, we could just create the sequence table and migrate the max archive ID during the update, but archive creation would not work between the time of updating the files and finishing the SQL update. Is this a problem? @mattab

mattab (Member, Author) commented Nov 2, 2014

> Otherwise we could just create the sequence table and migrate the max archive id during the update but any archive creation would not work between the time of updating the files and finishing the SQL update. Is this a problem?

AFAIK Piwik will return an error message in the API output when the DB schema is pending upgrades, so it should be OK to do this! (The archiving will not have been triggered anyway.)

tsteur (Member) commented Nov 3, 2014

See pull request.

I removed the lock used to create the new archive ID. The second lock is still there, but this problem should not really occur anymore: the second lock no longer locks the whole table but only the "table + archiveId" combination, which should be fine. We could maybe remove the second lock completely ( https://github.com/piwik/piwik/pull/6583/files#diff-f0dc4d4d133f84005fcebc1e10f155c0R159 ) if InnoDB is used, as there are also indexes on both columns (it should not lock the whole table), but I'm not sure whether we can remove it, as I still do not 100% understand what is happening in the archiver.

It would be nice to run some tests with the sequence table to see whether any problems occur under load (they should not, but one never knows). Otherwise, the system tests say everything still works, so I reckon the change should be fine.

tsteur added a commit that referenced this issue Nov 4, 2014
refs #6417 this should fix some tests. Start initial value by 0 so first generated id will be 1

fix omnifixture creation was broken

refs #6417 wondering if omnifixture creation worked

refs #6417 acquire a lock per archive id which should be better

refs #6417 we have to generate the SQL in the update itself otherwise the update would fail at some point in the future if the tables signature changes

refs #6417 split updates into a separate update file otherwise it would not be executed for devs using Piwik from git

refs #6417 added some documentation

refs #6417 Piwik 2.9.0-b2 was released meaning we have to move it to b3

refs #6417 Piwik 2.9.0-b3 was released meaning we have to move it to b4
mattab (Member, Author) commented Nov 4, 2014

> The second lock no longer locks the whole table but only the "table + archiveId", which should be fine.

Oh interesting, was this a bug before that we locked the whole table?

> but not sure whether we can remove it as I still do not 100% understand what is happening in the archiver.

Can you check that the following use case still works:

  • When two widgets request the same date+period+idsite at the same time
  • Expected: only one archive is processed (and other widgets "wait" for the archive to finish to grab it)
  • Not expected: both widgets each process an archive (processing the same data twice)

tsteur added a commit that referenced this issue Nov 4, 2014
tsteur (Member) commented Nov 4, 2014

It creates the same archive multiple times, but the same happens with the master branch. I'm not sure whether you verified this in #4918 or only looked at the timing. I see the same behavior when adding the 20 seconds to the archiver (the dashboard loads after about 23 seconds) as you described in that issue, but when verifying the archives, one can see it does create the same archive multiple times.

mattab (Member, Author) commented Nov 5, 2014

Likely I didn't check properly, but I intended to check (and thought it worked, but possibly I made a mistake). If it's not working in master already, then it's fine. Maybe we should create a ticket for this enhancement.

tsteur added a commit that referenced this issue Nov 6, 2014
tsteur added a commit that referenced this issue Nov 6, 2014
@tsteur tsteur self-assigned this Nov 7, 2014
mattab (Member, Author) commented Nov 7, 2014

Closing, as @tsteur has finished this nice sequence-table solution and fixed the other "deletion" lock 👍

@mattab mattab closed this as completed Nov 7, 2014