@eramirezprotec opened this Issue on February 10th 2020 Contributor

Hi!

I'm getting this error every day since I updated Matomo to version 3.13.1:

INFO [2020-02-09 04:07:32] 9595 Failed to invalidate archived reports: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

I found some tickets about the deadlock situation but none of them speak about the deadlock happening when trying to invalidate archived reports.

Thank you very much.

@tsteur commented on February 10th 2020 Member

seeing here https://github.com/matomo-org/matomo/issues/6398#issuecomment-388370992 reports to invalidate are mentioned as well. Might be the same. How often do you see this error per day? Are you also using the log importer? Is there any chance you could enable mysql error log so we can get more information about the deadlock?

@pardvm commented on February 11th 2020

@tsteur,
I've able to relate this problem with this other: #15267
In that issue, you sugested to have a sleep between the launch of each cron archiver. We did it indeed but it doens't seems to work.

Do you have another idea on how to get several archivers synchronized in order only one site is processed?

@tsteur commented on February 11th 2020 Member

@pardvm have you tried to increase the sleep to say one minute (assuming you launch them hourly)? And I suppose the sleep is executed correctly with different sleep intervals for each archiver? Looking at the code this should work. Let me know if this doesn't work or if you confirm it's set up like this. I suppose we could in general also add some random minor sleep interval at the beginning of the script just to avoid these issues in general. @pardvm as a result of this do you also see deadlocks?

@pardvm commented on February 12th 2020

@tsteur,
We've increased the sleep to 30s and, yes: each archiver is launched 30s after the previous.

I'll keep observing and I'll come back here with the results.

@pardvm commented on February 13th 2020

@tsteur,
We haven't observed the deadlock since last modification regarding the sleep: cross fingers!
Thanks

@tsteur commented on February 13th 2020 Member

Awesome @pardvm fingers crossed. Let me know if they come back otherwise.

@eramirezprotec might be worth a try as well in case you're starting multiple archivers at the same time?

I will keep this issue open to at least at a tiny random sleep at the beginning of the archive script to prevent this issue a bit better.

@diosmosis @mattab thinking to add some sleep to the beginning of the archiver script in CoreArchiver.php command.

        // we randomly sleep up to half a second in case multiple archivers are being launched at the same time.
        // this can prevent deadlocks see https://github.com/matomo-org/matomo/issues/15545 although ideally we should
        // sleep here a bit longer to better avoid this.
        usleep(Common::getRandomInt(0, 500000));

For example. This would randomly sleep between 0 and 0.5seconds. From above we can see though that this might not be enough and we'd need to sleep randomly maybe between 0 and 10 or 0 and 30 seconds to minimise this issue. Not sure that's an issue? It might still not fully prevent any race conditions there but at least it be less likely. Fixing the actual race conditions be tricky. Would prefer not adding a new parameter just for that as it then doesn't work out of the box.

Alternatively, we could have an FAQ describing this behaviour and would also need to add a sleep for that example: https://github.com/matomo-org/matomo/issues/4903#issuecomment-48314857

However, FAQs may not be viewed so if it worked out of the box it be even better. Thought there maybe was an FAQ already but couldn't find it. On https://matomo.org/docs/setup-auto-archiving/ it doesn't describe it. So as part of this guide it could be maybe mentioned how to launch multiple archivers with the sleep instead of changing the code.

refs https://github.com/matomo-org/matomo/issues/14217

@pardvm commented on February 13th 2020

@tsteur,
It just have ocurred again!...
All day was fine but... it is still here (though randomly)
What a pity indeed.

P.S.- F.Y.I. @eramirezprotec is a colleague at work and we both are trying to figure this problem out

@pardvm commented on February 14th 2020

@tsteur, @eramirezprotec
For your information, bellow is the complete stack trace of the last occurrence of the problem:

 ERROR [2020-02-14 02:09:00] 2191 Got invalid response from API request: ?module=API&method=API.get&idSite=92&period=day&date=last15&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:11640:"SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction 

<a href='/0'>#0</a> /data/httpd-pwk-2.4.25/htdocs/analytics/libs/Zend/Db/Statement.php(300): Zend_Db_Statement_Pdo->_execute(Array) 
<a href='/1'>#1</a> /data/httpd-pwk-2.4.25/htdocs/analytics/libs/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array) 
<a href='/2'>#2</a> /data/httpd-pwk-2.4.25/htdocs/analytics/libs/Zend/Db/Adapter/Pdo/Abstract.php(238): Zend_Db_Adapter_Abstract->query('DELETE FROM `pi...', Array) 
<a href='/3'>#3</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Db/Adapter/Pdo/Mysql.php(309): Zend_Db_Adapter_Pdo_Abstract->query('DELETE FROM `pi...', Array) 
<a href='/4'>#4</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Db.php(276): Piwik\Db\Adapter\Pdo\Mysql->query('DELETE FROM `pi...', Array) 
<a href='/5'>#5</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Option.php(235): Piwik\Db::query('DELETE FROM `pi...', Array) 
<a href='/6'>#6</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Option.php(94): Piwik\Option->deleteNameLike('report_to_inval...', NULL) 
<a href='/7'>#7</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive/ArchiveInvalidator.php(194): Piwik\Option::deleteLike('report_to_inval...') 
<a href='/8'>#8</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive/ArchiveInvalidator.php(268): Piwik\Archive\ArchiveInvalidator->forgetRememberedArchivedReportsToInvalidate(92,Object(Piwik\Date)) 
<a href='/9'>#9</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive.php(508): Piwik\Archive\ArchiveInvalidator->markArchivesAsInvalidated(Array, Array, false) 
<a href='/10'>#10</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive.php(641): Piwik\Archive->invalidatedReportsIfNeeded() 
<a href='/11'>#11</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive.php(620): Piwik\Archive->cacheArchiveIdsAfterLaunching(Array, Array) 
<a href='/12'>#12</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive.php(549): Piwik\Archive->getArchiveIds(Array) 
<a href='/13'>#13</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Archive.php(303): Piwik\Archive->get(Array, 'numeric') 
<a href='/14'>#14</a> /data/httpd-pwk-2.4.25/htdocs/analytics/plugins/VisitsSummary/API.php(47): Piwik\Archive->getDataTableFromNumeric(Array) 
<a href='/15'>#15</a> [internal function]: Piwik\Plugins\VisitsSummary\API->get(92, 'day', 'last15', false, Array) 
<a href='/16'>#16</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/API/Proxy.php(237): call_user_func_array(Array, Array) 
<a href='/17'>#17</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Context.php(28): Piwik\API\Proxy->Piwik\API\{closure}() 
<a href='/18'>#18</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/API/Proxy.php(328): Piwik\Context::executeWithQueryParameters(Array, Object(Closure)) 
<a href='/19'>#19</a> /data/httpd-pwk-2.4.25/htdocs/analytics/plugins/API/API.php(435): Piwik\API\Proxy->call('\Piwik\Plugins\...', 'get', Array) 
<a href='/20'>#20</a> [internal function]: Piwik\Plugins\API\API->get(92, 'day', 'last15', false, Array) 
<a href='/21'>#21</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/API/Proxy.php(237): call_user_func_array(Array, Array) 
<a href='/22'>#22</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Context.php(28): Piwik\API\Proxy->Piwik\API\{closure}() 
<a href='/23'>#23</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/API/Proxy.php(328): Piwik\Context::executeWithQueryParameters(Array, Object(Closure)) 
<a href='/24'>#24</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/API/Request.php(265): Piwik\API\Proxy->call('\Piwik\Plugins\...', 'get', Array) 
<a href='/25'>#25</a> /data/httpd-pwk-2.4.25/htdocs/analytics/plugins/API/Controller.php(41): Piwik\API\Request->process() 
<a href='/26'>#26</a> [internal function]: Piwik\Plugins\API\Controller->index() 
<a href='/27'>#27</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/FrontController.php(589): call_user_func_array(Array, Array) 
<a href='/28'>#28</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/FrontController.php(165): Piwik\FrontController->doDispatch('API', false, Array) 
<a href='/29'>#29</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/dispatch.php(34): Piwik\FrontController->dispatch() 
<a href='/30'>#30</a> /data/httpd-pwk-2.4.25/htdocs/analytics/index.php(27): require_once('/data/httpd-pwk...') 
<a href='/31'>#31</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/CliMulti/RequestCommand.php(79): require_once('/data/httpd-pwk...') 
<a href='/32'>#32</a> /data/httpd-pwk-2.4.25/htdocs/analytics/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(257): Piwik\CliMulti\RequestCommand->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)) 
<a href='/33'>#33</a> /data/httpd-pwk-2.4.25/htdocs/analytics/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)) 
<a href='/34'>#34</a> /data/httpd-pwk-2.4.25/htdocs/analytics/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Piwik\CliMulti\RequestCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)) 
<a href='/35'>#35</a> [internal function]: Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)) 
<a href='/36'>#36</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Console.php(125): call_user_func(Array, Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)) 
<a href='/37'>#37</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Access.php(644): Piwik\Console->Piwik\{closure}() 
<a href='/38'>#38</a> /data/httpd-pwk-2.4.25/htdocs/analytics/core/Console.php(126): Piwik\Access::doAsSuperUser(Object(Closure)) 
<a href='/39'>#39</a> /data/httpd-pwk-2.4.25/htdocs/analytics/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Piwik\Console->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)) 
<a href='/40'>#40</a> /data/httpd-pwk-2.4.25/htdocs/analytics/console(27): Symfony\Component\Console\Application->run() 
<a href='/41'>#41</a> {main}, caused by: SQLSTATE[40001]: Serialization fa
@tsteur commented on February 16th 2020 Member

I wonder if it's related to something I've seen yesterday where we do a lot of deletes there on the option and the general cache https://github.com/matomo-org/matomo/blob/3.13.2/core/Archive/ArchiveInvalidator.php#L178-L195

As a result some other requests might try to insert/update these option entries while another one is trying to delete it etc. As above is in a for loop there could be such edge conditions maybe.

@pardvm when this happens it be great - if you can - shortly afterwards execute on the database this query

SHOW ENGINE INNODB STATUS

This would maybe let us better understand which lock it was waiting for.

If that's not easily possible, you could set the log level to debug temporarily in config/config.ini.php but this could have side effects that you see a lot of messages in the Matomo UI. You would probably want to change this line from Log::debug to Log::warn in https://github.com/matomo-org/matomo/blob/3.13.2/core/Db.php#L813

And also make sure to add a logging to file see https://matomo.org/faq/troubleshooting/faq_115/
image

If that's not easily possible, you could temporarily maybe change the above line to throw new Exception("Encountered deadlock: " . print_r($deadlockInfo, true));

@tsteur commented on February 16th 2020 Member

Hope this helps. And thanks for your help troubleshooting this

@pardvm commented on February 16th 2020

@tsteur ,
Pff!!. It would be very difficult to be awake when the error occurs again. However, I'll try the SHOW ENGINE INNODB STATUS command as soon I can.

Also, I'll evalute doing the temporary change in the code to flush $deadlockInfo

Thanks a lot for your support

@pardvm commented on February 17th 2020

@tsteur,
Find attached in innodb-status.txt the last result for show engine innodb status after the dead-lock occured.

Hope this could help.
Let me know about anything I can do to help

@tsteur commented on February 17th 2020 Member

@pardvm thanks very much for this.

The lock in here seems actually bit different and not for that particular case:

MySQL thread id 2891310, OS thread handle 140646037559040, query id 925132939 minfor16 10.0.3.116 piwik updating
UPDATE piwik_log_visit SET idvisitor = ..., visit_last_action_time = '2020-02-17 09:10:31', visit_exit_idaction_name = '10708013', visit_exit_idaction_url = '17443', visit_total_actions = visit_total_actions + 1 , visit_total_interactions = visit_total_interactions + 1 , visit_total_time = '85' WHERE idsite = '13' AND idvisit = '1226508397'
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 955 page no 3893779 n bits 136 index PRIMARY of table piwik.piwik_log_visit trx id 1425206090 lock_mode X locks rec but not gap waiting
Record lock, heap no 37 PHYSICAL RECORD: n_fields 79; compact format; info bits 0

refs https://github.com/matomo-org/matomo/issues/14619

@pardvm commented on February 20th 2020

@tsteur,
I've been able to capture another deadlock.
For your reference, please, find attached corresponding information:
innodb-status-20-02.txt

@Marc-Whiteman commented on February 20th 2020

We are seeing the same error while running ver 3.13.1:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-02-20 11:10:11 0x7feae7fff700
*** (1) TRANSACTION:
TRANSACTION 1436045592, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 62 row lock(s)
MySQL thread id 3730831, OS thread handle 140647735736064, query id 1169747052 minfor15.mas.local 10.0.3.39 piwik updating
DELETE FROM `piwik_option` WHERE option_name LIKE 'report_to_invalidate_90_2020-02-20%'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 872 page no 71 n bits 280 index PRIMARY of table `piwik`.`piwik_option` trx id 1436045592 lock_mode X waiting
Record lock, heap no 184 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 7265706f72745f746f5f696e76616c69646174655f375f323032302d3032; asc report_to_invalidate_7_2020-02; (total 38 bytes);
 1: len 6; hex 00005598505f; asc   U P_;;
 2: len 7; hex a30007951d0132; asc       2;;
 3: len 1; hex 31; asc 1;;
 4: len 1; hex 80; asc  ;;
@pardvm commented on February 20th 2020

@Marc-Whiteman,
As far I have been able to discover, the efects of that deadlock are not "visible" to the operators accessing Matomo dashboards (I gues that it is due to the fact that the problem arises during invalidation process of a previous archive during the current day.
Can you also confirm that?

@tsteur commented on February 20th 2020 Member

Cheers for this input. This helps me understand things better. Created https://github.com/matomo-org/matomo/pull/15603 which hopefully fixes it. Feel free to give it a try already if you can. Hoping this can make it into the release on Monday/Tuesday if not it'll be in 3.13.4 I reckon. If you could give us feedback if that helps that be great.

copying some info from the txt in here.

*** (1) TRANSACTION:
TRANSACTION 1436045592, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 62 row lock(s)
DELETE FROM `piwik_option` WHERE option_name LIKE 'report_to_invalidate_90_2020-02-20%'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 872 page no 71 n bits 280 index PRIMARY of table `piwik`.`piwik_option` trx id 1436045592 lock_mode X waiting
Record lock, heap no 184 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 7265706f72745f746f5f696e76616c69646174655f375f323032302d3032; asc report_to_invalidate_7_2020-02; (total 38 bytes);

*** (2) TRANSACTION:
TRANSACTION 1436045407, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
76 lock struct(s), heap size 8400, 93 row lock(s), undo log entries 151
MySQL thread id 3730828, OS thread handle 140646891386624, query id 1169747207 minfor16 10.0.3.116 piwik update
INSERT IGNORE INTO `piwik_option` (option_name, option_value, autoload) VALUES ('report_to_invalidate_58_2020-02-20_5626', '1', '0')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 872 page no 71 n bits 280 index PRIMARY of table `piwik`.`piwik_option` trx id 1436045407 lock_mode X locks rec but not gap
Record lock, heap no 184 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 30; hex 7265706f72745f746f5f696e76616c69646174655f375f323032302d3032; asc report_to_invalidate_7_2020-02; (total 38 bytes);

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 872 page no 71 n bits 280 index PRIMARY of table `piwik`.`piwik_option` trx id 1436045407 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 175 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
@tsteur commented on February 24th 2020 Member

This should be hopefully fixed. Let us know if it still happens with the latest release and I'll reopen.

This Issue was closed on February 24th 2020
Powered by GitHub Issue Mirror