@tsteur opened this Issue on March 14th 2022 Member

We've recently been getting the error below regularly. I assume this started with https://github.com/matomo-org/matomo/pull/18797

[2022-03-13 19:01:13] piwik.ERROR: Uncaught exception in API: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '127824106' for key 'PRIMARY' in libs/Zend/Db/Statement/Pdo.php:233 Stack trace: #0 libs/Zend/Db/Statement/Pdo.php(233): PDOStatement->execute() #1 libs/Zend/Db/Statement.php(300): Zend_Db_Statement_Pdo->_execute(Array) #2 libs/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array) #3 libs/Zend/Db/Adapter/Pdo/Abstract.php(238): Zend_Db_Adapter_Abstract->query('INSERT INTO log...', Array) #4 core/Db/Adapter/Pdo/Mysql.php(314): Zend_Db_Adapter_Pdo_Abstract->query('INSERT INTO log...', Array) #5 core/DataAccess/LogAggregator.php(325): Piwik\Db\Adapter\Pdo\Mysql->query('INSERT INTO log...', Array) #6 core/DataAccess/LogAggregator.php(351): Piwik\DataAccess\LogAggregator->createTemporaryTable('logtmpsegmented...', 'SELECT
/* trigg...', Array) #7 core/DataAccess/LogAggregator.php(561): Piwik\DataAccess\LogAggregator->generateQuery('count(distinct ...', Array, 'log_visit.visit...', '', '') #8 core/ArchiveProcessor.php(546): Piwik\DataAccess\LogAggregator->queryVisitsByDimension(Array, 'log_visit.visit...', Array, Array) #9

The goal of this issue is to avoid these problems. The segment query already seems to do a DISTINCT, so maybe we execute the query twice even when the data already exists and there is no need to execute it again?
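
Roughly, the failing flow looks like the sketch below. Table name, column type and dates are placeholders, not the exact Matomo schema; the real table is the truncated 'logtmpsegmented...' one in the stack trace, and the error message shows the temporary table's index from #18797 is a PRIMARY KEY:

-- Illustrative sketch only: create the per-connection temporary table with the
-- primary key that raises error 1062 when a duplicate idvisit is inserted.
CREATE TEMPORARY TABLE logtmpsegment_example (
    idvisit BIGINT UNSIGNED NOT NULL PRIMARY KEY
);

-- The segment query then fills it. The SELECT is already DISTINCT, so in theory
-- no duplicate idvisit should ever reach the PRIMARY KEY, yet the insert fails.
INSERT INTO logtmpsegment_example (idvisit)
SELECT DISTINCT log_visit.idvisit
FROM log_visit
WHERE log_visit.visit_last_action_time >= '2022-03-06 00:00:00'
  AND log_visit.visit_last_action_time <= '2022-03-13 23:59:59'
  AND log_visit.idsite IN (1)
  /* plus the segment's own conditions */
ORDER BY log_visit.idvisit ASC;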

@tsteur commented on March 15th 2022 Member

In case it helps, it seems to mostly happen on high-traffic instances. It could be, though, that it's unrelated to high traffic and that these instances simply use certain features.

@mikkeschiren commented on March 30th 2022

The site we have the issue on would be regarded as a high traffic site. For low traffic instances, we don't see the issue at the moment.

@bx80 commented on March 31st 2022 Contributor

I think this is going to be hard to track down without a full segment query. The exception was thrown by code inside the createTemporaryTable method just after a check to make sure the table doesn't already exist, so that would seem to rule out the query being executed twice?

Since it's likely that prior to the temporary index being added in #18797 duplicate records were being inserted, maybe it would be an acceptable workaround to use INSERT IGNORE when inserting the records into the temporary table?
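
For illustration, the workaround would only change the insert statement; this is a sketch with a placeholder table name, not an actual patch:

-- INSERT IGNORE silently drops rows that would violate the PRIMARY KEY, so a stray
-- duplicate idvisit no longer aborts the archiving run, at the cost of hiding it.
INSERT IGNORE INTO logtmpsegment_example (idvisit)
SELECT DISTINCT log_visit.idvisit
FROM log_visit
WHERE log_visit.idsite IN (1) /* plus the segment's conditions */;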

@sgiehl commented on April 5th 2022 Member

@tsteur I guess the error happens during archiving, right? Are you able to look up which segment is archived when it fails?

@tsteur commented on April 5th 2022 Member

@sgiehl In one of them I'm seeing it around visitorType, and there are some nested calls which I can see in the full backtrace.
FYI we're seeing this error 20+ times a day. I'll ping you in chat with the full details.

@sgiehl commented on April 12th 2022 Member

@tsteur I'm not able to reproduce that in any way, which makes it hard to find the problem.
Based on the error messages, it seems to happen regardless of the type of segment, period or date being archived.
I guess the error could basically have one of two causes:

  • The segment query returns an id twice, which should be unlikely due to the DISTINCT select. We could try whether using a GROUP BY instead would solve the issue (see the sketch after this list).
  • The method tries to insert the data "twice". That would mean the check whether the table already exists isn't working properly for some reason. But looking at the code, that shouldn't happen.
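
As a rough illustration of the first point, the deduplication could be expressed with GROUP BY instead of DISTINCT; these are illustrative queries only, not the exact generated SQL:

-- Current DISTINCT form:
SELECT DISTINCT log_visit.idvisit AS idvisit
FROM log_visit
WHERE log_visit.idsite IN (1) /* plus the segment's conditions */
ORDER BY log_visit.idvisit ASC;

-- Suggested GROUP BY variant:
SELECT log_visit.idvisit AS idvisit
FROM log_visit
WHERE log_visit.idsite IN (1) /* plus the segment's conditions */
GROUP BY log_visit.idvisit
ORDER BY log_visit.idvisit ASC;
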
@tsteur commented on April 12th 2022 Member

@sgiehl what would be the next steps to resolve this issue and make progress?

If the DISTINCT were the problem, I would assume it would be more reproducible, but maybe not.

@sgiehl commented on April 12th 2022 Member

That's a good question. Simply using an INSERT IGNORE would certainly solve the issue, but it might only hide the original problem.

Not sure if the cloud team could try to debug that a bit more, to see whether it is the query that contains duplicates, or whether the table already exists beforehand but records are inserted again nevertheless.

@tsteur commented on April 12th 2022 Member

I had a quick search for "when does mysql distinct not return distinct values" and "mysql distinct vs group by" as we can pretty much rule out the table exists check not working.

I'm finding a few posts like https://www.experts-exchange.com/articles/12282/Select-Distinct-is-returning-duplicates.html which also mention the GROUP BY you suggested earlier, although that may be more relevant when other columns are included. It's not quite clear.
Also, it might be hard to say whether a GROUP BY would be faster or slower (it seems to depend on indexes etc.; because we sort anyway, it might not make a big performance difference). Otherwise the INSERT IGNORE could potentially be a solution.

Can you get in touch with the cloud team so they know what to log?

@sgiehl commented on April 25th 2022 Member

@tsteur Pushed some code that could maybe help to identify the queries that produce duplicate records:
https://github.com/matomo-org/matomo/pull/new/m18797
I haven't tested the changes, and I'm also not sure whether logging an error might still make the archiving fail. So maybe we should log an info instead 🤔 What do you think?

@tsteur commented on April 26th 2022 Member

It would be good to test whether the log:

  • makes the archiving fail when using cron archiving
  • shows the error in the UI when this happens, assuming browser archiving is enabled
  • prints anything to the output, as the echoed JSON response may then no longer be readable

If only the cloud team applies this patch, then they can use a special logger that won't cause any issues and only logs to a file.

Is the purpose of the second query to check whether the insert works when using INSERT IGNORE? It might be good to still throw the exception in that case, just to make sure we don't ignore this error, as it could otherwise cause performance issues from running the same query twice (these queries are often very slow).
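
For reference, rows dropped by INSERT IGNORE surface as MySQL warnings, so whether duplicates were ignored could in principle be checked without re-running the slow SELECT. A minimal sketch with a placeholder table name, not part of the linked patch:

INSERT IGNORE INTO logtmpsegment_example (idvisit)
SELECT DISTINCT log_visit.idvisit
FROM log_visit
WHERE log_visit.idsite IN (1) /* plus the segment's conditions */;

SELECT @@warning_count;   -- greater than 0 means at least one row was ignored
SHOW WARNINGS;            -- lists the 1062 "Duplicate entry" values that were dropped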

@sgiehl commented on April 28th 2022 Member

@tsteur I would suggest we only let the cloud team apply the patch, so we can gather the queries that may produce the problem. I've added the INSERT IGNORE so the archiving no longer fails, but we could also leave that out as a first step, so that we only log the queries.

@tsteur commented on April 28th 2022 Member

@JasonMortonNZ @samjf could we log the query mentioned in https://github.com/matomo-org/matomo/compare/m18797?expand=1#diff-4140e491c64a9dcbf541f11e1db846adc3c27a4000810494ba8688e5feeee776R332 using our cloud file logger whenever the error occurs?

Because the temp table insert eventually works when it tries to archive again, we would need to re-run the SELECT part of that query fairly soon after this error happens and see if we get duplicate data.

@tsteur commented on June 9th 2022 Member

We've been logging the queries that produced this error and then executed the SELECT part. It's not clear why they would produce any duplicate visit IDs; often they don't even have any joins. I also created temporary tables and tested the identical insert query, but got no error either.

Example query:

SELECT /* trigger = CronArchive, idSegments = [90] */
                distinct log_visit.idvisit as idvisit
            FROM
                log_visit AS log_visit
            WHERE
                ( log_visit.visit_last_action_time >= "2022-06-05 22:00:00"
                AND log_visit.visit_last_action_time <= "2022-06-12 21:59:59"
                AND log_visit.idsite IN (1) )
                AND
                ( log_visit.config_device_type = 0 )
                group by idvisit having count(*) > 1
            ORDER BY
                log_visit.idvisit ASC

I've executed a few different queries and never received any duplicate visit ID (when testing the select) or error (when testing the insert).

This was tested from a MySQL client directly, not through application code.

What else could we log or do to resolve this issue @sgiehl ?

cc @JasonMortonNZ @samjf

@tsteur commented on June 9th 2022 Member

I verified that the table is empty when this error happens; the insert is then executed and the duplicate record error appears.

@tsteur commented on June 9th 2022 Member

FYI, I've also applied the patch below, but the error still seems to happen, meaning it doesn't look like the READ UNCOMMITTED transaction level is causing this.

diff --git a/core/DataAccess/LogAggregator.php b/core/DataAccess/LogAggregator.php
index cada4b1571..eb2a94611b 100644
--- a/core/DataAccess/LogAggregator.php
+++ b/core/DataAccess/LogAggregator.php
@@ -305,7 +305,7 @@ class LogAggregator
             // set uncommitted is easily noticeable in the code as it could be missed quite easily otherwise
             // we set uncommitted so we don't make the INSERT INTO... SELECT... locking ... we do not want to lock
             // eg the visits table
-            if (!$transactionLevel->setUncommitted()) {
+            if (0 && !$transactionLevel->setUncommitted()) {
                 $canSetTransactionLevel = false;
             }
         }
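
For context, a rough sketch of what the disabled code path does, as I understand it (the exact statements live in the $transactionLevel helper, not shown here): the session isolation level is lowered before the INSERT ... SELECT so that reading log_visit takes no locks, and restored afterwards.

-- Assumed equivalent of setUncommitted() and its restore step, not the literal code:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- ... the INSERT INTO <temporary table> SELECT DISTINCT log_visit.idvisit ... runs here ...
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- back to the InnoDB default
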
@tsteur commented on June 9th 2022 Member

Is there anything else we should log or try?

@EreMaijala commented on June 9th 2022 Contributor

@tsteur This may be a silly question since I'm not sure what's actually being done here, but is it by any chance possible that this happens when there are concurrent archivers and they happen to write the same IDs to the same temporary table at the same time?

@tsteur commented on June 10th 2022 Member

@EreMaijala I was wondering this as well, but it shouldn't be possible unless the DB connection was shared.

Temporary tables only exist for the DB connection and aren't visible to other DB connections. As soon as the DB connection is closed the temporary table is also removed.
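
A quick way to see that behaviour from two MySQL sessions (illustrative, placeholder table name):

-- Session A:
CREATE TEMPORARY TABLE logtmpsegment_example (
    idvisit BIGINT UNSIGNED NOT NULL PRIMARY KEY
);
INSERT INTO logtmpsegment_example VALUES (1), (2), (3);

-- Session B (separate connection): the table is not visible here, and B could even
-- create its own temporary table with the same name without conflicting with A.
SELECT * FROM logtmpsegment_example;   -- ERROR 1146 (42S02): Table ... doesn't exist

-- When session A disconnects, its temporary table is dropped automatically.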

The archiving basically works like this:

  • the core:archive script is running and launches sub-commands for each archive that needs to be processed
    • it launches a sub-command (e.g. using shell_exec in PHP) to archive a report. These sub-commands could technically share the DB connection of the parent, but that's not the case from what I can see on our platform.
    • it may launch multiple sub-commands in parallel (e.g. using shell_exec in PHP) to trigger the archiving of reports. This is not the case on our platform though, as we always launch only one command, meaning we can be sure there's no DB connection reuse happening.
  • another core:archive script may be running in parallel. This is guaranteed to use a new DB connection and wouldn't cause a problem
    • it launches another command (e.g. using shell_exec in PHP) to trigger the archiving
    • it may launch multiple commands in parallel (e.g. using shell_exec in PHP) to trigger the archiving

Sorry it's probably not clear what I'm describing above.

There are cases where sub-processes launched from one PHP command can share the DB connection of the parent command. From testing, this is not the case on our platform, and we're also not launching multiple commands in parallel, so this shouldn't be the cause here.

I believe we can rule out that parallel commands are trying to write into the same table.

See also https://dba.stackexchange.com/a/22103
