Integrity constraint violation: 1062 Duplicate entry '...' for key 'PRIMARY' for segment temporary table #18937

Closed
tsteur opened this issue Mar 14, 2022 · 28 comments · Fixed by #19662
Labels
Bug: For errors / faults / flaws / inconsistencies etc.
not-in-changelog: For issues or pull requests that should not be included in our release changelog on matomo.org.
Regression: Indicates a feature used to work in a certain way but it no longer does even though it should.

Comments

@tsteur
Member

tsteur commented Mar 14, 2022

We've recently been getting the error below regularly. I assume this started with #18797

[2022-03-13 19:01:13] piwik.ERROR: Uncaught exception in API: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '127824106' for key 'PRIMARY' in libs/Zend/Db/Statement/Pdo.php:233
Stack trace:
#0 libs/Zend/Db/Statement/Pdo.php(233): PDOStatement->execute()
#1 libs/Zend/Db/Statement.php(300): Zend_Db_Statement_Pdo->_execute(Array)
#2 libs/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array)
#3 libs/Zend/Db/Adapter/Pdo/Abstract.php(238): Zend_Db_Adapter_Abstract->query('INSERT INTO log...', Array)
#4 core/Db/Adapter/Pdo/Mysql.php(314): Zend_Db_Adapter_Pdo_Abstract->query('INSERT INTO log...', Array)
#5 core/DataAccess/LogAggregator.php(325): Piwik\Db\Adapter\Pdo\Mysql->query('INSERT INTO log...', Array)
#6 core/DataAccess/LogAggregator.php(351): Piwik\DataAccess\LogAggregator->createTemporaryTable('logtmpsegmented...', 'SELECT /* trigg...', Array)
#7 core/DataAccess/LogAggregator.php(561): Piwik\DataAccess\LogAggregator->generateQuery('count(distinct ...', Array, 'log_visit.visit...', '', '')
#8 core/ArchiveProcessor.php(546): Piwik\DataAccess\LogAggregator->queryVisitsByDimension(Array, 'log_visit.visit...', Array, Array)
#9

The goal of this issue is to avoid these problems. The segment query already seems to use DISTINCT, so maybe we execute the query twice in cases where the data already exists and there is no need to execute it again?

@tsteur tsteur added the Bug and Regression labels Mar 14, 2022
@tsteur tsteur added this to the 4.9.0 milestone Mar 14, 2022
@tsteur
Member Author

tsteur commented Mar 15, 2022

In case it helps, it seems to happen mostly on high-traffic instances. It could be unrelated to the traffic, though, and instead tied to certain features those instances use.

@mikkeschiren
Contributor

The site we have the issue on would be regarded as a high-traffic site. For low-traffic instances, we don't see the issue at the moment.

@bx80
Contributor

bx80 commented Mar 31, 2022

I think this is going to be hard to track down without a full segment query. The exception was thrown by code inside the createTemporaryTable method just after a check to make sure the table doesn't already exist, so that would seem to rule out the query being executed twice?

Since it's likely that duplicate records were already being inserted before the temporary index was added in #18797, maybe an acceptable workaround would be to use INSERT IGNORE when inserting the records into the temporary table?
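
A minimal sketch of that workaround, assuming a simplified stand-in for the generated temporary table (the table name logtmpsegment_example and the single idvisit column are illustrative, not Matomo's actual generated schema):

-- Simplified stand-in for the segment temporary table discussed above.
CREATE TEMPORARY TABLE logtmpsegment_example (
    idvisit BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (idvisit)
);

-- INSERT IGNORE skips rows that would violate the PRIMARY KEY instead of
-- aborting, so a stray duplicate idvisit no longer fails the whole INSERT ... SELECT.
INSERT IGNORE INTO logtmpsegment_example (idvisit)
SELECT DISTINCT log_visit.idvisit
FROM log_visit
WHERE log_visit.idsite IN (1);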

@sgiehl
Member

sgiehl commented Apr 5, 2022

@tsteur I guess the error happens during archiving, right? Are you able to look up which segment is archived when it fails?

@tsteur
Member Author

tsteur commented Apr 5, 2022

@sgiehl In one of them I'm seeing a segment around visitorType, and there are some nested calls which I can see in the full backtrace.
FYI we're seeing this error around 20+ times a day. I'll ping you in chat with the full details.

@justinvelluppillai justinvelluppillai modified the milestones: 4.9.0, 4.10.0 Apr 12, 2022
@sgiehl
Member

sgiehl commented Apr 12, 2022

@tsteur I'm not able to reproduce that in any way, which makes it hard to find the problem.
Based on the error messages it seems to happen regardless of the type of segment, period or date that is archived.
I guess the error could basically have two causes:

  • The segment query returns an id twice, which should be unlikely due to the DISTINCT select. We could try whether using a GROUP BY instead would solve the issue (see the sketch after this list).
  • The method tries to insert the data "twice". That would mean the check for whether the table already exists doesn't work properly for some reason. But looking at the code, that shouldn't happen.
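
For reference, a minimal sketch of the two deduplication variants mentioned in the first point, with an illustrative WHERE clause; both are expected to return each idvisit at most once:

-- Variant 1: DISTINCT, as the segment query does today.
SELECT DISTINCT log_visit.idvisit
FROM log_visit
WHERE log_visit.idsite IN (1);

-- Variant 2: GROUP BY, the alternative suggested above.
SELECT log_visit.idvisit
FROM log_visit
WHERE log_visit.idsite IN (1)
GROUP BY log_visit.idvisit;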

@tsteur
Member Author

tsteur commented Apr 12, 2022

@sgiehl what would be the next steps to resolve this issue and make progress?

If the DISTINCT were the problem, I would assume it would be more reproducible, but maybe not.

@sgiehl
Member

sgiehl commented Apr 12, 2022

That's a good question. Simply using an INSERT IGNORE would for sure solve the issue, but it might only hide the original problem.

Not sure if the cloud team could try to debug that a bit more, to see whether it is the query that returns duplicates, or whether the table already exists beforehand but records are inserted again nevertheless.

@tsteur
Member Author

tsteur commented Apr 12, 2022

I had a quick search for "when does mysql distinct not return distinct values" and "mysql distinct vs group by", since we can pretty much rule out the table-exists check not working.

I'm finding a few posts like https://www.experts-exchange.com/articles/12282/Select-Distinct-is-returning-duplicates.html which also mention the GROUP BY you brought up earlier, although that seems to apply more when other columns are included; it's not quite clear.
It's also hard to say whether a GROUP BY would be faster or slower (that seems to depend on indexes etc., and because we sort anyway it might not make a big performance difference). Otherwise the INSERT IGNORE could potentially be a solution.

Can you get in touch with the cloud team so they know what to log?

@sgiehl
Member

sgiehl commented Apr 25, 2022

@tsteur I've pushed some code that could maybe help to identify the queries that produce duplicate records:
https://github.com/matomo-org/matomo/pull/new/m18797
I haven't tested the changes, and I'm also not sure whether logging an error might still make the archiving fail. So maybe we should log an info instead 🤔 What do you think?

@tsteur
Member Author

tsteur commented Apr 26, 2022

It would be good to test whether the log

  • makes the archiving fail if using cron archiving
  • could show the error in the UI when this happens, assuming browser archiving is enabled
  • prints anything, since the echoed JSON may then not be readable

If only the cloud team applies this patch, though, then they can use a special logger that won't cause any issues and only logs to a file.

Is the purpose of the second query to check whether the insert works when using INSERT IGNORE? It might be good to still throw the exception in that case, just to make sure we don't silently ignore this error, as it could otherwise cause performance issues from running the same query twice (these queries are often very slow).

@sgiehl
Member

sgiehl commented Apr 28, 2022

@tsteur I would suggest we only let the cloud team apply the patch, so we can gather the queries that may produce the problem. I've added the INSERT IGNORE so the archiving won't fail any longer, but we could also leave that out as a first step and only log the queries.

@tsteur
Member Author

tsteur commented Apr 28, 2022

@JasonMortonNZ @samjf could we log the query mentioned in https://github.com/matomo-org/matomo/compare/m18797?expand=1#diff-4140e491c64a9dcbf541f11e1db846adc3c27a4000810494ba8688e5feeee776R332 using our cloud file logger whenever the error occurs?

Because the temp table insert eventually works when the archiver tries again, we would need to re-run the SELECT part of that query fairly soon after this error happens and see whether we get duplicate data.

@sgiehl sgiehl modified the milestones: 4.10.0, 4.11.0 May 5, 2022
@justinvelluppillai justinvelluppillai modified the milestones: 4.11.0, 4.12.0 Jun 7, 2022
@tsteur
Member Author

tsteur commented Jun 9, 2022

We've been logging the queries that produced this error and then executing the SELECT part. It's not clear why they would produce any duplicate visit IDs; often they don't even have any joins. I also created temporary tables and tested the identical insert query, but got no error either.

Example query:

SELECT /* trigger = CronArchive, idSegments = [90] */
    DISTINCT log_visit.idvisit AS idvisit
FROM
    log_visit AS log_visit
WHERE
    ( log_visit.visit_last_action_time >= "2022-06-05 22:00:00"
    AND log_visit.visit_last_action_time <= "2022-06-12 21:59:59"
    AND log_visit.idsite IN (1) )
    AND
    ( log_visit.config_device_type = 0 )
GROUP BY idvisit HAVING count(*) > 1
ORDER BY
    log_visit.idvisit ASC

I've executed a few different queries and never received any duplicate visit ID (when testing the select) or error (when testing the insert).

This was tested from a MySQL client directly, not through application code.

What else could we log or do to resolve this issue @sgiehl ?

cc @JasonMortonNZ @samjf

@tsteur
Member Author

tsteur commented Jun 9, 2022

I verified that the table is empty when this error happens; the insert is then executed and the duplicate record error appears.

@tsteur
Member Author

tsteur commented Jun 9, 2022

FYI I've also applied the patch below, but the error still seems to happen, meaning it doesn't look like the READ UNCOMMITTED isolation level is causing this.

diff --git a/core/DataAccess/LogAggregator.php b/core/DataAccess/LogAggregator.php
index cada4b1571..eb2a94611b 100644
--- a/core/DataAccess/LogAggregator.php
+++ b/core/DataAccess/LogAggregator.php
@@ -305,7 +305,7 @@ class LogAggregator
             // set uncommitted is easily noticeable in the code as it could be missed quite easily otherwise
             // we set uncommitted so we don't make the INSERT INTO... SELECT... locking ... we do not want to lock
             // eg the visits table
-            if (!$transactionLevel->setUncommitted()) {
+            if (0 && !$transactionLevel->setUncommitted()) {
                 $canSetTransactionLevel = false;
             }
         }

@tsteur
Member Author

tsteur commented Jun 9, 2022

Is there anything else we should log or try?

@EreMaijala
Contributor

@tsteur This may be a silly question since I'm not sure what's actually being done here, but is it by any chance possible that this happens when there are concurrent archivers and they happen to write the same IDs to the same temporary table at the same time?

@tsteur
Member Author

tsteur commented Jun 10, 2022

@EreMaijala I was wondering this as well but it shouldn't be unless the DB connection was shared.

Temporary tables only exist for the DB connection and aren't visible to other DB connections. As soon as the DB connection is closed the temporary table is also removed.
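
A minimal sketch of that scoping behaviour (the table name is made up for the example):

-- Connection A:
CREATE TEMPORARY TABLE logtmpsegment_example (
    idvisit BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (idvisit)
);

-- Connection B (a separate session): the temporary table created by connection A
-- is not visible here, so this fails unless a regular table with the same name exists.
SELECT COUNT(*) FROM logtmpsegment_example;

-- Closing connection A drops its temporary table automatically.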

The archiving basically works like this:

  • the core:archive script is running and launches sub-commands for each archive that needs to be processed
    • it launches a sub-command (e.g. using shell_exec in PHP) to archive a report. These sub-commands could technically share the DB connection of the parent, but from what I can see that's not the case on our platform.
    • it may launch multiple sub-commands in parallel (e.g. using shell_exec in PHP) to trigger the archiving of reports. This is not the case on our platform though, as we always launch only one command, meaning we can be sure no DB connection reuse is happening.
  • another core:archive script may be running in parallel. This is guaranteed to have a new DB connection and wouldn't cause a problem
    • it launches another command (e.g. using shell_exec in PHP) to trigger the archiving
    • it may launch multiple commands in parallel (e.g. using shell_exec in PHP) to trigger the archiving

Sorry it's probably not clear what I'm describing above.

There are cases where sub-processes launched from one PHP command can share the DB connection of the parent command. From testing, this is not the case on our platform, and we're also not launching multiple commands in parallel, so this shouldn't be happening here.

I believe we can rule out that parallel commands are trying to write into the same table.

See also https://dba.stackexchange.com/a/22103

@justinvelluppillai
Contributor

@tsteur is there anything we can do here yet to move this one along or should we move it out of the milestone?

@tsteur
Member Author

tsteur commented Aug 9, 2022

@justinvelluppillai that would be up to the core team but we definitely need to resolve this issue as it affects quite a few users and breaks the archiving.

@bx80
Contributor

bx80 commented Aug 10, 2022

Would it be worth writing a test script to build a suitable random dataset and then repeating the select / temporary table creation until the error is encountered? If we can automate recreating the issue reliably, even if only once every x attempts, then we could narrow down the possible causes.

@tsteur
Member Author

tsteur commented Aug 10, 2022

I did run these queries many times after we encountered the error on production and couldn't reproduce it, using the very same query just minutes later and running it many times. It may be really hard to reproduce. Since we know we don't execute the query twice, we could potentially also think about simply adding an IGNORE to the insert, and the problem would be gone? Just a random thought.

@bx80
Contributor

bx80 commented Aug 10, 2022

That seems like a pragmatic solution to me 👍 We've established that it only happens occasionally, it's hard to reproduce, and we're not running the query multiple times; since the example queries return a simple list of primary keys, ignoring a duplicate insert value won't affect the intended outcome.

@EreMaijala
Contributor

While I appreciate the pragmatic approach and hope for a quick resolution so that we can upgrade Matomo, I'm still a bit worried that the root cause is unknown. Not that I have any better idea, so maybe INSERT IGNORE is the best solution anyway.

@SteVio89

SteVio89 commented Aug 10, 2022

We also had the problem on our instance.

Additionally, we had another issue related to MariaDB, similar to #18864.

By adding enable_segments_cache = 0 to the config file, we were able to resolve both issues.

It may not be the best solution for the issue, but it can serve as a temporary workaround until a fix is found.
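
For anyone wanting to try this workaround, a sketch of the setting; I'm assuming it belongs in the [General] section of config/config.ini.php, so please verify against the docs for your Matomo version:

; config/config.ini.php (section placement assumed)
[General]
; Disable the segment cache so the failing temporary-table INSERT is avoided,
; at the cost of slower background archiving (see the comment below).
enable_segments_cache = 0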

@tsteur
Member Author

tsteur commented Aug 10, 2022

enable_segments_cache = 0

When this is set, the query will no longer be executed, but it also makes archiving the reports in the background slower.

While I appreciate the pragmatic approach and hope for a quick resolution to be able to upgrade Matomo, I'm still a bit worried that the root cause is unknown

I would be too. However, because we ruled out that the same query is executed twice and writes data into the table twice, I'm not as worried. It's neither executed in parallel by another process nor run one after the other. That would have been my biggest worry.

@justinvelluppillai
Contributor

Let's go with the INSERT IGNORE for this and progress the issue.

@sgiehl sgiehl self-assigned this Aug 23, 2022
@justinvelluppillai justinvelluppillai added the not-in-changelog label Sep 29, 2022