@orlitzky opened this Issue on December 19th 2020

Our log import (matomo-4.0.5) failed last night with the following error,

ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] Exception: /var/www/viabit.com/analytics/public/core/Tracker/Db/Pdo/Mysql.php(238): Error query: SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xF0\x9F\x92\x95\xF0\x9F...' for column `piwik_viabit`.`log_action`.`name` at row 1 In query: INSERT INTO log_action (name, hash, type, url_prefix) VALUES (?,CRC32(?),?,?) Parameters: array (
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 0 => '💕🦸 Pills without prescription on 🎁 www.NetDoctor.store 🎁 Buy Levitra from $0.84/pill 🦸💕Purchase Levitra Canada - Levitra 20 Mg Order Online',
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 1 => '💕🦸 Pills without prescription on 🎁 www.NetDoctor.store 🎁 Buy Levitra from $0.84/pill 🦸💕Purchase Levitra Canada - Levitra 20 Mg Order Online',
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 2 => 8,
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 3 => NULL,
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] )
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] Error query: SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xF0\x9F\x92\x95\xF0\x9F...' for column `piwik_viabit`.`log_action`.`name` at row 1 In query: INSERT INTO log_action (name, hash, type, url_prefix) VALUES (?,CRC32(?),?,?) Parameters: array (
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 0 => '💕🦸 Pills without prescription on 🎁 www.NetDoctor.store 🎁 Buy Levitra from $0.84/pill 🦸💕Purchase Levitra Canada - Levitra 20 Mg Order Online',
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 1 => '💕🦸 Pills without prescription on 🎁 www.NetDoctor.store 🎁 Buy Levitra from $0.84/pill 🦸💕Purchase Levitra Canada - Levitra 20 Mg Order Online',
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 2 => 8,
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] 3 => NULL,
ERROR BulkTracking[2020-12-19 07:55:01 UTC] [a1651] )

These are the two log lines responsible:

www.example.org 83.171.252.205 - - [18/Dec/2020:09:49:03 -0500] "GET /?s=%F0%9F%92%95%F0%9F%A6%B8%20Pills%20without%20prescription%20on%20%F0%9F%8E%81%20www.NetDoctor.store%20%F0%9F%8E%81%20Buy%20Levitra%20from%20%240.84%2Fpill%20%F0%9F%A6%B8%F0%9F%92%95Purchase%20Levitra%20Canada%20-%20Levitra%2020%20Mg%20Order%20Online HTTP/1.1" 200 23328 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"
www.example.org 83.171.252.205 - - [18/Dec/2020:09:49:04 -0500] "GET /?s=%F0%9F%92%95%F0%9F%A6%B8%20Pills%20without%20prescription%20on%20%F0%9F%8E%81%20www.NetDoctor.store%20%F0%9F%8E%81%20Buy%20Levitra%20from%20%240.84%2Fpill%20%F0%9F%A6%B8%F0%9F%92%95Purchase%20Levitra%20Canada%20-%20Levitra%2020%20Mg%20Order%20Online HTTP/1.1" 200 23328 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"
@peterbo commented on December 20th 2020 Contributor

This is probably a duplicate of https://github.com/matomo-org/matomo-log-analytics/issues/293 - did you convert your tables to utf8mb4 already?

@orlitzky commented on December 20th 2020

No... should I have? How do I do it?

@peterbo commented on December 20th 2020 Contributor

@orlitzky - Go to Settings->System check->Database abilities. The system check will tell you if there's something to do and, if yes, how it's done.

@orlitzky commented on December 20th 2020

Thanks! I've got a huge database, but php ./console core:convert-to-utf8mb4 seems to be working. Christmas is coming, why not add that to core:upgrade? =)

@peterbo commented on December 20th 2020 Contributor

@orlitzky I'm not 100% sure, but converting the collation of the entire database should not be a forced process connected to the update flow. This should rather be a step after considering potential risks, free db disk space, possible problems with some MySQL versions, etc.

@orlitzky commented on December 20th 2020

Armchair quarterbacking: regardless of the risks, the log importer and the database have to agree on the character set used in the database. Otherwise, (at best!) the log importer simply wont work -- as in this bug report. Clearly the log importer is designed to feed "real" utf8 into the database... IMO the most sensible way to make them agree is to make the database support it as well.

(You could teach the log importer to convert to utf8mb3, and make everyone use that in their databases, but why?)

Put another way, the current upgrade process is...

  1. User runs the log import for a while
  2. A visitor tries to put some unicode garbage in a querystring
  3. The nightly log import fails
  4. The user comes here and reports a bug
  5. Someone says to upgrade the database to utf8mb4
  6. The user does it
  7. He reimports/archives his logs from the previous day

That's going to happen anyway, eventually, and the end result is the same. Why not eliminate all those steps?

@peterbo commented on December 20th 2020 Contributor

As far as I know, the conversion is also advertised in the updater somewhere, but is not a mandatory step to do. I'm sure, things will be optimized in the future, based on user feedback

@sgiehl commented on December 20th 2020 Member

As @peterbo has written. Doing that in a "normal" update is a bit too risky. Also it is not supported by all MySQL versions and depends on some InnoDB settings.
Note: For new installs UTF8mb4 should be automatically used if available.

@orlitzky What database charset did you have set in your config.ini.php before the utf8mb4 migration?
Note: If it is set to utf8 Matomo should automatically remove/replace unsupported chars before inserting anything in the database. When it's set to utf8mb4 that won't be done, as Matomo assumes the database is able to handle the chars.

@orlitzky commented on December 20th 2020

I don't have a charset specified in config.ini.php, but the global mariadb client/server defaults were utf8 until I started dealing with this yesterday.

@tsteur commented on December 20th 2020 Member

That's going to happen anyway, eventually, and the end result is the same. Why not eliminate all those steps?

Totally understand where you are coming from and agree. Just fyi wanted to add that most users don't run into this problem because they aren't tracking any special characters or emojis vs it could create problems during the upgrade when forcing utf8mb4.

@orlitzky commented on December 21st 2020

FWIW my conversion finally finished. I manually invalidated the reports for the 18th, re-archived, and everything is back to normal for now. Thanks again for the hint.

Powered by GitHub Issue Mirror