Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use utf8mb4 character set if possible #15618

Merged
merged 41 commits into from May 17, 2020
Merged

Use utf8mb4 character set if possible #15618

merged 41 commits into from May 17, 2020

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented Feb 24, 2020

fixes #9785

@sgiehl sgiehl added the Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. label Feb 24, 2020
@sgiehl sgiehl added this to the 4.0.0 milestone Feb 24, 2020
@sgiehl sgiehl force-pushed the utf8mb4 branch 2 times, most recently from 99db12f to c34c12e Compare March 2, 2020 09:08
@sgiehl sgiehl force-pushed the utf8mb4 branch 2 times, most recently from 9917e48 to 444236b Compare March 2, 2020 15:50
@sgiehl sgiehl force-pushed the utf8mb4 branch 4 times, most recently from 0d6f3cc to 82aab9d Compare March 24, 2020 12:19
@sgiehl
Copy link
Member Author

sgiehl commented Mar 24, 2020

@tsteur @diosmosis could you maybe have a first look at the PR?
In general the changes should already work. But for sure will need some more excessive testing

What still needs to be clarified:

  • When updating to Matomo 4, or installing it newly then, it can still be the case that utf8 charset will be used, if utf8mb4 is not available. If that is the case, the update to utf8mb will be actually never tried again. Wondering if we should maybe try to do the conversion for every update automatically if utf8mb4 is not yet configured.

  • Using the database charset config to determine if the tables actually use utf8mb4 might not be very save. Someone could maybe change the charset even though the update wasn't performed. Not sure if we should handle that case and maybe store the used charset in the option table instead or maybe simply ignore that case, as it likely shouldn't happen often. Also saving the config in the update process might fail or not work on multi server env 🤔

  • I've changed the length of various fields, so the key length fits again. Imho the new length shouldn't make any problem, but be good if someone else have a thought about that as well...

@sgiehl sgiehl added Needs Review PRs that need a code review and removed Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. labels Apr 1, 2020
@sgiehl sgiehl marked this pull request as ready for review April 20, 2020 08:54

if (empty($result) || $result['Value'] !== 'ON') {
return 'utf8'; // innodb_file_per_table is required for utf8mb4
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache the result of this function? And haven't been following this issue, but do we need to do the innodb_file_per_table check if character_set_database is set to utf8mb4, or can we just assume it will work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we can set Config::getInstance()->database['charset'] if it's not set to the default...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I see this is already done during installation/update, so maybe that's enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering the same if we need to check for SHOW VARIABLES LIKE 'character_set_database' after innodb_file_per_table ?

@diosmosis
Copy link
Member

Wondering if we should maybe try to do the conversion for every update automatically if utf8mb4 is not yet configured.

It would have to be a major update though, correct? Since it would update a bunch of columns (not sure if those affect log tables). If it's too intensive an operation, maybe it could be a command that users can run when they're ready.

Using the database charset config to determine if the tables actually use utf8mb4 might not be very save.

Would it help to check the database charset config, and check if a table has the right charset/column length to check if the update was run? This would still be a problem if some tables have different charsets, but that sounds like a real edge case.

I've changed the length of various fields, so the key length fits again. Imho the new length shouldn't make any problem, but be good if someone else have a thought about that as well...

Only one that looks like it could be an issue is the option_name change. There might be some dynamically created one that somehow gets too long. Maybe we can report when the option name is too long somehow?

@Findus23
Copy link
Member

Just for reference, this is how Nextcloud handled the upgrade: (They did it in a console command)

https://docs.nextcloud.com/server/18/admin_manual/configuration_database/mysql_4byte_support.html

@sgiehl
Copy link
Member Author

sgiehl commented Apr 22, 2020

Sure. Providing a command to do the migration makes it possible for those people that have shell access. But those using a shared hoster without shell access, wouldn't have the possibility to upgrade later. But maybe we could force the same update in Matomo 5 (for those that haven't switched till then). So using utf8mb4 is kind of optional for Matomo 4, but a requirement for Matomo 5. Would that be reasonably @mattab @tsteur

@diosmosis
Copy link
Member

The mechanism to make the change could also be in the UI if needed, right? Doesn't have to be a command.

@sgiehl
Copy link
Member Author

sgiehl commented Apr 29, 2020

@diosmosis I've updated the PR.

So here a summary of the current behavior. Maybe @tsteur or @mattab have some thoughts/comments on that as well:

  • When updating to Matomo 4
    • all columns used in database indices that are too long will be adjusted in length
    • In case utf8mb4 is available all tables will be converted as well.
    • The used database charset in config will be automatically set to utf8 or utf8mb4 depending on the availability of utf8mb4 (when set to utf8mb4 we assume it's available and do not replace 4 byte chars in tracking requests)
  • If the conversion did not take place while updating
    • There is a new console command core:convert-to-utf8mb4, which will trigger converting the tables and updating the used charset in config
    • For people not being able to run a console command I would suggest to add a new FAQ explaining how to convert all tables manually and adjust the config afterwards
  • New installs will automatically use utf8mb4 and fallback to utf8 if not available
  • There is an (optional) diagnostic check that checks if utf8mb4 is available and in use. If not available and/or not in use it will show a warning and suggest to run the command if possible
  • There's (already) an update for Matomo 5.0.0-b1, which will force the conversion to utf8mb4 if it was not done before. I'd also suggest to create a follow up issue to remove utf8 support for Matomo 5 and add it as requirement.

@mattab
Copy link
Member

mattab commented Apr 29, 2020

There's (already) an update for Matomo 5.0.0-b1, which will force the conversion to utf8mb4 if it was not done before.

just checking, we wouldn't force the conversion if it's not available though, to prevent errors/breaking instance?

@mattab mattab requested a review from tsteur April 29, 2020 20:41
Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgiehl there's still a test failing in https://travis-ci.org/github/matomo-org/matomo/jobs/686048413#L668

Would expect that only tables starting with archive_ are returned.

Otherwise looked good. The only thing I would maybe do is when executing the command instead of showing the table names
image

Maybe we could show all the SQL queries we will be executing in case someone wants to execute these manually? And then show at the end a command ./console config:set --section=database --key=charset --value=utf8mb4. I reckon some bigger installs like us might prefer to execute these queries manually as it can take a long time. Eg you want to update one log table over night, then another table another night when there is low traffic etc. (as you have to pause tracking during this time likely). There could also be a question whether they want the SQL plus config set instructions to be printed in order to execute it manually or whether they want it to be executed automatically?

This also makes me think if there should be an option to automatically disable tracking while the log tables are being migrated. Not sure but this might be even a good idea to do by default? If someone has low traffic, then tracking be down only for short time anyway. If someone has a a big DB, then they might get in trouble when keeping tracking enabled as I'd expect the inserts to be not processed but haven't tested this. It seems converting charset has no online DDL supporthttps://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html#online-ddl-table-operations so would expect it might be best to disable tracking during the migration of log tables. fyi @mattab

core/DbHelper.php Outdated Show resolved Hide resolved
@sgiehl
Copy link
Member Author

sgiehl commented May 13, 2020

@tsteur applied some feedback and adjusted the command so it is able to disable tracking and only show all required commands...


$isWritable = Piwik::hasUserSuperUserAccess() && CoreAdminController::isGeneralSettingsAdminEnabled();
$dbSettings = new Settings();
if ($dbSettings->getUsedCharset() !== 'utf8mb4' && DbHelper::getDefaultCharset() === 'utf8mb4') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fyi was worried this might issue DB requests during tracking but double checked looks like these system settings aren't used during tracking 👍 Might be good though to change the if to if ($isWritable && $dbSettings->getUsedCharset() ...)? This way eg on cloud we avoid these queries there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be last thing. feel free to merge if/once tests pass

@sgiehl sgiehl force-pushed the utf8mb4 branch 3 times, most recently from 4a689bb to 27d4c13 Compare May 14, 2020 14:02
@sgiehl
Copy link
Member Author

sgiehl commented May 14, 2020

@tsteur I've merge all related PRs already, but had to change something here in order to get the tests running. See 7b06248
Maybe you can have a quick look at that commit and merge the PR if it's fine...

@tsteur
Copy link
Member

tsteur commented May 15, 2020

@sgiehl not sure...
in 7b06248#diff-6bfd1a5b94264d4d0e0c03e1ebd70249R446 should it be maybe getPluginsLoadedAndActivated instead of only getting loaded plugins? Do you know why loadActivatedPlugins didn't work? Bit scared we're loading more plugins than we should.

BTW another option could be when posting the event to try

Piwik::postEvent('Db.getTablesInstalled', [&$allMyTables], false, (Manager::getAllPluginsNames()); // and if this doesn't work could try the array_keys($loadedPlugins)

@sgiehl
Copy link
Member Author

sgiehl commented May 15, 2020

@tsteur actually it should load the plugins that were loaded before. But I can try to revert that to loadActivatedPlugins. Guess the if statement around was more important, as otherwise isInstalled fired the event. will check that

btw. Passing the plugins to the postEvent won't work, as the method only checks loaded plugins.

@sgiehl
Copy link
Member Author

sgiehl commented May 15, 2020

@tsteur using loadActivatedPlugins works as well now. Changed it again. So feel free to merge if there isn't anything else

@tsteur tsteur merged commit b6ace0e into 4.x-dev May 17, 2020
@tsteur tsteur deleted the utf8mb4 branch May 17, 2020 20:39
@mattab mattab added Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. labels Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
5 participants