@sgiehl opened this Pull Request on February 24th 2020 Member

fixes #9785

@sgiehl commented on March 24th 2020 Member

@tsteur @diosmosis could you maybe have a first look at the PR?
In general the changes should already work. But for sure will need some more excessive testing

What still needs to be clarified:

  • When updating to Matomo 4, or installing it newly then, it can still be the case that utf8 charset will be used, if utf8mb4 is not available. If that is the case, the update to utf8mb will be actually never tried again. Wondering if we should maybe try to do the conversion for every update automatically if utf8mb4 is not yet configured.

  • Using the database charset config to determine if the tables actually use utf8mb4 might not be very save. Someone could maybe change the charset even though the update wasn't performed. Not sure if we should handle that case and maybe store the used charset in the option table instead or maybe simply ignore that case, as it likely shouldn't happen often. Also saving the config in the update process might fail or not work on multi server env 🤔

  • I've changed the length of various fields, so the key length fits again. Imho the new length shouldn't make any problem, but be good if someone else have a thought about that as well...
@diosmosis commented on April 20th 2020 Member

Wondering if we should maybe try to do the conversion for every update automatically if utf8mb4 is not yet configured.

It would have to be a major update though, correct? Since it would update a bunch of columns (not sure if those affect log tables). If it's too intensive an operation, maybe it could be a command that users can run when they're ready.

Using the database charset config to determine if the tables actually use utf8mb4 might not be very save.

Would it help to check the database charset config, and check if a table has the right charset/column length to check if the update was run? This would still be a problem if some tables have different charsets, but that sounds like a real edge case.

I've changed the length of various fields, so the key length fits again. Imho the new length shouldn't make any problem, but be good if someone else have a thought about that as well...

Only one that looks like it could be an issue is the option_name change. There might be some dynamically created one that somehow gets too long. Maybe we can report when the option name is too long somehow?

@Findus23 commented on April 22nd 2020 Member

Just for reference, this is how Nextcloud handled the upgrade: (They did it in a console command)

https://docs.nextcloud.com/server/18/admin_manual/configuration_database/mysql_4byte_support.html

@sgiehl commented on April 22nd 2020 Member

Sure. Providing a command to do the migration makes it possible for those people that have shell access. But those using a shared hoster without shell access, wouldn't have the possibility to upgrade later. But maybe we could force the same update in Matomo 5 (for those that haven't switched till then). So using utf8mb4 is kind of optional for Matomo 4, but a requirement for Matomo 5. Would that be reasonably @mattab @tsteur

@diosmosis commented on April 22nd 2020 Member

The mechanism to make the change could also be in the UI if needed, right? Doesn't have to be a command.

@sgiehl commented on April 29th 2020 Member

@diosmosis I've updated the PR.

So here a summary of the current behavior. Maybe @tsteur or @mattab have some thoughts/comments on that as well:

  • When updating to Matomo 4
    • all columns used in database indices that are too long will be adjusted in length
    • In case utf8mb4 is available all tables will be converted as well.
    • The used database charset in config will be automatically set to utf8 or utf8mb4 depending on the availability of utf8mb4 (when set to utf8mb4 we assume it's available and do not replace 4 byte chars in tracking requests)
  • If the conversion did not take place while updating
    • There is a new console command core:convert-to-utf8mb4, which will trigger converting the tables and updating the used charset in config
    • For people not being able to run a console command I would suggest to add a new FAQ explaining how to convert all tables manually and adjust the config afterwards
  • New installs will automatically use utf8mb4 and fallback to utf8 if not available
  • There is an (optional) diagnostic check that checks if utf8mb4 is available and in use. If not available and/or not in use it will show a warning and suggest to run the command if possible
  • ~There's (already) an update for Matomo 5.0.0-b1, which will force the conversion to utf8mb4 if it was not done before. I'd also suggest to create a follow up issue to remove utf8 support for Matomo 5 and add it as requirement.~
@mattab commented on April 29th 2020 Member

There's (already) an update for Matomo 5.0.0-b1, which will force the conversion to utf8mb4 if it was not done before.

just checking, we wouldn't force the conversion if it's not available though, to prevent errors/breaking instance?

@sgiehl commented on April 30th 2020 Member

just checking, we wouldn't force the conversion if it's not available though, to prevent errors/breaking instance?

I've added that for Matomo 5. Matomo 4 won't force anything. But I think until we release Matomo 5 there will be some time gone by and as utf8mb4 is already default collation for newer releases of MySQL I assume we could maybe require it for Matomo 5. But that is something we can see then

@tsteur commented on May 1st 2020 Member
  • Could we also maybe have a diagnostic check that shows if utf8 or utf8mb4 is used and if utf8 is used, point them to an FAQ on how to upgrade using the console command (yes if they are on a hoster they might not be able to execute it, we can explain this in FAQ)

  • Would it be useful for users to be able to specify an option to not execute the migration to utf8mb4? There might be some users for some reasons we're not aware yet where they maybe don't want it or where it cannot work? Could have a simple DB option like database[no_upgrade_utf8mb4=1]
@tsteur commented on May 1st 2020 Member

Would it be useful for users to be able to specify an option to not execute the migration to utf8mb4? There might be some users for some reasons we're not aware yet where they maybe don't want it or where it cannot work? Could have a simple DB option like database[no_upgrade_utf8mb4=1]

And we might also need to have the possibility to define some global variable or so to disable it. Thinking about WordPress... only realising now this seems to be using so far
image

Wonder if this already may or may not cause random issues and I'd need to probably be able to control whether Matomo uses utf8mb4 or not (depending on DB_CHARSET constant)
Wondering if we already need to do something there ASAP and whether it causes issue when someone uses some other charset (need to add this to Matomo for WP system report). We're using the WordPress DB connection so we cannot control this separately.

@Findus23 commented on May 2nd 2020 Member

I am not sure how many hosts are left without utf8mb4 support. It seems to have been added in 5.5.3 (released in 2010) , which means the last version without it is 5.1 which is end of life since 2013.

But there might still be reasons left for some people to not switch over.

@tsteur commented on May 3rd 2020 Member

I am not sure how many hosts are left without utf8mb4 support. It seems to have been added in 5.5.3 (released in 2010) , which means the last version without it is 5.1 which is end of life since 2013.

But there might still be reasons left for some people to not switch over.

I think in the issue itself there was some mention where it may not work due to limit of file handles. Also Matomo for WordPress would use some limitations as they might support utf8mb4, but use a different charset. Eg utf8.

If it's easy for us to not force people to use utf8mb4 that be great.

@sgiehl commented on May 4th 2020 Member

@tsteur I've applied some review feedback.

Could we also maybe have a diagnostic check that shows if utf8 or utf8mb4 is used and if utf8 is used, point them to an FAQ on how to upgrade using the console command (yes if they are on a hoster they might not be able to execute it, we can explain this in FAQ)

There is a diagnostic check. If utf8mb4 is available, but not used, it will show the command that can be used to migrate to utf8mb4. We can add a link to an FAQ there, once we have one.
If utf8mb4 is not available, it shows a message that unsupported characters are replaced in tracking requests.

Would it be useful for users to be able to specify an option to not execute the migration to utf8mb4? There might be some users for some reasons we're not aware yet where they maybe don't want it or where it cannot work? Could have a simple DB option like database[no_upgrade_utf8mb4=1]

We can also remove it from the update script completely and only update to utf8mb4 using the console command. Most users won't have any problems using utf8. And for those asking for emojis and stuff like that, we have a possibility to update using a console command. And we could even provide an FAQ how to do it manually.

@sgiehl commented on May 5th 2020 Member

@tsteur I've spent some more thoughts on that topic. Actually I would prefer to not do the utf8mb4 update automatically at all. I would let new installs use utf8mb4 if it is available and provide the command to convert everything to utf8mb4 (and a FAQ how to do manually). Not using utf8mb4 has only one effect at the moment. And that is the replacing of unsupported characters. As that only affects a minority I actually don't see much benefit in forcing most instances to convert their database to utf8mb4, which might take a while depending on the size.
That way there shouldn't be any problems with that while updating and no problems with wordpress. So we lower the risks but still provide people requiring utf8mb4 the possibility to use it.

Note: Currently all MySQL versions are handling utf8 as alias of utf8mb3. I would suggest to start a new discussion about automatically updating or forcing utf8mb4 as soon as MySQL changes the alias of utf8 to utf8mb4 (what they announced to do at some point.)

@tsteur commented on May 5th 2020 Member

@sgiehl that would be fine by me. As the update scripts can take quite a while it would also allow high traffic instances to plan better for it (also us on the cloud where we wouldn't want to do this during a regular update). We could show in the diagnostic check that they can upgrade to utf8mb4 if their DB supports it etc.

Wonder if we'd also offer a one click button to perform the update for users who can't run commands or do it manually and they have not much data so it goes quick? Of course there's a risk it'll be executed during a scheduled task in a tracking request etc and things may time out or something ... but suppose that be also a risk regularly. Or maybe we offer a button for users where we know we can launch a cliMulti request in the background ($cliMulti->supportsAsync == true)

@sgiehl commented on May 6th 2020 Member

@tsteur started to change the implementation. The conversion will no longer be done during the update. There will now be an additional setting (under update settings), so trigger the conversion to utf8mb4. The conversion itself will then be done as scheduled task once. Maybe you could have a quick look if it is fine that way. Would adjust / improve the code then tomorrow. The text messages displayed anywhere will need to improved as well, so it's clear it's possible to use the setting or the command...

@tsteur commented on May 6th 2020 Member

image

Could maybe also say here alternatively the upgrade can be triggered through the UI in the background... and could also

image

could we mention here that performing this upgrade can take very long (up to many hours) and cause downtime etc? Do we even need to link to an FAQ or explain everything in there? Should we maybe mention for high traffic sites we recommend disabling the tracker while performing this update? And recommend to not use the UI setting but instead use the command?

In both places in diagnostic checks and the UI setting be great to also explain what they gain by performing this upgrade since most users won't know what UTF8MB4 is, whether it represents an issue for them, whether they should do it or not etc.

Would have here https://github.com/matomo-org/matomo/pull/15618/commits/4f8abcdd1f1e16bfc7a340cd76455939d2afe400#diff-33d27fe974b95654859eba7890184c33R46 maybe expected to see utf8mb4 by default and then have an update that sets it back to utf8? Or how do we make sure that new installs use utf8mb4 when possbile?

@sgiehl commented on May 7th 2020 Member

@tsteur will improve the displayed messages now.

Would have here 4f8abcd#diff-33d27fe974b95654859eba7890184c33R46 maybe expected to see utf8mb4 by default and then have an update that sets it back to utf8? Or how do we make sure that new installs use utf8mb4 when possbile?

We actually can't set utf8mb4 as default in the global config. Someone doing an update might otherwise end up in a broken Matomo maybe. Let's say someone's database does not support utf8mb4 and he triggers an update to Matomo 4. If he doesn't have set a charset in his local config utf8mb4 would now be assumend until the update script sets the local config. But before that some other updates might run trying to create tables with utf8mb4 or even the database connection itself might fail as the charset set for the connection doesn't exist.
Nevertheless for new install utf8mb4 should be used, as it sets the database charset in local config when creating the config file here:
https://github.com/matomo-org/matomo/blob/6a17b9cfb7475d307fb45e4df6c1119c3a583640/plugins/Installation/Controller.php#L602
But actually I just realized that the database might be created before the config file is written. So will change the database creation to always use the detected charset. Guess that shouldn't be a problem for wordpress, as the database always exists there, right?

Note: people manually creating the config file or using some other process to install Matomo might end up still using utf8, but that's nothing we can circumvent I guess.

@sgiehl commented on May 7th 2020 Member

@tsteur There's something else we might need to find a solution for. The conversion currently converts all tables returned by DbHelper::getTablesInstalled(). That does only include the tables defined in the Schema and all archive tables. So the conversion won't target any tables additionally created by any plugin. So all those tables would possibly remain with utf8. Wondering if it would make sense to add a new hook into the getTablesInstalled method, so plugin can kind of register there tables 🤔

@tsteur commented on May 7th 2020 Member

But actually I just realized that the database might be created before the config file is written. So will change the database creation to always use the detected charset. Guess that shouldn't be a problem for wordpress, as the database always exists there, right?

I guess DB creation wouldn't affect it WP as it is indeed already created. Also in general I think it wouldn't have a big effect as long as we always set encoding when creating a table.

Wondering if it would make sense to add a new hook into the getTablesInstalled method, so plugin can kind of register there tables 🤔

image

Could maybe change the intersect to array_unique(array_merge($allMyTables, $allTables))? I reckon that would be fine... I guess it be an issue though when no table prefix is used... so might not work and a hook be needed. Guess sounds good to add the hook. Bit annoying as it's unexpected for developers but if they missed implementing it, it likely wouldn't cause an issue. Also maybe if a table prefix is defined, we could maybe always add all tables that were found? Maybe not though as it could lead to hard to find/debug issues...

@sgiehl commented on May 8th 2020 Member

@tsteur I've add a new event. Nice side effect: the DBStats plugin will finally show really all tables, and not only those created by the core schema 🙈
Will now create PRs for all our plugins to implement that event...

@sgiehl commented on May 8th 2020 Member

@tsteur update the PR with the new event and created PRs for (most of) our plugins. If the event is fine that way, feel free to quickly check the changes in the linked PRs and merge them, so we can update the submodules here again.

@sgiehl commented on May 13th 2020 Member

@tsteur applied some feedback and adjusted the command so it is able to disable tracking and only show all required commands...

@sgiehl commented on May 14th 2020 Member

@tsteur I've merge all related PRs already, but had to change something here in order to get the tests running. See https://github.com/matomo-org/matomo/pull/15618/commits/7b06248635e3f495f52f358d14af1d94b5d2adff
Maybe you can have a quick look at that commit and merge the PR if it's fine...

@tsteur commented on May 15th 2020 Member

@sgiehl not sure...
in https://github.com/matomo-org/matomo/commit/7b06248635e3f495f52f358d14af1d94b5d2adff#diff-6bfd1a5b94264d4d0e0c03e1ebd70249R446 should it be maybe getPluginsLoadedAndActivated instead of only getting loaded plugins? Do you know why loadActivatedPlugins didn't work? Bit scared we're loading more plugins than we should.

BTW another option could be when posting the event to try

Piwik::postEvent('Db.getTablesInstalled', [&$allMyTables], false, (Manager::getAllPluginsNames()); // and if this doesn't work could try the array_keys($loadedPlugins)
@sgiehl commented on May 15th 2020 Member

@tsteur actually it should load the plugins that were loaded before. But I can try to revert that to loadActivatedPlugins. Guess the if statement around was more important, as otherwise isInstalled fired the event. will check that

btw. Passing the plugins to the postEvent won't work, as the method only checks loaded plugins.

@sgiehl commented on May 15th 2020 Member

@tsteur using loadActivatedPlugins works as well now. Changed it again. So feel free to merge if there isn't anything else

This Pull Request was closed on May 17th 2020
Powered by GitHub Issue Mirror