New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Matomo database use utf8mb4 collation by default #9785
Comments
👍 for this. Just switched to MySQL 5.7 and it works fine with the new utf8mb4. Also they plan to make this value the default in 5.8 maybe |
I stumbled upon a problem where custom value was showing �����. Looking through the possible causes I found out that Matomo changed only to utf8_unicode_ci. Can I just change the collation of database table to utf8mb4_unicode_ci? Will it cause any issues? |
This change require single file per table. Such a change is extreme problematic as many vservers have limis with open file handles and will run out of resources. I tried on my server and was made to revert... and my vserver has high limits... there are other hosters that have just 1/3 of mine. |
@alexhass Can you expand what the encoding/collation of a database table has to do with the amount of files needed? |
To be able to use utf8mb4 with php you have requirements. One is that you need to reconfigure mysql to use one file per table (innodb_file_per_table=true). This increases the files from one large db file to thousands. As mysql opens all files and keep them open you will run out of file handles and get an error „Too many open files“. Than mysql cannot start and you are offline. At https://www.drupal.org/project/utf8mb4_convert you can see what is required to make utf8mb4 working. I wish I could convert my drupal sites but I cannot because of thousands of open file handles. The hoster has limited to a number where one drupal install may be too much... note every single mysql table requires at least two file handles. |
Hi, Indeed on my MariaDB instance innodb_file_per_table seems to be enabled and indeed there are two files per table. Which brings me to the question of why the limits are so low on your server and why you can't increase them just high enough. |
I‘m not only running matomo on the server! Also keep in mind the operating system, dovecot, postfix, apache, every php file include and so on - all need additional file handles. Mine is based on Virtuozzo virtualization and has 8gb ram, 200gb ssd and a typical limit is 1400 file handles for a 8,85€/month server. That is where vservers start and i‘m not willing to spend more money per month. The file handle number cannot increased as the hoster force you to this limit. I asked them and was told to book a dedicated server for 40€/month. No no no... If I‘m not wrong 1blue.de has a limit of ~800 file handles for 10€/month servers. That is common at this pricing and my hoster limits are the highest in Germany for entry level servers. Larger vserver for 16€ have ~2300 file handles. One Drupal site has maybe 300-500 tables and I‘m running 7 of them. |
@tsteur @mattab switching to utf8mb4 or at least activating it by default for new installs shouldn't be a problem.
that would make it a lot more complicate, as we would need to check wich collation is actually in use before storing 4 byte chars and stuff like that. |
I just tried to implement that, but actually it won't be that easy as simply converting all tables to utf8mb4. We have some tables where a varchar is part of a index. With utf8mb4 each char will need 4 bytes, but the default maximum index length is 767 bytes. So a varchar(200) will already be to big for a index. So either we make those fields smaller or we do not use utf8mb4 charset when it's not needed 🤔 |
Not sure when we have an index on varchar. Would it help to index only first say 30 char of varchar? That might be good anyway to keep index smaller |
'site_url' => "CREATE TABLE {$prefixTables}site_url (
idsite INTEGER(10) UNSIGNED NOT NULL,
url VARCHAR(255) NOT NULL,
PRIMARY KEY(idsite, url)
) ENGINE=$engine DEFAULT CHARSET=$charset
",
'option' => "CREATE TABLE `{$prefixTables}option` (
option_name VARCHAR( 255 ) NOT NULL,
option_value LONGTEXT NOT NULL,
autoload TINYINT NOT NULL DEFAULT '1',
PRIMARY KEY ( option_name ),
INDEX autoload( autoload )
) ENGINE=$engine DEFAULT CHARSET=$charset
",
'session' => "CREATE TABLE {$prefixTables}session (
id VARCHAR( 255 ) NOT NULL,
modified INTEGER,
lifetime INTEGER,
data TEXT,
PRIMARY KEY ( id )
) ENGINE=$engine DEFAULT CHARSET=$charset
",
'archive_numeric' => "CREATE TABLE {$prefixTables}archive_numeric (
idarchive INTEGER UNSIGNED NOT NULL,
name VARCHAR(255) NOT NULL,
idsite INTEGER UNSIGNED NULL,
date1 DATE NULL,
date2 DATE NULL,
period TINYINT UNSIGNED NULL,
ts_archived DATETIME NULL,
value DOUBLE NULL,
PRIMARY KEY(idarchive, name),
INDEX index_idsite_dates_period(idsite, date1, date2, period, ts_archived),
INDEX index_period_archived(period, ts_archived)
) ENGINE=$engine DEFAULT CHARSET=$charset
",
'archive_blob' => "CREATE TABLE {$prefixTables}archive_blob (
idarchive INTEGER UNSIGNED NOT NULL,
name VARCHAR(255) NOT NULL,
idsite INTEGER UNSIGNED NULL,
date1 DATE NULL,
date2 DATE NULL,
period TINYINT UNSIGNED NULL,
ts_archived DATETIME NULL,
value MEDIUMBLOB NULL,
PRIMARY KEY(idarchive, name),
INDEX index_period_archived(period, ts_archived)
) ENGINE=$engine DEFAULT CHARSET=$charset
", |
I reckon it could be fine for these columns to limit to say 30 or 50 characters if that helps make it work? |
Don't know if it's possible to limit to 30-50 characters the INDEX because it's also used in PRIMARY KEY? it seems maybe we need to limit these fields to 191 chars in length? https://stackoverflow.com/questions/51934422/varchar-primary-key-cannot-be-longer-than-191 |
Limiting to 191 works only for keys that aren't combined with other fields. Otherwise we need to reduce it even more (by the amount of bytes the other field takes). |
I‘m strongly against mb4 format if not 100% needed. I cannot use matomo anymore than. My vserver (18€/month) has a max open files limit that disallows me to convert to mb4 at all. mysql server requires two files per table and this means the same file handles once mysql starts up. You can no longer use one large innodb file that contains all tables. I have a medium vserver, but 2500 file handles are used very fast and than the server stops. A root server is too expensive. |
Cheers @alexhass I had no idea hosters limit these things. Is that happening as soon as one table is using mb4 ? Or would it be fine if a few tracking tables (say 3-5 tables) use mb4? On the tables above we indeed wouldn't really need MB4 so be easiest to keep them the way they are. |
To run mb4 you need to enable innodb_file_per_table. The setting is global to mysql server and affects all tables. I‘m not aware that any other way exists. |
👍 so we would need to check such a flag and only use MB4 if possible. Would that work in your case? |
Users have been waiting for utf8mb4 support for many years. Especially those with non English websites. It will solve a lot of issues. |
Tsteuer: maybe. The general problem with mb4 stays as is. This is not about excluding a language. In general I would be 100% pro mb4, but the current limitations in mysql are really bad. I have no idea why file per table is required by design.... but as I know it is required and my server is not extremly limited! Just try not enforcing mb4 and leave an option to run without mb4, please. |
Maybe it is possible to detect file per table setting and use this setting to switch between utf8 and utf8mb4. |
https://docs.nextcloud.com/server/15/admin_manual/configuration_database/mysql_4byte_support.html |
In MySQL 8 utf8mb4 will be the standard collation. @alexhass is the utf8mb4 charset available on your system, or isn't it listed at all? E.g. does Imho we won't need utf8mb4 for all tables, it might be enough to convert the log tables... |
Yes it is available. Maybe you read my comments first. This is not about the mysql does not have the feature. It requires file per table and this causes the kernel to go out of file handles as there are limits. |
I have read them. Actually I would have expected that utf8mb4 is not available if it would require |
@alexhass I get your argument that enabling What I don't understand is why one would be limited in file handles. On my vServer (Hetzner 20€/Month, but the same is true for the 3€ per Month server) the limit is (/proc/sys/fs/file-max) 1.6M The linux kernel confirms this default of 1.6 million handles for 16GB
https://serverfault.com/questions/716578/default-value-of-proc-sys-fs-file-max/716591 Increasing the file handler limit by 10K would allow 3000 more MySQL tables which should be enough for Matomo for a while and would worst case (if all of them are used) take up an additional 10MB of memory. So unless you are running on a microcontroller with severe memory limitations file handlers aren't scarce. And it shouldn't matter to the Hoster what file limit I am using, if I want to spend a few MB more or less of my memory on it, which kernel I am using or even if I want to run OpenBSD or Haiku on my virtual server as thanks to KVM they just forward the low-level commands to the CPU. I don't think allowing hosters to save single digit megabytes of memory is worth denying people to store data in their native language. |
I‘m on server4you vserver and the limits are the highest vserver limits on the market. 1blu and others have a lot lower limits than server4you has |
Just checked a cheap t3a.micro AWS VS with 1GB RAM and default Ubuntu. You get 95.6K file handlers. |
I’m not paying Amazon for underpowered and extreme expensive hosting. Find the 16gb machine with 512gb ssd for 16€ first. |
When we will require MySQL 5.5.3 in Piwik, it would be desired to make our database use
utf8mb4
collation. This would solve some possibly weird issues and in general, is a best practise for properly managing unicode characters in MySQL.Background information utf8mb4
Notes
utf8mb4
if it's present (in case it could be disabled for some reasons)The text was updated successfully, but these errors were encountered: