Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert-to-utf8mb4 side effect: text column data types will be modfied #19115

Closed
MichaelRoosz opened this issue Apr 18, 2022 · 3 comments
Closed
Labels
c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.

Comments

@MichaelRoosz
Copy link
Contributor

MichaelRoosz commented Apr 18, 2022

After upgrading to utf8mb4 via
./console core:convert-to-utf8mb4
as documented here:
https://matomo.org/faq/how-to-update/how-to-convert-the-database-to-utf8mb4-charset/

All columns of all tables of type "text" (text, mediumtext, etc) have automatically been increased in size by the sql server (MariaDb 10.7).

For example, before:

CREATE TABLE matomo_session (
  id varchar(191) NOT NULL,
  modified int(11) DEFAULT NULL,
  lifetime int(11) DEFAULT NULL,
  data mediumtext DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

after:

CREATE TABLE matomo_session (
  id varchar(191) NOT NULL,
  modified int(11) DEFAULT NULL,
  lifetime int(11) DEFAULT NULL,
  data longtext DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The logic behind this is documented here:
https://mariadb.com/kb/en/setting-character-sets-and-collations/#table-level

For example, an ascii TEXT column requires a single byte per character, so the column can hold up to 65,535 characters. If the column is converted to utf8, 3 bytes can be required for each character, so the column will be converted to MEDIUMTEXT to be able to hold the same number of characters.

As this might be unexpected, I think it would be great to mention this behavior here: https://matomo.org/faq/how-to-update/how-to-convert-the-database-to-utf8mb4-charset/

@peterhashair
Copy link
Contributor

@MichaelRoosz thanks for reporting this, will update the FAQ.

@peterhashair peterhashair added the c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. label Apr 19, 2022
@sgiehl sgiehl added the Needs priority decision This issue may need to be added to the current milestone by Product Manager label Apr 25, 2022
@justinvelluppillai justinvelluppillai removed the Needs priority decision This issue may need to be added to the current milestone by Product Manager label May 1, 2022
@danielmcclure
Copy link
Contributor

Hi @MichaelRoosz, Thanks again for reporting this. We've gone back and updated the FAQ you suggested with the following information:

Column size adjustment during utf8mb4 conversion
One final note is that when upgrading your database to support utf8mb4, be aware that columns storing text data (such as TEXT, MEDIUMTEXT, etc.) will automatically increase in size to ensure the upgraded column can store as many characters as the source column. For example a MEDIUMTEXT column will become a LONGTEXT column. You will need to account for this if you’re manually updating your database schema or if you have specific data storage limits to ensure a smooth upgrade.

Can you please let me know if this adequately addresses your concerns?

@MichaelRoosz
Copy link
Contributor Author

Hello @danielmcclure looks good to me, thank you 👍

@sgiehl sgiehl added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

No branches or pull requests

6 participants