New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem in connection character encoding #984
Comments
What version of MySQL? (There was a problem reported in the forum for MySQL 4.1.7-nt.) In core/Piwik.php's createDatabaseObject() function, please try one of these:
or:
|
And of course, make a similar change in core/Tracker/Db.php's connect() function. |
I tried on Mysql 5.0.33 In core/Tracker/Db.php's connect() function, I've tried:
But It does not works. But if I test:
Then it works. |
Thanks. |
In [1473]: fixes #904 - MySQL error codes; unsupported adapters can map these to driver-specific SQLSTATE Fixed tracker profiling data not recorded until after report generated. More refactoring and database abstraction:
Note: unit tests don't run with MYSQLI -- Zend Framework's Mysqli adapater doesn't support prepare() yet |
It is a bit annoying that we have to do this. I am not entirely sure of the performance overhead of running this query for each page view, but this might be costly. Is this really the only way to properly record the data in the tables? It is unclear to me why when the table field is set to UTF-8, with UTF-8 data coming in, the data would not be properly recorded? |
From my testing and reading the MySQL documentation, it looks like the storage and connection charsets settings are independent of each other. From http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html (emphasis added): |
raziel057: in your mysql client, can you run show variables, and tell us what the values of these are:
Thanks. |
Yes storage and connection chartsets settings are independent. Here my settings: character_set_client : utf8 collation_connection : utf8_unicode_ci |
Why is it necessary to set the connection char set if everything appears to be defaulting to utf8? Are these your settings before the code change? |
Yes I had the same settings before the code change. I don't know really why we must force the connection charset at connection but I noticed there is problems if it's not case... Can you test? |
My server defaults to latin1. If I set the connection to latin1 (or let it default to this), and insert into a utf8 table something like '', it gets stored as (I using the hex() function in my select): | C3A5C5A0C2A0C3A8C2BDC2BDC3A6E280A2C2B0C3A6C28DC2AEC3A4C2B8C2AD | This isn't utf8 but a select gives back the expected result, so there's conversion magic at work. If I set the connection to utf8, the same string is stored as: | E58AA0E8BDBDE695B0E68DAEE4B8AD | This is utf8 and select returns the original string. However, the previoulsly stored string is now munged in the select result. So, my opinion is that the SET NAMES fix follows the MySQL documentation. The storage is more compact and from an ongoing support perspective, it means consistent behaviour for all users. However, while it's ok for a new database, it's a potential problem for existing users and their datasets as there seems to be no easy way to fix the stored data. Given http://bugs.mysql.com/bug.php?id=19292, there may be some library/driver-level issue at play here. raziel057: Please make a copy of index.php and test this mod to see if there's a difference between the mysql client and the php extension:
|
I executed this code:
As result:
|
Thanks raziel05. That confirms that the php extensions are not negotiating the charset for the connection; they're using the charset of the server build (ignoring my.cnf). I'm going to make the following change:
Existing users would not be (immediately) affected. latin1 users would still have inefficient storage of non-ASCII text. Existing users could add this line to their config manually, or rebuild mysql with utf8 as the default charset. |
apang (not vipsoft?), I am not sure this is the best solution - if the server configuration changes in between the time Piwik was installed and today, data would not be recorded the same. We should aim to have consistent behavior independently of the server configuration. Vipsoft, your comment seems to show that independantly of the connexion character set, data is stored and retrieved without error. raziel057 says he notices errors if the connexion character set is not set - were you able to reproduce his problem? |
Hmm. I think I auto-logged in under an old user Id...I don't use that computer often except for IIS testing. Re: proposed change. It would only fail if a server build went from utf8 default to non-utf8. Re: mismatched charsets. MySQL appears to handle it in my limited testing and charset combination. Re: raziel057's test. I can reproduce this but I'm not sure if there are cases where a conversion fails, thus resulting in lost pageviews. Let's defer this to 0.5. I'll revert or comment out the changes. To investigate:
|
In [1484], refs #984 - adapters no longer default to utf8 connection add utf8 connection charset test -- the warning is currently suppressed pending further investigation (previously noted in this ticket) in the interim, users can manually add 'charset = utf8' to their config.ini.php to get the desired behaviour fix bug where skipping step in installation munged session var and a browser restart was necessary fill in default port from adapter restore emulated prepares for pdo_mysql to facilitate use of query cache |
my testing, after applying patch in #708, showed that calling setDocumentTitle with accents resulted in the accents being displayed properly in the Page reports. Anthon, my quick read of the commits show that no extra SQL query is made in the Tracker - can you please confirm? Closing for now. Please reopen if there were any extra logic introduced because of this ticket that probably shouldn't be in the codebase. |
Summary:
ToDo:
|
In [1595], add compat function if mysqli_set_charset doesn't exist |
Observations:
|
Note: I ran an exhaustive test (per comment:17) -- U+0000 thru U+10FFFF - and MySQL store/fetch ran without error. |
Varchar columns are set in utf8_general_ci in piwik database when installing. I notice there is a problem when I insert accented characters. For example if you try to customize the page name displayed in JavaScript tracking, for example:
In piwik_log_action table, the record is saved as:
Home/Tlcom/Professionnels/INTEGRAL Pro/Avantages Mobile
It seems the problem comes from the fact the connection isn't set as utf8. PDO doesn't care about charsets. So If you want to have your connection in unicode / utf-8 or any other encoding, you'll have to tell your database, for example using $dbh->exec('SET CHARACTER SET utf8') (mysql).
So I tested to add the following instruction on DB connect, just after establishing a new connection:
The text was updated successfully, but these errors were encountered: