New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On databases that have a default charset that is not utf-8, data can get corrupted #6497
Comments
@ataraxie thanks for the report. Because it's likely that this is a bug in PHP, could you please try upgrade php to latest 5.5.18 ? if you still experience the problem with this version we would further investigate. cheers |
This is not a bug in PHP I think. I had once access to another Piwik instance with the exact same issue. In the end the values were stored "broken" in the database or so. It was not UTF8. I couldn't find the actual problem but also didn't invest so much time. |
If the problem was w/ the DB, then format=XML would fail, no? |
json_encode couldn't handle those values and failed |
That's right. I tried with PHP 5.5.18 and the problem persists. Also: when I add use umlauts in the Piwik admin interface for a website name, it's ok. My sites were added through the API. When I check in the database, then the encoding of the sites added through the admin interface is different. So I add a Piwik site in the admin interface with the name "äää". I do a SELECT name FROM piwik_site WHERE idsite = 1; The result is:
When I add the site with an SQL statement or through the Piwik API, the result of the above statement will be:
The second case causes the error and I assume (as @tsteur stated above) json_encode fails (I'm wondering why, if so, there's nothing written in the PHP error log). So, I guess this is clearly an database encoding issue. I created the Piwik database on my server with the default CREATE DATABASE statement as adviced in the Piwik docs. However, if I run this: SELECT * FROM information_schema.SCHEMATA The result is that my Piwik database has latin1 as DEFAULT_CHARACTER_SET_NAME and latin1_swedish_ci as DEFAULT_COLLATION_NAME. Hmmmmmmm..... |
Ok, I finally got something in the log files:
|
Can you please paste here exact API URL used to created the website that cause the bug? I will try to re-create the issue locally. @ataraxie |
I am pretty sure you won't be able to reproduce it. In the other instance where I experienced this bug it was also not possible to reproduce it on my instance. And it even failed to store the correct characters when adding a new site via SiteManager UI |
Thank your for further investigation. In my script, I'm calling this URL: In addition to the administrator panel, the issue also occurs when I go to "All Websites" and then apply a date range. I'm also going to do some further research in what data causes the json_encode call to fail. I'll let you know. I don't think the issue is so hard to reproduce:
|
Ok, some more information after experimenting in the source... The interesting line where everything happens is piwik\core\DataTable\Renderer\Json.php line 74. Suppose I set the sitename of site 1 using MySQL workbench as explained above to "äää". If I do a few log/var_dump statements in the PHP file (suppose I have a
So if the json_encode function returns jsonbool(false) on the array, there will be no data in the UI. The interesting thing is that if I do a
I think the general question is: is my database encoding simply incorrect (latin1) and the requirement for Piwik to work is a UTF-8 encoded database? Note that MySQL distributions have usually I don't really know a good solution here at the moment. In our case, we'll probably need to patch the Json.php file temporarily with a utf8_encode on all sitenames in the callback function. Or something like that... EDIT
I don't really know what side-effects this has, especially on performance. The request really really long. But I had this problem without the encoding issues as well. I guess this is the same as the problem here: |
@ataraxie thanks for the details. I'd like to ask you to test something. In your
Then try again to add a new website with utf characters. Are the names recorded valid this time? is the bug still the same for data added to your piwik after changing the config file? |
@mattab I cannot believe it - this is it! What I did is:
I also added a new website with name "äää" like I did before using the API and it works too! So I really think this solves the issue. Thank you!! Besides that, I now still have the problem that the load process for the websites takes really long (~ 1 min) and I get "long running script" errors in browsers. Maybe you could also give me some directions how to proceed here? My archive script is running every 40 minutes without errors. |
please setup: http://piwik.org/docs/setup-auto-archiving/ regarding the initial problem, I'm wondering if we could detect that the Mysql does not run under proper unicode and automatically set the setting when it's the case? |
I found your solution here in GitHub for this kind of Problem: adding the line: charset = utf8 in the database section of config.ini.php. I did this as well and the websites all reappeared in the settings. But when I now look at my data the page titels are not displayed correct and look like this: Broschüren --> should be: Broschüren The database is utf-8 and the pages are all utf-8 as well. What can I do? Best regards Christoph |
No its gone. Maybe it was a caching thing. |
Again today on the forums someone reported that adding charset=utf8 solved a problem for them. Maybe we should investigate this and really understand the problem. Or maybe we should simply add charset=utf8 to the config by default, to prevent such issue all together? |
The goal of this issue is to estimate whether there is any performance overhead when we add (because we told users of the workaround |
It takes on my instance about 60 microsecs with Mysqli and 100 microsecs with PDO so should not cause any problems. Not sure how it behaves under load but don't think that it causes any problems. Maybe the I presume second version (dsn + init command) should be even faster |
btw reading your comment i remembered we wanted to consider: Unify Tracker/Db and Db into one? #7727 |
Yes, I have had bad experiences in the past in OSIpage app where users would post lots of URLs from different languages. We had some key tables where URLs were stored, were using default latin1 charset which seemed to garble URLs of foreign languages. The fix had been quite simple. We need to make sure that,
In other words, we could use this mysql command to change a table's charset one at a time.
After applying these changes in OSIpage app, we didn't ever come across URL garble issues again. I hope this solves the issue. |
@tsteur Awesome! moving into 2.15.0 |
In #6497 we set the charset in the config to utf8 by default. This caused that latin1 was not set anymore and causes failures on MySQL 5.7. By forcing charset latin1 the load data infile of archives works again.
I found your solution here in GitHub for this kind of Problem: adding the line: charset = utf8 in the database section of config.ini.php. I did this as well and the websites all reappeared in the settings. But when I now look at my data the page titels are not displayed correct and look like this: Broschüren --> should be: Broschüren The database is utf-8 and the pages are all utf-8 as well. What can I do? Best regards Christoph
|
Did you find any solution for it? |
This is still an issue. |
I'm not sure if it's helpful, but this is the FAQ we have that explains how to convert a Matomo DB to utf8mb4 charset, which is what we've been using by default https://matomo.org/faq/how-to-update/how-to-convert-the-database-to-utf8mb4-charset/ |
Matt suggested me to create a bug report for this problem which was discussed in detail in this forum thread:
http://forum.piwik.org/read.php?2,120225,120539
Scenario
As Piwik admin user, navigate to Settings => Websites. I always get "You currently have access to 0 website". This happens when I am superuser as well as when I have dedicated "admin" permission to websites.
Problem source
Environment
Since Piwik allows umlauts for site name, I consider this a Piwik bug. We have an enterprise customer installation of Piwik running with a database of ~ 40 GB. Our customer is very upset with this at the moment, since, with regard to the time I needed to figure out the problem source (see forum thread), this issue has kept us from updating from 1.12 to 2.x for several months. So I kindly ask you to consider this a major issue. Also, as the forum thread shows, I'm not the only one affected.
The text was updated successfully, but these errors were encountered: