Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong database values when float representation is not set to English standard #6435

Closed
cauboy opened this issue Oct 13, 2014 · 22 comments
Closed
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@cauboy
Copy link

cauboy commented Oct 13, 2014

Problem:

The conversion rates in the Goal Plugin are floats but always integer valued (1,0%, 2,0%, …) on a server located in Germany.

Reason:

In PHP the automatic float-to-string conversion is based on the value of setlocale(LC_NUMERIC). In English speaking countries a dot is used as the decimal separator while in some countries, e.g. Germany, a comma is used.
Example, where this issue is critical:

setlocale(LC_NUMERIC, "en_US");
$query = "INSERT INTO `values` SET `value` = $PI, `key` = 'PI'";
echo $query;
// output: INSERT INTO `values` SET `value` = 3.1415926535898, `key` = 'PI'
setlocale(LC_NUMERIC, "de_DE");
$query = "INSERT INTO `values` SET `value` = $PI, `key` = 'PI'";
// output: INSERT INTO `values` SET `value` = 3,1415926535898, `key` = 'PI'

This is harmful as the SQL query becomes invalid in the latter example.

Fix

Probably not a good solution, but for me the following change in piwik/core/Db.php solved the problem:

    public static function query($sql, $parameters = array())
    {
        setlocale(LC_NUMERIC, 'en_US'); // this line is a quick fix for this issue
        try {
            self::logSql(__FUNCTION__, $sql, $parameters);

            return self::get()->query($sql, $parameters);
        } catch (Exception $ex) {
            self::logExtraInfoIfDeadlock($ex);
            throw $ex;
        }
    }

Here are two posts about the setlocale issue:
http://mark-story.com/posts/view/php-floats-localization-and-landmines
http://blog.azure.km.ua/2012/12/php-typecasting-float-to-string-trap.html

@mattab
Copy link
Member

mattab commented Nov 3, 2014

Thanks for the bug report!

until we properly support locale number representations in #4114 we should probably add setlocale(LC_NUMERIC, 'en_US'); to the core/frontcontroller.php init() function

could you confirm it works for you to add the code in https://github.com/piwik/piwik/blob/master/core/FrontController.php#L311-310 ?

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Nov 3, 2014
@mattab mattab added this to the Piwik 2.9.0 milestone Nov 3, 2014
@cauboy
Copy link
Author

cauboy commented Nov 3, 2014

Doesn't seem to work there. I updated PIWIK to v2.8.3 and cleared the numeric values via DROP TABLE archive_numeric_2014_11. Then I put setlocale(LC_NUMERIC, 'en_US'); just after the try{ in line 311. I am a bit wondering why my Piwik installation does have the try part and the code behind your link doesn't… Anyway, shouldn't make a difference. Here's the code fragment:

 public function init()
    {
        static $initialized = false;
        if ($initialized) {
            return;
        }
        $initialized = true;

        try {

            // make sure decimal points are used for string representation
            setlocale(LC_NUMERIC, 'en_US');

            Registry::set('timer', new Timer);

            $directoriesToCheck = array(
                '/tmp/',
                '/tmp/assets/',
                '/tmp/cache/',
                '/tmp/logs/',
                '/tmp/tcpdf/',
                '/tmp/templates_c/',
            );
            …

Not sure why that didn't work our. So I tested to put setlocale(LC_NUMERIC, 'en_US'); in core/Db.php and everything works fine again…

@tsteur
Copy link
Member

tsteur commented Nov 4, 2014

How did you start the archiving? Probably it was overwritten by https://github.com/piwik/piwik/blob/master/core/Translate.php#L222-L223 . Can you try to add it to FrontController and comment out the two lines in Translate.php temporarily?

@cauboy
Copy link
Author

cauboy commented Nov 4, 2014

For archiving I just opened …/piwik/index.php?module=CoreHome&action=index&idSite=1&period=day&date=today#/module=Goals&action=ecommerceReport&idSite=1&period=day&date=today&idGoal=ecommerceOrder

So It was indeed overwritten. Here's what my working code is looking like:

core/Translate.php (line 220 - 226)

 private static function setLocale()
    {
        $locale = $GLOBALS['Piwik_translations']['General']['Locale'];
        // $locale_variant = str_replace('UTF-8', 'UTF8', $locale);
        // setlocale(LC_ALL, $locale, $locale_variant);
        setlocale(LC_CTYPE, '');
    }

and

core/FrontController.php (Line 303 - 316)

    public function init()
    {
        static $initialized = false;
        if ($initialized) {
            return;
        }
        $initialized = true;

        try {

            // make sure decimal points are used in string representation
            setlocale(LC_NUMERIC, 'en_US');

            Registry::set('timer', new Timer);

@tsteur
Copy link
Member

tsteur commented Nov 5, 2014

Maybe we should use something like numer_format() http://us2.php.net/manual/en/function.number-format.php before writing the value to the database? Don't think there are so many float columns so might be doable?

@mattab
Copy link
Member

mattab commented Nov 5, 2014

👍 for number_format the value before writing in archive_numeric_*

@tsteur
Copy link
Member

tsteur commented Nov 5, 2014

What about goals / conversion table etc?

@mattab
Copy link
Member

mattab commented Nov 5, 2014

+1 forgot that are other FLOAT columns that this should apply also besides this DOUBLE

@tsteur
Copy link
Member

tsteur commented Nov 5, 2014

Alright. Figured out we cannot use number_format as we cannot tell it to keep the same precision. We would either round or add more precision. So looking for something else.

@tsteur
Copy link
Member

tsteur commented Nov 5, 2014

From what I have seen so far the best solution seems to be a str_replace(',', '.', $float). The code for this will be in one method so we will be able to change it.

@tsteur
Copy link
Member

tsteur commented Nov 5, 2014

The tests seem to still pass but the more I think about it the more it is a hack and it will most likely fail again at some point. Also it is very confusing when to use/convert it and when not. I am even sure I haven't covered all places and there could be also errors in some calculations.

It also affects API requests. Requesting something in JSON + language DE will return a float as an actual float as supported by JSON using a "dot" see http://demo.piwik.org/index.php?module=API&method=API.getProcessedReport&idSite=7&period=day&date=2014-11-04&apiModule=UserCountry&apiAction=getCountry&showTimer=1&format=JSON&token_auth=anonymous&language=de whereas when requesting XML one will get a float with a comma as it was converted to a string in PHP before rendering it http://demo.piwik.org/index.php?module=API&method=API.getProcessedReport&idSite=7&period=day&date=2014-11-04&apiModule=UserCountry&apiAction=getCountry&showTimer=1&format=XML&token_auth=anonymous&language=de

To actually fix this issue we should set the locale to en_* in the beginning and also make sure this locale is installed. If not, show an error in SystemCheck etc. Before rendering something we should set the locale to the correct value. Eg if someone has configured German language it should use de_DE. Problem is we render everywhere, even in files like Session.php before throwing an exception. Or sometimes we might render something, then query or calculate data and render something again. So there is no clear point I think when we can set the de_DE locale.

Alternatively we could maybe always set en_* as locale and then in the UI whenever we output any numbers call filters that make sure to output the numbers correctly formatted with a comma or dot etc. I think Twig supports this already and as we want to get rid of Twig long term there are similar libs for JS / AngularJS. This means we'd always have to take care of calling a filter when outputting a number / float but I think this is much better as the current implemented fix see 3804705 . As described in the first paragraph this is rather a hack and very hard to understand when to use this method and when not.

@diosmosis @mnapoli @mattab thoughts?

tsteur added a commit that referenced this issue Nov 5, 2014
…not have to care about this. Makes it a tiny bit slower but less prone to errors
tsteur added a commit that referenced this issue Nov 5, 2014
…sing locale de_DE... should try to submit a pull request forthis otherwise we will not be able to simply update this lib
tsteur added a commit that referenced this issue Nov 6, 2014
…e is used (or any locle using comma instead of dot)
@tsteur
Copy link
Member

tsteur commented Nov 6, 2014

Here is a nice example of what I meant: When using German locale (or any other locale using comma instead of dot) it results in 'UTC-5,5' instead of 'UTC-5.5' which probably nobody would expect... resulting in a bug that it is not possible to add a site under circumstances as it says "Invalid Timezone" Such bugs are almost impossible to find: https://github.com/piwik/piwik/blob/19a2adbe7d8d7c9033cb99b4e57d2c5db515e836/plugins/SitesManager/API.php#L1247

Even safe_serialize did serialize wrong: 6e01277 and getRequestVar was buggy f824e9a

Not sure how many more such issues there are. They are hard to identify and it is almost impossible to prevent more of those bugs unless we always work with en_* locale I think

@tsteur
Copy link
Member

tsteur commented Nov 6, 2014

If we do not always use a German locale I propose to run all tests using German locale in the future as most likely most of our users are using a locale that use a comma instead of a dot. We wouldn't notice such bugs otherwise

tsteur added a commit that referenced this issue Nov 6, 2014
@mattab
Copy link
Member

mattab commented Nov 6, 2014

It seems to me that we should try to set the locale to en_US or so - wouldnt' that help solve many problems and we wouldnt have to care as much about wrong locales?

Though I know it's hard to set the setlocale since I see in codebase some other setlocale calls: https://github.com/piwik/piwik/search?utf8=%E2%9C%93&q=setlocale

@tsteur
Copy link
Member

tsteur commented Nov 6, 2014

Yes, as mentioned. Problem is we then have to always format numbers manually when rendering anything to make sure we display a number using a comma instead of a dot in floats when a German or similar locale is used. This can be loads of work as well.

Ideally Piwik wouldn't render everywhere in the code and then it would be easier to handle... so we probably have to weigh up the pros and cons

@tsteur
Copy link
Member

tsteur commented Nov 6, 2014

There are code parts where it is hard to tell whether it causes bugs or not see https://github.com/piwik/piwik/blob/master/plugins/Actions/Reports/Base.php#L74

tsteur added a commit that referenced this issue Nov 6, 2014
This caused me a lot of headache. I ran the system tests with a German
locale and noticed so many values are completely different. I was looking
for places where there were still any float to string conversions and I
was wondering why the existing fix didn't work until I noticed the
PiwikTracker did not send floats but floats with a comma as decimal
point which in the end was casted to ints etc.
@tsteur
Copy link
Member

tsteur commented Nov 6, 2014

I just checked the Twig templates if they output any floats for usage in JavaScript but haven't found any yet. Also clicked through the UI with a German locale and couldn't find any JS error. If someone outputs something like var x = {{ 5.55 }} in a Twig template it would lead in a JS error because var x = 5,5

@tsteur
Copy link
Member

tsteur commented Nov 7, 2014

I think most of the issues related to German locale (or other locale) are fixed but it is hard to tell as most parts of the code are not really covered by tests. Especially not by integration / unit tests which would be very helpful here.

I think we can merge the current changes so it'll work at least for now (mostly). Then I think there can be only one conclusion the more I think about it: We always have to set an en_* locale and format numbers in Twig templates and API (if we want so) etc using Twig filter or using a library or so.

It is easy to explain: While we can live with the fact or bug when a number is not formatted correct in the UI (eg 5.5 instead of 5,5 => easy to fix) we cannot let the platform track or calculate wrong values. It is so easy to cause a bug / wrong values if not a en_* locale is used and those bugs are in places where nobody would expect them see the changes and my comments. Even the PiwikTracker was sending wrong values with non en_* locale. Another thing is that we cannot make sure all the libraries work correctly. For instance the pdf library uses floats quite a lot but also Zend etc. Last but not least we cannot expect plugin developers to be aware of this problem. The worst part: As we do not use a German locale in our tests we would not even notice once we track or display wrong metrics.

Be aware that our code sets the German locale as soon as someone uses German language. Probably most of our users use a language where a comma is used instead of a dot. As soon as this locale is installed on the server we track wrong values.

I am not sure how much work it is and whether we will be able to format the numbers everywhere but it is worth starting it even if it'll take a while. I think it is also worth it to run the UI tests with a locale/language that uses a comma instead of a dot in floats. Shall we handle all this in a separate issue?

tsteur added a commit that referenced this issue Nov 7, 2014
…not have to care about this. Makes it a tiny bit slower but less prone to errors
tsteur added a commit that referenced this issue Nov 7, 2014
…sing locale de_DE... should try to submit a pull request forthis otherwise we will not be able to simply update this lib
tsteur added a commit that referenced this issue Nov 7, 2014
…e is used (or any locle using comma instead of dot)
tsteur added a commit that referenced this issue Nov 7, 2014
tsteur added a commit that referenced this issue Nov 7, 2014
This caused me a lot of headache. I ran the system tests with a German
locale and noticed so many values are completely different. I was looking
for places where there were still any float to string conversions and I
was wondering why the existing fix didn't work until I noticed the
PiwikTracker did not send floats but floats with a comma as decimal
point which in the end was casted to ints etc.
tsteur added a commit that referenced this issue Nov 7, 2014
This one was not easy to find either. The segment value contained a
comma instead of a dot causing the API to not find anything and to not
return any data.
tsteur added a commit that referenced this issue Nov 7, 2014
…le is used the wrong value will be saved (5,55 instead of 5.55)
@tsteur tsteur self-assigned this Nov 7, 2014
@mattab
Copy link
Member

mattab commented Nov 7, 2014

@cauboy can you please try with 2.9.0-b7 just released? http://piwik.org/faq/how-to-update/faq_159/

This should be fixed, we'll close the issue once you confirm it's ok!

@mattab
Copy link
Member

mattab commented Nov 10, 2014

ping @cauboy

@cauboy
Copy link
Author

cauboy commented Nov 10, 2014

@mattab: Sorry for the late response… was a busy weekend.

Approved: I updated to 2.9.0-b8 and it works fine for me.

@mattab
Copy link
Member

mattab commented Nov 11, 2014

@cauboy that's what we like to hear! kuddos @tsteur

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

3 participants