@yhlin00001 opened this Issue on September 13th 2018

Windows 2016 IIS + PHP 7.1.10
If the page URL contains Chinese characters, it is displayed garbled on the visit record page and in JSON/XML exports (but the data in the database is correct).

How can I solve it?

@yhlin00001 commented on September 13th 2018

In the downloads report:
[image]

In the visit record:
[image]

@tsteur commented on September 13th 2018 Member

Can you maybe paste the path part of the URL in here? This will make it easier to reproduce so we can copy/paste. Cheers

@yhlin00001 commented on September 13th 2018

Also, Piwik cannot export data in JSON format if it contains this kind of visit record.
We tested with Matomo 3.2.0 and 3.5.1.

@yhlin00001 commented on September 15th 2018

I think I have solved this issue, but please help me check whether it has any side effects. Thanks.

In PageUrl.php, I changed the method as follows:

public static function reconstructNormalizedUrl($url, $prefixId)
{
    $map = array_flip(self::$urlPrefixMap);

    if ($prefixId !== null && isset($map[$prefixId])) {
        $fullUrl = $map[$prefixId] . $url;
    } else {
        $fullUrl = $url;
    }

    // Clean up host & hash tags, for URLs
    // YH
        $fullUrl = urlencode($fullUrl);
    $parsedUrl = @parse_url($fullUrl);
    // YH
        $parsedUrl[path] = urldecode($parsedUrl[path]);
        $parsedUrl[query] = urldecode($parsedUrl[query]);
    echo '--parseUrl_1===';
    print_r($parsedUrl);
    $parsedUrl = PageUrl::cleanupHostAndHashTag($parsedUrl);
    echo '--parseUrl_2===';
    print_r($parsedUrl);
    $url       = UrlHelper::getParseUrlReverse($parsedUrl);

    if (!empty($url)) {
        echo '--url='.$url.'<br/>';
        return $url;
    }

    echo '--fullUrl='.$fullUrl.'<br/>';
    return $fullUrl;
}
@yhlin00001 commented on September 15th 2018

Sorry, this is the correct code:

public static function reconstructNormalizedUrl($url, $prefixId)
{
    $map = array_flip(self::$urlPrefixMap);

    if ($prefixId !== null && isset($map[$prefixId])) {
        $fullUrl = $map[$prefixId] . $url;
    } else {
        $fullUrl = $url;
    }

    // Clean up host & hash tags, for URLs
    // YH
        $fullUrl = urlencode($fullUrl);
    $parsedUrl = @parse_url($fullUrl);
    // YH
        $parsedUrl[path] = urldecode($parsedUrl[path]);
        $parsedUrl[query] = urldecode($parsedUrl[query]);
    $parsedUrl = PageUrl::cleanupHostAndHashTag($parsedUrl);
    $url       = UrlHelper::getParseUrlReverse($parsedUrl);

    if (!empty($url)) {
        return $url;
    }

    return $fullUrl;
}
@fdellwing commented on September 15th 2018 Contributor

It would be best if you created a PR; it's easier to read and test.

@tsteur commented on September 16th 2018 Member

FYI: path and query are not defined in the code above. This method seems a likely place for the bug, since it is used by both the downloads report and the visitor details report.
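
For context, a minimal sketch of the problem in plain PHP, outside Matomo: parse_url() only returns the components that are actually present in the URL, so the path and query keys may be missing. Unquoted keys such as $parsedUrl[path] are also read as undefined constants (a notice in PHP 7.1, an Error from PHP 8.0). The helper name decodeIfPresent is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: urldecode() a component only if parse_url() returned it.
function decodeIfPresent(array $parsedUrl, string $key): array
{
    if (isset($parsedUrl[$key])) {
        $parsedUrl[$key] = urldecode($parsedUrl[$key]);
    }
    return $parsedUrl;
}

// A URL without a query string: parse_url() returns no 'query' key at all.
$parsedUrl = parse_url('http://example.com/%E4%B8%AD%E6%96%87');
$parsedUrl = decodeIfPresent($parsedUrl, 'path');  // path is decoded
$parsedUrl = decodeIfPresent($parsedUrl, 'query'); // no-op, key is absent

var_dump(isset($parsedUrl['query'])); // bool(false)
```

Quoting the keys and guarding with isset() avoids both the undefined-constant and the undefined-index problem.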

@yhlin00001 commented on September 21st 2018

Finally, we replaced parse_url with the following method:
public static function mb_parse_url($url)
{
    $enc_url = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function ($matches)
        {
            return urlencode($matches[0]);
        },
        $url
    );

    $parts = parse_url($enc_url);

    if($parts === false)
    {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }

    foreach($parts as $name => $value)
    {
        $parts[$name] = urldecode($value);
    }

    return $parts;
}
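
A quick way to try the method above in isolation is sketched below. It copies the same logic into a standalone function; the name mbParseUrlSketch and the sample URL are made up for illustration, and the extra check for invalid UTF-8 input is an addition that is not in the method above.

```php
<?php
// Standalone copy of the logic above, for experimentation only.
function mbParseUrlSketch(string $url): array
{
    // Percent-encode every run of characters between URL delimiters,
    // so that parse_url() only ever sees ASCII.
    $encoded = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function (array $matches): string {
            return urlencode($matches[0]);
        },
        $url
    );

    // With the /u flag, preg_replace_callback() returns null on invalid UTF-8.
    $parts = is_string($encoded) ? parse_url($encoded) : false;
    if ($parts === false) {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }

    // Decode each component back to its original form.
    // The port is an int, so it is cast to string first.
    return array_map(function ($value): string {
        return urldecode((string) $value);
    }, $parts);
}

$parts = mbParseUrlSketch('http://example.com/中文/页面?q=搜索');
echo $parts['path'], "\n";  // /中文/页面
echo $parts['query'], "\n"; // q=搜索
```

Because every segment is encoded and then decoded again, sequences that were already percent-encoded in the input (for example %20) come back unchanged.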
@mattab commented on March 19th 2019 Member

Got another report from a customer:

Would the proper solution be to use mb_ functions?

The parse_url function is known to cause problems with some configurations: https://bugs.php.net/bug.php?id=52923

Could the problem be solved by replacing it with this function: http://php.net/manual/en/function.parse-url.php#114817
that is, by replacing parse_url with Common::mb_parse_url here: https://github.com/matomo-org/matomo/blob/af3a79c055bfe2c5778b5827ba3d165674315f4b/core/Tracker/PageUrl.php#L43

I think it would take a larger fix, like the one for strtolower (https://github.com/matomo-org/matomo/issues/10083), since this is not the only place where Matomo uses parse_url.

Initial error

URL = /index.php?forceView=1&viewDataTable=VisitorLog&module=Live&action=getLastVisitsDetails&small=1&idSite=7&period=day&date=today&showtitle=1&random=6013

The following error just broke Matomo (v3.8.1):
The string to escape is not a valid UTF-8 string.
plugins/CoreHome/templates/_dataTable.twig line 67

The error seems to be triggered at this line by the Twig filter |e('html_attr'), given these special inputs combined with our server configuration:

https://github.com/matomo-org/matomo/blob/6d39aaaf57a710c0f8314a2165d295ffc054868f/plugins/Live/templates/_actionCommon.twig#L25

@mattab commented on March 19th 2019 Member

FYI: In the customer's case (see previous comment) the issue was fixed (or rather, worked around) by configuring PHP to use an LC_CTYPE of fr_FR.UTF-8 instead of the previous value of fr_FR, with which the error was triggered.
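
To check which LC_CTYPE a PHP process actually runs with, something like the following sketch can be used. Whether fr_FR.UTF-8 can be set depends on the locales installed on the server, so the locale name here is only the example from this comment.

```php
<?php
// Passing "0" returns the current setting without changing it.
$current = setlocale(LC_CTYPE, '0');
echo 'Current LC_CTYPE: ', $current, "\n";

// Try to switch to a UTF-8 locale; setlocale() returns false
// if the locale is not installed on this system.
$result = setlocale(LC_CTYPE, 'fr_FR.UTF-8');
if ($result === false) {
    echo "fr_FR.UTF-8 is not available on this system\n";
} else {
    echo 'LC_CTYPE is now: ', $result, "\n";
}
```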

@guytarr commented on March 20th 2019

I added http://php.net/manual/en/function.parse-url.php#114817 to Core/Common.php, with support for the optional "component" argument (see below), and replaced parse_url with Common::mb_parse_url everywhere.
Be careful to add

use Piwik\Common;

in files where it is not already present.

    /**
     * parse_url() UTF-8 aware.
     * See https://bugs.php.net/bug.php?id=52923
     * See https://secure.php.net/manual/en/function.parse-url.php#114817
     *
     * @param string $url
     * @param int $component
     * @return array|string
     */
    public static function mb_parse_url($url, $component = -1)
    {
        $enc_url = preg_replace_callback(
            '%[^:/@?&=#]+%usD',
            function ($matches)
            {
                return urlencode($matches[0]);
            },
            $url
        );

        $parts = parse_url($enc_url, $component);

        if($parts === false)
        {
            throw new \InvalidArgumentException('Malformed URL: ' . $url);
        }

        if($component != -1) 
        {
            return urldecode($parts);
        }

        foreach($parts as $name => $value)
        {
            $parts[$name] = urldecode($value);
        }

        return $parts;
    }
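
One thing to watch for with the $component argument: parse_url() returns null when the requested component is missing and an int for the port, and passing null to urldecode() is deprecated as of PHP 8.1. A possible guard is sketched below; decodeComponent is a hypothetical helper name, not part of Matomo.

```php
<?php
// Hypothetical helper: decode a single parse_url() component safely.
function decodeComponent($value)
{
    // null means "component not present"; the port is already an int.
    if ($value === null || is_int($value)) {
        return $value;
    }
    return urldecode($value);
}

$url = 'http://example.com:8080/%E4%B8%AD%E6%96%87';
var_dump(decodeComponent(parse_url($url, PHP_URL_QUERY))); // NULL
var_dump(decodeComponent(parse_url($url, PHP_URL_PORT)));  // int(8080)
echo decodeComponent(parse_url($url, PHP_URL_PATH)), "\n"; // /中文
```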
@sgiehl commented on March 23rd 2019 Member

@guytarr would you mind creating a PR with your changes?

@slawa-dev commented on August 20th 2019

Not only the URL: the page title also cannot display all UTF-8 characters in Matomo.

If page title is
📧 Contact
it will show up as
� Contact


@sgiehl commented on August 20th 2019 Member

@s1awa that's on purpose, as the database tables are not yet utf8mb4. Unsupported characters are currently converted to �. This will be changed in Matomo 4. See https://github.com/matomo-org/matomo/issues/9785
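
For background, a sketch of why the emoji is affected: MySQL's utf8 charset stores at most 3 bytes per character, while 📧 (U+1F4E7) needs 4 bytes in UTF-8. The replacement below only illustrates the effect; it is an assumption and not Matomo's actual implementation.

```php
<?php
$title = '📧 Contact';

// The emoji alone occupies 4 bytes in UTF-8.
echo strlen('📧'), "\n"; // 4

// Replace every code point outside the Basic Multilingual Plane
// (i.e. every 4-byte UTF-8 sequence) with U+FFFD.
$sanitized = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\u{FFFD}", $title);
echo $sanitized, "\n"; // � Contact
```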
