Windows 2016 IIS + PHP 7.1.10
If the page URL contains Chinese characters, it is displayed garbled on the visit record page and in JSON/XML exports (the data in the database is correct).
How can I solve it?
Can you maybe paste the path part of the URL in here? This will make it easier to reproduce so we can copy/paste. Cheers
Also, Piwik cannot export data in JSON format if it contains this kind of visit record.
We tested in Matomo 3.2.0 and 3.5.1.
I think I have solved this issue, but please help me check whether it has any side effects. Thanks.
In PageUrl.php, I changed the method as follows:
public static function reconstructNormalizedUrl($url, $prefixId)
{
    $map = array_flip(self::$urlPrefixMap);
    if ($prefixId !== null && isset($map[$prefixId])) {
        $fullUrl = $map[$prefixId] . $url;
    } else {
        $fullUrl = $url;
    }
    // Clean up host & hash tags, for URLs
    // YH
    $fullUrl = urlencode($fullUrl);
    $parsedUrl = @parse_url($fullUrl);
    // YH
    $parsedUrl[path] = urldecode($parsedUrl[path]);
    $parsedUrl[query] = urldecode($parsedUrl[query]);
    echo '--parseUrl_1===';
    print_r($parsedUrl);
    $parsedUrl = PageUrl::cleanupHostAndHashTag($parsedUrl);
    echo '--parseUrl_2===';
    print_r($parsedUrl);
    $url = UrlHelper::getParseUrlReverse($parsedUrl);
    if (!empty($url)) {
        echo '--url='.$url.'<br/>';
        return $url;
    }
    echo '--fullUrl='.$fullUrl.'<br/>';
    return $fullUrl;
}
Sorry, this is the correct code:
public static function reconstructNormalizedUrl($url, $prefixId)
{
    $map = array_flip(self::$urlPrefixMap);
    if ($prefixId !== null && isset($map[$prefixId])) {
        $fullUrl = $map[$prefixId] . $url;
    } else {
        $fullUrl = $url;
    }
    // Clean up host & hash tags, for URLs
    // YH
    $fullUrl = urlencode($fullUrl);
    $parsedUrl = @parse_url($fullUrl);
    // YH
    $parsedUrl[path] = urldecode($parsedUrl[path]);
    $parsedUrl[query] = urldecode($parsedUrl[query]);
    $parsedUrl = PageUrl::cleanupHostAndHashTag($parsedUrl);
    $url = UrlHelper::getParseUrlReverse($parsedUrl);
    if (!empty($url)) {
        return $url;
    }
    return $fullUrl;
}
It would be best if you created a PR; it's easier to read and test.
FYI: path and query are not defined in the above code. The method seems like a likely place for the bug, since it is used by both the downloads and the visitor details report.
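To illustrate why path and query can end up missing: urlencode() applied to the whole URL also escapes the delimiters (:, /, ?, =), so parse_url() no longer finds a scheme, host, or query and returns everything as a single path. A minimal sketch, assuming a made-up example.com URL (not taken from the report):

```php
<?php
// Hypothetical URL for illustration only.
$fullUrl = 'http://example.com/中文/page?name=值';

// Encoding the whole string escapes the delimiters as well as the Chinese text.
$encoded = urlencode($fullUrl);

$parts = parse_url($encoded);

// Only 'path' is set; 'scheme', 'host' and 'query' are absent.
var_dump(array_keys($parts));
var_dump(isset($parts['query']));
```

This is why encoding only the pieces between the delimiters, rather than the full string, is probably the safer direction.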
Finally, we replaced parse_url with the following method:
public static function mb_parse_url($url)
{
    $enc_url = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function ($matches)
        {
            return urlencode($matches[0]);
        },
        $url
    );
    $parts = parse_url($enc_url);
    if($parts === false)
    {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }
    foreach($parts as $name => $value)
    {
        $parts[$name] = urldecode($value);
    }
    return $parts;
}
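For anyone who wants to try this outside Matomo, here is a rough standalone sketch of the same approach as a plain function (the example.com URL is made up). It encodes only the runs of characters between URL delimiters, parses the result, then decodes each part again:

```php
<?php
// Standalone copy of the method posted above, as a plain function (sketch only).
function mb_parse_url($url)
{
    // Encode every run of characters that is not a URL delimiter.
    $enc_url = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function ($matches) {
            return urlencode($matches[0]);
        },
        $url
    );
    $parts = parse_url($enc_url);
    if ($parts === false) {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }
    // Decode each component back to its original characters.
    foreach ($parts as $name => $value) {
        $parts[$name] = urldecode($value);
    }
    return $parts;
}

// Hypothetical URL for illustration only.
$parts = mb_parse_url('http://example.com/中文/页面?名=值#锚');

echo $parts['host'], "\n";
echo $parts['path'], "\n";
echo $parts['query'], "\n";
echo $parts['fragment'], "\n";
```

Because parse_url() only ever sees ASCII here, the result should not depend on the server locale.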
Got another report from a customer:
It is the parse_url function that causes problems on some configurations: https://bugs.php.net/bug.php?id=52923
Could the problem be solved by replacing this function with this one: http://php.net/manual/en/function.parse-url.php#114817
i.e. by replacing parse_url with Common::mb_parse_url here: https://github.com/matomo-org/matomo/blob/af3a79c055bfe2c5778b5827ba3d165674315f4b/core/Tracker/PageUrl.php#L43
?
I think it would take a larger fix, as for strtolower (https://github.com/matomo-org/matomo/issues/10083), since this is not the only place where Matomo uses parse_url.
URL = /index.php?forceView=1&viewDataTable=VisitorLog&module=Live&action=getLastVisitsDetails&small=1&idSite=7&period=day&date=today&showtitle=1&random=6013
The following error just broke Matomo (v3.8.1):
The string to escape is not a valid UTF-8 string.
plugins/CoreHome/templates/_dataTable.twig line 67
The error seems to be triggered at this line by the Twig filter |e('html_attr'),
with these particular inputs combined with our server configuration:
FYI: in the customer's case (in the previous comment), the issue was fixed (or rather, worked around) by configuring PHP to use an LC_CTYPE of fr_FR.UTF-8 instead of the previous value of fr_FR (where the error was triggered).
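If someone wants to check their own server, something like the following might help. It is only a sketch: it prints the LC_CTYPE that PHP is currently using and tests whether a string is well-formed UTF-8, which is the condition the Twig escaper complains about. isValidUtf8 is a made-up helper name, not a Matomo function.

```php
<?php
// Passing "0" only reads the current setting; it does not change the locale.
$currentCtype = setlocale(LC_CTYPE, '0');
echo 'LC_CTYPE: ', $currentCtype, "\n";

// Hypothetical helper: true when $value is well-formed UTF-8.
// preg_match() with the /u modifier returns false on an invalid UTF-8 subject.
function isValidUtf8($value)
{
    return preg_match('//u', $value) === 1;
}

var_dump(isValidUtf8('中文'));      // complete multibyte sequences
var_dump(isValidUtf8("\xE4\xB8"));  // a multibyte sequence cut off after two bytes
```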
I added http://php.net/manual/en/function.parse-url.php#114817 to Core/Common.php, with support for the optional "component" argument (see below), and replaced parse_url with Common::mb_parse_url everywhere.
Be careful to add
use Piwik\Common;
in files where it is not already present.
/**
* parse_url() UTF-8 aware.
* See https://bugs.php.net/bug.php?id=52923
* See https://secure.php.net/manual/en/function.parse-url.php#114817
*
* @param string $url
* @param int $component
* @return mixed
*/
public static function mb_parse_url($url, $component = -1)
{
    $enc_url = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function ($matches)
        {
            return urlencode($matches[0]);
        },
        $url
    );
    $parts = parse_url($enc_url, $component);
    if($parts === false)
    {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }
    if($component != -1)
    {
        // A single component may be null (absent) or an int (port); only decode strings.
        return is_string($parts) ? urldecode($parts) : $parts;
    }
    foreach($parts as $name => $value)
    {
        $parts[$name] = urldecode($value);
    }
    return $parts;
}
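A note on the $component argument: when a single component is requested, parse_url() returns an int for PHP_URL_PORT and null for a component that is absent, so a wrapper may want to decode only string results. A small sketch with made-up URLs:

```php
<?php
// Hypothetical URLs for illustration only.
$port  = parse_url('http://example.com:8080/a', PHP_URL_PORT);
$query = parse_url('http://example.com/a', PHP_URL_QUERY);

var_dump($port);   // an int, not a string
var_dump($query);  // null, because the URL has no query

// Decode only when the component is actually a string.
$decoded = is_string($query) ? urldecode($query) : $query;
var_dump($decoded);
```

Keeping null for a missing component matches what callers of plain parse_url() already expect.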
@guytarr would you mind creating a PR with your changes?
Not only the URL but also the page title cannot display all UTF-8 characters in Matomo.
If the page title is 📧 Contact
it will show up as � Contact
@s1awa that's on purpose, as the database tables are not yet utf8mb4. Unsupported characters are currently converted to �. This will be changed in Matomo 4. See https://github.com/matomo-org/matomo/issues/9785
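For context, MySQL's utf8 charset stores at most 3 bytes per character, while characters outside the Basic Multilingual Plane, such as 📧, need 4 bytes in UTF-8; that is what utf8mb4 is for. A rough sketch for spotting affected titles (hasFourByteChars is a made-up helper, not part of Matomo):

```php
<?php
// U+1F4E7 (the e-mail emoji from the example above) takes 4 bytes in UTF-8.
$title = "\u{1F4E7} Contact";
var_dump(strlen("\u{1F4E7}"));

// Hypothetical helper: true when $value contains a code point above U+FFFF,
// i.e. a character that needs 4 bytes in UTF-8.
function hasFourByteChars($value)
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $value) === 1;
}

var_dump(hasFourByteChars($title));
var_dump(hasFourByteChars('中文 Contact'));
```

Chinese characters in the common range fit in 3 bytes, so they are not affected by this particular limitation.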