visit action url is garbled #13424

Closed
yhlin00001 opened this issue Sep 13, 2018 · 17 comments
Labels
answered For when a question was asked and we referred to forum or answered it. Bug For errors / faults / flaws / inconsistencies etc.

Comments

@yhlin00001

Windows 2016 IIS + PHP 7.1.10
If the page URL contains Chinese characters, it is displayed garbled on the visit record page and in JSON/XML exports (but the data is correct in the DB).

How can I solve it?

@yhlin00001
Author

in download
image

in visit record
image

@tsteur
Member

tsteur commented Sep 13, 2018

Can you maybe paste the path part of the URL in here? This will make it easier to reproduce so we can copy/paste. Cheers

@yhlin00001
Author

Also, Piwik cannot export data in JSON format if it contains this kind of visit record.
We tested in Matomo 3.2.0 and 3.5.1.

@yhlin00001
Author

I think I have solved this issue, but please help me check whether it has any side effects. Thanks.

In PageUrl.php, I changed the code as follows:

public static function reconstructNormalizedUrl($url, $prefixId)
{
    $map = array_flip(self::$urlPrefixMap);

    if ($prefixId !== null && isset($map[$prefixId])) {
        $fullUrl = $map[$prefixId] . $url;
    } else {
        $fullUrl = $url;
    }

    // Clean up host & hash tags, for URLs
    // YH
    $fullUrl = urlencode($fullUrl);
    $parsedUrl = @parse_url($fullUrl);
    // YH
    $parsedUrl[path] = urldecode($parsedUrl[path]);
    $parsedUrl[query] = urldecode($parsedUrl[query]);
    echo '--parseUrl_1===';
    print_r($parsedUrl);
    $parsedUrl = PageUrl::cleanupHostAndHashTag($parsedUrl);
    echo '--parseUrl_2===';
    print_r($parsedUrl);
    $url       = UrlHelper::getParseUrlReverse($parsedUrl);

    if (!empty($url)) {
        echo '--url='.$url.'<br/>';
        return $url;
    }

    echo '--fullUrl='.$fullUrl.'<br/>';
    return $fullUrl;
}

@yhlin00001
Author

Sorry, this is the correct code:

public static function reconstructNormalizedUrl($url, $prefixId)
{
    $map = array_flip(self::$urlPrefixMap);

    if ($prefixId !== null && isset($map[$prefixId])) {
        $fullUrl = $map[$prefixId] . $url;
    } else {
        $fullUrl = $url;
    }

    // Clean up host & hash tags, for URLs
    // YH
    $fullUrl = urlencode($fullUrl);
    $parsedUrl = @parse_url($fullUrl);
    // YH
    $parsedUrl[path] = urldecode($parsedUrl[path]);
    $parsedUrl[query] = urldecode($parsedUrl[query]);
    $parsedUrl = PageUrl::cleanupHostAndHashTag($parsedUrl);
    $url       = UrlHelper::getParseUrlReverse($parsedUrl);

    if (!empty($url)) {
        return $url;
    }

    return $fullUrl;
}

@fdellwing
Contributor

It would be best if you created a PR; it's easier to read and test.

@tsteur
Member

tsteur commented Sep 16, 2018

FYI: path and query are not defined in the above code. The method seems like a likely place for the bug, since it is used by both the downloads report and the visitor details report.
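
To illustrate the point about undefined keys: parse_url() only returns the components that are actually present in the URL, so 'path' and 'query' may be missing from the result. Writing the keys as quoted strings also avoids relying on bare constants, which PHP 7 only warns about and PHP 8 rejects. A minimal sketch of a guarded decode, assuming a hypothetical helper named decodePathAndQuery that is not part of Matomo:

```php
<?php
// Hypothetical helper for illustration only; not part of Matomo.
// Decodes the 'path' and 'query' components when they exist
// and leaves the array untouched otherwise.
function decodePathAndQuery(array $parsedUrl): array
{
    foreach (['path', 'query'] as $key) {
        if (isset($parsedUrl[$key])) {
            $parsedUrl[$key] = urldecode($parsedUrl[$key]);
        }
    }

    return $parsedUrl;
}

// parse_url() omits components that are absent from the URL,
// so neither key is set for a bare host.
$parts = parse_url('http://example.com');
var_dump(isset($parts['path']), isset($parts['query']));
```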

@yhlin00001
Author

Finally, we replaced parse_url with the following method:

public static function mb_parse_url($url)
{
    $enc_url = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        function ($matches)
        {
            return urlencode($matches[0]);
        },
        $url
    );

    $parts = parse_url($enc_url);
    
    if($parts === false)
    {
        throw new \InvalidArgumentException('Malformed URL: ' . $url);
    }
    
    foreach($parts as $name => $value)
    {
        $parts[$name] = urldecode($value);
    }
    
    return $parts;
}
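
A self-contained sketch of how this approach behaves, assuming a standalone function named mbParseUrlSketch (a hypothetical name) rather than a Matomo method. It also guards two cases the snippet above does not handle: invalid UTF-8 input, for which preg_replace_callback() returns null, and the port, which parse_url() returns as an integer.

```php
<?php
// Hypothetical standalone variant for illustration; not Matomo code.
function mbParseUrlSketch(string $url): array
{
    // Percent-encode every run of characters between URL delimiters,
    // so that parse_url() only ever sees ASCII.
    $encoded = preg_replace_callback(
        '%[^:/@?&=#]+%usD',
        static function (array $matches): string {
            return urlencode($matches[0]);
        },
        $url
    );

    // With the /u modifier, invalid UTF-8 input yields null.
    if ($encoded === null) {
        throw new InvalidArgumentException('URL is not valid UTF-8');
    }

    $parts = parse_url($encoded);
    if ($parts === false) {
        throw new InvalidArgumentException('Malformed URL: ' . $url);
    }

    // Decode string components; keep the integer port as it is.
    foreach ($parts as $name => $value) {
        $parts[$name] = is_string($value) ? urldecode($value) : $value;
    }

    return $parts;
}

print_r(mbParseUrlSketch('http://example.com:8080/中文/页面?q=测试#片段'));
```

The delimiters excluded by the character class stay unencoded, so the structure of the URL is preserved while each segment round-trips through urlencode() and urldecode().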

@mattab
Member

mattab commented Mar 19, 2019

Got another report from a customer:

Is the proper solution to use mb_ functions?

The parse_url function raises concerns with some configurations: https://bugs.php.net/bug.php?id=52923

Could the problem be solved by replacing it with the function from http://php.net/manual/en/function.parse-url.php#114817, i.e. replacing parse_url with Common::mb_parse_url here?

$parsedUrl = @parse_url($originalUrl);

I think it would take a larger fix, as for strtolower (#10083), since this is not the only place where Matomo uses parse_url.

Initial error

URL = /index.php?forceView=1&viewDataTable=VisitorLog&module=Live&action=getLastVisitsDetails&small=1&idSite=7&period=day&date=today&showtitle=1&random=6013

The following error just broke Matomo (v3.8.1):
The string to escape is not a valid UTF-8 string.
plugins/CoreHome/templates/_dataTable.twig line 67

The error seems to be triggered at this line by the Twig filter |e('html_attr'), given these special inputs and our server configuration:

<a href="{{ action.url|safelink|e('html_attr') }}" rel="noreferrer noopener" target="_blank"
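
The error message indicates that the escaper received a string that is not valid UTF-8. A minimal sketch of how such a value could be detected before it reaches the template, assuming a hypothetical helper named isValidUtf8 (not a Matomo or Twig function):

```php
<?php
// Hypothetical helper for illustration only.
function isValidUtf8(string $value): bool
{
    // With the /u modifier, preg_match() returns false (not 0 or 1)
    // when the subject is not valid UTF-8; the empty pattern matches otherwise.
    return preg_match('//u', $value) === 1;
}

var_dump(isValidUtf8('/中文/页面'));  // bool(true)
var_dump(isValidUtf8("/\xE4\xB8"));   // bool(false): truncated multi-byte sequence
```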

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Mar 19, 2019
@mattab
Member

mattab commented Mar 19, 2019

FYI In customer's case (in previous comment) the issue was fixed (or rather: worked around) by configuring PHP to use LC_CTYPE of fr_FR.UTF-8 instead of the previous value (where error was triggered) of fr_FR.
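
The comment does not say how the locale was configured. As one possible sketch, the LC_CTYPE category can be set at runtime with setlocale(). The helper name useUtf8Ctype and the fallback locale names are assumptions, and which locales exist depends on the server:

```php
<?php
// Hypothetical helper for illustration only; the candidate list is an assumption.
function useUtf8Ctype(array $candidates = ['fr_FR.UTF-8', 'en_US.UTF-8', 'C.UTF-8']): ?string
{
    // setlocale() tries each candidate in order and returns the name
    // of the locale it applied, or false if none is available.
    $applied = setlocale(LC_CTYPE, $candidates);

    return $applied === false ? null : $applied;
}

$locale = useUtf8Ctype();
echo $locale === null
    ? "No UTF-8 locale available\n"
    : "LC_CTYPE set to {$locale}\n";
```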

@tsteur tsteur added this to the Priority Backlog (Help wanted) milestone Mar 19, 2019
@guytarr
Copy link

guytarr commented Mar 20, 2019

I added http://php.net/manual/en/function.parse-url.php#114817 to Core/Common.php, with support for the optional "component" argument (see below), and replaced parse_url with Common::mb_parse_url everywhere.
Be careful to add

use Piwik\Common;

in files where it is not already present.

    /**
     * UTF-8 aware parse_url().
     * See https://bugs.php.net/bug.php?id=52923
     * See https://secure.php.net/manual/en/function.parse-url.php#114817
     *
     * @param string $url
     * @param int $component
     * @return array|string|int|null
     */
    public static function mb_parse_url($url, $component = -1)
    {
        // Percent-encode everything except URL delimiters so parse_url() only sees ASCII
        $enc_url = preg_replace_callback(
            '%[^:/@?&=#]+%usD',
            function ($matches)
            {
                return urlencode($matches[0]);
            },
            $url
        );

        $parts = parse_url($enc_url, $component);

        if ($parts === false)
        {
            throw new \InvalidArgumentException('Malformed URL: ' . $url);
        }

        if ($component != -1)
        {
            // A single component is null when absent and an int for PHP_URL_PORT
            return is_string($parts) ? urldecode($parts) : $parts;
        }

        foreach ($parts as $name => $value)
        {
            // Leave the integer port untouched
            $parts[$name] = is_string($value) ? urldecode($value) : $value;
        }

        return $parts;
    }

@sgiehl
Member

sgiehl commented Mar 23, 2019

@guytarr would you mind creating a PR with your changes?

@slawa-dev

slawa-dev commented Aug 20, 2019

Not only the URL but also the page title cannot display all UTF-8 characters in Matomo.

If page title is
📧 Contact
it will show up as
� Contact

@sgiehl
Member

sgiehl commented Aug 20, 2019

@S1awa that's on purpose, as the database tables are not yet utf8mb4. Unsupported characters are currently converted to �. This will be changed in Matomo 4. See #9785
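
To illustrate the behaviour described here: MySQL's 3-byte utf8 charset cannot store characters outside the Basic Multilingual Plane, which take 4 bytes in UTF-8. A rough sketch of such a replacement, assuming a hypothetical function name; this is not Matomo's actual implementation:

```php
<?php
// Hypothetical helper for illustration only; not Matomo's implementation.
function replaceFourByteChars(string $value): string
{
    // Code points U+10000 and above need 4 bytes in UTF-8.
    // Replace each with U+FFFD, the Unicode replacement character.
    // preg_replace() returns null on invalid UTF-8, so fall back to the input.
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\u{FFFD}", $value) ?? $value;
}

echo replaceFourByteChars('📧 Contact'), "\n"; // prints "� Contact"
```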

@slawa-dev

I updated to version 4 and converted the database tables to utf8mb4. Now everything shows up as expected!

@mattab
Member

mattab commented Dec 10, 2023

Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!

@mattab mattab closed this as completed Dec 10, 2023
@mattab mattab added the answered For when a question was asked and we referred to forum or answered it. label Dec 10, 2023