New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace unsupported characters in all tracking request params #13437
Conversation
Should we add a test for this (ie, tracking data w/ emojis for example)? |
@@ -402,7 +413,7 @@ public function getParam($name) | |||
$paramType = $supportedParams[$name][1]; | |||
|
|||
if ($this->hasParam($name)) { | |||
$this->paramsCache[$name] = Common::getRequestVar($name, $paramDefaultValue, $paramType, $this->params); | |||
$this->paramsCache[$name] = $this->replaceUnsupportedUtf8Chars(Common::getRequestVar($name, $paramDefaultValue, $paramType, $this->params), $name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this... is it needed right now? After a request has been sanitized... I wonder if you can craft certain requests that end up being basically "unsanitized" afterwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added that as some parameters are sent json_encode'd. Otherwise sending a 4byte char e.g. in ecommerce items (as done in the test) will result in an error. (The Common::getRequestVar
will make the json_decode)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll just need to make sure it can ideally not lead to any security issues. In other code it is applied before calling sanitize
but calling it afterwards doesn't look as safe to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the reason why the chars gets replaced with a supported char, to avoid someone sends some XSS content that gets "valid" when the unsupported chars are removed in between...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense 👍
282ba0d
to
7b9eb92
Compare
} | ||
|
||
if (is_array($value)) { | ||
array_walk_recursive ($value, function(&$value, $key){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I haven't checked it... is it possible that there are nested arrays? reading on the documentation:
You may notice that the key 'sweet' is never displayed. Any key that holds an array will not be passed to the function.
Also I presume it keeps indexes/keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_walk_recursive
will "visit" all leafs of a nested array and it keeps indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good, we're not looking at the key so it is all fine :)
if (array_key_exists('url', $this->params) && preg_match('/[\x{10000}-\x{10FFFF}]/u', $this->params['url'])) { | ||
Common::printDebug("Unsupport character detected. Replacing with \xEF\xBF\xBD"); | ||
$this->params['url'] = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $this->params['url']); | ||
$this->params = $this->replaceUnsupportedUtf8Chars($this->params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand... I'm testing it right now and wondering why we need the replace here and below in getParam()
? I understand in getParam
might be because it returns an array from json? Is it still needed here as well? Is this in case someone does ->getParams()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That was meant for getParams()
. Not the perfect solution. But as it will be removed in Matomo 4 again, guess it's "good enough".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK was just wondering re performance etc but shouldn't be big.
There are some failing tests. not sure if related to this PR? otherwise if they pass lgtm |
7b9eb92
to
dbf9840
Compare
@tsteur tests related to that PR should be fixed now |
…-org#13437) * replace unsupported characters in all tracking request params * Ensure unsupported chars are replaced in json_encoded params as well * Adds simple test for 4byte UTF8 chars
ping @mattab @tsteur
When tracking emojis in e.g. page titles, the page title currently will result in an empty value (or maybe an error, depending on mysql config)
Not sure why we currently replace those characters only for the url and not all params...
refs #8790
also kind of fixes #13148