Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace unsupported characters in all tracking request params #13437

Merged
merged 3 commits into from Sep 28, 2018

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented Sep 17, 2018

ping @mattab @tsteur

When tracking emojis in e.g. page titles, the page title currently will result in an empty value (or maybe an error, depending on mysql config)

Not sure why we currently replace those characters only for the url and not all params...

refs #8790

also kind of fixes #13148

@sgiehl sgiehl added the Needs Review PRs that need a code review label Sep 17, 2018
@sgiehl sgiehl added this to the 3.6.1 milestone Sep 17, 2018
@tsteur
Copy link
Member

tsteur commented Sep 17, 2018

Not sure why it wasn't done on the whole URL but makes sense I suppose. Original issue was
#7766 and fixed in #8765 . Couldn't find any discussion around this. As it is done in the constructor before data is sanitized it should be fine I presume.

@diosmosis
Copy link
Member

Should we add a test for this (ie, tracking data w/ emojis for example)?

@@ -402,7 +413,7 @@ public function getParam($name)
$paramType = $supportedParams[$name][1];

if ($this->hasParam($name)) {
$this->paramsCache[$name] = Common::getRequestVar($name, $paramDefaultValue, $paramType, $this->params);
$this->paramsCache[$name] = $this->replaceUnsupportedUtf8Chars(Common::getRequestVar($name, $paramDefaultValue, $paramType, $this->params), $name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this... is it needed right now? After a request has been sanitized... I wonder if you can craft certain requests that end up being basically "unsanitized" afterwards

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that as some parameters are sent json_encode'd. Otherwise sending a 4byte char e.g. in ecommerce items (as done in the test) will result in an error. (The Common::getRequestVar will make the json_decode)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll just need to make sure it can ideally not lead to any security issues. In other code it is applied before calling sanitize but calling it afterwards doesn't look as safe to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the reason why the chars gets replaced with a supported char, to avoid someone sends some XSS content that gets "valid" when the unsupported chars are removed in between...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense 👍

}

if (is_array($value)) {
array_walk_recursive ($value, function(&$value, $key){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I haven't checked it... is it possible that there are nested arrays? reading on the documentation:

You may notice that the key 'sweet' is never displayed. Any key that holds an array will not be passed to the function.

Also I presume it keeps indexes/keys?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_walk_recursive will "visit" all leafs of a nested array and it keeps indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good, we're not looking at the key so it is all fine :)

if (array_key_exists('url', $this->params) && preg_match('/[\x{10000}-\x{10FFFF}]/u', $this->params['url'])) {
Common::printDebug("Unsupport character detected. Replacing with \xEF\xBF\xBD");
$this->params['url'] = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $this->params['url']);
$this->params = $this->replaceUnsupportedUtf8Chars($this->params);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand... I'm testing it right now and wondering why we need the replace here and below in getParam()? I understand in getParam might be because it returns an array from json? Is it still needed here as well? Is this in case someone does ->getParams()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That was meant for getParams(). Not the perfect solution. But as it will be removed in Matomo 4 again, guess it's "good enough".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK was just wondering re performance etc but shouldn't be big.

@tsteur
Copy link
Member

tsteur commented Sep 25, 2018

There are some failing tests. not sure if related to this PR? otherwise if they pass lgtm

@sgiehl
Copy link
Member Author

sgiehl commented Sep 28, 2018

@tsteur tests related to that PR should be fixed now

@diosmosis diosmosis merged commit c90a702 into 3.x-dev Sep 28, 2018
@diosmosis diosmosis deleted the utf8mb4chars branch September 28, 2018 21:30
InfinityVoid pushed a commit to InfinityVoid/matomo that referenced this pull request Oct 11, 2018
…-org#13437)

* replace unsupported characters in all tracking request params

* Ensure unsupported chars are replaced in json_encoded params as well

* Adds simple test for 4byte UTF8 chars
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

I can not display properly if there are pictograms etc. in the title
3 participants