@sgiehl opened this Pull Request on October 29th 2020 Member

Let's assume we have the search url on the marketplace containing a & in search query (see #14715):
https://plugins.matomo.org/search?query=Heatmap+%26+session+recording&post_type=product

Currently Matomo tracker js would run a safeDecodeWrapper on it. So the resulting url, that would be used for tracking would be
https://plugins.matomo.org/search?query=Heatmap+&+session+recording&post_type=product, which is actually not the same

fixes #14715

@tsteur commented on October 29th 2020 Member

We might need to check if there's any other way to fix this as this was added because of https://github.com/matomo-org/matomo/issues/8413 in https://github.com/matomo-org/matomo/pull/8510

Anyway, let's for now focus on Matomo 4 release etc.

@sgiehl commented on December 21st 2020 Member

build js

@sgiehl commented on December 21st 2020 Member

@tsteur This might actually fix some more issues that are related to encoding and page charset.
It shouldn't make any problems like #8413 or #8510, as before decodeURIComponant was used, which throwed the error. Now the url would not be decoded. I'm actually still wondering why that should be needed at all. Are there any cases where the location object might hold any values that are double encoded?

@tsteur commented on December 21st 2020 Member

@sgiehl I think the original urldecode was added because of https://github.com/matomo-org/matomo/issues/7220 and maybe also https://github.com/matomo-org/matomo/issues/7218 . If we now removed that method, we'd have a problem that URL's would be likely stored URL encoded in the actions meaning it will create a new ID action for every URL and it can cause a lot of problems eg if some reports don't group by URL etc. Might need to test this but there's a chance we can't merge this.

@sgiehl commented on December 22nd 2020 Member

I guess since #7220 a lot has happened in browser development. Seems the url and referrer are no longer url encoded in modern browsers

@tsteur commented on December 22nd 2020 Member

@sgiehl do you have maybe some reference to that where browsers changed that? I just tested with some URLs and it seems to be still URL encoded. I would assume it could break a lot of things if browsers simply changed that but it's possible they did.

eg viewing /test.html?x=34L#$3L<a href='/4'>#4</a>L1>124:56:7&test then some characters were encoded
image

@sgiehl commented on December 23rd 2020 Member

Ok. Now I understand what you are referring to. That's still normal that some special characters are encoded. But it's actually not the full url. So we shouldn't manipulate the complete url using any decodeWrapper.
Let's assume you request something like test.html?q=test%26q%3Dp, which is a valid request with one query parameter.
After decoding it you get test.html?q=test&q=p, which is obviously something else.

Powered by GitHub Issue Mirror