New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure current url is not decoded too often #16628
base: 4.x-dev
Are you sure you want to change the base?
Conversation
build js |
@tsteur This might actually fix some more issues that are related to encoding and page charset. |
@sgiehl I think the original urldecode was added because of #7220 and maybe also #7218 . If we now removed that method, we'd have a problem that URL's would be likely stored URL encoded in the actions meaning it will create a new ID action for every URL and it can cause a lot of problems eg if some reports don't group by URL etc. Might need to test this but there's a chance we can't merge this. |
I guess since #7220 a lot has happened in browser development. Seems the url and referrer are no longer url encoded in modern browsers |
@sgiehl do you have maybe some reference to that where browsers changed that? I just tested with some URLs and it seems to be still URL encoded. I would assume it could break a lot of things if browsers simply changed that but it's possible they did. eg viewing |
Ok. Now I understand what you are referring to. That's still normal that some special characters are encoded. But it's actually not the full url. So we shouldn't manipulate the complete url using any decodeWrapper. |
@sgiehl I tested this PR against 4.x-dev using matomo-org/test-examples#9 to track encoded urls and noticed there are changes in what's tracked now. For ISO-8859-1 encoded pages: The first action is tracked via 4.x-dev, and the second via this branch. For other URL encoding: In one case it fixes the bug referenced in the PR, but for others it changes what's tracked. I'm not sure exactly what's ok to change (cc @tsteur), but it might break some segments if we just merge this. I didn't test referrer URL tracking, but that would be affected too. |
I'm actually not sure regarding this. But a special character like |
I think for a bug fix it would be better to maintain existing behavior if possible. We can't know exactly how much might be affected downstream, it could be more than just what is displayed to the user (eg, segments no longer matching). I'm also noticing the first test doesn't save the file in test-examples in ISO-8859-1. Not sure exactly how to do that. |
@tsteur don't think that is possible. We parse the page url to check if some parameters for site search are contained, so if we don't change the tracked page url, site search will stay the same buggy way. |
👍 thanks. moving the issue to 4.7 for now as this current fix would likely introduce quite a few regressions (I remember when we added them and there were various regressions re encoding). We'll look into it again when it's a higher priority. |
@tsteur milestone 4.7.0 has arrived, are you wanting this in this milestone now or still deferring for later? |
hmm looks like maybe it is in 4.8.0 since the milestone got renamed or something, so still deferred 👍 |
@sgiehl be great to explore some other options if there are other ways to fix this in 4.8 milestone without possible regressions |
As far as I can tell there's no way to fix it without affecting the logic that was in effect until now. Any custom logic to try and recreate the original encoding would be pretty complex, and likely to introduce new bugs in some unforeseen special case. The only way out that comes to mind is adding a setting so each user can decide whether they want this fix applied (off by default for existing instances of Matomo, on by default when a new one is created). |
So in the end I agree with @sgiehl that decoding by default is wrong, as it will necessarily lead to the loss of information, if a URL has encoded components, but I also agree that a bug that remains unfixed for long enough becomes expected behavior, and a bug fix shouldn't change it too much. I figured the best way to do this would be to create a tracker plugin, this way if someone wants to keep the old behavior in it's entirety, they can just not install it / keep it disabled. A proof of concept is available in the feature/disable_url_decoding branch of my fork. The decode-disabling can be triggered either from Javascript with a Right now all there is in the settings, is a simple on/off switch, but I was thinking of providing extra options, like:
But I wanted to get some feedback before I go ahead. |
Let's assume we have the search url on the marketplace containing a
&
in search query (see #14715):https://plugins.matomo.org/search?query=Heatmap+%26+session+recording&post_type=product
Currently Matomo tracker js would run a
safeDecodeWrapper
on it. So the resulting url, that would be used for tracking would behttps://plugins.matomo.org/search?query=Heatmap+&+session+recording&post_type=product
, which is actually not the samefixes #14715