New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proper encoding of segment values for Actions reports. #13481
Conversation
If it fixes your issue and tests pass probably all good to go. It's almost impossible to review it since the segments are just a mess and we never know where they are urlencoded or urldecoded etc. |
It will change a lot of the test output I think (another encoding in the segment property in API output). Think this won't be a BC break? at least not a potentially huge one. |
I really don't know. As long as we can still handle the segments urlencoded and urldecoded it should be fine. |
Applied the patch locally and it resolves the issue for me. |
@@ -26,6 +26,6 @@ | |||
<exit_rate>100%</exit_rate> | |||
<avg_time_generation>0.333</avg_time_generation> | |||
<url>http://www.example.org/page</url> | |||
<segment>pageUrl==http%3A%2F%2Fwww.example.org%2Fpage</segment> | |||
<segment>pageUrl==http%253A%252F%252Fwww.example.org%252Fpage</segment> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this double encoded on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, because one decode is for the condition, and one for the value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's correct, would expect to have here either:
pageUrl==http%253A%252F%252Fwww.example.org%252Fpage
(current behavior)
or
pageUrl%3D%3Dhttp%253A%252F%252Fwww.example.org%252Fpage
(current behavior URL encoded)
but pageUrl==http%253A%252F%252Fwww.example.org%252Fpage
wouldn't be a correct value since Matomo would then segment based on Page URL http%3A%2F%2Fwww.example.org%2Fpage
which wouldn't exist (didn't check but it's possible it currently works anyway since we try the url decoded version in
Lines 99 to 108 in 2020b12
} | |
// First try with url decoded value. If that fails, try with raw value. | |
// If that also fails, it will throw the exception | |
try { | |
$this->initializeSegment(urldecode($segmentCondition), $idSites); | |
} catch (Exception $e) { | |
$this->initializeSegment($segmentCondition, $idSites); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is required to have the value encoded 3 times because it is urldecoded 3 times, see the PR description for more info.
Most URLs don't need to be double encoded, but if the value includes a value that can be decoded, like a plus sign, then it will be interpreted incorrectly if not double encoded. In this case the old API output would be wrong.
I've looked at the PR for quite a while but the PR is quite complex and not trivial to understand especially re side effects etc. Left some comments though. Do you think there would be otherwise a solution where we don't need to change anything except for maybe the segment expression? |
3d48cc2
to
95313cb
Compare
It's not the case, it looked like it was, but I think I tracked something wrong. |
@diosmosis Feedback:
Looking forward to merging this 👍 |
This change turned out to be rather complex, I doubt I'll be able to finish it soon, and it's not as high value as other tickets. Would vote to put it in the priority backlog. |
29d9443
to
a37e21d
Compare
… is present, otherwise the segmentValue will not be used.
…ng first would break segments w/ correcrtly encoded values.
Nevermind, now all the other tests fail. |
@diosmosis let me know when it's ready for a review |
Changes:
Encode segmentValue for Actions reports. This extra encoding is required because of how many times a segment is urldecoded:
pageUrl==...
): https://github.com/matomo-org/matomo/blob/3.x-dev/core/Segment/SegmentExpression.php#L91This means the value needs to be triple encoded for values like plus signs to be used properly in the segment.
Change the urldecode in Actions filter to a urlencode, since SafeDecodeLabel will already run a urldecode later. This fixes the display for URLs w/, eg, '+' signs in them (eg,
http://example.org/with+plus/
will display like that, instead ofhttp://example.org/with plus/
).I am not sure if this fix will have repercussions so I haven't modified the tests. Will do so after someone else gives their opinion.
Fixes #13460