Segmented visitor log faulted for specific urls #10126
Comments
Refs #8395. I believe we have a few more similar bug reports, e.g. with row evolution (#6287). A fix here is likely to cause regressions: segments can be URL-encoded or plain, and there are likely many other cases to consider so that we do not break other features or reports that handle the data differently. Fixing this one, like the other URL-related issues, will be tricky.
Yeah, I realize that, which is why I opened a discussion first. I will try to submit a PR soon. Maybe we can work something out. Thanks for looking into this.
I wanted to let you know that I experienced similar problems. Some of our tracked URLs contain actual percent signs (because in some cases we encode special characters beforehand); see this example of a URL containing an emoji:
Tracking works correctly; however, the record is stored in the database containing the encoded form of the emoji. That is expected, as I sent it this way to the tracking API. (I could also send the record in UTF-8 without encoding it first; it would then be stored in its UTF-8 representation, given the database supports, for example, utf8mb4. That is also as expected.)
At first, everything looks fine when I open the reports in the Piwik frontend. However, in the displayed URLs the encoded emoji is rendered in its graphical form '😃', which in my opinion is the first sign that something might be wrong. As the URL was sent in encoded form, it should also be displayed in encoded form in the reports.
Then, when I want to see the segmented visitor log or try to get reporting data using the reporting API, I finally run into real trouble. In the request sent from the Piwik frontend, the requested segment in the query string looks like this:
That is actually wrong. I would expect the emoji to be double encoded at this point, like this:
Even more intriguing is that when I try to request the information using the API I actually do double encode the emoji but still don't get the data I want.
Seemingly, the PHP backend performs multiple decoding rounds, which transform the emoji encoding along the way. Going from there, I successively added more and more encoding rounds to my request and finally reached a state that worked after performing 5 (!!) encoding steps, ending up with this:
This finally returns sensible data. To me it seems there is something really broken here with Piwik's assumptions about the encoding states.
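The effect described in this comment can be reproduced in isolation. The following is a minimal sketch in Python, not Piwik's PHP code: `encode_rounds` and `decode_rounds` are hypothetical helpers, and the round counts are illustrative only, since the thread does not establish how many decoding passes the backend actually performs. It shows that a value stored in percent-encoded form only survives N decoding passes if the client stacks N extra encoding passes on top of it.

```python
from urllib.parse import quote, unquote


def encode_rounds(value: str, rounds: int) -> str:
    """Percent-encode `value` `rounds` times (hypothetical helper)."""
    for _ in range(rounds):
        value = quote(value, safe="")
    return value


def decode_rounds(value: str, rounds: int) -> str:
    """Percent-decode `value` `rounds` times, mimicking repeated server-side decoding."""
    for _ in range(rounds):
        value = unquote(value)
    return value


# The emoji as it would be stored when the tracker received it already encoded.
stored = quote("😃", safe="")

# One extra encoding round (the "double encoded" form) is not enough
# if the receiving side decodes more than once.
double_encoded = encode_rounds(stored, 1)

# Assumption for illustration: the server decodes 4 times in total.
SERVER_DECODE_ROUNDS = 4
too_few = decode_rounds(double_encoded, SERVER_DECODE_ROUNDS)
just_enough = decode_rounds(
    encode_rounds(stored, SERVER_DECODE_ROUNDS), SERVER_DECODE_ROUNDS
)
```

Under that assumption, `too_few` collapses all the way to the raw emoji and no longer equals the stored value, while `just_enough` equals `stored` again. The number of extra rounds a client needs is a direct measure of how many times the value is decoded on its way to the database comparison.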
Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!
I'm dealing with a situation where the segmented Visitor Log shows no visits for specific URLs. The story goes like this:
Suggested solution:
Disable urldecoding in the Actions > Pages report
or
switch 'urldecode' to 'urlencode' here: https://github.com/piwik/piwik/blob/master/plugins/Actions/DataTable/Filter/Actions.php#L36
then remove 'urldecode' from here: https://github.com/piwik/piwik/blob/master/core/Segment/SegmentExpression.php#L113 (string was already decoded a couple of lines earlier: https://github.com/piwik/piwik/blob/master/core/Segment/SegmentExpression.php#L92)
This fixes my problem, but I haven't given much thought to other potential consequences. What do you think, guys?
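To illustrate why the first suggestion (not decoding labels in the report) could help, here is a simplified model in Python. It is a sketch under assumptions, not Piwik's actual code path: `round_trip` is a hypothetical function, `STORED` is a made-up URL, and the model assumes the segment value is built from the report label, encoded once for the query string, and decoded once on the server.

```python
from urllib.parse import quote, unquote

# Hypothetical URL as stored in the database, containing a literal
# percent-encoded sequence that was tracked as-is.
STORED = "example.org/page/%F0%9F%98%83"


def round_trip(stored: str, decode_for_display: bool) -> str:
    """Return the value the segment comparison would see on the server."""
    # The report either decodes the label for display or shows it unchanged.
    label = unquote(stored) if decode_for_display else stored
    # The frontend builds the segment from the label and encodes it once.
    query_value = quote(label, safe="")
    # The server decodes the query string once before comparing.
    return unquote(query_value)


with_decoding = round_trip(STORED, decode_for_display=True)
without_decoding = round_trip(STORED, decode_for_display=False)
```

In this model `with_decoding` no longer equals `STORED`, so an equality match against the database finds nothing, whereas `without_decoding` round-trips exactly. Whether the same holds in Piwik depends on every other place that encodes or decodes segment values, which is the regression risk raised in the comments.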