Segmented visitor log faulted for specific urls #10126

Closed
wronan opened this issue May 6, 2016 · 4 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc. worksforme The issue cannot be reproduced and things work as intended.

Comments

@wronan
Contributor

wronan commented May 6, 2016

I'm dealing with a situation where the segmented Visitor Log shows no visits for specific URLs. The story goes like this:

  • I have a URL like this: http://some.domain.tld/path/something?%2F=
  • it's correctly inserted into log_action as some.domain.tld/path/something?%2F= (OK)
  • in Visitors > Visitor Log it's also presented as some.domain.tld/path/something?%2F= (OK)
  • in Actions > Pages it's presented as some.domain.tld/path/something?/= (not the real problem in itself, but still an inconsistency) (wrong)
  • when I click on 'Open segmented Visitor Log' for that page (while I'm still in Actions > Pages), I get an empty result, because the URL sent to Segment is (...)something?/= instead of (...)something?%2F=, and of course there is no action named (...)something?/=
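
The mismatch described in the list above can be reproduced outside Piwik. The following is a minimal sketch that uses Python's `urllib.parse.unquote` as a stand-in for the PHP `urldecode` call mentioned in this issue; the variable names are illustrative only and do not come from the Piwik code base.

```python
from urllib.parse import unquote

# Action name as stored in log_action (taken from the report above).
stored_name = "some.domain.tld/path/something?%2F="

# What a report shows if it percent-decodes the label before display.
report_label = unquote(stored_name)

print(report_label)                 # the label Actions > Pages would show
print(report_label == stored_name)  # whether a segment built from it can match
```

Once `%2F` has been decoded to `/`, the label is no longer equal to the stored action name, so a segment built from the displayed label has nothing to match.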

Suggested solution:
Disable urldecoding in the Actions > Pages report
or
switch 'urldecode' to 'urlencode' here: https://github.com/piwik/piwik/blob/master/plugins/Actions/DataTable/Filter/Actions.php#L36

then remove 'urldecode' from here: https://github.com/piwik/piwik/blob/master/core/Segment/SegmentExpression.php#L113 (string was already decoded a couple of lines earlier: https://github.com/piwik/piwik/blob/master/core/Segment/SegmentExpression.php#L92)

This fixes my problem, but I haven't given much thought to other potential consequences. What do you think, guys?

@tsteur
Member

tsteur commented May 10, 2016

refs #8395

I believe we have a few more similar bug reports, e.g. with row evolution #6287

A regression is likely here: segments can be urlencoded or plain, and there are probably many other cases to consider so that we don't break other features or reports that handle the data differently. Like the other URL-related issues, this one will be tricky to fix.

@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label May 10, 2016
@wronan
Contributor Author

wronan commented May 10, 2016

Yeah, I realize that, which is why I opened a discussion first. I will try to submit a PR soon. Maybe we can work something out. Thanks for looking into this.

@mattab mattab added this to the 2.16.x (LTS) milestone May 27, 2016
@mattab mattab modified the milestones: 2.16.x (LTS), Mid term Aug 25, 2016
@FewKinG

FewKinG commented Oct 12, 2016

I wanted to let you know that I have experienced similar problems. Some of our tracked URLs contain actual percent signs (because in some cases we encode special characters beforehand); see this example of a URL containing an emoji:

http://our.app/path-%f0%9f%98%83

Tracking works correctly; however, the record is stored in the database containing the encoded form of the emoji (which is expected, as I sent it this way to the tracking API. I could also send the record in UTF-8 without prior percent-encoding, and it would then be stored in its UTF-8 representation, provided the database supports, for example, utf8mb4, which is again as expected).

At first, everything looks fine when I open the reports in the Piwik frontend. However, in the displayed URLs the encoded emoji is rendered in its graphical form '😃', which in my opinion is the first sign that something might be wrong. As the URL was sent in encoded form, it should also be displayed that way in the reports.

Then, when I want to see the segmented visitor log or try to get reporting data through the reporting API, I run into real trouble. In the request sent from the Piwik frontend, the segment in the query string looks like this:

?segment=pageUrl==http%3A%2F%2Four.app%2Fpath-%F0%9F%98%83

That is actually wrong. I would expect the emoji to be double-encoded at this point, like this:

?segment=pageUrl==http%3A%2F%2Four.app%2Fpath-%25F0%259F%2598%2583

Even more intriguing: when I request the information through the API, I do double-encode the emoji but still don't get the data I want.
I inspected the SQL queries performed during the request and found that it makes no difference whether I double-encode or not; the query always looks for the UTF-8 representation. This is part of the SQL query performed in both cases:

SELECT idaction FROM piwik_log_action WHERE ( name LIKE CONCAT('our.app/path-😃', '%') AND type = 1 )'

Seemingly, the PHP backend of Piwik performs multiple decoding rounds, transforming the emoji encoding from %25F0%259F%2598%2583 to %F0%9F%98%83 to 😃. That's what I assumed, anyway.
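
To make that assumed chain concrete, here is a small sketch that applies percent-decoding repeatedly and records each intermediate form. It uses Python's `urllib.parse.unquote` purely for illustration; `decode_rounds` is a hypothetical helper, not a Piwik function, and it says nothing about where in the PHP code the extra rounds actually happen.

```python
from urllib.parse import unquote

def decode_rounds(value, limit=10):
    """Return every intermediate form produced by repeated percent-decoding."""
    stages = [value]
    for _ in range(limit):
        decoded = unquote(stages[-1])
        if decoded == stages[-1]:
            break  # nothing left to decode
        stages.append(decoded)
    return stages

# Double-encoded emoji from the request described above.
stages = decode_rounds("%25F0%259F%2598%2583")
for stage in stages:
    print(stage)
```

Each pass strips exactly one layer, so any extra decoding step in the backend moves the value one stage further away from what is stored in the database.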

Going from there, I successively added more and more encoding rounds to my request and finally reached a state that worked after 5 (!!) encoding steps, ending up with this:

?segment=pageUrl==http%3A%2F%2Four.app%2Fpath-%25252525f0%252525259f%2525252598%2525252583.

This finally returns sensible data. It seems to me that something is really broken in Piwik's assumptions about encoding states.
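
For anyone who wants to check the layer count, the sketch below rebuilds the emoji part of that segment value by encoding the raw character five times, then counts how many decoding rounds are needed to get back to it. It relies on Python's `urllib.parse.quote` and `unquote`; `encode_n` and `rounds_to_plain` are hypothetical helpers for illustration, and `quote` emits uppercase hex digits, so the comparison ignores case.

```python
from urllib.parse import quote, unquote

def encode_n(value, rounds):
    """Percent-encode value the given number of times (nothing is left unescaped)."""
    for _ in range(rounds):
        value = quote(value, safe="")
    return value

def rounds_to_plain(value):
    """Count the decoding rounds needed until the value stops changing."""
    count = 0
    while unquote(value) != value:
        value = unquote(value)
        count += 1
    return count

# Emoji part of the segment value that finally worked (copied from above).
working = "%25252525f0%252525259f%2525252598%2525252583"

five = encode_n("\U0001F603", 5)  # five rounds, starting from the raw emoji
print(five)
print(five.lower() == working)
print(rounds_to_plain(working))
```

Under these assumptions the working value is the raw emoji wrapped in five encoding layers, which is four more than the single layer stored in the database.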

@mattab
Member

mattab commented Dec 10, 2023

Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!

@mattab mattab closed this as not planned Dec 10, 2023
@sgiehl sgiehl added the worksforme The issue cannot be reproduced and things work as intended. label Dec 11, 2023

5 participants