Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmented visitor log does not load any content when the Outlink URL or Download URL contains an ampersand #11806

Closed
mattab opened this issue Jun 20, 2017 · 4 comments · Fixed by #16977
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@mattab
Copy link
Member

mattab commented Jun 20, 2017

Reproduce

  • Open outlinks report on demo
  • Find an outlink with an & in the URL such as http://demo.piwik.org/index.php?module=SitesManager&action=downloadPiwikTracker&idSite={$IDSITE}&piwikUrl=http://piwik.example.org/
  • Click on the segmented visitor log icon

Got

No data:

segmented visitor log no data

URL requested was

https://demo.piwik.org/index.php?segment=outlinkUrl==http%3A%2F%2Fdemo.piwik.org%2Findex.php%3Fmodule%3DSitesManager%26amp%3Baction%3DdownloadPiwikTracker%26amp%3BidSite%3D%7B%24IDSITE%7D%26amp%3BpiwikUrl%3Dhttp%3A%2F%2Fpiwik.example.org%2F&date=today&module=Live&action=indexVisitorLog&disableLink=1&small=1&hideProfileLink=1&period=week&idSite=1

Expected instead

The visitor log should show all users who clicked on that outlinks.

There seems to be a problem with the encoding, the & is encoded too many times and end up as amp etc.

Similar to #10126 #6287 #8395

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Jun 20, 2017
@mattab mattab added this to the 3.0.5 milestone Jun 20, 2017
@sgiehl
Copy link
Member

sgiehl commented Jul 17, 2017

Tried to fix that for a while now, but it seems to be a bit more tricky then I thought.
I don't think the problem is double encoding. The Outlink might be stored with & in the piwik database, so it would be correct to include that in the segment. Currently not sure, why it doesn't work. Always decoding the segment value didn't help.

@mneudert would you maybe be keen to have a look here? 🙂

@mneudert
Copy link
Member

I have taken a good look at the behavior and perhaps can explain what is going on (ignoring the potential encoding error in the popup title).

To clarify my findings I whipped together some hackish fix. One hackish enough to break a lot of other things completely unrelated...

Let's begin the journey with the popup URL getting opened. The URL is both url- and html-encoded (%26amp%3B) in the segment GET-Parameter. This way it gets transported to the constructor of \Piwik\Segment and the funny things begin. The first round of initializeSegment exits with an Exception because the semicolon of & gets treated as an AND-Operator and that leads to a broken expression tree. The second check is not much more helpful resulting in an SQL-Statement containing ( 1 = 0 ) (I think the SQL_WHERE_DO_NOT_MATCH_ANY_ROW from \Piwik\Segment\SegmentExpression is the one kicking in here) and therefore no results.

My first fix was to tell the Expression parser what an & is and that it is a good thing. Resulting in a properly parsed expression tree containing [ 'outlinkUrl', '==', 'url?with=some&ampersand=stuff' ]. But during the following cleaning process the TableLogAction::getIdActionFromSegment now messes up the encoding and returns it completely decoded. Without the & in place. And now we are back to no results because the database has the url encoded. Removing the sanitization for outlinks helped to fix that one but that seems really off to me and might break more than worth it.

A different way might be to double-encode the segmentation parameter. But that seems equally hackish... perhaps Travis can shed some light on what gets broken...

@sgiehl
Copy link
Member

sgiehl commented Jul 19, 2017

@mneudert Well, no tests have been failing due to your changes. But guess we should start by adding a couple of new tests for segments (see #11384), so we can later prove changes required to fix this here, don't break anything else.

Note: Moving out of 3.0.5 as we aim to release this soon

@eldk
Copy link

eldk commented Nov 13, 2018

Hello,

When viewing "Segmented visitor log" in outlink report, it seems that if outlinkUrl parameter contain & all the special characters are double encoded.

Example :
Real outlink url :

http://sub.domain.tld/folder/1/709-53476-19255-0/1?aa3=4&toolid=99999&campid=99999&customid=11111&lgeo=1&vectorid=88888&item=77777&mpre=https://www.domain.tld/itm/66666

outlinkUrl parameter in Nginx access log when request segmented visitor log :

http%253A%252F%252Fsub.domain.tld%252Ffolder%252F1%252F709-53476-19255-0%252F1%253Faa3%253D4%2526amp%253Btoolid%253D99999%2526amp%253Bcampid%253D99999%2526amp%253Bcustomid%253D11111%2526amp%253Blgeo%253D1%2526amp%253Bvectorid%253D88888
%2526amp%253Bitem%253D77777%2526amp%253Bmpre%253Dhttps%253A%252F%252Fwww.domain.tld%252Fitm%252F66666
Real url outlinkUrl from Nginx Access logs outlinkUrl from browser post parameters In database : piwik_log_action.name data-segment-filter
http http http http
:// %253A%252F%252F %3A%2F%2F ://
sub.domain.tld sub.domain.tld sub.domain.tld sub.domain.tld
/ %252F %2F /
folder folder folder folder
/ %252F %2F /
1 1 1 1
/ %252F %2F /
709-53476-19255-0 709-53476-19255-0 709-53476-19255-0 709-53476-19255-0
/ %252F %2F /
1 1 1 1
? %253F %3F ?
aa3 aa3 aa3 aa3
= %253D %3D =
4 4 4 4
& %2526amp%253B %26amp%3B & %26amp%3B
toolid toolid toolid toolid
= %253D %3D =
99999 99999 99999 99999
& %2526amp%253B %26amp%3B &
campid campid campid campid
= %253D %3D =
99999 99999 99999 99999
& %2526amp%253B %26amp%3B &
customid customid customid customid
= %253D %3D =
11111 11111 11111 11111
& %2526amp%253B %26amp%3B &
lgeo lgeo lgeo lgeo
= %253D %3D =
1 1 1 1
& %2526amp%253B %26amp%3B &
vectorid vectorid vectorid vectorid
= %253D %3D =
88888 88888 88888 88888
& %2526amp%253B %26amp%3B &
item item item item
= %253D %3D =
77777 77777 77777 77777
& %2526amp%253B %26amp%3B &
mpre mpre mpre mpre
= %253D %3D =
https https https https
:// %253A%252F%252F %3A%2F%2F ://
www.domain.tld www.domain.tld www.domain.tld www.domain.tld
/ %252F %2F /
itm itm itm itm
/ %252F %2F /
66666 66666 66666 66666

@diosmosis diosmosis modified the milestones: 3.8.0, 3.9.0 Jan 7, 2019
@mattab mattab modified the milestones: 3.9.0, 3.10.0 Mar 18, 2019
@mattab mattab modified the milestones: 3.10.0, 3.12.0 Jun 11, 2019
@mattab mattab modified the milestones: 3.13.0, 4.1.0 Oct 30, 2019
@mattab mattab changed the title Segmented visitor log does not load any content when the Outlink URL contains an ampersand Segmented visitor log does not load any content when the Outlink URL or Download URL contains an ampersand Aug 20, 2020
@sgiehl sgiehl self-assigned this Dec 18, 2020
@mattab mattab modified the milestones: 4.1.0, 4.2.0 Dec 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants