@mattab opened this Issue on June 20th 2017 Member

Reproduce

  • Open outlinks report on demo
  • Find an outlink with an & in the URL such as http://demo.piwik.org/index.php?module=SitesManager&action=downloadPiwikTracker&idSite={$IDSITE}&piwikUrl=http://piwik.example.org/
  • Click on the segmented visitor log icon

Got

No data:

segmented visitor log no data

URL requested was

https://demo.piwik.org/index.php?segment=outlinkUrl==http%3A%2F%2Fdemo.piwik.org%2Findex.php%3Fmodule%3DSitesManager%26amp%3Baction%3DdownloadPiwikTracker%26amp%3BidSite%3D%7B%24IDSITE%7D%26amp%3BpiwikUrl%3Dhttp%3A%2F%2Fpiwik.example.org%2F&date=today&module=Live&action=indexVisitorLog&disableLink=1&small=1&hideProfileLink=1&period=week&idSite=1

Expected instead

The visitor log should show all users who clicked on that outlinks.

There seems to be a problem with the encoding, the & is encoded too many times and end up as amp etc.

Similar to https://github.com/piwik/piwik/issues/10126 https://github.com/piwik/piwik/issues/6287 https://github.com/piwik/piwik/issues/8395

@sgiehl commented on July 17th 2017 Member

Tried to fix that for a while now, but it seems to be a bit more tricky then I thought.
I don't think the problem is double encoding. The Outlink might be stored with & in the piwik database, so it would be correct to include that in the segment. Currently not sure, why it doesn't work. Always decoding the segment value didn't help.

@mneudert would you maybe be keen to have a look here? :slightly_smiling_face:

@mneudert commented on July 18th 2017 Member

I have taken a good look at the behavior and perhaps can explain what is going on (ignoring the potential encoding error in the popup title).

To clarify my findings I whipped together some hackish fix. One hackish enough to break a lot of other things completely unrelated...

Let's begin the journey with the popup URL getting opened. The URL is both url- and html-encoded (%26amp%3B) in the segment GET-Parameter. This way it gets transported to the constructor of \Piwik\Segment and the funny things begin. The first round of initializeSegment exits with an Exception because the semicolon of & gets treated as an AND-Operator and that leads to a broken expression tree. The second check is not much more helpful resulting in an SQL-Statement containing ( 1 = 0 ) (I think the SQL_WHERE_DO_NOT_MATCH_ANY_ROW from \Piwik\Segment\SegmentExpression is the one kicking in here) and therefore no results.

My first fix was to tell the Expression parser what an & is and that it is a good thing. Resulting in a properly parsed expression tree containing [ 'outlinkUrl', '==', 'url?with=some&ampersand=stuff' ]. But during the following cleaning process the TableLogAction::getIdActionFromSegment now messes up the encoding and returns it completely decoded. Without the & in place. And now we are back to no results because the database has the url encoded. Removing the sanitization for outlinks helped to fix that one but that seems really off to me and might break more than worth it.

A different way might be to double-encode the segmentation parameter. But that seems equally hackish... perhaps Travis can shed some light on what gets broken...

@sgiehl commented on July 19th 2017 Member

@mneudert Well, no tests have been failing due to your changes. But guess we should start by adding a couple of new tests for segments (see #11384), so we can later prove changes required to fix this here, don't break anything else.

Note: Moving out of 3.0.5 as we aim to release this soon

@eldk commented on November 13th 2018

Hello,

When viewing "Segmented visitor log" in outlink report, it seems that if outlinkUrl parameter contain & all the special characters are double encoded.

Example :
Real outlink url :

http://sub.domain.tld/folder/1/709-53476-19255-0/1?aa3=4&toolid=99999&campid=99999&customid=11111&lgeo=1&vectorid=88888&item=77777&mpre=https://www.domain.tld/itm/66666

outlinkUrl parameter in Nginx access log when request segmented visitor log :

http%253A%252F%252Fsub.domain.tld%252Ffolder%252F1%252F709-53476-19255-0%252F1%253Faa3%253D4%2526amp%253Btoolid%253D99999%2526amp%253Bcampid%253D99999%2526amp%253Bcustomid%253D11111%2526amp%253Blgeo%253D1%2526amp%253Bvectorid%253D88888
%2526amp%253Bitem%253D77777%2526amp%253Bmpre%253Dhttps%253A%252F%252Fwww.domain.tld%252Fitm%252F66666
Real url outlinkUrl from Nginx Access logs outlinkUrl from browser post parameters In database : piwik_log_action.name data-segment-filter
http http http http
:// %253A%252F%252F %3A%2F%2F ://
sub.domain.tld sub.domain.tld sub.domain.tld sub.domain.tld
/ %252F %2F /
folder folder folder folder
/ %252F %2F /
1 1 1 1
/ %252F %2F /
709-53476-19255-0 709-53476-19255-0 709-53476-19255-0 709-53476-19255-0
/ %252F %2F /
1 1 1 1
? %253F %3F ?
aa3 aa3 aa3 aa3
= %253D %3D =
4 4 4 4
& %2526amp%253B %26amp%3B & %26amp%3B
toolid toolid toolid toolid
= %253D %3D =
99999 99999 99999 99999
& %2526amp%253B %26amp%3B &
campid campid campid campid
= %253D %3D =
99999 99999 99999 99999
& %2526amp%253B %26amp%3B &
customid customid customid customid
= %253D %3D =
11111 11111 11111 11111
& %2526amp%253B %26amp%3B &
lgeo lgeo lgeo lgeo
= %253D %3D =
1 1 1 1
& %2526amp%253B %26amp%3B &
vectorid vectorid vectorid vectorid
= %253D %3D =
88888 88888 88888 88888
& %2526amp%253B %26amp%3B &
item item item item
= %253D %3D =
77777 77777 77777 77777
& %2526amp%253B %26amp%3B &
mpre mpre mpre mpre
= %253D %3D =
https https https https
:// %253A%252F%252F %3A%2F%2F ://
www.domain.tld www.domain.tld www.domain.tld www.domain.tld
/ %252F %2F /
itm itm itm itm
/ %252F %2F /
66666 66666 66666 66666
Powered by GitHub Issue Mirror