Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JavaScript tracker exclude query parameters #18031

Merged
merged 18 commits into from Oct 4, 2021
Merged

Conversation

bx80
Copy link
Contributor

@bx80 bx80 commented Sep 19, 2021

Description:

Fixes #11627

(This is a re-submission of PR 17887 to work around merge failure)

Adds a new option to the JavaScript tracker to exclude query parameters from the tracked URL.

_paq.push(["setExcludedQueryParams", ["uid", "sid"]]);

Existing purify() and removeUrlParameter() functions were reused.

The tracker code generator has been updated to automatically add this option to the generated javascript tracking code using the excluded query parameters configured under Websites > Settings.

Tracker code generator unit tests have also been updated to test for valid tracker code and string escaping with the option enabled.

(Initially used "setIgnoreUrlParams" as the option name as per issue report, but though it would be more consistent to use "setExcludedQueryParams" to match the existing config option and UI terminology.)

Review

@bx80 bx80 added the Needs Review PRs that need a code review label Sep 19, 2021
@bx80
Copy link
Contributor Author

bx80 commented Sep 19, 2021

build js

Copy link
Contributor

@justinvelluppillai justinvelluppillai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have just been looking at this and I can't see existing excluded params from Administration > Websites > Manage appearing in the generated tracking code. Is there something I'm missing?

edit: I see, the API endpoint only pulls the global excluded query params, not those applicable to the specific website I was expecting.

Copy link
Contributor

@justinvelluppillai justinvelluppillai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works here.

@sgiehl
Copy link
Member

sgiehl commented Sep 20, 2021

If I see that correctly the Tracking Code Generator currently automatically adds a setExcludedQueryParams to the tracking code if any excluded params are defined globally or for the website. I'm wondering if it might make sense to make it possible to adjust that value (only within the code generator). @tsteur do you think we should add that?

@justinvelluppillai
Copy link
Contributor

@sgiehl I don't see where it gets the excluded params for the website - looks like it only gets the global ones currently?

@tsteur
Copy link
Member

tsteur commented Sep 20, 2021

@sgiehl not sure. I'm kind of thinking we wouldn't need to expose this to the tracker code generator UI at all as it's rarely needed maybe. I'm not sure if we should expose the global list of excluded URL parameters. Not sure if there could be any risk with this or not. I suppose it might not be. Generally, there is also the list of excluded parameters from the config url_query_parameter_to_exclude_from_url=gclid,fbclid,fb_xd_fragment,fb_comment_id,phpsessid,jsessionid,sessionid,aspsessionid,doing_wp_cron,sid,pk_vid which maybe would need to be added automatically too. However, excluding some parameters like gclid,fbclid can break functionality.

@mattab what are your thoughts? The above logic implements that global excluded config parameter gets exposed in the tracking code generator. Say you have configured this:
image

then it adds below code _paq.push(["setExcludedQueryParams", ["foo","bar"]]);.

image

Having the excludedParameters API parameter in the tracking code generator is 👍 but I would maybe not add any global excluded parameter by default. I would probably also not add a UI to enter extra parameters to exclude as it's simply needed so rarely. On the other side someone could argue when they copy/paste the tracking code they can still remove the line for removing URL parameters if they didn't want to expose this information but often users wouldn't even know what it is.

Note that the global excluded parameters list supports REGEXP which I think the JS logic isn't.

To have a bit more security by default I would exclude these parameters by default phpsessid,jsessionid,sessionid,aspsessionid. There's also sid|zenid etc but these may be used for other purposes in an application meaning we cannot exclude them by default.

js/piwik.js Show resolved Hide resolved
@bx80
Copy link
Contributor Author

bx80 commented Sep 20, 2021

If we're not wanting to expose the excluded parameters in the generated tracking code then maybe we could hash the values when generating the tracking code and then have purify() check those hashes against the hashed value of the parameters? That way it wouldn't be possible for users to just read off a list of interesting parameters from the tracking code.

@tsteur
Copy link
Member

tsteur commented Sep 20, 2021

I reckon with rainbow tables etc the original value could be often found out quite quickly. See internal slack we would for now go only with exposing the website specific excluded parameters.
That means we probably can't exclude any other parameters like phpsessid,jsessionid,sessionid,aspsessionid by default which is fine.

If someone needs to exclude parameters then they can still exclude them manually by adjusting the tracking code.

@sgiehl
Copy link
Member

sgiehl commented Sep 21, 2021

Is there actually any need to include the server side excluded parameters into the tracking code? Imho we could simply not include them there at all. The server side exclusion will still work the same...
But having the option in the tracker nevertheless might make sense, as it allows to remove parameters from being sent along with the tracking request.

@tsteur
Copy link
Member

tsteur commented Sep 21, 2021

@sgiehl we don't need it there. In the issue back then I mentioned we could do it. I reckon now that it's already it can be a good benefit that we show the site specific excluded parameters automatically. It's good for privacy and security to remove these already in the site.

@bx80
Copy link
Contributor Author

bx80 commented Sep 22, 2021

build js

@bx80 bx80 added this to the 4.6.0 milestone Sep 27, 2021
@bx80 bx80 added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Sep 27, 2021
@sgiehl
Copy link
Member

sgiehl commented Sep 27, 2021

build js

@sgiehl
Copy link
Member

sgiehl commented Sep 27, 2021

It could make sense to also reflect those changes in the UI tests. We already have a test that shows the tracking code generator here:

it('should load the Manage > Tracking Code admin page correctly', async function () {
await page.goto("?" + generalParams + "&module=CoreAdminHome&action=trackingCodeGenerator");
// replace container id in tagmanager code, as it changes when updating omnifixture
await page.evaluate(function () {
$('.tagManagerTrackingCode pre').html($('.tagManagerTrackingCode pre').html().replace(/container_[A-z0-9]+\.js/, 'container_REPLACED.js'));
});
pageWrap = await page.$('.pageWrap');
expect(await pageWrap.screenshot()).to.matchImage('admin_manage_tracking_code');
});

So maybe we could simply add some globally excluded site parameters, so they are displayed within the tests. Guess we could add that code somewhere around here:
UsersManagerAPI::getInstance()->addUser('anotheruser', 'anotheruser', 'someemail@email.com');
UsersManagerAPI::getInstance()->setUserAccess('anotheruser', 'view', array(1));
// add xss scheduled report

To ensure some "bad parameters" won't hurt, it might be good to use some xss parameters like:

        \Piwik\Plugins\SitesManager\API::getInstance()->setGlobalExcludedQueryParameters(implode(',', [
            $this->xssTesting->forTwig('excludedParameter'),
            $this->xssTesting->forAngular('excludedParameter'),
        ]));

bx80 and others added 2 commits September 28, 2021 09:50
@sgiehl
Copy link
Member

sgiehl commented Sep 30, 2021

Tried updating the remaining UI tests, but the artifacts upload is currently not working, so I'm not able to check if the failing tests are related. Once the related UI tests are fixed guess this should be ready to merge

@bx80 bx80 merged commit 2404b24 into 4.x-dev Oct 4, 2021
@bx80 bx80 deleted the m-17887-exclude-query-params branch October 4, 2021 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JavaScript Tracker: Add possibility to set URL parameters to ignore
4 participants