(This is a re-submission of PR 17887 to work around merge failure)
_paq.push(["setExcludedQueryParams", ["uid", "sid"]]);
Existing purify() and removeUrlParameter() functions were reused.
Tracker code generator unit tests have also been updated to test for valid tracker code and string escaping with the option enabled.
(Initially used "setIgnoreUrlParams" as the option name as per issue report, but though it would be more consistent to use "setExcludedQueryParams" to match the existing config option and UI terminology.)
If I see that correctly the Tracking Code Generator currently automatically adds a
setExcludedQueryParams to the tracking code if any excluded params are defined globally or for the website. I'm wondering if it might make sense to make it possible to adjust that value (only within the code generator). @tsteur do you think we should add that?
@sgiehl I don't see where it gets the excluded params for the website - looks like it only gets the global ones currently?
@sgiehl not sure. I'm kind of thinking we wouldn't need to expose this to the tracker code generator UI at all as it's rarely needed maybe. I'm not sure if we should expose the global list of excluded URL parameters. Not sure if there could be any risk with this or not. I suppose it might not be. Generally, there is also the list of excluded parameters from the config
url_query_parameter_to_exclude_from_url=gclid,fbclid,fb_xd_fragment,fb_comment_id,phpsessid,jsessionid,sessionid,aspsessionid,doing_wp_cron,sid,pk_vid which maybe would need to be added automatically too. However, excluding some parameters like
gclid,fbclid can break functionality.
@mattab what are your thoughts? The above logic implements that global excluded config parameter gets exposed in the tracking code generator. Say you have configured this:
then it adds below code
Having the excludedParameters API parameter in the tracking code generator is 👍 but I would maybe not add any global excluded parameter by default. I would probably also not add a UI to enter extra parameters to exclude as it's simply needed so rarely. On the other side someone could argue when they copy/paste the tracking code they can still remove the line for removing URL parameters if they didn't want to expose this information but often users wouldn't even know what it is.
Note that the global excluded parameters list supports REGEXP which I think the JS logic isn't.
To have a bit more security by default I would exclude these parameters by default
phpsessid,jsessionid,sessionid,aspsessionid. There's also
sid|zenid etc but these may be used for other purposes in an application meaning we cannot exclude them by default.
If we're not wanting to expose the excluded parameters in the generated tracking code then maybe we could hash the values when generating the tracking code and then have purify() check those hashes against the hashed value of the parameters? That way it wouldn't be possible for users to just read off a list of interesting parameters from the tracking code.
I reckon with rainbow tables etc the original value could be often found out quite quickly. See internal slack we would for now go only with exposing the website specific excluded parameters.
That means we probably can't exclude any other parameters like
phpsessid,jsessionid,sessionid,aspsessionid by default which is fine.
If someone needs to exclude parameters then they can still exclude them manually by adjusting the tracking code.
Is there actually any need to include the server side excluded parameters into the tracking code? Imho we could simply not include them there at all. The server side exclusion will still work the same...
But having the option in the tracker nevertheless might make sense, as it allows to remove parameters from being sent along with the tracking request.
@sgiehl we don't need it there. In the issue back then I mentioned we could do it. I reckon now that it's already it can be a good benefit that we show the site specific excluded parameters automatically. It's good for privacy and security to remove these already in the site.
It could make sense to also reflect those changes in the UI tests. We already have a test that shows the tracking code generator here: https://github.com/matomo-org/matomo/blob/03dbff2aaff77b93cef5158e8509f3c370947748/tests/UI/specs/UIIntegration_spec.js#L793-L803
So maybe we could simply add some globally excluded site parameters, so they are displayed within the tests. Guess we could add that code somewhere around here: https://github.com/matomo-org/matomo/blob/8c3f495c64a5209f0110f49588c4840ff2cd07fe/tests/PHPUnit/Fixtures/UITestFixture.php#L107-L110
To ensure some "bad parameters" won't hurt, it might be good to use some xss parameters like:
\Piwik\Plugins\SitesManager\API::getInstance()->setGlobalExcludedQueryParameters(implode(',', [ $this->xssTesting->forTwig('excludedParameter'), $this->xssTesting->forAngular('excludedParameter'), ]));
Tried updating the remaining UI tests, but the artifacts upload is currently not working, so I'm not able to check if the failing tests are related. Once the related UI tests are fixed guess this should be ready to merge