Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If "disableBrowserFeatureDetection" is enabled and remember(Cookie)ConsentGiven is called, a new visitor is recognized #19279

Closed
peterbo opened this issue May 26, 2022 · 24 comments
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Milestone

Comments

@peterbo
Copy link
Contributor

peterbo commented May 26, 2022

If "disableBrowserFeatureDetection" is enabled and remember(Cookie)ConsentGiven is called, a new visitor is recognized

Expected Behavior

After calling remember(Cookie)ConsentGiven, the same visitor should be recognized and no new (unique) visitor should be created.

Possible Solution

When re-enabling browserFeatureDetection, try to sync / match the "old" config_id without browserFeatures in the fingerprint.

Steps to Reproduce (for Bugs)

  1. Enable disableBrowserFeatureDetection
  2. Call remember(Cookie)ConsentGiven

Context

Giving consent enables the browserFeatureDetection - https://github.com/matomo-org/matomo/blob/4.x-dev/js/piwik.js#L6030 - That's why the current visior is not recognized again (and a new visitor is created), because the config_id (fingerprint) changed.

Your Environment

  • Matomo Version: 4.10.1
  • Browser: Multiple tested
@peterbo peterbo added the Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. label May 26, 2022
@bx80
Copy link
Contributor

bx80 commented May 30, 2022

Hi @peterbo, thanks for reporting this, the extra detail is very helpful 👍 Our product team will review and assign priority for this issue.

@bx80 bx80 added the Needs priority decision This issue may need to be added to the current milestone by Product Manager label May 30, 2022
@sgiehl sgiehl added Bug For errors / faults / flaws / inconsistencies etc. and removed Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. labels May 30, 2022
@justinvelluppillai justinvelluppillai added this to the For Prioritization milestone Jun 2, 2022
@justinvelluppillai justinvelluppillai removed the Needs priority decision This issue may need to be added to the current milestone by Product Manager label Jun 2, 2022
@peterbo
Copy link
Contributor Author

peterbo commented Aug 8, 2022

Hi @bx80 - is there a current / new status to this ticket? - Currently, this basically renders the consent system unusable. Thank you in advance!

@bx80
Copy link
Contributor

bx80 commented Aug 8, 2022

Hi @peterbo, unfortunately this issue hasn't been prioritized yet so I can't say when it will be worked on. Did this functionality work in a previous release?

@peterbo
Copy link
Contributor Author

peterbo commented Aug 15, 2022

Hi @bx80 - this bug exists since the introduction of the new feature (disableBrowserFeatureDetection).

@bx80 bx80 added the Needs priority decision This issue may need to be added to the current milestone by Product Manager label Aug 16, 2022
@bx80
Copy link
Contributor

bx80 commented Aug 16, 2022

I've tagged this issue for a priority review as it appears to be regression.

@bx80 bx80 added the Regression Indicates a feature used to work in a certain way but it no longer does even though it should. label Aug 16, 2022
@justinvelluppillai justinvelluppillai removed the Needs priority decision This issue may need to be added to the current milestone by Product Manager label Aug 16, 2022
@peterhashair peterhashair self-assigned this Aug 17, 2022
@peterhashair
Copy link
Contributor

Hi @peterbo trying to reproduce locally, but it seems like before the rememberCookieConsentGiven is called, the unique user id won't be generated, once the content is given the user Id will load from cookies.
Here is my config, let me know if I am on the wrong setup.

 _paq.push(['setCookieConsentGiven']);
 _paq.push(['disableBrowserFeatureDetection']);
<a href="#" onclick="window._paq.push(['rememberCookieConsentGiven'])">accept consent</a>

@peterbo
Copy link
Contributor Author

peterbo commented Aug 17, 2022

Hi @peterhashair - thanks for tackling this issue!

Correct - before the consent is given (remember(Cookie)ConsentGiven), the corresponding visitor / session is recognized via config id hash (fingerprint). When you call remember(Cookie)ConsentGiven, the visitor's config id changes, because now, the browser's plugins and resolution (don't know if that's part of the fingerprint currently) is sent by the tracking code additonally.

This is the moment, when, for the last time, a visitor would have to be recognized by the config id (before the visitor id cookie is set) and this fails (the config id was changed, because of the additional information that the tracker is now allowed to collect). The result is, that the same visitor's session is split between pre- and post-consent), which should not be the case

Possible solutions:

  • remove browser plugins from the fingerprint (could make the fingerprint less accurate)
  • search for both fingerprints (with and without browser plugins) to identify the correct visitor pre-consent

@peterhashair
Copy link
Contributor

@peterbo sorry to bother, but when I try to reproduce it when the second page missing requireCookieConsent in the tracking, it states as different visits, but when having it, it tracking as one visit.

@peterhashair peterhashair removed their assignment Aug 24, 2022
@peterbo
Copy link
Contributor Author

peterbo commented Aug 29, 2022

@peterhashair sorry for letting you wait for an answer. I cannot quite follow your explanation. How can I help you reproduce it? But in the end, everything is in the source code - activating the browserFeatureDetection will change the config_id, so the same visitor will split its session. You can see it here: https://github.com/matomo-org/matomo/blob/4.x-dev/core/Tracker/Settings.php#L145 and https://github.com/matomo-org/matomo/blob/4.x-dev/js/piwik.js#L3265

@peterhashair
Copy link
Contributor

@peterbo yes,but there is another code in the piwik.js that skips the detection. it only happens when one page missing requireCookieConsent

@sgiehl
Copy link
Member

sgiehl commented Aug 30, 2022

@peterhashair I think you are on the wrong track here. What @peterbo has reported here is quite simple to reproduce.

        _paq.push(['requireCookieConsent']);
        _paq.push(['disableBrowserFeatureDetection']);
        _paq.push(['trackPageView']);
        _paq.push(['setCookieConsentGiven']);
        _paq.push(['trackPageView']);

The code above will currently result in two visits. This is caused by the fact, that setCookieConsentGiven will indirectly enable the browser feature detection again and will force to set a visitor id cookie, which will result in a different visitor id.

@tsteur I'm wondering if it was on purpose to enable the browser feature detection again as soon as cookie consent is given. I doubt it's expected behaviour if disableBrowserFeatureDetection was previously called on purpose.

@peterbo
Copy link
Contributor Author

peterbo commented Aug 30, 2022

@sgiehl thanks for phrasing / explaining this much better than I was able to!

I'm wondering if it was on purpose to enable the browser feature detection again as soon as cookie consent is given.

Theoretically, following ePrivacy / TTDSG regulation, it should be OK to implicitly re-enable the featureDetection, if consent was given. The tricky part is the changing config_id.

Perhaps, just as a thought, it would be an option to get rid of the browserFeatures as data points of the config_id completely. As an example, I attached a screenshot that looks similar in many instances:

browserFeatures

Cookies: ~100% (if allowed) - not suited as a recognition data point, because nearly everyone supports it
PDF: ~25% - acceptable as a recognition data point, because just 1/4th of visitors support it
Rest of the plugins: ~0-1% - not suited as a recognition data point, because just <0,x% of browsers support them

As a result, we would lose a tiny bit of precision, if the PDF plugin data would not be part of the config_id anymore, but probably negligible.

@sgiehl
Copy link
Member

sgiehl commented Aug 30, 2022

I agree. Removing the browser features from the visitorid hash at all might be an option we should consider. @tsteur what's your opinion on that?

@tsteur
Copy link
Member

tsteur commented Aug 31, 2022

Interesting, we actually have very different stats for these browser plugins. About 80 & 60%.

We could think about removing these browser plugins from the config_id but it certainly can have quite an impact on the number of visitors that will be grouped together. The impact of such a change will depend highly on the kind of traffic (eg B2B vs B2C) and whether the "Also use the Anonymized IP addresses when enriching visits." is enabled or disabled.

When the full IP address is used for creating the config_id, then the impact be likely little. Especially for B2C sites.

When the anonmyised IP address is used for config_id then it can quickly make a difference of say 5-10% where visits are grouped together falsely. Especially for B2B sites.

Potentially, we could not use the browser plugins when full IP address is used, and do use the browser plugins when anonymised IP address is used. But then some users would still have this particular problem and it would make harder to reproduce certain issues so it may not be a good idea.

Random thought: Would a fix work below where we enable browser requests only after the next tracking request?

diff --git a/js/piwik.js b/js/piwik.js
index 1e8d17157e..54005491c5 100644
--- a/js/piwik.js
+++ b/js/piwik.js
@@ -7158,7 +7158,6 @@ if (typeof window.Matomo !== 'object') {
              */
             this.setConsentGiven = function (setCookieConsent) {
                 configHasConsent = true;
-                configBrowserFeatureDetection = true;
                 deleteCookie(CONSENT_REMOVED_COOKIE_NAME, configCookiePath, configCookieDomain);
 
                 var i, requestType;
@@ -7183,6 +7182,8 @@ if (typeof window.Matomo !== 'object') {
                 if (!isDefined(setCookieConsent) || setCookieConsent) {
                     this.setCookieConsentGiven();
                 }
+                
+                configBrowserFeatureDetection = true;
             };

It's probably not guaranteed that a request was sent by the time it reaches the end of the method but a solution could look slightly differently. Potentially we would need to force sending at least one request before enabling browser feature detection. This could then cause an extra tracking request on every page view when consent is given which wouldn't be good. Although typically there should be always at least one tracking request as these methods are typically executed before sending any tracking request. Basically, as soon as Matomo has the visitorId the configId wouldn't be needed anymore and can be different as the visitorId is used for identifying the visit. This requires one initial tracking request with the browser detection disabled. Depending how things are implemented it could result in an extra request I assume. We should only send that extra request when we don't have a visitorId yet.

Not sure if any of that makes sense? I'm meaning there may be also a way to fix this problem without needing to change the config_id generation and without sacrificing data quality.

@peterbo
Copy link
Contributor Author

peterbo commented Aug 31, 2022

Hey @tsteur thanks for your detailed thoughts on that one! I totally agree with you, that the precision of recognition is one of the most critical things in Matomo. I'll just comment a few statements, not necessarily because I oppose them (I'm also absolutely not sure, what's the best solution), but just to have another opinion.

I would not steer in the direction of using the full IP address for everything / for enriching visits. This would probably re-introduce problems, that we left behind a long time ago and make the whole privacy statement topic a nightmare.

Just a few thoughts on what you wrote:

When the anonmyised IP address is used for config_id then it can quickly make a difference of say 5-10% where visits are grouped together falsely. Especially for B2B sites.

In business environments, recognition / grouping of visitors is most difficult (at least without cookies). Mostly because, a lot of users use the same IP address and have the same Browser (& settings). But probably, we (in my opinion) wouldn't benefit from the browser plugin data points, because Browser configuration is the same (centralized administration of workstations). Or employees using their own browsers (visitor recognition would work again as in "normal" environments).

Random thought: Would a fix work below where we enable browser requests only after the next tracking request?

Interesting! - If I understand correctly, you're delaying the request with the browser features (next PageView), set a cookie and on the next PageView, the recognition works with the cookie (even if the config_id has changed).
But we would limit Cookieless setups with that, using the consent just as a switch between tracking yes/no, but do not want cookies. Also, if the Cookie is lost (e.g. requireCookieConsent on every page), we would split the visit again.

What about a config option to activate/disable the use of browser plugins (corresponding to all data that is added with the enabled browser feature detection) in the config_id (default true)? - Then users could assess, if that setting would have a big or small impact on their fingerprint precision, based on their visitor demographic (B2B/C, tech savvy, etc.). It's not ideal to have a feature not working perfectly without a corresponding config setting, but that's the best idea I can come up with currently, to make things not overly complex.

@tsteur
Copy link
Member

tsteur commented Sep 1, 2022

I would not steer in the direction of using the full IP address for everything / for enriching visits. This would probably re-introduce problems, that we left behind a long time ago and make the whole privacy statement topic a nightmare.

Just FYI I only mentioned it because it's already implemented that way. Depending on the setting the full or the anonymised IP is used in the config_id.

In business environments, recognition / grouping of visitors is most difficult (at least without cookies). Mostly because, a lot of users use the same IP address and have the same Browser (& settings).

Yes, totally.

What about a config option to activate/disable the use of browser plugins (corresponding to all data that is added with the enabled browser feature detection) in the config_id (default true)? -

This could work. It's just that almost nobody will find this config and it's hard to figure this out and hard to change etc. If we can find a solution that works in most or all cases that be great.

Interesting! - If I understand correctly, you're delaying the request with the browser features (next PageView), set a cookie and on the next PageView, the recognition works with the cookie (even if the config_id has changed).

Basically, as soon as the consent is given and we set consent and therefore a cookie with the visitorId, then we need to make sure that a request is sent so Matomo identifies the user by visitorId and not config_id. If a visitorId is already set, then we basically don't need to do this.

But we would limit Cookieless setups with that, using the consent just as a switch between tracking yes/no, but do not want cookies. Also, if the Cookie is lost (e.g. requireCookieConsent on every page), we would split the visit again.

Sorry can't fully follow you there. If you have cookieless setup then you don't run into this problem anyway maybe? Generally, the cookie should only get lost when the user removes consent or if the cookie lifetime is reached (or manually clears cookies or the browser deletes cookies after certain amount of days which can happen in some browsers).

The config_id can only identify the same visitor for the same day anyway though. So this shouldn't be an issue. That's because the config_id salt changes every day and the config_id cannot be used to identify the visitor over multiple days. Meaning I can't really think of too many cases where this could cause an issue except if the user would delete cookies manually or does remove consent specifically.

For people that do use remember[Cookie]ConsentGiven a fix could be maybe already below code:

diff --git a/js/piwik.js b/js/piwik.js
index 1e8d17157e..97d1c4a31a 100644
--- a/js/piwik.js
+++ b/js/piwik.js
@@ -6595,8 +6595,10 @@ if (typeof window.Matomo !== 'object') {
                 }
             };
 
-            this.disableBrowserFeatureDetection = function () {
-                configBrowserFeatureDetection = false;
+            this.disableBrowserFeatureDetection = function (force) {
+                if (force || (!this.getRememberedCookieConsent() && !this.getRememberedConsent())) {
+                  configBrowserFeatureDetection = false;
+                }
             };

There may be also a fix for setConsentGiven depending how we structure things. It would potentially need an extra tracking request though when the user opts in the first time.

It be great if we could check first if there's a solution in the tracking code maybe to have this use case working and as accurate data ideally as possible without needing a config :)

@sgiehl
Copy link
Member

sgiehl commented Sep 6, 2022

After reading the last comments I was actually a bit confused. It sound a bit like we are mixing up the visitorId that is generated in javascript tracker and the config_id, that might be generated on php side if there is no other id yet.

Anyway, looking through the code it looks like if the browser feature detection is disabled initially and a tracking request is sent, Matomo will create a visit, with all browser plugins set to 0 and browser / os detection will not use client hints. If then browser feature detection would be enabled client side and a new tracking request is sent, the request would include the additional browser details. But our current implementation in php would actually throw those new details away. The related visitor column classes only record a value on new visit, but don't update the values. See e.g.

public function onNewVisit(Request $request, Visitor $visitor, $action)
{
return Common::getRequestVar('cookie', 0, 'int', $request->getParams());
}

This should also apply when requireConsent is used. If a first tracking request is being queued before consent was given, the request in the queue will not contain any browser feature details. So the initial visit will be created without them.

Browser plugins will thus always be reported as 0, if the first tracking request is triggered without browser feature detection, even if the user would later give consent.

So from a logical point of view it does not have any effect to enable the browser feature detection again in JS in any case.

@tsteur regarding the solutions you suggested:
Enabling the browser features again, after the cookie has been set, should solve the reported issue I guess, but as mentioned above, it would actually not have any benefit, as the additional data isn't used.

@tsteur
Copy link
Member

tsteur commented Sep 6, 2022

Enabling the browser features again, after the cookie has been set, should solve the reported issue I guess, but as mentioned above, it would actually not have any benefit, as the additional data isn't used.

I guess the tracker can't know if we're dealing with a new or an existing visit (especially if/when server side different session lifetime is configured)? I guess it's also about subsequent visits. Not sure how much this plays a role.

FYI I was mentioning visitorId because once we have sent a request using the visitorId to Matomo, then the config_id becomes irrelevant as we can identify the visitor using the visitorId from the cookie.

I guess it's mainly about the first request after opting in so Matomo can find the correct visit using the config_id to set the correct visitorId. The problem may be for example when there's no tracking request following after that. Then it would not update the correct visit I assume unless we forced a request when needed. I haven't thought this through though about implications etc.

@justinvelluppillai justinvelluppillai modified the milestones: 4.12.0, 5.0.0 Oct 5, 2022
@peterbo
Copy link
Contributor Author

peterbo commented Dec 2, 2022

Hey @tsteur,

I guess it's mainly about the first request after opting in so Matomo can find the correct visit using the config_id to set the correct visitorId. The problem may be for example when there's no tracking request following after that. Then it would not update the correct visit I assume unless we forced a request when needed. I haven't thought this through though about implications etc.

Yes, that would probably fix the issue (to use the correct visitorId). But the tracker doesn't know the visitorId (because it is only known serverside and keyed with the configId).
Currently, unfortunately, the "best-practice" model (tracking without consent (no cookies, no browser features), and adding browser features & cookies after consent) does not work. Can I help getting this fixed ìn the 4.x branch through testing or other assistance?

@sgiehl
Copy link
Member

sgiehl commented Dec 6, 2022

To me it seems there are at least two issues around queuing tracking request before consent is given:

  • Queued requests and the requests tracked, after consent was given, will not end up in the same visit.
    This could be solved by enabling the browser feature detection after processing the consent request queue, as it would set a cookie, that would be be reused for ongoing requests.

  • Queued request (and resulting visit) will never contain any browser feature details (as such data won't be updated in ongoing requests)
    To solve this we might need to adjust the way we are building the tracking requests. Currently the full request url will be built and pushed to the consent queue. Instead we might need to only include parameters in the generated & queued tracking urls that might not change when enabling browser feature detection. So we would attach those additional parameters in a later step, before the request is finally sent out.

@tsteur any further thoughts on that one?

@tsteur
Copy link
Member

tsteur commented Dec 7, 2022

Currently the full request url will be built and pushed to the consent queue. Instead we might need to only include parameters in the generated & queued tracking urls that might not change when enabling browser feature detection. So we would attach those additional parameters in a later step, before the request is finally sent out.

I'm thinking that can cause maybe issues for example in single page application where you might do something like below?

user is on /foo
queueRequest('...') is called
... then clicks on something and the SPA changes to /foo/bar
... the app will call setCustomUrl('...')
the queued request would be executed and it would attribute the request to the wrong URL

Just a thought.

@sgiehl
Copy link
Member

sgiehl commented Dec 7, 2022

@tsteur I would suggest to only attach those browser feature parameters later that are subject to change (when consent is given). So parameters like url, time, referrer and so on would be already in the url, but cookie, plugins and client hints would be added before the request is sent.

@tsteur
Copy link
Member

tsteur commented Dec 8, 2022

Nice, that could work 👍

@sgiehl
Copy link
Member

sgiehl commented Dec 12, 2022

@peterbo you could check if the changes in #20096 would solve the issue for you.

@mattab mattab added the 5.0.0 label Jan 4, 2023
@sgiehl sgiehl closed this as completed Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Projects
None yet
Development

No branches or pull requests

7 participants