@peterbo opened this Issue on May 26th 2022 Contributor

If "disableBrowserFeatureDetection" is enabled and remember(Cookie)ConsentGiven is called, a new visitor is recognized

Expected Behavior

After calling remember(Cookie)ConsentGiven, the same visitor should be recognized and no new (unique) visitor should be created.

Possible Solution

When re-enabling browserFeatureDetection, try to sync / match the "old" config_id without browserFeatures in the fingerprint.

Steps to Reproduce (for Bugs)

  1. Enable disableBrowserFeatureDetection
  2. Call remember(Cookie)ConsentGiven

Context

Giving consent enables the browserFeatureDetection - https://github.com/matomo-org/matomo/blob/4.x-dev/js/piwik.js#L6030 - That's why the current visior is not recognized again (and a new visitor is created), because the config_id (fingerprint) changed.

Your Environment

  • Matomo Version: 4.10.1
  • Browser: Multiple tested
@bx80 commented on May 30th 2022 Contributor

Hi @peterbo, thanks for reporting this, the extra detail is very helpful :+1: Our product team will review and assign priority for this issue.

@peterbo commented on August 8th 2022 Contributor

Hi @bx80 - is there a current / new status to this ticket? - Currently, this basically renders the consent system unusable. Thank you in advance!

@bx80 commented on August 8th 2022 Contributor

Hi @peterbo, unfortunately this issue hasn't been prioritized yet so I can't say when it will be worked on. Did this functionality work in a previous release?

@peterbo commented on August 15th 2022 Contributor

Hi @bx80 - this bug exists since the introduction of the new feature (disableBrowserFeatureDetection).

@bx80 commented on August 16th 2022 Contributor

I've tagged this issue for a priority review as it appears to be regression.

@peterhashair commented on August 17th 2022 Contributor

Hi @peterbo trying to reproduce locally, but it seems like before the rememberCookieConsentGiven is called, the unique user id won't be generated, once the content is given the user Id will load from cookies.
Here is my config, let me know if I am on the wrong setup.

 _paq.push(['setCookieConsentGiven']);
 _paq.push(['disableBrowserFeatureDetection']);
<a href="#" onclick="window._paq.push(['rememberCookieConsentGiven'])">accept consent</a>
@peterbo commented on August 17th 2022 Contributor

Hi @peterhashair - thanks for tackling this issue!

Correct - before the consent is given (remember(Cookie)ConsentGiven), the corresponding visitor / session is recognized via config id hash (fingerprint). When you call remember(Cookie)ConsentGiven, the visitor's config id changes, because now, the browser's plugins and resolution (don't know if that's part of the fingerprint currently) is sent by the tracking code additonally.

This is the moment, when, for the last time, a visitor would have to be recognized by the config id (before the visitor id cookie is set) and this fails (the config id was changed, because of the additional information that the tracker is now allowed to collect). The result is, that the same visitor's session is split between pre- and post-consent), which should not be the case

Possible solutions:

  • remove browser plugins from the fingerprint (could make the fingerprint less accurate)
  • search for both fingerprints (with and without browser plugins) to identify the correct visitor pre-consent
@peterhashair commented on August 22nd 2022 Contributor

@peterbo sorry to bother, but when I try to reproduce it when the second page missing requireCookieConsent in the tracking, it states as different visits, but when having it, it tracking as one visit.

@peterbo commented on August 29th 2022 Contributor

@peterhashair sorry for letting you wait for an answer. I cannot quite follow your explanation. How can I help you reproduce it? But in the end, everything is in the source code - activating the browserFeatureDetection will change the config_id, so the same visitor will split its session. You can see it here: https://github.com/matomo-org/matomo/blob/4.x-dev/core/Tracker/Settings.php#L145 and https://github.com/matomo-org/matomo/blob/4.x-dev/js/piwik.js#L3265

@peterhashair commented on August 30th 2022 Contributor

@peterbo yes,but there is another code in the piwik.js that skips the detection. it only happens when one page missing requireCookieConsent

@sgiehl commented on August 30th 2022 Member

@peterhashair I think you are on the wrong track here. What @peterbo has reported here is quite simple to reproduce.

        _paq.push(['requireCookieConsent']);
        _paq.push(['disableBrowserFeatureDetection']);
        _paq.push(['trackPageView']);
        _paq.push(['setCookieConsentGiven']);
        _paq.push(['trackPageView']);

The code above will currently result in two visits. This is caused by the fact, that setCookieConsentGiven will indirectly enable the browser feature detection again and will force to set a visitor id cookie, which will result in a different visitor id.

@tsteur I'm wondering if it was on purpose to enable the browser feature detection again as soon as cookie consent is given. I doubt it's expected behaviour if disableBrowserFeatureDetection was previously called on purpose.

@peterbo commented on August 30th 2022 Contributor

@sgiehl thanks for phrasing / explaining this much better than I was able to!

I'm wondering if it was on purpose to enable the browser feature detection again as soon as cookie consent is given.

Theoretically, following ePrivacy / TTDSG regulation, it should be OK to implicitly re-enable the featureDetection, if consent was given. The tricky part is the changing config_id.

Perhaps, just as a thought, it would be an option to get rid of the browserFeatures as data points of the config_id completely. As an example, I attached a screenshot that looks similar in many instances:

browserFeatures

Cookies: ~100% (if allowed) - not suited as a recognition data point, because nearly everyone supports it
PDF: ~25% - acceptable as a recognition data point, because just 1/4th of visitors support it
Rest of the plugins: ~0-1% - not suited as a recognition data point, because just <0,x% of browsers support them

As a result, we would lose a tiny bit of precision, if the PDF plugin data would not be part of the config_id anymore, but probably negligible.

@sgiehl commented on August 30th 2022 Member

I agree. Removing the browser features from the visitorid hash at all might be an option we should consider. @tsteur what's your opinion on that?

@tsteur commented on August 31st 2022 Member

Interesting, we actually have very different stats for these browser plugins. About 80 & 60%.

We could think about removing these browser plugins from the config_id but it certainly can have quite an impact on the number of visitors that will be grouped together. The impact of such a change will depend highly on the kind of traffic (eg B2B vs B2C) and whether the "Also use the Anonymized IP addresses when enriching visits." is enabled or disabled.

When the full IP address is used for creating the config_id, then the impact be likely little. Especially for B2C sites.

When the anonmyised IP address is used for config_id then it can quickly make a difference of say 5-10% where visits are grouped together falsely. Especially for B2B sites.

Potentially, we could not use the browser plugins when full IP address is used, and do use the browser plugins when anonymised IP address is used. But then some users would still have this particular problem and it would make harder to reproduce certain issues so it may not be a good idea.

Random thought: Would a fix work below where we enable browser requests only after the next tracking request?

diff --git a/js/piwik.js b/js/piwik.js
index 1e8d17157e..54005491c5 100644
--- a/js/piwik.js
+++ b/js/piwik.js
@@ -7158,7 +7158,6 @@ if (typeof window.Matomo !== 'object') {
              */
             this.setConsentGiven = function (setCookieConsent) {
                 configHasConsent = true;
-                configBrowserFeatureDetection = true;
                 deleteCookie(CONSENT_REMOVED_COOKIE_NAME, configCookiePath, configCookieDomain);

                 var i, requestType;
@@ -7183,6 +7182,8 @@ if (typeof window.Matomo !== 'object') {
                 if (!isDefined(setCookieConsent) || setCookieConsent) {
                     this.setCookieConsentGiven();
                 }
+                
+                configBrowserFeatureDetection = true;
             };

It's probably not guaranteed that a request was sent by the time it reaches the end of the method but a solution could look slightly differently. Potentially we would need to force sending at least one request before enabling browser feature detection. This could then cause an extra tracking request on every page view when consent is given which wouldn't be good. Although typically there should be always at least one tracking request as these methods are typically executed before sending any tracking request. Basically, as soon as Matomo has the visitorId the configId wouldn't be needed anymore and can be different as the visitorId is used for identifying the visit. This requires one initial tracking request with the browser detection disabled. Depending how things are implemented it could result in an extra request I assume. We should only send that extra request when we don't have a visitorId yet.

Not sure if any of that makes sense? I'm meaning there may be also a way to fix this problem without needing to change the config_id generation and without sacrificing data quality.

@peterbo commented on August 31st 2022 Contributor

Hey @tsteur thanks for your detailed thoughts on that one! I totally agree with you, that the precision of recognition is one of the most critical things in Matomo. I'll just comment a few statements, not necessarily because I oppose them (I'm also absolutely not sure, what's the best solution), but just to have another opinion.

I would not steer in the direction of using the full IP address for everything / for enriching visits. This would probably re-introduce problems, that we left behind a long time ago and make the whole privacy statement topic a nightmare.

Just a few thoughts on what you wrote:

When the anonmyised IP address is used for config_id then it can quickly make a difference of say 5-10% where visits are grouped together falsely. Especially for B2B sites.

In business environments, recognition / grouping of visitors is most difficult (at least without cookies). Mostly because, a lot of users use the same IP address and have the same Browser (& settings). But probably, we (in my opinion) wouldn't benefit from the browser plugin data points, because Browser configuration is the same (centralized administration of workstations). Or employees using their own browsers (visitor recognition would work again as in "normal" environments).

Random thought: Would a fix work below where we enable browser requests only after the next tracking request?

Interesting! - If I understand correctly, you're delaying the request with the browser features (next PageView), set a cookie and on the next PageView, the recognition works with the cookie (even if the config_id has changed).
But we would limit Cookieless setups with that, using the consent just as a switch between tracking yes/no, but do not want cookies. Also, if the Cookie is lost (e.g. requireCookieConsent on every page), we would split the visit again.

What about a config option to activate/disable the use of browser plugins (corresponding to all data that is added with the enabled browser feature detection) in the config_id (default true)? - Then users could assess, if that setting would have a big or small impact on their fingerprint precision, based on their visitor demographic (B2B/C, tech savvy, etc.). It's not ideal to have a feature not working perfectly without a corresponding config setting, but that's the best idea I can come up with currently, to make things not overly complex.

@tsteur commented on September 1st 2022 Member

I would not steer in the direction of using the full IP address for everything / for enriching visits. This would probably re-introduce problems, that we left behind a long time ago and make the whole privacy statement topic a nightmare.

Just FYI I only mentioned it because it's already implemented that way. Depending on the setting the full or the anonymised IP is used in the config_id.

In business environments, recognition / grouping of visitors is most difficult (at least without cookies). Mostly because, a lot of users use the same IP address and have the same Browser (& settings).

Yes, totally.

What about a config option to activate/disable the use of browser plugins (corresponding to all data that is added with the enabled browser feature detection) in the config_id (default true)? -

This could work. It's just that almost nobody will find this config and it's hard to figure this out and hard to change etc. If we can find a solution that works in most or all cases that be great.

Interesting! - If I understand correctly, you're delaying the request with the browser features (next PageView), set a cookie and on the next PageView, the recognition works with the cookie (even if the config_id has changed).

Basically, as soon as the consent is given and we set consent and therefore a cookie with the visitorId, then we need to make sure that a request is sent so Matomo identifies the user by visitorId and not config_id. If a visitorId is already set, then we basically don't need to do this.

But we would limit Cookieless setups with that, using the consent just as a switch between tracking yes/no, but do not want cookies. Also, if the Cookie is lost (e.g. requireCookieConsent on every page), we would split the visit again.

Sorry can't fully follow you there. If you have cookieless setup then you don't run into this problem anyway maybe? Generally, the cookie should only get lost when the user removes consent or if the cookie lifetime is reached (or manually clears cookies or the browser deletes cookies after certain amount of days which can happen in some browsers).

The config_id can only identify the same visitor for the same day anyway though. So this shouldn't be an issue. That's because the config_id salt changes every day and the config_id cannot be used to identify the visitor over multiple days. Meaning I can't really think of too many cases where this could cause an issue except if the user would delete cookies manually or does remove consent specifically.

For people that do use remember[Cookie]ConsentGiven a fix could be maybe already below code:

diff --git a/js/piwik.js b/js/piwik.js
index 1e8d17157e..97d1c4a31a 100644
--- a/js/piwik.js
+++ b/js/piwik.js
@@ -6595,8 +6595,10 @@ if (typeof window.Matomo !== 'object') {
                 }
             };

-            this.disableBrowserFeatureDetection = function () {
-                configBrowserFeatureDetection = false;
+            this.disableBrowserFeatureDetection = function (force) {
+                if (force || (!this.getRememberedCookieConsent() && !this.getRememberedConsent())) {
+                  configBrowserFeatureDetection = false;
+                }
             };

There may be also a fix for setConsentGiven depending how we structure things. It would potentially need an extra tracking request though when the user opts in the first time.

It be great if we could check first if there's a solution in the tracking code maybe to have this use case working and as accurate data ideally as possible without needing a config :)

@sgiehl commented on September 6th 2022 Member

After reading the last comments I was actually a bit confused. It sound a bit like we are mixing up the visitorId that is generated in javascript tracker and the config_id, that might be generated on php side if there is no other id yet.

Anyway, looking through the code it looks like if the browser feature detection is disabled initially and a tracking request is sent, Matomo will create a visit, with all browser plugins set to 0 and browser / os detection will not use client hints. If then browser feature detection would be enabled client side and a new tracking request is sent, the request would include the additional browser details. But our current implementation in php would actually throw those new details away. The related visitor column classes only record a value on new visit, but don't update the values. See e.g. https://github.com/matomo-org/matomo/blob/3c64683bf85aee739bc7aa18eeefe9dcf4c7896c/plugins/DevicePlugins/Columns/PluginCookie.php#L34-L37

This should also apply when requireConsent is used. If a first tracking request is being queued before consent was given, the request in the queue will not contain any browser feature details. So the initial visit will be created without them.

Browser plugins will thus always be reported as 0, if the first tracking request is triggered without browser feature detection, even if the user would later give consent.

So from a logical point of view it does not have any effect to enable the browser feature detection again in JS in any case.

@tsteur regarding the solutions you suggested:
Enabling the browser features again, after the cookie has been set, should solve the reported issue I guess, but as mentioned above, it would actually not have any benefit, as the additional data isn't used.

@tsteur commented on September 6th 2022 Member

Enabling the browser features again, after the cookie has been set, should solve the reported issue I guess, but as mentioned above, it would actually not have any benefit, as the additional data isn't used.

I guess the tracker can't know if we're dealing with a new or an existing visit (especially if/when server side different session lifetime is configured)? I guess it's also about subsequent visits. Not sure how much this plays a role.

FYI I was mentioning visitorId because once we have sent a request using the visitorId to Matomo, then the config_id becomes irrelevant as we can identify the visitor using the visitorId from the cookie.

I guess it's mainly about the first request after opting in so Matomo can find the correct visit using the config_id to set the correct visitorId. The problem may be for example when there's no tracking request following after that. Then it would not update the correct visit I assume unless we forced a request when needed. I haven't thought this through though about implications etc.

Powered by GitHub Issue Mirror