New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit the fingerprint #15886
Limit the fingerprint #15886
Conversation
config/global.ini.php
Outdated
@@ -761,6 +761,15 @@ | |||
; Note: setting this to 0 increases your users' privacy. | |||
enable_fingerprinting_across_websites = 0 | |||
|
|||
; When enabled (set to 1) and cookies are disabled in the tracker, then the fingerprint will be restricted even further | |||
; to prevent a visitor to be tracked over multiple days. This means typically that no cookie consent is needed when cookies | |||
; are disabled. You may disable this feature if you don't want to use cookies in the browser, but still be able to get more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may disable this feature if you don't want to use cookies in the browser, but still be able to get more accurate unique visitor numbers.
Actually afaik, you would disable this feature only if you have set a window_look_back_for_visitor
value higher than 12 or 24 hours. By default window_look_back_for_visitor = 0 which means we only look back 30min to match a visit. Returning visitors wouldn't even be matched from 2 hours ago. So it's only when cookies are disabled and window_look_back_for_visitor > say 43200 then it might be useful to deactivate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you certain fingerprint is only looked back 30 minutes by default? Then I'd remove the config setting, and also the extra tracking parameter cd
when cookies are disabled and we simply do apply the shorter fingerprint always unless window_look_back_for_visitor > 24 hours
(but even then it would be only needed probably when not a new visit should be created after midnight). So we could apply this more privacy friendly fingerprint pretty much always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the code, yes it looks like the window_look_back_for_visitor is only used for fingerprint and defaults to 30min (defaults to the visit duration setting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome to see this pr. this will improve privacy and limiting the fingerprint so it's clear that people can use without consent for cookie-less analytics.
PS: didn't review the code
core/Updates/3.13.6-b1.php
Outdated
{ | ||
$config = Config::getInstance(); | ||
$tracker = $config->Tracker; | ||
$tracker['enable_fingerprinting_limited_when_cookie_disabled'] = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we do this only when window_look_back_for_visitor > say 43200? (if window_look_back_for_visitor is not custom, then no impact to users afaik)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned we're creating new visits after midnight anyway... so this should probably not be important...
unless create_new_visit_after_midnight = 0
. Right now it looks like we can safely always apply the shorter fingerprint.
|
||
public function getDateString(Date $date, $timezone) | ||
{ | ||
$dateString = Date::factory($date->getTimestampUTC(), $timezone)->toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not 100% sure if this is correct? Had initially Date::factory()->setTimezone($timezone)
but then noticed in tests it might not return correct values unless I'm expecting the wrong values. In ArchiveProcessor we're using for example setTimezone
: https://github.com/matomo-org/matomo/blob/3.13.5/core/ArchiveProcessor/Parameters.php#L166 so would have maybe expected setTimezone
is right but then in CronArchive we're also using it as a parameter https://github.com/matomo-org/matomo/blob/3.13.5/core/CronArchive.php#L1504 ... looking at the test results the parameter seems to give expected result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess that's something we should probably check again with #13829
@@ -794,8 +794,8 @@ | |||
<eventCategory>Music</eventCategory> | |||
<eventAction>play50%</eventAction> | |||
<bandwidth /> | |||
<timeSpent>60</timeSpent> | |||
<timeSpentPretty>1 min 0s</timeSpentPretty> | |||
<timeSpent>0</timeSpent> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't figure out where these changes are from. The tests seem to pass in 3.x-dev.
Not sure how this would change in this PR since in the tests itself the fingerprintSalt is basically not set and nothing changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the tests it seems the changes to the test files can be reverted.
Should be ready for a review. If I see things right, nothing changes for users in the end but hard to say. |
Haven't looked deeper into the code yet. But wouldn't changing the config id will have the effect, that every visitor will be a new visitor after the update? 🤔 |
32e0cb2
to
d265bf4
Compare
d265bf4
to
6968f2d
Compare
@sgiehl did you maybe have another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some short local testing and it seems to work as expected I guess.
core/Config/IniFileChain.php
Outdated
} | ||
return $result; | ||
} else { | ||
return preg_replace('/[^a-zA-Z0-9_-]/', '', $value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually causes problems doing something like this:
./console config:set --section="General" --key="login_whitelist_ip[]" --value="127.0.0.1"
as it will result in login_whitelist_ip = "127.0.0.1"
in the config file, causing all previously set ips in the array to be removed...
Maybe it would be better to throw an exception if unexpected chars occur instead of silently removing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't solve this problem, because [...]
is valid in this case, but ]asldkjf
, eg, is not. Throwing an exception isn't an issue for me though so if that is more desired than removing the chars then that's fine too.
@@ -794,8 +794,8 @@ | |||
<eventCategory>Music</eventCategory> | |||
<eventAction>play50%</eventAction> | |||
<bandwidth /> | |||
<timeSpent>60</timeSpent> | |||
<timeSpentPretty>1 min 0s</timeSpentPretty> | |||
<timeSpent>0</timeSpent> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the tests it seems the changes to the test files can be reverted.
|
||
public function getDateString(Date $date, $timezone) | ||
{ | ||
$dateString = Date::factory($date->getTimestampUTC(), $timezone)->toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess that's something we should probably check again with #13829
Co-authored-by: Stefan Giehl <stefan@matomo.org>
Co-authored-by: Stefan Giehl <stefan@matomo.org>
@tsteur I've reverted the test file changes and the tests seem to pass again. From my point a view the changes look fine now. Maybe you can also have a look at the changes from @diosmosis. Otherwise feel free to merge |
The purpose of the fingerprint is to put all tracking requests within the same visit into the same visit in the DB. A fingerprint is basically valid for up to 30 minutes after the visit ends (unless different visit length is configured).
By adding a random hash to the fingerprint we make sure when a user disables cookies, a visitor cannot be identified across multiple days and such no cookie consent should be needed.
This change should not really change behaviour for anyone AFAIK. In Matomo 4 we could remove the check for
create_new_visit_after_midnight
potentially. We cannot limit the fingerprint in that case as when someone hascreate_new_visit_after_midnight=0
and has cookies disabled, then a new visit would be still created after midnight since the configId would change.The limited fingerprint will also not apply when enabling
[Tracker]enable_fingerprinting_across_websites = 1
since different sites can have different timezones and therefore we can't add any date to the fingerprint.refs #13655