Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the fingerprint #15886

Merged
merged 12 commits into from May 21, 2020
Merged

Limit the fingerprint #15886

merged 12 commits into from May 21, 2020

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Apr 29, 2020

The purpose of the fingerprint is to put all tracking requests within the same visit into the same visit in the DB. A fingerprint is basically valid for up to 30 minutes after the visit ends (unless different visit length is configured).

By adding a random hash to the fingerprint we make sure when a user disables cookies, a visitor cannot be identified across multiple days and such no cookie consent should be needed.

This change should not really change behaviour for anyone AFAIK. In Matomo 4 we could remove the check for create_new_visit_after_midnight potentially. We cannot limit the fingerprint in that case as when someone has create_new_visit_after_midnight=0 and has cookies disabled, then a new visit would be still created after midnight since the configId would change.

The limited fingerprint will also not apply when enabling [Tracker]enable_fingerprinting_across_websites = 1 since different sites can have different timezones and therefore we can't add any date to the fingerprint.

refs #13655

@tsteur tsteur added the Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. label Apr 29, 2020
@tsteur tsteur added this to the 3.13.6 milestone Apr 29, 2020
@@ -761,6 +761,15 @@
; Note: setting this to 0 increases your users' privacy.
enable_fingerprinting_across_websites = 0

; When enabled (set to 1) and cookies are disabled in the tracker, then the fingerprint will be restricted even further
; to prevent a visitor to be tracked over multiple days. This means typically that no cookie consent is needed when cookies
; are disabled. You may disable this feature if you don't want to use cookies in the browser, but still be able to get more
Copy link
Member

@mattab mattab Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may disable this feature if you don't want to use cookies in the browser, but still be able to get more accurate unique visitor numbers.

Actually afaik, you would disable this feature only if you have set a window_look_back_for_visitor value higher than 12 or 24 hours. By default window_look_back_for_visitor = 0 which means we only look back 30min to match a visit. Returning visitors wouldn't even be matched from 2 hours ago. So it's only when cookies are disabled and window_look_back_for_visitor > say 43200 then it might be useful to deactivate this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain fingerprint is only looked back 30 minutes by default? Then I'd remove the config setting, and also the extra tracking parameter cd when cookies are disabled and we simply do apply the shorter fingerprint always unless window_look_back_for_visitor > 24 hours (but even then it would be only needed probably when not a new visit should be created after midnight). So we could apply this more privacy friendly fingerprint pretty much always

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code, yes it looks like the window_look_back_for_visitor is only used for fingerprint and defaults to 30min (defaults to the visit duration setting)

core/Tracker/Settings.php Outdated Show resolved Hide resolved
Copy link
Member

@mattab mattab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome to see this pr. this will improve privacy and limiting the fingerprint so it's clear that people can use without consent for cookie-less analytics.

PS: didn't review the code

{
$config = Config::getInstance();
$tracker = $config->Tracker;
$tracker['enable_fingerprinting_limited_when_cookie_disabled'] = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we do this only when window_look_back_for_visitor > say 43200? (if window_look_back_for_visitor is not custom, then no impact to users afaik)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned we're creating new visits after midnight anyway... so this should probably not be important...
unless create_new_visit_after_midnight = 0. Right now it looks like we can safely always apply the shorter fingerprint.

@tsteur tsteur changed the title When cookies are disabled, also make sure the fingerprint is limited Limit the fingerprint May 4, 2020

public function getDateString(Date $date, $timezone)
{
$dateString = Date::factory($date->getTimestampUTC(), $timezone)->toString();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 100% sure if this is correct? Had initially Date::factory()->setTimezone($timezone) but then noticed in tests it might not return correct values unless I'm expecting the wrong values. In ArchiveProcessor we're using for example setTimezone: https://github.com/matomo-org/matomo/blob/3.13.5/core/ArchiveProcessor/Parameters.php#L166 so would have maybe expected setTimezone is right but then in CronArchive we're also using it as a parameter https://github.com/matomo-org/matomo/blob/3.13.5/core/CronArchive.php#L1504 ... looking at the test results the parameter seems to give expected result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess that's something we should probably check again with #13829

@@ -794,8 +794,8 @@
<eventCategory>Music</eventCategory>
<eventAction>play50%</eventAction>
<bandwidth />
<timeSpent>60</timeSpent>
<timeSpentPretty>1 min 0s</timeSpentPretty>
<timeSpent>0</timeSpent>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't figure out where these changes are from. The tests seem to pass in 3.x-dev.

Not sure how this would change in this PR since in the tests itself the fingerprintSalt is basically not set and nothing changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests it seems the changes to the test files can be reverted.

@tsteur
Copy link
Member Author

tsteur commented May 5, 2020

Should be ready for a review. If I see things right, nothing changes for users in the end but hard to say.

@sgiehl sgiehl added Needs Review PRs that need a code review and removed Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. labels May 5, 2020
@sgiehl
Copy link
Member

sgiehl commented May 5, 2020

Haven't looked deeper into the code yet. But wouldn't changing the config id will have the effect, that every visitor will be a new visitor after the update? 🤔

@tsteur
Copy link
Member Author

tsteur commented May 5, 2020

@sgiehl only when cookies are disabled. Otherwise not. The fingerprint as @mattab mentioned was always only used to look 30 minutes back. So it would only affect it when cookies are disabled, and a visitor was active in the last 30 minutes. Then one new visit would be created which should be fine.

@tsteur
Copy link
Member Author

tsteur commented May 18, 2020

@sgiehl did you maybe have another look?

Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some short local testing and it seems to work as expected I guess.

tests/PHPUnit/Integration/Tracker/FingerprintSaltTest.php Outdated Show resolved Hide resolved
}
return $result;
} else {
return preg_replace('/[^a-zA-Z0-9_-]/', '', $value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That actually causes problems doing something like this:
./console config:set --section="General" --key="login_whitelist_ip[]" --value="127.0.0.1"
as it will result in login_whitelist_ip = "127.0.0.1" in the config file, causing all previously set ips in the array to be removed...
Maybe it would be better to throw an exception if unexpected chars occur instead of silently removing them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't solve this problem, because [...] is valid in this case, but ]asldkjf, eg, is not. Throwing an exception isn't an issue for me though so if that is more desired than removing the chars then that's fine too.

@@ -794,8 +794,8 @@
<eventCategory>Music</eventCategory>
<eventAction>play50%</eventAction>
<bandwidth />
<timeSpent>60</timeSpent>
<timeSpentPretty>1 min 0s</timeSpentPretty>
<timeSpent>0</timeSpent>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests it seems the changes to the test files can be reverted.

plugins/SitesManager/SitesManager.php Outdated Show resolved Hide resolved

public function getDateString(Date $date, $timezone)
{
$dateString = Date::factory($date->getTimestampUTC(), $timezone)->toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess that's something we should probably check again with #13829

tsteur and others added 4 commits May 21, 2020 08:47
@sgiehl
Copy link
Member

sgiehl commented May 21, 2020

@tsteur I've reverted the test file changes and the tests seem to pass again. From my point a view the changes look fine now. Maybe you can also have a look at the changes from @diosmosis. Otherwise feel free to merge

@tsteur tsteur merged commit 3aa66a2 into 3.x-dev May 21, 2020
@tsteur tsteur deleted the cookiefingerprint branch May 21, 2020 22:05
jonasgrilleres pushed a commit to 1024pix/pix-analytics that referenced this pull request Sep 22, 2020
jbuget pushed a commit to 1024pix/pix-analytics that referenced this pull request Sep 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants