When IP anonimisation is enabled, hash the IP address in a way that prevents long-term fingerprinting #5907

Closed
mattab opened this issue Aug 3, 2014 · 13 comments
Labels
c: New plugin For features that probably will not be added to Matomo, but could be implemented as plugins. c: Privacy For issues that impact or improve the privacy. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.

Comments

@mattab
Member

mattab commented Aug 3, 2014

Problem: when IP anonymisation is enabled, it is still easy to profile users across days, as they may be the only ones in their anonymised IP address range. A Piwik admin could therefore very easily find matching visitors in previous days even though the IP address is anonymised. This was suggested by Richard Stallman from the FSF.

Goal: Help users prevent future surveillance on themselves. Provide better privacy to users measured by Piwik. Preventing the long-term tracking of users is of the highest importance; if we hit an obstacle, we must not simply give up.

Proposed solution: When IP anonymisation is enabled, hash the IP address in a way that prevents fingerprinting. For example, hash the anonymised IP using a seed that changes each day.

Advantages:

  • the algorithm that detects unique visitors using the IP address hash would still work for matching visitors within the current day's traffic
  • it would bring added privacy to users tracked in Piwik, as their IP address would be hashed and the hash would differ from one day to the next.
  • the fingerprinting hash should use the hashed visitor IP
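A minimal sketch of the proposal above, in Python for illustration only (Piwik itself is PHP, and nothing here is Piwik's actual code). The function names `mask_ipv4` and `hash_masked_ip`, the use of HMAC-SHA256, and the way the daily seed is passed in are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress


def mask_ipv4(ip: str, masked_bytes: int = 2) -> str:
    """Zero the last `masked_bytes` bytes of an IPv4 address.

    With the default of 2, 123.45.67.89 becomes 123.45.0.0.
    """
    if not 0 <= masked_bytes <= 4:
        raise ValueError("masked_bytes must be between 0 and 4")
    packed = ipaddress.IPv4Address(ip).packed
    kept = packed[: 4 - masked_bytes] + bytes(masked_bytes)
    return str(ipaddress.IPv4Address(kept))


def hash_masked_ip(masked_ip: str, daily_seed: bytes) -> str:
    """Keyed hash of the masked IP (assumption: HMAC-SHA256).

    `daily_seed` is the secret for the current day; a different seed on
    another day yields an unrelated value for the same IP.
    """
    return hmac.new(daily_seed, masked_ip.encode("ascii"), hashlib.sha256).hexdigest()
```

Two visits hashed with the same day's seed give the same value, so same-day matching still works. Hashing the same masked IP with the next day's seed gives a different value, which is what breaks matching across days.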

What do you think?

refs #5052

@mattab mattab added this to the Current milestone milestone Aug 3, 2014
@kylekatarnls
Contributor

Agreed, but will it still be possible to know whether a visitor is new or returning with anonymisation enabled, if the hash changes each day?

@mattab
Member Author

mattab commented Aug 3, 2014

@kylekatarnls if the visitor has first-party cookies enabled, then yes, Piwik will be able to detect returning visitors.

@mattab mattab added c: Privacy Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. labels Sep 3, 2014
@mattab
Member Author

mattab commented Sep 9, 2014

This is a requirement to become a GNU Package: #5276
see also #6160

@mattab mattab modified the milestones: Piwik 2.9.0, Short term Oct 20, 2014
@mattab mattab changed the title When IP anonimisation is enabled, hash the IP address in a way that prevents fingerprinting When IP anonimisation is enabled, hash the IP address in a way that prevents long-term fingerprinting Oct 20, 2014
@mattab
Member Author

mattab commented Oct 20, 2014

From RMS:

http://www.theguardian.com/commentisfree/2014/oct/17/whisper-private-secret-sharing-app-anonymity
talks about the danger of identifying people from the approximate geolocations when they visit a site.

@mattab
Member Author

mattab commented Dec 10, 2014

Trying to get my head around this issue.... I think that:

  • we need to change what it means to anonymise IPs, and how we store anonymised IPs in the DB in log_visit.location_ip
    • To prevent surveillance, we must hash the IP address with a random seed that changes every day.
    • example: a user visits from an IP address today and visits again from the same IP address three days later: in the Piwik database we will see a different value for each day in the field log_visit.location_ip
    • it will be impossible to display IPs in the Visitor Log or Visitor Profile. We would show the text "anonymised IP address" instead of e.g. 123.45.0.0.
  • Piwik has privacy built in, so we'd like to make this anonymisation the new default.
    • users could disable via the config file rather than the UI; by default it will read e.g. enable_ip_anonymisation_really_anonymous=1
  • Maybe we can remove the section "Select how many bytes of the visitors' IPs should be masked." from the Privacy settings page?
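To make the "today, then three days later" example concrete, here is a rough Python sketch of how a random per-day seed could be managed. It is an illustration under assumptions, not Piwik code: `DailySeedStore`, `stored_location_ip`, the in-memory dict and HMAC-SHA256 are all made up for the example.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailySeedStore:
    """Hypothetical in-memory store holding one random seed per calendar day."""

    def __init__(self):
        self._seeds = {}  # maps a date to that day's random seed (bytes)

    def seed_for(self, day: date) -> bytes:
        # Create the seed the first time a day is seen, then reuse it all day.
        if day not in self._seeds:
            self._seeds[day] = secrets.token_bytes(32)
        return self._seeds[day]

    def purge_before(self, day: date) -> int:
        """Delete seeds for days before `day`; return how many were removed."""
        old_days = [d for d in self._seeds if d < day]
        for d in old_days:
            del self._seeds[d]
        return len(old_days)


def stored_location_ip(ip: str, day: date, store: DailySeedStore) -> str:
    """Value that would be written instead of the IP for a visit on `day`."""
    return hmac.new(store.seed_for(day), ip.encode("ascii"), hashlib.sha256).hexdigest()
```

Under this sketch, a visit on 2014-12-10 and one on 2014-12-13 from the same IP produce different stored values. Once an old seed has been purged, the old value can no longer be recomputed from the IP, so purging is what keeps past days unlinkable.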

Other

  • Enabled by default. Can be disabled in the config, e.g. enable_ip_anonymisation_really_anonymous=0
  • when the IP is anonymised, the segment visitIp should be hidden as it won't match visits as expected
  • if the config setting window_look_back_for_visitor is non-zero, then we should automatically set enable_ip_anonymisation_really_anonymous=0, as disabling it is required for that feature to work (and for backward compatibility)
  • need to tweak the FAQs about finding a visitor's IP and selecting visitor IPs from the database
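The interaction between the two settings above could look roughly like this. It is a Python sketch only: `effective_daily_hashing` is a hypothetical helper, the config is modelled as a plain dict, and the setting name is the placeholder used in this comment rather than a final name.

```python
def effective_daily_hashing(config: dict) -> bool:
    """Decide whether the daily-changing hash should actually be applied.

    Assumptions: the setting defaults to 1 (enabled), and a non-zero
    window_look_back_for_visitor always forces it off.
    """
    requested = bool(int(config.get("enable_ip_anonymisation_really_anonymous", 1)))
    look_back = int(config.get("window_look_back_for_visitor", 0))
    if look_back != 0:
        # Looking back for earlier visits needs comparable values across days.
        return False
    return requested
```

Values are passed through `int()` so that both numbers and the strings an INI-style config would yield are accepted.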

@mnapoli
Contributor

mnapoli commented Dec 10, 2014

To prevent surveillance, we must hash the IP address with a random seed that changes every day.

👍

users could disable via the config file rather than the UI

Why make it harder than it is today? Why not leave it in the UI?

enable_ip_anonymisation_really_anonymous=1

really_anonymous is like mysql_real_escape_string(); we should try to find a better name

What about Piwik installs that use IP anonymisation today? Will the new anonymisation method replace the old one?

@mattab
Member Author

mattab commented Dec 10, 2014

Why make it harder than it is today? Why not leave it in the UI?

Because users won't be able to view the bytes that were not anonymised from the IP addresses anyway, it may be confusing to give them control there when, in the end, it will not affect the "visibility" of the reports. It seems to me that it will become an implementation detail of the new, better anonymisation algorithm?

Will the new anonymisation method replace the old one?

I guess it's safer not to, in case some users depend on it somehow. Maybe we add an upgrade task to set the setting to 0 for those users.

+1 to find a better setting name! anonymised_ip_prevent_big_brother or anonymised_ip_prevent_surveillance or anonymised_ip_look_different_each_day ?

@tsteur
Member

tsteur commented Dec 11, 2014

Sounds more like a new plugin to me, in case someone wants another level of anonymization... Or, if you think most users actually want this behavior, replace the default behavior and move the old one into a plugin so users can still use the previous one.

@mnapoli
Contributor

mnapoli commented Dec 11, 2014

+1

It's confusing:

  • no_anonymisation
  • anonymisation_but_not_really_anonymous
  • anonymisation_really_this_is_the_real_one

It needs to be clear and simple: either it's anonymized or it's not. And in the end: does Piwik change its definition of what "anonymisation" means? If not, then we put the new method in a plugin because that's not the Piwik-endorsed way. If yes, then we put the old method in a plugin (or we mark it "not-recommended" in the UI) because that's no longer the Piwik way. We can't do this half-heartedly; it will just confuse people (but we should still keep BC for users though!).

And coming back to this because I'm not sure I understood your answer:

users could disable via the config file rather than the UI

As a user, I want to be able to disable IP anonymisation. Going into the config file is a no-no for me, e.g. I'm using Piwik Cloud.

@mattab
Member Author

mattab commented Dec 11, 2014

Alright I think we do it this way:

  • we keep the current setting Anonymize Visitors' IP addresses. When Yes is selected....
    • show the new setting Keep IP secret below it. The inline help reads: Select "Yes" if you want to respect your visitors' privacy by encrypting their IP addresses.
      • When yes is selected, hide the Select how many bytes of the visitors' IPs should be masked. (since this setting becomes almost irrelevant).
      • When no is selected, display Select how many bytes of the visitors' IPs should be masked. (current behavior)

I think this way makes full sense and is nicely integrated (the config setting was a bad idea)

@tsteur
Member

tsteur commented Dec 11, 2014

Can we put this into a plugin? That's why there is a Tracker.setVisitorIp event ;) So people could just activate it by installing (or disable it by deactivating) the plugin
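The general shape of that plugin approach, sketched in Python purely for illustration. Only the event name Tracker.setVisitorIp comes from this thread; `EventDispatcher`, `make_daily_hash_handler` and the handler signature are stand-ins and do not reproduce Piwik's actual plugin API.

```python
import hashlib
import hmac


class EventDispatcher:
    """Hypothetical stand-in for an event system; not Piwik's real API."""

    def __init__(self):
        self._handlers = {}  # event name -> list of handler callables

    def add_observer(self, event: str, handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def remove_observer(self, event: str, handler) -> None:
        self._handlers.get(event, []).remove(handler)

    def post(self, event: str, value: str) -> str:
        # Each handler may replace the value; with no handlers it is unchanged.
        for handler in self._handlers.get(event, []):
            value = handler(value)
        return value


def make_daily_hash_handler(daily_seed: bytes):
    """Build a handler that replaces the IP with a keyed hash (assumed HMAC-SHA256)."""

    def handler(ip: str) -> str:
        return hmac.new(daily_seed, ip.encode("ascii"), hashlib.sha256).hexdigest()

    return handler
```

Activating the plugin corresponds to registering the handler for the event, and deactivating it to removing the handler, after which the IP passes through untouched.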

@mattab mattab modified the milestones: Piwik 2.10.0 , Piwik 2.11.0 Jan 4, 2015
@mattab mattab removed the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Jan 6, 2015
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Feb 9, 2015
@mattab mattab modified the milestones: Mid term, Piwik 2.11.0 Feb 9, 2015
@mattab mattab modified the milestones: Short term, Mid term Feb 27, 2015
@mattab mattab modified the milestones: Mid term, Short term Oct 2, 2015
@mattab mattab added duplicate For issues that already existed in our issue tracker and were reported previously. and removed Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. labels Dec 5, 2016
@mattab mattab modified the milestones: Long term, Mid term Dec 5, 2016
@mattab mattab removed the duplicate For issues that already existed in our issue tracker and were reported previously. label Dec 5, 2016
@mattab mattab added the c: New plugin For features that probably will not be added to Matomo, but could be implemented as plugins. label Jun 19, 2017
@tsteur
Member

tsteur commented Sep 3, 2020

Note: we already added hashing the fingerprint with a salt that changes every day.

In #16377 we will also offer an option to not store the IP at all.

Not sure this is still needed then?

@mattab
Member Author

mattab commented Sep 4, 2020

will be fixed in #16377

@mattab mattab closed this as completed Sep 4, 2020
@tsteur tsteur added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Sep 9, 2020

4 participants