@mattab opened this Issue on April 12th 2015 Owner

The goal of this issue is to expose a Privacy challenge in Piwik, regarding the ability to spy on users tracked in Piwik over time.

What is the Visitor ID?

The unique visitor ID is a 16-character hexadecimal string. Every unique visitor is assigned a different ID, and this ID does not change once it is assigned.

  • It is stored in the first party cookie. 13 months after the first action by this user, the ID is renewed.

The Visitor ID is stored in the Piwik database in the field idvisitor.
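
To make the format concrete, here is a minimal Python sketch of a 16-character hexadecimal identifier. It is not how Piwik generates the Visitor ID; the names new_visitor_id and is_valid_visitor_id are hypothetical and only illustrate the shape of the value described above.

```python
import re
import secrets

# Assumption: lowercase hex; the issue only says "16 characters hexadecimal".
VISITOR_ID_PATTERN = re.compile(r"[0-9a-f]{16}")


def new_visitor_id() -> str:
    """Return a random 16-character hexadecimal string (8 random bytes)."""
    return secrets.token_hex(8)


def is_valid_visitor_id(value: str) -> bool:
    """Check that a value has the 16-character hexadecimal shape."""
    return VISITOR_ID_PATTERN.fullmatch(value) is not None
```

Because the value is random and stable for the life of the cookie, anyone holding the database can group every row that shares the same idvisitor.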

What is the fingerprinting hash?

When tracking a new user, Piwik computes a fingerprint hash for this user. The hash is built from a list of user attributes such as IP address, screen resolution, browser plugins, etc. (this is done in the method getConfigHash). The Piwik Tracking API uses the fingerprint hash to try to record actions in the correct user visit. The fingerprint hash is used only when the Visitor ID (from the first party cookie) is not found; otherwise the Visitor ID is used by default.

Notes about how the fingerprint hash is created:

  • The fingerprint hash is currently seeded with a salt that is different for each Piwik instance.
    • (this ensures that the same person tracked in multiple Piwik instances cannot be cross-matched across those instances.)
  • The fingerprint hash is also seeded with the Website ID (done in #6824)
    • (this ensures that the same person tracked on several websites within one Piwik instance cannot be cross-matched across those websites.)

The fingerprint hash is stored in the Piwik database in the field config_id.
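
The seeding described above can be sketched in Python as follows. This is an illustration of the idea, not the actual getConfigHash implementation: the function name fingerprint_hash, the attribute names, the choice of SHA-1, and the truncation to 16 hexadecimal characters are all assumptions.

```python
import hashlib


def fingerprint_hash(attributes: dict, instance_salt: str, site_id: int) -> str:
    """Hash visitor attributes together with an instance salt and a site ID.

    Hypothetical sketch: the hash function and output length are assumptions.
    """
    # Seed with the per-instance salt and the website ID first.
    parts = [instance_salt, str(site_id)]
    # Sort keys so the same attributes always produce the same hash.
    for key in sorted(attributes):
        parts.append(f"{key}={attributes[key]}")
    digest = hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


visitor = {"ip": "203.0.113.77", "resolution": "1920x1080", "plugins": "pdf,flash"}
same_site = fingerprint_hash(visitor, "salt-of-instance-A", 1)
other_site = fingerprint_hash(visitor, "salt-of-instance-A", 2)
other_instance = fingerprint_hash(visitor, "salt-of-instance-B", 1)
```

With identical visitor attributes, the three values differ from one another, which is what prevents cross-matching between websites and between instances.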

Privacy challenges

Imagine for example if a Piwik database is seized by ex-colleagues of Edward Snowden (spies) who would like to use the Piwik data to spy on users who were tracked in Piwik.

When seizing a Piwik Database:

  • if IP anonymisation is not enabled, the Piwik DB will give spies the complete trail of user actions on the website for a given 'IP address' or 'Visitor ID'
  • if IP anonymisation is enabled, the ability to spy is a bit more limited. The Piwik DB will give spies the complete trail of user actions on the website for a given 'Anonymised IP address' or 'Visitor ID'
    • Note: once #5907 is implemented, spies will not be able to get the complete trail of user actions for a given 'Anonymised IP address'. (Why? When IP anonymisation is enabled, this IP address will be hashed with a daily seed, so the fingerprint hash, which uses the Anonymised IP address, will also change every day for a given user, preventing spying over time.)
  • Spies can always look up all actions for a given 'Visitor ID' assuming:
    • the user had First party cookies enabled.
    • the user was using the same browser over time, and did not delete the cookies
  • Spies can look up all actions for a user that uses a particular browser, and/or a particular OS, and/or a set of plugins
    • (Piwik stores the browser, OS and plugin info in the tracking log tables)
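
The daily-seed idea mentioned in the note about #5907 can be sketched as follows. This is only an illustration of the proposal, not code from Piwik: the names anonymise_ip and daily_ip_token, the /24 mask, and the use of HMAC-SHA256 are assumptions.

```python
import hashlib
import hmac
import ipaddress
from datetime import date


def anonymise_ip(ip: str, keep_bits: int = 24) -> str:
    """Zero the low bits of an IPv4 address (assumed mask: keep 24 bits)."""
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


def daily_ip_token(ip: str, instance_secret: bytes, day: date) -> str:
    """Hash the anonymised IP with a key that changes every day."""
    # Derive a per-day key from the instance secret and the calendar date.
    daily_key = hmac.new(instance_secret, day.isoformat().encode(), hashlib.sha256).digest()
    masked = anonymise_ip(ip)
    return hmac.new(daily_key, masked.encode(), hashlib.sha256).hexdigest()[:16]


secret = b"hypothetical-instance-secret"
monday = daily_ip_token("203.0.113.77", secret, date(2015, 4, 13))
tuesday = daily_ip_token("203.0.113.77", secret, date(2015, 4, 14))
```

The same address yields a stable token within one day, so visits can still be grouped, but monday and tuesday differ, so the stored value cannot link the user's activity across days.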

Improve privacy

Since our goal is to improve the Privacy by default for users being tracked in Piwik (#6160), we wanted to explain how this works.

Note that to improve Privacy in your Piwik server and prevent long term surveillance of users via the Piwik database, you can already do the following:

To help limit surveillance we should work on: #5907

Maybe there isn't much more we can do but feel free to leave a comment if you have suggestions.

@mattab commented on December 29th 2016 Owner

I've documented in detail how the visitor recognition works here and in this FAQ: How does Piwik detect unique and returning visitors? (with User ID, Visitor ID from cookie and/or fingerprint)

Our privacy guidelines are documented in: https://piwik.org/docs/privacy/

For any further requests or comments, please comment here or create a new issue.

See also our Privacy label for issues: https://github.com/piwik/piwik/labels/c%3A%20Privacy

This Issue was closed on December 29th 2016