GDPR: When "Pseudonymise User ID" is activated, still allow users to export data subject requests for a given User ID #12839

Open
mattab opened this issue May 7, 2018 · 6 comments
Labels
c: Privacy For issues that impact or improve the privacy. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

Member

mattab commented May 7, 2018

Our User ID pseudonymisation hashes the User ID with SHA-1, using the Matomo salt. (refs #12836 #12641 #12600)

It is not full anonymisation: given the User ID and the Matomo salt, it is possible to find all the visits for that particular user. In the backend, since we know the salt, we can in theory compute the User ID hash (the pseudonym) and return all visits/actions data for that User ID only. The goal of this issue is to implement this behavior for full transparency towards data subjects.

Exporting data subjects' data based on User ID

Current behavior

Currently, when User ID pseudonymisation is activated on the instance, all User IDs are replaced by their hashed value. Exporting a data subject's data by User ID therefore doesn't work, because the stored User ID is now hashed/pseudonymised.

Expected behavior

  • when Pseudonymisation of User IDs is enabled, and when the Matomo Super User exports data subject requests for a given User ID,
  • then the data export should work.
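
The lookup described above can be sketched as follows. This is only a minimal illustration, not Matomo's actual code: the way the salt is combined with the User ID (plain concatenation here), the function names and the in-memory list of visits are all assumptions made for the example.

```python
import hashlib


def pseudonymise_user_id(user_id: str, salt: str) -> str:
    """Return a salted SHA-1 pseudonym for a User ID.

    Assumption: salt and User ID are simply concatenated before hashing.
    The real scheme may combine or truncate them differently.
    """
    return hashlib.sha1((salt + user_id).encode("utf-8")).hexdigest()


def export_visits_for_user(user_id: str, salt: str, visits: list) -> list:
    """Return the stored visits that belong to the original User ID.

    The stored rows only contain the pseudonym, so the requested User ID
    is hashed with the same salt and compared against the stored value.
    """
    pseudonym = pseudonymise_user_id(user_id, salt)
    return [visit for visit in visits if visit["user_id"] == pseudonym]


if __name__ == "__main__":
    salt = "example-salt"  # hypothetical value standing in for the instance salt
    stored_visits = [
        {"user_id": pseudonymise_user_id("alice", salt), "url": "/home"},
        {"user_id": pseudonymise_user_id("bob", salt), "url": "/pricing"},
    ]
    for visit in export_visits_for_user("alice", salt, stored_visits):
        print(visit["url"])
```

Because the hash is deterministic for a given salt, the export only needs the original User ID as input and never has to store it.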
@mattab mattab added Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. c: Privacy For issues that impact or improve the privacy. labels May 7, 2018
Member

tsteur commented May 7, 2018

Do I understand this correctly that you want the export to work when searching for the original userId? I'm not getting why you would then anonymize it in the first place. I don't think this should be possible.

Member Author

mattab commented May 7, 2018

We could delete the feature, but it's definitely still useful to have: if someone steals the DB, or steals access to the UI, they can't find out the original User ID (which brings several layers of security). So the idea is to be 100% transparent about it. Since it's technically possible (and relatively easy) to re-identify a User ID that was pseudonymised, we should expose this through the UI by letting the data subject data export still work.

Member

tsteur commented May 7, 2018

> one can't find out the original User ID (which brings several layers of security)

That's not really true. You can still find it out with brute force etc.; that doesn't take long nowadays, especially when that visitor has been on the site before while logged in as a user.
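
To illustrate the point, here is a rough sketch of such a guess-and-compare search. It assumes the attacker already knows the salt and has a list of plausible User IDs; the concatenation scheme, the function names and the ID format are made up for the example and are not taken from Matomo.

```python
import hashlib
from typing import Iterable, Optional


def pseudonymise_user_id(user_id: str, salt: str) -> str:
    """Salted SHA-1 pseudonym (assumed scheme: salt + User ID)."""
    return hashlib.sha1((salt + user_id).encode("utf-8")).hexdigest()


def recover_user_id(pseudonym: str, salt: str,
                    candidates: Iterable[str]) -> Optional[str]:
    """Try each candidate User ID until one hashes to the given pseudonym.

    Returns the matching candidate, or None if no candidate matches.
    """
    for candidate in candidates:
        if pseudonymise_user_id(candidate, salt) == pseudonym:
            return candidate
    return None


if __name__ == "__main__":
    salt = "leaked-salt"  # hypothetical: salt obtained together with the data
    target = pseudonymise_user_id("user-1042", salt)
    guesses = (f"user-{n}" for n in range(10_000))  # hypothetical ID format
    print(recover_user_id(target, salt, guesses))
```

The cost of the search grows only linearly with the number of candidates, which is why short or predictable User IDs (sequential numbers, email addresses) give little protection once the salt is known.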

Member

tsteur commented May 7, 2018

@mattab let's discuss this again later, as enabling this feature turns the "Pseudo Anonymization" into "Pseudo Secure".

It implies that something is anonymized, but the data is still accessible. Even an attacker who has the token_auth would still be able to get the data through the API etc. And it may be more likely that an attacker gets access to token_auth than DB.

It is very important to clarify this feature before the 3.5.0 release: you cannot change its meaning from "Pseudo Anonymization" to "Pseudo Secure" later, as that is not what users who have it enabled would expect.

Member

tsteur commented May 7, 2018

Also, once an attacker has access to the DB, the attacker also has access to the API and can query it for specific users.

On top of that, having the userId salt in the DB doesn't help, as the result is neither secure nor anonymized.

Member Author

mattab commented May 7, 2018

> It implies that something is anonymized, but the data is still accessible.

FYI the word is "Pseudonym-ise", where pseudonym means: "A pseudonym (/ˈsjuːdənɪm/ SEW-də-nim) or alias (/ˈeɪliəs/) is a name that a person or group assumes for a particular purpose, which can differ from their first or true name." So there is no notion of anonymity or security.


3 participants