@mattab opened this Issue on May 7th 2018 Member

Our User ID Pseudonymisation computes a SHA1 hash of each User ID and uses the Matomo salt during hashing. (refs https://github.com/matomo-org/matomo/issues/12836 https://github.com/matomo-org/matomo/issues/12641 #12600)

It is not full anonymisation: given the User ID and the Matomo salt, it is possible to find all the visits of this particular user. Since the backend knows the salt, we are in theory able to hash a given User ID into its pseudonym and return all visits/actions data for this User ID only. The goal of this issue is to implement this behavior, for full transparency towards data subjects.
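A minimal sketch of why salted hashing is pseudonymisation rather than anonymisation. It assumes the pseudonym is the hex SHA1 of the User ID concatenated with the salt; Matomo's exact input format is an assumption here, as are the helper names `pseudonymise_user_id` and `find_visits_for_user` and the sample visit records.

```python
import hashlib


def pseudonymise_user_id(user_id: str, salt: str) -> str:
    # Assumption: pseudonym = hex SHA1 of (user_id + salt). Matomo's exact
    # concatenation may differ; what matters is that it is deterministic.
    return hashlib.sha1((user_id + salt).encode("utf-8")).hexdigest()


def find_visits_for_user(visits: list, user_id: str, salt: str) -> list:
    # Whoever knows the salt can recompute the pseudonym for a known User ID
    # and select every stored visit that carries it.
    pseudonym = pseudonymise_user_id(user_id, salt)
    return [v for v in visits if v["user_id"] == pseudonym]


salt = "example-salt"  # hypothetical salt value
visits = [
    {"idvisit": 1, "user_id": pseudonymise_user_id("alice@example.com", salt)},
    {"idvisit": 2, "user_id": pseudonymise_user_id("bob@example.com", salt)},
    {"idvisit": 3, "user_id": pseudonymise_user_id("alice@example.com", salt)},
]
print([v["idvisit"] for v in find_visits_for_user(visits, "alice@example.com", salt)])
# prints [1, 3]
```

The same input always yields the same pseudonym, which is exactly what makes both linking visits and re-identification possible.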

Exporting data subjects' data based on User ID

Current behavior

Currently, when User ID Pseudonymisation is activated on the instance, all User IDs are replaced by their hashed value, so exporting a data subject's data by User ID no longer works: the stored User ID is now hashed/pseudonymised and does not match the one being searched for.

Expected behavior

  • when Pseudonymisation of User IDs is enabled, and the Matomo Super User exports a data subject's data for a given User ID,
  • then the data export should still work.
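The expected behavior could be sketched as follows: when pseudonymisation is enabled, the export hashes the User ID entered by the Super User with the same salt before searching, so it matches the stored pseudonyms. The function `export_data_subject`, the `pseudonymisation_enabled` flag, the record layout and the SHA1(User ID + salt) format are all hypothetical.

```python
import hashlib


def pseudonymise_user_id(user_id: str, salt: str) -> str:
    # Hypothetical format: hex SHA1 of the User ID followed by the salt.
    return hashlib.sha1((user_id + salt).encode("utf-8")).hexdigest()


def export_data_subject(visits: list, user_id: str, salt: str,
                        pseudonymisation_enabled: bool) -> list:
    # Stored User IDs are hashed when pseudonymisation is on, so hash the
    # requested User ID the same way before comparing; otherwise compare as-is.
    if pseudonymisation_enabled:
        lookup_key = pseudonymise_user_id(user_id, salt)
    else:
        lookup_key = user_id
    return [v for v in visits if v["user_id"] == lookup_key]


salt = "example-salt"  # hypothetical salt value
stored = [{"idvisit": 7, "user_id": pseudonymise_user_id("carol", salt)}]

# Expected behavior: hashing the requested ID finds the visit.
print(len(export_data_subject(stored, "carol", salt, True)))   # prints 1
# Current behavior: searching for the raw ID finds nothing.
print(len(export_data_subject(stored, "carol", salt, False)))  # prints 0
```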
@tsteur commented on May 7th 2018 Member

Do I understand this correctly that you want the export to work when searching for the original userId? If so, I don't see why you would anonymize it in the first place. I don't think this should be possible.

@mattab commented on May 7th 2018 Member

We could delete the feature, but it's definitely still useful to have it, because if someone steals the DB or gains access to the UI, one can't find out the original User ID (which brings several layers of security). So the idea is to be 100% transparent about it: since it's technically possible (and relatively easy) to re-identify a pseudonymised User ID, we should expose this through the UI by letting the data subject data export still work.

@tsteur commented on May 7th 2018 Member

one can't find out the original User ID (which brings several layers of security)

That's not really true. You can still find it out with brute force etc., which doesn't take long nowadays, especially when the visitor has previously been on the site logged in as a user.
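To illustrate the brute-force point: if the salt is known (for example, because it is stored in the same database), an attacker can hash a list of candidate User IDs, such as email addresses seen during earlier logged-in visits, and compare each result to the stored pseudonym. This sketch assumes the same hypothetical SHA1(User ID + salt) format; `reidentify` and the candidate list are illustrative only.

```python
import hashlib
from typing import Iterable, Optional


def pseudonymise_user_id(user_id: str, salt: str) -> str:
    # Hypothetical format: hex SHA1 of the User ID followed by the salt.
    return hashlib.sha1((user_id + salt).encode("utf-8")).hexdigest()


def reidentify(pseudonym: str, candidates: Iterable[str], salt: str) -> Optional[str]:
    # Dictionary attack: SHA1 is a fast hash, so trying a large list of
    # candidate IDs is cheap once the salt is known.
    for candidate in candidates:
        if pseudonymise_user_id(candidate, salt) == pseudonym:
            return candidate
    return None


salt = "example-salt"  # hypothetical salt value
stored_pseudonym = pseudonymise_user_id("dave@example.com", salt)
candidates = ["alice@example.com", "bob@example.com", "dave@example.com"]
print(reidentify(stored_pseudonym, candidates, salt))  # prints dave@example.com
```

The salt only slows this down if it stays secret; once it is readable alongside the data, it offers little protection.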

@tsteur commented on May 7th 2018 Member

@mattab let's discuss this again later, as enabling this feature turns the "Pseudo Anonymization" into "Pseudo Secure".

It implies that something is anonymized, but the data is still accessible. Even an attacker who only has the token_auth would be able to get the data through the API etc., and it may be more likely that an attacker gets hold of a token_auth than of the DB.

It is very important to clarify this feature before the 3.5.0 release, as its meaning cannot be changed from "Pseudo Anonymization" to "Pseudo Secure" later: that is not what users who have it enabled would expect.

@tsteur commented on May 7th 2018 Member

Also, once an attacker has access to the DB, they also have access to the API and can query it for specific users.

On top of that, storing the userId salt in the DB does not help, as the result is then neither secure nor anonymized.

@mattab commented on May 7th 2018 Member

It implies that something is anonymized, but the data is still accessible.

FYI the word is "Pseudonym-ise", from pseudonym: "A pseudonym (/ˈsjuːdənɪm/ SEW-də-nim) or alias (/ˈeɪliəs/) is a name that a person or group assumes for a particular purpose, which can differ from their first or true name." So the term carries no notion of anonymity or security.
