It is not full anonymisation: given the User ID and the Matomo salt, it is possible to retrieve all the visits for this particular user. In the backend, since we know the salt, we are in theory able to recompute the User ID hash (the pseudonym) and return all visits/actions data for this User ID only. The goal of this issue is to implement this behavior for full transparency towards data subjects.
Currently, when User ID Pseudonymisation is activated on the instance, all User IDs are replaced by their hashed value. As a result, exporting a data subject's data won't work, because the User ID is now hashed/pseudonymised.
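To make the proposal concrete, here is a minimal sketch of how an export by original User ID could work once pseudonymisation is on. The hashing scheme (SHA-256 over salt plus User ID), the function names and the in-memory `visits` list are all assumptions for illustration; Matomo's actual algorithm and storage may well differ.

```python
import hashlib


def pseudonymise(user_id: str, salt: str) -> str:
    # Assumed scheme for illustration only: SHA-256 over salt + User ID.
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


def export_visits(original_user_id: str, salt: str, visits: list) -> list:
    # Recompute the pseudonym from the original User ID, then filter on it.
    pseudonym = pseudonymise(original_user_id, salt)
    return [v for v in visits if v["user_id"] == pseudonym]


# Hypothetical data: the stored visits only ever contain the pseudonym.
SALT = "example-salt"
visits = [
    {"user_id": pseudonymise("alice@example.org", SALT), "url": "/pricing"},
    {"user_id": pseudonymise("bob@example.org", SALT), "url": "/docs"},
    {"user_id": pseudonymise("alice@example.org", SALT), "url": "/checkout"},
]

alice_export = export_visits("alice@example.org", SALT, visits)
```

The point is only that whoever holds the salt can map an original User ID back to its rows, so the export can in principle be made to work.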
Do I understand correctly that you want the export to work when searching for the original userId? Then I don't get why you would anonymize it in the first place. I don't think this should be possible.
We could delete the feature, but it's definitely still useful to have, because if one steals the DB, or steals access to the UI, one can't find out the original User ID (which brings several layers of security). So the idea is to be 100% transparent about it. Since it's technically possible (and relatively easy) to re-identify a User ID that was pseudonymised, we should expose this through the UI by letting data subject data exports still work.
> one can't find out the original User ID (which brings several layers of security)
That's not really true. You can still find it out with brute force etc.; that doesn't take too long nowadays. Especially when that visitor has been on the site before logged in as a user.
@mattab let's discuss this again later, as enabling this feature turns the "Pseudo Anonymization" into "Pseudo Secure".
It implies that something is anonymized, but the data is still accessible. Even an attacker who has the token_auth would still be able to get the data through the API etc. And it may be more likely that an attacker gets access to the token_auth than to the DB.
It is very important to clarify this feature before the 3.5.0 release: you cannot change its meaning from "Pseudo Anonymization" to "Pseudo Secure" later, as this is not what users who have it enabled would expect.
Also, once an attacker has access to the DB, the attacker has access to the API as well and can query it for specific users.
On top of that, having the userId salt in the DB does not help, as the result is neither secure nor anonymized.
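To illustrate why a salt stored next to the data adds little, here is a rough sketch of the brute-force point made earlier in the thread. It assumes, purely for illustration, a SHA-256 over salt plus User ID and numeric User IDs in a small range; neither is confirmed to match Matomo, and all names are hypothetical.

```python
import hashlib


def pseudonymise(user_id: str, salt: str) -> str:
    # Assumed scheme for illustration only: SHA-256 over salt + User ID.
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


def reidentify(stored_pseudonyms, salt, candidate_ids):
    # Hash every candidate once, then look the stored pseudonyms up.
    lookup = {pseudonymise(c, salt): c for c in candidate_ids}
    return {p: lookup[p] for p in stored_pseudonyms if p in lookup}


# Hypothetical scenario: salt and pseudonyms both come from the same DB.
SALT = "salt-read-from-db"
stored = {pseudonymise("1042", SALT), pseudonymise("77", SALT)}

# Candidate space assumed small and guessable (numeric IDs 0..9999).
candidates = (str(n) for n in range(10000))
recovered = reidentify(stored, SALT, candidates)
```

With the salt known, the cost is one hash per candidate, so a guessable ID space (sequential numbers, known e-mail addresses) gives little protection.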
> It implies that something is anonymized, but the data is still accessible.
FYI the word is "Pseudonym-ise", where pseudonym means:
> A pseudonym (/ˈsjuːdənɪm/ SEW-də-nim) or alias (/ˈeɪliəs/) is a name that a person or group assumes for a particular purpose, which can differ from their first or true name.

So there is no notion of anonymity or security.