@tsteur opened this Issue on January 21st 2020 Member

I would say we'd want different options for how much of the referrer URL is recorded, maybe:

  • Full
  • Exclude query parameters
  • Exclude path (basically only record the domain, including the subdomain)
  • Exclude subdomain (e.g. someone coming from example.matomo.cloud may still be identifiable). In that case we would only record matomo.cloud
  • Nothing, only record the referrer type
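To make the options above concrete, here is a rough sketch in Python (not Matomo code). The level names and the `anonymise_referrer` function are made up for illustration, and the "exclude subdomain" case naively keeps the last two host labels, which would be wrong for suffixes like co.uk; a real implementation would probably need a public suffix list.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical level names mirroring the options listed above.
LEVELS = ("full", "exclude_query", "exclude_path", "exclude_subdomain", "nothing")


def anonymise_referrer(url: str, level: str) -> str:
    """Return the part of the referrer URL that would be stored for a level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if level == "full":
        return url
    if level == "nothing":
        # Only the referrer type would be recorded elsewhere, no URL at all.
        return ""

    parts = urlsplit(url)
    if level == "exclude_query":
        # Drop the query string (and the fragment) but keep the path.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    # Both remaining levels drop the path; hostname also drops port/userinfo.
    host = parts.hostname or ""
    if level == "exclude_subdomain":
        # Assumption: naive "last two labels" rule, not a public suffix lookup.
        host = ".".join(host.split(".")[-2:])
    return urlunsplit((parts.scheme, host, "", "", ""))
```

For a referrer like https://example.matomo.cloud/user/42?name=jane, "exclude_path" would store https://example.matomo.cloud and "exclude_subdomain" would store https://matomo.cloud.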

This can avoid tracking personal data, since the page a user comes from might include their name, some ID, etc.

We may also want to look at referrers more generally and remove URL parameters from most ad and social networks etc.

Not wanting to make this feature too complex, but I was thinking it could also be configurable for social networks and search engines specifically, as well as overall. E.g. we recognise a few referrers as social acquisition, and for these we may know how to anonymise them and could therefore still keep some of the URL parameters (whitelist). This maybe doesn't need to be a separate setting. This way we would still get some valuable information for some referrers while removing e.g. all query parameters for all others.
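A possible shape for the whitelist idea, again only as a Python sketch: the host name, the `ref` parameter, and the `filter_query` function are placeholders I made up, not an actual list of recognised referrers. Known hosts keep only their whitelisted parameters; every other host loses its whole query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: host -> query parameters considered safe to keep.
QUERY_WHITELIST = {
    "www.example-social.test": {"ref"},
}


def filter_query(url: str, whitelist: dict = QUERY_WHITELIST) -> str:
    """Keep only whitelisted query parameters; unknown hosts keep none."""
    parts = urlsplit(url)
    allowed = whitelist.get(parts.hostname or "", set())
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed
    ]
    # An empty query makes urlunsplit omit the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this, https://www.example-social.test/share?ref=feed&uid=123 would be stored as https://www.example-social.test/share?ref=feed, while a referrer from any host not in the whitelist would be stored without query parameters.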

Besides this, as a follow-up from https://github.com/matomo-org/matomo/issues/15902, we want to further anonymise some referrers automatically:

Also, I'm not sure, but the URL below looks like it contains personal data. I wonder if it makes sense to only store https://googleads.g.doubleclick.net/pagead/ads? or, better, https://googleads.g.doubleclick.net/pagead/ads?url=https://example.com/foobar.php?

https://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-1086157892257495&output=html&h=300&slotname=9664612556&adk=3030006362&adf=4109546823&w=360&lmt=2588583301&rafmt=1&psa=1&guci=2.2.0.0.2.2.0.0&format=360x300&url=https://example.com/foobar.php?fid=3241&flash=0&fwr=1&rpe=1&resp_fmts=3&sfro=1&wgl=1&dt=1548583300834&bpp=63&bdt=168&idt=174&shv=r30200428&cbv=r24140131&ptt=9&saldr=aa&abxe=1&cookie=ID=7b741e56705a5595:T=158853181:S=A5NI_MZyqUUO8pNhi4diWmFjQk4H8Y-hJA&crv=1&correlator=8391657028466&frm=20&pv=2&ga_vid=1350038101.1588583682&ga_sid=1585583301&ga_hid=135474036&ga_fc=1&ga_wpids=UA-3641475-2&iag=0&icsg=8362&dssz=13&mdo=0&mso=0&u_tz=420&u_his=1&u_java=0&u_h=780&u_w=360&u_ah=780&u_aw=360&u_cd=24&u_nplug=0&u_nmime=0&adx=0&ady=80&biw=360&bih=648&scr_x=0&scr_y=0&eid=21065451,23465474,4471896&oid=3&pvsid=164627556405179&pem=809&ref=https://example.com/foobar.php?aff_sub3=ID-rdr-2&utm_campaign=38350&utm_source=IK-xd3&utm_medium=rdr&rx=0&eae=0&fc=644&brdim=0,0,0,0,360,0,360,648,360,650&vis=1&rsz=||leE|&abl=CS&pfx=0&fu=8334&bc=31&ifi=1&uci=b!1&fsb=1&xpc=iZjViRq6KJ&p=https://example.com&dtd=230

Then also, for any URL starting with https://www.mgid.com/ghits, we could remove all URL parameters.

https://www.mgid.com/ghits/5633098/i/113152/0/pp/6/1?h=BbYLxfMqnKZaGQ2xAYIHcOAQPcWj4dWQv_DGpWVIRdhZieoJnL-LlQSwR7epXrq5&rid=60ckde11-0dfd-11ea-8948-d194662c24f7&ts=com.google.android.googlequicksearchbox&tt=Organic&cpm=1&gbpp=1&k=784880fcib-45T5E4fIWfXH.hQZjfXH.hbQFfbD:fr;fx!fW~f=f4:faI:fV=fO:ffx!fQf.faHR0cHM6Ly9lY39ub215Lm9rZXpv5bmUuY29tL3JlYWQvMjAyMC8wN$8wMy8zMjAvMjI=fYW5kcm9pZC1hcHA6Ly9jb20uZ39vZ2xlLmFuZHJvaWQuZ29vZ2xlcXVpY2tzZWFyY2g=fK45vL2Nvb$5nb29nbGUuYW5kcm9pZC5nb29nbGVxdWlja3NlYXJjaG5veC9odHRwcy9529vZ2xlf*fMzQ2*DQxN5cx*DQwOTc=rMHwxMXw6MXwzNA==nMHwwf!fyfNjI2*Dt2MHww*DE4Mg==ft!fLQfXH.hR.Df!fTW96aWxsY$81LjAgrExpbnV4OyBBbmRyb4lkIDk7IFJlZG1pIDZBK$BBcHBsZVdlYktpdC81MzcuMzYgKEtIVE1MLCBsaWtlIEdlY2tvK$BDaHJ4bWUvNzguMC4zOTA0LjEwOCBNb2JpbGUgU2FmYXJpLzUzNy4zNg==ffMHwzfTGludXgyYXJtdjdsfNDIwfMHw2NQ==fMzYw*Dcy5A==fY2Vsb5VsYXJ8NGd8MA==f!f!fQf+f*f*&muid=kb3UBFHwRcc3
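The automatic rules for the two referrers above could be as simple as a prefix list. This is a sketch in Python under my own assumptions: `STRIP_QUERY_PREFIXES` and `strip_known_referrer` are made-up names, and it implements the simpler variant for doubleclick (drop everything after the path, including the trailing "?") rather than keeping the `url` parameter.

```python
from urllib.parse import urlsplit, urlunsplit

# Prefixes taken from the examples in this issue; the list itself is hypothetical.
STRIP_QUERY_PREFIXES = (
    "https://www.mgid.com/ghits",
    "https://googleads.g.doubleclick.net/pagead/ads",
)


def strip_known_referrer(url: str) -> str:
    """Remove all URL parameters when the referrer matches a known prefix."""
    if url.startswith(STRIP_QUERY_PREFIXES):
        parts = urlsplit(url)
        # Keep scheme, host and path; drop query string and fragment.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Referrers that match no rule are left untouched here.
    return url
```

The mgid URL above would then be stored as https://www.mgid.com/ghits/5633098/i/113152/0/pp/6/1.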