How do we ensure Latitude and Longitude columns are at best to the city level #15969

Open
mattab opened this issue May 22, 2020 · 8 comments
Labels
c: Privacy For issues that impact or improve the privacy.

Comments

@mattab
Member

mattab commented May 22, 2020

With regard to privacy: since the database schema has latitude and longitude columns, how could we (or how do we) ensure that they are precise to the city level at most?

It is a privacy concern that lat/long could be more precise than expected.
In upcoming recommendations it will be important to limit geolocation to the city level at best.
AFAIK we use lat/long in order to plot the user on the real-time map.
Independently of whether the user is geolocated using an anonymised IP or not, it'd be great to ensure the lat/long are never too precise.

Is this already the case in Matomo? If not, could we limit lat/long precision to the city (and how)?

@mattab mattab added the c: Privacy For issues that impact or improve the privacy. label May 22, 2020
@mattab
Member Author

mattab commented May 22, 2020

Also it'd be important to document this "feature" in the user guide at: https://matomo.org/docs/geo-locate/

@diosmosis
Member

Possible solutions:

  • rounding lat/long values to some degree (the degree to be determined later)
  • keeping a db of city => lat/long pairs, though this seems far more difficult
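To make the first option concrete, here is a minimal sketch of rounding in Python (Matomo itself is PHP, so this is an illustration only). The function names and the `decimals` parameter are hypothetical, and the right precision is still undecided, as noted above. As a rough guide, one degree of latitude is about 111 km, so one decimal place gives a grid of roughly 11 km north-south.

```python
# Approximate length of one degree of latitude in km.
# Degrees of longitude get shorter toward the poles, so this is a north-south figure only.
KM_PER_DEGREE_LATITUDE = 111.32


def round_coordinates(latitude, longitude, decimals=1):
    """Round a lat/long pair to `decimals` decimal places."""
    return round(latitude, decimals), round(longitude, decimals)


def approx_grid_size_km(decimals):
    """Approximate north-south size, in km, of one rounding step."""
    return KM_PER_DEGREE_LATITUDE / (10 ** decimals)


if __name__ == "__main__":
    # Compare a few candidate precisions for one sample coordinate.
    for decimals in (0, 1, 2, 3):
        print(decimals, round_coordinates(48.858370, 2.294481, decimals),
              round(approx_grid_size_km(decimals), 2), "km")
```

Rounding alone does not guarantee city-level anonymity: in sparsely populated areas even a coarse grid cell may contain very few households.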

@Findus23
Member

See also #12735 for an even rougher rounding.

@tsteur
Member

tsteur commented Sep 3, 2020

I just checked, and both DB-IP and MaxMind seem to report the last three digits as 000, so they basically round already. This could change in the future, though.

Also, I'm thinking rounding can still be a problem for rural areas where only a few people live. You could potentially still identify individuals or households there.

I'm not sure we can find a general solution to this besides optionally not tracking it at all (which breaks only the real-time map). If I see this correctly, only the real-time map uses this info. Maybe the real-time map could be changed to work like the regular visitor map and not use long/lat?

@diosmosis
Member

I'm not sure, but it looks like the visitor map converts a city to a lat/long pair, so this might be do-able pretty easily... still checking though

@diosmosis
Member

@tsteur nvm, that uses the tracked longitude/latitude. Probably easiest is to somehow map locations to longitude/latitude, otherwise I think we'd have to change the realtime map significantly. It's probably fairly simple to write a script to iterate over every location in the geoip database and set a lat/long in a file.
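A rough sketch of what such a script could look like, in Python for illustration only. Reading the actual GeoIP database is left out; `records` stands in for whatever iteration the provider's format allows, and the dict keys, function names, and JSON output format are all hypothetical. The representative point here is a plain average of the entries per city, which is an assumption of this sketch and not something decided in this thread.

```python
import json
from collections import defaultdict


def build_location_map(records):
    """Map each country/region/city to a single representative lat/long.

    `records` is an iterable of dicts with the (hypothetical) keys
    'country', 'region', 'city', 'latitude' and 'longitude'.
    """
    # Per location: running latitude sum, longitude sum, and entry count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for rec in records:
        key = "|".join((rec["country"], rec["region"], rec["city"]))
        entry = sums[key]
        entry[0] += rec["latitude"]
        entry[1] += rec["longitude"]
        entry[2] += 1
    # A plain average is fine within one city, but it would be wrong for
    # a set of points that straddles the 180th meridian.
    return {
        key: {"latitude": round(lat / n, 3), "longitude": round(lon / n, 3)}
        for key, (lat, lon, n) in sums.items()
    }


def write_location_map(location_map, path):
    """Write the location map to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(location_map, fh, indent=2, sort_keys=True)
```

The tracker could then look up the stored point by location instead of saving the provider's lat/long directly, so precision would no longer depend on the provider.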

@mattab
Member Author

mattab commented Sep 4, 2020

I'd say that, for their own reasons, it's always in the geolocation DB providers' interest not to provide more accurate lat/long.

For example, MaxMind's GeoIP2 City page says (https://www.maxmind.com/en/geoip2-city):

Longitude (Latitude and Longitude are often near the center of population. These values are not precise and should not be used to identify a particular address or household.)

As a possible fix, maybe we could always set the last 3 digits to zero, if that's what MaxMind does (in case they change it in the future)?
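A minimal sketch of that idea in Python, for illustration only. It assumes coordinates carry six decimal places, so "the last 3 digits" means keeping three decimals; that storage format is an assumption of this sketch, and the function name is hypothetical. It truncates toward zero and does not round, which matches "set to zero" literally.

```python
from decimal import Decimal, ROUND_DOWN


def zero_last_three_digits(value):
    """Keep three decimal places and zero the rest, e.g. 48.858370 -> 48.858000.

    Assumes a six-decimal coordinate format (an assumption of this sketch).
    """
    # Truncate toward zero at the third decimal place.
    truncated = Decimal(str(value)).quantize(Decimal("0.001"), rounding=ROUND_DOWN)
    # Pad back to six decimals so the stored format stays the same.
    return truncated.quantize(Decimal("0.000001"))


if __name__ == "__main__":
    print(zero_last_three_digits("48.858370"))
    print(zero_last_three_digits("-2.294481"))
```

Applying this to values that already end in 000 leaves them unchanged, so it would only have an effect if a provider started returning more precise data.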

@tsteur
Member

tsteur commented Sep 7, 2020

I reckon that for now we don't need to do anything here, and if someone wants to use a more accurate provider then they can do so.

The rounding problem would still remain for locations where only a few people live, but I suppose those might also be assigned to a bigger nearby city (would need to be checked).

5 participants