See doc: Geo Locate visitors countries cities and regions.
GeoIP plugin #45 is one of the most popular plugins. For a web analytics tool, getting user countries as accurately as possible is critical, and Piwik should help users in this direction.
When the plugin is released in trunk, we should update the FAQ, website pages and wiki pages mentioning GeoIP, and mark the GeoIP ticket #45 as closed. For Goals compatibility of the GeoIP plugin, see #1434.
Please share your feedback in the comments. If you would like to participate... well you know what to do!
Attachment: Patch of GeoIP.php that allows it to store region ids.
Just some input to clarify the terms "country" and "region". Referring to the list of administrative levels used in OpenStreetMap, a country would correspond to admin level 2 while regions would correspond to admin level 4.
Do we record lat/long for each visitor, or do we assume that other systems will know where to plot a given City?
I think it is NOT necessary to record lat/long for each visitor. It is sufficient to record the city id. The GeoIP db would resolve each visitor's location within the same city to the same lat/long anyway. In fact, for each city there is only one lat/long stored in the GeoLite City DB (more precisely in the cityByCountry table).
As the number of available cities (= pairs of lat/long) differs between the different GeoIP databases, it makes no sense to put this information into other systems like the world map.
If I might add to this: the MaxMind db gives a city lookup. This does not work the way people think it will.
Blocks of IP numbers are sold to service providers who resell them to end users. However, the IP issuing authority assigns the city of the ISP's address to all the IP numbers. At least that is how it works in the UK. Things may vary in different countries, and ISPs don't reallocate the city when they sell dedicated IP numbers to end users.
The result is that city lookup generally only gives the city of the ISP and not where the visitor is visiting from. The ISP can be anywhere in the country and hundreds of miles from where the visitor is based. In other words, city lookup is useless except for giving the location of ISPs. This also means that lat/long is useless too, since it seems to be based on city lookup.
When IPv6 is rolled out and if, and only if, ISPs allocate a city to users when they purchase a fixed IP, then city lookup may become useful. But many ISPs still use dynamically allocated IPs, so it wouldn't work in that case either.
In short, the concept of providing city and/or lat/long of visitors is fundamentally flawed.
+1 for this, especially the Apache module detection routine. I get a few fatal errors in my logs because the plugin insists on loading the local files instead of getting the data from Apache.
I have the commercial db of Maxmind. You can use it if you want for developing the new plugin. Let me know how I can contact you.
See a bug report in the forum in the php script to update past visits: http://forum.piwik.org/read.php?2,71587,page=2#msg-71784
I'll take this on, in conjunction with the ipv6 ticket.
In the existing GeoIP plugin, there's a misc/.htaccess file. We don't want this in the new plugin. Access to geoipUpdateRows.php (or equivalent) should be guarded via token_auth.
Should I delete the .htaccess file in there?
Yes, you can remove the .htaccess file. After you've run it once, you shouldn't have to run it again.
$ php geoipUpdateRows.php
Fatal error: Call to undefined function _parse_ini_file() in /home/kiplingw/avaneya.com/piwik/core/Config.php on line 373
I also removed the .htaccess file.
This should be fixed in the updated .zip that I attached to #45. Here's the patch so you don't have to redownload the .zip:
Index: geoipUpdateRows.php
===================================================================
--- geoipUpdateRows.php (revision 51)
+++ geoipUpdateRows.php (working copy)
@@ -20,8 +20,8 @@
 . PATH_SEPARATOR . PIWIK_INCLUDE_PATH . '/libs'
 . PATH_SEPARATOR . PIWIK_INCLUDE_PATH . '/plugins');
+require_once PIWIK_INCLUDE_PATH . '/libs/upgradephp/upgrade.php';
 require_once PIWIK_INCLUDE_PATH . '/core/testMinimumPhpVersion.php';
-
 require_once PIWIK_INCLUDE_PATH . '/core/Loader.php';
 $GLOBALS['PIWIK_TRACKER_DEBUG'] = false;
Thanks. Applied. How can I test it?
$ php geoipUpdateRows.php
It finished execution (no output), and I noticed the UserCountry_ thing is still there in the stats. Should I just ignore that for now and assume new stats will not have that?
Thank you =)
To answer questions in the ticket:
Replying to matt:
- this means, that we don't store lat/long in log_visit
I'm thinking of keeping lat/long because:
I would like to propose:
- rolling the provider plugin into the geolocation plugin
- if the geolocation plugin can get the organization field, it populates location_provider
- otherwise, fallback to the gethostbyaddr() method
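The proposed fallback order can be sketched as follows. This is only an illustration under stated assumptions: `resolve_provider`, its `organization` parameter and the injectable `reverse_lookup` callable are hypothetical names, not part of Piwik, and the sketch is in Python rather than the plugin's PHP.

```python
import socket
from typing import Callable, Optional


def _reverse_dns(ip: str) -> str:
    # Standard-library reverse DNS lookup; returns the primary host name.
    return socket.gethostbyaddr(ip)[0]


def resolve_provider(
    ip: str,
    organization: Optional[str] = None,
    reverse_lookup: Callable[[str], str] = _reverse_dns,
) -> str:
    """Return a provider label for an IP address.

    `organization` stands for whatever the geolocation database returned
    (hypothetical parameter). When it is empty, fall back to reverse DNS.
    """
    if organization:
        return organization
    try:
        return reverse_lookup(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return "unknown"
```

Making the reverse lookup injectable also gives callers that find it too slow a simple way to swap in a no-op.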
The only thing is, please make sure the few "Provider" special cases are still working. In particular, VisitorGenerator & proxy-piwik.php disable the Provider lookup because it is too slow.
Ok. There are a couple of third party plugins (e.g., KSVisitorImport and TrackerSecondaryDb) that also disable the Provider plugin.
+1 vote for adding regions onto this as well. They are available in GeoLiteCity, so might as well use them. It would be great to include this into a regional map as well that the country map can drill down into.
Copy of my comment to #5465 (sorry, I used the wrong ticket; apparently, I knew there was one specifically for integration of GeoIP into core):
This new plugin sounds promising. But I hope you are going to also keep the old browser language/country detection, maybe named as such. I personally consider that language display equally important as the IP location display.
Consider the following scenario: I'm traveling around the world and have a travel blog. People accessing that blog are often people I have met on the trip, often still traveling. Now, when I look at my Piwik logs, the IP location (which I currently check manually) is surely interesting, but what tells me more about a visitor is actually their browser language. If you check the IP address I am writing this from, you will see that it is Malaysian. How much do I have to do with Malaysia? Nothing. My browser language is German (Germany), which tells more. And the combination of the two, IP location and browser country (i.e. the current detection), actually provides one more detail: the visitor is most likely a traveler or an expat. I can imagine websites that are interested in that marketing information.
You would not believe how many travelers roam the world these days. And I would say most of them use the often free WiFi (at their place of stay, bars and restaurants all over Southeast Asia) with their own devices: laptops, phones, tablets, etc. It seems to be the new way of travel, with people sticking their noses into displays half of their time, with most of that time on Facebook.
P.S.: Since there are countries with several languages (Belgium, etc.), but also countries with a common language (UK, US, etc.), maybe both the browser country and its language could be shown (if provided by the browser), in addition to the IP location provided by this plugin.
+1 on jawsmith's proposal of having a combined view of location against the visitor's preferred language.
As a Belgian developer I can tell you that this kind of information can be of crucial interest in a country like Belgium, but in many others too. For example, usage of the Spanish language in some regions of the US can be an important factor I think...
I imagine an ideal "Visitor countries" GeoIP plugin offering the current "Countries" split: clicking a country name would open a "Regions" list, and clicking a region name would open (the currently available) "Cities" list. Then an additional button could be fitted at the bottom, between the "Display simple table" and "Display a table with more metrics", that would "Display a table with languages". That table could have one additional column for each language that was detected...
Do we record Regions as well as Countries?
Do we record Cities?
As the new world map widget will be able to display data for regions and cities, it would be amazing if Piwik would be able to record the data for regions and cities :)
Do we record lat/long for each visitor, or do we assume that other systems (eg. the world map) will know where to plot a given City (and maintain their own database)?
Nope, the world map doesn't store locations for every city. Instead, it will be able to plot any given lat/long onto the map.
Here is a proposal for the API functions and returned data for the GeoIP integration in core:
Replying to vipsoft:
Replying to matt:
- this means, that we don't store lat/long in log_visit
I'm thinking of keeping lat/long because:
I am reluctant to include redundant information in the log_visit table.
At minimum, we should record in log_visit
The question remains if we need to store lat/long, depending how fast/easy it is to query lat/long from a given City using GeoIP (maybe this is not possible?)
Depends on what kind of database you're using. If you're using the CSV database and import it into MySQL tables, then you can run a query like
SELECT latitude,longitude FROM location WHERE city = 'Berlin'
in < 1ms. However, you will get ambiguous results when just looking for city names. Instead, a better idea would be to store the unique GeoIP location-id.
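The ambiguity of looking up by city name can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite table instead of MySQL, and the table layout, the `loc_id` column and the two sample rows (with approximate coordinates and placeholder region codes) are assumptions made for the example, not the actual GeoIP CSV schema.

```python
import sqlite3

# In-memory stand-in for an imported location table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (loc_id INTEGER PRIMARY KEY, country TEXT, "
    "region TEXT, city TEXT, latitude REAL, longitude REAL)"
)
# Illustrative rows only: two different cities that share the same name.
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "DE", "16", "Berlin", 52.52, 13.40),
        (2, "US", "NH", "Berlin", 44.47, -71.18),
    ],
)

# Looking up by name returns one row per city called "Berlin".
by_name = conn.execute(
    "SELECT latitude, longitude FROM location WHERE city = ?", ("Berlin",)
).fetchall()

# Looking up by the unique location id returns exactly one row.
by_id = conn.execute(
    "SELECT latitude, longitude FROM location WHERE loc_id = ?", (1,)
).fetchall()

print(len(by_name), len(by_id))
```

Storing the location id in the visit log would keep the row small while still allowing lat/long to be resolved unambiguously later, provided the same database version is available at query time.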
I don't know if any of the GeoIP APIs that work with the binary database (.dat) support reverse queries. All I saw was the IP --> location way.
Note: I propose to remove "Continent" and process this from the aggregated Country datatable in the Archiving function. It would be trivial/fast to process the Top continents.
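A minimal sketch of deriving continents from an already aggregated country table at archiving time is shown below. The mapping is a tiny hypothetical excerpt and `visits_by_continent` is an illustrative name, not an existing Piwik function.

```python
from collections import Counter

# Hypothetical excerpt of a country-code -> continent mapping.
COUNTRY_TO_CONTINENT = {
    "de": "Europe",
    "fr": "Europe",
    "us": "North America",
    "my": "Asia",
}


def visits_by_continent(visits_by_country):
    """Sum per-country visit counts into per-continent totals."""
    totals = Counter()
    for country, visits in visits_by_country.items():
        # Countries missing from the mapping are grouped under "Unknown".
        totals[COUNTRY_TO_CONTINENT.get(country, "Unknown")] += visits
    return dict(totals)
```

Because the input is the small aggregated table rather than the raw visit log, the cost is proportional to the number of countries, which is why this step would be cheap.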
+1, since 'classic' Continents are also quite useless for many scenarios. Often, people are more interested in political/economic regions, e.g. MENA
Replying to tlitody:
However, the IP issuing authority assigns the city of the ISP address to all the IP numbers. At least that is how it works in the UK. Things may vary in different countries and ISPs don't reallocate city when they sell dedicated IP numbers to end users.
In Germany you don't have this problem, since AOL doesn't exist in Germany anymore.
You can locate the city. In rural areas the difference between the real location and the indicated area can be 55 km... This is my experience.
(In ) Refs #2902
When Anonymize IP is enabled with only 1 byte removed, could we default the last byte to 1 so that we get at least an approximate User location? See also: #3023
In fact, in most cases that's the same level of accuracy as if you used all 4 bytes.
Sounds good, we will most likely do this then. This will limit user frustration significantly, since there have been many complaints that the "Provider" report does not work at all when IPs are anonymized (it would be even worse if GeoIP was broken!)
Reasonable assumption as long as the IP belongs to a class C address (or larger). It also depends on the quality of the geolocation data provider.
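The idea of dropping one byte and defaulting it to 1 can be sketched like this. It is an IPv4-only illustration using Python's standard `ipaddress` module; `anonymize_for_geolocation` is a hypothetical name and does not reflect how Piwik implements anonymization.

```python
import ipaddress


def anonymize_for_geolocation(ip: str) -> str:
    """Drop the last byte of an IPv4 address and set it to 1."""
    addr = ipaddress.IPv4Address(ip)
    masked = int(addr) & 0xFFFFFF00  # zero out the last byte
    return str(ipaddress.IPv4Address(masked | 1))


print(anonymize_for_geolocation("203.0.113.77"))  # prints 203.0.113.1
```

The resulting address stays inside the same /24 block, which is the assumption behind expecting a similar geolocation result.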
Thank you guys for your feedback
For some reason, the plugin behaves differently when called via the log import script and we get a fatal error.
PHP Fatal error: Cannot redeclare geoip_country_code_by_name() in /plugins/GeoIP/libs/geoip.inc on line 347
Checking if those functions have already been declared doesn't help as it seems the whole geoip.inc file shouldn't be called.
mod_geoIP in Apache
geoIP pecl extension in PHP
Interfasys: your php-cli has the geoip extension enabled which has the same api as the php library used by the GeoIP plugin (#45).
This conflict will be addressed by the new Geolocation plugin.
many users are discussing patches to the GeoIP files in: http://forum.piwik.org/read.php?2,71788
for each person posting in the forum there are probably 10 users having the same issue
it shows the very high interest of the community in having an integrated geoIP plugin in core :)
Btw, here's my patch of the GeoIP plugin (just GeoIP.php in this case). It enables the plugin to store region information, which is essential for the map widget I develop.
(In ) refs #1823 - commit geolocation adapters and plugin stubs
Thanks Anthon for the initial commit!!
There is quite some work left on this task:
If anyone is keen to help, please let me know ASAP!! :)
Thanks for all the hard work. Integration would confirm Piwik as a superior alternative to Google Analytics. I've posted this in the forum but will post here as well. While replacing the provider details with the organization details adds a ton of value to the reports, occasionally the listed organization will be the same as the ISP. This detracts value from the organization report and thus it would be nice to be able to filter out a list of ISPs using a single segment/parameter. The single segment/parameter would also allow for continual updating of the list.
I can't help on the coding side, but if there is any other way to help, please don't hesitate.
excellent news for the piwik community: we are going to work on GeoIP in core! Thanks Anthon for your initial commit :-)
I will post here the specs for the plugin.
Yeah, I have been waiting for this for a long time =)
Note: please ignore all comments above this. The following spec replaces previous propositions:
Here is a proposed specification for the new "geoip" functionality & the very useful feature of having more accurate visitor location information!
New Admin UI
The goal of the UI is to clearly report the status of GeoIP (Enabled / disabled / enabled but not working yet):
The GeoIP setting page would also show the GeoIP lookup for the Piwik super user looking at the page.
New reports & New APIs
Under Visitors > Location & Provider...
Also in Goals > Overview, and in each Goal > $goal_n report:
To process these new reports, there is going to be new archiving:
Other UI changes
List of Tests to check before release
On top of the "automatic" integration tests testing the API, here are some ideas of things to test to make sure everything still works as expected:
Note: not all of these need to be unit tests, but manually testing each at least once is very important...
Script that will enrich existing past data with GeoIP
For later / V2 and beyond
Not for a first release, but ideas backlog for the future:
End of Spec.
Let me know if there's any question or suggestion!
I've just posted a file which can be used for monthly updates via cron:
55 12 2 * * root /usr/local/bin/geoip.updater.sh
It could also be used for the initial DB setup.
Addition to the spec:
We probably need to write an FAQ on how to migrate from the old plugin to the new one.
Maybe I missed it, but we should consider province/state as well. In some countries city names are duplicated, which could make the study of GeoIP traffic inaccurate. For example: http://www.canada-city.ca/duplicate-cities.php
(In ) Refs #1823, modified UserCountry plugin to allow use of GeoIP databases if desired. Added two reports, getVisitsByRegion + getVisitsByCity.
(In ) Cleaning up misc/ directory to prepare for GeoIP files, Refs #1823
(In ) Refs #1823, use misc dir instead of files-geolocation for GeoIP db files.
(In ) Refs #1823, several changes & tweaks to GeoIP modifications:
(In ) Refs #1823, added latitude/longitude + other metadata to getCity reports, added country name metadata to getRegion & tweaked admin UI a bit.
The admin UI explaining the different implementations is great. But it doesn't tell the user how to set up new methods. Maybe it should be linked to a doc page where that's explained?
(In ) Refs #1823, added missing constant that stops tracker from working / breaks tracker if location code is unknown.
(In ) Refs #1823, reverted change of Visit Class- Constant was already defined in UserCountry. Only the reference to LocationProvider was wrong. Changed static Reference.
(In ) Refs #1823, fixed conversion tracking omission in initial GeoIP commit & added tests for conversion locations & unknown location. Removed 'Unknown' regions & cities from visitor log tooltips, added more detailed error messages for issues w/ the PECL module, add test w/ test IP and known result to isWorking methods and some more admin UI tweaks.
(In ) Refs #1823
(In ) Refs #1823
The new reports don't have any report documentation. Next to the country, region and city headlines, the question mark icon appears, but clicking it shows no documentation. For the continent report, the question mark icon doesn't appear at all, which is OK. IMO the reports should either be documented or the question mark should not be shown.
(In ) Refs #1823, test GeoIP w/ normal tracking, bulk tracking & log importing. Test if Apache module is working using GEOIP_ADDR server variable instead of GEOIP_COUNTRY_CODE since the latter may not always be set.
(In ) Refs #1823, fade 'Done' in & out after successfully switched location providers.
(In ) Refs #1823, many changes including:
(In ) Refs #1823, display informative note when Region + City reports have no location data, merge unknown rows in region & city reports and make sure latitude/longitude is rounded in API output.
(In ) Refs #1823 Adding important-to-have target=_blank
This bug is fixed. :) I created a ticket for improvements here: #3442
(In ) Refs #1823, add note to geoipUpdateRows.php that tells user to re-process their reports.
(In ) Refs #1823, add note to IP anonymization about geolocation accuracy.
(In ) Refs #1823, add alternative check for GEOIP_COUNTRY_CODE $_SERVER var to ServerBased GeoIP implementation.
(In ) Refs #1823, do broken check w/ both GEOIP_ADDR & GEOIP_COUNTRY_CODE.
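The check described in the last few commits can be sketched roughly as follows. The sketch assumes the server-based provider counts as working when either variable is present and non-empty; the function name is hypothetical and the real implementation is in PHP and reads `$_SERVER`.

```python
def apache_geoip_available(server_vars):
    """Return True if either GeoIP server variable is set and non-empty."""
    # GEOIP_COUNTRY_CODE may be missing for some requests, so GEOIP_ADDR
    # is checked as well (assumed "either one is enough" semantics).
    return any(
        server_vars.get(name) for name in ("GEOIP_ADDR", "GEOIP_COUNTRY_CODE")
    )
```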
(In ) Refs #1823, added redundant trusted hosts warning to general settings page & display help icon that links to faq in warning.
We need YOUR help! We are running a crowd funding campaign to raise funds to implement the detailed Visitors Maps of Countries, Regions and Cities (for all countries)!
These maps will be beautiful, usable, and built using open standards SVG+JS. They will show detailed visitor count, conversion rates, by Country but also (New!) by city and region.
Piwik needs you!