Geoip2 geolocation lookup is slow #12955
Comments
Maxmind provides a PHP extension that should improve performance: https://github.com/maxmind/MaxMind-DB-Reader-php/#optional-php-c-extension |
The reason why the lookup is "slow" is quite simple to explain. Each tracking request is handled in a separate process, which opens the mmdb file for lookup. Opening the mmdb and reading the metadata takes a while. The only way to speed up the lookup would be to handle everything in one process, so the mmdb only needs to be opened up once. |
Does the PHP extension also create a new process? |
Good question. Didn't have a closer look at the C code, but I don't think it would make sense otherwise: https://github.com/maxmind/MaxMind-DB-Reader-php/blob/master/ext/maxminddb.c |
Hard to tell from the C code, guess a benchmark + more information from the user's setup are the next steps. |
I'm not a C pro, but judging by the readme it should be possible to provide one global reader and query that reader every time?
Global:
use MaxMind\Db\Reader;
$reader = new Reader('GeoIP2-City.mmdb');
Query:
$foo = $reader->get($ipAddress);
Or are you not able to do this because you cannot access a pre-instantiated reader in the piwik.php calls? |
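The reuse idea raised in this comment can be sketched roughly as follows. This is a minimal sketch, not Matomo's actual code: `ReaderCache` and the stub factory are hypothetical names, and in a real setup the factory would presumably return `new MaxMind\Db\Reader($path)`. It only helps when several lookups run inside the same PHP process (for example a queue worker); separate processes would each still open the file once.

```php
<?php
// Hypothetical helper: keeps one reader per database path for the lifetime
// of the current PHP process, so the cost of opening the file is paid once.
final class ReaderCache
{
    /** @var array<string, object> */
    private static $readers = [];

    public static function get(string $path, callable $factory)
    {
        if (!isset(self::$readers[$path])) {
            // First request for this path: open the database.
            self::$readers[$path] = $factory($path);
        }
        return self::$readers[$path];
    }
}

// Stub factory (assumption): stands in for `new MaxMind\Db\Reader($path)`
// and counts how often a "database" is opened.
$opened = 0;
$factory = function (string $path) use (&$opened) {
    $opened++;
    return new ArrayObject(['path' => $path]);
};

$first = ReaderCache::get('GeoIP2-City.mmdb', $factory);
$second = ReaderCache::get('GeoIP2-City.mmdb', $factory);

echo "Opened: $opened\n";
```

Both calls return the same object, and the factory runs only once.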
@sgiehl @diosmosis How do-able is it to make this change and improve performance when QueuedTracking is used? |
Looks like it's already cached: https://github.com/matomo-org/matomo/blob/3.x-dev/plugins/GeoIp2/LocationProvider/GeoIp2/Php.php#L315 |
@mattab Do you know if they're using the PHP reader now? Do you know what they were using w/ GeoIP v1? |
They're using the default code we provide (no PHP extension) @diosmosis |
@mattab one reason why GeoIp2 might be slower: GeoIP Legacy did not include any IPv6 data. GeoIp2 includes all IPv6 data. That's a lot more data... |
Could we maybe profile the code and get to the bottom of the issue / understand exactly the slowness and if it can be improved? |
I just ran a simple benchmark by running 40,000 dynamic IP lookups within my virtual machine. With the default PHP library without the extension we have: With the extension installed: So everyone who wants to run fast lookups should install the extension. Nothing we could change on our side would give a speed improvement comparable to the extension. |
Thanks @sgiehl - very useful... |
@mattab I have updated the FAQ to describe how to install the extension. Maybe you could have a look and maybe check if it's easy enough to understand... |
Could the FAQ entry also cover how to get notified when an update of the extension is available, and that the extension needs to be recompiled when the PHP version changes? If possible, that would be great, as it is important to keep the extension up to date with bug fixes and to avoid random issues. |
I also suggest writing a blog post about this, and maybe mentioning it in the "did you know" section of the next newsletter, as the performance drop can cause major problems for many Matomo installations. The release changelog should mention it as well. |
Should we create a separate issue for the blog post and the newsletter entry? |
@sgiehl Could you please paste the benchmark script you ran? we'd like to run it in production on the powerful box and see how it behaves there. Also, can you compare Geoip1 lookup VS Geoip2 (no extension) on your virtual machine? |
<?php
// Benchmark: GeoIP2 city lookups for random IPv4 addresses.
require_once './vendor/autoload.php';

use GeoIp2\Database\Reader;
use GeoIp2\Exception\AddressNotFoundException;

// Open the database once, outside the timed loop.
$reader = new Reader('misc/GeoLite2-City.mmdb');
$count = 40000;
$startTime = microtime(true);
for ($i = 0; $i < $count; $i++) {
    $ip = long2ip(rand(0, pow(2, 32) - 1));
    try {
        $t = $reader->city($ip);
    } catch (AddressNotFoundException $e) {
        // Not every random address is in the database; ignore misses.
    }
    if ($i % 1000 === 0) {
        echo $i . ' ' . $ip . "\n"; // progress output
    }
}
$endTime = microtime(true);
$duration = $endTime - $startTime;
echo 'Requests per second: ' . $count / $duration . "\n";
This should be runnable directly in the Matomo home dir. |
Will this do the "heavy" work each time, or is anything cached? |
It only opens the GeoIP database once, so it only measures the time needed for each lookup. While tracking, the database might get opened for each request (if not using queued tracking), so that comes on top. But maybe that's not the case when using the extension. Guess that can be easily tested when moving the |
We are basically trying to figure out how GeoIP1 compares with GeoIP2, to know whether we need the extension or not. Ideally we wouldn't need the extension. |
Here's the same script to benchmark GeoIP Legacy with the city database:
<?php
require_once './libs/MaxMindGeoIP/geoipcity.inc';

// Open the legacy database once, outside the timed loop.
$geoip = geoip_open('./misc/GeoIPCity.dat', GEOIP_STANDARD);
$count = 40000;
$startTime = microtime(true);
for ($i = 0; $i < $count; $i++) {
    $ip = long2ip(rand(0, pow(2, 32) - 1));
    $t = geoip_record_by_addr($geoip, $ip);
    if ($i % 1000 === 0) {
        echo $i . ' ' . $ip . "\n"; // progress output
    }
}
$endTime = microtime(true);
$duration = $endTime - $startTime;
echo 'Requests per second: ' . $count / $duration . "\n";
On my local instance that results in: Requests per second: 1710.4010527379 (0.000584658 s / req). So by comparison, GeoIP2 is much faster with the extension, but a lot slower without it. |
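For reference, the per-request time quoted in parentheses is simply the reciprocal of the reported throughput. A small sketch of that conversion, using only the number from the comment above (the variable names are illustrative):

```php
<?php
// Throughput reported by the GeoIP Legacy benchmark above.
$requestsPerSecond = 1710.4010527379;

// Average time per lookup is the reciprocal of the throughput.
$secondsPerRequest = 1 / $requestsPerSecond;

printf("%.9f s / req (%.3f ms)\n", $secondsPerRequest, $secondsPerRequest * 1000);
```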
GeoIp2 without extension #12955 (comment) took
GeoIp1 from #12955 (comment):
I have also tested the script slightly tweaked to open the DB each time in the for loop...
So it seems to be roughly 1 ms slower, which quickly adds 2-3% to each tracking request (give or take). |
and fyi I tried to install the extension as described on https://matomo.org/faq/how-to/faq_164/ through git, but there is a |
You have to run |
I've improved the FAQ and mentioned the additional commands needed when cloning from git. |
What about the blog post announcement? |
I will probably draft something otherwise... |
A blog post has been published. Should we maybe add a more visible note in the description of the provider that it is "slow" without the extension? |
I think a notice is not actually needed. |
We got a report which says: " it seems that it takes 0.08 seconds per request for it to lookup the GeoLite2-City.mmdb file, which is about 1.589 days of lookups based on the queue size in my earlier message."
This is for a customer using QueuedTracking who has 1.7 million requests in the queue and the queue doesn't process, likely because the Geoip2 lookup is too slow.
-> we need to investigate/profile the speed of Geoip2 lookup and figure out how long it takes and whether it regressed from Geoip1 in terms of performance.
Can we improve the performance in some ways? cc @sgiehl @diosmosis
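As a rough sanity check of the reported numbers, the time needed to drain the queue can be estimated from the per-request lookup time. The sketch below uses the rounded figures quoted above (0.08 s per request, 1.7 million queued requests) and assumes the lookups run strictly one after another in a single worker, so it only approximates the 1.589 days mentioned in the report.

```php
<?php
// Figures quoted in the report (rounded).
$secondsPerRequest = 0.08;
$queuedRequests = 1700000;

// Assumption: one worker processes the queue sequentially and the GeoIP
// lookup dominates the time spent per request.
$totalSeconds = $secondsPerRequest * $queuedRequests;
$days = $totalSeconds / 86400; // 86400 seconds per day

printf("%.0f s = %.2f days\n", $totalSeconds, $days);
```

With these inputs the estimate comes out at a bit over a day and a half, in line with the report.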