Geoip2 geolocation lookup is slow #12955

Closed
mattab opened this issue May 22, 2018 · 33 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.

@mattab
Member

mattab commented May 22, 2018

We got a report which says: "It seems that it takes 0.08 seconds per request to look up the GeoLite2-City.mmdb file, which is about 1.589 days of lookups based on the queue size in my earlier message."

This is for a customer using QueuedTracking who has 1.7 million requests in the queue and the queue doesn't process, likely because the Geoip2 lookup is too slow.
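
As a rough sanity check of the figure in the report, the estimate can be reproduced from the two numbers given here. This is only a sketch: it assumes "1.7 million" is an exact count and that every queued request needs exactly one 0.08 s lookup. Under those assumptions it works out to about 1.57 days, so the reported 1.589 days implies a slightly larger queue.

```php
<?php
// Back-of-the-envelope estimate; both inputs come from the report quoted above.
$secondsPerLookup = 0.08;     // reported time per GeoLite2-City lookup
$queuedRequests   = 1700000;  // assumption: "1.7 million" taken as an exact count

$totalSeconds = $secondsPerLookup * $queuedRequests;
$totalDays    = $totalSeconds / 86400; // 86,400 seconds per day

printf("%.0f seconds = %.2f days\n", $totalSeconds, $totalDays);
```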

-> we need to investigate/profile the speed of Geoip2 lookup and figure out how long it takes and whether it regressed from Geoip1 in terms of performance.

Can we improve the performance in some ways? cc @sgiehl @diosmosis

@mattab mattab added c: Performance For when we could improve the performance / speed of Matomo. Regression Indicates a feature used to work in a certain way but it no longer does even though it should. labels May 22, 2018
@mattab mattab added this to the 3.5.1 milestone May 22, 2018
@sgiehl
Member

sgiehl commented May 22, 2018

Maxmind provides a PHP extension that should improve performance: https://github.com/maxmind/MaxMind-DB-Reader-php/#optional-php-c-extension

@sgiehl
Member

sgiehl commented May 22, 2018

The reason why the lookup is "slow" is quite simple to explain. Each tracking request is handled in a separate process, which opens the mmdb file for lookup. Opening the mmdb and reading the metadata takes a while. The only way to speed up the lookup would be to handle everything in one process, so the mmdb only needs to be opened up once.

@diosmosis
Member

Does the PHP extension also create a new process?

@sgiehl
Member

sgiehl commented May 22, 2018

Good question. Didn't have a closer look at the C code, but I don't think it would make sense otherwise: https://github.com/maxmind/MaxMind-DB-Reader-php/blob/master/ext/maxminddb.c

@diosmosis
Member

Hard to tell from the C code, guess a benchmark + more information from the user's setup are the next steps.

@fdellwing
Contributor

I'm not a C expert, but looking at the README, shouldn't it be possible to create one global reader and query that reader every time?

global

use MaxMind\Db\Reader;

$reader = new Reader('GeoIP2-City.mmdb');

query

$foo = $reader->get($ipAddress);

Or is this not possible because a pre-instantiated reader cannot be accessed from the piwik.php calls?
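
For illustration, a minimal sketch of the "one reader per process" idea. It only helps when many lookups run inside a single PHP process (for example a queue worker); a plain web request still starts from scratch each time. `sharedGeoReader` and the stand-in factory are hypothetical names, not Matomo or MaxMind APIs. In real use the factory would construct the Reader shown above.

```php
<?php
/**
 * Hypothetical helper (not Matomo code): returns one shared reader per PHP
 * process, creating it on first use via the given factory.
 */
function sharedGeoReader(callable $factory)
{
    static $reader = null;
    if ($reader === null) {
        $reader = $factory(); // the expensive open happens only once per process
    }
    return $reader;
}

// Stand-in factory so the sketch runs without the MaxMind library.
// Assumption: real code would return new MaxMind\Db\Reader('GeoIP2-City.mmdb').
$opens = 0;
$factory = function () use (&$opens) {
    $opens++;
    return new ArrayObject(['203.0.113.7' => 'example record']);
};

for ($i = 0; $i < 1000; $i++) {
    $record = sharedGeoReader($factory)['203.0.113.7'];
}
echo "Database opened $opens time(s) for 1000 lookups\n";
```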

@mattab
Member Author

mattab commented May 23, 2018

"The only way to speed up the lookup would be to handle everything in one process, so the mmdb only needs to be opened up once."

@sgiehl @diosmosis How feasible is it to make this change and improve performance when QueuedTracking is used?

@diosmosis
Member

@mattab Do you know if they're using the PHP reader now? Do you know what they were using w/ GeoIP v1?

@mattab
Member Author

mattab commented May 23, 2018

They're using the default code we provide (no PHP extension) @diosmosis

@sgiehl
Member

sgiehl commented May 23, 2018

@mattab one reason why GeoIp2 might be slower: GeoIP Legacy did not include any IPv6 data. GeoIp2 includes all IPv6 data. That's a lot more data...

@mattab
Member Author

mattab commented May 23, 2018

Could we maybe profile the code to get to the bottom of the issue, understand exactly where the slowness comes from, and see whether it can be improved?

@mattab mattab modified the milestones: 3.5.1, 3.6.0 May 23, 2018
@sgiehl
Member

sgiehl commented May 23, 2018

I just ran a simple benchmark by running 40,000 dynamic IP lookups within my virtual machine.

With the default PHP library, without the extension:
Requests per second: 67.4305675507 (0.01483 s / req)

With the extension installed:
Requests per second: 17707.655926848 (0.000056 s / req)

So everyone who wants to run fast lookups should install the extension. Nothing we could change in our own code would come close to the speedup the extension provides.
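
To see which of the two cases applies on a given server, a quick check along these lines may help. It is a sketch under one assumption: that the C extension registers itself under the name `maxminddb`, as its README describes. The messages are illustrative only.

```php
<?php
// Assumption: the MaxMind C extension is registered as "maxminddb".
$hasExtension = extension_loaded('maxminddb');

echo $hasExtension
    ? "maxminddb extension loaded: lookups should use the C reader\n"
    : "maxminddb extension not loaded: lookups use the pure-PHP reader\n";
```

Run it with the same PHP binary and configuration that serves tracking requests, since CLI and web server setups can load different extensions.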

@mattab
Member Author

mattab commented May 23, 2018

Thanks @sgiehl - very useful...
Since customer is waiting for our instructions, could you please propose doc for our FAQs about geolocation so they match the Geoip2 tool? Maybe this FAQ should be updated to include these instructions? https://matomo.org/faq/how-to/faq_164/

@sgiehl
Member

sgiehl commented May 23, 2018

@mattab I have updated the FAQ to describe how to install the extension. Maybe you could have a look and maybe check if it's easy enough to understand...

@tsteur
Member

tsteur commented May 23, 2018

Could the FAQ entry also cover how to get notified when an update of the extension is available, and that it needs to be recompiled when the PHP version changes, etc.? That would be great, as it is important to keep the extension up to date for bug fixes and to avoid random issues.

@tsteur
Member

tsteur commented May 23, 2018

Also I suggest writing a blog post about this, and maybe we mention it in the next newsletter in the "did you know" section, as the performance drop can cause major problems for many Matomo installations. The release changelog should mention it too.

@tsteur
Member

tsteur commented May 27, 2018

Should we create a separate issue for the blog post and the newsletter entry?

@mattab
Member Author

mattab commented May 28, 2018

"I just ran a simple benchmark by running 40,000 dynamic IP lookups within my virtual machine."

@sgiehl Could you please paste the benchmark script you ran? We'd like to run it in production on the powerful box and see how it behaves there.

Also, can you compare Geoip1 lookup VS Geoip2 (no extension) on your virtual machine?

@sgiehl
Member

sgiehl commented May 28, 2018

<?php

require_once './vendor/autoload.php';

use GeoIp2\Database\Reader;
use GeoIp2\Exception\AddressNotFoundException;

$reader = new Reader('misc/GeoLite2-City.mmdb');
$count = 40000;
$startTime = microtime(true);
for ($i = 0; $i < $count; $i++) {
    $ip = long2ip(rand(0, pow(2, 32) - 1));
    try {
        $t = $reader->city($ip);
    } catch (AddressNotFoundException $e) {
        // Random IPs are often missing from the database; ignore them.
    }
    if ($i % 1000 === 0) {
        echo $i . ' ' . $ip . "\n";
    }
}
$endTime = microtime(true);

$duration = $endTime - $startTime;
echo 'Requests per second: ' . $count / $duration . "\n";

This should be runnable directly from the Matomo home directory.

@tsteur
Member

tsteur commented May 28, 2018

will this do the "heavy" work each time? or is there anything cached?

@sgiehl
Member

sgiehl commented May 28, 2018

It only opens the GeoIP database once, so it only measures the time needed for each lookup. During tracking, the database might get opened for each request (if not using QueuedTracking), so that comes on top. But maybe that's not the case when using the extension.

Guess that can be easily tested by moving the $reader = ... line into the for loop...
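
A possible way to run both variants from one script is sketched below. `benchmarkLookups` is a hypothetical helper, not part of Matomo or the MaxMind library, and the open/lookup callables here are stubs so the sketch runs on its own. It assumes 64-bit PHP, like the original script.

```php
<?php
/**
 * Hypothetical benchmark helper: times $count lookups, optionally re-opening
 * the database inside the loop so the open cost is paid on every iteration.
 * Returns requests per second.
 */
function benchmarkLookups(callable $open, callable $lookup, int $count, bool $reopenEachTime): float
{
    // When opening once, do it before the timer starts, as the original script does.
    $db = $reopenEachTime ? null : $open();
    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        if ($reopenEachTime) {
            $db = $open(); // mimics a tracking request that opens the file itself
        }
        $lookup($db, long2ip(mt_rand(0, 0xFFFFFFFF)));
    }
    $elapsed = microtime(true) - $start;
    return $count / max($elapsed, 1e-9);
}

// Stubs for illustration only; they do no real geolocation work.
$openCalls = 0;
$open = function () use (&$openCalls) {
    $openCalls++;
    return ['opened' => true];
};
$lookup = function ($db, string $ip) {
    return null;
};

$once   = benchmarkLookups($open, $lookup, 1000, false);
$reopen = benchmarkLookups($open, $lookup, 1000, true);
printf("open once: %.0f req/s, reopen each time: %.0f req/s\n", $once, $reopen);
```

To measure the real thing, $open would create the Reader from the script above and $lookup would call city() inside the same try/catch.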

@tsteur
Member

tsteur commented May 28, 2018

We are basically trying to figure out how GeoIP1 compares with GeoIP2, to know whether we need the extension or not. Ideally we wouldn't need the extension.

@sgiehl
Member

sgiehl commented May 30, 2018

Here's the same script to benchmark geoip legacy with city database:

<?php

require_once './libs/MaxMindGeoIP/geoipcity.inc';

$geoip = geoip_open('./misc/GeoIPCity.dat', GEOIP_STANDARD);

$count = 40000;
$startTime = microtime(true);
for ($i = 0; $i < $count; $i++) {
    $ip = long2ip(rand(0, pow(2, 32) - 1));
    $t = geoip_record_by_addr($geoip, $ip);
    if ($i % 1000 === 0) {
        echo $i . ' ' . $ip . "\n";
    }
}
$endTime = microtime(true);

$duration = $endTime - $startTime;
echo 'Requests per second: ' . $count / $duration . "\n";

For my local instance that results in:

Requests per second: 1710.4010527379 (0.000584658 s / req)

So compared with GeoIP Legacy, GeoIP2 is much faster with the extension but a lot slower without it.

@tsteur
Member

tsteur commented May 31, 2018

GeoIp2 without extension #12955 (comment) took

Requests per second: 1991 (0.5ms)

GeoIp1 from #12955 (comment):

Requests per second: 36,000 - 42,000 (0.025ms/ req)

I have also tested the script slightly tweaked to open the DB each time in the for loop...

  • GeoIP1: 27500 requests per second (about 0.035ms/req)
  • GeoIP2: 950 requests per second only (about 1ms/req)

So it seems to be roughly 1ms slower per lookup, which quickly adds about 2-3% to each tracking request (give or take).
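
The arithmetic behind that estimate can be sketched as follows. The throughput figures are the ones from the bullet list above. The total tracking request times of 35 ms and 50 ms are assumptions chosen for illustration, not measurements from this thread.

```php
<?php
// Throughput with the database opened inside the loop, as quoted above.
$requestsPerSecond = ['GeoIP1' => 27500, 'GeoIP2' => 950];

$msPerRequest = [];
foreach ($requestsPerSecond as $name => $rps) {
    $msPerRequest[$name] = 1000 / $rps; // milliseconds per lookup
    printf("%s: %.3f ms per request\n", $name, $msPerRequest[$name]);
}

$extraMs = $msPerRequest['GeoIP2'] - $msPerRequest['GeoIP1'];
printf("Extra cost per lookup: %.2f ms\n", $extraMs);

// Assumption (not from the thread): total duration of one tracking request.
foreach ([35, 50] as $trackingMs) {
    printf("%d ms request: +%.1f%%\n", $trackingMs, 100 * $extraMs / $trackingMs);
}
```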

@tsteur
Member

tsteur commented May 31, 2018

and fyi I tried to install the extension as described on https://matomo.org/faq/how-to/faq_164/ through git, but there is a cd libmaxminddb missing and ./configure doesn't work.

@diosmosis
Member

diosmosis commented May 31, 2018

You have to run ./bootstrap (it's in the git repo README). EDIT: ./bootstrap then ./configure .

@sgiehl
Member

sgiehl commented May 31, 2018

I've improved the FAQ and mentioned the additional commands needed when cloning from git.

@tsteur
Member

tsteur commented May 31, 2018

What about the blog post announcement?

@tsteur tsteur closed this as completed May 31, 2018
@tsteur tsteur reopened this May 31, 2018
@tsteur
Member

tsteur commented May 31, 2018

I will probably draft something otherwise...

@sgiehl
Member

sgiehl commented Jun 18, 2018

A blog post has been published. Should we maybe add a more visible note in the description of the provider that it is "slow" without the extension?
Otherwise I guess we can close this issue, as there is imho not much more we could do about it.

@tsteur
Member

tsteur commented Jun 18, 2018

I think a notice is not needed, actually.

@sgiehl sgiehl closed this as completed Jun 25, 2018
@mattab
Member Author

mattab commented Aug 28, 2018

5 participants