Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize DeviceDetector and cache it using LazyCache #20224

Closed
wants to merge 1 commit into from

Conversation

oliverbestmann
Copy link

@oliverbestmann oliverbestmann commented Jan 17, 2023

The issue is that the constructor DeviceDetector is really really slow, as it is reading and compiling a lot of regular expressions from a yaml file every time a new instance is created. This is done on almost every tracking request. There is already an in-memory cache, but as each tracking requests starts with a new in-memory cache, this isn't really useful. The DeviceDetector itself has a cache, which prevents evaluating the regular expressions against a user agent string, but it looks like a lot of regular expressions still get compiled.

This shaves off 50% of time per tracking request, getting us from 400req/s to 700req/s or from above 150ms to below 100ms. We're using redis as a cache.

Response time for the tracker:
20230117-171159

Number of requests per second for the tracker:
20230117-171214

I guess it is clear that the change was applied at 16:50.

I don't know much php, so this solution is most certainly a dirty hack, but maybe someone can work with this and improve on the solution.

A better solution would probably be:

  • decouple the DeviceDetector from the detection result
  • cache only the detection result in redis

But that would mean breaking changes in the DeviceDetector library which were out of scope of my "hack".

Review

The issue is that the constructor DeviceDetector is really really slow, as it is reading and compiling a lot of regular expressions from a yaml file. This is done on almost every tracking request. There is already an in-memory cache, but as each tracking requests starts with a new in-memory cache, this isn't really useful. The DeviceDetector itself has a cache, which prevents evaluating the regular expressions against a user agent string, but it looks like a lot of regular expressions still get compiled.

This shaves off 50% of time per tracking request, getting us from 400req/s to 700req/s or from above 150ms to below 100ms. We're using redis as a cache.

I don't know much php, so this solution is most certainly a dirty hack, but maybe someone can work with this and improve on the solution.
@sgiehl
Copy link
Member

sgiehl commented Jan 17, 2023

Hi @oliverbestmann
Thanks for providing this PR. I'll have a closer look at that tomorrow. But normally Matomo should use the LazyCache as Cache for DeviceDetector. (See https://github.com/matomo-org/matomo/blob/4.x-dev/core/DeviceDetector/DeviceDetectorFactory.php#L66-L72 and https://github.com/matomo-org/matomo/blob/4.x-dev/core/DeviceDetector/DeviceDetectorCache.php#L30)
There though had been a bug in DeviceDetector so that some smaller yml files had been parsed every time if client hints were provided. See matomo-org/device-detector#7329
This fix was not yet included in a Matomo release, but it will be in the next patch release, which will most likely come up next week.

@oliverbestmann
Copy link
Author

Correct me if I am wrong, but the DeviceDetector cache only caches the regular expressions (as strings) that it reads from the yaml file. The code still compiles and runs the expressions using preg_* independently of any caching.

@sgiehl
Copy link
Member

sgiehl commented Jan 17, 2023

Yes. that's true. There is actually a plugin "DeviceDetectorCache", which allows caching the most common useragents.
Caching all results of parsed useragents / client hints combinations might be a bit too many data maybe...

@oliverbestmann
Copy link
Author

Yea. The plugin reads the log file afaik, but sadly that's not an option for us.
Regarding the amount of data: That's why I am putting the result into the cache with a short ttl. I am now using a ttl of an hour, but caching the results for only a few minutes gives about the same result. I guess that only works with the redis cache.
I can have a look on how the plugin stores the precomputed results and maybe adapt that approach into this pull request, if there is interest.

@tsteur
Copy link
Member

tsteur commented Jan 17, 2023

FYI I haven't read all the comments etc but just wanted to mention that we didn't go for such a solution back then as it can become a security issue for example. Like an attacker could send requests with all kind of different user agents and fill up disk space. Even with a TTL Matomo currently won't remove the outdated files and depending on server performance and available disk space someone could still fill up disk space maybe within an hour.

@oliverbestmann
Copy link
Author

Okay. That wouldn't be a problem if you're using redis as a cache and set a memory limit and some lru eviction policy in case of memory pressure.
I do see that this might be a problem with the disk based cache solution that's also available.

@sgiehl
Copy link
Member

sgiehl commented Jan 18, 2023

Redis is something that most users won't be using. So always using the LazyCache to cache device detector results isn't something we should add. Personally I would also move any kind of additional caching around that to the DeviceDetectorCache plugin. That one could e.g. also have an option to cache the results dynamically instead of doing that by parsing log files...

@sgiehl sgiehl closed this Jan 18, 2023
@oliverbestmann
Copy link
Author

I've now created a plugin with the functionality from this pull request:
https://plugins.matomo.org/DynamicDeviceDetectorCache

@sgiehl
Copy link
Member

sgiehl commented Jan 18, 2023

awesome 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants