Serialize DeviceDetector and cache it using LazyCache #20224
Hi @oliverbestmann
Correct me if I am wrong, but the DeviceDetector cache only caches the regular expressions (as strings) that it reads from the YAML file. The code still compiles and runs the expressions using …
Yes, that's true. There is actually a plugin, "DeviceDetectorCache", which allows caching the most common user agents.
Yeah. The plugin reads the log file afaik, but sadly that's not an option for us.
FYI, I haven't read all the comments, but I just wanted to mention that we didn't go for such a solution back then because it can become a security issue. For example, an attacker could send requests with all kinds of different user agents and fill up disk space. Even with a TTL, Matomo currently won't remove the outdated files, and depending on server performance and available disk space, someone could still fill up the disk, maybe within an hour.
Okay. That wouldn't be a problem if you're using Redis as a cache and set a memory limit and an LRU eviction policy to handle memory pressure.
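To illustrate why a size limit plus LRU eviction addresses the "fill up the cache with random user agents" concern, here is a minimal in-process sketch in Python. It is only an analogy: `BoundedResultCache` and `max_entries` are hypothetical names, the limit is counted in entries rather than bytes, and Redis itself uses an approximated LRU rather than this exact bookkeeping.

```python
from collections import OrderedDict


class BoundedResultCache:
    """Hypothetical per-user-agent result cache with a hard entry limit."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # oldest (least recently used) first

    def get(self, user_agent):
        if user_agent not in self._entries:
            return None
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(user_agent)
        return self._entries[user_agent]

    def put(self, user_agent, result):
        self._entries[user_agent] = result
        self._entries.move_to_end(user_agent)
        # Evict least recently used entries once the limit is exceeded.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)


cache = BoundedResultCache(max_entries=3)
for i in range(1000):  # a flood of distinct user agents
    cache.put(f"bot-{i}", {"device": "unknown"})
print(len(cache))  # the cache never grows past max_entries
```

However many distinct user agents arrive, memory use stays bounded, at the cost of evicting entries that are rarely requested.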
Redis is something that most users won't be using, so always using the LazyCache to cache device detector results isn't something we should add. Personally, I would also move any kind of additional caching around that to the DeviceDetectorCache plugin. That one could, e.g., also have an option to cache the results dynamically instead of doing that by parsing log files...
I've now created a plugin with the functionality from this pull request:
awesome 💯
The issue is that the DeviceDetector constructor is really slow, as it reads and compiles a lot of regular expressions from a YAML file every time a new instance is created. This happens on almost every tracking request. There is already an in-memory cache, but as each tracking request starts with a new in-memory cache, it isn't really useful. The DeviceDetector itself has a cache, which prevents evaluating the regular expressions against a user agent string, but it looks like a lot of regular expressions still get compiled.
This change shaves off 50% of the time per tracking request, getting us from 400 req/s to 700 req/s, or from above 150 ms to below 100 ms. We're using Redis as a cache.
[Graph: response time for the tracker]
[Graph: number of requests per second for the tracker]
I guess it is clear that the change was applied at 16:50.
I don't know much PHP, so this solution is most certainly a dirty hack, but maybe someone can work with this and improve on it.
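The general idea, building the expensive object once, storing its serialized state in a cache shared across requests, and restoring it from there instead of re-running the constructor, can be sketched roughly as follows. This is an illustration in Python, not the PHP code from this pull request: `Detector`, `RULES_TEXT`, `SHARED_CACHE` and `get_detector` are hypothetical names, a plain dict stands in for Redis/LazyCache, and JSON stands in for PHP serialization.

```python
import json
import re

# Hypothetical rule file; the real library reads far larger YAML files.
RULES_TEXT = "Firefox: Firefox/\nChrome: Chrome/\n"


class Detector:
    parse_count = 0  # how often the expensive constructor has run

    def __init__(self, rules_text):
        Detector.parse_count += 1
        # Stand-in for the slow part: parsing every rule definition.
        self.rules = [line.split(": ", 1)
                      for line in rules_text.splitlines() if line]

    def serialize(self):
        return json.dumps(self.rules)

    @classmethod
    def from_serialized(cls, blob):
        detector = cls.__new__(cls)  # bypasses __init__ and its parsing
        detector.rules = json.loads(blob)
        return detector

    def detect(self, user_agent):
        for name, pattern in self.rules:
            if re.search(pattern, user_agent):
                return name
        return None


SHARED_CACHE = {}  # stand-in for a cache that outlives a single request


def get_detector(cache, key="detector"):
    blob = cache.get(key)
    if blob is not None:
        return Detector.from_serialized(blob)
    detector = Detector(RULES_TEXT)  # only on a cache miss
    cache[key] = detector.serialize()
    return detector


def handle_request(user_agent, cache=SHARED_CACHE):
    # Each request gets a fresh instance, as in the tracker.
    return get_detector(cache).detect(user_agent)
```

In this sketch only the first request pays for parsing; later requests only pay for deserialization, so the gain depends on deserializing being cheaper than parsing.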
A better solution would probably be:
… DeviceDetector from the detection result
But that would mean breaking changes in the DeviceDetector library, which were out of scope of my "hack".