Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Device detector cache #14585

Closed
tsteur opened this issue Jun 28, 2019 · 7 comments · Fixed by #14751
Closed

Device detector cache #14585

tsteur opened this issue Jun 28, 2019 · 7 comments · Fixed by #14751
Assignees
Labels
c: Performance For when we could improve the performance / speed of Matomo.
Milestone

Comments

@tsteur
Copy link
Member

tsteur commented Jun 28, 2019

@sgiehl how hard would it be to have a cache for say the top 20 or top 30 most commonly used user agents? Like an array say

$cache = array(
 'Mac Foo' => array('isBot' => false, 'OS' => 'Mac', ... ), 
);

We could every other month update the list of most commonly used user agents. This would avoid needing to parse and test a lot of regexp etc. We could then just test like if (isset($cache[$userAgent])). I think I looked up a while ago that the top 10 or top 30 user agents make up like 50% of the requests which means it would make them a lot faster.

We're sometimes seeing device detector can take 200-400ms "randomly" and maybe it would improve things. I'd be keen to move this to 3.11 if it's easy to do and we could have a look at it. If it's a bit more work, happy to move it into a different milestone.

fyi @mattab

@tsteur tsteur added Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. c: Performance For when we could improve the performance / speed of Matomo. and removed Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. labels Jun 28, 2019
@tsteur tsteur modified the milestones: 3.11.0, 3.12.0 Jun 28, 2019
@Findus23
Copy link
Member

It might be worth it to take a large set of user agents and see what the most common ones are and especially which fraction they make up.
I feel like (without having looked it up) it might be really low as apart from iPhones which are fairly uniform there are too many combinations of browser patch versions, brands, etc. to make an impact with exact matches.

@tsteur
Copy link
Member Author

tsteur commented Jun 28, 2019

image

Looking at only a few requests this can become quickly worth it I would say. Especially considering Chrome's marketshare. Would just need to keep it up to date regularly. The top 6 or top 7 be enough already.

@mattab
Copy link
Member

mattab commented Jul 3, 2019

Another idea, could the device detector algorithm itself be "optimised" to run faster for the most common code paths and therefore most common user agents like Chrome on windows, safari on iphone, etc. This way we wouldn't have to maintain the list the user agents and just the code would always be faster for those? maybe that's already implemented in this way somehow? or maybe it's not really possible to make the algorithm faster in this way?

@sgiehl
Copy link
Member

sgiehl commented Jul 3, 2019

Unfortunately that doesn't work that easy. The detection for Chrome for example needs to be done very "late", as many other browsers rely on Chromium and also contain Chrome in the useragents together with their own identifier. Without changing the current structure it won't be possible to improve the speed much, besides caching full useragents and matching rules

@tsteur
Copy link
Member Author

tsteur commented Jul 17, 2019

We could btw also extract a list of say 500 or 1000 user agents... and update this list regularly to ideally not need to run any regex for most user agents. We could then update this list say monthly (especially once there are new chrome/iOS versions).

@tsteur tsteur modified the milestones: 3.13.0, 3.12.0 Jul 29, 2019
@tsteur
Copy link
Member Author

tsteur commented Aug 5, 2019

Here's some rough thoughts

diff --git a/core/DeviceDetectorFactory.php b/core/DeviceDetectorFactory.php
index 2217dbe7c0..bfef8aeae2 100644
--- a/core/DeviceDetectorFactory.php
+++ b/core/DeviceDetectorFactory.php
@@ -11,10 +11,44 @@ namespace Piwik;
 use DeviceDetector\DeviceDetector;
 use Piwik\Common;
 
+class DeviceDetectorCache extends DeviceDetector {
+
+    private $cachedAgents = array();
+
+    public function __construct(){
+        // load this list from a file, eg in misc/useragents.php
+        // we need a command to generate / update this list from a list of user agents
+        // on cloud we have like say 2000 user agents...
+        // another option maybe one file per user agent misc/useragents/$(md5(userAgent)).php which we could use
+        // if the cache file gets big for 1000 user agents...
+        $this->cachedAgents = array(
+            'the user agent...' => array('bot' => null, 'client' => '', ''),
+            'another user agent...' => array('bot' => null, 'client' => '')
+        );
+    }
+
+    public function isCached()
+    {
+        return !empty($this->cachedAgents[$this->userAgent]);
+    }
+
+    public function parse()
+    {
+        $this->bot = $this->cachedAgents[$this->userAgent]['bot'];
+        $this->client = $this->cachedAgents[$this->userAgent]['client'];
+        $this->device = $this->cachedAgents[$this->userAgent]['device'];
+        $this->model = $this->cachedAgents[$this->userAgent]['model'];
+        $this->brand = $this->cachedAgents[$this->userAgent]['brand'];
+        $this->os = $this->cachedAgents[$this->userAgent]['os'];
+    }
+}
+
 class DeviceDetectorFactory
 {
     protected static $deviceDetectorInstances = array();
 
+    private static $cache = null;
+
     /**
      * Returns a Singleton instance of DeviceDetector for the given user agent
      * @param string $userAgent
@@ -28,10 +62,19 @@ class DeviceDetectorFactory
             return self::$deviceDetectorInstances[$userAgent];
         }
 
+        $cache = new DeviceDetectorCache($userAgent);
+        if ($cache->isCached()) {
+            self::$deviceDetectorInstances[$userAgent] = $cache;
+            return $cache;
+        }
+
         $deviceDetector = new DeviceDetector($userAgent);
         $deviceDetector->discardBotInformation();
         $deviceDetector->setCache(new DeviceDetectorCache(86400));
         $deviceDetector->parse();
+        
+        // in the command ...
+        // $cached[$userAgent] = array('isBot' => $deviceDetector->isBot(), 'client' => $deviceDetector->getClient());
 
         self::$deviceDetectorInstances[$userAgent] = $deviceDetector;
 

@katebutler katebutler self-assigned this Aug 5, 2019
@tsteur
Copy link
Member Author

tsteur commented Aug 7, 2019

@katebutler just seeing for some different reason in core/DeviceDetectorCache.php in the constructor we are using the getEagerCache. As part of this ticket it be good to change it to getLazyCache() as otherwise it puts too much information into the eager cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants