@tsteur opened this Issue on June 28th 2019 Member

@sgiehl how hard would it be to have a cache for say the top 20 or top 30 most commonly used user agents? Like an array say

$cache = array(
 'Mac Foo' => array('isBot' => false, 'OS' => 'Mac', ... ), 
);

We could every other month update the list of most commonly used user agents. This would avoid needing to parse and test a lot of regexp etc. We could then just test like if (isset($cache[$userAgent])). I think I looked up a while ago that the top 10 or top 30 user agents make up like 50% of the requests which means it would make them a lot faster.

We're sometimes seeing device detector can take 200-400ms "randomly" and maybe it would improve things. I'd be keen to move this to 3.11 if it's easy to do and we could have a look at it. If it's a bit more work, happy to move it into a different milestone.

fyi @mattab

@Findus23 commented on June 28th 2019 Member

It might be worth it to take a large set of user agents and see what the most common ones are and especially which fraction they make up.
I feel like (without having looked it up) it might be really low as apart from iPhones which are fairly uniform there are too many combinations of browser patch versions, brands, etc. to make an impact with exact matches.

@tsteur commented on June 28th 2019 Member

image

Looking at only a few requests this can become quickly worth it I would say. Especially considering Chrome's marketshare. Would just need to keep it up to date regularly. The top 6 or top 7 be enough already.

@mattab commented on July 3rd 2019 Member

Another idea, could the device detector algorithm itself be "optimised" to run faster for the most common code paths and therefore most common user agents like Chrome on windows, safari on iphone, etc. This way we wouldn't have to maintain the list the user agents and just the code would always be faster for those? maybe that's already implemented in this way somehow? or maybe it's not really possible to make the algorithm faster in this way?

@sgiehl commented on July 3rd 2019 Member

Unfortunately that doesn't work that easy. The detection for Chrome for example needs to be done very "late", as many other browsers rely on Chromium and also contain Chrome in the useragents together with their own identifier. Without changing the current structure it won't be possible to improve the speed much, besides caching full useragents and matching rules

@tsteur commented on July 17th 2019 Member

We could btw also extract a list of say 500 or 1000 user agents... and update this list regularly to ideally not need to run any regex for most user agents. We could then update this list say monthly (especially once there are new chrome/iOS versions).

@tsteur commented on August 5th 2019 Member

Here's some rough thoughts

diff --git a/core/DeviceDetectorFactory.php b/core/DeviceDetectorFactory.php
index 2217dbe7c0..bfef8aeae2 100644
--- a/core/DeviceDetectorFactory.php
+++ b/core/DeviceDetectorFactory.php
@@ -11,10 +11,44 @@ namespace Piwik;
 use DeviceDetector\DeviceDetector;
 use Piwik\Common;

+class DeviceDetectorCache extends DeviceDetector {
+
+    private $cachedAgents = array();
+
+    public function __construct(){
+        // load this list from a file, eg in misc/useragents.php
+        // we need a command to generate / update this list from a list of user agents
+        // on cloud we have like say 2000 user agents...
+        // another option maybe one file per user agent misc/useragents/$(md5(userAgent)).php which we could use
+        // if the cache file gets big for 1000 user agents...
+        $this->cachedAgents = array(
+            'the user agent...' => array('bot' => null, 'client' => '', ''),
+            'another user agent...' => array('bot' => null, 'client' => '')
+        );
+    }
+
+    public function isCached()
+    {
+        return !empty($this->cachedAgents[$this->userAgent]);
+    }
+
+    public function parse()
+    {
+        $this->bot = $this->cachedAgents[$this->userAgent]['bot'];
+        $this->client = $this->cachedAgents[$this->userAgent]['client'];
+        $this->device = $this->cachedAgents[$this->userAgent]['device'];
+        $this->model = $this->cachedAgents[$this->userAgent]['model'];
+        $this->brand = $this->cachedAgents[$this->userAgent]['brand'];
+        $this->os = $this->cachedAgents[$this->userAgent]['os'];
+    }
+}
+
 class DeviceDetectorFactory
 {
     protected static $deviceDetectorInstances = array();

+    private static $cache = null;
+
     /**
      * Returns a Singleton instance of DeviceDetector for the given user agent
      * <a class='mention' href='https://github.com/param'>@param</a> string $userAgent
@@ -28,10 +62,19 @@ class DeviceDetectorFactory
             return self::$deviceDetectorInstances[$userAgent];
         }

+        $cache = new DeviceDetectorCache($userAgent);
+        if ($cache->isCached()) {
+            self::$deviceDetectorInstances[$userAgent] = $cache;
+            return $cache;
+        }
+
         $deviceDetector = new DeviceDetector($userAgent);
         $deviceDetector->discardBotInformation();
         $deviceDetector->setCache(new DeviceDetectorCache(86400));
         $deviceDetector->parse();
+        
+        // in the command ...
+        // $cached[$userAgent] = array('isBot' => $deviceDetector->isBot(), 'client' => $deviceDetector->getClient());

         self::$deviceDetectorInstances[$userAgent] = $deviceDetector;
@tsteur commented on August 7th 2019 Member

@katebutler just seeing for some different reason in core/DeviceDetectorCache.php in the constructor we are using the getEagerCache. As part of this ticket it be good to change it to getLazyCache() as otherwise it puts too much information into the eager cache.

This Issue was closed on August 25th 2019
Powered by GitHub Issue Mirror