Now that #17497 has been done, there is another issue: most bots, especially the important Google and Bing bots, fail to call the tracker PHP endpoints (piwik.php), while robots.txt and headers generally do allow it and some bots are successfully tracked.
Matomo, PHP and the webserver do not report any errors, so the request seems to fail right at the crawler, possibly because of an overly long query string or invalid characters? Example URL shown by the Google mobile-friendly test:
Expected behaviour: All bots that crawl the website are tracked and visible in Matomo.
Current behaviour: Many bots fail to call the tracker PHP endpoints and hence do not appear in Matomo.
Possible solution: None found so far.
The behaviour of search engine crawlers, which pages they crawl, how often etc, can be important to detect issues, optimise crawler settings/schedules, etc to balance server load and optimise SEO.
API, Actions, Annotations, BotTracker 2.01, BulkTracking, Contents, CoreAdminHome, CoreConsole, CoreHome, CorePluginsAdmin, CoreUpdater, CoreVisualizations, CustomJsTracker, DBStats, DarkTheme 1.1.6, Dashboard, DevicePlugins, DevicesDetection, Diagnostics, Goals, Heartbeat, ImageGraph, Insights, Installation, Intl, LanguagesManager, Live, LogViewer 4.0.1, Login, Marketplace, Monolog, Morpheus, Overlay, PagePerformance, PrivacyManager, Proxy, Referrers, Resolution, SEO, SegmentEditor, SitesManager, Transitions, UserLanguage, UsersManager, VisitFrequency, VisitTime, VisitorInterest, VisitsSummary, WebsiteMeasurable
I don't use Drupal. So while it looks like it's related to Matomo v4, we are not 100% certain. As you have access logs, can you verify that the matomo.js request is done by the Google bot, but the expected following matomo.php request is not done at all (does not reach the server)?
@sgiehl I agree, but since it is an error, it is not a good thing from an SEO point of view.
And I put in an img tag:
<noscript><p><img src="https://mymatomodomain.com/matomo.php?idsite=1&rec=1&bots=1" style="border:0;" alt="" /></p></noscript>
And then no bot was tracked.
piwik.php request with fully developed query string (see above) would not be done in the first place. So that is not the issue.
Okay, I see the possibility of tracking via image source, although wrapping it into
Does a request from Google bot to the PHP endpoint reach your server? I just want to make sure the requests are actually performed by the bots. If they are listed in your access log, then it might be an issue with Matomo or maybe the BotTracker plugin. To me it sounds like Google does not perform the request for some reason. If that is the case, you could try to play around with some settings that change the request, like disabling sendBeacon or maybe forcing a POST request.
@MichaIng I went through my logs and found an example of Googlebot requesting matomo.js but not matomo.php:
6126.96.36.199 mymatomodomain.com - [16/May/2021:18:46:21 +0200] "GET /matomo.js HTTP/1.1" 200 20456 "https://mymatomodomain.com/" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/90.0.4430.97 Safari/537.36"
EDIT: I didn't find any request for matomo.php from Googlebot
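The log check described above can be scripted. This is a minimal sketch, assuming each access log line contains the quoted request ("GET /path HTTP/1.1") and the user agent on the same line, as in the sample above. The function name summarizeBotRequests and the sample lines (with documentation-range IPs) are hypothetical.

```javascript
// Hypothetical helper: counts matomo.js and matomo.php requests made by one bot.
// Assumes the request is quoted as "GET /path HTTP/1.1" and that the
// user agent string appears somewhere on the same log line.
function summarizeBotRequests(logLines, botName) {
  const counts = { 'matomo.js': 0, 'matomo.php': 0 };
  const requestPattern = /"(?:GET|POST) \/([^ ?"]+)[^"]*"/;
  for (const line of logLines) {
    if (!line.includes(botName)) continue; // not the bot we are looking for
    const match = requestPattern.exec(line);
    if (!match) continue;
    const file = match[1].split('/').pop(); // ignore sub-directories
    if (Object.prototype.hasOwnProperty.call(counts, file)) counts[file] += 1;
  }
  return counts;
}

// Made-up sample lines, shaped like the log entry above.
const sampleLog = [
  '203.0.113.5 example.org - [16/May/2021:18:46:21 +0200] "GET /matomo.js HTTP/1.1" 200 20456 "https://example.org/" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
  '203.0.113.9 example.org - [16/May/2021:18:47:02 +0200] "POST /matomo.php?idsite=1&rec=1 HTTP/1.1" 204 0 "https://example.org/" "Mozilla/5.0 Firefox/88.0"',
];
console.log(summarizeBotRequests(sampleLog, 'Googlebot'));
```

A result with a non-zero matomo.js count and a zero matomo.php count for the bot matches what is reported here: the script is fetched, but the tracking request never reaches the server.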
Okay, so we need to find out why Google and most other bots do not perform this final PHP request then. I wonder if it's done when using an image-based request without wrapping it into a noscript tag:
<img src="https://domain.com/matomo.php?idsite=1&rec=1" style="border:0;" alt="" />
I'll create a test page with this and run it through the mobile-friendly test.
OK, I added _paq.push(['setRequestMethod', 'GET']); and removed the noscript tag, and then Search Console loads both resources correctly.
But then bots are still not tracked.
EDIT: I was wrong, bots are tracked. Woohoo!
Now when I use the "mobile-friendly test" tool or the "URL inspection" tool from Search Console, and even Lighthouse, the visit gets caught by Matomo.
For Lighthouse and the mobile-friendly test, BotTracker doesn't catch it; the visit is marked as "operating system: bot".
Yep, using the raw image source tracker works here as well (mobile-friendly test, done three times, tracked three times).
I don't think it is caused by the long query string, because it works now with a long query string.
In my case, the only change in the request is &send_image=1, whereas before it was set to 0.
Matomo uses sendBeacon by default (if the browser supports it). Maybe such requests are blocked.
Did you try to change the JavaScript code and add
It works for me.
But I don't know what that involves.
But since then the request is with
@MichaIng Maybe try the JavaScript request with send_image=1 to test whether it is related.
GET as request method automatically disables the use of sendBeacon, as it uses POST.
Btw, if the BotTracker app is disabled, are bots supposed to be tracked as regular visitors, or how are they dropped? I just disabled it and the mobile-friendly test still reports the error and I cannot see a related visitor log. So I guess we can rule out the app as the culprit.
But I see you're already further with GET/sendBeacon 👍.
Since I use the Cloudflare app, I cannot change the script snippet and cannot add parameters to the request for production. But I'll test it as additional code in the test page.
So now it works with sendBeacon disabled, but what is the drawback of such a config? If it is on by default in Matomo 4, there should be a good reason for it. Less accurate tracking, or a worse user experience?
I can verify that it works with _paq.push(['setRequestMethod', 'GET']);. Is there a way to keep using POST requests but disable sendBeacon only, to check whether the problem is POST requests in general rather than sendBeacon in particular?
Matomo by default discards all visitors that are recognized as bots here:
So when the user agent is detected as a bot, or the IP is within certain ranges, the request is sorted out unless the parameter bots=1 is set.
You can achieve that in the tracking code by adding
But to be clear here: Tracking bots within Matomo actually will ruin most of the reports. For most websites it's relevant to know how many visitors a website has and how they perform. Counting bots might explode those numbers and make them meaningless or unreliable. In most cases you may not even be able to distinguish between normal visits and bots within Matomo.
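The exclusion rule described above can be pictured roughly as follows. This is an illustrative sketch only, not Matomo's source code: the function name isRequestDiscarded and its input flags are hypothetical, and the actual bot and IP detection is assumed to happen elsewhere.

```javascript
// Illustrative decision logic only, NOT Matomo's implementation.
// isBotUserAgent / isIpInBotRange are assumed to be computed elsewhere;
// params holds the query string of the tracking request.
function isRequestDiscarded({ isBotUserAgent, isIpInBotRange, params }) {
  const botsForced = params.get('bots') === '1'; // explicit opt-in to bot tracking
  return (isBotUserAgent || isIpInBotRange) && !botsForced;
}

// A request from a detected bot is dropped by default ...
console.log(isRequestDiscarded({
  isBotUserAgent: true,
  isIpInBotRange: false,
  params: new URLSearchParams('idsite=1&rec=1'),
})); // true

// ... but kept when bots=1 is part of the request.
console.log(isRequestDiscarded({
  isBotUserAgent: true,
  isIpInBotRange: false,
  params: new URLSearchParams('idsite=1&rec=1&bots=1'),
})); // false
```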
To only disable sendBeacon you could use _paq.push(['disableAlwaysUseSendBeacon']);.
Great, yes, it totally makes sense to not count bots by default, and the BotTracker app then has a reasonable purpose to enable it, but outside of regular metrics, as an extra feature.
_paq.push(['disableAlwaysUseSendBeacon']); alone works as well, so we found the culprit indeed 👍.
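For reference, the two variants tested in this thread could be queued like this. A minimal sketch, assuming the standard _paq command queue; the helper configureTransport and its variant argument are hypothetical, and only one of the two commands is needed.

```javascript
// Standard Matomo command queue; queued commands are applied once matomo.js loads.
var _paq = (typeof window !== 'undefined' && window._paq) || [];

// Hypothetical helper: queues one of the two settings tested in this thread.
function configureTransport(queue, variant) {
  if (variant === 'get') {
    // Switches tracking requests to GET, which also avoids sendBeacon.
    queue.push(['setRequestMethod', 'GET']);
  } else {
    // Leaves the request method unchanged and only turns off sendBeacon.
    queue.push(['disableAlwaysUseSendBeacon']);
  }
  return queue;
}

configureTransport(_paq, 'beacon-off');
```

Either command should be queued before the page view is tracked, so that it applies to the first request as well.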
BotTracker uses its own detection for bots as far as I know. So it might happen that Matomo would have ignored a request but BotTracker doesn't detect it as a bot. https://github.com/Thomas--F/BotTracker/issues/60 would fix this.
At least it detects bots based on the user agent, so when Matomo drops a bot with a user agent that has not been added to the BotTracker list (it can be edited/extended), then it wouldn't be tracked at all. But all important ones are part of the default list.
Okay, so while I cannot alter the invocation snippet in the Cloudflare app, I can disable sendBeacon by manually editing the local Matomo code. But indeed the question is what the downsides are.
So in short, with sendBeacon the request is sent asynchronously, to reduce the chance that the user has already navigated to another page, unloading the current one, before the analytics have been sent. That is especially an issue when the script is loaded at the end of the page view, also with
In my case, with Cloudflare app, the script is in the page head without defer flag, so it shouldn't be an issue in this particular case. But usually you don't want such scripts to defer the visual/functional page load, where the issue is more apparent. Workarounds are listed in the docs above.
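To make the trade-off concrete, here is a rough sketch of the general pattern: prefer sendBeacon when available, otherwise fall back to an ordinary request. This is not Matomo's actual tracker code; the function sendHit and its nav and fallback parameters are hypothetical and injected so the logic can be tried outside a browser.

```javascript
// Illustrative only; this is not Matomo's tracker implementation.
// `nav` stands in for window.navigator, `fallback` for an image or XHR based sender.
function sendHit(url, payload, nav, fallback) {
  if (nav && typeof nav.sendBeacon === 'function') {
    // sendBeacon queues a POST request that the browser can still deliver
    // while the page unloads; it returns false if queuing failed.
    if (nav.sendBeacon(url, payload)) return 'beacon';
  }
  // Ordinary request: may be cancelled if the page unloads too early.
  fallback(url + '?' + payload);
  return 'fallback';
}

// Usage with stand-in objects instead of a real browser environment.
const sent = [];
const fakeNavigator = { sendBeacon: (u, p) => { sent.push(['beacon', u, p]); return true; } };
console.log(sendHit('/matomo.php', 'idsite=1&rec=1', fakeNavigator, (u) => sent.push(['get', u])));
console.log(sendHit('/matomo.php', 'idsite=1&rec=1', {}, (u) => sent.push(['get', u])));
```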
My personal workaround for now: In
It would actually be best if this was only done for bots. I had a look into the BotTracker source code to see whether there is a way to set it there, but there is nothing that would affect those flags prior to the actual tracking request. The CustomJsTracker plugin should generally enable that, but I guess again this cannot be done based on user agent, as the user agent is derived AFTER the tracker request has been done? Chicken and egg it seems to me 😄.
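One conceivable way around the chicken-and-egg problem would be to decide in the browser, before the tracking request is sent. A rough sketch, assuming the crawler identifies itself in navigator.userAgent; the pattern below is a hypothetical and far from complete stand-in for a real bot list, and both function names are made up for illustration.

```javascript
// Hypothetical, deliberately short pattern; a real bot list is much longer.
const BOT_PATTERN = /googlebot|bingbot|lighthouse/i;

// Returns true if the user agent string looks like one of the listed bots.
function looksLikeBot(userAgent) {
  return BOT_PATTERN.test(String(userAgent || ''));
}

// Queues the sendBeacon opt-out only when the user agent looks like a bot,
// so regular visitors keep the default behaviour.
function disableBeaconForBots(queue, userAgent) {
  if (looksLikeBot(userAgent)) {
    queue.push(['disableAlwaysUseSendBeacon']);
  }
  return queue;
}
```

In a page this would be called as disableBeaconForBots(_paq, navigator.userAgent) before the page view is tracked. Bots that do not announce themselves in the user agent would not be covered by this approach.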
I've tested removing bots=1 from the request, and Googlebot still got tracked by BotTracker.
This means that BotTracker catches the info before Matomo drops it, which is a good thing.
This is how I understand it:
bots=1, Matomo core tracks bots, but like regular users, which is not what admins want in most cases.
bots=1, then I guess the event is triggered regardless and bots were then tracked as regular user as well as in the BotTracker counter.
You're right. I realize that when bots=1 was activated, Googlebot was still caught by BotTracker, but Lighthouse and the mobile-friendly test tool were seen as regular users with no information (probably caught by the
bots=0 (default): only Googlebot is caught by BotTracker
bots=1 makes no difference for BotTracker
BotTracker is a bit buggy and not regularly maintained. Since Matomo maintains a list of robots, it should be easy to merge this module into the core, no?
Where is this list, actually? I see the excludedUserAgent variable used but nowhere defined.
I don't know; it was referred to in https://github.com/Thomas--F/BotTracker/issues/60
Ah, it's a separate repository: https://github.com/matomo-org/device-detector/blob/master/regexes/bots.yml
Also I like the idea to make use of that device detector more natively, being able to get all info of the bot: https://github.com/Findus23/plugin-BotTracking
Being able to view them via the Visits per local time diagram etc. would definitely be great. But I'm happy for now that all bot visits are tracked in some way.