Many bots fail to call the tracker PHP endpoints #17572
Comments
@remz-otw |
@MichaIng Most bots are not able to load the JavaScript. Google might ignore it to avoid unwanted tracking requests. |
@sgiehl I agree, but it is still an error and not a good thing from an SEO point of view. |
Google and Bing both load the JavaScript for sure, otherwise the Probably skipping JavaScript would help regardless, but I use the Cloudflare app to enable tracking on all our pages and there is currently no other option. Out of interest, what would the HTML code look like to have PHP tracking done without the JavaScript? Okay, I see the possibility of tracking via image source, although wrapping it into |
Does a request from Google bot to the PHP endpoint reach your server? Just want to make sure they are actually performed by the bots. If they are listed in your access log, then it might be an issue with Matomo or maybe the BotTracker plugin. To me it sounds like Google does not perform the request for some reason. If that is the case, you could try to play around with some settings that change the request, like disabling sendBeacon or maybe forcing a POST request. |
@MichaIng I looked through my log and found an example of Googlebot requesting matomo.js but not matomo.php. EDIT: I didn't find any request for matomo.php from Googlebot. |
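The access log check described above can be sketched roughly as follows. This is only an illustration, assuming a combined-format web server log; the function name `countBotEndpointHits` and the sample log lines are hypothetical and not taken from this thread.

```javascript
// Count requests per Matomo endpoint for log lines whose user agent mentions `botName`.
// Assumes combined log format, where the request appears as "METHOD /path?query HTTP/x".
function countBotEndpointHits(logLines, botName) {
  const counts = { 'matomo.js': 0, 'matomo.php': 0, 'piwik.php': 0 };
  const needle = botName.toLowerCase();
  for (const line of logLines) {
    // Skip lines that do not mention the bot at all.
    if (!line.toLowerCase().includes(needle)) continue;
    // Capture the request path without its query string.
    const match = line.match(/"(?:GET|POST|HEAD) ([^ ?"]+)/);
    if (!match) continue;
    const file = match[1].split('/').pop();
    if (Object.prototype.hasOwnProperty.call(counts, file)) counts[file] += 1;
  }
  return counts;
}

// Hypothetical sample lines for illustration only.
const sampleLog = [
  '66.249.66.1 - - [01/Jun/2021:10:00:00 +0000] "GET /matomo/matomo.js HTTP/1.1" 200 66000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
  '203.0.113.7 - - [01/Jun/2021:10:00:01 +0000] "POST /matomo/matomo.php?idsite=1&rec=1 HTTP/1.1" 204 0 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/89.0"',
];

const hits = countBotEndpointHits(sampleLog, 'Googlebot');
console.log(hits);
```

A bot that fetches `matomo.js` but never shows up with a `matomo.php` or `piwik.php` hit would match the behaviour reported here.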
Okay, so we need to find out why Google and most other bots do not perform this final PHP request then. I wonder if it's done when using an image-based request without wrapping it into
I'll create a test page with this and run it through the mobile-friendly test. |
OK, I added But then bots are still not tracked. |
Now when I use the "mobile-friendly test tool" or the "URL inspection tool" from Search Console, and even Lighthouse, it gets caught by Matomo. |
Jep, using the raw image source tracker works here as well (mobile-friendly test, done three times, tracked three times):
So I suspect the |
I don't think it is caused by the long query string, because it works now with a long query string. |
Jep, I just tested it as well and the exact same URL that fails from JavaScript succeeds from the image. |
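For readers unfamiliar with image-based tracking, here is a minimal sketch of how such a tracking URL could be assembled. It assumes the Matomo Tracking HTTP API parameters `idsite`, `rec`, `url` and `action_name`; the helper name `buildImageTrackerUrl`, the host `analytics.example.org` and the site ID are hypothetical.

```javascript
// Build a URL for matomo.php that could be used as the src of a tracking image.
function buildImageTrackerUrl(matomoBase, siteId, pageUrl, pageTitle) {
  const params = new URLSearchParams({
    idsite: String(siteId), // ID of the tracked website
    rec: '1',               // required for the request to be recorded
    url: pageUrl,           // full URL of the tracked page
    action_name: pageTitle, // title of the tracked action
  });
  // Strip trailing slashes from the base so the path is joined cleanly.
  return matomoBase.replace(/\/+$/, '') + '/matomo.php?' + params.toString();
}

// Hypothetical values for illustration only.
const src = buildImageTrackerUrl(
  'https://analytics.example.org/',
  1,
  'https://example.org/page',
  'Test page'
);
console.log(src);
```

The resulting URL is requested with a plain GET when the browser or crawler loads the image, so no JavaScript has to run on the client.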
Matomo uses sendBeacon by default (if the browser supports it). Maybe such requests are blocked. |
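As a rough sketch of what turning this off could look like, assuming the standard `_paq` command queue and the tracker method `disableAlwaysUseSendBeacon` from the Matomo JavaScript tracker; the tracker URL and site ID below are hypothetical, and this is not necessarily the exact snippet used by anyone in this thread.

```javascript
// Standard Matomo command queue; each command is an array of [methodName, ...args].
// globalThis is used instead of window so the sketch also runs outside a browser.
var _paq = globalThis._paq = globalThis._paq || [];

// Queue this before the first tracking call so it applies to that request.
_paq.push(['disableAlwaysUseSendBeacon']);

_paq.push(['setTrackerUrl', 'https://analytics.example.org/matomo.php']); // hypothetical URL
_paq.push(['setSiteId', '1']); // hypothetical site ID
_paq.push(['trackPageView']);

console.log(_paq.map((cmd) => cmd[0]).join(', '));
```

The commands only take effect once `matomo.js` is loaded and processes the queue, so the order of the pushes matters.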
Did you try to change the JavaScript code and add |
@MichaIng Maybe try the JavaScript request with send_image=1 to test if it is related. |
Forcing |
Btw, if the BotTracker app is disabled, are bots supposed to be tracked as regular visitors, or how are they dropped? I just disabled it and the mobile-friendly test still reports the error and I cannot see a related visitor log. So I guess we can rule out the app as the culprit. But I see you're already further with GET/sendBeacon 👍. Since I use the Cloudflare app, I cannot change the script snippet and cannot add parameters to the request for production. But I'll test it as additional code in the test page. |
But now it works with sendBeacon disabled. What is the drawback of such a config? If it is on by default in Matomo 4, there should be a good reason for it. Less accurate tracking, or a worse user experience? |
I can verify that it works with |
Matomo by default discards all visitors that are recognized as bots here: matomo/core/Tracker/VisitExcluded.php Lines 189 to 197 in df68fbc
So when the user agent is detected as a bot or is within certain IP ranges, it's sorted out unless the parameter But to be clear here: Tracking bots within Matomo will actually ruin most of the reports. For most websites it's relevant to know how many visitors a website has and how they perform. Counting bots might explode those numbers and make them meaningless or unreliable. In most cases you may not even be able to distinguish between normal visits and bots within Matomo. To only disable sendBeacon you could use: |
Great, yes, it totally makes sense to not count bots by default, and the BotTracker app then has a reasonable purpose to enable it, but outside of regular metrics, as an extra feature.
|
BotTracker uses its own detection for bots as far as I know. So it might happen that Matomo would have ignored a request but BotTracker doesn't detect it as a bot. https://github.com/Thomas--F/BotTracker/issues/60 would fix this |
At least it detects bots based on the user agent, so when Matomo drops a bot with a user agent that has not been added to the BotTracker list (it can be edited/extended), then it wouldn't be tracked at all. But all important ones are part of the default list. Okay, so while I cannot alter the invocation snippet in the Cloudflare app, I can disable Reading: https://developer.mozilla.org/docs/Web/API/Navigator/sendBeacon In my case, with the Cloudflare app, the script is in the page head without the defer flag, so it shouldn't be an issue in this particular case. But usually you don't want such scripts to delay the visual/functional page load, where the issue is more apparent. Workarounds are listed in the docs above. |
My personal workaround for now: In It would actually be best if this was only done for bots. I had a look into the BotTracker source code to see if there is a way to set it there, but there is nothing that would affect those flags prior to the actual tracking request. The CustomJsTracker plugin should generally enable that, but I guess again this cannot be done based on the user agent, as the user agent is derived AFTER the tracker request has been done? Chicken and egg, it seems to me 😄. |
I've tested removing |
This is how I understand it:
|
You're right. I realize that BotTracker is a bit buggy and not regularly maintained. Since Matomo maintains a list of robots, it should be easy to merge this module into the core, no? |
Where is this list, actually? I see the |
I don't know, it was referenced in https://github.com/Thomas--F/BotTracker/issues/60 |
Ah, it's a separate repository: https://github.com/matomo-org/device-detector/blob/master/regexes/bots.yml |
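To illustrate the general idea of user-agent based bot detection discussed in this thread, here is a small sketch. The three patterns are placeholders chosen for illustration, not entries copied from `bots.yml`, and the function name `isKnownBot` is hypothetical; how Matomo or BotTracker actually apply their lists is not shown here.

```javascript
// Illustrative patterns only; a real list would be far longer and more precise.
const BOT_PATTERNS = [/googlebot/i, /bingbot/i, /duckduckbot/i];

// Return true if the user agent matches any of the known bot patterns.
function isKnownBot(userAgent) {
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Hypothetical user agent strings for illustration.
const googlebotUa = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const firefoxUa = 'Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0';

console.log(isKnownBot(googlebotUa), isKnownBot(firefoxUa));
```

A bot whose user agent is missing from such a list would simply pass as a regular visitor, which is why two separately maintained lists can disagree.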
@MichaIng I will close this issue now as it seems there's nothing left to do here. Please let me know if you are still otherwise waiting on anything. |
Now that #17497 has been done, there is another issue that most bots, especially the important Google and Bing bots, fail to call the tracker PHP endpoints matomo.php/piwik.php, while generally robots.txt and headers do allow it and some bots are successfully tracked.
Matomo, PHP and webserver do not report any errors, so the request seems to fail right at the crawler, probably related to a too long query string or invalid characters? Example URL shown by Google mobile-friendly test:
Further details:
Expected Behavior
All bots that crawl the website are tracked and visible in Matomo.
Current Behavior
Many bots fail to call the tracker PHP endpoints and hence do not appear in Matomo.
Possible Solution
None found so far.
Steps to Reproduce (for Bugs)
Context
The behaviour of search engine crawlers, which pages they crawl, how often etc, can be important to detect issues, optimise crawler settings/schedules, etc to balance server load and optimise SEO.
Your Environment