Filter more bots which execute JavaScript and are tracked by Matomo #17984
Comments
Very quick note: You might want to look into this plugin:
Nice, worth giving it a shot. I see there is this "Referrer spam" protection as part of Matomo, but I thought there was a user-agent-based filter as well?
EDIT: While it looks reasonable, the plugin does not really help against bots and search engine crawlers as long as they don't originate from one of the cloud provider IP ranges. The "headless browser" detection is only a small list of user agents which I have never seen before: https://github.com/matomo-org/plugin-TrackingSpamPrevention/blob/4.x-dev/BrowserDetection.php
EDIT2: Here is what I was actually looking for: https://github.com/matomo-org/device-detector/blob/master/regexes/bots.yml
Correct, this is what I also wanted to mention now that I have more time: by default, Matomo also ignores all data from user agents that device-detector considers to be bots (see e.g. https://devicedetector.lw1.at/ for an interactive version).
I couldn't find a missing bot in the list so far 🤔. However, I'll keep looking for the faulty agent, and the spam protection plugin will probably help as well (many thanks for mentioning it). I'll close the issue here and, if needed, open a PR at device-detector.
Expected Behavior
Bots are filtered out of the Matomo statistics and reports.
Current Behavior
Just recently we noticed a massive increase (several thousand percent) in visits to two pages in particular, which is clearly related to bot activity. I'm not 100% sure which bot it is (this could be found out), but the access log shows quite a few bots with significant access numbers sending requests to the tracker, which implies that they execute JavaScript.
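To narrow down which bot is responsible, the tracker requests in the access log could be counted per user agent. The following is only a minimal sketch under some assumptions: the web server writes the common "combined" log format, and the tracker is reached via `matomo.php` or the legacy `piwik.php`. The function name `tracker_hits_by_agent` and the regular expression are made up for illustration and are not part of Matomo.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e.
# ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tracker_hits_by_agent(lines, endpoints=("/matomo.php", "/piwik.php")):
    """Count requests to the tracker endpoint per user agent string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        # Drop the query string before comparing the path.
        path = match.group("path").split("?", 1)[0]
        if path.endswith(endpoints):
            counts[match.group("agent")] += 1
    return counts
```

Calling `tracker_hits_by_agent(open("access.log"))` and looking at `.most_common(20)` on the result should surface user agents with unusually high tracker request counts; the file name here is just a placeholder.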
Possible Solution
Google and Bing bots are already excluded, according to an old commit, so it should be trivial to extend the list of user agents which are not tracked. If someone could give me a hint where in the code this is done, or which third-party library is used for it, I'd be glad to compare the list with our access logs and complement it accordingly.
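Such a comparison could look roughly like the sketch below: every user agent seen in the access log is checked against a list of bot patterns, and whatever matches none of them is kept for review. The patterns in `BOT_PATTERNS` are illustrative placeholders only, not the list Matomo actually uses, and `unmatched_agents` is a hypothetical helper name.

```python
import re

# Illustrative placeholder patterns; NOT the actual list used by Matomo.
BOT_PATTERNS = [r"Googlebot", r"bingbot", r"HeadlessChrome"]


def unmatched_agents(agents, patterns=BOT_PATTERNS):
    """Return the distinct user agents that match none of the bot patterns."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return sorted(
        {a for a in agents if not any(c.search(a) for c in compiled)}
    )
```

The result still contains regular browsers, so it has to be reviewed manually; only entries that turn out to be bots would be candidates for extending the list.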
Steps to Reproduce (for Bugs)
Hard to say, since bot activity depends heavily on the website, the search console/webmaster tools used, backlinks, the software used, etc. Whether it is actually visible depends on whether the bot behaves very differently from a common user.
Context
At the very least, the tracked bots distort the statistics of two pages; the same has probably happened, or still happens, in other cases where it is less significant. To have trustworthy statistics, it would be good to reliably ignore bots or have them handled separately.
Your Environment