Filter more bots which execute JavaScript and are tracked by Matomo #17984

Closed
MichaIng opened this issue Sep 8, 2021 · 4 comments
Labels
answered For when a question was asked and we referred to forum or answered it.

Comments

@MichaIng
Contributor

MichaIng commented Sep 8, 2021

Expected Behavior

Bots are filtered for the Matomo statistics and reports.

Current Behavior

We recently noticed a massive increase (several thousand percent) in visits to two pages in particular, which is clearly related to bot activity. I'm not 100% sure which bot it is (this could be found out), but the access logs show several user agents with significant request counts sending requests to the tracker (which implies they execute JavaScript).
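
One way to narrow down which bot it is: count the user agents that hit the tracker endpoint in the access log. The following is only a minimal sketch. It assumes the web server writes the common "combined" log format and that the tracker is reachable as `matomo.php` or `piwik.php`; the function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" access log format, i.e.
# ... "METHOD path PROTOCOL" status size "referrer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Assumption: the tracker is served under one of these file names.
TRACKER_PATHS = ("/matomo.php", "/piwik.php")


def count_tracker_user_agents(lines):
    """Count requests to the tracker endpoint per user agent string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        path = match.group("path").split("?", 1)[0]  # drop the query string
        if path.endswith(TRACKER_PATHS):
            counts[match.group("ua")] += 1
    return counts


# Made-up sample lines, only to show the input shape.
sample = [
    '203.0.113.5 - - [08/Sep/2021:10:00:00 +0000] "GET /matomo.php?idsite=1&rec=1 HTTP/1.1" 204 0 "https://example.org/" "ExampleBot/1.0"',
    '203.0.113.5 - - [08/Sep/2021:10:00:01 +0000] "POST /matomo.php HTTP/1.1" 204 0 "https://example.org/" "ExampleBot/1.0"',
    '198.51.100.7 - - [08/Sep/2021:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [08/Sep/2021:10:00:03 +0000] "GET /piwik.php?idsite=1 HTTP/1.1" 204 0 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

tracker_counts = count_tracker_user_agents(sample)
for user_agent, hits in tracker_counts.most_common(10):
    print(hits, user_agent)
```

User agents with unusually high counts in this output are the candidates to check against the bot list.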

Possible Solution

According to an old commit, Google and Bing bots are already excluded, so it should be trivial to extend the list of user agents which are not tracked. If someone could give me a hint where in the code this is done, or which third-party library is used for it, I'd be glad to compare the list with our access logs and complement it accordingly.

Steps to Reproduce (for Bugs)

Hard to say, since bot activity depends heavily on the website, the search console/webmaster tools used, backlinks, the software used, etc. Whether it is actually visible depends on whether the bot behaves very differently from a common user.

Context

The tracked bots mess with the statistics of at least two pages. This probably happened or happens in other cases as well, just less significantly. To get trustworthy statistics, it would be good to reliably ignore bots or to have them handled separately.

Your Environment

  • Matomo Version: 4.5.0 beta (issue started with 4.4.x already)
  • PHP Version: 8.0.10
  • Server Operating System: Debian Bookworm
  • Additionally installed plugins (activated):
    API, Actions, Annotations, BulkTracking, CoreAdminHome, CoreConsole, CoreHome, CorePluginsAdmin, CoreUpdater, CoreVisualizations, DBStats, DarkTheme 1.1.6, Dashboard, DevicePlugins, DevicesDetection, Diagnostics, Goals, ImageGraph, Insights, Installation, Intl, LanguagesManager, Live, LogViewer 4.0.1, Login, Marketplace, Monolog, Morpheus, PagePerformance, PrivacyManager, Proxy, Referrers, Resolution, SEO, SegmentEditor, SitesManager, Transitions, UserLanguage, UsersManager, VisitFrequency, VisitTime, VisitorInterest, VisitsSummary, WebsiteMeasurable
@MichaIng MichaIng added the Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. label Sep 8, 2021
@Findus23
Member

Findus23 commented Sep 8, 2021

Very quick note: You might want to look into this plugin:
https://github.com/matomo-org/plugin-TrackingSpamPrevention
https://plugins.matomo.org/TrackingSpamPrevention

@MichaIng
Contributor Author

MichaIng commented Sep 8, 2021

Nice, worth giving it a shot. I see there is this "Referrer spam" protection as part of Matomo, but I thought there was a user agent based filter as well?

EDIT: While it looks reasonable, the plugin does not really help against bots and search engine crawlers unless they originate from one of the cloud provider IP ranges. The "headless browser" detection is only a small list of user agents which I have never seen before: https://github.com/matomo-org/plugin-TrackingSpamPrevention/blob/4.x-dev/BrowserDetection.php

EDIT2: Here is what I was actually looking for: https://github.com/matomo-org/device-detector/blob/master/regexes/bots.yml

@Findus23
Member

Findus23 commented Sep 8, 2021

EDIT2: Here is what I was actually looking for: https://github.com/matomo-org/device-detector/blob/master/regexes/bots.yml

Correct, this is what I also wanted to mention now that I have more time: by default, Matomo also ignores all data from user agents that device-detector considers to be bots (see e.g. https://devicedetector.lw1.at/ for an interactive version).
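
To illustrate the general idea of a user agent based bot filter, here is a rough sketch. It is not how device-detector is implemented, and the patterns are simplified, hypothetical examples rather than entries copied from `bots.yml`. The names `BOT_PATTERNS`, `detect_bot` and `should_track` are invented for this example.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# They are NOT copied from device-detector's bots.yml.
BOT_PATTERNS = [
    ("Googlebot", r"Googlebot"),
    ("bingbot", r"bingbot"),
    ("Generic crawler", r"crawler|spider"),
]

# Compile once; user agent matching is done case-insensitively here.
COMPILED_PATTERNS = [
    (name, re.compile(pattern, re.IGNORECASE)) for name, pattern in BOT_PATTERNS
]


def detect_bot(user_agent):
    """Return the name of the first matching bot pattern, or None."""
    for name, pattern in COMPILED_PATTERNS:
        if pattern.search(user_agent):
            return name
    return None


def should_track(user_agent):
    """A request is only tracked if its user agent matches no bot pattern."""
    return detect_bot(user_agent) is None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"

print(detect_bot(googlebot_ua))
print(should_track(firefox_ua))
```

A bot that executes JavaScript but sends a user agent matching none of the patterns would still be tracked, which is why comparing the list against the access logs is useful.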

@MichaIng
Contributor Author

MichaIng commented Sep 8, 2021

I couldn't find a missing bot in the list so far 🤔. However, I'll keep looking for the faulty agent, and probably the spam protection plugin helps as well (many thanks for mentioning it).

I'll close the issue here and, if needed, open a PR at the device detector.

@MichaIng MichaIng closed this as completed Sep 8, 2021
@tsteur tsteur added answered For when a question was asked and we referred to forum or answered it. and removed Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. labels Sep 8, 2021

3 participants