Exclude as many automated bots as possible from being tracked (more reliable tracking) #5845

Closed
hpvd opened this issue Jul 16, 2014 · 6 comments
Labels
answered For when a question was asked and we referred to forum or answered it. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

@hpvd

hpvd commented Jul 16, 2014

Improve statistic quality / More realistic statistics

More and more automated visitors distort our statistics. These include not only

  1. search engines that build and update their index
    but also
  2. website watchers that notify their users of content changes:
    a) standalone software
    b) browser plugins
    c) web services
    and
  3. ordinary browsers with many open tabs:
    a) Depending on the browser and its settings, in the worst case a new visit is counted every time the browser starts:
    - after booting and opening the browser every morning
    - after updates (alpha/beta users in particular may get lots of updates)
    - after installing or updating browser plugins
    - ...
    b) Pages that are no longer in use but remain open in a browser tab keep accumulating visit time.
  4. probably some more

=> Piwik should take action to get rid of all this misinformation!!

What could be done (cleaning actions):

  1. Exclude visitors that are known or identified as search engine indexers
  2. Exclude visitors that always visit a subpage at the same time of day (+- a tolerance of a few seconds)
  3. Exclude visitors that visit a subpage at regular intervals: every hour, every 2 hours...
  4. Exclude visitors that open more than one subpage at the same time (+- 1s tolerance)
  5. Exclude visitors with no events on the subpage (if you have events set up on every page)
  6. Exclude visitors staying longer than e.g. 15 minutes on a subpage
  7. Exclude visitors that belong to e.g. the top 5% of most frequently returning visitors on a subpage
  8. probably many more things
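
As a rough illustration of item 3 (and of item 2 as the special case of a 24-hour interval), the check below flags a visitor whose visits arrive at nearly constant intervals. It is only a sketch: the function name `looks_periodic` and the defaults `tolerance_s` and `min_visits` are made up for this example and are not Piwik settings.

```python
from statistics import mean, pstdev

def looks_periodic(timestamps, tolerance_s=5.0, min_visits=4):
    """Return True if visits arrive at near-constant intervals.

    timestamps: visit times in seconds (e.g. Unix time), in any order.
    tolerance_s and min_visits are illustrative defaults, not Piwik settings.
    """
    if len(timestamps) < min_visits:
        return False  # too few visits to judge
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # A watcher polling on a schedule produces gaps that barely vary.
    return mean(gaps) > 0 and pstdev(gaps) <= tolerance_s

hourly_watcher = [0, 3600, 7201, 10799, 14400]  # roughly every hour
human = [0, 420, 9000, 86400, 90500]            # irregular visits
print(looks_periodic(hourly_watcher))
print(looks_periodic(human))
```

A real implementation would also have to decide how many visits are enough evidence, since a human with a daily routine can look periodic over a short window.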

To keep this as transparent as possible, every statistics cleaning action should be visible to Piwik users and individually selectable by them.

There are at least two places where this could be done:
A) Globally in the settings,
maybe in a new tab "Statistic quality",
via a list of checkboxes to activate each type of "cleaning action"
B) As items in the segment editor/filter (e.g. when a cleaning action is disabled globally)
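
To make the "list of checkboxes" idea more concrete, here is a small sketch in which each cleaning action is a named rule that can be switched on or off. Everything in it is hypothetical (the `Visit` fields, the rule names `no_events` and `too_long`, the `clean` function); it only shows the shape of such a setting, not how Piwik stores or filters visits.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    visitor_id: str
    duration_s: float   # time spent on the subpage, in seconds
    event_count: int    # tracked events during the visit

# Each hypothetical cleaning action maps a name to a predicate that
# returns True when a visit should be excluded.
CLEANING_ACTIONS = {
    "no_events": lambda v: v.event_count == 0,      # item 5 in the list
    "too_long": lambda v: v.duration_s > 15 * 60,   # item 6 in the list
}

def clean(visits, enabled):
    """Keep only the visits that no enabled cleaning action excludes."""
    active = [CLEANING_ACTIONS[name] for name in enabled]
    return [v for v in visits if not any(rule(v) for rule in active)]

visits = [
    Visit("a", 120, 3),
    Visit("b", 3600, 2),  # tab left open for an hour
    Visit("c", 30, 0),    # no events fired
]
print([v.visitor_id for v in clean(visits, enabled={"too_long"})])
print([v.visitor_id for v in clean(visits, enabled={"too_long", "no_events"})])
```

The set passed as `enabled` plays the role of the ticked checkboxes; the same rule names could just as well be offered as segment conditions.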

What do you think of this topic and the details?
I think this has a huge impact on our statistics and, in consequence, on how we interpret them and on what we change, add, or modify on our websites.
Please add any comments and additional ideas!

@hpvd
Author

hpvd commented Jul 16, 2014

This is a more complete version of ticket #3556.

@mattab mattab changed the title from "Improve statistic quality / More realistic statistics" to "Exclude as many automated bots as possible from being tracked (more reliable tracking)" Aug 3, 2014
@mattab mattab added this to the Mid term milestone Aug 3, 2014
@mattab
Member

mattab commented Aug 3, 2014

This is a fine idea. Have you often seen live examples of such bots?

@sgiehl
Member

sgiehl commented Aug 3, 2014

FYI: As long as the bots, spiders, and so on can be detected by their user agent, they should be added to the DeviceDetector. Bots detected by DeviceDetector are ignored by Piwik by default (unless bots=1 is given while tracking). For a full list of detected bots see https://github.com/piwik/device-detector/blob/master/regexes/bots.yml
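
To illustrate the behaviour described above, here is a minimal sketch of user-agent-based bot filtering. It is not DeviceDetector's code or API: the three patterns are just examples (the real list is in regexes/bots.yml), and `is_bot` and `should_track` are hypothetical names. Only the `bots=1` override is taken from the comment above.

```python
import re

# Illustrative patterns only; the real list lives in regexes/bots.yml.
BOT_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"googlebot", r"bingbot", r"crawler|spider")
]

def is_bot(user_agent):
    """Return True if the user agent matches one of the example patterns."""
    return any(p.search(user_agent) for p in BOT_PATTERNS)

def should_track(user_agent, bots_param=None):
    """Mimic the described default: skip bots unless bots=1 is passed."""
    if bots_param == "1":
        return True
    return not is_bot(user_agent)

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
print(should_track(googlebot))                  # bot, not tracked by default
print(should_track(googlebot, bots_param="1"))  # tracked because bots=1
print(should_track(firefox))                    # regular browser, tracked
```

This approach only works for clients that announce themselves in the user agent string.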

@hpvd
Author

hpvd commented Aug 3, 2014

I know lots of people who use website watchers to get notified whenever there is

  • a new version of software they use
  • a new version of a software add-on they use (I personally use it for the Piwik plugin site as well ;-)
  • news on their favourite blog or news site
  • a new comment on a blog/forum they use
  • a price change for a particular product
  • ....

-> It's really convenient to have all this in one place!

And if you look at websites on relatively specialized / deep topics that display a view counter, you can see that you and your automatic watcher sometimes account for e.g. 20% of their visits...

@hpvd
Author

hpvd commented Aug 3, 2014

I'm afraid the great device-detector only helps with ticket topic "1)" because

  • ticket topic "2)": the bots regularly identify as IE to have fewer compatibility problems (HTML, CSS, JS) or to be able to do complex things like logging in to a website and tracking changes in the customer area,
    or the bots are part of a regular browser (e.g. a browser add-on)
  • ticket topic "3)": it's not a real bot, it's the browser itself
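
Since the user agent is no help in these cases, any detection would have to look at behaviour instead. As one example, the sketch below implements cleaning action 4 from the ticket (several subpages opened within about one second, as happens when a browser restores its tabs). The function name, the `(timestamp, url)` input format, and the one-second default are assumptions for illustration only.

```python
def has_simultaneous_pageviews(pageviews, tolerance_s=1.0):
    """Return True if two different URLs were opened within tolerance_s.

    pageviews: list of (timestamp_in_seconds, url) tuples for one visitor.
    """
    ordered = sorted(pageviews)
    # After sorting by time, it is enough to compare neighbouring entries.
    for (t1, url1), (t2, url2) in zip(ordered, ordered[1:]):
        if url1 != url2 and t2 - t1 <= tolerance_s:
            return True
    return False

tab_restore = [(100.0, "/pricing"), (100.3, "/blog"), (100.6, "/docs")]
normal_browsing = [(100.0, "/pricing"), (160.0, "/blog"), (400.0, "/docs")]
print(has_simultaneous_pageviews(tab_restore))
print(has_simultaneous_pageviews(normal_browsing))
```

Note that a person opening several links in background tabs would trigger the same rule, which is one reason such actions should stay optional.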

@mattab
Member

mattab commented Oct 4, 2016

Hello, please report all bots to our other popular project, Device Detector, which also keeps track of bots: https://github.com/piwik/device-detector/

@mattab mattab closed this as completed Oct 4, 2016
@mattab mattab added the answered For when a question was asked and we referred to forum or answered it. label Oct 4, 2016
3 participants