Plugin: BotTracker to track bot-actions #2391

Closed
Thomas--F opened this issue May 2, 2011 · 155 comments
Labels
Enhancement: For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.
worksforme: The issue cannot be reproduced and things work as intended.

Comments

@Thomas--F
Contributor

BotTracker Plugin

* When installed, the plugin detects configured bots and web spiders and excludes them from the visitor log
* All bot-hits are counted and for every bot the last visit is logged
* Some well-known bots are pre-configured

How to install?

* Download Piwik BotTracker Plugin
* Unzip the plugin and copy the extracted directory "BotTracker" into the directory piwik/plugins/
* Configure the bot list by editing the MySQL table *piwik_bot_db*.

Author

* Thomas Fasselt

Any help is welcome

Changelog

* version 0.10: first public version

Feedback

Please leave a comment if you have any feedback, suggestion, or bug report.

Keywords: bot, spider, search, engine, agent, third-party-plugin

@Thomas--F
Contributor Author

The plugin comes with a widget that shows the data from the bot_db table.
I've tested the plugin, but there may still be bugs in it. I don't recommend using it in a production environment.

ToDos:

  • Create Widget to configure the bot_db (add new bots, change keyword, (de-)activate bots, etc.)
  • testing! testing! testing!

Oh, and there is one big limitation:
Most bots don't use JavaScript. So if you use only the Piwik-JS-API (default), you don't get any results.
I use the PHP-API, so every bot hits my visitor-log. That's why I wrote this plugin.

@anonymous-matomo-user

Nice, this is exactly the module I was looking for, but after copying it I get this error.

Unable to load plugin 'BotTracker' because '/home/******/public_html/plugins/BotTracker/BotTracker.php' couldn't be found. You can manually uninstall the plugin by removing the line Plugins[] = BotTracker from the Piwik config file.

The file is there! Any ideas?

@Thomas--F
Contributor Author

Hmmm.... sounds strange. I installed the plugin about 30 times during development and test.

First, check that the folder name matches the one in the message exactly (e.g. "BotTracker", not "bottracker"), then check the folder and file permissions. Is read access restricted?
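
The two checks above (exact folder name, then read access) can be sketched as a small script. This is only an illustration: `diagnosePluginFile` is a hypothetical helper, not a Piwik function, and the directory layout is assumed from the error message.

```php
<?php
// Hypothetical helper, not part of Piwik: reports why a plugin's main file
// might not be loadable from the plugins directory.
function diagnosePluginFile(string $pluginsDir, string $pluginName): string
{
    $expected = $pluginsDir . '/' . $pluginName . '/' . $pluginName . '.php';
    if (is_file($expected)) {
        return is_readable($expected) ? 'ok' : 'not readable: ' . $expected;
    }
    if (is_dir($pluginsDir)) {
        // Look for a folder whose name differs only in letter case.
        foreach (scandir($pluginsDir) ?: [] as $entry) {
            if ($entry !== $pluginName && strcasecmp($entry, $pluginName) === 0) {
                return 'case mismatch: found folder "' . $entry . '"';
            }
        }
    }
    return 'missing: ' . $expected;
}
```

Calling `diagnosePluginFile('/path/to/piwik/plugins', 'BotTracker')` returns a short status string. On a case-insensitive file system the case check never triggers, because the file is found either way.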

@anonymous-matomo-user

We are using Piwik 1.4 and getting the following error when activating the plugin:

Fatal error: Call to undefined method Piwik_BotTracker::LogToFile() in example.com/analytics/piwik/plugins/BotTracker/BotTracker.php on line 41

@anonymous-matomo-user

Replying to jekko:

We are using Piwik 1.4 and getting the following error when activating the plugin:

Fatal error: Call to undefined method Piwik_BotTracker::logToFile() in example.com/analytics/piwik/plugins/BotTracker/BotTracker.php on line 41

@Thomas--F
Contributor Author

Hi jekko,

try the new version (v0.12).
I wrote the LogToFile function for debug logging. In v0.10 I deleted the function but not all of the calls to it.

btw: I tested the plugin with Piwik 1.3 and 1.4

@anonymous-matomo-user

thomas its working, ty

@Thomas--F
Contributor Author

Changelog

* version 0.15: better widget and a new "Bot Tracker" entry in the visitor menu

@Thomas--F
Contributor Author

Changelog

  • version 0.16:
    • Widget shows only active bots
    • BotTracker menu shows the active status as an icon

@anonymous-matomo-user

Hello,

I'm running Piwik 1.4 with BotTracker 0.16.
My Piwik installation monitors multiple sites.

The plugin has been installed and activated, and the widget has been added. The sites contain the JavaScript code.
I can't see any bot access being counted. After some days I used Google Webmaster Tools to try to force a "fetch like a bot" access. Still no count.

Questions:
Is there an alternative way to simulate an access and verify the installation?
Should bots that execute JS be caught by the plugin?
Might the problem be caused by my tracking multiple sites with this installation?

Any hint very much appreciated.

Best regards, sun

@anonymous-matomo-user

did you read this?

Oh, and there is one big limitation: Most bots don't use JavaScript. So if you use only the Piwik-JS-API (default), you don't get any results. I use the PHP-API, so every bot hits my visitor-log. That's why I wrote this plugin.

@Thomas--F
Contributor Author

Hi sun,

first of all: the plugin is not able to track multiple sites separately. But I will put that on my todo list.
The results are currently the sum over all sites.

To test the plugin I use Firefox with a plugin called "User Agent R G".

But remember: the plugin will only catch non-JS bots if you use the PHP Tracking API!
Most bots don't use JS, so don't expect many results when you only use the standard tracking code!
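
To make the limitation concrete, here is a minimal sketch of a server-side tracking request, which reaches Piwik even when the client never runs JavaScript. It builds the request URL by hand instead of using the PHP tracking client. The `idsite` and `rec` parameters appear in the image-tracking URL elsewhere in this ticket; `url` and `ua` are assumed tracker parameters that should be checked against the tracking API reference for your Piwik version. The host name and `buildTrackingUrl` are placeholders.

```php
<?php
// Sketch: build a server-side tracking request so that clients which never
// run JavaScript (most bots) still reach piwik.php with their User-Agent.
function buildTrackingUrl(string $piwikBase, int $idSite, string $pageUrl, string $userAgent): string
{
    $params = [
        'idsite' => $idSite,
        'rec'    => 1,
        'url'    => $pageUrl,     // assumed parameter: URL of the page being viewed
        'ua'     => $userAgent,   // assumed parameter: overrides the User-Agent seen by the tracker
    ];
    return rtrim($piwikBase, '/') . '/piwik.php?' . http_build_query($params);
}

// Placeholder host and site ID, for illustration only.
$url = buildTrackingUrl(
    'https://piwik.example.org',
    2,
    'https://example.org/page',
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
);
echo $url, PHP_EOL;
```

The web server would request this URL once per page view, passing along the visitor's own User-Agent header, so that bot requests become visible to the plugin.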

@Thomas--F
Contributor Author

Dev status:
I am currently trying to improve the configuration of the plugin:

  • enable/disable a bot by clicking on the icon
  • open a sub-table with a click on the bot name. In this sub-table you can change the bot name and/or the keyword. There should also be a delete button
  • an add-new-bot button at the bottom of the table

In addition, I will change the database so the plugin can track multiple sites separately.


I have been a professional programmer for more than 16 years now, but mainly on the mainframe. In recent years I have also coded some Java applications, but PHP is only a hobby. So don't expect a fast solution here. I am trying to learn the API by looking at the examples and the source and by doing some trial-and-error debugging.

If someone wants to jump in and offer help, the door is wide open! Just leave me a message.

And the last point:
I try to track even this site by using the Image-Tracking. [[Image(http://piwik.rwk-kempen-krefeld.de/piwik.php?idsite=2&rec=1)]] Is this going to work...?

@Thomas--F
Contributor Author

Changelog

* version 0.18: Multi-Site

I added a column to the database, so if you install the new version, it will drop the old table and create a new one. Then the script will insert the bot list for all sites where the user is admin.
To run the install script you have to follow these steps:

  • deactivate BotTracker
  • log out (I'm not sure about that)
  • remove the row
    PluginsInstalled[] = "BotTracker"
    from the file
    /piwik/config/config.ini.php
  • log in and activate BotTracker

@anonymous-matomo-user

Hi Thomas,

I looked further into my issue of not getting any count or timestamp.
First of all, I updated to 0.18.
Second, please forgive me, but I haven't a clue about PHP.

That said, I modified BotTracker.php to give me some more log data in the function checkBot.

[2011/05/10 21:42:05] user Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
[2011/05/10 21:42:05] SiteID:3
[2011/05/10 21:42:05] Row:43
[2011/05/10 21:42:05] Row[botId]:4

I can see that the bot access gets caught and that the proper row is detected (43): Googlebot has ID 43 in my database. I can also see that execution enters the if statement, where I print the value of $row['botId'], which in the example above is 4. So the query updating the database is performed. But rather than updating row 43, it updates row 4, which is the wrong site as well as the wrong bot.

Shouldn't the if statement in checkBot look more like this:

        if ($row > 0 ){
            $query = "UPDATE `".Piwik_Common::prefixTable('bot_db')."` 
                      SET botCount = botCount + 1
                        , botLastVisit = CURRENT_TIMESTAMP()
                      WHERE botId = ".$row." ";

            Piwik_Query($query);

            $exclude =& $notification->getNotificationObject();
            $exclude = true;
        }

I replaced $row['botId'] with just $row. At least on my installation, it now updates the correct row.

Could you please verify this and let me know what you think?

Best regards,

sun

@Thomas--F
Contributor Author

Hi sun,

I get the variable $row from the function Piwik_FetchOne:

        $row = Piwik_FetchOne("SELECT botId FROM ".Piwik_Common::prefixTable('bot_db')."
                               WHERE botActive = 1 
                               AND   idSite = ".$idSite."
                               AND   LOCATE(botKeyword,'".$ua."') >0
                           LIMIT 1");

This function returns a labeled array with all selected columns. In your case it should be
array( 'botId' => 43)

Because of this I'm surprised that you get a correct result when you use only $row.

How do you print the log data?
Which PHP version do you use?

Best regards,
Thomas

@anonymous-matomo-user

Hi Thomas,

it's PHP 5.2.12 and MySQL 5.1

The 2 lines

     Piwik_BotTracker::logToFile('Row:'.$row);
     Piwik_BotTracker::logToFile('Row[botId]:'.$row['botId']);

just before the if statement generate

[2011/05/11 17:16:11] Row:43
[2011/05/11 17:16:11] Row[botId]:4

As a result, your code updates row 4, while my code updates row 43.

sun

@anonymous-matomo-user

One more.
The above was a Googlebot access.
With MSNBOT I get

[2011/05/11 17:36:13] Row:41
[2011/05/11 17:36:13] Row[botId]:4

So with my code it counts for row 41, which is MSN in my database.
With yours, it also ends up in row 4.

sun

@Thomas--F
Contributor Author

Hi sun,

I found the error.

The function Piwik_FetchOne does not return an array; it returns the value itself.
Because the value is a string and not an array, the access with "botId" returns only its first character:
4 instead of 41

I will fix and test it. Thank you.
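
The effect can be reproduced in isolation. The sketch below assumes only what the log output in this thread shows: the fetch call yields a scalar such as "43". It uses an explicit integer cast in place of `$row['botId']`, because newer PHP versions raise a warning or an error for a non-numeric string offset, whereas PHP 5.2 silently treated it as offset 0.

```php
<?php
// A fetch-one style call yields a scalar, e.g. the string "43", not an array.
$row = '43';

// Old PHP versions converted a non-numeric key used on a string to the
// integer offset 0, which is the same conversion as this explicit cast.
$offset = (int) 'botId';        // 0
$firstChar = $row[$offset];     // "4", the first character of "43"

// The fix: use the scalar directly, cast to int before it goes into SQL.
$botId = (int) $row;            // 43

printf("offset=%d firstChar=%s botId=%d\n", $offset, $firstChar, $botId);
// prints "offset=0 firstChar=4 botId=43"
```

This matches both log excerpts: IDs 43 and 41 each collapse to 4 when only the first character is read.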

@anonymous-matomo-user

Yep. That works for me as well. Thanks!

@robocoder
Contributor

Thomas: I'm glad to see you're continuing to develop this oft-requested feature.

A couple of comments:

@Thomas--F
Contributor Author

Hi vipsoft,

thanks for the tips. As you can see, I have already updated the plugin.

Will the update scripts run automatically when you update the plugin? Will all scripts run in the right order if I have skipped a version?

And I have some questions concerning the usage of Trac:

  • How can I change the description of the ticket? I want to add all the changelog-entries.
  • How can I delete the old version "BotTracker.2.zip"? I forgot once to check the box "delete existing file".

@Thomas--F
Contributor Author

Oh, and by the way:
Can you delete the last row (last point) in comment:13?

@Thomas--F
Contributor Author

Changelog 0.21

  • Change the type of botCount from VARCHAR to INT
  • Add new bot "Exabot"
  • Add an update script for v0.18 in case someone has to update BotTracker from a very old version

I have written and tested 2 update scripts: v0.18 and v0.21
It's a great feature of Piwik. Thanks a lot to vipsoft for the hint.

@robocoder
Contributor

In checkBot():

  • why does the UPDATE not include idSite in the WHERE clause?
  • have you compared the performance of the SELECT vs using the tracker cache? (I suspect the LOCATE() equates to a table scan.)
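
As a rough illustration of the alternative hinted at in the second question, the keyword match can be done in PHP against a list that is already in memory, instead of running `LOCATE()` in SQL on every request. This is a sketch only: the bot list is hypothetical sample data using the column names from the query in this thread, and `findBotId` is not part of the plugin.

```php
<?php
// Hypothetical in-memory bot list; in the plugin this data lives in the bot_db table.
$bots = [
    ['botId' => 41, 'idSite' => 3, 'botKeyword' => 'msnbot',    'botActive' => 1],
    ['botId' => 43, 'idSite' => 3, 'botKeyword' => 'Googlebot', 'botActive' => 1],
];

// Returns the botId of the first active bot of the given site whose keyword
// occurs in the user agent, or 0 if no bot matches.
function findBotId(array $bots, int $idSite, string $userAgent): int
{
    foreach ($bots as $bot) {
        if ($bot['botActive'] === 1
            && $bot['idSite'] === $idSite
            && stripos($userAgent, $bot['botKeyword']) !== false) {
            return $bot['botId'];
        }
    }
    return 0;
}
```

`stripos` is case-insensitive. Whether `LOCATE()` is case-sensitive depends on the column collation, so the two approaches may differ for mixed-case keywords.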

@Thomas--F
Contributor Author

The UPDATE uses only the botId because it's the unique index of the table. So once the SELECT has found a matching row using the idSite, I can update the table using just the botId.

Do you have a description of the tracker cache and how to use it? The table is very small and my sites don't get many hits, so performance tuning is not my top priority. But I will compare both ways if I can implement the cache.

@robocoder
Contributor

The tracker cache consists of files in tmp/cache/tracker that are automatically loaded with each tracker request. You can see this being used by SitesManager.php in recordWebSiteDataInCache().

Please also take a look at Matt's ideas in #653

@Thomas--F
Contributor Author

There are some points to think about:

  • Change from bot hits to bot visits?
    This means that if a bot hits the tracker, the botCount is only updated if the last hit was at least 30 minutes ago.
  • Log every visit in another (new) table to get a real history?
    This is required if I want to implement any of the "Additional features" Matt describes in his comment. My first intention was to block bots from my visitor log. But I can understand that these features would provide very useful information.
  • Enable the tracker cache?
    The main point to think about is: what should I cache to improve performance? Maybe it's a good idea to store the last 5 user agents (or their hashes) and the botId (zero if it wasn't a bot). Then, when a visitor walks through the site, the script does not have to search the whole table for every tracking call.
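
The first point, counting visits instead of hits, comes down to a time comparison. A minimal sketch, assuming Unix timestamps and the 30-minute window mentioned above; `isNewVisit` is a hypothetical helper:

```php
<?php
// Hypothetical helper: decides whether a hit should be counted as a new visit.
// $lastHitTs is the Unix timestamp of the bot's previous hit, or null if none.
function isNewVisit(?int $lastHitTs, int $nowTs, int $windowSeconds = 1800): bool
{
    return $lastHitTs === null || ($nowTs - $lastHitTs) >= $windowSeconds;
}
```

In this sketch the caller would store the time of every hit but increment botCount only when `isNewVisit` returns true, so a bot that keeps crawling without a 30-minute pause counts as one visit.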

I would like to make some of these features configurable (e.g. how many user agents should be stored, or how much time must pass between 2 hits before a new visit is logged).
Should I use global variables in BotTracker.php, or are there any plans for a config dialog for plugins?
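
The caching idea from the list above (remember the last few user agents and their botId) might look like the following. This is a sketch under stated assumptions: `RecentAgentCache` is a hypothetical class, it lives only in memory, and persisting it between requests (for example through the tracker cache files) is not shown.

```php
<?php
// Hypothetical small cache: remembers the botId for the most recent user agents
// (keyed by hash) so repeated hits can skip the table lookup. 0 means "not a bot".
final class RecentAgentCache
{
    /** @var array<string,int> hash => botId, oldest entry first */
    private array $entries = [];
    private int $capacity;

    public function __construct(int $capacity = 5)
    {
        $this->capacity = $capacity;
    }

    // Returns the cached botId, or null if the user agent is not cached.
    public function get(string $userAgent): ?int
    {
        $key = 'ua_' . md5($userAgent);
        return array_key_exists($key, $this->entries) ? $this->entries[$key] : null;
    }

    public function put(string $userAgent, int $botId): void
    {
        $key = 'ua_' . md5($userAgent);
        unset($this->entries[$key]);          // re-insert so the entry becomes the newest
        $this->entries[$key] = $botId;
        if (count($this->entries) > $this->capacity) {
            array_shift($this->entries);      // drop the oldest entry
        }
    }
}
```

A caller would try `get()` first and fall back to the database query only on `null`, then `put()` the result. The capacity is a constructor argument, which fits the wish to make the number of stored user agents configurable.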

@Thomas--F
Contributor Author

Changelog 0.22

  • implementation of a config menu
    In "Settings" you will find a new sub-menu "BotTracker" where you can change bot names and bot keywords, (de-)activate entries and add new entries. The config dialog works for multi-site installations.

I've tested a lot, but I'm sure there are bugs left. Please report anything that doesn't work as designed.

@Thomas--F
Contributor Author

To implement some of Matt's "additional features" I have to create a new table that logs every hit of a bot.
What information do you think should be stored in that table?

  • Timestamp
  • IP address (masked if the plugin is used)
  • User-Agent
  • page viewed

Is there more to think about?
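
For discussion, one log entry with the four fields listed above could be assembled like this. Everything here is hypothetical: the key names, the helper functions, and the masking rule (zeroing the last IPv4 octet), which merely stands in for whatever the IP-masking plugin actually does.

```php
<?php
// Hypothetical masking rule: zero the last octet of a dotted IPv4 address.
function maskIpv4(string $ip): string
{
    $parts = explode('.', $ip);
    if (count($parts) !== 4) {
        return $ip; // not dotted IPv4; left unchanged in this sketch
    }
    $parts[3] = '0';
    return implode('.', $parts);
}

// Builds one row for a hypothetical per-hit log table.
function buildBotLogEntry(int $botId, int $idSite, string $ip, string $userAgent, string $pageUrl, int $timestamp): array
{
    return [
        'botId'     => $botId,
        'idSite'    => $idSite,
        'visitTime' => gmdate('Y-m-d H:i:s', $timestamp), // UTC timestamp
        'ip'        => maskIpv4($ip),
        'userAgent' => $userAgent,
        'pageUrl'   => $pageUrl,
    ];
}
```

Note that this sketch leaves IPv6 addresses unmasked, which a real implementation would have to handle.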

@Thomas--F
Contributor Author

The update script is only important for those who migrate Piwik from 1.x to 2.x.
It's in the subfolder "Updates".

But I think I found your problem.
Remove the plugin (deactivate and uninstall), then rename the folder from "BotTracker-master" to "BotTracker" and try again.

GitHub names the download file "BotTracker-master" because it's the master branch of the source. The plugin is named BotTracker and the folder must have exactly the same name.

@anonymous-matomo-user

Oh yeah, I noticed that "master" part. At first I thought I had downloaded the wrong file.

Anyway, I renamed it like you said, and it worked! The new widget is all up and running now.

Thank you very much :-)

@mattab
Member

mattab commented Feb 21, 2014

A quick note: if you upload the plugin as a ZIP (you can find this link in Settings > Marketplace > "Upload a ZIP plugin"), then the directory will be created with the right name automatically.

@anonymous-matomo-user

Oh ok, that's good to know.

I have another question now. 2 or 3 spiders have visited while I've been watching over the past 30-45 minutes, but they don't show up in the widget. Among them were Google and Baidu, which I guess are both pre-configured. Does this plugin only show the bots/spiders under certain circumstances?

@anonymous-matomo-user

Yeah, over a 2 hour period, Google visited twice and Baidu 3 times. But the new BotTracker widget (installed 2 hours before the 1st search engine spider visit) doesn't show any visits.

Maybe I'm misunderstanding something?

There's also another discrepancy, which you probably can't help with. But maybe worth a mention, in case I'm misunderstanding, or it represents a larger problem.

SMF could be reporting 8 guests, 1 user, 2 spiders, but the Real Time Visitor Count stays at zero. I realize that SMF reports "in the last 15 minutes" and Piwik Real Time reports every 3 minutes. But I refreshed SMF to get its latest numbers, and then immediately switched to the Piwik tab and refreshed it. So I would think that's pretty close, enough to catch at least a couple of visitors.

I went through that routine several times over the last 4 hours, but the SMF visitor count is always higher than Piwik's. Piwik is at zero most of the time, while SMF is almost always reporting visitors. So maybe I just don't understand something?

Thanks again :-)

@Thomas--F
Contributor Author

Woohoo, this ticket is definitely growing too big *g*
Maybe I should use a forum.

Hi Brynn,
how do you integrate Piwik into your website? Do you use the PHP-API or the image/JavaScript tag?

Many web crawlers, spiders and bots don't load the images in a page, and most of them don't execute JavaScript. So you cannot track them if you don't use the PHP-API.

@Thomas--F
Contributor Author

From now on, the current version can be downloaded from

GitHub
https://github.com/Thomas--F/BotTracker

or the
Piwik Marketplace
http://plugins.piwik.org/BotTracker

@anonymous-matomo-user

Oh, wait a minute....either more controls have become available, or I did not see them initially.... Now there's an update, that I'm pretty sure wasn't there before. And also, there's Settings > BotTracker Configuration, which I don't think was there before, but maybe I just missed it. However, it appears to be already set up anyway.

I use both JS and image tracking. (Although that seems to essentially double my visit count, so I may think about changing that.) However, I'm not even clear what the PHP-API is. I guess I can find something in the user manual about it, although it will probably be too complicated for me to figure out. But I'll read up on it anyway.

Maybe it would be a good idea to include a note with the BotTracker plugin that it doesn't work with JS/image tracking? Or, I don't know, maybe most people who use Piwik would already know that much. But like I said, I don't even know what the PHP-API is yet.

OK, well, thanks for your help anyway. If I can figure out the PHP-API, and it's something I can use, maybe I will. Otherwise, I guess I just won't have info about the spiders from Piwik.

@hpvd

hpvd commented Feb 21, 2014

The update function has some problems at the moment,
see #4703

@hpvd

hpvd commented Apr 14, 2014

There seems to be a problem with BotTracker and Piwik 2.2rc1,
please see #4988

@anonymous-matomo-user

I just installed the latest version of BotTracker, v0.49 from 22 Jun 2014.
During installation, this plugin (or something in Piwik) messed up my "config.ini.php": it modified the "password" field in the "[database]" section. I always keep backups of important files, so the restore was quick. I suggest that "config.ini.php" be treated like "global.ini.php" and that a backup be made before any plugin installation.

  • centos 6.5
  • piwik 2.4.0
  • php 5.4.1.7

@Thomas--F
Contributor Author

This looks weird. I don't think the plugin is responsible for this unwanted update.
The only change Piwik should make to config.ini.php is to add BotTracker to the sections [plugins_installed] and [plugins_tracker].
These changes are made by Piwik core functions.

As you suggest, there could be a problem in the backup routine during install or update.

@mattab
Member

mattab commented Jul 4, 2014

Plugin is not responsible for this bug, see: #5409

@dandv
Contributor

dandv commented May 22, 2015

Wanted to report a typo in the description of this plugin:

Landed here from the list of plugins in Piwik:

image

  1. Should that link be updated?
  2. Is there a better place to report this issue?

@sgiehl
Member

sgiehl commented May 22, 2015

@dandv see https://github.com/Thomas--F/BotTracker
The link is defined in the plugin.json of the plugin, so it would need to be updated there.

@dandv
Contributor

dandv commented May 22, 2015

Thanks @sgiehl. Created PRs there.

This issue was closed.
7 participants