Auto-update the referrer spammer blacklist #8186

Merged
merged 10 commits into from Jun 25, 2015

mnapoli
Contributor

@mnapoli mnapoli commented Jun 23, 2015

Fixes #7674

Auto-update the list from piwik/referrer-spam-blacklist (full URL is https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt).

The up-to-date list is stored serialized in the option table. If it doesn't exist, the one in vendor/ is used.

I also added the possibility to run a specific scheduled task, which is pretty useful to test it:

./console scheduled-tasks:run "Piwik\Plugins\CoreAdminHome\Tasks.updateSpammerBlacklist"
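The lookup order described above (serialized list in the option table first, bundled file in vendor/ as the fallback) could be sketched roughly as follows. This is a standalone illustration, not the code from this PR: the function name `loadSpammerList`, the option name `referrer_spam_blacklist`, and the plain array standing in for the option table are all assumptions.

```php
<?php
// Hypothetical sketch: names are illustrative, not Piwik's actual API.

/**
 * Returns the spammer list, preferring the serialized copy from the
 * option store and falling back to the bundled file.
 *
 * @param array  $options     stand-in for the option table (name => value)
 * @param string $bundledFile path to the spammers.txt shipped in vendor/
 * @return string[]
 */
function loadSpammerList(array $options, $bundledFile)
{
    $optionName = 'referrer_spam_blacklist'; // assumed option name

    if (!empty($options[$optionName])) {
        $list = @unserialize($options[$optionName]);
        if (is_array($list)) {
            return $list; // auto-updated copy wins over the bundled file
        }
    }

    // No usable stored copy: read the bundled list, one domain per line.
    $lines = file($bundledFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    return $lines === false ? array() : $lines;
}
```

With this shape, overwriting the bundled file only has an effect as long as no auto-updated copy has been stored.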

@mnapoli mnapoli added Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Needs Review PRs that need a code review labels Jun 23, 2015
@mnapoli mnapoli added this to the 2.14.0 milestone Jun 23, 2015
@gaumondp

Do I understand correctly that if my Piwik server can't access the internet, I just have to manually copy the list from https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt to /vendor/piwik/referrer-spam-blacklist/spammers.txt, then run ./console scheduled-tasks:run "Piwik\Plugins\CoreAdminHome\Tasks.updateSpammerBlacklist", and then magic will happen? ;)

@mnapoli
Contributor Author

mnapoli commented Jun 23, 2015

Not with this implementation: the auto-updated list is written in the database (and overrides the file spammers.txt).

You could overwrite the file in vendor, yes, but it would be replaced on update (which should be fine, since a new Piwik release should ship the latest version of the list). You don't need to run any command in that case.

@quba
Contributor

quba commented Jun 23, 2015

Is it cached, or does each tracking request select this list from the piwik_option table?

@mnapoli
Contributor Author

mnapoli commented Jun 23, 2015

Good point. Currently it isn't cached; maybe it should be, or is the cost negligible?

public function updateSpammerBlacklist()
{
$url = 'https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt';
$list = Http::sendHttpRequest($url, 10);
Member

I would use a longer timeout, e.g. 30 seconds, to give slow servers enough time.
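One way to picture the timeout suggestion is an update step that takes the timeout as a parameter. The sketch below is standalone and hypothetical: `updateSpammerList`, the injected `$fetch` callable, and the option name are assumptions so the example runs without Piwik. In the PR itself the download is done by `Http::sendHttpRequest($url, 10)`. The guard against an empty or failed response is also an assumption and is not discussed in this thread.

```php
<?php
// Hypothetical sketch: the fetcher is injected so the example runs
// without Piwik or network access.

/**
 * Downloads the list and stores it serialized, keeping the previous
 * copy when the download fails or comes back empty.
 *
 * @param callable $fetch   function ($url, $timeout) returning the body
 * @param array    $options stand-in for the option table, by reference
 * @param int      $timeout seconds to wait for the remote server
 * @return bool true when a new list was stored
 */
function updateSpammerList(callable $fetch, array &$options, $timeout = 30)
{
    $url = 'https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt';

    try {
        $body = $fetch($url, $timeout);
    } catch (Exception $e) {
        return false; // network error or timeout: keep the previous list
    }

    // Split on any newline style and drop empty lines.
    $list = preg_split('/\R/', trim((string) $body), -1, PREG_SPLIT_NO_EMPTY);
    if (empty($list)) {
        return false; // empty response: do not overwrite a good list
    }

    $options['referrer_spam_blacklist'] = serialize($list); // assumed option name

    return true;
}
```

Since the task runs in the background rather than during tracking, a more generous timeout should cost little.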

@mattab
Member

mattab commented Jun 23, 2015

> Good point. Currently it isn't cached; maybe it should be, or is the cost negligible?

Currently, Tracking API requests only query the log_* tables and no other table. The round trip to MySQL and the query runtime are not negligible when you get e.g. 500 requests per second. DB performance would be affected, and great Tracking API performance is a big part of what makes Piwik scalable. During tracking, we cache all information from the DB in the tracker cache.

Ultimately we will need to have performance regression tests to ensure we don't regress performance in Tracking API or other key features of Piwik #7889

+1 to add caching and looks good to merge!
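The caching concern above boils down to a read-through pattern: look in the cache first and only hit the database on a miss. The sketch below is standalone and hypothetical. `ArrayCache`, `getCachedSpammerList`, and the cache key are stand-ins, and the in-memory array only lives for one process, whereas a real tracker cache would persist between requests.

```php
<?php
// Hypothetical sketch: ArrayCache and the $loader callable stand in for
// Piwik's cache and the option-table query.

class ArrayCache
{
    private $entries = array();

    public function contains($id)
    {
        return array_key_exists($id, $this->entries);
    }

    public function fetch($id)
    {
        return $this->entries[$id];
    }

    public function save($id, $value)
    {
        $this->entries[$id] = $value;
    }
}

/**
 * Returns the list from the cache, calling the (expensive) loader
 * only when the cache has no entry yet.
 */
function getCachedSpammerList(ArrayCache $cache, callable $loader)
{
    $id = 'referrer_spam_blacklist'; // assumed cache key

    if (!$cache->contains($id)) {
        // Cache miss: pay for the lookup once, then remember the result.
        $cache->save($id, $loader());
    }

    return $cache->fetch($id);
}
```

With this shape, repeated lookups cost one database query in total instead of one per tracking request.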

@mnapoli
Contributor Author

mnapoli commented Jun 24, 2015

I've added a cache, but I was quite confused by the Tracker\Cache class. I ended up using Piwik\Cache::getLazyCache(). Please advise if there's a better solution.

@sgiehl
Member

sgiehl commented Jun 24, 2015

I guess getEagerCache would be the right choice, as the lazyCache invalidates more often.
@tsteur correct?

@mnapoli
Contributor Author

mnapoli commented Jun 24, 2015

I was afraid that the eager cache would load the list (which can be huge, especially in the future) on every single Piwik request/process?

@sgiehl
Member

sgiehl commented Jun 24, 2015

It already does, as the DeviceDetector uses the eager cache. See 2d2b8df

@mnapoli
Contributor Author

mnapoli commented Jun 24, 2015

👍 thanks for the link, I'll update to use the eager cache

mattab pushed a commit that referenced this pull request Jun 25, 2015
Auto-update the referrer spammer blacklist
@mattab mattab merged commit b801a75 into master Jun 25, 2015
@mattab
Member

mattab commented Jun 25, 2015

Looks good 👍 It's awesome to know that all Piwik users on 2.14.0 or later will have an always up-to-date spam filter. This will make it much more efficient for all of us to fight referrer spammers.

Anyone reading: feel free to join the fun at: https://github.com/piwik/referrer-spam-blacklist/

@mattab mattab mentioned this pull request Jun 25, 2015
@mnapoli mnapoli deleted the spammer-list-update branch June 25, 2015 12:35