Auto-update the referrer spammer blacklist #8186
Do I understand correctly that if my Piwik server has no internet access, I just have to manually copy the list from https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt to /vendor/piwik/referrer-spam-blacklist/spammers.txt, then run ./console scheduled-tasks:run "Piwik\Plugins\CoreAdminHome\Tasks.updateSpammerBlacklist", and magic will happen? ;)
Not with this implementation: the auto-updated list is written to the database (and overrides the spammers.txt file). You could overwrite the file in vendor/, yes, but it would be overwritten on the next update (which should be fine, since a new Piwik release should ship the latest version of the list). In that case there is no command to run.
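A minimal sketch of the lookup order described above: the auto-updated list from the database wins, and the bundled file is only a fallback. It assumes a plain PHP array stands in for the option table and that the bundled file holds one host per line; the function name getSpammerList and the option name are hypothetical, not Piwik's actual API. The PR description says the list is stored serialized, hence unserialize().

```php
<?php
// Hypothetical sketch: an auto-updated list stored in the option table takes
// precedence over the list bundled in vendor/. $optionTable is a plain array
// standing in for the real option table; the option name is made up.
function getSpammerList(array $optionTable, string $bundledFile): array
{
    $optionName = 'referrer_spam_blacklist'; // hypothetical option name

    if (isset($optionTable[$optionName])) {
        // The updated list is stored serialized, so decode it (no objects allowed).
        $list = unserialize($optionTable[$optionName], ['allowed_classes' => false]);
        if (is_array($list)) {
            return $list;
        }
    }

    // Fall back to the list shipped with the release (one host per line).
    $lines = file($bundledFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}
```

This is also why copying a file into vendor/ has no effect once an updated list exists in the database: the file is never read in that case.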
Is it cached, or does each tracking request select this list from the piwik_option table?
Good point: currently it isn't cached. Maybe it should be, or is the cost negligible?
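One generic way to avoid a database query on every call is to memoize the list for the lifetime of the PHP process. This is a minimal sketch, assuming the loader callable stands in for the SELECT on the option table; the class name SpammerListCache is hypothetical. The PR itself ended up using the tracker's eager cache, so this only illustrates the idea, not Piwik's code.

```php
<?php
// Hypothetical per-process memoization: the loader (e.g. a SELECT on the
// option table) runs at most once per PHP process, however often get() is called.
final class SpammerListCache
{
    /** @var string[]|null */
    private static ?array $list = null;

    public static function get(callable $loader): array
    {
        if (self::$list === null) {
            self::$list = $loader();
        }
        return self::$list;
    }

    // Drop the memoized list, e.g. after the scheduled update task has stored a new one.
    public static function clear(): void
    {
        self::$list = null;
    }
}
```

A static variable only helps within one process; a tracker cache shared across requests avoids the query for each new request as well, which is why the thread moved in that direction.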
public function updateSpammerBlacklist()
{
    $url = 'https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt';
    $list = Http::sendHttpRequest($url, 10);
I would use a longer timeout, e.g. 30 seconds, to give slow servers enough time.
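The excerpt above downloads the raw spammers.txt body. A minimal sketch of turning that body into a clean list, assuming one host per line and that blank lines and duplicates should be dropped; parseSpammerList and SPAMMER_LIST_TIMEOUT are hypothetical names, and the 30-second value simply follows the suggestion above.

```php
<?php
// Suggested timeout for Http::sendHttpRequest($url, $timeout), in seconds.
const SPAMMER_LIST_TIMEOUT = 30;

// Hypothetical helper: turn the raw spammers.txt body (one host per line)
// into a list of unique, trimmed hosts.
function parseSpammerList(string $body): array
{
    $hosts = [];
    // \R matches any line ending (\n, \r\n, \r).
    foreach (preg_split('/\R/', $body) as $line) {
        $host = trim($line);
        if ($host !== '') {
            $hosts[] = $host;
        }
    }
    // An empty result most likely means a failed or truncated download; a caller
    // would probably keep the previously stored list rather than save this one.
    return array_values(array_unique($hosts));
}
```

A caller could then do something like $list = parseSpammerList(Http::sendHttpRequest($url, SPAMMER_LIST_TIMEOUT)); before storing the result.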
Currently, in Tracking API requests we only query … Ultimately, we will need performance regression tests to ensure we don't regress performance in the Tracking API or other key features of Piwik (#7889). +1 to add caching, and looks good to merge!
I've added a cache, but I was quite confused by the Tracker\Cache class. I ended up using
I guess
I was afraid that the eager cache would load the list (which can be huge, especially in the future) on every single Piwik request/process.
It already does, as the DeviceDetector uses the eager cache. See 2d2b8df |
👍 thanks for the link, I'll update to use the eager cache |
The blacklist is updated weekly from GitHub (stored in the option table).
…d of the lazy cache
6916c37 to 3b5beec
Auto-update the referrer spammer blacklist
Looks good 👍 It's awesome to know that all Piwik users on 2.14.0 or later will have an always up-to-date spam filter. This will make it much more efficient for all of us to fight referrer spammers. Anyone reading: feel free to join the fun at https://github.com/piwik/referrer-spam-blacklist/
Fixes #7674
Auto-update the list from piwik/referrer-spam-blacklist (full URL is https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt).
The up-to-date list is stored serialized in the option table. If it doesn't exist, the one in vendor/ is used.

I also added the possibility to run a specific scheduled task, which is pretty useful for testing it:
./console scheduled-tasks:run "Piwik\Plugins\CoreAdminHome\Tasks.updateSpammerBlacklist"
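For context, a rough sketch of how such a list can be applied to an incoming referrer. It assumes a case-insensitive match of the referrer's host against each blacklisted host, including subdomains; isReferrerSpam is a hypothetical name, and this is not necessarily how Piwik performs the check.

```php
<?php
// Hypothetical check: is the referrer's host a blacklisted host or one of
// its subdomains? Comparison is case-insensitive.
function isReferrerSpam(string $referrerUrl, array $spammerHosts): bool
{
    $host = parse_url($referrerUrl, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no usable host in the referrer
    }
    $host = strtolower($host);

    foreach ($spammerHosts as $spammer) {
        $spammer = strtolower($spammer);
        // Exact match, or the host ends with ".<spammer>" (a subdomain).
        if ($host === $spammer || substr($host, -strlen($spammer) - 1) === '.' . $spammer) {
            return true;
        }
    }
    return false;
}
```

Matching on the parsed host rather than a raw substring of the URL avoids flagging unrelated domains that merely contain a blacklisted name, such as notspam.example for spam.example.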