Better handling of the spammers list #7674
Comments
We need some UI in Piwik to:
It's a more technical step, but Piwik must support regex in these URLs. At least, Piwik on
And please, make it "low tech" so people with servers that can't reach the internet can "copy-paste & save" the content of a file to upgrade the list...
So it seems the easiest way from a user standpoint will be to keep the spammer list in an external file that is updated on a daily basis.
I guess it would make sense to have an (auto-updated) "general list" and something user/installation based, so one could add specific entries only for one's own instance of Piwik without having to wait for the general list to be updated.
The idea of a custom user list seems good, but I don't think it's beneficial for everybody in the long run: spammers are spammers for everybody. If people do not report them because they can flag them in their own list, then the interest in the global list is gone. We should maybe approach the problem the other way: when admins report spammers, they are added to the custom list. That way they don't have to wait for the spammer to be added to the official list, but users will still report spammers rather than simply build a custom list. However, we may want to start with a simpler goal at first (one where there is no UI to report spammers, and no way to have a custom list). @gaumondp The list should be updated on each Piwik update, like now; I'm not sure letting users manually update the list is that necessary. That should be enough for a start for those installs that don't have internet access, especially since those might not be the target of spammers (since they don't have internet access).
I like the idea that reporting spammers to the global list also adds them to the custom list ... it prevents bad reporting behaviour, as you say.
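A minimal sketch of the two-list idea discussed above, assuming spammers are identified by referrer hostname: reporting a host adds it to the installation's custom list right away, and a referrer counts as spam if its host is on either list. The class name `SpammerLists` and its methods are hypothetical and are not Piwik's API.

```python
from urllib.parse import urlparse


class SpammerLists:
    """Hypothetical holder for a shared global list plus a per-install custom list."""

    def __init__(self, global_hosts):
        self.global_hosts = {h.lower() for h in global_hosts}
        self.custom_hosts = set()

    def report(self, host):
        # Reporting takes effect locally at once; forwarding the report to
        # the maintainers of the global list is out of scope for this sketch.
        self.custom_hosts.add(host.lower())

    def is_spam(self, referrer_url):
        # urlparse only finds a hostname when the URL carries a scheme.
        host = (urlparse(referrer_url).hostname or "").lower()
        return host in self.global_hosts or host in self.custom_hosts
```

With this shape, an admin who reports a host sees it filtered immediately, while the global list still benefits from the report.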
@mnapoli I don't want to make my case the rule, but I know a few people with big installations and very rigid environments/infrastructure who can't keep up with Piwik's fast release cycle. In fact, I usually update 4 times a year, so we're often 3 releases behind at update time. I don't think I'm alone though.
@gaumondp and those setups cannot use auto-update of the list?
Exactly: no spammer-list auto-update possible, no one-click GeoIP updating, no easy install for stuff at http://plugins.piwik.org/ ... And considering the size (the DB is at 22 GB here right now), no Piwik update through the Web interface is possible. We use the CLI for that.
@gaumondp that's why it should be a simple external txt file with the spammer list in it, which can be updated easily through the CLI.
@gaumondp OK then we can document how to update the updated list, i.e. there will be 2 files:
That doesn't require any additional effort and should address all use cases. Then once that is done we can discuss how to let users update manually through the UI, if that's really necessary.
@mnapoli, I'm just giving information and use cases about a few environments and scenarios I know about that maybe you don't see often. I'm not "requiring" stuff. :) I'm just good at being devil's advocate. I'm not sure about saving the list in the /tmp/ directory though. In my view, everything in /tmp/ should be something that gets auto-generated again once the directory is emptied. Tell me if I'm wrong about this! But you know Piwik internals better than me for sure, so I trust you about where to store such a file.
Listing the different use cases is appreciated; I for sure don't have a clear overview of all of them. In that case there is no additional effort so I don't see any issue ;) Regarding the directory, maybe somebody else can chime in on this, but I'm afraid we need a folder with write access.
Maybe a new table in Piwik, but "feedable" from the text file or from a future Web interface with a simple "each line is a spammer" format so it's copy-paste enabled? Or, if the GeoIP database is in /misc/, maybe it makes sense to use that directory instead of /tmp/ if you don't want an additional table in the DB?
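To illustrate the "each line is a spammer" format suggested here, a small sketch of how pasted text could be normalised before being stored. The function name `parse_spammer_list` is made up for this example, and lower-casing plus de-duplication are assumptions about what a sensible import would do, not something the thread specifies.

```python
def parse_spammer_list(text):
    """Return unique hosts from text with one entry per line, in first-seen order."""
    seen = set()
    hosts = []
    for line in text.splitlines():
        host = line.strip().lower()
        if not host or host in seen:
            continue  # skip blank lines and duplicates
        seen.add(host)
        hosts.append(host)
    return hosts
```

Because `splitlines()` accepts both Unix and Windows line endings, the same function would work for a file dropped on the server and for text pasted into a form.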
For the record, the new list is here: https://github.com/piwik/referrer-spam-blacklist
Should the improved handling also discount spam visits retroactively?
@openjck no, it will not remove referrer spammers from historical data
Is there a way/command to remove referrer spam from historical data? (Maybe a "rebuild"?)
Since we will have a long dev cycle for 3.0.0, I reckon we need to provide users a solution to have constantly auto-updated spammers and really leverage our referrer spammer list. Moving to 2.14.0
👍 makes sense
Note: we can't easily store the file on disk (not ideal to store in |
@mattab doing so prevents users from updating manually (in environments without internet access) |
@mnapoli I guess one possibility for environments without internet access would be to update the DB from a temporary file on disk (tmp/)? So flushing tmp/ wouldn't be an issue.
@futureweb it would require more effort to implement, and would be less practical to use (requires SSH access, or requires logging in to Piwik instead of just dropping a file through FTP), but that's still better than nothing so I guess we could do that.
The blacklist is updated weekly from github (stored in the option table).
PR: #8186 |
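As a rough illustration of a weekly refresh backed by a key-value store, the sketch below uses a plain dict in place of the option table and an arbitrary callable in place of the download. The option names, the `refresh_if_stale` function, and the error handling are all assumptions made for this example; the actual implementation is in the PR referenced above.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


def refresh_if_stale(options, fetch, now=None):
    """Refresh the stored list when the last update is at least a week old.

    `options` is a plain dict standing in for a key-value option table;
    `fetch` is any callable returning the list text. Both are assumptions.
    """
    now = time.time() if now is None else now
    last = options.get("spammers_last_update")
    if last is not None and now - last < WEEK_SECONDS:
        return False  # stored copy is still fresh
    try:
        text = fetch()
    except OSError:
        return False  # no network access: keep whatever is already stored
    options["spammers_list"] = text
    options["spammers_last_update"] = now
    return True
```

Keeping the previous copy when the fetch fails means installs without internet access simply continue with the list they already have.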
In #5099 we have many people regularly report new spammers (which spam Piwik with fake visits). We need to find a more scalable solution as it's beginning to become a real problem.
Goals:
Ideas:
tmp/
every day or week -> the version in tmp/ would override the one installed in vendor/
I'm not too sure yet about the Packagist part (it's not a PHP package, and it would require running composer update before releases), but using submodules is definitely a no-go…
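The override idea from the list above (a periodically refreshed copy in tmp/ taking precedence over the one shipped in vendor/) could look roughly like the sketch below. The file name `spammers.txt` and the function `resolve_list_file` are hypothetical; only the tmp/ and vendor/ roles come from the discussion.

```python
from pathlib import Path


def resolve_list_file(tmp_dir, vendor_dir, filename="spammers.txt"):
    """Prefer the updated copy in tmp_dir; fall back to the bundled one."""
    updated = Path(tmp_dir) / filename
    if updated.is_file():
        return updated
    # tmp/ was flushed or never populated: use the copy shipped with the release.
    return Path(vendor_dir) / filename
```

One consequence of this design is that emptying tmp/ is harmless: the installation falls back to the bundled list until the next refresh.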