@mattab opened this Issue on December 23rd 2018 Member

Currently we're using Google Recaptcha on pages with a form, which leaks lots of data to Google.

For example on this page: https://matomo.org/contact/

-> It would be fantastic to find & use an open source, decentralised alternative to Google recaptcha on our Matomo.org website.

If anyone knows an alternative to Recaptcha that works, please let us know.

@fdellwing commented on December 24th 2018 Contributor

There are a lot of captcha libraries, but none of them provide the features reCaptcha does.

@Findus23 commented on December 24th 2018 Member

@fdellwing The only feature we need is not getting overwhelmed with spam :slightly_smiling_face:

Bonus points if it is accessibility-friendly.

@fdellwing commented on December 24th 2018 Contributor

As I said, I know no captcha that is nearly as user friendly as reCaptcha. So the best option would be to take some random image captcha (there are MANY) and just put a self-made database on top that recognises returning users.
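A minimal sketch of the "self-made database that recognises returning users" idea, assuming the site sets a random cookie once a visitor has solved a captcha and skips the captcha for that cookie for a while. The class name, the cookie-token approach and the 30-day expiry are hypothetical choices for illustration, not an existing plugin.

```python
import hashlib
import secrets
import time
from typing import Optional


class ReturningUserStore:
    """Hypothetical in-memory store of visitors who recently solved a captcha.

    Only a SHA-256 hash of a random cookie token is kept, so the store holds
    no IP addresses or browser fingerprints.
    """

    def __init__(self, ttl_seconds: float = 30 * 24 * 3600) -> None:
        self.ttl = ttl_seconds
        self._solved_at: dict[str, float] = {}  # token hash -> time of last solve

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue_token(self, now: Optional[float] = None) -> str:
        """Call after a successful captcha; the return value goes into a cookie."""
        token = secrets.token_urlsafe(32)
        self._solved_at[self._hash(token)] = time.time() if now is None else now
        return token

    def needs_captcha(self, token: Optional[str], now: Optional[float] = None) -> bool:
        """True unless the cookie token belongs to a recent, known solver."""
        if not token:
            return True
        now = time.time() if now is None else now
        solved_at = self._solved_at.get(self._hash(token))
        return solved_at is None or now - solved_at > self.ttl
```

Storing only a hash of a random token, rather than IPs or fingerprints, keeps this from becoming yet another tracking database; the trade-off is that visitors who clear their cookies simply see the captcha again.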

@Findus23 commented on December 27th 2018 Member

As I said, I know no captcha that is nearly as user friendly as reCaptcha

I really have to disagree. I regularly spend multiple minutes getting angrier and angrier as I am clicking through page after page arguing whether something can be considered a storefront when the captcha switches into extra-slow mode where every image takes a 5-second transition to load.
(I am not using a VPN or anything similar, just a regular internet connection)

I think a captcha doesn't need to be complex to stop most bots (after all, while Recaptcha is hard to circumvent, it only costs 0.2 cents to pay someone to solve it for you); it just needs to be different enough to stop automated bots programmed for popular WordPress forms.

I even think that a simple input field asking to enter the name of the open source project you are trying to contact (that maybe also allows common variants) would stop nearly all automated spam.
And I think the remaining ones (from what I see on the forum) are actual people pasting spam texts into the forms, and those can't be blocked with captchas.
@tsteur, would it be possible to add something like this to the forms without too much work?
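A rough sketch of how such a quiz field could be checked on the server while still accepting common variants. The accepted-answer list and the function names are hypothetical; a real WordPress plugin would do the equivalent in PHP.

```python
import unicodedata
from typing import Optional

# Hypothetical accepted answers to "Which open source project are you contacting?"
# ("piwik" is Matomo's former name, included as an example of a common variant.)
ACCEPTED_ANSWERS = {"matomo", "matomo.org", "matomo analytics", "piwik"}


def normalise(answer: str) -> str:
    """Case-fold, strip accents and collapse whitespace so small variants still match."""
    decomposed = unicodedata.normalize("NFKD", answer)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def quiz_passed(answer: Optional[str]) -> bool:
    """Fail closed: a missing or empty answer counts as a failed check."""
    if not answer:
        return False
    return normalise(answer) in ACCEPTED_ANSWERS
```

Normalising case, accents and whitespace covers most harmless variations, and treating a missing answer as a failure means a bot cannot skip the check by leaving the field out.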

@tsteur commented on December 27th 2018 Member

As long as there is a WordPress plugin for it, that should be fine. We wouldn't want to build anything ourselves. The plugin would ideally hook into the various places where it's needed and support Gravity Forms etc.

@Findus23 commented on December 27th 2018 Member

https://wordpress.org/plugins/humancaptcha/ seems to be pretty much what I described, but the plugin looks odd and only seems to integrate with comments.
Apart from that I could only find https://wordpress.org/plugins/humancaptcha/ which seamlessly integrates into login, registration, lost password, comments, bbPress and Contact Form 7.

I have never used Gravity Forms before, but it seems to have many features, and maybe one could make a required input field with its quiz feature. I'm not sure if that can be combined with the normal contact form.

@tsteur commented on December 27th 2018 Member

I did a quick search for "captcha gravity"; maybe https://wordpress.org/plugins/nomorecaptchas/ or https://wordpress.org/plugins/cleantalk-spam-protect/ would help? CleanTalk also seems to support WooCommerce. Not really sure how good they are, though.

I reckon something where people need to enter "Matomo" might sometimes be too complicated for some humans (it seems easy, but it may not always be clear what to enter), while at the same time someone who wants to spam us could easily get past it.

@Findus23 commented on December 27th 2018 Member

https://wordpress.org/plugins/nomorecaptchas/ or https://wordpress.org/plugins/cleantalk-spam-protect/

Both plugins work by sending the visitor behaviour data to the services' servers and analyzing it there. So I guess they are no better than ReCAPTCHA.

It's odd that there isn't a well-maintained open source plugin that just does basic local analysis.
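For illustration, "basic local analysis" could be as simple as the two checks sketched below, which run entirely on the site's own server: a hidden honeypot field that humans never fill in, and a signed timestamp that rejects forms submitted implausibly fast. The field names (`website`, `form_token`), the secret and the 3-second threshold are assumptions, not taken from any particular plugin.

```python
import hashlib
import hmac
import time
from typing import Mapping, Optional

SECRET = b"change-me"  # hypothetical server-side secret; load it from config in practice
MIN_SECONDS = 3  # assumption: humans rarely submit a contact form this quickly


def _sign(timestamp: str) -> str:
    return hmac.new(SECRET, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_form_token(now: Optional[float] = None) -> str:
    """Rendered into a hidden "form_token" field when the form is shown."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}.{_sign(ts)}"


def looks_like_bot(fields: Mapping[str, str], now: Optional[float] = None) -> bool:
    # 1. Honeypot: "website" is hidden with CSS, so humans leave it empty.
    if fields.get("website"):
        return True
    # 2. Timing: the token must be present, untampered and not suspiciously fresh.
    ts, _, sig = fields.get("form_token", "").partition(".")
    if not ts.isdigit() or not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return True
    now = time.time() if now is None else now
    return now - int(ts) < MIN_SECONDS
```

Neither check sends any visitor data to a third party, which is the main point compared with the services discussed above; they only stop generic bots, not targeted ones.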

someone wanting to spam us could easily achieve it.

Targeted attackers will probably always be able to afford the 0.2 cents it costs to reliably circumvent any type of captcha.

@mt-dave commented on May 7th 2019

I would think an alternative to reCAPTCHA would be some kind of service, something that solves the traditional reCAPTCHA issues like GDPR and accessibility while still providing a no-captcha experience.

I came across some solutions, and here is a quick summary.

Captcha providers can broadly be divided into two categories:

Captcha service providers: This option works well for enterprises with mission-critical sites that need protection against constantly evolving spam and bot threats. Some of the industry players in captcha services are:

reCAPTCHA: Free and one of the most widely used captcha services across the globe. They recently launched reCAPTCHA v3, which generates a risk score based on user behavior on the site, Google cookies, traffic history, etc. GDPR has been a major concern, considering what information it stores and uses for other Google products like Google Ads.

MTCaptcha: A captcha service focused more on enterprise needs. It provides a NoCaptcha alternative to reCAPTCHA, captcha account management, GDPR compliance and availability across the globe (China included). Its low-friction captcha capabilities are limited.

Solve Media captcha: An ad-driven captcha that uses advertisements as the captchas users solve. GDPR compliant, good-looking and customizable. It may not be a good idea to show advertisements on an enterprise site.

Captcha library providers: There are a lot of players in the captcha library space. If you are willing to set up and manage the code yourself, some of the options are:

BotDetect CAPTCHA: The most widely used captcha library, available in multiple languages. They license the library, which then needs to be implemented and managed.

KeyCAPTCHA - Innovative Anti-Spam Solution: A plugin-driven captcha covering a wide range of CMS systems. Mostly for CMS-driven sites; it needs self-hosting and management. The permutations available for captcha generation are limited.

@Findus23 commented on May 9th 2019 Member

I just came across https://www.phpcaptcha.org/ which seems to be the only local open source captcha solution that has a WordPress plugin: https://wordpress.org/plugins/securimage-wp/

But I don’t know how well it supports the forms used on matomo.org.

@Crypto-Loot commented on June 10th 2019

Hi there,
We offer a PoW (proof-of-work) based captcha system where a user must verify a captcha by mining a cryptocurrency for several seconds before proceeding to confirm the token. You can find more on our website: https://crypto-loot.org (you will have to log in to see the demo/code)
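The general idea of a proof-of-work captcha can be sketched with a hashcash-style puzzle: the server hands out a random challenge, the browser spends CPU time searching for a nonce whose SHA-256 hash starts with enough zero bits, and the server checks the result with a single hash. This is a simplified, swapped-in technique for illustration; it is not Crypto-Loot's mining-based implementation, and the function names and difficulty value are assumptions.

```python
import hashlib
import secrets


def new_challenge() -> str:
    """Server side: a random challenge sent along with the form."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of the digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side (normally JavaScript in the browser): brute-force a nonce.

    Each extra difficulty bit doubles the expected work (about 2**difficulty hashes).
    """
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

Note that this only makes each submission more expensive; it does not prove a human is present. A real deployment would also need to remember issued challenges so a solved nonce cannot be replayed.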

We are also doing a rebrand shortly along with a potential partner to help bring web mining into the white light for the industry.

Please feel free to let us know if you would like to work with us!
support@crypto-loot.org

@mattab commented on June 29th 2019 Member
@joekarns commented on September 19th 2019

One non-Google product you could use to better protect your login page (or any page of the site) is the free version of Cloudflare. I use "Page Rules" to put only my login page with the form on it into "under attack" mode in Cloudflare. That way, it scans any/all users who try to access that page of the site. It's not a perfect solution, but it should cut out most of the pure bots hitting that page. Hope that helps.

@Findus23 commented on September 19th 2019 Member

@joekarns Using Cloudflare might be even worse as it

  • allows one entity to intercept a huge fraction of the internet's traffic
  • cuts off a large fraction of internet users (e.g. Tor users)
  • still uses ReCaptcha (I think) to detect bots

@joekarns commented on September 19th 2019

Yes, fair points.

@mattab commented on January 30th 2020 Member

We're still actively looking for an alternative to Google recaptcha!

If you have any hints, we'd love to hear them!

@ara4n commented on February 9th 2020

We are too, over at https://github.com/vector-im/riot-web/issues/3606 (in the interests of sharing any discoveries). (Riot also uses Matomo for its analytics, fwiw :)

@raneq commented on February 13th 2020

What about:

  • Captcha code ?? It's up to date and looks clean to me. I don't know if it's effective though.

For the record:

  • Visual Captcha? It's not maintained, but it may just work. Correction: the WordPress plugin is not up to date.
  • Secure Image: It's not the prettiest, but that's just a matter of CSS. It's also not up to date, though.
  • Lepture Captcha: For Python projects, it looks like a good resource, even though the last commit is from Nov 2018.

It feels like interest in lightweight, effective captchas has dropped a lot. Thank you for not giving up on this.

@Findus23 commented on February 13th 2020 Member

@raneq
I am really not sure that those captchas which just use GD to print a random string onto an image and sprinkle a few dots or lines over it are really helpful.

  • They are all completely inaccessible, so a lot of people are completely prevented from submitting a form.
  • I really doubt that any but the most trivial bots would fail to read those images via OCR (especially as they use known font files)
  • most of them had their latest commit many years ago

For captcha-code-authentication specifically there seem to be multiple reviews mentioning that removing the captcha from the form circumvents it.
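One plausible way such a bypass happens (an assumption; the plugin's code was not reviewed here) is a server that only validates the captcha when the field is present. The sketch contrasts that pattern with a check that fails closed; the field name `captcha` and both function names are hypothetical.

```python
import hmac
from typing import Mapping, Optional


def captcha_ok_broken(fields: Mapping[str, str], expected: str) -> bool:
    # Anti-pattern: the answer is only checked when the field was submitted,
    # so deleting the captcha input from the form skips the check entirely.
    if "captcha" in fields:
        return fields["captcha"].strip().lower() == expected.lower()
    return True


def captcha_ok(fields: Mapping[str, str], expected: Optional[str]) -> bool:
    # Fail closed: no submitted answer, or no challenge stored server-side
    # for this session, means the check fails.
    submitted = fields.get("captcha")
    if submitted is None or expected is None:
        return False
    return hmac.compare_digest(submitted.strip().lower().encode(), expected.lower().encode())
```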

@mattab commented on March 4th 2020 Member

Btw, we could also self-host Google reCAPTCHA and proxy the requests. This would help people from China at least, and may limit some of the privacy implications? Using this: https://github.com/google/recaptcha

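The linked google/recaptcha library wraps the server-side step of verifying a widget token against Google's `siteverify` endpoint. That call could be routed through a proxy, while the widget's JavaScript would still have to be loaded from Google (Google's FAQ suggests `www.recaptcha.net` where `www.google.com` is not accessible). A minimal Python sketch of the verification step, with the HTTP call injectable so it can go through your own proxy; the function and parameter names are assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Google's verification endpoint (www.recaptcha.net is the documented
# alternative host where www.google.com is blocked).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _post(url: str, body: bytes) -> bytes:
    """Default transport; swap in a function that routes through your own proxy."""
    with urllib.request.urlopen(url, data=body, timeout=10) as resp:
        return resp.read()


def verify_recaptcha(
    token: str,
    secret: str,
    remote_ip: Optional[str] = None,
    post: Callable[[str, bytes], bytes] = _post,
    url: str = VERIFY_URL,
) -> bool:
    params = {"secret": secret, "response": token}
    if remote_ip:  # optional; leaving it out avoids forwarding the visitor's IP
        params["remoteip"] = remote_ip
    reply = json.loads(post(url, urllib.parse.urlencode(params).encode()))
    return reply.get("success") is True
```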

@Findus23 commented on March 4th 2020 Member

self-hosted the google recaptcha and proxy requests

That would solve the issue for Chinese users, but it might make privacy even worse, as it would be harder to block and might open new privacy law issues since users can't opt out anymore.

@Reechik8760 commented on March 23rd 2020

I'm also looking for a good captcha that protects users' privacy. One solution that doesn't work for me but might be ok for you is: https://www.hcaptcha.com/

Users are labeling data for free with hCaptcha, and we don't know what is being done with the labeled data. As a result I'm not using it.

@tsteur commented on March 29th 2020 Member

I just came across https://www.hcaptcha.com/ as well. It looks quite interesting and there is a WordPress plugin https://wordpress.org/plugins/hcaptcha-for-forms-and-more/

I suppose it's at least better than Google, but I didn't look into their terms or privacy policy.

@Findus23 commented on March 29th 2020 Member

Things I noticed with hcaptcha:

  • The captcha itself varies from obvious to impossible (three blurry, distorted images and nine equally incomprehensible images and somehow one has to find a connection between them)
  • The tasks loop through a very small set. If none of the few tasks available at the moment is doable, one can't submit.
  • Their solution for users who can't do visual tests is forcing them to create an account and share their personal data, which doesn't really feel appropriate (https://www.hcaptcha.com/accessibility)
  • The JS mentions that its license can be found at https://hcaptcha.com/license which is a 404
  • They have a privacy policy, but I think it is not linked anywhere (https://www.hcaptcha.com/privacy)
  • Update: It is linked in the captcha itself, but using a 9px font and using #cccccc text on #fafafa background (which is the lowest color contrast I have seen in a long time)
  • Children under 13 are banned from using the service which isn't really an issue but is a bit weird.

Weird quotes from the privacy policy:

Some of the information you provide us may constitute sensitive data as defined in the GDPR (also referred to as special categories of personal data), including identification of your race or ethnicity on government-issued identification documents.

please be aware that your personal data will be transferred to, processed, and stored in the United States. Data protection laws in the U.S. may be different from those in your country of residence. You consent to the transfer of your information, including personal information, to the U.S. as set forth in this Privacy Policy by visiting our site or using our service.

(I don't think that's how consent works)

So I think the major benefits over reCAPTCHA are:

  • it is not Google
  • you get a (potentially very tiny) fraction of the Ethereum tokens earned
  • it might not do any actual tracking to detect humans

@Reechik8760 commented on March 29th 2020

@Findus23 -- thank you very much for this great analysis. If I find any good open source solutions that protect people's privacy (or end up creating my own Captcha) I will be sure to post it.

@HawkLiking commented on April 3rd 2020

It's funny to read @Findus23 (good) analysis knowing that Cloudflare just started using hCaptcha...

But hCaptcha is as easily solvable as reCaptcha by services like anti-captcha.com (human automated solving), which support both of them (and many others). It takes less than 30 seconds to solve an hCaptcha/reCaptcha with their lib/API, for 0,0022€ per captcha... Don't even bother with picture-based captchas; they are even easier.
The fact is Google is doing NOTHING to block these services, so I asked hCaptcha and here is their answer:

Short answer is Google has never bothered to try and stop those users, but we break the captcha services on a regular basis.
We have a variety of strategies, but fundamentally if a human being is answering the question through anti-captcha then we'll detect that they're human. You end up in an arms race to detect that it's specifically a captcha service user, and they end up in an arms race trying to defeat your detection. This also means you can't just publish your detection results to everyone, otherwise their time-to-defeat will be much lower.

But hey, anti-captcha manage to bypass them successfully (last check: today) 🤷‍♂

So far I did not find any captcha which could not be solved by services like anti-captcha, or by public libraries, but I am very interested in finding one, so I will watch this topic !

@Jookia commented on April 4th 2020

Please don't use hCAPTCHA or other inaccessible CAPTCHAs.

@Findus23 commented on April 4th 2020 Member

@HawkLiking
Honestly, as much as I am here complaining about most solutions, being solvable by human-powered solving services isn't really an issue for me. The point of a CAPTCHA is to tell computers and humans apart (the CHA part), and a person paid to solve a CAPTCHA for someone else is definitely a human.
Solving this issue is even more complex, maybe impossible and out of scope of finding a ReCAPTCHA alternative.

@HawkLiking commented on April 6th 2020

@Jookia why is hCaptcha "inaccessible"?

@Jookia commented on April 6th 2020

Blind people can't use it without signing up to the service.
Deafblind people can't use it either.

@Tirion77 commented on April 13th 2020

I've got a solution that respects user privacy and removes bots like no other. Nobody owns the data at the end, unless the user decides to manually capture their data and then use it. It is a little experimental and will require some configuration and effort to implement.

@HawkLiking commented on April 14th 2020

I've got a solution that respects user privacy and removes bots like no other. Nobody owns the data at the end, unless the user decides to manually capture their data and then use it. It is a little experimental and will require some configuration and effort to implement.

@Tirion77 Ok, and what is this solution ? I am very curious!

@yolknet commented on April 21st 2020
* [Captcha code](https://github.com/wp-plugins/captcha-code-authentication) ?? It's up to date and looks clean to me. I don't know if it's effective though.

The contact form at the bottom of the page has a Google reCAPTCHA (v2). They don't trust their own work anymore I guess :-)

@jcalfee commented on May 6th 2020

So far, BotDetect CAPTCHA seems like the way to go for me. We have Node as a back-end, though, so I'm asking them if they are working on something for that. I don't trust the government, so I really like how they document the reCaptcha concerns. I wish it were an image-slide captcha, but I can't be too picky at this point.

@Tirion77 commented on May 7th 2020

Apologies for the late reply, everyone. I wasn't sure if I should share it because, as I said, the solution is highly experimental, and it only recently reached a state that made me confident enough to start sharing it.
Please look into the Idena network: https://idena.io/. It is a decentralized blockchain solution that derives digital identities, valid for approx. 2 weeks, from a captcha puzzle that the whole network solves at the same time (roughly every 2 weeks). Users of that network can then use that identity to log in to websites by connecting their account to a wallet.
It is still very early in development, but the identity and the sign-in is there already as of this week. This is definitely not a solution for the general population at this point, but your regulars might be interested in this over doing captcha every time they want to post/buy/etc.
Note that this involves 0 investment into its token, and the solution could be used solely based on the digital identity without having to worry about insane cryptocurrency value swings.

I'd like to reiterate that this is super new and early, and it could really change over the next year, or completely disappear. That said, the network has been growing 15% every 2 weeks or so, and the devs seem competent.

All code etc. is open source and on their GitHub. As a privacy geek, this piqued my interest.

@Jookia commented on May 7th 2020

It only works for people with eyes.

@Tirion77 commented on May 7th 2020

You are absolutely right. For now it is like that, although the developers are aware and are hoping to address this too. From their site:

  • How can people whose disabilities prevent them from completing a traditional flip validation session be validated?

  • For now, they can't. But Idena is designed as an open-source project, and we hope that there will be teams with specific expertise in this area who will be motivated to develop means for people with disabilities to get validated in the network, such as audio flips, for example.

Again, this is super early stage, so do your own research and look into it at your own risk.

@Jookia commented on May 7th 2020

I don't want to be a downer, but is it really worth bringing it up if it's unstable experimental technology that you can't even use now without an invite and a dedicated computer with the app?

@HawkLiking commented on May 7th 2020

Interesting
I tested your flip challenge here https://flips.idena.io/?pass=idena.io but I gave up (bored) after 3 challenges; these "stories" are maybe too complicated...

@Jookia commented on May 7th 2020

Wow, I tried one of those flip challenges and got one that implied a person shot a home intruder and the intruder was dead in a body bag. :\

Edit: I later got one that straight up showed actual dead people? It had a watermark for a Russian website.

@Findus23 commented on May 7th 2020 Member

We are getting a bit off-topic, but for completeness' sake I again want to give extensive feedback about this solution:

  • Wow, those were the 5 most stressful minutes I have had in quite a long time. I constantly felt like I was randomly guessing between two completely random image collections. The fact that the buttons start blinking after a while makes this quite an experience.
  • This is completely inaccessible to a huge fraction of the population due to being image-only. And even those images are very small with lots of details, so even I had to guess quite often what they should show.
  • Even worse, they are inaccessible to people from different cultures. So many of these "stories" depend on subtle cultural context clues that might be completely misunderstood by people not sharing the same culture as the creator.
  • What's wrong with the topics of these "stories"?!? I don't want to think about the implications of people dying, or even about living their lives in general, just to submit a comment on a website.
  • You seem to plan on allowing anyone to create "stories", which I suspect can only go wrong.
  • WTF?!?! I got 75.9% correct by applying the complex algorithm of always clicking on the "left" button. Where can I send my invoice over $5000 for developing this AI? (https://idena.io/?view=flip_challenge) Preferably in a real currency.
  • And I have not even reached the point of idena.io itself, which seems to be replacing a concept that I can explain to a time traveler who has never seen a PC before (there is a question, you type in an answer to it below) with something that even after reading for 10 minutes is only roughly understandable (what this has to do with a local client and Global universal basic income I can only guess)

So I honestly can't take this seriously as even an attempt of something that can be considered a CAPTCHA.

@k09i71 commented on May 25th 2020

@Findus23 -- thank you very much for this great analysis. If I find any good open source solutions that protect people's privacy (or end up creating my own Captcha) I will be sure to post it.

https://github.com/produck/svg-captcha
