1. 35

  2. 18

    Neat idea. I’m not sure this is a captcha, but rather just a rate limiter.

    1. 13

      So much this. A proof-of-work scheme will up the ante, but not the way you think. People need to be able to do the work on the cheap (unless you want to put mobile users at a significant disadvantage) and malware/spammers can outscale you significantly.

      Ever heard of parasitic computing? TL;DR: it’s what kickstarted Monero. Any website (or an ad in that website) can run arbitrary code on the device of every visitor. You can even shard the work and keep it relatively low-profile if you have the scale. Even if pre-computing is hard, with ad networks and live action during page views an attacker can get challenges solved just-in-time.

      1. 9

        The way I look at it, it’s meant to defeat crawlers and spam bots: they attempt to cover the whole internet and want to spend 99% of their time parsing and/or spamming, but if this got popular enough to prompt bot authors to take the time to actually implement WASM/WebWorkers or a custom Scrypt shim for it, they might still end up spending 99% of their time hashing instead.

        Something tells me they will probably give up and start knocking on the next door down the lane. And if I can force bot authors to invest in a $1M USD+/year black hat “distributed computing” project so they can more effectively spam Cialis and Michael Kors handbag ads, maybe that’s a good thing? I never made $1M a year in my life, and probably never will, but I would be glad to be able to generate that much value.

        If it comes down to a targeted attack on a specific site, captchas can already be defeated by captcha farm services or various other exploits (https://twitter.com/FGRibreau/status/1080810518493966337). Defeating that kind of targeted attack is a whole different problem domain.

        This is just an alternate approach that puts the thumbscrews on the bot authors in a different way, without requiring the user to read, stop and think, submit to surveillance, or even click on anything.
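        To make the economics above concrete, here’s a minimal sketch (in Python, using hashlib’s Scrypt) of the find-a-nonce style of PoW captcha being discussed: the client grinds nonces until the hash clears a leading-zero-bits target, while the server verifies with a single hash. The challenge format and parameter choices here are illustrative assumptions, not the project’s actual protocol.

```python
import hashlib
import secrets

def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a digest."""
    as_int = int.from_bytes(digest, "big")
    return len(digest) * 8 - as_int.bit_length()

def pow_hash(challenge: bytes, nonce: int) -> bytes:
    # Memory-hard parameters matching the ones discussed in this thread
    # (N=4096, r=8); the exact challenge/nonce encoding is a guess.
    return hashlib.scrypt(str(nonce).encode(), salt=challenge,
                          n=4096, r=8, p=1, dklen=16)

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: grind nonces until the hash clears the difficulty bar."""
    nonce = 0
    while leading_zero_bits(pow_hash(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: checking a claimed solution costs a single hash."""
    return leading_zero_bits(pow_hash(challenge, nonce)) >= difficulty

challenge = secrets.token_bytes(16)
nonce = solve(challenge, 4)   # ~2**4 = 16 hashes on average
assert verify(challenge, nonce, 4)
```

        The asymmetry is the point: the honest visitor pays a fraction of a second once, while a bot hitting every form on the internet pays it millions of times.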

        1. 9

          This sounds very much like greytrapping. I first saw this in OpenBSD’s spamd: the first time you got an SMTP connection from an IP address, it would reply with a TCP window size of 1, one byte per second, with a temporary failure error message. The process doing this reply consumed almost no resources. If the connecting application tried again in a sensible amount of time then it would be allowed to talk to the real mail server.

          When this was first introduced, it blocked around 95% of spam. Spammers were using single-threaded processes to send mail and so it also tied each one up for a minute or so, reducing the total amount of spam in the world. Then two things happened. The first was that spammers moved to non-blocking spam-sending things so that their sending load was as small as the server’s. The second was that they started retrying failed addresses. These days, greytrapping does almost nothing.
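          For readers unfamiliar with the technique, the tempfail-then-retry logic described above can be sketched roughly like this (hypothetical names; a simplification, not the actual spamd implementation):

```python
import time

class Greylister:
    """Tempfail the first delivery attempt from an unseen
    (ip, sender, recipient) triple; accept retries that arrive
    after a minimum delay, the way legitimate MTAs do."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay
        self.first_seen: dict = {}   # triple -> first-attempt timestamp

    def check(self, ip: str, sender: str, rcpt: str, now: float = None) -> str:
        now = time.time() if now is None else now
        key = (ip, sender, rcpt)
        if key not in self.first_seen:
            self.first_seen[key] = now
            return "450 greylisted, try again later"   # temporary failure
        if now - self.first_seen[key] >= self.min_delay:
            return "250 ok"   # a real MTA retried patiently: let it through
        return "450 greylisted, try again later"

g = Greylister(min_delay=300)
g.check("203.0.113.7", "a@x", "b@y", now=0)     # first attempt: tempfail
g.check("203.0.113.7", "a@x", "b@y", now=400)   # patient retry: accepted
```

          Single-shot spam cannons never come back for the retry, which is why this worked so well until spammers learned to retry.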

          The problem with any proof-of-work CAPTCHA system is that it’s asymmetric. CPU time on botnets is vastly cheaper than CPU time purchased legitimately. Last time I looked, it was a few cents per compromised machine and then as many cycles as you can spend before you get caught and the victim removes your malware. A machine in a botnet (especially one with an otherwise-idle GPU) can do a lot of hash calculations or whatever in the background.

          Something tells me they will probably give up and start knocking on the next door down the lane. And if I can force bot authors to invest in a $1M USD+/year black hat “distributed computing” project so they can more effectively spam Cialis and Michael Kors handbag ads, maybe that’s a good thing?

          It’s a lot less than $1M/year that they spend. All you’re really doing is pushing up the electricity consumption of folks with compromised computers. You’re also pushing up the energy consumption of legitimate users as well. It’s pretty easy to show that this will result in a net increase in greenhouse gas emissions, it’s much harder to show that it will result in a net decrease in spam.

          1. 2

            These days, greytrapping does almost nothing.

            postgrey easily kills at least half the spam coming to my box and saves me tonnes of CPU time

            1. 1

              The problem with any proof-of-work CAPTCHA system is that it’s asymmetric. [botnets hash at least 1000x faster than the legitimate user]

              Asymmetry is also the reason why it does work! Users probably have at least 1000x more patience than a typical spambot.

              I have no idea what the numbers shake out to / which is the dominant factor, and I don’t really care; the point is that I can still make the spammers’ lives hell and get the results I want right now (humans only past this point), even though I’m not willing to let Google/Cloudflare fingerprint all my users.

              If botnets solving captchas ever becomes a problem, wouldn’t that be kind of a good sign? It would mean the centralized “big tech” panopticons are losing traction. Folks are moving to a more distributed internet again. I’d be happy to step into that world and work forward from there 😊.

            2. 5

              captchas can already be defeated by […] or various other exploits (https://twitter.com/FGRibreau/status/1080810518493966337)

              An earlier version of google’s captcha was automated in a similar fashion: they scraped the images and did a google reverse image search on them!

              1. 3

                I can’t find a link to a reference, but I recall a conversation with my advisor in grad school about the idea of “postage” on email where for each message sent to a server a proof of work would need to be done. Similar idea of reducing spam. It might be something in the literature worth looking into.

                1. 3

                  There’s Hashcash, but there are probably other systems as well. The idea is that you add an X-Hashcash header with a comparatively expensive hash of the content and some headers, making bulk emails computationally expensive.

                  It never really caught on; I used it for a while years ago, but I haven’t received an email with this header since 2007 (I just checked). According to the Wikipedia page it’s used in Bitcoin nowadays, but it started out as an email thing. Kind of ironic, really.
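                  A rough sketch of the Hashcash idea: the sender grinds a counter until the SHA-1 of the stamp has the advertised number of leading zero bits, and the receiver verifies with one hash. The stamp fields here are simplified relative to the real X-Hashcash format (which carries a proper date and random salt):

```python
import hashlib
from itertools import count

def mint(resource: str, bits: int) -> str:
    """Sender side: grind the counter until SHA-1(stamp) has
    `bits` leading zero bits. Simplified stamp fields."""
    for counter in count():
        stamp = f"1:{bits}:210101:{resource}::abcd:{counter}"
        digest = hashlib.sha1(stamp.encode()).digest()
        if int.from_bytes(digest, "big") >> (160 - bits) == 0:
            return stamp

def valid(stamp: str, bits: int) -> bool:
    """Receiver side: one hash verifies the whole stamp."""
    digest = hashlib.sha1(stamp.encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - bits) == 0

# 12 bits (~4096 hashes on average) keeps this demo fast; the
# original hashcash default was 20 bits.
stamp = mint("alice@example.com", 12)
assert valid(stamp, 12)
```

                  Minting scales linearly with the number of recipients, which is exactly the property that makes bulk mail expensive and one-to-one mail essentially free.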

                  1. 1

                    “Internet Mail 2000” from Daniel J. Bernstein? https://en.m.wikipedia.org/wiki/Internet_Mail_2000

                2. 2

                  That is why we can’t have nice things… It is really heartbreaking how almost every technological advance can and will be turned to something evil.

                  1. 1

                    The downsides of a global economy for everything :-(

                3. 3

                  Captchas are essentially rate limiters too, given enough determination from abusers.

                  1. 4

                    Maybe. The distinction I would draw is that a captcha attempts to assert that the user is human, whereas this scheme does not.

                    1. 2

                      I mean, objectively, yes. But, since spammers are automating passing the “human test” captchas, what is the value of that assertion? Our “human test” captchas come at the cost of impeding actual humans, and are failing to protect us from the sophisticated spammers, anyway. This proposed solution is better for humans, and will still prevent less sophisticated attackers.

                      If it can keep me from being frustrated that there are 4 pixels on the top-left tile that happen to actually be part of the traffic light, then by all means, sign me the hell up!

                4. 4

                  Looks like more punishment for people who aren’t buying the latest hardware.

                  1. 4

                    My first thought was that it would be interesting to have this sort of thing as an email spam filter; then I remembered that Hashcash is a thing and, upon inspection, it does basically exactly this. Plus it can solve the mailing-list problem (there are valid use cases for sending majillions of emails) by having the client whitelist the sending email address/server; the mailing-list server can just refuse to send an email to a client requiring proof of work, or can do the proof of work once to send an email to the user asking them to whitelist it.

                    Anyone know why this sort of system hasn’t caught on? Just lack of support?

                    1. 1

                      I think so. Similar to the reason why we still have to support STARTTLS for email. Email is simply impossible to upgrade with breaking changes because of the proliferation of diverse, out-of-date email servers and the risk of messages disappearing into the ether once you start fully committing to deviation from the lowest common denominator.

                      It could also be that a PoW requirement on email would be lobbied out of existence by the MailGuns and MailChimps of the world, as it would disproportionately impact them.

                    2. 3

                      I like the proof-of-work idea, but it would be extremely annoying to have such a captcha without anything to fill out. Simply waiting would be too annoying; something that happens while you type, not so much (except for password-manager users like me). Another thought: would it make it better to use input events somehow? Isn’t it so that trusted input events can’t be faked by scripts inside the typical browsers?

                      1. 1

                        If you wish to experience it yourself, here’s a test showing the captcha being used as a bot deterrent in front of a media file: every time you navigate to this URL it will redirect you to a new captcha challenge with 5 bits of difficulty: https://picopublish.sequentialread.com/files/aniguns.png

                        The difficulty is tweakable. I think I used 8 bits of difficulty and specifically waited for one that took abnormally long when I was capturing the screencast I used as the GIF on the ReadMe page.
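                        As a back-of-envelope check on those difficulty settings: each extra bit doubles the expected number of hashes a visitor must compute.

```python
# A d-bit leading-zero target needs about 2**d hash attempts on
# average, so each extra bit doubles the expected work.
def expected_hashes(bits: int) -> int:
    return 2 ** bits

assert expected_hashes(5) == 32    # the 5-bit demo link above
assert expected_hashes(8) == 256   # the 8-bit screencast setting
```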

                        Isn’t it so that trusted input events can’t be faked by scripts inside the typical browsers?

                        Are you referring to the ways that Facebook attempts to prevent script-kiddie bots from interacting on their platform(s)? Yes, a simple version of such a thing may work as an effective heuristic to get rid of non-browser bots and simplistic browser-automation bots without being privacy-invasive. Maybe that’s a good idea for a feature of version 2 🙂

                        1. 2

                          It looks like this is tied to IP address + browser user agent. Once I load the above page once in my browser I can hit it as many times and as fast as I’d like with curl provided that I pass the same user-agent from my browser.

                          1. 1

                            Heh, did you read the code or find that out yourself? I guess I’m impressed either way :P

                            Yes, that’s how I set it up for this particular “picopublish” app, independent of the PoW Captcha project. If you want to see the bot deterrent over and over, you have re-navigate to the original link without the random token, or else change your UA/IP. I got the idea from the way GoatCounter counts unique visits.
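                            A guess at how that IP + User-Agent keying might look (this is not picopublish’s actual code, and a real server would only mark the key once the challenge is actually solved):

```python
import hashlib

seen = set()   # in-memory; a real server would expire these

def visitor_key(ip: str, user_agent: str) -> str:
    """Key visitors by a hash of IP + User-Agent, in the spirit
    of GoatCounter-style unique-visit counting."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()

def needs_challenge(ip: str, user_agent: str) -> bool:
    key = visitor_key(ip, user_agent)
    if key in seen:
        return False   # this (IP, UA) already passed: skip the captcha
    seen.add(key)
    return True        # first visit: serve the PoW challenge

assert needs_challenge("198.51.100.2", "curl/8.0")       # first visit
assert not needs_challenge("198.51.100.2", "curl/8.0")   # same IP + UA
assert needs_challenge("198.51.100.2", "Mozilla/5.0")    # new UA, new key
```

                            Which is exactly why replaying the browser’s User-Agent from curl sails through: the key collides with the one the browser already established.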

                            1. 2

                              I fiddled with it in the browser. I do some web scraping, so I’m pretty familiar with the process of peeling away one option or feature at a time from HTTP requests until the server finally denies the request.

                          2. 1

                            Ah yes, I see; the five-bit version is absolutely bearable. Not much longer than an extensive page-change animation. If this is enough to keep bots out (I guess it is), I would totally go for it.

                        2. 3

                          This idea could be extended by using an actual cryptocurrency PoW (or mining-pool PoW) and using it as a captcha AND a revenue stream for your users. You could provide easier challenges that are solved in a few seconds, and every once in a while you might find a solution to a harder challenge that yields actual currency.

                          1. 2

                            I thought about this, and I decided I wanted the opposite. My reasons were:

                            • complexity. I want the captcha to be as simple as possible, both to set up and to use.
                            • ease-of-abuse. If I use a common scheme that has real hash power behind it, the “captcha solving botnet” that other folks posted about could probably be replaced by a single retired ASIC.

                            So I specifically chose Scrypt parameters that are very different from those supported by Scrypt ASICs designed to mine Litecoin.

                            I have heard that Monero mining can’t really be optimized much, is that true? I don’t know much about it. I suppose if there is a mining scheme out there that truly resisted being GPU’d or ASIC’d this could be possible. I wonder if the app would eventually get flagged as malicious by Google Safe Browsing because it’s running a known Monero miner script in the user’s browser XD

                            1. 3

                              I feel that if the goal is to keep bots out, you are kind of out of luck, because the computing power of anyone running a bot will be overwhelmingly greater than that of any of your human users. A captcha is only good for sending away automated, generic bots or tools. Anyone that really wants to scrape your site won’t be stopped by any captcha. So if we agree on that, the algorithm used should not really matter, even if specific hardware already exists for the PoW.

                              As for Safe Browsing, I don’t think that would be an issue, since you are mining from the website, not from an extension or ads. Safe Browsing should only flag websites that distribute malware/unwanted-software executables and phishing.

                              1. 1

                                Anyone that really wants to […] won’t be stopped

                                Exactly, this is a drive-by-bot / scattershot deterrent. Agreed that when facing a targeted attack against a specific site, a different strategy is needed.

                                the algorithm used should not really matter, even if specific hardware already exists for the PoW.

                                For SHA256, I can buy a USB ASIC for $200 that can hash faster than 2 million CPUs. I think that’s a meaningful difference. Much more meaningful than the difference between one user and a botnet, probably even more meaningful than the difference between a user’s patience and a bot’s patience.

                                AFAIK, Litecoin uses a slightly modified version of Scrypt, and its “CPU and Memory Cost” (N) / “Block Size” (r) parameters are set quite low. This means that Scrypt ASICs designed for Litecoin can’t execute the hash with larger N and r parameters like the ones which would be used for key-derivation (or in my case, anti-spam).

                                According to this 2014 paper, the hash rates of GPU Scrypt implementations fall off quickly as N and r are increased. In fact, for the exact parameters I use, N = 4096 and r = 8, they cite the 2014 GPU performing a measly 3x faster than the CPU in my 2019 laptop (see section 5.2.4). So for a modern expensive GPU, that might be something like 100-300x faster? I’m not sure, but it’s certainly different from 2 million times faster. I believe this was actually a design goal of Scrypt from the beginning: it’s intentionally hard to accelerate to insane hash rates.
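                                The memory argument can be made concrete: Scrypt’s scratch array takes roughly 128 · r · N bytes (ignoring p and constant overheads), so comparing Litecoin’s published parameters (N=1024, r=1) with the ones cited above:

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    # scrypt's big scratch array V holds N blocks of 128 * r bytes each
    return 128 * r * n

litecoin = scrypt_memory_bytes(1024, 1)   # N=1024, r=1  -> 128 KiB
captcha = scrypt_memory_bytes(4096, 8)    # N=4096, r=8  -> 4 MiB

assert litecoin == 128 * 1024
assert captcha == 4 * 1024 * 1024
assert captcha // litecoin == 32   # 32x the per-core RAM budget
```

                                An ASIC designed around Litecoin’s 128 KiB working set simply doesn’t have the on-die memory to run the 4 MiB variant, which is the whole point of picking the larger parameters.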

                                As an aside, I have a friend who took an alternate route, purely based on security through obscurity. They put a “write me a haiku” field on the registration page of their web app. It completely defeated every single bot. I opted for PoW instead of a pure obscurity-based approach because I wanted to show/argue that we can come up with bot deterrent strategies that truly scale: even if there are millions of sites using the same deterrent software, it should still be effective against folks who want to hit all 1 million of them. While I doubt my project will ever grow to that scale, I thought it was fun to try to design it to have that potential.

                                1. 1

                                  Exactly, this is a drive-by-bot / scattershot deterrent.

                                  Then why does it matter if using a known PoW allows some attacker to be 2 million times faster? You can expect that any targeted attack will be a few thousand times better than your average user, even with custom Scrypt parameters. So does it really matter whether an attacker is a few thousand or a million times faster? He’s probably done scraping or spamming your site by the end of the day either way. At least with the known PoW you might have made a few bucks.

                                  1. 2

                                    It’s because like I said, I primarily care about dragnet / “spam all forms on the internet” type stuff.

                                    The former is a privacy concern, the latter represents all spam I’ve ever had to deal with in my life… no one has ever “targeted” me or my site, it’s just stupid crawlers that post viagra ads on every single unsecured form they can find. I think that being targeted is actually very rare, and it happens for political/human social reasons, not as a way to make a profit.

                                    The weight of the PoW (from the crawler’s perspective) matters because if it’s extremely light (SHA256) they can simply accept it and hash it at scale for cheap. If it’s heavy (Scrypt with a fat memory-cost parameter) they literally can’t: it would cost an insane amount to solve gazillions of these per year. Even if they invest in the GPU farm, it will only make it hundreds of times faster, not millions. And if you have a GPU farm, can’t you make more money renting it to corporations for machine-learning model development anyway?

                                    Like others have mentioned, that cost can be driven down by botnets. But like I have argued in return, IMO that level of investment is unlikely to happen, and if it does, I’ll be pleasantly surprised.

                          2. 1

                          I wonder how the frontier of the GPL’s definition of a covered work applies here.