The page warns that “ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.” What if I deploy Nepenthes in a subdirectory that is blocked by robots.txt? That should keep well-behaved bots (Google, Bing, etc., presumably) out of the trap, while catching stupid or aggressive bots that ignore the robots.txt directives, no?
This is precisely what the author is doing: there is a
    Disallow: /dadadodo/dadadodo.cgi
line in its robots.txt.
Wait, are we really talking about the same software? Because I was talking about the “Nepenthes” software (from https://zadzmo.org/code/nepenthes/); but the robots.txt you linked to is for the “DadaDodo” software (from https://www.jwz.org/dadadodo/), by JWZ.
Confusingly, both projects were recently on the first page of lobste.rs – maybe there is a growing urge for these tools :-) or maybe people just got reminded again that this exists.
Anyway, I was just wondering why the Nepenthes author did not include the hint about robots.txt (like JWZ kinda did, at https://www.jwz.org/blog/2025/01/exterminate-all-rational-ai-scrapers/).
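For anyone wanting to try it, the hint boils down to a couple of lines. A minimal robots.txt sketch, assuming you mount Nepenthes under a /nepenthes/ prefix (the path is just an example; it can live anywhere):
    User-agent: *
    Disallow: /nepenthes/
Compliant crawlers will skip the trap entirely, and only bots that ignore robots.txt should wander into it.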
Ah sorry. I was confused since I saw the other story (potentially on the orange site; couldn’t really remember anymore :)
This is the question that I came here to see an answer for …
Also, does anyone know if Google (etc.) actually respect the robots.txt directives?
I think most search engines are very compliant. Google has the Search Console interface for website owners to debug why a page is indexed or not, and you can see when it’s because of robots.txt, for example. It’s the AI data mining that’s the problem.
Maybe I will try this out on my site and report back with the results.
Wow, it has somehow been 9 years since I wrote https://github.com/earthboundkid/heffalump
lovely lack of go.mod :)
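For anyone who hasn’t seen how these tarpits work, the core trick is tiny. A rough sketch in Go of the general shape (not heffalump’s actual code; the vocabulary and endpoint are made up): a handler that drips endless junk text to whoever connects.

package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// A toy vocabulary; real tools derive theirs from a corpus
// (e.g. a Markov model over some public-domain text).
var words = []string{"the", "heffalump", "wandered", "deeper", "into", "an", "endless", "forest", "of", "links"}

func tarpit(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		return
	}
	w.Header().Set("Content-Type", "text/plain")
	for {
		// Emit a few random words, then flush so the client sees a trickle.
		for i := 0; i < 5; i++ {
			fmt.Fprintf(w, "%s ", words[rand.Intn(len(words))])
		}
		flusher.Flush()
		// The slow drip is the point: it ties up the scraper's
		// connection for almost no cost on our side.
		select {
		case <-r.Context().Done():
			return // the client gave up
		case <-time.After(2 * time.Second):
		}
	}
}

func main() {
	http.HandleFunc("/trap/", tarpit)
	http.ListenAndServe(":8080", nil)
}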
I’m not sure if burning my CPU cycles to warm the planet in order to force someone else to burn CPU cycles to warm the planet is exactly what I want to be deploying…
But it’s very neat from a technical point of view!
I saw a purely static one called Quixotic. It obviously doesn’t do the slow generation thing, but you can at least partially simulate that with server configuration, probably.
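If anyone wants to try the static route, the throttling part of that server configuration might look roughly like this nginx sketch (path and rates are just examples; limit_rate applies per connection):
    location /quixotic/ {
        root /var/www;
        # Serve the pre-generated maze, but trickle it out slowly so each
        # request still ties up the scraper's connection for a while.
        limit_rate_after 1k;  # first kilobyte at full speed
        limit_rate 512;       # then 512 bytes per second
    }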
The software itself is called Nepenthes, after the carnivorous plant that traps flies, mosquito larvae, bugs, spiders, ants, and such.
Oh, yes, thank you. My mistake entirely for the previously incorrect title.
Cool idea. Reminds me of a very old, defunct piece of software also called Nepenthes that emulated vulnerable operating systems to catch malware in the act.
afaict it is the same thing. https://sourceforge.net/projects/nepenthes/
update: it is not the same thing.
Thanks for clarifying; it is indeed just an unfortunate name clash.
Similar recent submission: https://lobste.rs/s/s9yq5a/block_ai_scrapers_with_anubis
Similar recent submission: https://lobste.rs/s/nzxgvf/exterminate_all_rational_ai_scrapers