1. 74
  1.  

    1. 11

      The page warns that “ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.” What if I deploy Nepenthes in a subdirectory that is blocked by robots.txt? That should keep well-behaved bots (probably Google and Bing etc.) out of the trap, while catching stupid or aggressive bots that ignore the robots.txt directives, no?
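
      A minimal sketch of what a compliant crawler would conclude from such a rule, using Python's standard urllib.robotparser. The /trap/ path and the example.com URLs are hypothetical placeholders, not Nepenthes defaults, and this says nothing about how any real bot actually behaves:

      ```python
      from urllib.robotparser import RobotFileParser

      # Hypothetical robots.txt: keep every crawler out of the trap directory only.
      ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"


      def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
          """Return True if a crawler that honors robots_txt may fetch url."""
          parser = RobotFileParser()
          parser.parse(robots_txt.splitlines())
          return parser.can_fetch(user_agent, url)


      if __name__ == "__main__":
          # Assumed URLs, for illustration only.
          for url in ("https://example.com/blog/post", "https://example.com/trap/page1"):
              print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
      ```

      Bots that never consult robots.txt would still walk into /trap/, which is the point of the setup.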

      1. 3

        This is precisely what the author is doing: there is a Disallow: /dadadodo/dadadodo.cgi line in their robots.txt.

        1. 3

          Wait, are we really talking about the same software? Because I was talking about the “Nepenthes” software (from https://zadzmo.org/code/nepenthes/); but the robots.txt you linked to is based on the “DadaDodo” software (from https://www.jwz.org/dadadodo/), by JWZ.

          Confusingly, both projects were recently on the first page of lobste.rs – maybe there is a growing urge for these tools :-) or maybe people just got reminded again that this exists.


          Anyway, I was just wondering why the Nepenthes author did not include the hint about robots.txt (like JWZ kinda did, at https://www.jwz.org/blog/2025/01/exterminate-all-rational-ai-scrapers/).

          1. 1

            Ah, sorry. I was confused because I had seen the other story (possibly on the orange site; I can’t really remember anymore) :)

        2. 2

          This is the question that I came here to see an answer for …

          Also, does anyone know if Google (etc.) actually respect the robots.txt directives?

          1. 4

            I think most search engines are very compliant. Google has the Search Console interface for website owners to debug why a page is or isn’t indexed, and you can see when it’s because of robots.txt, for example. It’s the AI data mining that’s the problem.

          2. 2

            Maybe I will try this out on my site and report back with the results.

          3. 10

            Wow, it has somehow been 9 years since I wrote https://github.com/earthboundkid/heffalump

            1. 2

              lovely lack of go.mod :)

            2. 8

              I’m not sure if burning my CPU cycles to warm the planet in order to force someone else to burn CPU cycles to warm the planet is exactly what I want to be deploying…

              But it’s very neat from a technical point of view!

              1. 4

                I saw a purely static one called Quixotic. It obviously doesn’t do the slow generation thing, but you can at least partially simulate that with server configuration, probably.
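
                A static generator can’t stall on its own, but the delay can be approximated wherever the response is streamed. The sketch below is an application-level stand-in in Python rather than actual server configuration; drip, chunk_size and delay_s are made-up names, and it is not how Quixotic or Nepenthes work:

                ```python
                import time
                from typing import Callable, Iterator


                def drip(
                    body: bytes,
                    chunk_size: int = 16,
                    delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep,
                ) -> Iterator[bytes]:
                    """Yield body in small chunks, pausing before each one.

                    The sleep parameter is injectable so the pacing can be checked
                    without actually waiting.
                    """
                    for start in range(0, len(body), chunk_size):
                        sleep(delay_s)
                        yield body[start:start + chunk_size]


                if __name__ == "__main__":
                    # Hypothetical pre-generated page, sent a few bytes at a time.
                    page = b"<html><body>pre-generated nonsense</body></html>"
                    for chunk in drip(page, chunk_size=8, delay_s=0.1):
                        print(chunk)
                ```

                Because the generator yields bytes lazily, it could serve as the body iterable of a streaming response; the waiting costs almost no CPU on the serving side.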

              2. 7

                The software itself is called Nepenthes, after the carnivorous plant that traps flies, mosquito larvae, bugs, spiders, ants, and such.

                1. 1

                  Oh, yes, thank you. The previously incorrect title was entirely my mistake.

                2. 4

                  Cool idea. Reminds me of a very old, defunct piece of software also called Nepenthes that emulated vulnerable operating systems to catch malware in the act.

                    1. 7

                      update: it is not the same thing.

                      1. 3

                        Thanks for clarifying; it is indeed just an unfortunate name clash.