1. 24

    1. 4

      isn’t robots.txt mostly to help google not accidentally take down your $10/year server?

      it’s not really a security thing

      1. 4

        It’s effective for opting out of well-behaved bots. If you want your site excluded from the large search engines (Google, Bing, etc.), it will do exactly what it’s supposed to.

        It’s not going to protect you from the vast swarms of badly behaved bots out there.
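
        A minimal robots.txt that opts a site out of crawling looks like this (the paths are just illustrative):

        ```
        # Ask all crawlers to stay out of the whole site
        User-agent: *
        Disallow: /

        # Or only keep one specific crawler out of one directory
        User-agent: Googlebot
        Disallow: /private/
        ```

        Well-behaved crawlers fetch /robots.txt before crawling and honor these rules; badly behaved ones simply ignore the file.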

        1. 3

          Except when someone publicly links to your site, Google may still show it in results (typically just the URL, with no snippet, since it can’t crawl the page).


          1. 1

            Which makes sense.

            To use an analogy, you could use an htaccess file to lock your front door to anyone who doesn’t have the key. You can also use robots.txt to put a polite note on the door explaining that you would prefer it if people did not rummage around in your basement collection of mint-condition baseball cards. If the owner didn’t lock the door, poorly mannered people might go around the neighborhood reading those notes to see who has something interesting.

            In neither case have you forced anyone who did come into your house to sign an NDA about what they see. Robots.txt keeps well-behaved bots from burning your bandwidth; for any other use case, you probably want a different solution.
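
            To stretch the analogy into actual config: the “locked door” is HTTP authentication, while robots.txt is just the note. A sketch of each (Apache with Basic auth; the file paths are assumptions, and the .htpasswd file would need to exist already):

            ```
            # .htaccess - actually denies access without credentials
            AuthType Basic
            AuthName "Private area"
            AuthUserFile /var/www/.htpasswd
            Require valid-user
            ```

            ```
            # robots.txt - a request that compliant crawlers honor voluntarily
            User-agent: *
            Disallow: /basement/
            ```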

            1. [Comment removed by author]

        2. 1

          Definitely not respected by Bing, though. I have a website that has had a robots.txt up from the start, and it still shows up in Bing (and in DuckDuckGo, which apparently relies on Bing results). That website shouldn’t be linked anywhere on the web, as I never communicated the URL to anyone.

          1. 1

            That sounds like a bug in Bing. They claim to support robots.txt here: https://www.bing.com/webmasters/help/how-to-create-a-robots-txt-file-cb7c31ec

      2. 3

        Google is already very good about not taking down $10/year servers. They have a very sophisticated rate-limiting system, including tracking response times across requests and detecting when your server is struggling. There is likely a strong correlation between the bots that hit your site the hardest and the bots that ignore robots.txt.

        1. 1

          That definitely hasn’t been the case in the past.

          1. 1

            That sounds like a web urban legend to me, but I’m happy to be proven wrong. I’ve never had Google take down my VPS — but then, I’ve always had a robots.txt file.