1. 25
  1. 31

    The robots.txt spec does not include it, but most bots support the Crawl-delay extension.

    Rather than blocking Bing, they should add Crawl-delay: 10

    Bing documents this:
    https://blogs.bing.com/webmaster/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot/
    https://blogs.bing.com/webmaster/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation

    1. 2

      Maybe there’s a punitive angle to this action?

      1. 2

        The only ones who will feel real pain from that action are the webmasters who didn’t see the announcement and lose traffic.

    2. 44

      > bing has no qualms hitting meta more than 5000 times in a 3 hour period

      > We decided to take this drastic measure to protect Discourse sites out there from being attacked by Microsoft crawlers.

      One read every two seconds is not an attack; it’s totally irrelevant unless your software is many (at least 3?) orders of magnitude too slow.

      Maybe they should work on that before shifting the blame onto Microsoft.

      1. 17

        Meh, what’s a factor of 10 when you’re still under a request per second? Does this actually matter?

        I’m curious whether any attempt was made to contact the Bing team before the change & post.

        1. 3

          Seems unnecessarily frequent. If you had multiple search engines checking in this frequently, it’d add up quickly.

        2. 7

          That is not a good idea. Many sites might be affected without seeing that notice. Bing has larger usage than it might seem.

          1. 2

            Plenty of time for people to yell. How many Discourse sites pull the latest code on a regular basis? :)

            1. 3

              I do. Luckily, I saw this post in my RSS/Atom reader. You can write an opinion here.

          2. 3

            Somewhat relatedly, and (not so) funnily, Microsoft / LinkedIn is suing a company for… crawling their Web data. https://www.eff.org/cases/hiq-v-linkedin
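
For readers who want to see the Crawl-delay suggestion and the "one read every two seconds" arithmetic from earlier in the thread made concrete, here is a minimal sketch. It uses Python's standard `urllib.robotparser` to read a hypothetical robots.txt; the `bingbot` user-agent token and the file contents are assumptions for illustration, not taken from Discourse's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: throttle Bing's crawler instead of blocking it.
# The "bingbot" token and the 10-second value are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawl-delay is a non-standard extension, but the stdlib parser reads it.
delay = parser.crawl_delay("bingbot")
still_allowed = parser.can_fetch("bingbot", "https://example.com/latest")

# Figures quoted in the thread: more than 5000 hits in a 3 hour period.
hits = 5000
window_seconds = 3 * 3600
observed_interval = window_seconds / hits          # seconds between requests
max_hits_with_delay = window_seconds // delay      # upper bound if the delay is honored

print(f"Crawl-delay for bingbot: {delay} s (still allowed: {still_allowed})")
print(f"Observed: one request every {observed_interval:.2f} s")
print(f"With the delay honored: at most {max_hits_with_delay} requests in 3 hours")
```

Whether a given crawler honors the directive is up to that crawler, so this only shows what the file asks for, not what any bot will actually do.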