1. 23

  2. 4

    Can someone elaborate on why there seems to be a need to block headless browsers from accessing sites?

    1. 4

      I’m speculating, but I suspect it’s to do with verifying that the client is driven by a “real human” for advertising and tracking purposes.

      Edit: I followed some links and found this article:

      http://antoinevastel.github.io/bot%20detection/2017/08/05/detect-chrome-headless.html

      Quoting from the second section:

      Why detect headless browser?

      Beyond the two harmless use cases given previously [doing tests or taking screenshots of webpages], a headless browser can also be used to automate malicious tasks. The most common cases are web scraping, increase advertisement impressions or look for vulnerabilities on a website.

      1. 2

        Thank you for the elaboration, gerikson.

        So it’s basically a few attempts at making it slightly harder to use headless Chrome to do bad stuff. It just seems like the attempt is being made at the wrong level.

    2. 1

      Meh, this is really a cat-and-mouse game. Just test for it like this:

      if (navigator.webdriver || navigator.hasOwnProperty('webdriver')) {
        console.log('chrome headless here');
      }
      

      And there goes the article until the author can find a way to bypass this now…
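
A minimal sketch of the same check written as a function over a navigator-like object, so it can be tried outside a browser. The name `looksAutomated` and both mock objects are hypothetical and only for illustration; a real `navigator` may expose `webdriver` differently (for example, on its prototype), so this is not a reliable detector.

```javascript
// Hypothetical helper mirroring the check above: true when the object
// reports a truthy `webdriver` value or defines it as an own property.
function looksAutomated(nav) {
  return Boolean(nav.webdriver) || nav.hasOwnProperty('webdriver');
}

// Mock objects standing in for window.navigator (assumptions, not real browser output).
const automatedNav = { webdriver: true };
const ordinaryNav = { userAgent: 'ExampleBrowser/1.0' };

console.log(looksAutomated(automatedNav)); // true
console.log(looksAutomated(ordinaryNav));  // false
```

Passing the object in as a parameter is only a convenience for testing; in a page the argument would be `window.navigator`.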

      1. 6

        The point of the article is sort of that it’s a cat-and-mouse game. The person doing the web browsing inherently has the advantage here because they can figure out what the tests are and get around them. Making the tests more complicated just makes things worse for your own users; it doesn’t really accomplish much else. For example, this gets around your test:

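        // Keep a reference to the original method and call it with
        // navigator as the receiver; calling it unbound would throw.
        const oldHasOwnProperty = navigator.hasOwnProperty;
        navigator.hasOwnProperty = (property) => (
          property === 'webdriver' ? false : oldHasOwnProperty.call(navigator, property)
        );
        // Make reads of navigator.webdriver return false.
        Object.defineProperty(navigator, 'webdriver', {
          get: () => false,
        });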
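
To see the effect of an override like this without a browser, here is a sketch that runs it against a mock object. `fakeNavigator` and `looksAutomated` are hypothetical stand-ins for `window.navigator` and the parent comment's check. A real browser may define `webdriver` on the prototype or make it non-configurable, so treat this as an illustration only.

```javascript
// Mock standing in for window.navigator in an automated browser (assumption).
const fakeNavigator = { webdriver: true };

// The parent comment's check, taking the object as a parameter.
function looksAutomated(nav) {
  return Boolean(nav.webdriver) || nav.hasOwnProperty('webdriver');
}

const before = looksAutomated(fakeNavigator);

// Wrap hasOwnProperty, preserving the receiver via .call().
const originalHasOwnProperty = fakeNavigator.hasOwnProperty;
fakeNavigator.hasOwnProperty = (property) => (
  property === 'webdriver'
    ? false
    : originalHasOwnProperty.call(fakeNavigator, property)
);

// Redefine the property so that reads return false.
Object.defineProperty(fakeNavigator, 'webdriver', {
  get: () => false,
});

const after = looksAutomated(fakeNavigator);
console.log(before, after); // true false
```

A check that calls `Object.prototype.hasOwnProperty.call(navigator, 'webdriver')` directly would not be fooled by the wrapped method, which is one more round of the same game.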
        
        1. 1

          Yet there are surely other ways to detect it, at least for a given time window, such as testing for a specific WebGL rendering that headless Chrome cannot perform, or targeting a set of bugs specific to headless Chrome.

          https://bugs.chromium.org/p/chromium/issues/detail?id=617551

          1. 1

            Well, eventually you just force people to run Chrome with remote debugging or Firefox with Marionette in a separate X session, mask the couple of vars that report remote debugging, and then you have to actively annoy your users to go any further.

            I scrape using Firefox (not even headless) with Marionette; I also browse with Firefox with Marionette because Marionette makes it easy to create hotkeys for strange commands.

            1. 1

              Even if there were no way to bypass that, don’t you think that you’ve sort of already lost in some sense once you’re wasting your users’ system resources to do rendering checks in the background just so that you can restrict what software people can choose to use when accessing your site?

              1. 3

                If a headless browser is required to scrape data (rather than just requesting web pages and parsing HTML), then the website is already perverse enough. No one would be surprised if it also ran a WebGL-based proof of work before rendering its most expensive, thief-proof news articles from a blob of Malbolge bytecode, with WebGL and logic based on GPU cache timing.

                1. 1

                  You’re paying a price, certainly. But depending on your circumstances, the benefits might be worth the cost.