1. 35

  1. 14

    This meta tag will block only Google’s web crawlers

    Given that Google stopped respecting robots.txt, I assume they will have no qualms about ignoring this one too, should it become a nuisance.
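
    (For context, the tag in question is presumably Google’s crawler-specific noindex directive, i.e. something like the following.)

    ```html
    <!-- Asks Google’s crawlers (and only Google’s) not to index this page -->
    <meta name="googlebot" content="noindex">
    ```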

    1. 4

      To be fair, Google does respect robots.txt Disallow, which is what 99% of people actually want (but not what the author here needs).
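
      For reference, the Disallow rule most people want is a couple of lines of robots.txt; e.g., to keep Google’s crawler away entirely:

      ```
      User-agent: Googlebot
      Disallow: /
      ```

      Note that Disallow stops crawling, not indexing: Google can still list a URL it has never crawled based on third-party links, which is exactly the author’s problem.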

    2. 11

      Personally I find this to be pointless actionism. Sure, Google won’t list your site, but people won’t be able to find it, so its value diminishes. Unless your website is well trafficked through other means (lobste.rs, HN, word of mouth, poster ads, Facebook), you might as well remove your site from the Internet.

      It feels like protesting by… being quiet and staying home. No useful change will evolve from that.

      What would be better is to improve resources explaining why you think Google is bad, showing how to reduce the amount of dependence people have on Google, and showing and potentially improving alternatives to Google services, etc.

      Notably, the resources the author lists, like Fuck off Google and Privacy Guides, are both the #1 hit when you search for them by name on Google.

      1. 2

        What would be better is to improve resources explaining why you think Google is bad, showing how to reduce the amount of dependence people have on Google, and showing and potentially improving alternatives to Google services, etc.

        None of these things actively diminish the usefulness of Google. If even a small number of sites do as OP did, you can no longer assume you aren’t missing anything by doing just a Google search. Did you address this in your comment and I missed it? I do notice you wrote “why you think Google is bad,” so readers may suspect you are diminishing this strategy because you disagree with its goals rather than genuinely doubting its effectiveness, but I’m sure that was not your intent.

        At one point I was considering getting an iPhone and enabling iMessage, then switching back to my dumb phone, so that people with iPhones can’t send me texts unless they disable iMessage. If enough people do that you can no longer assume a text sent from an iPhone with iMessage enabled will reach its destination, so it would force people to disable iMessage. Same principle but more extreme.

        1. 3

          None of these things actively diminish the usefulness of Google. If even a small number of sites do as OP did, you can no longer assume you aren’t missing anything by doing just a Google search.

          Ok, but what about increasing the usefulness of the alternatives? Every single time I try to search for a specific thing via DuckDuckGo, I end up having to resort to Google, because the thing never comes up, even when it’s searched for by name, in quotes. Google might be bad, but the alternatives are several leagues worse.

          Also of note is the author’s hypocrisy: they do not like Google search, but they were fine with creating a privacy-invading map of the fediverse that was rejected by everyone outside of mastodon.social.

          1. 1

            Nothing wrong with making alternatives better; I was just pointing out what /u/Leonidas seemed to miss in assessing this as “pointless actionism” akin to “being quiet and staying home.”

            Can’t comment on whether fediverse.space invades privacy or whether that is the same problem OP has with Google.

          2. 1

            I do notice you wrote “why you think Google is bad,” so readers may suspect you are diminishing this strategy because you disagree with its goals rather than genuinely doubting its effectiveness, but I’m sure that was not your intent.

            I think it is a bit of both. I doubt its effectiveness, and I also think that avoiding Google would be easier if there were better alternatives, but for many things there currently aren’t. The centralization of Gmail is concerning, but getting email from your own MTA delivered to Gmail is much easier than to Hotmail/Outlook.com (because, let’s face it, people won’t migrate to a mail provider that charges money). Also, none of the Google Maps alternatives are even remotely as good as GMaps was years ago. Similarly, YouTube just has no alternative that would be anywhere near as useful.

            So while you can sort of feasibly use DuckDuckGo instead of Google search, and I do, the rest of the Google ecosystem is simply better than the alternatives on its own merits, and this is why people choose to use it. You can also sort of use Firefox, which is at least not a 100% copy of everything Chrome does, but I think boycotting Google search to avoid giving them ad revenue is probably less effective than just using an ad blocker.

            So yeah, I guess Google is bad but the (current) alternatives are worse.

          3. 2

            I see this as being like not using Facebook. Things like Google Search and Facebook have no value on their own. The value of Google Search comes from the things in its index. The value of Facebook comes from the people who are on Facebook. If enough sites refuse to allow Google to index their contents, then the value of Google Search is reduced. If you aren’t on Facebook, the value of Facebook to everyone else is very slightly reduced. If a lot of people aren’t on Facebook, other people leave and the value drops more. This is much easier for platforms that depend on symmetrical network effects like Facebook: ICQ, AIM, and MSNM all died because people didn’t join (or left), and so the value for everyone else gradually dropped until they also left.

            It’s less clear that this works for platforms where the consumers of the network effects are distinct from the producers. You don’t get the same feedback loop: someone who stops using Google Search because it can’t find the sites they need doesn’t reduce the value of the ecosystem for everyone else. But in an ad-driven world, if Google drives fewer visitors to your site because other useful sites aren’t indexed, then the cost to you of excluding Google is lower.

          4. 10

            Why not use a robots.txt? I am not a big fan of using meta tags for this; in the author’s own spirit, it means wasting my users’ bandwidth on redundant tags. A robots.txt also saves you from accidentally forgetting to add the tag to a document.

            1. 25

              Hi, author here – Google stopped respecting robots.txt noindex a few years ago: https://developers.google.com/search/blog/2019/07/a-note-on-unsupported-rules-in-robotstxt

              Combined with the fact that Google might list your site based only on third-party links, robots.txt isn’t an effective way to remove your site from Google’s results.

              1. 13

                https://developers.google.com/search/docs/advanced/crawling/block-indexing#http-response-header indicates that X-Robots-Tag: noindex as an HTTP header should have the same effect as the meta tag. It’s relatively easy to manage that per server in the web server config.

                That doesn’t save much on bandwidth but at least the “accidentally forgetting” issue is averted.
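
                For example, with nginx (a sketch using its stock add_header directive; Apache’s mod_headers can do the same):

                ```nginx
                # Send the noindex signal on every response this server produces
                add_header X-Robots-Tag "noindex" always;
                ```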

                1. 9

                  Today I learned. Thank you very much, Sir!

              2. 5

                Wouldn’t it be more effective to check for Google’s user agent (or, even better, known Google IPs) and return a 404, or just slam the connection closed?
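
                Something like the following, assuming nginx (a sketch, untested; Googlebot is the documented crawler user agent):

                ```nginx
                # Pretend the page doesn’t exist for anything claiming to be Googlebot
                if ($http_user_agent ~* "googlebot") {
                    return 404;
                }
                ```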

                1. 2

                  Google will occasionally do some sneaky verification to make sure you’re not serving it different content than you serve your actual users. While I haven’t heard of them sharing that info with anyone for spam-marking purposes, I would be concerned about getting delisted from more than just Google this way.

                  1. 2

                    I’m not sure what the situation is today, but some time ago it was pretty common to serve different content to GoogleBot than to users. Some paywalled sites were (or maybe still are) serving full content if the User-Agent was set to GoogleBot, so it was possible to bypass them with a browser plugin that allowed easy switching of the User-Agent string.

                    Browsing the web as a full-time GoogleBot didn’t make any sense either, because lots of sites were dropping styles or automatically switching to a mobile version of the site.

                    While the above doesn’t mean that Google doesn’t perform any sneaky verification, I have some doubts whether the outcome of this verification alone would be enough to delist a site (well, unless the verdict were confirmed by a human, I suppose).

                    1. 1

                      It’s not clear what the threshold is and what the “manual action” would involve in that case, but it is against their guidelines. https://developers.google.com/search/docs/advanced/guidelines/cloaking

                      1. 4

                        Given that the author’s intent is to be delisted, going against their guidelines would probably be in their interest.

                        1. 3

                          My point was that this may not end at Google search. It depends how they classify the delisting and who they share the data with (now or in the future).

                          From an experience a few months ago: three invalid bot-generated phishing reports on http://phishtank.org/index.php, which get distributed to OpenDNS, which in turn gets used in an anti-virus package, can take you days to clean up fully; meanwhile your site comes up as “alert: malware” for many people. Now imagine Google shares their “dodgy site” list with others: what’s the fallout?

                          Even if you want to get delisted from Google, I don’t think explicitly breaking their rules is a good idea.

                2. 2

                  You have neglected to disable FLoC with Permissions-Policy: interest-cohort=(). Like you, I’m disappointed with Google ignoring parts of the robots.txt standard whose meaning they clearly understand, and I’m unpersuaded by their reasoning, but for me the FLoC system is much more obnoxious.

                  I can also tell you from personal experience that disabling FLoC harms your search appearance considerably. My own website was downranked hard after I added that header.
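
                  For anyone wanting to send that header anyway, it’s a one-liner; a sketch assuming nginx:

                  ```nginx
                  # Opt this origin out of FLoC interest-cohort computation
                  add_header Permissions-Policy "interest-cohort=()" always;
                  ```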