1. 45
  1.  

  2. 16

    I just checked and it seems like uBlock Origin blocks Plausible too, so the real number might be even higher.

    1. 4

      I did the study with June numbers, and Plausible was not on any blocklists until yesterday, so the numbers should be accurate for a more “average” web audience. I should have done a one-day study with traffic from Lobsters, Hacker News, etc. I would expect much higher numbers there, but my intention was to look into a less tech-savvy audience and their use of blocking.

    2. 12

      The best source of data for your own website is still access.log.
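
      For what it’s worth, here is a rough sketch of the kind of counting you can do straight from a combined-format access.log; the log path, the bot keywords, and the filtering rules are just assumptions, not what any particular analyzer does:

      ```python
      import re
      from collections import Counter

      # Sketch: count page views and unique client IPs from a combined-format
      # access log, skipping the most obvious bots. Path and bot keywords are
      # only examples.
      LOG = "/var/log/nginx/access.log"
      LINE = re.compile(
          r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
      )
      BOT_HINTS = ("bot", "crawler", "spider", "curl", "wget")

      views = 0
      ips = set()
      pages = Counter()

      with open(LOG) as f:
          for line in f:
              m = LINE.match(line)
              if not m:
                  continue
              ip, method, path, status, agent = m.groups()
              if any(h in agent.lower() for h in BOT_HINTS):
                  continue  # crude filter; plenty of bots still get through
              if method != "GET" or not status.startswith("2"):
                  continue
              views += 1
              ips.add(ip)
              pages[path] += 1

      print(f"{views} page views from {len(ips)} unique IPs")
      print("top pages:", pages.most_common(10))
      ```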

      1. 2

        My idea was to use server logs too, but AWStats showed more than double the number of unique visitors and more than 18 times the number of page views (both compared to Plausible’s numbers), so I excluded it from the study as I considered it very inaccurate.

        1. 1

          How can that be?

          Did you consider Analog or another log analyzer?

          1. 3

            Despite AWStats filtering bots, many do get through. It was easy to see, as the most-viewed pages according to AWStats were back-end pages, etc. I tried Webalizer too, with similar results. I published the stats here: https://plausible.io/blog/server-log-analysis

            1. 2

              Oh, yeah. The amount of bot traffic on a public website these days is ridiculous.

              I’ve started locking down my sites with httpauth and a simple login combo.

              1. 2

                Is there some sort of fail2ban for HTTP servers that bans IPs if there are more than X requests in Y seconds, where X and Y are some values that humans typically don’t achieve?

                1. 2

                  I think you can indeed just use fail2ban itself. It should be possible to make a rule for this.
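
                  Something like this in jail.local is roughly what I mean; the jail name, log path, and thresholds below are placeholders I made up, not a tested config:

                  ```ini
                  # /etc/fail2ban/jail.local -- hypothetical "http-flood" jail
                  [http-flood]
                  enabled  = true
                  port     = http,https
                  filter   = http-flood
                  logpath  = /var/log/nginx/access.log
                  # ban any IP that makes more than 50 requests (maxretry = X)
                  # within 10 seconds (findtime = Y), for one hour (bantime)
                  maxretry = 50
                  findtime = 10
                  bantime  = 3600
                  ```

                  It still needs a matching filter in filter.d that tells fail2ban what counts as a request.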

                  1. 2

                    Oh, it can parse webserver logs? I should read up on this!

                    1. 2

                      It can parse any log file. It uses regexes to decide when to trigger rules, IIRC.
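
                      As a rough sketch (untested; `<HOST>` is fail2ban’s placeholder for the client IP), a filter that counts every request line in an access log could look like this:

                      ```ini
                      # /etc/fail2ban/filter.d/http-flood.conf -- hypothetical filter
                      # Every matching log line counts as one "failure"; the jail's
                      # maxretry/findtime settings then turn that into a rate limit.
                      [Definition]
                      failregex = ^<HOST> .* "(GET|POST|HEAD)
                      ignoreregex =
                      ```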

      2. 4

        For reference, I find that when I make the HN front page, the spike in my CloudFlare analytics is about 2x as big as the spike in Google Analytics, with Google showing a really high share of mobile users. It seems plausible to me that over 50% of HN/Lobsters users have an ad blocker on their desktop, and some also have one on their phone.

        1. 4

          Thanks for the study. Very interesting.

          1. 4

            13% seems like a non-issue at first glance; the graphs should tell the same story, just with slightly different numbers, right? If you have a signup form, GA should say that 32% come there from A and 68% from B, but the absolute numbers are undercounts and therefore the percentages are a little less precise. If you get 21% of your inbound visitors from site X, leaving out 13% of the browsers when counting just makes the “21” less precise, so the historical graph is a little more wiggly but hovers around the same level.

            But then there’s skew and joins. About 50% of Linux users disappear from the graphs, versus 5% of iPhone users. If GA data is joined with something else, that skew might be a real problem. If someone facets the actual customer-service costs by browser OS and then joins that with skewed GA data, the cost of a Linux-related customer-service issue will appear to be twice as high as it really is. Someone might make a PowerPoint slide that says “Linux users form 2% of the audience but 4% of customer-service issues and costs ⇒ let’s drop Linux”.
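
            A toy calculation of that effect (the shares and under-counting rates below are made up, just to show the mechanism):

            ```python
            # Made-up numbers: Linux is 2% of the real audience and generates a
            # proportional 2% of support issues, but GA only sees half of Linux
            # users versus 95% of everyone else.
            real_share = {"linux": 0.02, "other": 0.98}
            seen_by_ga = {"linux": 0.50, "other": 0.95}
            issue_share = {"linux": 0.02, "other": 0.98}

            ga_counts = {os: real_share[os] * seen_by_ga[os] for os in real_share}
            total = sum(ga_counts.values())

            for os in real_share:
                ga_share = ga_counts[os] / total
                apparent = issue_share[os] / ga_share  # issues per GA-counted user
                print(f"{os}: {ga_share:.1%} of GA audience, "
                      f"looks {apparent:.1f}x as issue-prone as it really is")
            ```

            With these numbers, Linux shows up as about 1.1% of the GA audience but still 2% of the issues, so it looks roughly twice as expensive per user as it actually is.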

            There aren’t very many Linux users, so this isn’t a big problem. (Except for us Linux users, maybe.) But I wonder what other skew might go unreported. I’ve heard that the city I live in has a much higher rate of ad-blocker use than other comparable cities, although I’m not sure how accurate that is.

            1. 2

              This might foster alternative solutions.

              1. 2

                Interesting; it confirms my long-time suspicion. It would be good to see a bigger study with a more representative sample on this topic. As other comments mention, there is also the possibility of Plausible itself being blocked, so these values could be even higher.

                1. 1

                  What options are available for blocking Plausible? They seem to encourage website operators to CNAME a subdomain to them, specifically to avoid blocklists.

                  Also, am I wrong in thinking Plausible is almost more immoral than Google Analytics? It seems like they’re trying to deliver spyware to people who have gone out of their way to block such things.

                  1. 2

                    > They seem to encourage website operators to CNAME a subdomain to them, specifically to avoid blocklists.

                    That hasn’t worked against uBlock Origin or PiHole for months.

                    1. 2

                      How do they do that?

                      1. 1

                        uBlock Origin runs a CNAME query against everything before letting the request go through.

                        PiHole is a DNS server, so it already knows about every recursive request.
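
                        Conceptually it is something like this (a sketch with dnspython; the blocklist entry and hostname are only examples, and real blockers use their full filter lists):

                        ```python
                        # Sketch of CNAME "uncloaking": resolve a first-party hostname and
                        # check whether its CNAME target lands on a blocklisted domain.
                        import dns.resolver  # dnspython, assumed available

                        BLOCKLIST = {"plausible.io"}  # example entry, not a real list

                        def is_cloaked_tracker(hostname: str) -> bool:
                            try:
                                answer = dns.resolver.resolve(hostname, "CNAME")
                            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                                return False  # no CNAME record, nothing to uncloak
                            for record in answer:
                                target = str(record.target).rstrip(".")
                                if any(target == b or target.endswith("." + b) for b in BLOCKLIST):
                                    return True
                            return False

                        # Hypothetical first-party subdomain that CNAMEs to an analytics host:
                        print(is_cloaked_tracker("stats.example.com"))
                        ```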

                        1. 1

                          Interesting, thanks for the info.