1. 35
  1.  

  2. 9

    The headline is a bit click-baity, but the write-up is high quality. In summary, for a selection of sites, the ratio of the visitor count reported by Google Analytics to the count derived from server log files has dropped from around 1 to around 0.5.

    One possible explanation is ad/tracking blockers blocking Google Analytics. Shouldn’t it be possible to determine that precisely, by making the Google Analytics javascript call back to your server? Then you should be able to get a precise reading on “fraction of log-file visitors that reach Google Analytics” by correlating log file entries for normal site visits and callbacks.
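A minimal sketch of the correlation described above, assuming the server log has already been parsed into entries. The `visitor_id` key (for example a hash of IP and user agent) and the `/ga-callback` path are hypothetical names chosen for illustration, not anything from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    visitor_id: str  # hypothetical key, e.g. a hash of IP + user agent
    path: str        # requested path


# Hypothetical endpoint that the page requests only after the GA script has run.
CALLBACK_PATH = "/ga-callback"


def tracked_fraction(entries):
    """Fraction of visitors with a normal page hit who also hit the callback."""
    visitors = {e.visitor_id for e in entries if e.path != CALLBACK_PATH}
    called_back = {e.visitor_id for e in entries if e.path == CALLBACK_PATH}
    if not visitors:
        return 0.0
    return len(visitors & called_back) / len(visitors)


# Made-up sample: four visitors, two of whom reached the callback.
sample = [
    LogEntry("a", "/"), LogEntry("a", CALLBACK_PATH),
    LogEntry("b", "/about"),
    LogEntry("c", "/"), LogEntry("c", CALLBACK_PATH),
    LogEntry("d", "/"),
]
print(tracked_fraction(sample))  # 0.5
```

Visitors without a callback could be running a blocker, have JavaScript disabled, or be bots, so the fraction is an upper bound on what GA can see rather than a direct measure of blocking.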

    1. 5

      This seems to start from the assumption that, because there’s a big difference between GA and WordPress analytics, GA must be the one that’s wrong, since it reports the smaller number. Sure, ad-blockers play a part, but I’m skeptical that they could possibly account for most of the difference when it’s 70% or more. Could it be that GA is actually just really good at excluding non-human traffic, and the numbers they’re getting from GA are closer to reality than what’s coming from WordPress logs? It’s not like there’s a canonical answer in the WordPress logs…

      1. 2

        I thought this, but the GA results show an upward trend while the processed web logs show a downward trend. That leads to the conclusion that GA is filtering less or recording more visits than are actually hitting the server. I don’t know what method the author is using to filter out non-human traffic from their web logs but I usually use a list of known bot User Agents and it’s accurate enough for my needs.
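For illustration, the user-agent filtering mentioned above might look roughly like this. The marker list is a small made-up sample, not a maintained bot list, and the function names are hypothetical.

```python
# Illustrative substrings only; a real list of known bot user agents is much longer.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "ahrefsbot", "curl", "python-requests")


def is_probable_bot(user_agent):
    """Treat empty user agents and ones containing a known marker as bots."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in KNOWN_BOT_MARKERS)


def human_hits(user_agents):
    """Keep only the user agents that do not look like bots."""
    return [ua for ua in user_agents if not is_probable_bot(ua)]


hits = [
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "curl/8.4.0",
    "",
]
print(len(human_hits(hits)))  # 1
```

This only catches bots that identify themselves; anything spoofing a browser user agent passes straight through.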

      2.  

        I’ve found over 78% of real site visitors are not being tracked.

        That’s great news. Let’s hope that the number creeps closer and closer to 100% as browsers get better and better at blocking tracking by default.

        1. 2

          I have worked on analytics tracking for embedded videos, and I am skeptical of the results. Analytics tracking is hard, especially if you want to compare subsequent visits (“sessions”) so each “person” is only counted once within a given period (daily, usually). Lots of browsers make weird requests, especially mobile browsers requesting different data (iOS has a user agent for web, a different agent for video, and probably others depending on what you are trying to fetch).

          On the other hand, I applaud questioning Google Analytics, and whether they can be considered the gold standard. We had a lot of customers tell us “Your numbers are wrong, because Google Analytics show us…” - since we were doing specialized analytics we often could be more precise in our tracking, but people had a hard time believing us over Google, because they were Google, regardless of arguments and showing them numbers, trends, showcasing examples, etc.
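To make the difference between raw hits, sessions, and once-per-day counting concrete, here is a rough sketch. The 30-minute inactivity gap is an assumed convention, and the `(visitor_id, timestamp)` input shape is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity timeout; a longer gap between hits starts a new session.
SESSION_GAP = timedelta(minutes=30)


def count_sessions(hits):
    """Count sessions from (visitor_id, timestamp) pairs."""
    by_visitor = defaultdict(list)
    for visitor_id, ts in hits:
        by_visitor[visitor_id].append(ts)
    sessions = 0
    for stamps in by_visitor.values():
        stamps.sort()
        sessions += 1  # the first hit always opens a session
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > SESSION_GAP:
                sessions += 1
    return sessions


def daily_unique_visitors(hits):
    """Count each visitor at most once per calendar day."""
    days = defaultdict(set)
    for visitor_id, ts in hits:
        days[ts.date()].add(visitor_id)
    return {day: len(ids) for day, ids in days.items()}


# Made-up sample: four hits, three sessions, two unique visitors on one day.
sample_hits = [
    ("a", datetime(2024, 1, 1, 9, 0)),
    ("a", datetime(2024, 1, 1, 9, 10)),
    ("a", datetime(2024, 1, 1, 11, 0)),
    ("b", datetime(2024, 1, 1, 9, 5)),
]
print(len(sample_hits), count_sessions(sample_hits))  # 4 3
```

The same log gives three different numbers depending on the unit, which is why comparing one tool's sessions with another tool's hits says little by itself.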

          1.  

            Analytics tracking is hard, especially if you want to compare subsequent visits (“sessions”)

            I too raised an eyebrow at ‘sessions’ being compared to ‘local logs’ without much meat in the methodology on how these apples were compared to those oranges.

            I’m guessing the WPEngine logs are simply filtered by crawler user agent rather than also being reduced into sessions.

            Plus I’m not sure that “new website in March” is enough reason for new user traffic, other than maybe the new site is much more chatty on the wire.

            What is good in the article though are the recommendations on using other, more tangible metrics such as real logins and whatnot.

            “Your numbers are wrong, because Google Analytics show us…”

            It is a pain I share, but fortunately it is a two-way street: “your VAST tag/VPAID player is broken” can be answered with “works fine in the Google VAST inspector, maybe it is your end” :)

          2.  

            A question that I didn’t see touched on in the article: can you confirm that the upward trend seen in the WP Engine logs is likely to be real people, and not just an artefact of the number of click-fraud spam bots operating on the internet increasing over time? Ideally you should be able to look at the rates at which people go through conversions that the spam bots can’t fake (e.g. buying stuff) and verify that they go up and down roughly in line with the stats reported by WP Engine?

            I think the advice to concentrate more on conversion numbers (e.g. someone bought a thing) seems like a very good idea. Bots aren’t going to fake all of those so well.
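One rough way to run the check suggested above is to correlate a traffic series with a conversion series over the same periods. The numbers below are made up purely to exercise the function, and `pearson` is a plain hand-written Pearson coefficient rather than any analytics tool's API.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up monthly figures, for illustration only.
log_visitors = [1000, 1200, 1500, 1900, 2400]
purchases = [50, 52, 49, 51, 50]

r = pearson(log_visitors, purchases)
print(f"correlation between log visitors and purchases: {r:.2f}")
```

A coefficient near zero while log traffic climbs would be consistent with the growth coming from bots; a high one would suggest the extra visitors are real. It is only a sanity check, since pricing changes or seasonality can also decouple the two series.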

            1.  

              Like everyone else, I think it’s weird to assume that WPEngine is better at filtering out bots (which make up ~50% of web traffic) than Google.

              But it’s definitely true that ad blockers warp GA data. I have a client who complained that our software was broken because GA said nearly 10% of their revenue was from Polish customers, which didn’t make any sense. We used a Rails gem that attached cart data to client-side GA data. It turned out that when an ad-blocker was in use, somehow the null location was labeled as this arbitrary place in Poland.

              1.  

                somehow the null location was labeled as this arbitrary place in Poland

                Are you European? If so, could it be something roughly analogous to this issue, where the default GeoIP location for otherwise unlocatable IP addresses from a particular provider was actually a specific location near the geographic center of the US?

              2.  

                A more accurate comparison would be actual HTTP access logs vs Google Analytics. I remember talking to the developers of Piwik and asking about JavaScript vs log analysis. At the time Piwik could do both, but you’d have to run two instances (one for each). I told them it might be worth looking into having one instance able to do both and then produce comparison reports, possibly even trying to break down what is/isn’t a bot and who does/doesn’t have JavaScript or an ad blocker enabled.

                1. 2

                  We had massive investment in using GA to track business events. I really wish we hadn’t, especially because once events go in they don’t really come out again. :(

                  1. 1

                    In addition to the two options listed in the “what to do” section, wouldn’t self-hosting your analytics potentially help? (Assuming that it is actually adblockers causing the difference.)

                    There’s a big difference between Google tracking you all over the web and a site owner tracking your use of their site, so many adblockers probably don’t automatically block the latter.

                    1. 1

                      It’d be interesting to consider the possible biases this could be introducing in understanding different cohorts of users. Which groups of users are most likely to have / not have analytics-blocking browsers/addons? How could this affect the interpretation of the collected data? What about the same people coming from mobile?

                      1.  

                        I came to the same conclusion about a year ago. Our conversions to registered accounts did not line up with the amount of traffic that Google Analytics was reporting, so I implemented our own usage tracking database. Our estimate was that Google was under-reporting traffic by roughly ten-fold. This article reinforces everything I’ve seen on our own service dashboard.

                        1. -1

                          s#Google Analytics is lying to you (massively)#Google is lying to you massively#g