I suppose I have two home truths about analytics. The first is that the vast majority of analytics data is never looked at. In my professional experience, many teams just wire it up and never look at it, though the data continues to flow and (importantly for the numerous SaaS analytics companies) they continue to pay for it.
The other is that, of those who do look at it, the majority don’t use it very well. People have a weird sense that analytics helps you micro-optimise tiny details on your site (the “41 shades of the colour blue” story from Google). Perhaps that is true if you are in the top five companies worldwide by market capitalisation.
For most normal companies, though, your analytics usually tell you that you are doing something basic wrong. For example, you have an eight-step signup form with 90% drop-off among people who originally did want to do business with you. Instead of discussing and dealing with problems like that, you get a lot of talk about multi-armed bandits and weekly updates about metrics that are mostly noise. This is perhaps similar to how many companies with fewer than 20 devs use k8s: people prefer to act as though they are part of an enormous company.
JavaScript analytics obviously gives up some privacy, and I can’t say that I am really OK with that. What I think privacy campaigners either don’t know or underplay is how little is actually gained from that surrendered privacy.
You might be right and the people for whom I work will never look at the analytics data.
To this day, however, they do. Not only for crashes and that kind of thing, so that JIRA has tickets for us developers and our salary is justified, but also to see whether users are using the application properly.
I can’t really say much about my work, but that data is used. Maybe not right away, and maybe it needs a bit of time to become effective, but it is used, even if it is still early to confirm that.
Analytics can be done in a couple of ways: in the client with JavaScript or an SDK (Firebase, Matomo), in the backend, or both. On Android you wire the SDK into some kind of analytics manager and call analytics.trackScreen(...) wherever you need it.
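As a rough sketch of that pattern (the AnalyticsManager and AnalyticsBackend names and the opt-out hook are hypothetical, for illustration only, not any particular SDK’s API):

```kotlin
// Minimal sketch of the "analytics manager" pattern described above.
// None of these names come from a real SDK; a real app would delegate
// to Firebase, Matomo, etc. inside the backend implementation.
interface AnalyticsBackend {
    fun send(event: String, properties: Map<String, String>)
}

class AnalyticsManager(
    private val backend: AnalyticsBackend,
    private val optedOut: () -> Boolean  // hypothetical opt-out toggle; many apps never expose one
) {
    fun trackScreen(screenName: String) {
        if (optedOut()) return  // only honoured if the developers actually offer an opt-out
        backend.send("screen_view", mapOf("screen" to screenName))
    }
}

// Usage from each screen, e.g. in an Activity's onResume():
//   analytics.trackScreen("SignupStep3")
```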
If the developers don’t give you a way to opt out, you’re tracked every time you use the app. In the browser you can block that with uBlock Origin or something like it, and then you’re almost not tracked…

Tiny nit: Sentry is not open source. They use the Business Source License.
Huh. I remember running it internally for a while; I wonder if they changed their license or if we were just abusing it. We never even got out of beta internally, as it kept breaking and upstream seemed to have zero desire to make it stable (for us at least), even when we sent patches.
I don’t know when you had this experience, but nowadays we (I am an employee of Sentry) have a dedicated team for open-source work that makes sure the issue tracker gets triaged and external PRs don’t fall through the cracks. We also have a docker-compose setup, since the complexity of the service has increased over time.
We do still get a large number of bug reports that we have a hard time diagnosing remotely, particularly around Kafka/Zookeeper and networking.
Yup, we changed the license. Unless you were trying to build a direct competitor to Sentry you were probably not abusing either license, but IANAL.
This was all just for internal use, and it was many years ago (5-ish or maybe more, I’m not sure). I’m glad they/you seem to be doing better around open-source stuff! We just wrote our own, very simple system that basically amounts to an issue in our issue tracker with a stack trace attached.
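For illustration, a reporter along those lines can be as small as an uncaught-exception handler that POSTs the stack trace to the tracker. A hedged Kotlin sketch; the tracker URL and JSON shape are made-up placeholders, not anyone’s actual internal API:

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Rough sketch of a "crash becomes an issue with a stack trace attached" reporter.
// TRACKER_URL and the payload shape are hypothetical placeholders.
object CrashToIssueTracker {
    private const val TRACKER_URL = "https://issues.example.internal/api/issues"

    // Call once at startup, e.g. from main() or Application.onCreate().
    fun install() {
        val previous = Thread.getDefaultUncaughtExceptionHandler()
        Thread.setDefaultUncaughtExceptionHandler { thread, throwable ->
            runCatching { fileIssue(thread, throwable) }  // never let reporting itself crash the handler
            previous?.uncaughtException(thread, throwable)
        }
    }

    private fun fileIssue(thread: Thread, throwable: Throwable) {
        val title = "Crash in ${thread.name}: ${throwable.javaClass.simpleName}"
        val payload = """{"title": ${json(title)}, "body": ${json(throwable.stackTraceToString())}}"""
        val conn = URL(TRACKER_URL).openConnection() as HttpURLConnection
        conn.requestMethod = "POST"
        conn.setRequestProperty("Content-Type", "application/json")
        conn.doOutput = true
        conn.outputStream.use { it.write(payload.toByteArray()) }
        conn.inputStream.close()  // force the request to complete
    }

    // Minimal JSON string escaping, enough for the two fields above.
    private fun json(s: String): String =
        "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""
}
```

It also quietly illustrates the limitation raised further down: anything that kills the process before the handler runs never becomes an issue.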
When we were running Sentry, there was no Kafka or Zookeeper; this was before those dependencies came in. As I remember it was strictly Python, with maybe a Redis or SQL dependency, and that was basically it. It sounds drastically more complicated now.
I make no claim that it won’t work for someone else, just that it was (at the time) terrible for us, stability-wise. People should evaluate it for themselves and decide whether it will work for their use case.
Thank you very much for pointing this out. I have updated the article to add this!
There’s an unquestioned assumption near the beginning of this post:
I do not think that having analytics in a project is a bad thing. You cannot always rely in users to get reports of bugs or unexpected crashes with a full report on how to reproduce the bug.
We need to directly examine the idea that the question of how to ship less buggy code is answered by user-generated metadata. To explicitly repeat myself:
How do we make our applications crash less? Once we realize that this is the question that we care about, then it becomes somewhat obvious that there will be crashes which aren’t reported because the crash is too severe or obscure for the crash-reporting framework to recover, and that survivorship bias and Goodhart’s Law will combine to create a false image of the crashiness of the application.
And once we understand that analytics does not directly address the stated problem, then we no longer need to feel conflicted over whether the ethical concerns outweigh the imagined benefits.