It doesn’t matter whether telemetry was added to a product in good faith or not: projects change leadership, corporations are taken over, data sets leak, or people just change their minds. Once I opt in, any change in the amount and quality of the data sent to headquarters is likely to be made without my explicit consent.
I also believe that the gains in quality claimed to be achieved from telemetry data are highly exaggerated, for commercial (i.e. ad industry) reasons. Apart from minor ergonomic improvements I cannot think of much that can be achieved by collecting user data, and I say this as someone who has been maintaining an open source project for 25 years now.
Here is a list of things the Go team wanted to learn from the transparent telemetry they proposed. It seems reasonable to me, and hard to get any other way.
It’s a reasonable list, but as Bunny351 pointed out, that list and whoever is in charge of it can change. It does not exist in isolation, nor is it frozen in time.
I’m not arguing for Go telemetry, just pointing out that telemetry in general can have many use cases that are even well documented. When you contrast that with someone who claims to have lots of experience and at the same time fails to see any use for such telemetry, it’s hard for me not to see this as a dishonest claim made only to push some hidden agenda. It would be much better for people involved in such discussions to be honest and say what they really think out loud, instead of making such silly claims. You don’t want telemetry in Go because you don’t trust Google or big corps in general? Fine, just say so instead of making those silly pseudo-technical arguments. 🤷
No need to get personal. I clearly wrote that I think the claimed advantages are exaggerated and that I do acknowledge minor gains. I also wonder what “hidden agenda” you are talking about. In fact, your strong wording makes me wonder what your agenda is…
In my experience, I have found it more fruitful to engage directly with users and co-developers to identify areas of improvement, instead of relying on questionable statistics that might be abused or may even misrepresent real use. Users should not be a resource to be mechanically harvested. If I’m unclear about the usage patterns of the tool I maintain, I can politely ask the user base, and they may or may not be interested in telling me about it.
No, I don’t trust Google. I do trust certain communities at the moment; that may change in the future, just as communities, their governance, or their policies may change. That is my main point: trust is not something I grant eternally, and tool vendors should acknowledge and accept this.
Regarding my honesty: I truly cannot remember any situation where adding telemetry to something I maintained would have made a genuine impact; note that I’m talking about my personal experience. There may be minor gains, perhaps, but is it worth requiring the user to pay for these gains with his or her data? I think not.
All I’ve seen of you in all the threads on the Go telemetry has been weird attacks on people rather than on their arguments. You jump down the throat of anyone who you feel hasn’t sufficiently “read the proposal”, you impute dishonest motivations to those who disagree with you, etc.
Perhaps you could find a more constructive way to make your points?
It’s worth noting that the GDPR both requires consent to collect data and restricts the use of that data to the purposes for which consent was given. It’s possible to write a GDPR-compliant privacy policy that lets you use telemetry data for purposes that are good for the user, but which would expose the holder of the data to huge liability if their new management tried to use it for anything else.
I’m a bit wary of this because I think it’s quite easy to use lists like the above to win the argument by economy: coming up with the list is a lot more time-efficient than going through it point by point and refuting each thing. But there are some points worth bringing up that apply to most of them.
Many people have opposed opt-in, claiming it will bias the results. rsc has also been vocally anti-surveys for similar reasons. But all of these suggestions are susceptible to other sources of bias—they overcount projects which are built often (or invoke other tools often; the same objection applies). I don’t know which projects those are, but I see no reason to just assume that the number of times a thing is compiled correlates with its significance.
Also, as other people in the thread have raised, that a thing happens (relatively) rarely doesn’t mean that its presence is unimportant. rsc has framed this as an exercise in answering important quantitative questions, but making the debate about how to answer them would be reasonable only if the quantitative answers were ends in themselves. They’re not—they’re a way of answering qualitative questions about the importance of various things to the community. I don’t think they achieve this.
I could go deeper and attack the statistics side of this, but I think it’d be a bit redundant. The data can’t be of high quality, even if it were gathered and handled perfectly (which I don’t think it is), because it’s answering the wrong questions.
There is yet a wider zoom level through which to view this debate, however. Go as an institution has apparently been built on the premise of not talking to one’s customers. This is typical for Google, but it shouldn’t be typical for FOSS.
That article would be a bit more compelling if Russ put a disclaimer stating “I have not received any pressure from my superiors at Google to collect more data from Go users.”
What’s to prevent a sufficiently evil change of guard from just starting to collect that info without really notifying you (or burying the notice)? Like, this seems like a legitimate concern, but it isn’t something that can be solved by complaining about telemetry. Just disable it and move on, or use something else.
Once I opt in, any change in the amount and quality of the data sent to headquarters is likely to be made without my explicit consent.
That’s a fair complaint. Assuming we wanted telemetry but wanted to avoid this issue, I see a design that could fix it.
Most projects’ SEND_TELEMETRY setting or environment variable is a boolean – DO_NOT_SEND or SEND. In the project that we want to add telemetry to, instead make it an enum: DO_NOT_SEND, SEND_V1, SEND_V2. If the project wants to start collecting new data, it would have to convince users to change their settings to SEND_V2. Whether the project used opt-in or time-delayed opt-out for its initial request to enable telemetry, it would use the same strategy to request the upgraded telemetry.
So long as the initial team stays in control and that enum is never expanded to new telemetry. What about telemetry you thought you were collecting on day 1 and it turns out there was a bug that meant you never sent anything useful? Could you fix that? What about sending the uptime but you realize you also need to know if the copy on disk is newer than your program’s uptime?
Much of this relies on the continued single-minded determination of the person or entity making the decisions to make them consistently and correctly.
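For what it’s worth, here is a minimal Go sketch of the versioned-consent gate described above. The SEND_TELEMETRY variable and its values come from that comment; the helper function and everything else are hypothetical illustration, not any real project’s code.

```go
package main

import (
	"fmt"
	"os"
)

// Consent levels for the versioned-telemetry idea sketched above.
const (
	doNotSend = "DO_NOT_SEND"
	sendV1    = "SEND_V1" // the originally consented metric set
	sendV2    = "SEND_V2" // V1 plus whatever the later consent request added
)

// telemetryLevel reads SEND_TELEMETRY and falls back to not sending when it
// is unset or unrecognised, so new collection never happens by default.
func telemetryLevel() string {
	switch v := os.Getenv("SEND_TELEMETRY"); v {
	case sendV1, sendV2:
		return v
	default:
		return doNotSend
	}
}

func main() {
	level := telemetryLevel()

	if level == sendV1 || level == sendV2 {
		fmt.Println("would send the V1 metric set")
	}
	if level == sendV2 {
		fmt.Println("would additionally send the V2 metric set")
	}
	// Anything beyond what the user's current setting covers requires a new
	// consent request, i.e. convincing the user to move to SEND_V2 (or V3).
}
```

The point of keeping the default at DO_NOT_SEND is that a bug fix or a new metric never silently widens what is sent; it has to go through the same consent flow again.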
I think it was Audacity that added telemetry… in the form of Sentry bug collecting. People really got super pissed off, and I was honestly a bit flummoxed. Surely bug reports are reasonable at some level?
It does feel like the best kind of telemetry is the opt-in kind: “Tell us about this bug?” Steam, for example, has user surveys that are opt-in. It’s annoying to get a pop-up, but it’s at least respectful of people’s privacy. I have huge reservations about the “opt in to telemetry” checkbox that we see in a lot of installers nowadays, but am very comfortable with “do you want to send this specific set of info to the developers” after the fact.
IIRC, Steam also shows you the data that it has collected for upload and gets your confirmation before sending it.
I also appreciate that they reciprocate by sharing the aggregated results of the survey. It feels much more like a two-way sharing, which I think really improves the psychological dynamic.
Unfortunately, bug reports are just a single facet of product improvement that seems to get glossed over. If you can collect telemetry and see that a feature is never used, then you have signals that it could be removed in the future, or that it lacks user education. And automatic crash reporting can indicate that a rollout has gone wrong and remediation can happen quicker. Finally, bug reports require users to put in the effort, which itself can be off-putting, resulting in lost useful data points.
But it can be very tricky to deduce why users use or do not use a feature. Usually it cannot be worked out by guessing from the data. That’s why I think surveys with free-form answers, or just having some channel like a forum, tend to be better for that.
A problem with both opt-in and opt-out is that your data will have biases. Whether a feature is used by the people who opted in is not the same question as whether people (all of them, or the ones who pay you) make use of it. And you still won’t know why, so…
There tends to be a huge shock when people make all sorts of assumptions, try them, watch them still fail, and only then start talking to users, at which point they are hugely surprised by things they never thought of.
Even with multiple-choice surveys it’s actually not that easy. I am sure people who participate in technology surveys know how it feels when the data is presented with wrong assumptions baked into its interpretation.
It’s not so easy, and this is not meant to be anti-survey, but to say that surveys aren’t necessarily the solution either, and that it makes sense (as with all sorts of metrics) to compare them with actual (non-abstract, non-generic) questions, lest you end up implementing a feature and investing time and money only to completely misinterpret the results.
And always back things up by also talking to users, enough of them to actually matter.
Asking users why they do/don’t use every feature is extremely time consuming. If you have metrics on how often some feature is getting used, and it is used less than you expect, you can prepare better survey questions which are easier for users to answer. Telemetry isn’t meant to be everything that you know about user interactions, but instead a kick-off point for further investigations.
I agree. However, that means you need both, and that means you cannot deduce a lot of things simply by running some telemetry system.
Also, I am thinking more of a situation where you make a survey and add (optional) text fields to provide context. That way you will see things that you didn’t know or think about, which is the whole point of having a survey in the first place.
That’s something I’m not so sure about either though. I don’t really have a problem with anonymous usage statistics like how often I click certain buttons or use certain features. But if a bug report includes context with PII I’m less keen on that. Stack variables or global configuration data make sense to include with a bug report, but could easily have PII unless it’s carefully scrubbed.
You’re going to have a lot of wells to unpoison when it comes to this claim – especially given dark patterns such as not asking for consent around tracking in general. I don’t think folks argue that “telemetry = bad”, but given all of the mishandling of data we’ve seen, and without the source, it’s hard to just “take their word” on what they’ll collect or that it won’t be abused in a future release.
I think that this is exactly what folks argue. At least that’s how I perceived the reaction to Russ’s proposal to add ‘transparent telemetry’ to Go. In fact, the discussion on lobsters showed that many (if not almost all) who argued against it didn’t even read the proposal.
That’s exactly the poisoned-well issue, though: there are enough somewhat credible accusations against Google that the actual data handling practices have in fact been incompatible with the literal reading of published documents. And any big org has stories about higher-ups overriding carefully balanced planning from the technical teams. Oh, and nobody doubts that Google can get both code and transmitted-data obfuscation done.
In this model — supported by many people’s evaluation of facts (mine included) — reading a proposal indeed cannot remove the worries, so some people just skipped it.
I have to agree. And it makes me sad, as software gets progressively worse for power users and developers, since they don’t receive our telemetry. I pretty much always enable telemetry, because I care about how I use a product being counted.
Why would that be the reason software is getting worse, considering that there wasn’t telemetry when it was better?
It can be used as an excuse to remove features that only the telemetry-naysayers are using, since the telemetry shows no use of said features. Before telemetry, someone had to formulate an argument for a feature’s removal; now the burden of proof has moved to those who want to keep it.
That makes sense, but if anything it is an argument against telemetry rather than for it.
It seems more like an argument for sad acquiescence to a fait accompli.
There’s not necessarily a connection between sending (or not) telemetry and software getting worse, though.
None of the software I use collects telemetry, and in my opinion it’s only gotten better.
I think it’s true that a lot of people didn’t read the proposal, and more so the further you got from the original discussion on GitHub (I didn’t follow that one on lobste.rs, but I did see some nuance-lite posts on mastodon). But in spaces where it was more constructive, the message was much more clearly “must be opt-in.” And hey, they listened.
I think that in the current year people who still claim that slippery slopes are an invalid concern are either bad actors or useful idiots.
People have such a grumpy view on telemetry because once it becomes normalized there is no incentive in place to roll it back or limit it. Given how gnarly the economy is going to be for tech for the next several years, I fully expect maximum resource extraction to be the dominant strategy–and companies will turn to that telemetry to make it happen.
If I have a problem, I’ll send in a bug report or, more likely, just upload a crash dump. I do not need my machine snitching on me, however beneficial people claim it to be.
I think that in the current year people who still claim that slippery slopes are an invalid concern are either bad actors or useful idiots.
Why is the current year relevant? A logical argument is valid or not, regardless of the prevailing zeitgeist. Maybe one has to be somewhat willfully ignorant to keep believing we can have nice things, especially in our current pessimism-saturated online discourse. So be it; I’ll be a useful idiot too. If the most pessimistic opinions about the impact of microelectronics had prevailed in the 1970s and early 1980s, I, for one, would have been much worse off. Telemetry is certainly not an invention on the same scale, but I’m willing to have faith that it can be a valid tool for producing more usable software, despite previous abuses and the current prevailing opinion.
I think one place it makes sense to have telemetry is on websites where your primary question is “are our users finding the buttons we want them to find”.
The average website user will not file a bug report.
There’s one fundamental reason to distrust telemetry, however: aggregation cannot be done on my machine, so by default I have no idea whether my data is being aggregated and my records anonymised. I have to trust that they are.
I’ve also been pointed to a licence basically saying they can collect everything they want and there’s nothing you can do about it (except, of course, refusing the licence). I understand the product probably does its telemetry in good faith, but if the licence terms are that creepy I’m not going to willingly use the product. Not even if said product is an otherwise very good programming language.
Local differential privacy allows the aggregation to be done locally, so that the only data that gets uploaded has already been blinded by noise. Apple uses local differential privacy in iOS for this reason – even if Apple’s servers were malicious, they still couldn’t learn what data you uploaded.
What do you know, I stand corrected. Thanks.
Differential privacy and its offshoots seem like nonsense. There is not a single concrete example on the linked page of how the scheme is actually secure against the observability threats it cites. It just states it by assertion. So it just seems like a lot of handwaving to justify the data they wanted to collect anyway.
The privacy I want is to not be observed without my consent, and that consent will never be given. The end.
I don’t budge on this because we have gone from GDPR-enforced cookie walls to now “legitimate interest” making a mockery of the whole thing.
Mass data collection endeavors should not and cannot be trusted. Just because the harm to the individual is small doesn’t mean the effects in total aren’t disgustingly totalitarian.
Blackmailing people into sharing their data because otherwise only the naive dumb dumb’s data is being collected… that’s especially totalitarian.
Differential privacy and its offshoots seem like nonsense. There is not a single concrete example on the linked page of how the scheme is actually secure against the observability threats it cites.
I agree the linked Wikipedia page is poorly written – it obscures how to actually implement local differential privacy. However, I found an answer in this phrase: “The prototypical example of a locally differential private mechanism is the randomized response survey technique”. That link has an example of using a coin to randomly decide whether to answer a question truthfully. It seems intuitive to me that yes, randomly deciding whether to answer truthfully will give you more privacy than always answering truthfully.
For a toy example of implementing it in software, say an open source image editing program asks users to opt into sending one bit of telemetry – whether they have ever attempted to open an image in JPEG XL format. The program authors are considering adding support for that format, but that feature would bloat the binary size and make future changes harder, so they only want to add it if enough people would use it. With normal telemetry, the program would send as-is its record of whether you opened a JPEG XL image. With randomized response, a type of local differential privacy, the program would instead send its record with probability 0.5, “yes” with probability 0.25, or “no” with probability 0.25.
Say you don’t use JPEG XL right now and are sympathetic to the maintenance burden, but think there’s a chance you would start using JPEG XL, in which case you would want support for it. However, you are so private you never want to definitively identify yourself as a JPEG XL user. You build the program from source yourself and see that it implements randomized response in its telemetry. This knowledge should make you more willing to turn on telemetry, because whoever controls the telemetry receipt server could not definitively identify you as a JPEG XL user. At least 25% of all users who opted into telemetry would have sent the same answer as you, so you would be hiding among an identical crowd of a minimum size.
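To make the mechanism concrete, here is a small, self-contained Go sketch of that randomized response step, plus the aggregate estimate the server can still compute. The simulation numbers are made up for illustration; the point is only that individual reports are noisy while the population rate remains recoverable.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomizedResponse implements the toy scheme described above: report the
// true JPEG XL bit with probability 0.5, otherwise answer by a fair coin
// flip. Every possible answer is produced by at least 25% of users
// regardless of their true usage, so no individual report is conclusive.
func randomizedResponse(usedJPEGXL bool) bool {
	if rand.Float64() < 0.5 {
		return usedJPEGXL // tell the truth
	}
	return rand.Float64() < 0.5 // answer at random
}

// estimateTrueRate undoes the noise in aggregate: since
// P(reported yes) = 0.5*p + 0.25, the true rate is p = 2*P(reported yes) - 0.5.
func estimateTrueRate(yesReports, total int) float64 {
	observed := float64(yesReports) / float64(total)
	return 2*observed - 0.5
}

func main() {
	// Simulate 100,000 opted-in users, 10% of whom really opened a JPEG XL file.
	const users = 100000
	yes := 0
	for i := 0; i < users; i++ {
		if randomizedResponse(i < users/10) {
			yes++
		}
	}
	fmt.Printf("estimated share of JPEG XL users: %.3f\n", estimateTrueRate(yes, users))
}
```

The error of the aggregate estimate shrinks roughly with the square root of the number of opted-in users, so the added noise only hurts when the sample is small.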
They could, by changing the code that’s running on your device without you knowing (or perhaps the code is already out of step with their public statements).
If they could change the code running on your device without you knowing, then they could also mine bitcoin, or do something else nefarious. I think in this case, as in all cases, if you don’t trust the authors of a program, you should audit the source code.
Yes, this. I always audit the source code of the Microsoft Windows and Office automatic updates, doesn’t everybody?
The context of this comment and this specific telemetry question was for the Go compiler, which is open-source.
If the source code were available it would provide some accountability without each individual having to audit the code themselves as well.
Then, of course, you have the question of “is the published code the code that’s running on my machine?”
Can I (either as an individual or organization) collect the telemetry for auditing purposes? Why or why not?
Here is the telemetry for my blog: https://carlmjohnson.goatcounter.com/
Telemetry just tells you the quantity and frequency of use, but it doesn’t tell you anything else. There are many features that people use rarely, but rely on nonetheless.
If you go just by telemetry data, the developer of a phone app might also consider calling the emergency services such a rarely used edge case that it can be removed.
Telemetry doesn’t tell you the value people place on a feature, or how important it is for them. Maybe they chose your product over alternatives purely because of one rarely used feature, which they need nonetheless.
Also, even in the best case, telemetry can only improve the product in the eyes of the people making the improvements. If a company is spending money making a product and I’m using it for free, it’s unlikely that we are entirely aligned on this point.
As something approximating an example, many large websites use telemetry guidance to implement what are (from my point of view) clearly UI regressions. YouTube’s developers see that people who can find the button to disable autoplay don’t see as many ads, so they make it less obvious, etc.
I would love to see Linux distributions put a system-wide telemetry service, accessible over D-Bus, in place. Any software to be included would have to use this service, as would all the Flatpak & co. stores.
This service would upload to a distro-run repository that would be publicly accessible and thus auditable. It would make telemetry available as a valuable resource to the whole community, and not only to a bunch of core developers.
Users would opt in or out during installation and at any time after that.
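To illustrate the shape such a service could take (purely hypothetical; no such D-Bus interface exists today, and all names below are invented), something like this Go interface captures the intended split of responsibilities between applications and the distro-run daemon:

```go
// Package telemetry sketches what a hypothetical distro-wide telemetry
// service could expose to applications over D-Bus.
package telemetry

// ConsentState mirrors the opt-in/out choice made at install time or later.
type ConsentState int

const (
	ConsentUnset ConsentState = iota // the user has not been asked yet
	ConsentOptIn
	ConsentOptOut
)

// Service is what an application would talk to instead of shipping its own
// uploader. The daemon, not the app, owns batching, anonymisation, and
// uploads to the publicly auditable distro repository.
type Service interface {
	// Consent reports the system-wide choice; apps record nothing unless it
	// is ConsentOptIn.
	Consent() (ConsentState, error)

	// Record stores a named counter increment locally; the daemon decides
	// if and when it is uploaded.
	Record(app, counter string, delta uint64) error

	// PendingReport returns exactly what would be uploaded next, so users
	// and auditors can inspect it before it leaves the machine.
	PendingReport() ([]byte, error)
}
```

Whether the transport is actually D-Bus matters less than the design choice that the uploader and the data it holds are auditable by the distro community rather than by each upstream project.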
I would also like to point out this related post made by a friend: Making Go telemetry opt-in is a mistake. The truth is that telemetry is something that can be used for good, and it’s depressing that the well has been poisoned so much by Microsoft and Google.
The data is rarely looked at by human beings. In the cases (such as ad-targeting) where the data is highly individuated, both the input (your activity) and the output (your recommendations) are both mainly consumed by you, in your experience of a product, by way of algorithms acting upon the data, not by an employee of the company you’re interacting with.
This was exactly the argument that Google originally used against criticisms of Gmail’s data harvesting…
The voting analogy is interesting. The article says that telemetry is a form of voting; by providing the data, users can influence the development of the product to fit their needs best. This is treating users as passive consumers. In contrast, the older open-source ecosystem is based on voting through patches; we used to all be developers, and the tools we would build would be for us. Depending on the component’s depth in the system, there is probably a gradient between both positions.
One aspect the article isn’t speaking to is that, in the bigger picture, it’s impossible not to leak information when using computers. If you have ever done any packet sniffing, you will have seen that your computer is constantly talking to services left and right. Each has a particular reason to exist: the clock wants to stay in sync and talks to the NTP server; the printer driver intends to alert you if the toner needs to be replaced; the system checks whether security updates need to be applied. The result is that your computer is already a firehose of information leakage. Adding more telemetry just makes the problem worse.
I only rarely enable telemetry, because the opt-in mechanisms almost never specify what they actually collect, how they anonymise it, or with whom they share the data.
#notalltelemetry ?
Also: https://www.techdirt.com/2023/03/10/gizmodo-found-28000-apps-sending-tiktok-user-data/