are you making assumptions based on the word ‘telemetry’?
That was pretty much my point. It is a horrible idea from the start. Just don’t. I don’t want it, and obviously many others don’t want it either. At all.
But I do want to consider opposing views. Perhaps I am wrong? Maybe I am missing something.
I read the design now. Still no thanks.
It’s an open source programming language implementation. Leave it alone. Stop this “let’s improve the product based on customer feedback” nonsense. It is a tool for engineers, by engineers. Let technical merits prevail. Do a survey if you want a bit of insight into the user base, maybe.
Give me a technical tool that we both, its maker and the user, understand. Something I know is useful because I know why it is useful. Not because of politics, customer service, or any of that. I am talking about engineering, technology. Sure, we can talk about the rest. Do I want to support a project because the author is a nice guy? Or perhaps because it is run by people with good principles? Maybe. I don’t know, but let’s keep those things separate, please.
You appear to be in need of a reminder that people are allowed to dislike the idea of on-by-default telemetry without needing to critique the specific technical mechanisms by which it is collected or the specific pieces of information collected. They also are allowed to default to being against it without being required to first comprehensively read and point-by-point rebut every proposal for telemetry that is put forward.
The hole in this argument is the counter-question of “why do you need to know?”
If you look at the vast swath of projects that have dedicated ports to particular architectures, operating systems etc., none of them require telemetry, particularly because the people that actually use these ports contribute to the larger project discourse.
Why does Google need invasive metrics to judge what needs to be maintained? Can this not be inferred from literally any other voluntary source of information? Issue trackers? Pull requests? Why do you need invasive telemetry, when you can judge this from the discourse around your projects?
The fighter jet argument is also very weak: the justification that “we can make sure that fighter jets are still running!” is being used… for telemetry? Do you really think anybody will leave Go’s telemetry enabled on anything related to these projects? These are walled-off areas of development that won’t (and shouldn’t) send anything anywhere.
The arguments for this are veering into a direction that hasn’t been backed up by simply harvesting existing data. Whoever’s making the decisions around ports, please, listen to the people that use them and contribute to the discourse. Linux handles this just fine.
I trained as an engineer, so one of my main biases is the ethics of what I build and expose people to. When solving problems, I justify whether it is right or wrong. As an ethical standpoint, I claim that we should try to minimize data collected from customers and gain explicit consent when collecting it.
Now with that out of the way, let’s cover the core problem that telemetry approaches try to solve: How do people use a tool? What features do they use? How often does the tool work? How often does it fail? What chronic issues are we missing?…One of the most basic approaches to this is to just ask people how they use a tool and collect responses. This works to a point, but it does not scale very well…“It’s a basic truth of the human condition that everybody lies. The only variable is about what.” Dr. House
It is interesting that you quote Dr. House, because that show has a complex understanding of medical ethics. According to ethical principles, patients must be informed of all the possible harms of a procedure, and they have the right to say no to any treatment that they think is too risky. Note the phrase “they think”. I claim that not asking for explicit consent is not ethical.
One of the key examples given for why this system was considered is an incident where the Go standard library mysteriously had a C dependency on macOS (for context: Go programs depending on the standard library alone should not require a C compiler)
If the team maintaining Go claims to support a platform, they must own automated builds and testing like e.g. Rust Tier 1 targets. The fact that customers had to report issues to the Go team for a “supported” target does not speak to telemetry, it speaks to better automated testing. In this case running otool on Mac for all builds would work.
I just hope I don’t sound like a Google shill here. This design that someone working at Google proposed is the best way to do this kind of action. I guess a lot of the backlash is against the fact that the concept exists at all. I get it, but at some level leaving people alone doesn’t scale. You can’t know what someone is doing unless you see what they are doing.
Ask for my explicit consent 1) nicely, 2) using an evidence-based argument, 3) in a way that I can become a cheerleader to others including my company. Otherwise, I’ll make sure all my Dockerfiles and build scripts patch out all telemetry and never support your project.
The fact that customers had to report issues to the Go team for a “supported” target does not speak to telemetry, it speaks to better automated testing.
As a trained engineer you probably also know that there is a huge difference between people using tools in “the wild” vs in a static lab setup.
I thought the opt out telemetry sounded reasonable to me, but this also strikes me as a good take. As a thought experiment, I can ask myself what if the proposal had come from the Chrome team (lol, too late, they spy on everything) instead of the Go team, and I think I would be much less positively inclined. I trust the Go team. I don’t trust “Google”. I extremely distrust the Blink team. But for someone who doesn’t interact with the Go team, they’re all just “Google”.
I think I would be bitter that Russ went back on opt-out telemetry, except I actually do think the mandatory opt-in during install will get a fair number of users, and so it won’t be a complete wash.
I don’t see how any telemetry transmitted via the internet that is opt-out is not a direct violation of the GDPR. The IP address that is transmitted with it (in the IP packets) is protected information that you don’t have consent to collect - you failed at step 0 and broke the law before you even received the bits you actually care about.
Of course, the GDPR seems to be going routinely unenforced except against the largest and most blatant violations, but I really don’t see why a company like Google would risk it, or why other large companies are actively risking it.
My understanding of the GDPR was that IP addresses are not automatically PII. Even in situations where they are, simply receiving a connection from an IP address does not incur any responsibilities, because you require the IP for technical reasons to maintain the connection. It’s only when you record the IP address that you may hit issues. You can generally use some fairly simple differential privacy features to manage this (e.g. drop one of the bytes from your log).
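For illustration, masking along those lines is a few lines of Go (a minimal sketch; the name maskIP and the chosen prefix widths are mine, not anything Go ships):

```go
package main

import (
	"fmt"
	"net"
)

// maskIP drops host-identifying bits before an address is written to
// a log: the last octet of an IPv4 address, or everything past the
// /48 prefix of an IPv6 address. The logged value then names a
// network, not a single host.
func maskIP(ip net.IP) net.IP {
	if v4 := ip.To4(); v4 != nil {
		return v4.Mask(net.CIDRMask(24, 32))
	}
	return ip.Mask(net.CIDRMask(48, 128))
}

func main() {
	fmt.Println(maskIP(net.ParseIP("203.0.113.7"))) // 203.0.113.0
	fmt.Println(maskIP(net.ParseIP("2001:db8::1"))) // 2001:db8::
}
```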
(30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.
This doesn’t actually say that collecting IP addresses is not allowed. It only states that when the natural person is known, online identifiers could be used to create profiles.
Furthermore, this is only relevant if those online identifiers are actually processed and stored. According to the Google proposal they are not: they only keep a record of the anonymous counters, which is 100% fine under the GDPR.
It’s a shame the Go compiler isn’t well positioned UX-wise to ask users for opt-in consent at installation (as an IDE might), since that’d likely solve the privacy concerns while reaching folk who don’t know about an opt-in config flag.
Yes, IP addresses are not automatically PII, but if you can’t ensure they are not, you must assume they are. The telemetry data itself is probably not PII, because it’s anonymized.
The GDPR prohibits processing[0] of (private) data, but contains some exceptions. The most commonly used one is fulfilling a contract (this doesn’t need to be a written-down contract with payment). So assume you have an online shop. A user orders, say, a printer: you need their address to send the printer to them. But when the user orders an ebook, you don’t need the address, because you don’t need to ship the ebook. In the case of Go, the service would be compiling Go code, and I don’t see a technical requirement to send Google your IP address.
The next common exception is a requirement by some other law (e.g. tax law or anti-money-laundering law). I think there is none here.
The next one is user consent: you know those annoying cookie banners. Consent must be explicit and can’t be assumed (and dark patterns are prohibited). So this requires an opt-in.
The next one would be legitimate interest. This is more or less the log-file exception. Here you might argue that the Go team needs this data to improve their compiler. I don’t think this would stand, because other compilers work pretty well without telemetry.
So all together I[1] would say the only legal way to collect the telemetry data is some sort of user consent.
[0] Yes, processing, not only storing - so having a web server answering HTTP requests might also fall under the GDPR.
You are wrong. The GDPR is not some magic checkbox that says “do not ever send telemetry”. The GDPR cares about PII and your IP address and a bunch of anonymous counters are simply not PII. There is nothing to enforce in this case.
The only reason this argument seems complicated is that the technicians involved insist on understanding the argument in terms of its technical merits. What the “my biases” section is missing is the acknowledgement of a meta-bias in favor of there being a technical solution to a problem like this. If there were one, then sure, Go’s approach to transparent telemetry would be a good candidate. There isn’t, though, because the instigating factors in this dispute lie outside the technical domain.

From my perspective, the situation is pretty simple: you’ve got a powerful organization whose incentives re privacy are dramatically at odds with the incentives of individuals, and the tradeoff to your language ecosystem receiving abundant investment from that organization is that a vast swath of the userbase is going to understandably object to telemetry no matter the virtues of the manner in which it’s collected. Corporate investment, tooling improvement via telemetry collection, user trust: pick at most two. There’s simply no technical solution that will allow the choice of all three, and that’s just how it is in our current landscape.

The corporate investment choice was made a long time ago, so the only gracious thing to do is acknowledge the tension between the remaining two options and choose one. The Go team, then, behaved graciously here. I don’t see any mistake being made.
You know, I like Nielsen ratings because I only fill the paper out if I feel like it, and they send me a dollar.
Say these analytics are “worth” something: can we put a number on it? Can I get two microdollars for all my kilobytes of trouble? I mean, what’s it worth to them, I wonder?
This is the same as no analytics. Like all micropayment ideas, it requires a working micropayments system to already be in place and used by everybody, without friction or middle-men taking all of your money.
It isn’t. You just gave a few examples, which means that most compilers and most interpreters, such as Python, GCC, LLVM/Clang, Perl, PHP, Ruby, Tcl, D, SBCL, CLisp, and so on, do no such thing, and feel no need to. Trying to normalize it is creepy, and trying to do so by merely stating that it’s normal is really something else.
It is a normal thing for proprietary software. I think that is one of the driving factors making this controversial: Golang is ostensibly an open-source platform, but that brings expectations that are sometimes at odds with its historical origin as a Google initiative.
The informal, soft power companies can have over open-source technologies that people depend on creates resentment.
Yeah, I read those last few comments, and which compilers had telemetry, and I think you’ve hit the nail on the head. Go-with-telemetry has to be considered a proprietary platform in a way that go-without-telemetry doesn’t.
Careful how you use ‘proprietary’ here, I’m sure some pedant somewhere would point out that the license is still OSI. However, governance has always been a blind spot of open licensing, and that is where this issue falls.
All of those, with the exception of the JDK, are IDEs. They are not compilers.
It’s somewhat defensible to have telemetry in an IDE, and as far as I’m aware, IntelliJ and Visual Studio both asked me before collecting it.
The reasons they give for wanting telemetry in the Go compiler – the public reasons, notwithstanding any private reasons that we don’t know – are weak at best, and just serve to reinforce the reasons I dislike Go at worst.
For example, tracking what platforms people compile for. Why not just let the community maintain ports? It amazes me that LLVM can manage to have community-built and driven ports to the M68K platform, despite LLVM being a significantly more complex codebase than Go. Yet, Go won’t even let users of Power support ISAs lower than Power8. Even when the community gave them PRs and offered CI, they refused it! Large commercial customers using Go on Power7/AIX were even told to pound sand, let alone those of us trying to run Linux workloads on older Power hardware.
I don’t know what Go compiler authors want telemetry for, but as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write, and be able to use that to improve error messages and/or think about how to avoid common mistakes. It is easy to find valid code in my language over the internet, but people almost never commit code that does not compile. All the intermediate states that are ill-parsed or ill-typed, but people wrote because it felt natural, this is what I would love to have access to. Of course this could be opt-in, and a good rule of thumb would be to only collect this for projects that are already publicly available – to make sure that there are as few privacy concerns as possible.
I thought of a design once: have the compiler create a git repository somewhere on user machines (one git repository per project in the language), and then commit incorrect/invalid files there on each failed compile. Once in a while, show users a message saying: “hey, would you like to send your git repo for us to look at your errors and improve the compiler?”. (In particular, users decide to send their data over the network, and it is in a format that they can easily inspect to make sure they are okay with its content.)
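Concretely, something like this (a minimal sketch; the recordFailure hook, the cache path, and the repo layout are all made up for illustration, and a real version would need per-project configuration):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

// recordFailure copies a file that failed to compile into a local,
// per-project shadow git repository and commits it. Nothing leaves
// the machine; sharing the repo is a separate, explicit user action.
// (Assumes git is installed and user.name/user.email are configured.)
func recordFailure(repoDir, srcPath string) error {
	if err := os.MkdirAll(repoDir, 0o755); err != nil {
		return err
	}
	// Initialize the shadow repo on first use.
	if _, err := os.Stat(filepath.Join(repoDir, ".git")); err != nil {
		if out, err := exec.Command("git", "-C", repoDir, "init").CombinedOutput(); err != nil {
			return fmt.Errorf("git init: %v: %s", err, out)
		}
	}
	data, err := os.ReadFile(srcPath)
	if err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(repoDir, filepath.Base(srcPath)), data, 0o644); err != nil {
		return err
	}
	msg := fmt.Sprintf("failed compile at %s", time.Now().Format(time.RFC3339))
	for _, args := range [][]string{
		{"-C", repoDir, "add", "."},
		{"-C", repoDir, "commit", "-m", msg},
	} {
		if out, err := exec.Command("git", args...).CombinedOutput(); err != nil {
			return fmt.Errorf("git %v: %v: %s", args, err, out)
		}
	}
	return nil
}

func main() {
	// Hypothetical invocation after a failed build of main.go.
	repo := filepath.Join(os.Getenv("HOME"), ".cache", "compile-failures", "myproject")
	if err := recordFailure(repo, "main.go"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```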
I don’t know what Go compiler authors want telemetry for, but as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write, and be able to use that to improve error messages and/or think about how to avoid common mistakes.
And this is a reason people run screaming away from telemetry, even if it’s arguably well-intentioned: If my compiler is sending my code to some other entity, that code can be used against me in a court of law. Am I writing encryption code? Am I writing code some AI flags as detrimental to some DRM scheme? It’s impossible to tell what could happen, potentially, and some of the scenarios are so horrible they outweigh the potential good.
I brought this up in the GitHub discussion and here, but got shouted down and silenced quite effectively.
as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write
So do I, but I’d hate to actually be responsible for processing it. People accidentally paste a load of sensitive things into code, or write almost-correct commercially sensitive code all the time. The only way I’d be happy collecting this would be to have a local service trying to extract representative samples and then an explicit manual step for users to approve uploading them.
Yes, see the design I sketched above with local git repositories:
collect data locally for a project (so: you can enable this for your cool open-source project whose sources are widely available and insensitive, and disable it completely for your internal codebase that is not meant for public consumption; as a rule of thumb, only ever enable it for public projects)
store the data in a format that users can easily understand and review
sending the data remotely is an explicit action of the user
Unfortunately your idea of “representative samples” sounds in fact very, very hard to do right. In general I don’t know what I’m looking for in this data yet, my queries may change over time, and I don’t know how to summarize it in a way that remains useful. There has been work on automatically minimizing buggy code, and we could consider doing it, but minimizing is compute-intensive (so would users really want that?). I think that for parsing errors, one could design specific summary formats that would make sense. For typing errors it is much harder in general (but doable; the keyword is “type error slicing”), and understanding a typing error usually benefits from being able to build the affected parts of the project and also understanding the code; minimization could easily prevent that. And for “what kind of bugs do people introduce in their code that passes the type-checker but is caught by the testsuite”, automatically minimizing that in a way that does not hamper our ability to analyze errors and turn them into language/tooling design feedback, well, that sounds like a research project of its own (starting from the existing work on program slicing, but in a way that should preserve useful context for code comprehension).
And I think that for the people worried that their code could contain something very problematic, minimization/summarization is not necessarily going to reassure them. They will disable any feedback process in any case. Their choice! So maybe working hard on summarization, if the intent is to reassure people, is not worth it. I think it is just easier to work with the other sort of people who, like, “would enjoy streaming their coding session anyway but never bothered to set it up”.
I wonder if it would make sense to generate an “errors digest” or something, as build output, which could optionally (but encouraged) be committed directly to source, like a lockfile.
It’d have to be trivially merged by git, but the compiler itself could provide a merge tool, and there could be some registry somewhere where projects could opt in, probably associated with a package manager.
Then full opt-out is “gitignore”,
the default is “people working on this project can use that telemetry”, since the tools are built in
and opt-in is “register my project with the telemetry scraper, which pulls from git”
I guess this doesn’t handle the “private code” argument, but it would allow for some of what you’re looking for, I think, on a per-project rather than per-user basis, which I think helps with the PII argument.
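As a sketch of what such a digest could look like (the schema, field names, and merge rule here are all hypothetical): error-code counts only, no source text, written deterministically so git diffs stay readable and merges can be automated.

```go
package digest

import (
	"encoding/json"
	"os"
)

// ErrorsDigest is a hypothetical per-project summary of compile
// errors, emitted as build output and committed like a lockfile.
// It holds only error-code counts, never source text.
type ErrorsDigest struct {
	GoVersion string         `json:"go_version"`
	Counts    map[string]int `json:"counts"` // e.g. "undeclared-name": 42
}

// Merge combines two digests by summing counts; a git merge driver
// could call this to resolve conflicts automatically. It keeps the
// first digest's toolchain version.
func Merge(a, b ErrorsDigest) ErrorsDigest {
	out := ErrorsDigest{GoVersion: a.GoVersion, Counts: map[string]int{}}
	for k, v := range a.Counts {
		out.Counts[k] += v
	}
	for k, v := range b.Counts {
		out.Counts[k] += v
	}
	return out
}

// Write stores the digest as indented JSON; encoding/json sorts map
// keys, so the output is stable and diff-friendly.
func Write(path string, d ErrorsDigest) error {
	data, err := json.MarshalIndent(d, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}
```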
Yes, committing errors into the project proper is one way to go about it, if we can make the UI unobtrusive enough.
(This does not handle “private code” but I think that it is realistic to assume that, fundamentally, if people are not willing to make their code public, they probably don’t want to export telemetry information about it either unless you force them to. Luckily there are many programmers willing to write public code.)
It’s worth noting: Debian has had opt-in telemetry for nearly 20 years, without (AFAIK) complaints. This system runs afoul of the same “IP address” issues discussed here, since connecting to a server reveals your IP address.
I don’t understand how any of this is a problem for the user of Go. If the tool gets worse due to lack of “telemetry data”, then the user just finds a new tool.
Personally I don’t really care about the telemetry part; I care more about where the data goes. In this case it seems to go to Google or Google employees. I do not trust Google to handle my data in any acceptable manner. And if my data is valuable, then I hope they’re willing to pay for it.
As someone from the outside looking in on the situation, it seems to me that Google has simply burned its trustworthiness for this sort of thing. The Go devs are arguing based on the details of their particular proposal, but everyone else is arguing based on what the organization is known to do, hence the two camps seem unlikely to see eye-to-eye.
The opinion of Russel Cox, in the linked article from the 24th of February, is surprising. He writes:
In the GitHub discussion, there were some unconstructive trolls with no connection to Go who showed up for a while, but they were the exception rather than the rule: most people seemed to be engaging in good faith.
I have read a good chunk of the comments, including the ones which were collapsed for moderation reasons. I didn’t see any trolls. Of course the debate was heated and Google got criticized, but I didn’t see any troll trying to derail the conversation on purpose.
Also, the comments were supposedly from users “with no connection to Go”. Well, I’m glad Russel Cox can tell which users have a connection to Go just from their GitHub profile. That makes him prescient, so I wonder why he needed the telemetry proposal in the first place.
That one would need to trust Google not to log your IP (and thus your location) and when you use the Go tools is, once more, ignored.
I feel the proposal, and especially the way it was presented, significantly damaged the reputation of Go. But only time will tell if that’s true, no telemetry.
Btw, where did you find Russ Cox’s full name? I was curious since “Russel” seems like an unusual spelling.
Thanks, it’s a mistake on my part but now it’s too late to edit the comment.
Honestly, I was just curious where the “Russel(l)” came from, regardless of the spelling. I’ve only ever seen “Russ.”
I have read a good chunk of the comments, including the ones which were collapsed for moderation reasons. I didn’t see any trolls. Of course the debate was heated and Google got criticized, but I didn’t see any troll trying to derail the conversation on purpose.
You clearly didn’t look that hard.
There’s no proposal yet.
Some of the rebuttals to criticisms in this thread have been “read the proposal!” and others have been “there is no proposal”.
I’m not sure what, exactly, is going on here, but it’s making me even more suspicious than I would have been to begin with.
I feel the proposal, and especially the way it was presented, significantly damaged the reputation of Go. But only time will tell if that’s true, no telemetry.
There are a set of blog posts on Russ Cox’s personal website which outline a design for telemetry. They were posted with the intent of gathering feedback. Then a bunch of weirdos (who obviously didn’t read the blog posts) started giving speeches about morality and Google’s evil plans to use Go for data harvesting.
I think their sentence would be more accurately rendered:
I hope the proposal, and especially the way it was presented, significantly damaged the reputation of Go. But only time will tell if that’s true, no telemetry.
I’m really annoyed that people have started bundling together error reporting and feature telemetry in the last few years. This works in favour of the big companies, unfortunately. They will now ask “would you like to send us usage data, like information about the app crashing” instead of asking separately “would you like to send crash reports” and “would you like to send everything down to your keystroke timing”.
There are lots of new users who can’t tell the difference and become suspicious of anything being sent. And then debugging their problems takes several times as long - if it’s even possible to reproduce on demand. Yet again, big corps are why we can’t have nice things.
there are unknowable megabytes of private Go code that will never see the light of day
I don’t buy that this is a technical problem. Can Google really not spend enough money on direct contact and outreach to know what people are doing in private? It’s not like the private usage is secret - it’s mostly just not public. If it’s big enough, your corp representatives can establish the needed relationships… I guess unless you’re Google and have “not talking to customers” in your DNA.
How do you figure out who those people are if they don’t tell people who they are?
You let them know how to contact you. They likely want a relationship with you, so you say “if you’re running a large / interesting project, let us know, we’d like to understand your needs”. Often you provide consulting / training around the product. I’ve traveled before to meet up with developers of a project we were using, for essentially a meet-and-greet and to chat about their future plans. I’ve enrolled in a partner program for an app I’m relying on. It seems to be a common thing between companies. I’m sure Percona (for example) knows which companies use which databases/features internally, without any telemetry.
And if they want to stay secret instead of private, they’ll kill the telemetry too, so there’s no difference.
I do not understand why “Don’t spy on people without their consent” is such a hard thing for programmers to accept.
On the other hand, I don’t understand how collecting anonymous usage data that is trivial to opt out of is at all equivalent to spying, or is harmful to anyone. I was hopeful, reading the original post, that having an example of a well designed anonymous telemetry system would encourage other people to adopt that approach, but given it wasn’t treated any differently from non-anonymous telemetry by the community, I don’t know why anyone would go through the effort.
There is no such thing as “anonymous data” when it’s paired with an IP address.
Even when it’s trivial to opt out, it’s usually extremely difficult to never use the software in a context where you haven’t set the opt-out flag or whatever. Opting out for one operation might be trivial; remaining opted out continuously across decades without messing up once is non-trivial.
Just. Don’t. Spy. On. People. Without. Consent.
I agree an IP address is not anonymous, which is why this system doesn’t collect it. Most privacy laws also draw the line at collecting PII as the point where consent is required, and I think that’s a reasonable place to draw the line.
Most software and websites I use have far more invasive telemetry than this proposal, and I think my net privacy would be higher with an approach like the one Go proposed rather than the status quo, which is why I was excited about it being a positive example of responsible telemetry. Good for you if you can go decades without encountering any of the existing telemetry that’s out there.
How does the telemetry get sent to Google’s servers in a way which doesn’t involve giving Google the IP address?
I agree that website telemetry is also an issue. But this discussion is about Go. There is no good example of responsibly spying on users without their consent.
You do have to trust Google not to retain the IP addresses, but the Go module cache also involves exposing IP addresses to Google. I think on by default, with the option to turn it off if you don’t trust Google, is reasonable. I also trust that the pre-built binaries don’t contain backdoors or other bad code, but if you don’t want to trust that, you can always compile the binaries from source.
Anyways, I’m not trying to change your mind, just trying to explain why some people don’t consider anonymous telemetry that’s opt-out to be non-consensual spying.
guidance of both GDPR and CCPA is that an IP address is not considered PII until it is actively correlated / connected to an individual.
None of the counters that are proposed to be collected contain your name, email, phone number or anything else that could personally identify you.
IANAL, but collecting data associated with an IP address (or some other unique identifier) definitely requires consent under the GDPR.
An IP address or UUID is considered pseudonymous data:
https://gdpr-info.eu/art-4-gdpr/
‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;
Pseudonymous data is subject to the GDPR:
https://edps.europa.eu/press-publications/press-news/blog/pseudonymous-data-processing-personal-data-while-mitigating_en
What differs pseudonymisation from anonymisation is that the latter consists of removing personal identifiers, aggregating data, or processing this data in a way that it can no longer be related to an identified or identifiable individual. Unlike anonymised data, pseudonymised data qualifies as personal data under the General Data Protection Regulation (GDPR). Therefore, the distinction between these two concepts should be preserved.
That is some really creative copy pasting you did there. I am also not a lawyer but I don’t think it is super relevant for this proposal since they follow the first principle of data collection: “do not collect personal data”.
Imagine the discussion goes like this:
You: “Hello Google, I am a Go user and according to the GDPR I would like you to send me a dump of my personal data that was sent via the Go tooling telemetry. To which I OPTED-IN when it was released.”
Google: “That data is anonymized. It is not connected to any personal data. We have the data you submitted but we cannot connect it to individuals.”
You: “Here is my IP address, will that help?”
Google: “No, we do not process or store the IP address for this data. (But thank you! now we know your IP! Just kidding!)”
You: “Here is the UUID that was generated for my data, will that help?”
Google: “Unfortunately we cannot verify that is actually your UUID for this telemetry. And thus we don’t know whether you are requesting data for yourself.”
…
That is some really creative copy pasting you did there.
You can find all this in the GDPR. At any rate, I wasn’t criticizing the Go proposal, only the statement:
guidance of both GDPR and CCPA is that an IP address is not considered PII until it is actively correlated / connected to an individual.
But I see now that this is a bit ambiguous. I read it as “analytics associated with IP addresses is not PII”, which is not really relevant, since that is pseudonymization according to the GDPR, and pseudonymous data is subject to the GDPR. But I think what you meant (which becomes clear from your example) was that in this case there is no issue: even though Google may temporarily have your IP address (they have to, if you contact their servers), they are not storing the IP address with the analytics. I completely agree that the analytics data is then not subject to the GDPR. (Still IANAL.)
For programmers, or the rest of the business?
I think that this is a mistake and will make the data collected by this telemetry system functionally useless
Excellent! Telemetry is a slippery slope, especially when the world’s largest ad company is involved. There are lots of upsides to misusing telemetry data, and just because it omits personal info now doesn’t mean it will forever. So I hope it is utterly useless and the people who “need” it lose out.
I also support distros that patch this crap out.
This is funny, and it hints at bad faith in the suggestion of opt-out. Because how would opt-out functionally change the sampling, unless you assume users are ignorant of the choice and of the presence of the surveillance?
Apple is the world’s largest ad company, and they include telemetry in their developer tools.
Which is a good argument against telemetry. Thank you.
Boy, there sure are a lot of people who haven’t read the proposal but want to write opinions about it.
It’s indeed sad that this thread is dominated by hot-take reactions to the word “telemetry” mostly bringing up points already addressed by two long, thoughtful, evenhanded TFAs. I expected better, lobsters.
read the proposal, and still don’t want google to open this can of worms. what’s your point?
specifically, I don’t like how google will now include machinery to automatically phone home. I don’t have a problem with the current plans, especially now that it’s opt-in, but I have a problem with the idea that it will be very very hard for future google to ignore that they can now siphon data from anyone running Go apps. today it’s “anonymous, random” stuff, tomorrow it’s…?
My point is just that there are a lot of people who clearly haven’t read the proposal and who are invoking things like “but GDPR!” and “oh no, they’ll be able to build usage patterns”. (Not just here in this discussion page.) They clearly haven’t looked at the proposed mechanisms and the great care they would take to keep identifiable and invasive information out of the reports. It’s not perfect—there are changes I would make—but it’s better than many of the accusations being leveled against it.
Some people are instead just objecting on the basis of “Google is trying to slurp up information”. (In fact, while Golang came from Google and is still moderately associated with it, this isn’t Google per se, although I understand that that perception is real.) And that might be an OK objection if they used correct information, but most don’t. I think if you want to object on that basis, you have to make a reasonable comparison with the current state of things. The Go team could try to do a frog-boil and start with innocuous telemetry and later start sneaking in some more invasive stuff, but… let’s be honest, we’ve got a community of detail-oriented nerds with strong opinions here. I don’t think it would work! And maybe they’d push it through anyway, but they could already do that today.
Honestly, it’s just very unfortunate that Go is the project bringing this proposal forward. I think that if a Python linter proposed this, the discussion would be going very differently. (And then depending on how that turned out, the Go folks could decide to adopt the same thing.) Someone has to pioneer privacy-preserving telemetry so that we can then go to existing projects and say “hey, use this instead”. Anything Google-associated is kind of doomed, though.
What do you mean by “anyone running Go apps”? Are you implying that the telemetry would be inserted into programs compiled by the Go compiler?
They did not read the proposal.
The collected data goes through many efforts to avoid including personally identifiable information, but apparently IP addresses are personally identifiable information, and in order for data to be transmitted over the internet, an IP address is required as the source. This is unavoidable because this is HOW THE INTERNET WORKS.
The author is apparently still approaching PII with a 1900s mindset. An increasing number of jurisdictions are unwilling to accept “that’s just how the software works” or “that’s just how the network behaves” as excuses for leaking metadata. The correct takeaway uses modus tollens: if it’s not possible to collect this sort of data without breaking norms around handling PII, then the data ought not to be collected.
I’m not a privacy fundamentalist by any means. But the community reaction feels normal and right if I go with my gut.
The word telemetry sounds like a gigantic euphemism for something that any reasonable person would just say “no, thank you” to.
Have you read the actual design: https://research.swtch.com/telemetry-design or are you making assumptions based on the word ‘telemetry’?
The onus is on you to argue why the particular proposed telemetry is OK. In general, spying on users without their consent (which is what opt-out telemetry fundamentally is) should be considered not OK.
I disagree that it’s on me to argue that. Russ proposed something, and it is the responsibility of the people who claim that this proposal is bad to explain what faults they see.
As for my own point of view: it’s OK because it’s not collecting any personal information about users. I’m honestly finding it hard to understand what the issue is here. The proposal is carefully designed to only collect information about the Go toolchain itself; no personal information is sent over the network (it’s not even possible to send arbitrary strings, just counter values), the collected data is publicly available, and everything is open source. Finally, everyone who finds that unacceptable can easily opt out.
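For what it’s worth, here is a toy sketch of what a counter-only API of that shape could look like (an illustration only, not the proposal’s actual code; the package layout and counter names are made up): the only operation a program has is bumping a named counter from a fixed set, so there is nowhere to put arbitrary strings.

```go
package counter

import (
	"encoding/json"
	"os"
	"sync"
)

// A fixed, reviewable set of counter names is compiled in. Inc on an
// unknown name is dropped, so the toolchain cannot smuggle arbitrary
// strings into a report.
var known = map[string]bool{
	"build/invocations": true,
	"build/errors":      true,
}

var (
	mu     sync.Mutex
	counts = map[string]int64{}
)

// Inc bumps a named counter. Note the deliberate absence of any API
// that accepts a payload: counts are the only data recorded.
func Inc(name string) {
	if !known[name] {
		return
	}
	mu.Lock()
	counts[name]++
	mu.Unlock()
}

// WriteLocal saves the counters to a local file that users can read
// before deciding (or declining) to upload it. Uploading would be a
// separate, inspectable step, out of scope here.
func WriteLocal(path string) error {
	mu.Lock()
	defer mu.Unlock()
	data, err := json.MarshalIndent(counts, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}
```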
I haven’t read the proposal, but I find the ‘no information about users’ bit interesting, as a compiler writer. The hardest thing about getting actionable bug reports for a compiler is that absolutely anything that the compiler accesses can be sensitive. Even the approximate shape of an AST can leak proprietary information. I find it hard to imagine something that doesn’t contain any personal / sensitive information, but which is actually useful to me.
I would encourage you to read the proposal then, it may surprise you.
A quick skim suggests that it is mostly not collecting the things that I’d care about as a compiler writer, but is full of side channels that leak more than I’d be happy with as a user. In particular, being able to correlate those counters with other data sources could leak information about proprietary (or simply unreleased) code, and I am unable to provide an upper bound on the amount of information that is leaked. This is a common technique in privacy-violating technology: show users what you collect, don’t show them what it correlates with or what other data sources you can combine it with, and it looks benign.
The opt-out version was supposed to upload counters once a year. Correlation is not a realistic threat here.
You’re the one arguing in favor of spying on users without their consent. Everyone’s default position should be that spying on people without their consent is not okay.
IP address + time of use is identifiable. Not many people other than me have a routine where they regularly move between the place I live and the place I work.
Can you please stop claiming that they propose something that can collect time of use? The proposal was to upload counters once a year.
In fact, if you think about this, IP address and time of use are also seen by git servers. I don’t see anyone claiming that git servers are spying on users, do you? In fact git servers are much worse, since they see time of use when you push/pull - much more often than a once-a-year counter upload.
They necessarily know what time it is when the telemetry calls home, right? And the IP address from which the call happens?
Git servers are accessed when I ask some tool to access the git server. This is telemetry that’s sent to Google without consent. That’s different.
And you seriously claim that you are worried that owners of transparent telemetry servers will learn that you use Go toolchain once a year?
You could argue that this is the same thing. You need to educate yourself to understand what is sent to the git server and when, and which commands communicate over the network versus which are local. The same could be said about a future Go toolchain: you would have to educate yourself about when and what could be sent, and how to completely disable it.
I’m saying the tool shouldn’t spy on you without your consent.
There is a list of things the Go team wants to collect once a year: https://research.swtch.com/telemetry-uses - can you explain which of those examples you consider ‘spying’?
I have written a detailed proposal for gathering telemetry from your daily household routine. I think this is good and that the information I intend to collect should not be considered “spying”. I also claim that the information I will gather is only going to be used for good purposes. Sure, the company I represent may have a horrendous track record of over-gathering personal information and using it for not-good purposes, but I am a well-known programmer who wrote thoughtful words, and that means this time it is automatically OK.
Anyway. This will require me to plant sensors and transmitters in your house that can collect and send the information to me, but if you read the proposal you will find why I think this is OK.
I understand that some misguided people have a reaction to this and think that some sort of “privacy” should exist in their own home, but they simply don’t understand the potential of this proposal – I could significantly improve my business with this information! And besides, it is not my burden to convince them; the default position is and always must be that I can plant my telemetry devices in your house. The only way to change that is for you to read every single word of the proposal three times over and then rebut it line by line in a way that convinces me you’ve actually read it thoroughly enough.
TFA explains the proposal and what it actually does.
The whole idea of having a compiler calling home is a very bad one. I want a simple program that does what I expect it to do, namely compile one language into another. This stuff always opens a channel waiting to be exploited. We have all seen this countless times: it starts with some harmless telemetry by well-intentioned people, then years pass, other people take their place, and before we know it user privacy has gone out the window. To the point of risking the lives of people in oppressive regimes.
Just don’t create a problem where there isn’t one. I think the general opinion of not wanting any kind of telemetry is not that difficult to understand nor is it baseless.
That is the main difference between you and me. You seem to oppose the idea of any kind of telemetry regardless of how it would be implemented. I, on the other hand, accept that there might be a design that preserves privacy. In fact I find the transparent telemetry design good enough on the privacy front, and I’m willing to use a tool that implements it. I doubt we can find a compromise.
What I think is important is to be clear about your reasons for opposing it. In the GitHub discussion thread I felt that a lot of people shared your point of view but, instead of saying so clearly, tried to ‘hide’ it behind invalid technical arguments. I say invalid because I suspect (and this Lobsters post confirms it) that a lot of the complaining people didn’t really read the proposal. Which is a shame, because it wastes everyone’s time.
The latest.
That was pretty much my point. It is a horrible idea from the start. Just don’t. I don’t want it, and obviously many others don’t want it either. At all.
But I do want to consider opposing views. Perhaps I am wrong? Maybe I am missing something. I have read the design now. Still: no, thanks.
It’s an open source programming language implementation. Leave it alone. Stop this “let’s improve the product based on customer feedback” nonsense. It is a tool for engineers, by engineers. Let technical merits prevail. Do a survey if you want a bit of insight into the user base, maybe.
Give me a technical tool that both its maker and its user understand. Something I know is useful because I know why it is useful; not because of politics, customer service, or any of that. I am talking about engineering, about technology. Sure, we can talk about the rest. Do I want to support a project because the author is a nice guy? Or because it is run by people with good principles? Maybe. I don’t know, but let’s keep those things separate, please.
Sorry, but your reply is very generic and I’m finding it hard to follow.
I see that you have now read the design doc. Can you point out specific points of that design that you find unacceptable (and why)?
You appear to be in need of a reminder that people are allowed to dislike the idea of on-by-default telemetry without needing to critique the specific technical mechanisms by which it is collected or the specific pieces of information collected. They also are allowed to default to being against it without being required to first comprehensively read and point-by-point rebut every proposal for telemetry that is put forward.
Please keep these things in mind for the future.
The hole in this argument is the counter-question of “why do you need to know?”
If you look at the vast swath of projects that have dedicated ports to particular architectures, operating systems etc., none of them require telemetry, particularly because the people that actually use these ports contribute to the larger project discourse.
Why does Google need invasive metrics to judge what needs to be maintained? Can this not be inferred from literally any other voluntary source of information? Issue trackers? Pull requests? Why do you need invasive telemetry, when you can judge this from the discourse around your projects?
The fighter jet argument is also very weak: the justification that “we can make sure that fighter jets are still running!” is being used... for telemetry? Do you really think anybody will leave Go’s telemetry enabled on anything related to these projects? These are walled-off areas of development that won’t (and shouldn’t) send anything anywhere.
The arguments for this are veering into a direction that hasn’t been backed up by simply harvesting existing data. Whoever’s making the decisions around ports, please, listen to the people that use them and contribute to the discourse. Linux handles this just fine.
I trained as an engineer, so one of my main biases is the ethics of what I build and expose people to. When solving problems, I ask whether what I am doing is right or wrong. As an ethical standpoint, I claim that we should minimize the data collected from customers and gain explicit consent when collecting it.
It is interesting that you quote Dr. House, because that show engages seriously with medical ethics. According to ethical principles, patients must be informed of all the possible harms of a procedure, and they have the right to say no to any treatment that they think is too risky. Note the phrase “they think”. I claim that not asking for explicit consent is not ethical.
If the team maintaining Go claims to support a platform, they must own automated builds and testing, like e.g. Rust Tier 1 targets. The fact that customers had to report issues to the Go team for a “supported” target does not speak to telemetry; it speaks to better automated testing. In this case, running otool on Mac for all builds would work.
Ask for my explicit consent 1) nicely, 2) using an evidence-based argument, and 3) in a way that lets me become a cheerleader for it to others, including my company. Otherwise, I’ll make sure all my Dockerfiles and build scripts patch out all telemetry, and I will never support your project.
As a trained engineer you probably also know that there is a huge difference between people using tools in “the wild” versus in a static lab setup.
I thought the opt out telemetry sounded reasonable to me, but this also strikes me as a good take. As a thought experiment, I can ask myself what if the proposal had come from the Chrome team (lol, too late, they spy on everything) instead of the Go team, and I think I would be much less positively inclined. I trust the Go team. I don’t trust “Google”. I extremely distrust the Blink team. But for someone who doesn’t interact with the Go team, they’re all just “Google”.
I think I would be bitter that Russ went back on opt-out telemetry, except I actually do think the mandatory opt-in during install will get a fair number of users, and so it won’t be a complete wash.
I don’t see how any opt-out telemetry transmitted via the internet is not a direct violation of the GDPR. The IP address that is transmitted with it (in the IP packets) is protected information that you don’t have consent to collect - you failed at step 0 and broke the law before you even received the bits you actually care about.
Of course, the GDPR seems to go routinely unenforced except against the largest and most blatant violations, but I really don’t see why a company like Google would risk it, or why other large companies are actively risking it.
My understanding of the GDPR was that IP addresses are not automatically PII. Even in situations where they are, simply receiving a connection from an IP address does not incur any responsibilities, because you require the IP for technical reasons to maintain the connection. It’s only when you record the IP address that you may run into issues. You can generally use some fairly simple differential privacy features to manage this (e.g. drop one of the bytes from your log).
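For what it’s worth, the “drop a byte” idea is nearly a one-liner; a sketch in Go (my own illustration, not from any proposal) that masks an address before it ever reaches a log line:

```go
// Sketch of logging only a truncated client IP, so the stored value is a
// coarse network prefix rather than a precise identifier: zero the last
// octet of IPv4 addresses and keep only a /48 prefix for IPv6.
package main

import (
	"fmt"
	"net"
)

func anonymize(ip net.IP) net.IP {
	if v4 := ip.To4(); v4 != nil {
		return v4.Mask(net.CIDRMask(24, 32)) // drop the final octet
	}
	return ip.Mask(net.CIDRMask(48, 128)) // keep only the /48 prefix
}

func main() {
	fmt.Println(anonymize(net.ParseIP("203.0.113.77")))          // 203.0.113.0
	fmt.Println(anonymize(net.ParseIP("2001:db8:1234:5678::1"))) // 2001:db8:1234::
}
```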
The EU has ruled that IP addresses are GDPR::PII, sadly.
There’s nothing sad about it. I bet that you think that your home address, ICBM coordinates, etc. are PII too.
Do you have a link to that ruling, I’d be very interested in reading it.
Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.
(emphasis mine; via the GDPR text, Regulation (EU) 2016/679, Recital 30)
FWIW, “PII” is a US-centric term that isn’t used within the GDPR, which instead regulates the “processing of personal data”.
This doesn’t actually say that collecting IP addresses is not allowed. It only states that when the natural person is known, online identifiers could be used to create profiles.
Furthermore, this is only relevant if those online identifiers are actually processed and stored. According to the Google proposal they are not; it only keeps a record of the anonymous counters, which is 100% fine under the GDPR.
(IANAL) I’ve seen analytics software like Fathom and GoatCounter rely on (as you mention) anonymised counters to avoid creating profiles of natural persons, but we’ve also seen a court frown upon automatic usage of Google Fonts due to the automatic transmission of IP addresses to servers in the US.
It’s a shame the Go compiler isn’t well positioned, UX-wise, to ask users for opt-in consent at installation (as an IDE might), since that’d likely solve the privacy concerns while reaching folks who don’t know about an opt-in config flag (a sketch of such a prompt follows below).
[admittedly, Google already receives IP addresses of Go users through https://proxy.golang.org/ anyway (which does log IP addresses, but “for [no] more than 30 days”) ¯\_(ツ)_/¯]
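For illustration, here is a hypothetical sketch in Go of an install-time prompt that defaults to off, including when there is no terminal to answer (CI, Dockerfiles); none of this is from the actual proposal:

```go
// Hypothetical first-run consent prompt: telemetry stays off unless the
// user explicitly types "yes", and always stays off in non-interactive
// environments where nobody can answer.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func askConsent() bool {
	stat, err := os.Stdin.Stat()
	if err != nil || stat.Mode()&os.ModeCharDevice == 0 {
		return false // no terminal: never assume consent
	}
	fmt.Print("Share anonymous usage counters with the toolchain developers? [no]/yes: ")
	line, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	return strings.TrimSpace(strings.ToLower(line)) == "yes"
}

func main() {
	mode := "off"
	if askConsent() {
		mode = "on"
	}
	// A real installer would persist this to the toolchain's config file.
	fmt.Println("telemetry =", mode)
}
```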
Yes, IP addresses are not automatically PII, but if you can’t ensure they are not, you must assume they are. The telemetry data itself is probably not PII, because it’s anonymized.
The GDPR prohibits processing[0] of (private) data, but contains some exceptions. The most commonly used one is to fulfill a contract (this doesn’t need to be a written contract with payment). So assume you have an online shop: a user orders e.g. a printer, and you need his address to ship the printer to him. But when the user orders an ebook, you don’t need the address, because you don’t ship anything. In the case of Go, the service would be compiling Go code. I don’t see a technical requirement to send Google your IP address.
The next common exception is a requirement by some other law (e.g. tax law or anti-money-laundering law). I think there is none here.
The next one is user consent: you know those annoying cookie banners. Consent must be explicit and can’t be assumed (and dark patterns are prohibited). So this requires an opt-in.
The next one would be legitimate interest. This is more or less the log-file exception. Here you might argue that the Go team needs this data to improve their compiler. I don’t think this would stand, because other compilers work pretty well without telemetry.
So, all together, I[1] would say the only legal way to collect the telemetry data is some sort of user consent.
[0] Yes, processing, not only storing, so having a web server answering HTTP requests might also fall under the GDPR.
[1] I’m not a lawyer
You are wrong. The GDPR is not some magic checkbox that says “do not ever send telemetry”. The GDPR cares about PII, and your IP address plus a bunch of anonymous counters are simply not PII. There is nothing to enforce in this case.
If something is permitted by law, that doesn’t automatically mean it’s also good.
It’s a good thing that nobody’s arguing that, then.
Hah, you’re right, I must have mixed up two comments. Glad we all agree then :)
The only reason this argument seems complicated is that the technicians involved insist on understanding the argument in terms of its technical merits. What the “my biases” section is missing is the acknowledgement of a meta-bias in favor of there being a technical solution to a problem like this. If there were one, then sure, Go’s approach to transparent telemetry would be a good candidate. There isn’t, though, because the instigating factors in this dispute lie outside the technical domain.
From my perspective, the situation is pretty simple: you’ve got a powerful organization whose incentives re privacy are dramatically at odds with the incentives of individuals, and the tradeoff to your language ecosystem receiving abundant investment from that organization is that a vast swath of the userbase is going to understandably object to telemetry no matter the virtues of the manner in which it’s collected.
Corporate investment, tooling improvement via telemetry collection, user trust: pick at most two. There’s simply no technical solution that will allow the choice of all three, and that’s just how it is in our current landscape. The corporate investment choice was made a long time ago, so the only gracious thing to do is acknowledge the tension between the remaining two options and choose one. The Go team, then, behaved graciously here. I don’t see any mistake being made.
You know, I like Nielsen ratings because I only fill the paper out if I feel like it, and they send me a dollar.
Say these analytics are “worth” something: can we put a number on it? Can I get two microdollars for all my kilobytes of trouble? I mean, what’s it worth to them, I wonder?
This is the same as no analytics. Like all micropayment ideas, it requires a working micropayments system to already be in place and used by everybody, without friction or middle-men taking all of your money.
Challenge for those opposed to this proposal: name a single real negative consequence that could have come from it.
“It violates my rules” is not a consequence.
A feature I rely upon is removed because the majority of users don’t use it.
IMO that can be good for the world, at some personal cost. Depends on the feature though.
Are there examples of widely used compilers with telemetry (like LLVM, Clojure, etc.)?
Xcode does. Visual Studio (Code and classic) does. The whole IntelliJ suite does. The Java JDK has telemetry, but I think it’s opt-in.
It’s a very normal feature that’s really useful for people who work on tools to know that they’re investing in the right places.
The difference is that Go proposed something transparent rather than obscured and at the whims of large companies.
But we can’t have nice things.
It isn’t. You just gave a few examples, which means that most compilers and most interpreters, such as Python, GCC, LLVM/Clang, Perl, PHP, Ruby, Tcl, D, SBCL, CLISP, and so on, do no such thing, and feel no need to. Trying to normalize it is creepy, and trying to do so by merely stating that it’s normal is really something else.
It is a normal thing for proprietary software. I think that is one of the driving factors making this controversial: Golang is ostensibly an open-source platform, but that brings expectations that are sometimes at odds with its historical origin as a Google initiative.
The informal, soft power companies can have over open-source technologies that people depend on creates resentment.
Yeah, I read those last few comments about which compilers have telemetry, and I think you’ve hit the nail on the head. Go-with-telemetry has to be considered a proprietary platform in a way that go-without-telemetry doesn’t.
Careful how you use ‘proprietary’ here, I’m sure some pedant somewhere would point out that the license is still OSI. However, governance has always been a blind spot of open licensing, and that is where this issue falls.
All of those, with the exception of the JDK, are IDEs. They are not compilers.
It’s somewhat defensible to have telemetry in an IDE, and as far as I’m aware, IntelliJ and Visual Studio both asked me before collecting it.
The reasons they give for wanting telemetry in the Go compiler – the public reasons, notwithstanding any private reasons that we don’t know – are weak at best, and just serve to reinforce the reasons I dislike Go at worst.
For example, tracking what platforms people compile for. Why not just let the community maintain ports? It amazes me that LLVM manages to have community-built, community-driven ports to the M68k platform, despite LLVM being a significantly more complex codebase than Go. Yet Go won’t even let users of Power support ISAs lower than Power8. Even when the community gave them PRs and offered CI, they refused! Large commercial customers using Go on Power7/AIX were even told to pound sand, let alone those of us trying to run Linux workloads on older Power hardware.
I don’t know what the Go compiler authors want telemetry for, but as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write, and to use that to improve error messages and/or think about how to avoid common mistakes. It is easy to find valid code in my language on the internet, but people almost never commit code that does not compile. All the intermediate states that are ill-parsed or ill-typed, but that people wrote because it felt natural: that is what I would love to have access to. Of course this could be opt-in, and a good rule of thumb would be to only collect this for projects that are already publicly available, to keep privacy concerns to a minimum.
I once thought of a design: have the compiler create a git repository somewhere on the user’s machine (one repository per project in the language), and commit incorrect/invalid files there on each failed compile. Once in a while, show users a message saying: “hey, would you like to send your git repo to us so we can look at your errors and improve the compiler?” (In particular, users decide when to send their data over the network, and it is in a format that they can easily inspect to make sure they are okay with its content.)
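A rough sketch of that flow in Go (all paths and names hypothetical), shelling out to plain git so users can inspect the ledger with tools they already trust; nothing is ever uploaded without an explicit yes:

```go
// Rough sketch of the "local error ledger": each failed compile commits
// the offending file into a per-project git repository on the user's own
// machine. Uploading is a separate, explicit, user-initiated step.
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// recordFailure assumes srcFile has already been copied into ledgerDir;
// it commits it locally with a message describing the compile error.
func recordFailure(ledgerDir, srcFile, compileError string) error {
	steps := [][]string{
		{"git", "-C", ledgerDir, "init", "--quiet"},
		{"git", "-C", ledgerDir, "add", filepath.Base(srcFile)},
		{"git", "-C", ledgerDir,
			"-c", "user.name=ledger", "-c", "user.email=ledger@localhost",
			"commit", "--quiet", "-m", "failed compile: " + compileError},
	}
	for _, s := range steps {
		if out, err := exec.Command(s[0], s[1:]...).CombinedOutput(); err != nil {
			return fmt.Errorf("%v: %s", err, out)
		}
	}
	return nil
}

func main() {
	// Once in a while the tool would then ask: "send this repo to the
	// compiler authors?" and only push after a literal "yes".
	if err := recordFailure("/tmp/ledger", "/tmp/ledger/main.go", "syntax error"); err != nil {
		fmt.Println("ledger update failed:", err)
	}
}
```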
And this is a reason people run screaming away from telemetry, even if it’s arguably well-intentioned: If my compiler is sending my code to some other entity, that code can be used against me in a court of law. Am I writing encryption code? Am I writing code some AI flags as detrimental to some DRM scheme? It’s impossible to tell what could happen, potentially, and some of the scenarios are so horrible they outweigh the potential good.
I brought this up in the GitHub discussion and here, but got shouted down and silenced quite effectively.
Which is a bit suspicious.
So do I, but I’d hate to actually be responsible for processing it. People accidentally paste a load of sensitive things into code, or write almost-correct commercially sensitive code all the time. The only way I’d be happy collecting this would be to have a local service trying to extract representative samples and then an explicit manual step for users to approve uploading them.
Yes, see the design I sketched above with local git repositories.
Unfortunately, your idea of “representative samples” sounds very hard to do right in practice. In general I don’t know what I’m looking for in this data yet, my queries may change over time, and I don’t know how to summarize it in a way that remains useful. There has been work on automatically minimizing buggy code, and we could consider doing it, but minimizing is compute-intensive (so would users really want that?). I think that for parsing errors, one could design specific summary formats that would make sense. For typing errors it is much harder in general (but doable; the keyword is “type error slicing”), and understanding a typing error usually benefits from being able to build the affected parts of the project and also understand the code, which minimization could easily prevent. And for “what kind of bugs do people introduce in their code that passes the type-checker but is caught by the testsuite”, automatically minimizing that in a way that does not hamper our ability to analyze errors and turn them into language/tooling design feedback sounds like a research project of its own (starting from the existing work on program slicing, but in a way that preserves useful context for code comprehension).
And I think that for the people worried that their code could contain something very problematic, minimization/summarization is not necessarily going to reassure them. They will disable any feedback process in any case. Their choice! So maybe working hard on summarization, if the intent is to reassure people, is not worth it. I think it is just easier to work with the other sort of people who, like, “would enjoy streaming their coding session anyway but never bothered to set it up”.
Read the proposal.
I wonder if it would make sense to generate an “errors digest” or something as build output, which could optionally (but with encouragement) be committed directly to source, like a lockfile.
It’d have to be trivially merged by git, but the compiler itself could provide a merge tool, plus some registry somewhere that projects could opt into, probably associated with a package manager.
Then full opt-out is “gitignore”; the default is “people working on this project can use that telemetry”, since the tools are built in; and opt-in is “register my project with the telemetry scraper, which pulls from git”.
I guess this doesn’t handle the “private code” argument, but it would allow for some of what you’re looking for, I think, on a per-project rather than per-user basis, which I think helps with the PII argument. (A sketch of what such a digest might look like follows.)
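For illustration, such a digest could be as boring as a checked-in JSON file containing only stable error codes and counts; everything below (names, codes, format) is hypothetical:

```go
// Hypothetical "errors digest" emitted as build output and committed like
// a lockfile: only error codes and counts, no file names, no source code.
package main

import (
	"encoding/json"
	"fmt"
)

type digest struct {
	Toolchain string         `json:"toolchain"`
	Errors    map[string]int `json:"errors"` // error code -> occurrences
}

// merge keeps the file trivially git-mergeable: counts just add up, so a
// dedicated merge driver can resolve any conflict mechanically.
func merge(a, b digest) digest {
	out := digest{Toolchain: a.Toolchain, Errors: map[string]int{}}
	for _, d := range []digest{a, b} {
		for code, n := range d.Errors {
			out.Errors[code] += n
		}
	}
	return out
}

func main() {
	ours := digest{"go1.21", map[string]int{"E0001-undefined-name": 3}}
	theirs := digest{"go1.21", map[string]int{"E0001-undefined-name": 1, "E0042-unused-import": 2}}
	out, _ := json.MarshalIndent(merge(ours, theirs), "", "  ")
	fmt.Println(string(out)) // map keys marshal in sorted order: stable diffs
}
```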
Yes, committing errors into the project proper is one way to go about it, if we can make the UI unobtrusive enough.
(This does not handle “private code” but I think that it is realistic to assume that, fundamentally, if people are not willing to make their code public, they probably don’t want to export telemetry information about it either unless you force them to. Luckily there are many programmers willing to write public code.)
.NET is mentioned in one of the articles.
The Flutter SDK comes to mind; also a Google project.
It’s worth noting: Debian has had opt-in telemetry (the popularity-contest package) for nearly 20 years, without (AFAIK) complaints. This system runs afoul of the same “IP address” issues discussed here, since connecting to a server reveals your IP address.
I don’t understand how any of this is a problem for the user of Go. If the tool gets worse due to a lack of “telemetry data”, then the user just finds a new tool.
Personally, I don’t really care about the telemetry part; I care more about where the data goes. In this case it seems to go to Google or Google employees, and I do not trust Google to handle my data in any responsible manner. And if my data is valuable, then I hope they’re willing to pay for it.
Fuck opt-out telemetry. I’m surprised Google allowed opt-in.
and so is opt-out.
Would it be possible to use a privacy-preserving telemetry system like DivviUp?
DivviUp is already discussed by Russ: https://research.swtch.com/telemetry-opt-in