This site is claiming to offer a “standard for opting out of telemetry”, but that is something we already have: Unless I actively opt into telemetry, I have opted out. If I run your software and it reports on my behavior to you without my explicit consent, your software is spyware.
but that is something we already have: Unless I actively opt into telemetry, I have opted out.
I know this comes up a lot, but I disagree with that stance. The vast majority of people leave things on their defaults. The quality of information you get from opt-in telemetry is so much worse than from telemetry by default that it’s almost not worth it.
The only way I could see “opt-in” telemetry actually working is to cache values locally for a while and then be so obnoxiously annoying about “voluntarily” sending the data that people will do it just to shut the program up about it.
That comment acts like you deserve to have the data somehow? Why should you get telemetry data from all the people that don’t care about actively giving it to you?
That comment acts like you deserve to have the data somehow?
I’ve got idiosyncratic views on what “deserving” is supposed to mean, but I’ll refrain from going into philosophy here.
Why should you get telemetry data from all the people that don’t care about actively giving it to you?
Because the data is better and more accurate. Better and more accurate data can be used to improve the program—which is something everyone will eventually benefit from. But if you make telemetry opt-in, you skew the data towards the kinds of people who bother to opt in.
Without any telemetry, you’ll instead either (a) get the developers’ gut instinct (which may fail to reflect real-world usage), or (b) have the minority that opens bug tickets dictate the UI improvements, possibly mixed with (a). Just as hardly anyone (in the grand scheme of things) bothers with opting into telemetry, hardly anyone bothers opening bug tickets. Neither group may be representative of the silent majority that just wants to get things done.
Consider the following example for illustration of what I mean (it is a deliberate oversimplification, debate my points above, not the illustration):
Assume you have a command-line program that has 500 users. Assume you have telemetry. You see that a significant percentage of invocations involve the subcommand check, but no such command exists; most such invocations are immediately followed by the correct info command. Therefore, you decide to add an alias. Curiously, nobody has told you about this yet. However, once the alias is there, everyone is happier and more productive.
Had you not had telemetry, you would not have found out (or at least not as quickly, only once someone got disgruntled enough to open an issue). The “quirk” in the interface may have scared potential users off to alternatives without ever giving your program a fair shot.
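For what it’s worth, a minimal Go sketch of the kind of fix described above; the tool, the check/info subcommands, and the alias table are all hypothetical, not taken from any real project:

```go
package main

import (
	"fmt"
	"os"
)

// aliases maps commonly mistyped subcommands to the real ones.
// Telemetry (or bug reports) is how you would learn which entries
// belong here; "check" -> "info" is the hypothetical case above.
var aliases = map[string]string{
	"check": "info",
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: tool <subcommand>")
		os.Exit(2)
	}
	cmd := os.Args[1]
	if canonical, ok := aliases[cmd]; ok {
		cmd = canonical
	}
	switch cmd {
	case "info":
		fmt.Println("showing info...")
	default:
		// Without telemetry, a failed invocation like this is invisible
		// to the developer unless the user bothers to report it.
		fmt.Fprintf(os.Stderr, "unknown subcommand %q\n", cmd)
		os.Exit(2)
	}
}
```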
Bob really wants a new feature in a piece of software he uses. Bob suggests it to the developers, but they don’t care. As far as they can tell, Bob is the only one wanting it. Bob analyzes the telemetry-related communication and writes a simple script that imitates it.
The developers are concerned about their users’ privacy and don’t store IP addresses (hashing them is less than useless), which only makes it easier for Bob to trick them. What appears as a slow growth of active users, and a common need for a certain feature, is really just Bob’s little fraud.
It’s possible to make this harder, but it takes effort. It takes extra effort to respect users’ privacy. Is developing a system to spy on the users really more worthwhile than developing the product itself?
You also (sort of) argued that opt-in telemetry is biased. That’s not exactly right, because telemetry is always biased. There are users with no Internet access, or at best an irregular one. And no, we don’t have to be talking about developing countries here. How do you know the majority of your users aren’t medical professionals or lawyers whose computers are not connected to the Internet for security reasons? I suspect it might be more common than we think. Then on the other hand, there are users with multiple devices. What appears as n different users can really be just one.
It sort of depends on your general philosophical view. You don’t have to develop software for free, and if you do, it’s up to you to decide the terms and conditions and the level of participation you expect from your users. But if we are talking about free software, I think that telemetry, if any, should be completely voluntary on a per-request basis, with a detailed listing of all information that’s to be sent in both human- and machine-readable form (maybe compared to the average), and either smart enough to prevent fraudulent behavior, or treated with strong caution, because it may well be utter garbage. Statistically speaking, it probably is anyway.
I’m well aware that standing behind a big project, such as Firefox, is a huge responsibility and it would be really silly to advise developers to just trust their guts instead of trying to collect at least some data. That’s why I also suggested how I imagine decent telemetry. I believe users would be more than willing to participate if they saw, for example, that they used a certain feature an above-average number of times, and that their vote could stop it from being removed. It’s also possible to secure per-request telemetry with a captcha (or something like that) to make it slightly more robust. If this came up once every few months, “hey, dear users, we want to ask”, hardly anyone would complain. That’s how some software does it, after all.
The fraud thing is an interesting theory, but I don’t know how likely it is; you’ve theorised a Bob who can generate fraudulent analytics but couldn’t fake an IP address, use multiple real IP addresses, or implement the feature he actually wants.
It’s not that he couldn’t do it, it’s just much simpler without that. It’s really about the cost. It’s easy to curl, it’s more time consuming or expensive to use proxies, and even more so to solve captchas (or any other puzzles). The lower the cost, the higher the potential inaccuracy. And similarly, with higher cost, even legitimate users might be less willing to participate.
I don’t have some universal solution or anything. It’s just something to consider. Sometimes it might be reasonable to put effort into making a robust telemetry system, sometimes none at all would be preferred. I’m trying to think of a case “in between”, but I don’t see a single situation where laughably-easy-to-fake results could be any good.
Telemetry benefits companies, otherwise companies wouldn’t use it. Perhaps it can benefit users, if the product is improved as a result of telemetry. But it also harms users by compromising their privacy.
The question is whether the benefits to users outweigh the costs.
Companies that use opt-out telemetry obviously aren’t concerned about the costs to users, compared to the benefits they (the companies) glean from telemetry-by-default. They are placing their own interests first, ahead of their users. That’s why they resort to dark patterns like opt-out.
You assume that we actually need telemetry to develop good software. I’m not so sure. We developed good software for decades without telemetry; why do we need it now?
When I hear the word “telemetry”, I’m reminded of an article by Joel Spolsky where he compared Sun’s attempts at developing a GUI toolkit for Java (as of 2002) to Star Trek aliens watching humans through a telescope. The article is long-winded, but search for “telescope” to find the relevant passage. It’s no coincidence that telemetry and telescope share the same prefix. With telemetry, we’re measuring our users’ behavior from a distance. There’s not a lot of signal there, and probably a lot of noise.
It helps if we can develop UsWare, not ThemWare. And I think this is why it’s important for software development teams to be diverse in every way. If our teams have people from diverse backgrounds, with diverse abilities and perspectives, then we don’t need telemetry to understand the mysterious behaviors of those mysterious people out there.
(Disclaimer: I work at Microsoft on the Windows team, and we do collect telemetry on a de-facto opt-out basis, but I’m posting my own opinion here.)
we don’t need telemetry to understand the mysterious behaviors of those mysterious people out there
Telemetry usually is not about people’s behaviors; it’s about the mysterious environments the software runs in, the weird configurations and hardware combinations and outdated machines and so on.
Behavioral data should not be called telemetry.
One concrete benefit of telemetry: “How many people are using this deprecated feature? Should we delete it in this version or leave it in a while longer?”
We developed good software for decades without telemetry; why do we need it now?
Decades-old software is carrying decades-old cruft that we could probably delete, but we just don’t know for sure. And we all pay the complexity costs one paper cut at a time.
I’m as opposed to surveillance as anybody else in this forum. But there’s a steelman question here.
The quality of information you get from opt-in telemetry is so much worse than from telemetry by default that it’s almost not worth it.
A social scientist could likewise say: “The quality of information you get from observing humans in a lab is so much worse than when you plant video cameras in their home without them knowing.”
How is this an argument that it’s ok?
There are three differences as far as I can tell:
The data from a hidden camera is not anonymizable. Telemetry, if done correctly (anonymization of data as much as possible, no persistent identifiers, transparency as to what data is and has been sent in the past), cannot be linked to a natural person or an individual handle. Therefore, I see no harm to the individual caused by telemetry implemented in accordance with best data protection practices (a sketch of what that might look like follows these three points).
Furthermore, the data from the hidden camera cannot cause corrective action. The scientist can publish a paper, maybe it’ll even have revolutionary insight, but can take no direct action. The net benefit is therefore slower to be achieved and very commonly much less than the immediate, corrective action that a software developer can take for their own software.
Finally, it is (currently?) unreasonable to expect a hidden camera in your own home, but there is an increased amount of awareness of the public that telemetry exists and settings should be inspected if this poses a problem. People who do care to opt out will try to find out how to opt out.
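A rough sketch of what “no persistent identifiers” could look like in practice, assuming a hypothetical JSON endpoint (telemetry.example.com) and made-up report fields; the server would also have to avoid logging addresses for this to hold up:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"runtime"
)

// report carries only coarse, non-identifying facts: no UUID, no hostname,
// no per-user identifier of any kind.
type report struct {
	Version string `json:"version"`
	OS      string `json:"os"`
	Arch    string `json:"arch"`
}

func sendReport(endpoint, version string) error {
	body, err := json.Marshal(report{Version: version, OS: runtime.GOOS, Arch: runtime.GOARCH})
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("telemetry endpoint returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Placeholder endpoint for illustration only.
	if err := sendReport("https://telemetry.example.com/report", "1.2.3"); err != nil {
		fmt.Println("telemetry failed (ignored):", err)
	}
}
```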
Finally, it is (currently?) unreasonable to expect a hidden camera in your own home, but there is an increased amount of awareness of the public that telemetry exists and settings should be inspected if this poses a problem. People who do care to opt out will try to find out how to opt out.
I think this is rather deceptive. Basically it’s saying: “we know people would object to this, but if we slowly and covertly add it everywhere we can eventually say that we’re doing it because everyone is doing it and you’ve just got to deal with it”.
I still disagree but I upvoted your post for clearly laying out your argument in a reasonable way.
You seem to miss a very easy, obvious, opt-in only strategy that worked for the longest time without feeling like your software was that creepy uncle in the corner undressing everyone.
As you pointed out, everyone keeps the defaults. You know what else most normies do? Click next until they can start their software. So you add a checkbox to that first-run dialog that is supposed to be there to help the users, with a simple “Hey, we use telemetry to improve our software ([here is where you can see your data](https://yoursoftware.com/data)) and here is our [privacy policy](https://yoursoftware.com/privacy). By checking this box you agree to telemetry and data collection as outlined in our [data collection policy](https://yoursoftware.com/data_collection). [X]”
and boom you satisfy both conditions, the one where people don’t go out of their way to opt into data collection and the other where you’re not the creepy uncle in the corner undressing everyone.
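A minimal sketch of that kind of first-run prompt for a command-line tool, assuming a hypothetical yoursoftware config file and defaulting to “no” when the user just hits enter:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// consentPath is a hypothetical per-user config file recording the answer,
// so the question is asked exactly once.
func consentPath() string {
	dir, err := os.UserConfigDir()
	if err != nil {
		dir = "."
	}
	return filepath.Join(dir, "yoursoftware", "telemetry-consent")
}

// askConsentOnce returns true only if the user has explicitly opted in.
func askConsentOnce() bool {
	path := consentPath()
	if data, err := os.ReadFile(path); err == nil {
		return strings.TrimSpace(string(data)) == "yes"
	}
	fmt.Print("We use telemetry to improve our software (see https://yoursoftware.com/privacy).\n" +
		"Enable telemetry? [y/N]: ")
	answer, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	optedIn := strings.HasPrefix(strings.ToLower(strings.TrimSpace(answer)), "y")
	value := "no"
	if optedIn {
		value = "yes"
	}
	_ = os.MkdirAll(filepath.Dir(path), 0o755)
	_ = os.WriteFile(path, []byte(value+"\n"), 0o644)
	return optedIn
}

func main() {
	if askConsentOnce() {
		fmt.Println("telemetry enabled")
	} else {
		fmt.Println("telemetry disabled")
	}
}
```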
You can also view this as a standardized way for opt-in, which isn’t currently available either.
No, it is not. It is a standardized way for opt-out.
This is a bad comment, because it doesn’t add anything except for “I think non-consensual tracking is bad”, and is only tangentially related to OP insofar as OP is used as a soapbox for the above sentiment. Therefore I have flagged the comment as “Me-too”, regardless of how much I may agree with it.
Except that in the European Union, the GDPR requires opt-in in most cases. IANAL, but I think it applies to the analytics that Homebrew collects as well. From the Homebrew website:
A Homebrew analytics user ID, e.g. 1BAB65CC-FE7F-4D8C-AB45-B7DB5A6BA9CB. This is generated by uuidgen and stored in the repository-specific Git configuration variable homebrew.analyticsuuid within $(brew --repository)/.git/config.
https://docs.brew.sh/Analytics
From the GDPR:
The data subjects are identifiable if they can be directly or indirectly identified, especially by reference to an identifier such as a name, an identification number, location data, an online identifier or one of several special characteristics, which expresses the physical, physiological, genetic, mental, commercial, cultural or social identity of these natural persons.
I am pretty sure that this UUID falls under identification number or online identifier. Personally identifiable information may not be collected without consent:
Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement.
So, I am pretty sure that Homebrew is violating the GDPR and EU citizens can file a complaint. They can collect the data, but then they should have an explicit step during installation, and the default (e.g. when the user just hits RETURN) should be to disable analytics.
The other interesting implication (if this is indeed collection of personal information under the GDPR) is that any user can ask Homebrew which data they collected and/or to remove the data. To which they should comply.
As far as I can tell, you’re not actually citing the GDPR (CELEX 32016R0679), but rather a website that tries to make it more understandable.
GDPR article 1(1):
This Regulation lays down rules relating to the protection of natural persons with regard to the processing of personal data and rules relating to the free movement of personal data.
GDPR article 4(1) defines personal data (emphasis mine):
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
Thus it does not apply to data about people that are neither identified nor identifiable. An opaque identifier like 1BAB65CC-FE7F-4D8C-AB45-B7DB5A6BA9CB is not per se identifiable, but as per recital 26, determining whether a person is identifiable should take into account all means reasonably likely to be used, such as singling out, suggesting that “identifiable” in article 4(1) needs to be interpreted in a very practical sense. Recitals are not technically legally binding, but are commonly referred to for interpretation of the main text.
Additionally, if IP addresses are stored along with the identifier (e.g. in logs), it’s game over in any case; even before GDPR, IP addresses (including dynamically assigned ones) were ruled by the ECJ to be personal data in Breyer v. Germany (ECLI:EU:C:2016:779 case no. C-582/14).
Sorry for the short answer in my other comment. I was on my phone.
Thus it does not apply to data about people that are neither identified nor identifiable. An opaque identifier like 1BAB65CC-FE7F-4D8C-AB45-B7DB5A6BA9CB is not per se identifiable,
The EC thinks differently:
https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en
It seems to me that a UUID is similar to a cookie ID or advertising identifier. Using the identifier, it would also be trivially possible to link data. They use Google Analytics. Google could in principle cross-reference some application installs with Google searches and time frames. Based on the UUID they could then see all other applications that you have installed. Of course, Google does not do this, but this thought experiment shows that such identifiers are not really anonymous (as pointed out in the working party opinion of 2014, linked on the EC page above).
Again, IANAL, but it would probably be OK to report installs without any identifier linking the installations. They could also easily do this: make it opt-in, report everyone who didn’t opt in under a single shared identifier, and generate a random identifier for people who do opt in.
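A small sketch of that identifier scheme, with a made-up shared placeholder ID for everyone who has not opted in and a random per-install ID only for those who have:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// sharedID is one constant identifier used for everyone who has not opted in,
// so installs can still be counted without linking anything to an individual.
const sharedID = "00000000-0000-0000-0000-000000000000"

// analyticsID returns a per-install random identifier only for users who
// explicitly opted in; everyone else is indistinguishable.
func analyticsID(optedIn bool) (string, error) {
	if !optedIn {
		return sharedID, nil
	}
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	// Formatted as a UUID-like string (not a strict RFC 4122 UUID).
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	for _, optedIn := range []bool{false, true} {
		id, err := analyticsID(optedIn)
		if err != nil {
			panic(err)
		}
		fmt.Printf("optedIn=%v -> %s\n", optedIn, id)
	}
}
```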
They locked the PR talking about it and accused me of implying a legal threat for bringing it up. The maintainer who locked the thread seems really defensive about analytics.
Once you pop, you can’t stop.
I, too, thought that your pointing out their EU-illegal activity was distinct from a legal threat (presumably you are not a prosecutor), and that they were super lame for both mischaracterizing your statement and freaking out like that.
It seems this is just a general trait. See e.g. this
Now I really wish I had an ECJ decision to cite because at this point it’s an issue of interpretation. What is an advertising identifier in the sense that the EC understood it when they wrote that page—Is it persistent and can it be correlated with some other data to identify a person? Did they take into account web server logs when noting down the cookie ID?
Interesting legal questions, but unfortunately nothing I have a clear answer to.
Please cite the rest of paragraph 4, definitions:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A32016R0679
Which was what I quoted.
Your comment makes the following quotations:
The data subjects are identifiable if they can be directly or indirectly identified, especially by reference to an identifier such as a name, an identification number, location data, an online identifier or one of several special characteristics, which expresses the physical, physiological, genetic, mental, commercial, cultural or social identity of these natural persons.
Please ^F this entire string in the GDPR. I fail to find it as-is. They only start matching up in the latter half starting at “an identifier” and ending with “social identity”.
(1) ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
I agree it’s pedantic of me, but it’s not a 1:1 quote from the GDPR if a sentence is modified, no matter how small.
I’ve edited in the second half in any case though. I do not see any way that modification would invalidate any of the points I’ve made there, however.
If that is true, consider submitting a PR, because GDPR violations are serious business.
Or don’t submit a PR. As the project has stated:
People have been banned from the project for doing exactly this.
“We don’t want to hear complaints” is not a new stance for Homebrew.
Yeah, I got the impression that they are pretty hardline on this. I hope that they’ll reconsider before someone files a GDPR complaint.
Personally, I don’t really have a stake in this anymore, since I barely use my Mac.
I guess a more creative solution would be to fork the main repo and disable the analytics code and point people to that.
Edit: the linked PR is from before the GDPR though.
But the above user didn’t post that did they? Your comment was meaningful and useful, but theirs was just sentimental. A law violation is a law violation, but OP just posted their own feelings about what they think is spyware and didn’t say anything about GDPR.
hmm, I disagree. The OP is claiming that we should have a unified standard for “Do_Not_Track”. Finn is arguing that we shouldn’t need such a standard because unless I specifically state that I would like to be tracked, I should not be tracked, and any attempt to track is a violation of consent. Finn here is specifically disagreeing with the website in question. Should we organize against attempts to track without explicit consent, or provide a unified way to opt out? These are fundamentally different questions, even though they are directly related. If I say everyone should be allowed into any yard unless it has a private property sign, that may cause real concern for people who feel that no yard should permit trespassing without explicit permission. They are different concerns, they are related, and they are more nuanced than “thing is bad”.
Okay. By your (non-accepted) definition, spyware abounds and is in common use.
Simply calling it “spyware” and throwing up your hands doesn’t work. They have knobs to turn the spying off, to opt out. I just want all those knobs to have the same label.
It’s bloody disappointing that we’ve reached the stage where this is necessary.
We should also standardize a DNT for face scanners, like an invisible ink tattoo on your forehead. Then we can say “at least we did what we could” when bad actors do not respect it.
Why assume that all programs performing opt-out analytics are inherently bad actors?
There is no such assumption in my comment.
Do Not Track failed in browsers. I’m very skeptical that this alternative version will succeed.
Do Not Track is not a browser setting. It’s a request to servers, sent by the browser, and the servers are free to ignore it.
This is an initiative for local software, like Homebrew and Gatsby. They do not need to use the network as a matter of course. It’s a horse of a different color.
I find using terms like “ad tracking” to be rather misleading to describe collection of basic information on what people are doing with the software so developers have a better idea which areas to focus on improving.
Syncthing doesn’t even seem to “track” anything, it just checks if there is a newer version. How is this “tracking”? It’s just a completely different category and painting everything with this mile-wide brush as “tracking” is not constructive.
Keep reading the list. Syncthing doesn’t do ad tracking, but it does absolutely do a few of the other things that DO_NOT_TRACK is designed to indicate a lack of consent for.
It’s absolutely able to log my IPs as a user of the software because it phones home. I don’t want it to do that.
I read it. You made a PR to disable update checking.
It’s absolutely able to log my IPs
So is Lobsters, or any other site you visit, your email host (or if you selfhost: people you send emails to), anyone on IRC, peers on bittorrent, any git push/pull, etc. etc. etc.
You can object to that, and that’s all fine, but this is just not “tracking” as understood by almost anyone else and equating it to that is just spreading FUD.
I agree that a distinction should be made, but I’d offer that a further distinction is necessary between software that communicates with a second-party and software that communicates with a third-party.
We have an interesting example in syncthing in that it has an explicit design goal to enable peer-only syncing. Assuming someone goes to the trouble of configuring their own relay & discovery servers, it seems odd they’d further have to opt out of communicating with a third-party, as there isn’t even a second-party.
Lobste.rs is a website, and it’s impossible to use the website without exchanging packets with it. That’s essential for functionality, and not avoidable (unless you use tor, which is somewhat analogous to setting DO_NOT_TRACK=1).
Homebrew and Gatsby are local applications. Syncthing can be configured to use LAN discovery only and talk only between my computers. There is no need for it to phone home, unlike a website.
Local software != website
Checking if there’s a newer version is real functionality. And you haven’t really explained why it’s such a terrible thing that Syncthing can (in theory) log IP addresses, especially when a zillion other things can already do so.
The zillion other things that do so I choose to opt in to, when I visit their websites. I don’t consent to local software phoning home unnecessarily simply because I used it on my local machine or LAN.
That still doesn’t answer the question on why Syncthing’s upgrade server potentially logging your IP address is “tracking”, or such a terrible thing.
It doesn’t need to be terrible for me to not want it. I don’t actually need a reason for wanting to preserve my privacy. The application taking that data and sending it away requires consent; consent which I do not give.
Why I do not consent is irrelevant, and personal, and may itself be private.
This is a non-discussion if you just want to assert tautologies. I can choose to not consent to programs creating files without my permission and create do-not-make-files.com, but if I then refuse to explain any reasoning behind it at the slightest hint of critical questioning, I’m just being unconstructive and silly.
100% complete privacy and consent is unworkable and unrealistic in all but the simplest examples. You’re the one making claims, so you’re the one who has to justify them. If you don’t want to do that: fine. But don’t be surprised if people are going to roll their eyes and dismiss whatever you’re saying.
“I do not consent to my usage being transmitted” is not a tautology. It is a fact.
Software that contacts the network without my consent, transmitting information about my usage, is malware.
I have no idea why you feel the need to come back to this after 3 months, especially considering your reply adds nothing to any discussion, but let me be clear: calling people’s hobby projects “malware” over a disagreement about whether or not it should check for updates is deeply toxic. This is the sort of behaviour that quite literally makes people leave Open Source development.
I think this proposal raises a good point, but has two major flaws. First, it does not distinguish between anonymous telemetry and user tracking. Telemetry, if done right, helps developers in improving their software (just think about Debian’s ages-old popcon). Real (even pseudonymous) user tracking is a whole different beast.
Second, anything collecting even pseudonymous user data must be opt-in as per the GDPR. This proposal gets it the wrong way round. Instead, there should be the inverse: an environment variable that indicates consent to tracking. Since people still install popcon, I don’t think this effectively means that nobody will ever opt in.
I make a counter-proposal. How about this: An environment variable DATA_COLLECTION_CONSENT, with these possible values: no (default assumed if missing), yes (consent to everything, including pseudonymous data collection), anonymous (allow data transmission that cannot, ever, be mapped to a user, such as telemetry or Debian’s popcon).
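A minimal sketch of how a program might honour such a variable, treating anything other than the two explicit opt-in values as “no”:

```go
package main

import (
	"fmt"
	"os"
)

// ConsentLevel mirrors the proposed DATA_COLLECTION_CONSENT values.
type ConsentLevel int

const (
	ConsentNone      ConsentLevel = iota // "no" or unset: send nothing
	ConsentAnonymous                     // "anonymous": only data that can never be mapped to a user
	ConsentFull                          // "yes": pseudonymous collection allowed too
)

// consentFromEnv defaults to ConsentNone, so no data collection is always
// the fallback when the variable is missing or malformed.
func consentFromEnv() ConsentLevel {
	switch os.Getenv("DATA_COLLECTION_CONSENT") {
	case "yes":
		return ConsentFull
	case "anonymous":
		return ConsentAnonymous
	default:
		return ConsentNone
	}
}

func main() {
	switch consentFromEnv() {
	case ConsentFull:
		fmt.Println("sending pseudonymous usage data")
	case ConsentAnonymous:
		fmt.Println("sending anonymous counters only")
	default:
		fmt.Println("sending nothing")
	}
}
```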
Also, please, do not file such proposals against individual projects as PRs. This topic needs wider attention and should be under the umbrella of freedesktop.org’s guidelines.
Edit: This comment got me thinking. I leave this here for reference, but after reading said comment I’m not so sure anymore if telemetry is really something developers should have access to.
Telemetry, if done right, helps developers in improving their software (just think about Debian’s ages-old popcon).
Popcon was opt-in, if I recall correctly. I have no issue with off-by-default tracking. I just want all of the apps that force me to manually opt out to have the lever to do so labelled consistently.
Package maintainers for distros should just patch all telemetry and tracking out of software.
Forgive the shameless plug: this is something that Debian does often and a lot of people are unaware of.
That’s great, thank you :)
What if the package manager itself is the thing doing the spying?
Well then that’s a package manager I wouldn’t want to use? Or are you referring to something specific?
Homebrew does this by default, and there are no other good package managers for macOS.
I’d really like a standard like this. Personally I don’t care about the telemetry data these applications collect, but a general flag that lets those who really care opt out without worrying about every program they use would also give apps that use telemetry a consistent interface for signalling non-consent. However, the author has seemingly gone about this poorly in coordination around popular applications that collect telemetry. From the linked PRs:
We would rather use our own variable for now (at least until this is much more widely adopted).
…
Hi!
Thank you for your contribution!
We are happy to add this, however I have couple of questions/comments first:
Is there a description of this standard anywhere?
Are there any other projects that have adopted this?
Could you link to a standard description in a comment above the condition?
Thank you!
So you made the website, posted the PRs, and then dropped your page on link aggregators, without actually discussing this with the maintainers of the aforementioned software? Perhaps a better route would’ve been first publicly initiating discourse surrounding such a flag, coordinating the initiation of such discourse with the support of maintainers who collect telemetry, and providing a timeline and place for such discourse, before just throwing PRs at the wall as though the thing you want on your website is instantly a standard.
Perhaps a better route would’ve been first publicly initiating discourse surrounding such a flag, coordinating the initiation of such discourse with the support of maintainers who collect telemetry, and providing a timeline and place for such discourse, before just throwing PR’s at the wall as though the thing you want on your website is instantly a standard.
I’m not sure how what you proposed is different from what I did. Publicly initiating discourse: I made a website and posted it to other websites with comments sections. Coordinating the initiation of such discourse: I created patches and PRs, on a website that has a comments section.
I didn’t throw anything at a wall.
first publicly initiating discourse surrounding such a flag
You did not do this. You instead declared your standard and posted PRs, with discourse not on how to establish such a standard, but just on the standard you yourself decided on.
coordinating the initiation of such discourse with the support of maintainers who collect telemetry
You did not do this. You posted a PR after you initiated the discourse with an already cemented “standard”.
providing a timeline and place for such discourse
You neither provided a timeline for developing a standard nor did you provide a place to develop a discussion about such a standard, because you decided on the “standard” before posting it, without including anyone else. That’s not a standard, that’s an opinion.
I’ve avoided Homebrew like the plague exactly because of this and their stance on it. I’ve switched to MacPorts since, and I haven’t missed any packages, but the day that I do I will try to add the missing one as a port.
I think that a Do Not Track standard is doomed to fail because (1) it failed on the web and (2) the linked PRs don’t seem too keen on implementing this.
In the console I can still vote with my feet, and I do. On the web it’s a different story.
The way the maintainers respond to your PRs is disappointing to say the least. Pretending not to know what tracking implies, or “it’s not garnered enough adoption for us to consider it”-type thinking…
Another similar standard is NO_COLOR by @jcs of lobste.rs fame. I’m immensely sympathetic to the idea (I am opposed to both underhanded tracking and pointless fruit salad in my terminal), but the depressing fact is that most people don’t seem to care.
It’d be really nice to compile a few firewall sandbox profile scripts for various platforms which either block or ask for confirmation when a script process or any child hits the network, and then have a dotfile line listing programs to automatically enforce those sandboxes over, perhaps with a recommended list of programs hosted on that site for each platform. For macOS it could use sandbox-exec, for Linux iptables/ufw/(ebpf?), for OpenBSD pledge or pf, etc.
In general I agree with the reactions “we should already be opted out”/“this sucks” but at least this proposal would mean an actionable change. If the DNT header is justifiable (which is debatable!), then this is, and the DNT header exists, so…
The DNT header was a failure because it was sent with requests to a server. Homebrew, Gatsby, and Syncthing do not need to contact a server to work. Their phoning home is nonessential behavior, and is unrelated to their functionality—entirely unlike a web browser, which is a client to a network service that is explicitly being used.
Maybe this should be more granular? In many cases users can be served by e.g. (anonymously, or pseudonymously) letting the developer know which versions are in active use. Or checking if there’s a newer version available upstream, even if not in the distribution. And distribution can patch those checks out, but that can cause some friction with the devs (jwz & xscreensaver?).
I agree the standard is poorly defined, but I’m still quite surprised by the hostility it was met with. Most tools they sent PRs to already have an option for that; what they were proposing is changing if(our_own_variable) to if(our_own_variable || DO_NOT_TRACK), which is a trivial change without any compatibility concerns.
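The proposed change amounts to something like this sketch, where the ownOptOut flag stands in for whatever opt-out setting a project already has and any non-empty DO_NOT_TRACK is treated as opting out:

```go
package main

import (
	"fmt"
	"os"
)

// telemetryDisabled is the kind of one-line change the PRs proposed:
// keep the project's own opt-out knob, and additionally honour DO_NOT_TRACK.
func telemetryDisabled(ownOptOut bool) bool {
	return ownOptOut || os.Getenv("DO_NOT_TRACK") != ""
}

func main() {
	ownOptOut := false // whatever the project's existing setting says
	if telemetryDisabled(ownOptOut) {
		fmt.Println("telemetry: off")
		return
	}
	fmt.Println("telemetry: on")
}
```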
I would happily add it if I had any programs with tracking functionality.
This site is claiming to offer a “standard for opting out of telemetry”, but that is something we we already have: Unless I actively opt into telemetry, I have opted out. If I run your software and it reports on my behavior to you without my explicit consent, your software is spyware.
I know this comes up a lot, but I disagree with that stance. The vast majority of people leaves things on their defaults. The quality of information you get from opt-in telemetry is so much worse than from telemetry by default that it’s almost not worth it.
The only way I could see “opt-in” telemetry actually work is caching values locally for a while and then be so obnoxiously annoying about “voluntarily” sending the data that people will do it just to shut the program up about it.
That comment acts like you deserve to have the data somehow? Why should you get telemetry data from all the people that don’t care about actively giving it to you?
I’ve got idiosyncratic views on what “deserving” is supposed to mean, but I’ll refrain from going into philosophy here.
Because the data is better and more accurate. Better and more accurate data can be used to improve the program—which is something everyone will eventually benefit from. But if you skew the data towards the kinds of people who opt into telemetry.
Without any telemetry, you’ll instead either (a) get the developers’ gut instinct (which may fail to reflect real-world usage), or (b) the minority that opens bug tickets dictate the UI improvements instead, possibly mixed with (a). Just as hardly anyone (in the large scale of things) bothers with opting into telemetry, hardly anyone bothers opening bug tickets. Neither group may be representative of the silent majority that just wants to get things done.
Consider the following example for illustration of what I mean (it is a deliberate oversimplification, debate my points above, not the illustration):
Assume you have a command-line program that has 500 users. Assume you have telemetry. You see that a significant percentage of invocations involve the subcommand
check
, but no such command exists; most such invocations are immediately followed by the correctinfo
command. Therefore, you decide to add an alias. Curiously, nobody has told you about this yet. However, once the alias is there, everyone is happier and more productive.Had you not had telemetry, you would not have found out (or at least not found out as quickly, only when someone got disgruntled enough to open an issue). The “quirk” in the interface may have scared off potential users to alternatives, not actually giving your program a fair shot because of it.
Bob really wants a new feature in a software he uses. Bob suggests it to developers, but they don’t care. As far as they can tell, Bob is the only one wanting it. Bob analyzes the telemetry-related communication and writes a simple script that imitates it.
Developers are concerned about privacy of their users and don’t store IP addresses (it’s less than useless to hash it), only making it easier for Bob to trick them. What appears as a slow growth of active users, and a common need for a certain feature, is really just Bob’s little fraud.
It’s possible to make this harder, but it takes effort. It takes extra effort to respect users’ privacy. Is developing a system to spy on the users really more worthy than developing the product itself?
You also (sort of) argued that opt-in telemetry is biased. That’s not exactly right, because telemetry is always biased. There are users with no Internet access, or at least an irregular one. And no, we don’t have to be talking about developing countries here. How do you know majority of your users aren’t medical professionals or lawyers whose computers are not connected to the Internet for security reasons? I suspect it might be more common than we think. Then on the other hand, there are users with multiple devices. What can appear as n different users can really just be one.
It sort of depends on you general philosophical view. You don’t have to develop a software for free, and if you do, it’s up to you to decide the terms and conditions and the level of participation you expect from your users. But if we talk about a free software, I think that telemetry, if any, should be completely voluntary on a per-request basis, with a detailed listing of all information that’s to be sent in both human- and machine- readable form (maybe compared to average), and either smart enough to prevent fraudulent behavior, or treated with a strong caution, because it may as well be just an utter garbage. Statistically speaking, it’s probably the case anyway.
I’m well aware that standing behind a big project, such as Firefox, is a huge responsibility and it would be really silly to advice developers to rather trust their guts instead of trying to collect at least some data. That’s why I also suggested how I imagine a decent telemetry. I believe users would be more than willing to participate if they saw, for example, that they used a certain feature above-average number of times, and that their vote could stop it from being removed. It’s also possible to secure per-request telemetry with a captcha (or something like that) to make it slightly more robust. If this came up once in a few months, “hey, dear users, we want to ask”, hardly anyone would complain. That’s how some software does it, after all.
The fraud thing is an interesting theory, but I am unaware how likely it is; you’ve theorised a Bob who can generate fraudulent analytics but couldn’t fake an IP address or use multiple real IP addresses or implement the feature he actually wants.
It’s not that he couldn’t do it, it’s just much simpler without that. It’s really about the cost. It’s easy to
curl
, it’s more time consuming or expensive to use proxies, and even more so to solve captchas (or any other puzzles). The lower the cost, the higher the potential inaccuracy. And similarly, with higher cost, even legitimate users might be less willing to participate.I don’t have some universal solution or anything. It’s just something to consider. Sometimes it might be reasonable to put effort into making a robust telemetric system, sometimes none at all would be preferred. I’m trying to think of a case “in between”, but don’t see a single situation where jokingly-easy-to-fake results could be any good.
Telemetry benefits companies, otherwise companies wouldn’t use it. Perhaps it can benefit users, if the product is improved as a result of telemetry. But it also harms users by compromising their privacy.
The question is whether the benefits to users outweigh the costs.
Opt-out telemetry-using companies obviously aren’t concerned about the costs to users, compared to the benefits they (the companies) glean from telemetry-by-default. They are placing their own interests first, ahead of their users. That’s why they resort to dark patterns like opt-out.
You assume that we actually need telemetry to develop good software. I’m not so sure. We developed good software for decades without telemetry; why do we need it now?
When I hear the word “telemetry”, I’m reminded of an article by Joel Spolsky where he compared Sun’s attempts at developing a GUI toolkit for Java (as of 2002) to Star Trek aliens watching humans through a telescope. The article is long-winded, but search for “telescope” to find the relevant passage. It’s no coincidence that telemetry and telescope share the same prefix. With telemetry, we’re measuring our users’ behavior from a distance. There’s not a lot of signal there, and probably a lot of noise.
It helps if we can develop UsWare, not ThemWare. And I think this is why it’s important for software development teams to be diverse in every way. If our teams have people from diverse backgrounds, with diverse abilities and perspectives, then we don’t need telemetry to understand the mysterious behaviors of those mysterious people out there.
(Disclaimer: I work at Microsoft on the Windows team, and we do collect telemetry on a de-facto opt-out basis, but I’m posting my own opinion here.)
Telemetry usually is not about people’s behaviors, it’s about the mysterious environments the software runs in, the weird configurations and hardware combinations and outdated machines and so on.
Behavioral data should not be called telemetry.
One concrete benefit of telemetry: “How many people are using this deprecated feature? Should we delete it in this version or leave it in a while longer?”
Decades-old software is carrying decades-old cruft that we could probably delete, but we just don’t know for sure. And we all pay the complexity costs one paper cut at a time.
I’m as opposed to surveillance as anybody else in this forum. But there’s a steelman question here.
A social scientist could likewise say: “The quality of information you get from observing humans in a lab is so much worse than when you plant video cameras in their home without them knowing.”
How is this an argument that it’s ok?
There are three differences as far as I can tell:
The data from a hidden camera is not anonymizable. Telemetry, if done correctly (anonymization of data as much as possible, no persistent identifiers, transparency as to what data is and has been sent in the past), cannot be linked to a natural person or an indvidual handle. Therefore, I see no harm to the individual caused by telemetry implemented in accordance with best data protection practices.
Furthermore, the data from the hidden camera cannot cause corrective action. The scientist can publish a paper, maybe it’ll even have revolutionary insight, but can take no direct action. The net benefit is therefore slower to be achieved and very commonly much less than the immediate, corrective action that a software developer can take for their own software.
Finally, it is (currently?) unreasonable to expect a hidden camera in your own home, but there is an increased amount of awareness of the public that telemetry exists and settings should be inspected if this poses a problem. People who do care to opt out will try to find out how to opt out.
Finally, it is (currently?) unreasonable to expect a hidden camera in your own home, but there is an increased amount of awareness of the public that telemetry exists and settings should be inspected if this poses a problem. People who do care to opt out will try to find out how to opt out.
I think this is rather deceptive. Basically it’s saying: “we know people would object to this, but if we slowly and covertly add it everywhere we can eventually say that we’re doing it because everyone is doing it and you’ve just got to deal with it”.
I still disagree but I upvoted your post for clearly laying out your argument in a reasonable way.
You seem to miss a very easy, obvious, opt-in only strategy that worked for the longest time without feeling like your software was that creepy uncle in the corner undressing everyone. As you pointed out everyone keeps the defaults, you know what else most normies do? Click next until they can start their software. So you add a dialog in that first run dialog that is supposed to be there to help the users and it has a simple “Hey we use telemetry to improve our software (here is where you can see your data)[https://yoursoftware.com/data] and our (privacy policy)[https://yoursoftware.com/privacy]. By checking this box you agree to telemetry and data collection as outlined in our (data collection policy)[https://yoursoftware.com/data_collection] [X]”
and boom you satisfy both conditions, the one where people don’t go out of their way to opt into data collection and the other where you’re not the creepy uncle in the corner undressing everyone.
You can also view this as an standardized way for opt-in, which isn’t currently available either.
No, it is not. It is a standardized way for opt-out.
This is a bad comment, because it doesn’t add anything except for “I think non-consensual tracking is bad”, and is only tangentially related to OP insofar as OP is used as a soapbox for the above sentiment. Therefor I have flagged the comment as “Me-too”, regardless however much I may agree with it.
Except that in the European Union, the GDPR requires opt-in in most cases. IANAL, but I think it applies to the analytics that Homebrew collects as well. From the Homebrew website:
A Homebrew analytics user ID, e.g. 1BAB65CC-FE7F-4D8C-AB45-B7DB5A6BA9CB. This is generated by uuidgen and stored in the repository-specific Git configuration variable homebrew.analyticsuuid within $(brew –repository)/.git/config.
https://docs.brew.sh/Analytics
From the GDPR:
The data subjects are identifiable if they can be directly or indirectly identified, especially by reference to an identifier such as a name, an identification number, location data, an online identifier or one of several special characteristics, which expresses the physical, physiological, genetic, mental, commercial, cultural or social identity of these natural persons.
I am pretty sure that this UUID falls under identification number or online identifier. Personally identifyable information may not be collected without consent:
Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement.
So, I am pretty sure that Homebrew is violating the GDPR and EU citizens can file a complaint. They can collect the data, but then they should have an explicit step during the installation and the default should (e.g. user hits RETURN) be to disable analytics.
The other interesting implication is that (if this is indeed collection of personal information under the GDPR) is that any user can ask Homebrew which data they collected and/or to remove the data. To which they should comply.
As far as I can tell, you’re not actually citing the GDPR (CELEX 32016R0679), but rather a website that tries to make it more understandable.
GDPR article 1(1):
GDPR article 4(1) defines personal data (emphasis mine):
Thus it does not apply to data about people that are netiher identified nor identifiable. An opaque identifier like 1BAB65CC-FE7F-4D8C-AB45-B7DB5A6BA9CB is not per se identifiable, but as per recital 26, determining whether a person is identifiable should take into account all means reasonably likely to be used, such as singling out, suggesting that “identifiable” in article 4(1) needs to be interpreted in a very practical sense. Recitals are not technically legally binding, but are commonly referred to for interpretation of the main text.
Additionally, if IP addresses are stored along with the identifier (e.g. in logs), it’s game over in any case; even before GDPR, IP addresses (including dynamically assigned ones) were ruled by the ECJ to be personal data in Breyer v. Germany (ECLI:EU:C:2016:779 case no. C-582/14).
Sorry for the short answer in my other comment. I was on my phone.
The EC thinks differently:
https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en
It seems to me that an UUID is similar to cookie ID or advertising identifier. Using the identifier, it would also be trivially possible to link data. They use Google Analytics. Google could in principle cross-reference some application installs with Google searches and time frames. Based on the UUID they could then see all other applications that you have installed. Of course, Google does not do this, but this thought experimentat shows that such identifiers are not really anonymous (as pointed out in the working party opinion of 2014, linked on the EC page above).
Again, IANAL, but it would probably be ok to reporting installs without any identifier linking the installations. They could also easily do this, make it opt-in, report all people who didn’t opt in using a single identifier, generate a random identifier for people who opt-in.
They locked the PR talking about it and accused me of implying a legal threat for bringing it up. The maintainer who locked the thread seems really defensive about analytics.
Once you pop, you can’t stop.
I, too, thought that your pointing out their EU-illegal activity was distinct from a legal threat (presumably you are not a prosecutor), and that they were super lame for both mischaracterizing your statement and freaking out like that.
It seems this is just a general trait. See e.g. this
Now I really wish I had an ECJ decision to cite because at this point it’s an issue of interpretation. What is an advertising identifier in the sense that the EC understood it when they wrote that page—Is it persistent and can it be correlated with some other data to identify a person? Did they take into account web server logs when noting down the cookie ID?
Interesting legal questions, but unfortunately nothing I have a clear answer to.
Please cite the rest of paragraph 4, definitions:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A32016R0679
Which was what I quoted.
Your comment makes the following quotations:
Please ^F this entire string in the GDPR. I fail to find it as-is. They only start matching up in the latter half starting at “an identifier” and ending with “social identity”.
I agree it’s pedantic of me, but it’s not a 1:1 quote from the GDPR if a sentence is modified, no matter how small.
I’ve edited in the second half in any case though. I do not, however, see any way that modification would invalidate any of the points I’ve made there, however.
If that is true, consider submitting a PR, because GDPR violations are serious business.
Or don’t submit a PR. As the project has stated:
People have been banned from the project for doing exactly this.
“We don’t want to hear complaints” is not a new stance for Homebrew.
Yeah, I got the impression that they are pretty hardline on this. I hope that they’ll reconsider before someone files a GDPR complaint.
Personally, I don’t really have a stake in this anymore, since I barely use my Mac.
I guess a more creative solution would be to fork the main repo and disable the analytics code and point people to that.
Edit: the linked PR is from before the GDPR though.
But the above user didn’t post that did they? Your comment was meaningful and useful, but theirs was just sentimental. A law violation is a law violation, but OP just posted their own feelings about what they think is spyware and didn’t say anything about GDPR.
hmm I disagree, the OP is claiming that we should have a unified standard for “Do_Not_Track”. Finn is arguing that we shouldn’t need such a standard because unless I specifically state that I would like to be tracked, I should not be tracked and that any attempts to track is a violation of consent. Finn here is specifically disagreeing with the website in question. Should we organize against attempts to track without explicit consent, or give a unified way to opt out. These are fundamentally different questions and are actually directly related. If I say everyone should be allowed into any yard unless they have a private property sign, that may cause real concern for people who feel that any yard shouldn’t permit trespassing unless they have explicit permission. They are different concerns, that are related, and are more nuanced than “thing is bad”.
Okay. By your (non-accepted) definition, spyware abounds and is in common use.
Simply calling it “spyware” and throwing up your hands doesn’t work. They have knobs to turn the spying off, to opt-out. I just want all those knobs to have the same label.
It’s bloody disappointing that we’ve reached the stage where this is necessary.
We should also standardize a DNT for face scanners, like an invisible ink tattoo on your forehead. Then we can say “at least we did what we could” when bad actors do not respect it.
why assume that all programs performing opt-out analytics are inherently bad actors?
There is no such assumption in my comment.
Do Not Track failed in browsers. I’m very skeptic that this alternative version will succeed.
Do Not Track is not a browser setting. It’s a server setting, sent by a browser. The servers are free to ignore it
This is an initiative for local software, like Homebrew and Gatsby. They do not use the network as a matter of course. It’s a horse of a different color.
I find using terms like “ad tracking” to be rather misleading to describe collection of basic information on what people are doing with the software so developers have a better idea which areas to focus on improving.
Syncthing doesn’t even seem to “track” anything, it just checks if there is a newer version. How is this “tracking”? It’s just a completely different category and painting everything with this mile-wide brush as “tracking” is not constructive.
Keep reading the list. Syncthing doesn’t do ad tracking, but it does absolutely do a few of the other things that DO_NOT_TRACK is designed to indicate a lack of consent for.
It’s absolutely able to log my IPs as a user of the software because it phones home. I don’t want it to do that.
I read it. You made a PR to disable update checking.
So is Lobsters, or any other site you visit, your email host (or if you selfhost: people you send emails to), anyone on IRC, peers on bittorrent, any git push/pull, etc. etc. etc.
You can object to that, and that’s all fine, but this is just not “tracking” as understood by almost anyone else and equating it to that is just spreading FUD.
I agree that a distinction should be made, but I’d offer that a further distinction is necessary between software that communicates with a second-party and software that communicates with a third-party.
We have an interesting example in syncthing in that it has an explicit design goal to enable peer-only syncing. Assuming someone goes to the trouble of configuring their own relay & discovery servers, it seems odd they’d further have to opt out of communicating with a third-party, as there isn’t even a second-party.
Lobste.rs is a website, and it’s impossible to use the website without exchanging packets with it. That’s essential for functionality, and not avoidable (unless you use tor, which is somewhat analogous to setting DO_NOT_TRACK=1).
Homebrew and Gatsby are local applications. Syncthing can be configured to use lan discovery only and talk only between my computers. There is no need for it to phone home, unlike a website.
Local software != website
Checking if there’s a newer version is real functionality. And you haven’t really explained why it’s such a terrible thing that Syncthing can (in theory) log IP addresses , especially when a zillion other things can already do so.
The zillion other things that do so I choose to opt-in to, when I visit their websites. I don’t consent to local software phoning home unnecessarily simply because I used it on my local machine or LAN.
That still doesn’t answer the question on why Syncthing’s upgrade server potentially logging your IP address is “tracking”, or such a terrible thing.
It doesn’t need to be terrible for me to not want it. I don’t actually need a reason for wanting to preserve my privacy. The application taking that data and sending it away requires consent; consent which I do not give.
Why I do not consent is irrelevant, and personal, and may itself be private.
This is a non-discussion if you just want to assert tautologies. I can choose to not consent to programs creating files without my permission and create do-not-make-files.com, but if I then refuse to explain any reasoning behind it at the slightest hint of critical questioning, I’m just being unconstructive and silly.
100% complete privacy and consent is unworkable and unrealistic in all but the simplest examples. You’re the one making claims, so you’re the one who has to justify them. If you don’t want to do that: fine. But don’t be surprised if people are going to roll their eyes and dismiss whatever you’re saying.
I do not consent to my usage being transmitted is not a tautology. It is a fact.
Software that contacts the network without my consent, transmitting information about my usage, is malware.
I have no idea why you feel the need to come back to this after 3 months, especially considering your reply adds nothing to any discussion, but let me be clear: calling people’s hobby projects “malware” over a disagreement about whether or not it should check for updates is deeply toxic. This is the sort of behaviour that quite literally makes people leave Open Source development.
I think this proposal raises a good point, but it has two major flaws.
First, it does not distinguish between anonymous telemetry and user tracking. Telemetry, if done right, helps developers improve their software (just think about Debian’s ages-old popcon). Real (even pseudonymous) user tracking is a whole different beast.
Second, anything collecting even pseudonymous user data must be opt-in as per the GDPR. This proposal gets it the wrong way round: there should be the inverse, an environment variable that indicates consent to tracking. Since people still install popcon, I don’t think this effectively means that nobody will ever opt in.
So I make a counter-proposal. How about an environment variable DATA_COLLECTION_CONSENT, with these possible values: no (the default assumed if the variable is missing), yes (consent to everything, including pseudonymous data collection), and anonymous (allow only data transmission that cannot, ever, be mapped to a user, such as telemetry or Debian’s popcon).
Also, please, do not file such proposals against individual projects as PRs. This topic needs wider attention and should be under the umbrella of freedesktop.org’s guidelines.
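Something like this is all the counter-proposal would take on the program side (just a sketch: only the variable name and its three values come from the proposal above, everything else is made up for illustration):

    import os

    VALID_VALUES = {"no", "yes", "anonymous"}

    def data_collection_consent() -> str:
        # Missing, empty, or unrecognised values all collapse to "no",
        # so a typo can never accidentally grant consent.
        value = os.environ.get("DATA_COLLECTION_CONSENT", "").strip().lower()
        return value if value in VALID_VALUES else "no"

    consent = data_collection_consent()
    if consent == "yes":
        pass  # pseudonymous data collection would be allowed here (hypothetical)
    elif consent == "anonymous":
        pass  # only strictly anonymous telemetry would be allowed here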
Edit: This comment got me thinking. I leave this here for reference, but after reading said comment I’m not so sure anymore if telemetry is really something developers should have access to.
Popcon was opt-in, if I recall correctly. I have no issue with off-by-default tracking. I just want all of the apps that force me to manually opt-out to have the lever to do so labelled consistently.
Package maintainers for distros should just patch all telemetry and tracking out of software.
Forgive the shameless plug: this is something that Debian does often and a lot of people are unaware of.
That’s great, thank you :)
What if the package manager itself is the thing doing the spying?
Well then that’s a package manager I wouldn’t want to use? Or are you referring to something specific?
Homebrew does this by default, and there are no other good package managers for macOS.
I’d really like a standard like this. Personally I don’t care about the telemetry data these applications collect, but a general flag would let those who really do care opt out without having to hunt down the setting in every program they use, and it would give apps that use telemetry to improve themselves a consistent interface for signalling non-consent. However, the author has seemingly gone about this poorly in coordinating with the maintainers of the popular applications that collect telemetry. From the linked PRs:
…
So you made the website, posted the PRs, and then dropped your page on link aggregators, without actually discussing this with the maintainers of the software in question? Perhaps a better route would have been to first publicly initiate discourse surrounding such a flag, coordinate that discourse with the support of maintainers whose software collects telemetry, and provide a timeline and place for it, before just throwing PRs at the wall as though the thing you want on your website is instantly a standard.
I’m not sure how what you proposed is different from what I did. Publicly initiating discourse: I made a website and posted it to other websites with comments sections. Coordinating the initiation of such discourse: I created patches and PRs, on a website that has a comments section.
I didn’t throw anything at a wall.
Publicly initiating discourse: you did not do this. You instead declared your standard and posted PRs, so the discussion was not about how to establish such a standard, but only about the standard you yourself had already decided on.
Coordinating the initiation of such discourse: you did not do this either. You posted PRs after you had initiated the discourse with an already cemented “standard”.
You neither provided a timeline for developing a standard nor a place to hold a discussion about one, because you decided on the “standard” before posting it, without including anyone else. That’s not a standard; that’s an opinion.
Does that clear it up?
I’ve avoided Homebrew like the plague exactly because of this and because of their stance on it. I’ve since switched to Macports, and I haven’t missed any packages there; the day that I do, I will try to add it as a port there.
I think that a Do Not Track standard is doomed to fail because (1) it failed on the web and (2) the maintainers in the linked PRs don’t seem too keen on implementing this one.
In the console I can still vote with my feet, and I do. On the web it’s a different story…
The way the maintainers respond to your PRs is disappointing to say the least. Pretending not to know what tracking implies, or “it’s not garnered enough adoption for us to consider it”-type thinking…
Another similar standard is NO_COLOR by @jcs of lobste.rs fame. I’m immensely sympathetic to the idea (I am opposed to both underhanded tracking and pointless fruit salad in my terminal), but the depressing fact is that most people don’t seem to care.
I added a link to it on the site, thank you for the pointer!
It’d be really nice to compile a few firewall/sandbox profile scripts for various platforms which either block, or ask for confirmation, when a wrapped process or any of its children hits the network, and then have a dotfile listing the programs to automatically enforce those sandboxes on, perhaps with a recommended list of programs for each platform hosted on that site. For macOS it could use sandbox-exec, for Linux iptables/ufw/(eBPF?), for OpenBSD pledge or pf, etc.
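The macOS piece of that could look roughly like this (a sketch only: it assumes the deprecated but still shipping sandbox-exec and the commonly used minimal deny-network profile; the wrapper function and its name are invented, and Linux/OpenBSD would need entirely different tooling):

    import subprocess
    import tempfile

    # Minimal SBPL profile: allow everything except network operations.
    NO_NETWORK_PROFILE = """
    (version 1)
    (allow default)
    (deny network*)
    """

    def run_without_network(argv: list[str]) -> int:
        # Write the profile to a temporary file and run the command under it.
        with tempfile.NamedTemporaryFile("w", suffix=".sb") as profile:
            profile.write(NO_NETWORK_PROFILE)
            profile.flush()
            return subprocess.call(["sandbox-exec", "-f", profile.name, *argv])

    # e.g. run_without_network(["brew", "list"])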
In general I agree with the reactions “we should already be opted out”/“this sucks” but at least this proposal would mean an actionable change. If the DNT header is justifiable (which is debatable!), then this is, and the DNT header exists, so…
The DNT header was a failure because it was sent with requests to a server. Homebrew, Gatsby, and Syncthing do not need to contact a server to work. Their phoning home is nonessential behavior, and is unrelated to their functionality—entirely unlike a web browser, which is a client to a network service that is explicitly being used.
Maybe this should be more granular? In many cases users can be served by, e.g., (anonymously or pseudonymously) letting the developer know which versions are in active use, or by checking whether a newer version is available upstream, even if it’s not in the distribution yet. And distributions can patch those checks out, but that can cause some friction with the devs (jwz & xscreensaver?).
Does this include DNS resolution? /troll
While I do appreciate the intent, this feels more like an initiative and less of a “standard” (as it’s being advertised).
If you had a bit more of a formal spec, those PRs would have a much higher chance of being merged.
I agree the standard is poorly defined, but I’m still quite surprised by the hostility it was met with. Most tools they sent PRs to already have an option for that; what they were proposing is changing if(our_own_variable) to if(our_own_variable || DO_NOT_TRACK), which is a trivial change without any compatibility concerns.
I would happily add it if I had any programs with tracking functionality.
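For illustration, the whole check could look something like this (a sketch only; it treats any set value other than an empty string or 0 as an opt-out, which is exactly the kind of detail the spec would need to pin down):

    import os

    def tracking_allowed(app_telemetry_enabled: bool) -> bool:
        # The program's own opt-out keeps working exactly as before; the
        # proposed DO_NOT_TRACK variable is just an additional way to say no.
        do_not_track = os.environ.get("DO_NOT_TRACK", "")
        if do_not_track not in ("", "0"):
            return False
        return app_telemetry_enabled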