None of this would be an issue if users brought their own data with them.
Imagine if users showed up at a site and said “Hey, here is a revokable token for storing/amending information in my KV store”. The site itself never needs to store anything about the user, but instead makes queries with that auth token to modify their slice of the user’s store.
This entire problem with privacy and security would go away, because the onus would be on the user to keep their data secure, modulo laws saying that companies shouldn’t (and, as a matter of engineering and cost-effectiveness, wouldn’t) store their own copies of customer data.
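A minimal sketch of what that flow could look like. Everything here is invented for illustration (the store URL, the scope/key layout, the bearer-token scheme); no such protocol is standardized:

```python
# Hypothetical "bring your own data" flow: the user hands the site a
# revocable bearer token plus the URL of *their* KV store, and the site
# only ever talks to the user's store. All endpoint shapes are invented.

def build_kv_request(store_url, token, app_scope, key, method="GET", body=None):
    """Describe an HTTP request against the slice of the user's store
    that this application was granted access to."""
    return {
        "method": method,
        "url": f"{store_url.rstrip('/')}/{app_scope}/{key}",
        "headers": {"Authorization": f"Bearer {token}"},  # revocable by the user
        "body": body,
    }

# The site stores nothing itself; it reads and writes the user's slice:
read_prefs = build_kv_request("https://kv.example.net", "tok123",
                              "example-forum", "prefs")
save_prefs = build_kv_request("https://kv.example.net", "tok123",
                              "example-forum", "prefs", method="PUT",
                              body='{"theme": "dark"}')
```

The key property is that revoking the token cuts the site off entirely, since it never held a copy of the data.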
http://remotestorage.io/ did this. I’ve worked with it and it’s nowhere near usable. There are so many technical challenges along the way (esp. with performance) that you end up basically having to process all user data clientside, but storing the majority of data serverside. It gets more annoying when you attempt to introduce any way of interaction between two users.
We did try this, saw that it’s too hard (and for some services an unsolved problem), and did something else. There’s no evil corporatism in that, nor is it a matter of making profit, even if a lot of people, especially here, want to project that motive onto everything privacy-related. It’s human nature.
basically having to process all user data clientside
If I go to a site, grant that site a token, couldn’t that server do processing server side?
It gets more annoying when you attempt to introduce any way of interaction between two users.
Looking at remotestorage, it appears there’s no support for pub/sub, which seems like a critical failing to me. To bikeshed an example, this is how I think something like lobste.rs ought to be implemented:
User data is stored on servers called pods (similar to remotestorage), which hold data for users. A person can sign up at an existing pod or run their own, fediverse-style.
These pods support pub/sub over websocket.
A particular application sits on an app server. That app server subscribes to a list of pods for pub/sub updates, for whichever users have given that application permission. On top of these streams the app server runs reduce operations and keeps the result in a cache or DB. A reduce operation might calculate something like: give me the top 1000 items sorted by hotness (a function of time and votes), given streams of user data.
A user visits the site. The server serves the result instantly from its cache.
Additionally, the pub/sub protocol would have to support something like resuming broken connections, e.g. replaying messages starting from a point T in time.
Anyway, given this kind of architecture, I’m not sure why something like lobste.rs couldn’t be created without the performance issues you ran into.
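The reduce step described above can be sketched roughly like this. The hotness formula is a stand-in (the real lobste.rs ranking differs), and plain dicts stand in for the pub/sub events arriving from pods:

```python
# Sketch of the app server's reduce operation: fold a stream of pod
# events into a ranked cache. Event shape and formula are illustrative.
import heapq

def hotness(votes, posted_at, now):
    """Invented decay function: votes discounted by age in hours."""
    age_hours = max(now - posted_at, 0) / 3600
    return votes / (age_hours + 2) ** 1.5

def reduce_events(events, now, top_n=1000):
    """events: iterable of dicts like {"id", "votes", "posted_at"},
    as they would arrive over pub/sub from many pods."""
    items = {}
    for ev in events:
        item = items.setdefault(ev["id"], {"votes": 0, "posted_at": ev["posted_at"]})
        item["votes"] += ev.get("votes", 0)  # accumulate vote deltas per item
    ranked = heapq.nlargest(
        top_n, items.items(),
        key=lambda kv: hotness(kv[1]["votes"], kv[1]["posted_at"], now))
    return [item_id for item_id, _ in ranked]
```

The app server would run this incrementally as events stream in, so a page view only reads the cached result.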
If I go to a site, grant that site a token, couldn’t that server do processing server side?
If your data passes through third-party servers, what’s the point of all of this?
The rest of your post is, to me, with all due respect, blatant armchair engineering.
The pub/sub stuff completely misses the point of what I am trying to say. I’m not talking about remotestorage.io in particular.
Lobste.rs is a trivial use case, and not even an urgent one in the sense that our centralized versions violate our privacy, because how much privacy do you have on a public forum anyway? Let’s try something like Facebook. When I post any content at all, that content will have to be copied to all different pods, making me subject to the lowest common denominator of both their privacy policies and security practices. This puts my privacy at risk. Diaspora did this. It’s terrible.
Let’s assume you come up with the very original idea of having access tokens instead, where the pods would re-fetch the content from my pod all the time instead of storing a copy. This would somewhat reduce the risk to my privacy (though I’ve not seen a project that does this), but:
Now the slowest pod is a bottleneck for the entire network. Especially stuff like searching through public postings. How do you implement Twitter moments, global or even just local (on a geographical level, not on network topology level) trends?
Fetching the data from my pod puts the reader’s privacy at risk. I can host a pod that tracks read requests, and, if the system is decentralized enough, map requests from pods back to users (if the request itself doesn’t already contain user-identifying info)
If your data passes through third-party servers, what’s the point of all of this?
It decouples data and app logic. That makes it harder for an application to leverage its position as middleman to the data you’re interested in: doing stuff like selling your data or presenting you with ads, which you put up with because you’re still interested in the people there. If the data runs over a common protocol, you’re free to replace the application side of things without being locked in. For example, I bet there’s some good content on Facebook, but I never go there because I don’t trust that company with my data. I wish there were some open-source, privacy-friendly front end to the Facebook network that would let me interact with people there without sitting on Facebook’s servers. Besides that, if an application changes its terms of use, maybe you signed up trusting the application, but now you face a dilemma: reject the ToS and lose what you still like about the application, or accept crappy new terms.
The rest of your post is to me, with all due respect, blatant armchair-engineering.
Ha! Approaching a design question by first providing an implementation, without discussion, seems pretty backwards to me. Anyway, as far as I’m concerned I’m just talking design. Specifically, I’m criticizing what I perceive as a deficiency in remotestorage’s capabilities, and arguing that a decentralized architecture doesn’t have to be slow, is at least as good as a centralized architecture, and is in many regards better for end users.
Let’s try something like Facebook. When I post any content at all, that content will have to be copied to all different pods,
No, I was saying that this would be published to subscribing applications. There could be a Facebook application. And someone else could set up a Facebook-alternative application, with the same data, but a different implementation. Hey, you could even run your own instance of Facebook-X application.
making me subject to the lowest common denominator of both their privacy policies and security practices.
If you grant an application access to your data, you grant it access to your data. I don’t see a way around that puzzle in either a centralized or decentralized architecture. If anything, in a decentralized architecture you have more choices. Which means you don’t have to resign yourself to Facebook’s security and privacy policies if you want to interact with the “Facebook” network. You could move to Facebook-X.
Now the slowest pod is a bottleneck for the entire network. Especially stuff like searching through public postings. How do you implement Twitter moments, global or even just local (on a geographical level, not on network topology level) trends?
What I was describing was an architecture where pods just store data. Apps consume and present it. If I have an app, and I subscribe to X pods, there’s no reason I have to wait for the slowest pod’s response in order to construct a state that I can present users of my app.
So for something like search, or Twitter moments, you would have an application that subscribes to whatever pods it knows about. Those pods publish notifications to the app over web socket, for example whenever a user tweets. Your state is a reduction over these streams of data. Let’s say I store this in an indexed lookup like ElasticSearch. So every time a user posts a tweet, I receive a notification and add it to my instance of ElasticSearch. Now someone opens my app, maybe by going to my website. They search for X. The app queries the ElasticSearch instance. It returns the matching results. I present those results to the user’s browser.
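As a toy stand-in for that flow, here’s a dict-based inverted index instead of ElasticSearch, with plain method calls instead of websocket notifications (event shape invented for illustration):

```python
# Minimal in-memory stand-in for the search app described above: each
# pub/sub notification updates the index; queries never touch the pods.
from collections import defaultdict

class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def handle_event(self, event):
        """Called for every pub/sub notification, e.g. a new post."""
        doc_id, text = event["id"], event["text"]
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return docs containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(self.docs[i] for i in ids)
```

The point of the design is visible here: the read path (`search`) is served entirely from the app’s own state, so no pod’s latency is on the critical path.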
Fetching the data from my pod puts the reader’s privacy at risk.
Hmm, I’m not sure if we’re on the same page. In the design I laid out, the app requests this data, not the pod.
“With respect, “social media” and aggregator sites are red herrings here. They can’t be made to protect privacy by their very nature.”
Sure they can. Starting with Facebook, they can give privacy settings per post, defaulting to things like Friends Only. They could even give different feeds for Public, Friends Only, or Friends of Friends. They can use crypto with transparent key management to protect as much of the less-public plaintext as possible. They can support E2E messaging. They can limit discovery options for some people, where they have to give you a URL or something to see their profile. Quite a few opportunities for boosting privacy in the existing models.
As far as link aggregators, we have a messaging feature that could be private if it isn’t already. Emails and IPs private if not in a public profile. The filters can be seen as a privacy mechanism. More to that point, though, might be things like subreddits that were only visible to specific, invited members. As with search, even what people are looking at might be something they want to keep private. A combo of separation of user activities at runtime, HTTPS, and little to no log retention would address that. Finally, for a hypothetical, a link aggregator might also be modified to easily support document drops over an anonymity and filesharing service.
Because the most formidable businesses of late are built on the ability to access massive amounts of user data at random. Companies simply don’t know how to make huge money on the Internet without it.
The real problems are around an uneducated, consumption-driven populace: who can resist finding out “which Spice Girl are you most like?” But would we be so willing to find out if it meant we get a president we wouldn’t like?
It is very hard for people to realise how unethical it is to hold someone responsible for being stupid, but we crave violence: serving food, working in an office, or driving a taxi offers no comparable thrill. Television and media give us this violence, an us-versus-them: Hillary versus Urine Hilarity, or The Corrupt Incumbent versus a Chance to Make America Great Again, or even Kanye versus anybody and everybody.
How can we make a decision to share our data? We can never be informed of how it will be used against us.
The GDPR does something very interesting: it says you’re not allowed to use someone’s data in a way they wouldn’t want you to.
I wish it simply said that, but it’s made somewhat complicated by a weird concept of “data”. It’s clear that things like IP addresses aren’t [by themselves] your data, and even a name like John Smith isn’t. Software understands data, but not the kind of “data” the GDPR is talking about. Pointing at “you” and “data” takes a fair thick bit of regulation if you don’t want to draw a box around things and prevent sensible people from interpreting the forms of “data” nobody has thought of yet.
But keep it simple: Would that person want you doing this? Can you demonstrate why you think that is and convince reasonable people?
I’m doing a fair bit of GDPR consulting at the moment, and whilst there’s a big task in understanding their business, there’s also a big task getting them to approach their compliance from that line of questioning: How does this make things better for that person? Why do they want us to do this?
We’re not curing cancer here, fine, but certainly there are degrees.
Browser cookies are something that crossed my mind after I suggested this, but my experience as a web dev makes me immediately suspicious of them as durable stores. :)
This still doesn’t solve problems with tracking, because companies have already started to require GDPR opt-in to use their products (even when using the product doesn’t necessarily require data tracking), or to use their products without a degraded user experience.
See cloudflare, recaptcha, facebook, etc.
“You can’t use this site without Google Analytics having a K/V-auth-token”, “We will put up endless ‘find-the-road-sign’ captchas if we can’t track you”, etc.
It’s a mistake to think you can “GDPR opt-in”. You can’t.
You have to prove that the data subject wants this processing. One way to do this is to ask for their consent and make them as informed as possible about what you’re doing. But they can decide not to, and they can even decide to revoke their consent at any time until you’ve actually finished the processing and erased their data.
These cookie/consent banners are worse than worthless; a queer kind of game people like Google are playing to try to waste the regulators’ time.
We will put up endless ‘find-the-road-sign’ captchas if we can’t track you
I’ve switched to another search engine for the time being. It’s faster, the results are pretty good, and I don’t have to keep fiddling with blocking that roadblock on Google’s properties.
I wish we had this hysteria about the recent child protection acts in the US that target prostitution specifically, making life terribly dangerous for sex workers. There was a recent Reply All podcast interviewing a researcher who examined murder rates for women as Craigslist rolled out its personals section (often used for adult services) to cities. The before-and-after picture is that murders went down 17% on average (that’s all women, not just sex workers, so we’re only talking correlation, not causation).
The GDPR, just like the child sex trafficking protection laws in the US, will be a real-time case study, and it will be interesting to see the effect over the next few years.
I think American devs are afraid of the GDPR because we’ve seen how laws like this can backfire. Specifically, the GDPR probably couldn’t pass in the US simply due to freedom of speech (which is why we can’t have a protected sex offender list like Australia does, or real criminal record expungement).
I like the idea of the GDPR, but I hope it doesn’t turn into a tool for censorship (like the Right to be Forgotten laws, which the EFF opposes).
I think people should do their best to comply and some of the projects that have closed are being hysterical, but at the same time, people don’t really know what will and won’t be acceptable until we see actual enforcement and what that will involve.
The theory is that the government isn’t allowed to interfere with people speaking.
It’s of course not strictly true: the US has libel laws and can obviously choose to withhold protection from certain kinds of speech (e.g. secret/clearance documents, etc.).
I think American devs are afraid of the GDPR because we’ve seen how laws like this can backfire.
I see this as: devs are afraid because they have to comply with something (annoying) that they didn’t have before.
the GDPR probably couldn’t pass in the US simply due to freedom of speech
I don’t see the link between the GDPR and freedom of speech. The GDPR is about user data retention. Freedom of speech is pretty key in most countries in western Europe, and I don’t think they plan anything to sabotage it.
I like the idea of the GDPR, but I hope it doesn’t turn into a tool for censorship
Again, GDPR is about user data retention. You could probably use that to censor a company in some way, but that would be pretty hard to prove and the company censored would first have to be audited for that matter.
I think you might mix-up GDPR with something else.
I see this as: devs are afraid because they have to comply with something (annoying) that they didn’t have before.
That’s wrong. They did need to keep my data safe from being hacked off their servers, there was simply little-to-no threat of law.
Again, GDPR is about user data retention.
GDPR is not about data retention. There’s no minimum or maximum time that you have to retain data.
I’m doing a fair amount of GDPR consulting at the moment, and this isn’t the strangest theory I’ve heard about the regulations.
There’s a big chunk about keeping data safe. If you have personal data, you have a responsibility to keep yourself from being hacked. That means using best practices for minimising risk like encryption and deleting it when you don’t need it anymore, and understanding who in your company can access the data (and when they do it).
There’s also a big chunk about making sure if you use personal data, you’re only using it in a way that the subject would approve of. This really means being able to demonstrate (perhaps to a regulator) why you think you have their consent to use this data. Records and contracts can help, but the subject can also back out at any time and needs to have controls to do so.
You could probably use that to censor a company in some way
You cannot. If you believe a company is using your data inappropriately, you report them to a regulator. You do not get to “prove your case” and you won’t be asked to show up in court. The regulator will assess the situation and prioritise it based on the claim and risk for further damage. The regulator will talk to that company and find out what is going on and correct the issue.
If that non-compliance is egregious and wilful, then the regulator has a pretty big stick, but this is far removed from “censoring a company” in any possible interpretation of the term.
They did need to keep my data safe from being hacked off their servers, there was simply little-to-no threat of law.
No they didn’t. If it’s for-profit and no laws stopping it, then keeping your data in a barely-secure form is legal and maybe even beneficial for the organization. Most organizations that have data breaches take a financial hit before going back to normal. Strong investments in security cost money every year. Managers might also believe they reduce productivity if applied everywhere. The managers apathetic to security wanting more ways to make money will see your data as an asset whose leaking barely concerns them.
So, capitalist companies operating under their theory of morality in a system with no liability for data-related externalities should continue to collect on you ignoring as many risks as they can get away with. That’s what most were doing before regulation forced them to care a little more. Also, why I support such regulations.
Thank you very much for correcting my false ideas.
GDPR is not about data retention.
I’m not a native speaker, but to me retention is the fact of holding the data, so, indeed, holding it securely. In addition to that, I particularly meant the “Right to erasure” and “Right of access”. I’m more familiar with the side of friends having to deal with the documentation process (to actually have documented why you can hold this data). But I’m by no means an expert on the subject.
By censoring I was thinking that, since the proof that you need to hold a piece of data might be pretty subjective, a regulator could probably damage a company whose business is holding data, but I agree that it’s very extrapolated.
If you (a business) actually need to hold data on a subject, then indeed the “proof” is quite subjective. You have to feel comfortable you can convince regulators that your processing is a part of you providing a service for that subject, and that they would expect you to use their data in this way. Simple examples might be keeping someone’s address in order to ship them goods that they ordered.
If you are an individual and you want to compel a company to remove/erase data they have on you, understand that they can ignore such a request with regards to things like the address they used to ship goods (among other reasons).
If you are an individual and you want to ask a company to provide data they have on you, it should be easy to do so with regards to things like the address they used to ship goods to you. They’re under no obligation (however) to discover who you are – that is, if you send them an IP address they’re not required to link any information or activity they have on that IP address to you.
the GDPR probably couldn’t pass in the US simply due to freedom of speech
It would obviously depend on the details, but it’s not inconsistent with the US’s view of free speech to regulate various kinds of commercial record-keeping and enforce privacy and access protections on those records. For example, healthcare data is fairly strongly regulated in the US, and this hasn’t been found to be a constitutional problem. (The “right to be forgotten” laws are a different story.)
A health company (insurance, hospital, whoever) is bound by HIPAA. A school is bound by FERPA. They can’t divulge information. But if someone leaks someone else’s medical records and a newspaper publishes them, that information is protected in the paper. Wherever the leak happened, though, that’s a problem if it was someone covered under HIPAA.
Criminal records can’t be expunged in the US. Not really. While your record was public, some other company scooped that data up and can sell it forever even if your official record is clear. Maybe we’ll have laws that will force companies to ignore those styles of background checks (some states probably do).
Actually this is a good question: how does the GDPR affect collecting data about people who aren’t your customers and never visit your website or storefront? Does it say anything about collecting public data?
There are many companies offering to help with GDPR for 6 figure amounts. Cost of compliance is in the millions for many larger companies. (this author clearly doesn’t understand the true cost of things)
So far there are no real privacy benefits for me as a user. I don’t personally care about people tracking my IPs, running analytics, retargeting, or doing split testing. I care about people losing my passwords, social security number, credit cards, messages, pictures, location data, etc. I haven’t seen much improvement in that area. The end result so far seems to be more checkboxes and the ability to delete my user account. #awesome
It’s too early to tell if anything good will come out of GDPR. Fingers crossed though, there are real privacy issues to solve and I hope it helps with that.
All the GDPR stuff is full of “business organization company …”
What about individuals operating non-commercial websites? Specifically outside the EU?
Let’s say I operate a forum in Russia, some EU citizens come and save their home addresses into their profiles on the forum and then want me to erase all the things. I reply with “sorry, your data is stored in 10 blockchains, it’s immutable”. They complain to their local data protection thingy. What happens next?
Say I wrote some social network that runs as a P2P application. As I want the network to take off, I also host my own nodes. Users, by using the software, will broadcast all kinds of data into the network, which is then stored, cached, and redistributed by the nodes. It will be impossible to give users a “delete all my data” flip; nodes may be hosted by anyone, anywhere, anonymously. I could potentially delete user data from my nodes on demand, but even that may be difficult to arrange for in practice. If an encrypted hash->value store is used, it might actually require me to collect more data in order to identify the rightful “owner” of a given object. But collecting such data about users may be impossible, since much of it depends on what client they use and what nodes they connect through, how they authenticate, etc.
I think this actually is possible to handle. If a third party gives you a statement about GDPR-compliance then that’s possible.
Also note that part of this actually isn’t a new legal problem in most European countries, even though such topics come up again with the GDPR. Informing people about collected data, and in many countries the right to correction and deletion of data, have existed for decades. Also worth noting: the EU in 1995 adopted the Data Protection Directive, which was in large part implemented by various European countries.
In theory, if people’s fear is greater than the true cost, it seems like there’s an opportunity for people to offer recommendations and then accept the legal risk in exchange for a fee. Since the risk here is posited to be tiny, you could charge them a tiny amount and take all the risk on yourself and make money. Then we can see the true cost of implementation. For instance, I have a blog I haven’t written anything on since college that has comments on it. If the cost to indemnify is $2.5k, I might as well just black hole EU traffic before it hits the site. If the cost to recommend a bunch of Wordpress plugins and then accept indemnification is $25, I might pay it and install the plugins.
Since this blogpost is clearly in reaction to announcements of the sort that monal.im recently made I think it is reasonable to demand something simple from proponents of the law. Here’s an offer: $25 for initial audit, and $5/yr afterwards for indemnification so long as I comply with remediation suggestions. I’ll retain the right to publish the remediation suggestions. What’s your offer?
(Personally, I’m not worried about GDPR for my blog, which is why I value these services so low, but I think the lack of existence of these services is evidence that those arguing that these reactions are hysteria are incorrectly pricing the cost of operating under the increased regulation)
I can’t audit a company for $25. That’s crazy. A big part of GDPR consulting is getting companies to document what data they collect, what they do with it, how long they keep it, and why they think they’re keeping it safe. This takes days, sometimes months, to tease out. If you can write it on a napkin, I could just about read that napkin for $25 and tell you if it’s compliant or not; but really, I’d tell you for a beer.
If you’ve got a Wordpress site, and you respond to reasonable email/post requests on how to (a) delete themselves (if they’re a commenting user) and/or (b) delete their comments, as well as (c) get a list of all their comments and log-in attempts, then you’re probably fine.
If you’ve got a Wordpress site without any comments/users, then you’re probably fine.
If you’ve got a Wordpress site without any comments/users, but you use some plugin from some third-party site to do comments; etc, then you might need to talk to that third-party. Maybe.
Chasing down that third party is where most people blow their budget.
Well, then whatever you’d charge is the true cost of compliance, isn’t it? OP’s big argument is that people are hysterical about the GDPR: i.e., that the fear of the costs is much higher than the actual costs, and that the costs, after accounting for risk, are not high. If OP is right, then OP could become fabulously wealthy by arbitraging between the two, by selling indemnification insurance.
OP is not doing that, and as a matter of fact, there’s no one doing that at a rate these people would use. This means that either people aren’t hysterical or OP has underestimated costs after accounting for risk.
Essentially, to OP I’m saying “if what you’re saying is true, you have a direct path to incredible wealth. That you aren’t taking it makes it apparent to me that what you’re saying isn’t true”. An audit is worthless without the indemnification.
That’s not true. It’s like buying insurance against bad PR: it requires that you behave in some fashion, and assumes the fallout of the PR is measurable.
If the cost to make this happen is low, but I think it’s high and am willing to pay some number past the actual cost, then you can get rich. The article is arguing that the cost is low, and I think it’s high. The only wiggle room is whether I’ll pay some number past the cost.
So far no one has offered the service while taking on the responsibility of having done a good job. Not one person.
There is no shortage of people saying it’s easy and low risk and a huge shortage of people who’ll do it. Chances are it’s what you’re saying: it’s months of work. It’s expensive, and people are pricing accordingly when they exclude the EU.
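The arbitrage argument can be made concrete with back-of-the-envelope numbers. All figures below are illustrative assumptions, not claims about actual GDPR enforcement risk:

```python
# If the seller's risk estimate is right, any premium between the fair
# price and the buyer's willingness to pay is profit for the seller.

def fair_premium(p_incident, expected_loss, overhead=1.2):
    """Roughly what an insurer must charge: probability of a claim
    times the expected loss, plus an overhead/profit margin."""
    return p_incident * expected_loss * overhead

# Assumed numbers: a dormant blog with a 0.1% annual chance of a
# 10,000 regulatory loss. If that estimate holds, the fair premium
# sits comfortably below the $25/yr offer made above -- so someone
# who truly believed the risk was that low could sell the policy.
premium = fair_premium(0.001, 10_000)
```

The absence of anyone selling at that price is the commenter’s evidence that sellers don’t actually believe the risk is that low.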
If the cost of the audit and assurances from the regulators is lower than this indemnification insurance (which I could only offer if I audit and speak to the regulators), then nobody will offer the indemnification you’re looking for except snake-oil salesmen.
Same thing for PR: once you’ve accurately evaluated your specific risk, there’s no point in buying insurance. Just fix the problems.
Here’s another way to look at it: people buy insurance when they can’t evaluate the risk, and actuaries take the mean projected risk to price that insurance. If the variance on that risk is low, it’s mass-market insurance. If it’s high, but still lower than the market mean, you can buy revenue protection (like futures). If the variance exceeds the market mean, however, you’re not selling the work of an actuary; you’re making bets, not selling insurance. Market segmentation for insurance purposes is our audit in this case, and since I can accurately determine compliance/risk, it’s much better for the company to just fix the issues.
Here’s yet another way to look at it: the risk of your being in a plane crash isn’t 1:11million if you don’t fly. A sucker is someone who buys plane crash insurance — even at those rates — when they don’t fly.
Also: the point of the article is that the GDPR is not so complex and that few companies actually need an audit. Not that “risk is low”.
Just dropping a little helper for people working on the subject (I think that it has already been posted on lobsters though): http://gdprchecklist.io/
So, this might be a good time to float an idea:
None of this would be an issue if users brought their own data with them.
Imagine if users showed up at a site and said “Hey, here is a revokable token for storing/amending information in my KV store”. The site itself never needs to store anything about the user, but instead makes queries with that auth token to modify their slice of the user’s store.
This entire problem with privacy and security would go away, because the onus would be on the user to keep their data secure–modulo laws saying that companies shouldn’t (and as a matter of engineering and cost-effectiveness, wouldn’t) store their own copies of customer data.
Why didn’t we do this?
http://remotestorage.io/ did this. I’ve worked with it and it’s nowhere near usable. There are so many technical challenges (esp. with performance) you face on the way that result of you basically having to process all user data clientside, but storing the majority of data serverside. It gets more annoying when you attempt to introduce any way of interaction between two users.
We did try this, saw that it’s too hard (and for some services an unsolved problem) and did something else. There’s no evil corporatism in that, nor is it a matter of making profit, even if a lot of people especially here want to apply that imagination to everything privacy-related. It’s human nature.
If I go to a site, grant that site a token, couldn’t that server do processing server side?
Looking at remotestorage it appears there’s no support for pub/sub, which seems like a critical failing to me. To bikeshed an example, this is how I see something like lobste.rs ought to be implemented:
User data is stored in servers (like remotestorage) called pods, which contain data for users. A person can sign up at an existing pod or run their own, fediverse-style.
These pods support pub/sub over websocket.
A particular application sits on an app server. That app server subscribes to a list of pods for pub/sub updates, for whatever users that have given that application permission. On top of these streams the app server runs reduce operations and keeps the result in cache or db. A reduce operation might calculate something like, give me the top 1000 items sorted by hotness (a function of time and votes), given streams of user data.
A user visits the site. The server serves the result instantly from its cache.
Additionally the pub/sub protocol would have to support something like resuming broken connections, e.g. replaying messages starting from a point T in time.
Anyway, given this kind of architecture I’m not sure why something like lobste.rs for example couldn’t be created - without the performance issues you ran into.
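The reduce operation described above can be sketched in a few lines. Everything here is hypothetical: the hotness formula is a Reddit-style guess, and the event shapes are made up, not part of remotestorage or any real pod protocol.

```python
import math, time

def hotness(votes, posted_at, now):
    # Hypothetical hotness function: score decays as the story ages.
    age_hours = (now - posted_at) / 3600
    return votes / math.pow(age_hours + 2, 1.5)

class HotReducer:
    """Folds pod events into a cached ranking, as the app server would."""
    def __init__(self):
        self.stories = {}  # story_id -> {"votes": int, "posted_at": float}

    def apply(self, event):
        # Events are what the pods would publish over the websocket.
        if event["type"] == "submit":
            self.stories[event["id"]] = {"votes": 0, "posted_at": event["at"]}
        elif event["type"] == "vote":
            self.stories[event["id"]]["votes"] += 1

    def top(self, n, now=None):
        now = now or time.time()
        ranked = sorted(
            self.stories.items(),
            key=lambda kv: hotness(kv[1]["votes"], kv[1]["posted_at"], now),
            reverse=True,
        )
        return [story_id for story_id, _ in ranked[:n]]

r = HotReducer()
r.apply({"type": "submit", "id": "a", "at": 0})
r.apply({"type": "submit", "id": "b", "at": 0})
r.apply({"type": "vote", "id": "b"})
print(r.top(2, now=3600))  # → ['b', 'a']
```

Since the reducer only ever consumes an append-only stream of events, replaying from a point T after a broken connection just means re-applying the missed events in order.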
If your data passes through third-party servers, what’s the point of all of this?
The rest of your post is to me, with all due respect, blatant armchair-engineering.
The pub/sub stuff completely misses the point of what I am trying to say. I’m not talking about remotestorage.io in particular.
Lobste.rs is a trivial use case, and not even an urgent one in the sense that our centralized versions violate our privacy, because how much privacy do you have on a public forum anyway? Let’s try something like Facebook. When I post any content at all, that content will have to be copied to all the different pods, making me subject to the lowest common denominator of both their privacy policies and their security practices. This puts my privacy at risk. Diaspora did this. It’s terrible.
Let’s assume you come up with the very original idea of having access tokens instead, where the pods would re-fetch the content from my pod all the time instead of storing a copy. This would somewhat mitigate the risk to my privacy (though I’ve not seen a project that does this), but:
See also this Tweet, from an ex-Diaspora dev
It decouples data and app logic, which makes it harder for an application to leverage its position as middleman to the data you’re interested in, doing things like selling your data or presenting you with ads. Yet you put up with it because you’re still interested in the people there. If data runs over a common protocol, you’re free to replace the application side of things without being locked in. For example, I bet there’s some good content on Facebook, but I never go there because I don’t trust that company with my data. I wish there were some open source, privacy-friendly front end to the Facebook network that would let me interact with people there without sitting on Facebook’s servers. Besides that, if an application changes its terms of use, maybe you signed up trusting the application, but now you’re faced with a dilemma: reject the ToS and lose what you still like about the application, or accept the new, crappy terms.
Ha! Approaching a design question by first providing an implementation without discussion seems pretty backwards to me. Anyway, as far as I’m concerned I’m just talking design. Specifically I’m criticizing what I perceive as a deficiency in remotestorage’s capabilities. And arguing that a decentralized architecture doesn’t have to be slow, is at least as good as a centralized architecture, and better, in many regards, for end users.
No, I was saying that this would be published to subscribing applications. There could be a Facebook application. And someone else could set up a Facebook-alternative application, with the same data, but a different implementation. Hey, you could even run your own instance of Facebook-X application.
If you grant an application access to your data, you grant it access to your data. I don’t see a way around that puzzle in either a centralized or decentralized architecture. If anything, in a decentralized architecture you have more choices. Which means you don’t have to resign yourself to Facebook’s security and privacy policies if you want to interact with the “Facebook” network. You could move to Facebook-X.
What I was describing was an architecture where pods just store data. Apps consume and present it. If I have an app, and I subscribe to X pods, there’s no reason I have to wait for the slowest pod’s response in order to construct a state that I can present users of my app.
So for something like search, or Twitter moments, you would have an application that subscribes to whatever pods it knows about. Those pods publish notifications to the app over web socket, for example whenever a user tweets. Your state is a reduction over these streams of data. Let’s say I store this in an indexed lookup like ElasticSearch. So every time a user posts a tweet, I receive a notification and add it to my instance of ElasticSearch. Now someone opens my app, maybe by going to my website. They search for X. The app queries the ElasticSearch instance. It returns the matching results. I present those results to the user’s browser.
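The flow above can be sketched end to end. In the snippet below an in-memory inverted index stands in for the ElasticSearch instance; `TweetIndex` and `on_notification` are hypothetical names, and real pod notifications would arrive over a websocket rather than as direct calls.

```python
from collections import defaultdict

class TweetIndex:
    """In-memory stand-in for the ElasticSearch instance described above."""
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of tweet ids
        self.tweets = {}

    def on_notification(self, tweet_id, text):
        # Called for each pub/sub notification a pod pushes to the app.
        self.tweets[tweet_id] = text
        for word in text.lower().split():
            self.postings[word].add(tweet_id)

    def search(self, query):
        # AND-match every query term against the postings lists.
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings[t] for t in terms))
        return [self.tweets[i] for i in sorted(ids)]

index = TweetIndex()
index.on_notification(1, "decentralized pods are fun")
index.on_notification(2, "pods support pub sub")
print(index.search("pods"))  # → ['decentralized pods are fun', 'pods support pub sub']
```

The state here is a pure reduction over the notification stream, so the app never blocks on a slow pod: it just serves whatever it has indexed so far.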
Hmm, I’m not sure if we’re on the same page. In the design I laid out, the app requests this data, not the pod.
With respect, “social media” and aggregator sites are red herrings here. They can’t be made to protect privacy by their very nature.
I’m more thinking about, say, ecommerce or sites that aren’t about explicitly leaking your data with others.
“With respect, “social media” and aggregator sites are red herrings here. They can’t be made to protect privacy by their very nature.”
Sure they can. Starting with Facebook: they can give privacy settings per post, defaulting to things like Friends Only. They could even give different feeds for stuff like Public, Friends Only, or Friends of Friends. They can use crypto with transparent key management to protect as much of the less-public plaintext as possible. They can support E2E messaging. They can limit discovery options for some people, where they have to give you a URL or something to see their profile. Quite a few opportunities for boosting privacy in the existing models.
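The per-post visibility idea can be sketched as a filter applied when building a viewer’s feed. Everything below (the level names, the friends map, `can_see`) is hypothetical:

```python
# Hypothetical per-post visibility levels, defaulting to "friends".
PUBLIC, FRIENDS, FRIENDS_OF_FRIENDS = "public", "friends", "fof"

def can_see(post, viewer, friends_of):
    author = post["author"]
    level = post.get("visibility", FRIENDS)  # private-by-default
    if viewer == author or level == PUBLIC:
        return True
    if level == FRIENDS:
        return viewer in friends_of.get(author, set())
    # friends-of-friends: a direct friend, or a friend of one of them
    return viewer in friends_of.get(author, set()) or any(
        viewer in friends_of.get(f, set())
        for f in friends_of.get(author, set())
    )

friends_of = {"alice": {"bob"}, "bob": {"alice", "carol"}}
posts = [
    {"author": "alice", "text": "hi all", "visibility": PUBLIC},
    {"author": "alice", "text": "friends only"},  # defaults to FRIENDS
    {"author": "alice", "text": "wider circle", "visibility": FRIENDS_OF_FRIENDS},
]

feed = [p["text"] for p in posts if can_see(p, "carol", friends_of)]
print(feed)  # → ['hi all', 'wider circle']
```

Defaulting the missing `visibility` field to FRIENDS is the “private by default” choice argued for above.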
As far as link aggregators go, we have a messaging feature that could be private if it isn’t already. Emails and IPs if not in the public profile. The filters can be seen as a privacy mechanism. More to that point, though, might be things like subreddits that were only visible to specific, invited members. Like with search, even what people are looking at might be something they want to keep private. A combo of separation of user activities at runtime, HTTPS, and little to no log retention would address that. Finally, for a hypothetical, a link aggregator might also be modified to easily support document drops over an anonymity and filesharing service.
Because the most formidable businesses of late are built on the ability to access massive amounts of user data at will. Companies simply don’t know how to make huge money on the Internet without it.
We did. They’re called browser cookies.
The real problems are around an uneducated, consumption-driven populace: who can resist finding out “which Spice Girl are you most like?”, but would we be so willing to find out if it meant we got a president we wouldn’t like?
It is very hard for people to realise how unethical it is to hold someone responsible for being stupid, but we crave violence: we feel no thrill serving food, working in an office, or driving a taxi that can compare. Television and media give us this violence, an us-versus-them: Hillary versus Urine Hilarity, or The Corrupt Incumbent versus a Chance to Make America Great Again, or even Kanye versus anybody and everybody.
How can we make a decision to share our data? We can never be informed of how it will be used against us.
The GDPR does something very interesting: it says you’re not allowed to use someone’s data in a way they wouldn’t want you to.
I wish it simply said that, but it’s made somewhat complicated by a weird concept of “data”. It’s clear that things like IP addresses aren’t [by themselves] your data, and even a name like John Smith isn’t data. Software understands data, but not the kind of “data” the GDPR is talking about. Pointing to “you” and “data” makes for a fairly thick bit of regulation if you don’t want to draw a box around things and prevent sensible people from interpreting forms of “data” nobody has yet thought of.
But keep it simple: Would that person want you doing this? Can you demonstrate why you think that is and convince reasonable people?
I’m doing a fair bit of GDPR consulting at the moment, and whilst there’s a big task in understanding their business, there’s also a big task getting them to approach their compliance from that line of questioning: How does this make things better for that person? Why do they want us to do this?
We’re not curing cancer here, fine, but certainly there are degrees.
Browser cookies are something that crossed my mind after I suggested this, but my experience as a web dev makes me immediately suspicious of them as durable stores. :)
I agree with your points though.
This still doesn’t solve problems with tracking, because companies have already started to require GDPR opt-in to use their products (even when using the product doesn’t necessarily require data tracking), or to use their products without a degraded user experience.
See cloudflare, recaptcha, facebook, etc.
“You can’t use this site without Google Analytics having a K/V-auth-token”, “We will put up endless ‘find-the-road-sign’ captchas if we can’t track you”, etc.
It’s a mistake to think you can “GDPR opt-in”. You can’t.
You have to prove that the data subject wants this processing. One way to do this is to ask for their consent and make them as informed as possible about what you’re doing. But they can decide not to, and they can even decide to revoke their consent at any time until you’ve actually finished the processing and erased their data.
These cookie/consent banners are worse than worthless; a queer kind of game people like Google are playing to try to waste time of the regulators.
I’ve switched to another search engine for the time being. It’s faster, the results are pretty good, and I don’t have to keep fiddling with blocking that roadblock on Google’s properties.
I wish we had this hysteria about the recent child protection acts in the US that target prostitution specifically, making life terribly dangerous for sex workers. There was a recent Reply All podcast interviewing a researcher who examined murder rates for women in markets as Craigslist rolled out its personals section (often used for adult services) to cities. The before-and-after picture is that murders went down 17% on average (that’s all women, not just sex workers, so we’re just talking correlation, not causation):
https://www.gimletmedia.com/reply-all/119-no-more-safe-harbor#episode-player
The GDPR, just like the child sex trafficking protection laws in the US, will be a real-time case study, and it will be interesting to see the effect over the next few years.
I think American devs are afraid of the GDPR because we’ve seen how laws like this can backfire. Specifically, the GDPR probably couldn’t pass in the US simply due to freedom of speech (which is why we can’t have a protected sex offender list like Australia does, or real criminal record expungement).
I like the idea of the GDPR, but I hope it doesn’t turn into a tool for censorship (like the Right to be Forgotten laws, which the EFF opposes).
I think people should do their best to comply and some of the projects that have closed are being hysterical, but at the same time, people don’t really know what will and won’t be acceptable until we see actual enforcement and what that will involve.
Sorry but why couldn’t the US have the GDPR and why is freedom of speech relevant?
The theory is that the government isn’t allowed to interfere with people speaking.
It’s of course not true: The US has Libel laws and can obviously choose to recognise protections for certain kinds of speech (e.g. secret/clearance documents, etc).
I see this as: Devs are afraid because they have to comply to something (annoying) that they didn’t have before.
I don’t see the link between GDPR and freedom of speech. GDPR is about user data retention. Freedom of speech is pretty key in most countries in western Europe, and I don’t think they plan anything to sabotage it.
Again, GDPR is about user data retention. You could probably use that to censor a company in some way, but that would be pretty hard to prove and the company censored would first have to be audited for that matter.
I think you might be mixing up GDPR with something else.
That’s wrong. They did need to keep my data safe from being hacked off their servers, there was simply little-to-no threat of law.
GDPR is not about data retention. There’s no minimum or maximum time that you have to retain data.
I’m doing a fair amount of GDPR consulting at the moment, and this isn’t the strangest theory I’ve heard about the regulations.
There’s a big chunk about keeping data safe. If you have personal data, you have a responsibility to keep yourself from being hacked. That means using best practices for minimising risk like encryption and deleting it when you don’t need it anymore, and understanding who in your company can access the data (and when they do it).
There’s also a big chunk about making sure if you use personal data, you’re only using it in a way that the subject would approve of. This really means being able to demonstrate (perhaps to a regulator) why you think you have their consent to use this data. Records and contracts can help, but the subject can also back out at any time and needs to have controls to do so.
You cannot. If you believe a company is using your data inappropriately, you report them to a regulator. You do not get to “prove your case” and you won’t be asked to show up in court. The regulator will assess the situation and prioritise it based on the claim and risk for further damage. The regulator will talk to that company and find out what is going on and correct the issue.
If that non-compliance is egregious and wilful, then the regulator has a pretty big stick, but this is far removed from “censoring a company” in any possible interpretation of the term.
No they didn’t. If it’s for-profit and there are no laws stopping it, then keeping your data in a barely-secure form is legal and maybe even beneficial for the organization. Most organizations that have data breaches take a financial hit before going back to normal. Strong investments in security cost money every year. Managers might also believe they reduce productivity if applied everywhere. Managers apathetic to security and wanting more ways to make money will see your data as an asset whose leaking barely concerns them.
So, capitalist companies operating under their theory of morality in a system with no liability for data-related externalities should continue to collect on you ignoring as many risks as they can get away with. That’s what most were doing before regulation forced them to care a little more. Also, why I support such regulations.
Thank you very much for correcting my false ideas.
I’m not a native speaker, but to me retention means the act of holding the data, so indeed, holding it securely. In addition to that, I particularly meant the “Right to erasure” and “Right of access”; I’m more familiar with the side of friends having to deal with the documentation process (to actually have documented why you can hold this data). But I’m by no means an expert on the subject.
By censoring I was thinking that, since the proof that you need to hold data might be pretty subjective, the regulator could probably damage a company whose business is holding that data, but I agree that it’s very extrapolated.
No problem.
If you (a business) actually need to hold data on a subject, then indeed the “proof” is quite subjective. You have to feel comfortable you can convince regulators that your processing is a part of you providing a service for that subject, and that they would expect you to use their data in this way. Simple examples might be keeping someone’s address in order to ship them goods that they ordered.
If you are an individual and you want to compel a company to remove/erase data they have on you, understand that they can ignore such a request with regards to things like the address they used to ship goods (among other reasons).
If you are an individual and you want to ask a company to provide data they have on you, it should be easy to do so with regards to things like the address they used to ship goods to you. They’re under no obligation (however) to discover who you are – that is, if you send them an IP address they’re not required to link any information or activity they have on that IP address to you.
It would obviously depend on the details, but it’s not inconsistent with the US’s view of free speech to regulate various kinds of commercial record-keeping and enforce privacy and access protections on those records. For example, healthcare data is fairly strongly regulated in the US, and this hasn’t been found to be a constitutional problem. (The “right to be forgotten” laws are a different story.)
A health company (insurance, hospital, whoever) is bound by HIPAA. A school is bound by FERPA. They can’t divulge information. But if someone leaks someone else’s medical records and a newspaper publishes them, that information is protected in the paper. Now, wherever the leak happened, that’s a problem if it was someone covered under HIPAA.
Criminal records can’t be expunged in the US. Not really. While your record was public, some other company scooped that data up and can sell it forever even if your official record is clear. Maybe we’ll have laws that will force companies to ignore those styles of background checks (some states probably do).
Actually this is a good question, how does the GDPR affect collecting data about people who aren’t your customers or who ever visit your website or store front? Does it say anything about collecting public data?
There are many companies offering to help with GDPR for 6 figure amounts. Cost of compliance is in the millions for many larger companies. (this author clearly doesn’t understand the true cost of things)
So far there are no real privacy benefits for me as a user. I don’t care about people tracking my IPs personally, or running analytics, retargeting, or doing split testing. I care about people losing access to my passwords, social, credit cards, messages, pictures, location data etc. I haven’t seen much improvement in that area. End result so far seems to be more checkboxes and the ability to delete my user account. #awesome
It’s too early to tell if anything good will come out of GDPR. Fingers crossed though, there are real privacy issues to solve and I hope it helps with that.
All the GDPR stuff is full of “business organization company …” What about individuals operating non-commercial websites? Specifically outside the EU?
Let’s say I operate a forum in Russia, some EU citizens come and save their home addresses into their profiles on the forum and then want me to erase all the things. I reply with “sorry, your data is stored in 10 blockchains, it’s immutable”. They complain to their local data protection thingy. What happens next?
I assume this is about IPv6, because legacy IP addresses don’t have any portions (they used to).
Any idea how GDPR would relate to P2P software?
Say I wrote some social network that runs as a P2P application. As I want the network to take off, I also host my own nodes. Users, by using the software, will broadcast all kinds of data into the network, which is then stored, cached, and redistributed by the nodes. It will be impossible to give users a “delete all my data” flip; nodes may be hosted by anyone, anywhere, anonymously. I could potentially delete user data from my nodes on demand, but even that may be difficult to arrange for in practice. If an encrypted hash->value store is used, it might actually require me to collect more data in order to identify the rightful “owner” of a given object. But collecting such data about users may be impossible, since much of it depends on what client they use and what nodes they connect through, how they authenticate, etc.
The XMPP community is asking these questions right now, and it’s interesting how this would affect distributed systems like Mastodon.
If, by any chance, you happen to have links to relevant discussion, I’d be so grateful. :-)
I think this actually is possible to handle. If a third party gives you a statement about GDPR-compliance then that’s possible.
Also note that part of this actually isn’t a new legal problem in most European countries, even though such topics come up again with the GDPR. Informing people about collected data, and in many countries the right to correction and deletion of data, has existed for decades. That’s also why the EU adopted the Data Protection Directive in 1995, which, while not binding on its own, was in large part implemented by various European countries.
In theory, if people’s fear is greater than the true cost, it seems like there’s an opportunity for people to offer recommendations and then accept the legal risk in exchange for a fee. Since the risk here is posited to be tiny, you could charge them a tiny amount and take all the risk on yourself and make money. Then we can see the true cost of implementation. For instance, I have a blog I haven’t written anything on since college that has comments on it. If the cost to indemnify is $2.5k, I might as well just black hole EU traffic before it hits the site. If the cost to recommend a bunch of Wordpress plugins and then accept indemnification is $25, I might pay it and install the plugins.
Since this blogpost is clearly in reaction to announcements of the sort that monal.im recently made I think it is reasonable to demand something simple from proponents of the law. Here’s an offer: $25 for initial audit, and $5/yr afterwards for indemnification so long as I comply with remediation suggestions. I’ll retain the right to publish the remediation suggestions. What’s your offer?
(Personally, I’m not worried about GDPR for my blog, which is why I value these services so low, but I think the lack of existence of these services is evidence that those arguing that these reactions are hysteria are incorrectly pricing the cost of operating under the increased regulation)
I can’t audit a company for $25. That’s crazy. A big part of GDPR consulting is getting companies to document what data they collect, what they do with it, how long they keep it, and why they think they’re keeping it safe. This takes days, sometimes months, to tease out. If you can write it on a napkin, I could just about read that napkin for $25 and tell you if it’s compliant or not, but really? I’d tell you for a beer.
If you’ve got a Wordpress site, and you respond to reasonable email/post on how to (a) delete themselves (if they’re a commenting user) and/or (b) their comments, as well as (c) get a list of all their comments and log-in attempts, then you’re probably fine.
If you’ve got a Wordpress site without any comments/users, then you’re probably fine.
If you’ve got a Wordpress site without any comments/users, but you use some plugin from some third-party site to do comments; etc, then you might need to talk to that third-party. Maybe.
Chasing down that third party is where most people blow their budget.
Well, then whatever you’d charge is the true cost of compliance, isn’t it? OP’s big argument is that people are hysterical about GDPR: i.e. that the feared costs are much higher than the actual costs, and that costs after accounting for risk are not high. If OP is right, then OP could become fabulously wealthy by arbitraging between these two by selling indemnification insurance.
OP is not doing that, and as a matter of fact, there’s no one doing that at a rate these people would use. This means that either people aren’t hysterical or OP has underestimated costs after accounting for risk.
Essentially, to OP I’m saying “if what you’re saying is true, you have a direct path to incredible wealth. That you aren’t taking it makes it apparent to me that what you’re saying isn’t true”. An audit is worthless without the indemnification.
Not really.
It’s more like buying insurance against bad PR: I’m sure you can find someone to sell it to you, but it’s much cheaper to simply not be an asshole.
That’s not true. It’s like buying insurance against bad PR that requires that you’ll behave in some fashion and assuming the fallout of the PR is measurable.
If cost to make this happen is low but I think it’s high and I’m willing to pay some number past the cost to make it happen, then you can get rich. The article is arguing that cost to make this happen is low and I think it’s high. The only wiggle room it has is whether I’ll pay some number past the cost.
So far no one has offered the service while taking on the responsibility of having done a good job. Not one person.
There is no shortage of people saying it’s easy and low risk and a huge shortage of people who’ll do it. Chances are it’s what you’re saying: it’s months of work. It’s expensive, and people are pricing accordingly when they exclude the EU.
Yes it is true.
If the cost of the audit and assurances from the regulators are lower than this indemnification insurance (which I can only offer if I audit and speak to the regulators), then nobody will offer the indemnification you’re looking for except snake-oil salesmen.
Same thing for PR: once you’ve accurately evaluated your specific risk, there’s no point in buying insurance. Just fix the problems.
Here’s another way to look at it: people buy insurance when they can’t evaluate the risk, and actuaries are able to take the mean projected risk to price that insurance. If the variance on that risk is low, then it is mass-market insurance. If it is high but still lower than the market mean, then you can buy revenue protection (like futures). If the variance exceeds the market mean, however, you’re not selling the work of an actuary; you’re making bets, not selling insurance. Market segmentation for insurance purposes is our audit in this case, and since I can accurately determine compliance/risk, it’s much better for the company to just fix these issues.
Here’s yet another way to look at it: the risk of your being in a plane crash isn’t 1:11million if you don’t fly. A sucker is someone who buys plane crash insurance — even at those rates — when they don’t fly.
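The plane-crash point is just expected value: a premium is only worth paying when it is below your expected loss. A toy calculation, with all numbers hypothetical:

```python
# Toy expected-cost comparison. All numbers here are made up for
# illustration; none come from the GDPR or any real insurance market.
def expected_loss(probability, loss):
    return probability * loss

fine = 20_000        # hypothetical worst-case penalty
p_flyer = 1 / 1_000  # risk for a site that actually processes EU data
p_nonflyer = 0.0     # the "doesn't fly" case: no EU data, no exposure
premium = 25

print(expected_loss(p_flyer, fine))     # → 20.0: the premium is roughly fair
print(expected_loss(p_nonflyer, fine))  # → 0.0: any premium is a sucker's bet
```

The asymmetry is the whole argument: for the non-exposed site, no premium, however small, beats an expected loss of zero.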
Also: the point of the article is that the GDPR is not so complex, and that few companies actually need an audit. Not that “risk is low”.