Author here. I’d figured they were big, but I had no idea they were that big until I did a www-wide TLS scan. The dataset is freely available on S3 if you wanna analyze it yourself! I’m here for questions.
They give away, for free, products that others charge for. Products that Google penalizes you for not having. Naturally, many people who needed or wanted such a product used the free alternative.
I still don’t know if that, versus a version with more paying customers, is a good thing in the long run. Good for now, though.
The old situation with SSL certs was classic artificial scarcity. They successfully convinced people that having a human in the issuance pipeline offers better security and charged people for it.
I don’t know if the scandals with StartSSL et al. issuing backdated and outright fake certs, and the experiments with EV certs issued for a “Stripe inc.” registered in a different state, reached the mass consciousness, but it seems there’s now tacit acceptance of the fact that “the cert belongs to the person who controls the domain” is all the authenticity you can achieve without checking the fingerprints yourself. Naturally that took business away from people charging $100/yr for incredibly cursory authenticity checks.
See also https://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_expression for more details on the closed form solution.
See also: https://en.wikipedia.org/wiki/Minifloat
Related: hoops that one jumps through when training using fp16 accelerated in hardware: https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#training
You might be interested in something like http://colorbrewer2.org — multi-hue, single-hue, with awareness of reception printed / photocopied / seen by various eyes
We need a better classification and naming for these SgxSpectre/MeltdownPrime/SpectrePrime/BranchScope/Meltdown/Spectre-like branch prediction vulnerabilities as using arbitrary names is a mess. What was wrong with simply referring to these vulnerabilities by their CVE numbers? Do we really need a logo and marketing glossy for each new variant? I blame the Heartbleed people.
Six months from now, you’ll remember the difference between CVE-2017-2345 and CVE-2017-2435?
What about naming them from a rotating list of names, à la tropical storms? Particularly devastating ones could have their names retired. That or using some other arbitrary naming scheme that could complement the CVE numbers.
We’re at number CVE-2018-9105 for this year, and it’s not even the end of March.
How about a combination system, something similar to a set of DiceWare-style wordlists?
You’d likely need at least 5 lists to make enough phrases, and new wordlists could be chosen each year - maybe by some ridiculous contest system that could be used to promote computer security. A [proper-name’s adjective noun adjective noun] system might be fun; the first thing I randomly generated with such a system was:
“Merlin’s Automatic Priesthood Relevant Chat” and MAPRC for short.
When you discover a vulnerability and it’s assigned a CVE number, you’ll be given the opportunity to choose a name from a set of random rolls, and then, for major vulnerabilities, the words used will be retired, and the system can exclude future rolls of the same acronym.
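The rolling scheme above is easy to sketch. A minimal toy in Python (the wordlists here are made-up placeholders; real ones would be DiceWare-sized, ~7776 words each):

```python
import secrets

# Hypothetical wordlists, one per slot in the
# [proper-name's adjective noun adjective noun] pattern.
PROPER = ["Merlin", "Ada", "Grace", "Linus"]
ADJ1 = ["Automatic", "Silent", "Rusty", "Spectral"]
NOUN1 = ["Priesthood", "Teapot", "Compiler", "Lantern"]
ADJ2 = ["Relevant", "Hidden", "Crimson", "Polite"]
NOUN2 = ["Chat", "Parade", "Ledger", "Whistle"]

def roll_name() -> str:
    """Generate one random vulnerability name from the five lists."""
    words = [secrets.choice(wl) for wl in (PROPER, ADJ1, NOUN1, ADJ2, NOUN2)]
    words[0] += "'s"
    return " ".join(words)

def acronym(name: str) -> str:
    """First letter of each word, e.g. the short form MAPRC."""
    return "".join(w[0] for w in name.split()).upper()
```

Retiring a devastating name would then just mean removing its words (or excluding its acronym) from future rolls.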
Interesting note: this might be impossible because of politics. I’m in Florida, and can tell you first-hand the ridiculousness that goes into naming. After protests by women and allegations of sexism, they started adding men’s and women’s names to the lists. The system has to be continually tweaked by an international committee at the World Meteorological Organization.

In recent times, after some very destructive hurricanes (Katrina, etc.), some women have protested that attributing destructive and deadly forces to women is inappropriate, and that such ‘negative’ violence and destruction is the realm of men.

Then, in regards to the names themselves, they now include popular French, Spanish, Dutch and English names because they don’t want to use names from only one culture or appear exclusionary, but now some other people are complaining that they don’t want “their” culture or country associated with these terrible things. I believe the list was determined by allowing for names and cultures from anywhere the storms hit.
Anyway, this is why we can’t have nice things.
Probably not, but I’m not going to remember the difference between all these names either! Perhaps something similar to the CARO/virus-scanner industry solution: a somewhat descriptive, roughly hierarchical name that’s hopefully self-describing… though even that breaks down in practice, with different scanners identifying the same threat very differently.
Working on polishing up my new backup tool. I set out to solve a set of problems:
Approaching something I’m happy with people reviewing and using, though there is lots of testing and stabilization that I want to do.
This sounds like the tool I’ve been looking for, plus some features I didn’t know I wanted :-D
Sounds interesting! Did you, by chance, try borgbackup, and could you elaborate on the differences between borg and yours? I am not sure about your last point, but at least the others all seem to be supported by it, as far as I understand.
I was unsatisfied with borg for a few reasons, which I will probably elaborate on in a post somewhere. In general I am highlighting ease of use, and I think I have a more user-friendly design. I will see if anyone agrees with me once I get it out there.
Write only mode
Yes!! Thank you!
I have been so jealous of Borg users for so long, but can’t switch because only Duplicity has this feature.
Isn’t borg serve --append-only what we are talking about here?
No. Borg only supports symmetric encryption, and closed the public key encryption issue as wontfix: https://github.com/borgbackup/borg/issues/672
By implementing public key encryption, you allow data sources to operate in what @ac calls “write only mode”, because if a compromised device only has your public key, it cannot compromise your backups (there is also the issue of data destruction by overwriting, but even raw S3 can be used as an append only store if you enable object versioning).
My use case is installing backup software liberally on every device I use (and I use more devices than I have sole control over). For example, with Borg, you could not back up your home directory on a shared server without giving the administrator of that system the ability to decrypt your entire repository.
My implementation is currently not exactly as you described, but perhaps I can accommodate this with not too much difficulty.
edit: I am sitting in a cafe thinking carefully about how to do it without affecting usability for less advanced users right now.
Good points, thanks for the explanation!
If you trust the server not to leak data, the next best approach is to have a symmetric key per device and then use ssh access controls to prevent access.
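For the borg case specifically, that kind of ssh access control can live in the server’s authorized_keys file. A sketch (the path and key material are placeholders): OpenSSH’s `restrict` option strips forwarding and pty allocation, and `command=` pins the connection to an append-only `borg serve`:

```
# ~/.ssh/authorized_keys on the backup host (placeholder key, not real)
command="borg serve --append-only --restrict-to-path /backups/laptop",restrict ssh-ed25519 AAAAC3Nz...example backup@laptop
```

The connecting device can then write new archives but cannot prune or overwrite old ones, regardless of what the borg client asks for.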
If you trust the server just use TLS or SSH tunnels to encrypt in motion. If that’s really your threat model there is no need for additional complexity.
For example, with Borg, you could not back up your home directory on a shared server without giving the administrator of that system the ability to decrypt your entire repository.
You have to back up to a different machine with a different administrator. It is true the first admin can decrypt your data, but he cannot fetch it, because the ssh key can be granted write-only access, even with borg via append-only mode. A random key that is encrypted with a public key and then discarded by the client is probably better, though; I’m still thinking about how to do that well.
Looking forward to testing this! I’ve been struggling a lot with current backup solutions!
Awesome! Can’t wait to hear more about this.
I’d really like to know about how you tackle the intersection of client-side encryption and de-duplication.
It’s relatively straightforward using a rolling hash function (https://en.wikipedia.org/wiki/Rolling_hash). The ‘shape’ or ‘fingerprint’ of the data guides you in finding split points, and each chunk is encrypted independently. There is a potential that chunk sizes may give some clues about contents, but there are a few mitigations you can apply, such as random padding, keeping your hash function secret, and a few others.
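The split-point idea can be sketched in a few lines. This is a toy content-defined chunker, not the author’s actual implementation: a polynomial rolling hash slides over the data, and a boundary is declared whenever the low bits of the hash are zero, so split points follow the data’s shape rather than fixed offsets:

```python
# Toy content-defined chunking with a polynomial rolling hash.
WINDOW = 48                 # sliding window size in bytes
MASK = (1 << 12) - 1        # 12 zero bits => ~4 KiB average chunks
BASE, MOD = 257, (1 << 61) - 1

def chunks(data: bytes, min_size=2048, max_size=65536):
    """Yield chunks whose boundaries are chosen by the data itself."""
    h, start = 0, 0
    power = pow(BASE, WINDOW - 1, MOD)  # coefficient of the outgoing byte
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Drop the byte leaving the window, then shift in the new one.
            h = (h - data[i - WINDOW] * power) % MOD
        h = (h * BASE + b) % MOD
        size = i - start + 1
        if (size >= min_size and (h & MASK) == 0) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]   # final partial chunk
```

Because boundaries depend on local content, inserting a byte near the start of a file only perturbs nearby chunks; the rest keep their boundaries and dedup against the previous backup.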
Another sticking point is allowing the server to do garbage collection of chunks that are no longer needed while at the same time not being able to read the user data. I came up with a solution I hope to get reviewed around layering trust.
I know about splitting a file into chunks, but how do you derive a repeatable IV/key for a given chunk without leaking the contents of it, or opening yourself up to some form of chosen-plaintext attack?
I use a random IV, and random encryption key, but the content address (i.e. dedup key) generated is repeatable by the client as HMAC(DATA, CLIENT_SECRET). AFAIK the attacker cannot recover the secret or decryption key even if he has a chosen plaintext, and has no way to derive the data without the secret. An attacker also cannot forge chunks because the HMAC address will be obviously wrong to a client.
There is also a write cache that prevents the same data from being uploaded twice with the same content address but different IV. Though that is more a performance thing than security, I could be wrong. I hope people can shoot down any flaws in my design which is why I need to get it finalized a bit.
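The content-address scheme described above is simple enough to sketch with the standard library. This is my reading of the description, not the actual implementation; `CLIENT_SECRET` is a stand-in for whatever key-derivation the real tool uses:

```python
import hmac
import hashlib
import secrets

# Known only to the client; the server never sees it.
CLIENT_SECRET = secrets.token_bytes(32)

def content_address(chunk: bytes) -> str:
    """Keyed hash of the chunk: a repeatable dedup key for the client,
    but opaque to anyone without the secret."""
    return hmac.new(CLIENT_SECRET, chunk, hashlib.sha256).hexdigest()

def verify(chunk: bytes, address: str) -> bool:
    """A restoring client can detect forged or corrupted chunks,
    because the recomputed HMAC won't match the stored address."""
    return hmac.compare_digest(content_address(chunk), address)
```

The same plaintext always maps to the same address (enabling dedup), while an attacker who sees only addresses learns nothing about the content and cannot substitute chunks undetected.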
That sounds like a really useful tool! I saw a reference to convergent encryption just today/yesterday, which is what “deduplication of similar backups to save space” sounded like. https://en.wikipedia.org/wiki/Convergent_encryption suggests there are fundamental security implications to using it, btw. Deduping sounds pretty orthogonal to the rest of what the tool does, and I’d be excited to see a Show and Tell post :)
Yeah, I rejected that specific approach for the reasons described. My keys are random, but some of the ideas are similar from a distance.
This sounds similar to something I’ve had a couple of stabs at (one such stab “currently”). What language are you writing in?
My approach is built around two basic “processes”.
A “collection” phase during which a series of scripts do source specific “dumps” (eg dump mysql, ldap, etc, identify user generated file system data, etc) into service backup directories
A “store” process which compares a hash of each raw file (created by the collection phase) to an existing hash (if found). If no match is made, the hash is stored and the file is gpg encrypted using the specified public key. Once this process finishes, the hash files and gpg files are stored remotely via rsync, using the --link-dest option to create “time machine style” snapshots.
The heavy lifting is obviously all “written already” in terms of shashum, gpg and rsync. The glue layer is just shell.
I’d be keen to see how our approaches differ and if we can take any ideas from each other’s solutions.
Mine is currently written in Go. A big difference: it doesn’t sound like your approach deduplicates within files or across similar-but-not-identical files; a tool such as mine could easily be hooked into your store phase to handle cross-file deduplication.
There are similar tools out there, such as ‘bup’, ‘borg’ and ‘restic’, that you should look into. I feel like mine is superior, but those all work and are ready today.
No, it doesn’t attempt any kind of de-dupe except for not storing dupes of the same file if it hasn’t changed.
That’s part of why I’m not using those other tools - I want pubkey encryption (as mentioned elsewhere here, it means eg two+ devices can share backup space without leaking data they don’t already possess to the other) and I’d prefer if, when all else fails, I/someone can restore data from a backup by just running regular shell commands.
This part can of course be built into a companion tool, but being able to do ssh backup-space -- cat backups/20180210-2300/database/users.sql | gpg | mysql prod-restore-copy is a huge bonus to me. No need for the remote end to support anything beyond basic file access, no worrying about recombining files. No worrying about whether I have the same version of the backup tool installed, and/or if the format/layout has changed.
So possibly we have not as many overlapping goals as I originally thought, but it’s always nice to hear about activity in the same space.
Yeah, tbh I wouldn’t release a backup tool without fully documenting the formats and having them re-implementable in a simple way, e.g. as a Python script. You need a complexity cap to protect you from yourself. I agree using public/private key pairs is a good idea.
Your system seems decent, though you don’t really have access controls protecting the machine from deleting its own backups (perhaps a worm that spreads via ssh). Do you deal with backup rotation?
So, the original version of this was built to store on an internally controlled file server, and the “store” process finished by touching a trigger file, which (via inotify) caused a daemon with root perms to run on the storage host and remove write access to the last backup from the ssh user rsync connected as.
The same daemon also handled pruning of old backups.
The new version is designed to work with offsite storage like rsync.net/similar so for now it relies on remote end functionality to protect previous versions (eg zfs snapshots).
The map: https://broadbandmap.fcc.gov/
Link to poll: https://lobste.rs/s/kuzz8x/unicode_emoji_11_0_characters_now_final#c_7qa5od
Christopher Alexander is mentioned in the text as being against the blobitecture and pro gardens-and-ponds; this is the Christopher Alexander who created a pattern language for architecture, which the Gang of Four riffed off of in the 1990s to make a pattern language for C++ (design patterns).
Previous discussion: https://lobste.rs/s/tcjjdx/origins_pattern_theory_by_christopher
Seems like a great optimization to add to the query planner based on table statistics!
Yeah this is genuinely embarrassing. I feel embarrassed for them
Not to derail about presentation, but does anyone else trying to view this get redirected to https and then get a self-signed cert valid for one day?
Yes, everyone. It’s working as intended by the author.
Yes. Flag and move on.
Reprising and reformatting something I wrote on that other site about this:
The problem with JWT/JOSE is that it’s too complicated for what it does. It’s a meta-standard capturing basically all of cryptography which wasn’t written by or with cryptographers. Crypto vulnerabilities usually occur in the joinery of a protocol. JWT was written to maximize the amount of joinery.
Negotiation: Good modern crypto constructions don’t do complicated negotiation or algorithm selection. Look at Trevor Perrin’s Noise protocol, which is the transport for Signal. Noise is instantiated statically with specific algorithms. If you’re talking to a Chapoly Noise implementation, you cannot with a header convince it to switch to AES-GCM, let alone “alg:none”. The ability to negotiate different ciphers dynamically is an own-goal. The ability to negotiate to no crypto, or (almost worse) to inferior crypto, is disqualifying.
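As a toy illustration of why attacker-controlled algorithm selection is disqualifying, here is a standard-library-only sketch (not taken from any particular JWT library) of a forged “alg: none” token fooling a verifier that honors the token’s own header:

```python
import base64
import json

def b64url(raw: bytes) -> bytes:
    return base64.urlsafe_b64encode(raw).rstrip(b"=")

def b64url_decode(raw: bytes) -> bytes:
    return base64.urlsafe_b64decode(raw + b"=" * (-len(raw) % 4))

# The attacker forges a token with no signature at all.
header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "admin"}).encode())
forged = header + b"." + payload + b"."

def naive_verify(token: bytes) -> dict:
    """Broken by design: trusts the alg field inside the token itself."""
    h, p, _sig = token.split(b".")
    if json.loads(b64url_decode(h))["alg"] == "none":
        # Attacker-controlled branch: signature checking is skipped.
        return json.loads(b64url_decode(p))
    raise ValueError("unsupported alg")

claims = naive_verify(forged)  # accepts the forgery: {"sub": "admin"}
```

A statically instantiated protocol has no such branch to steer: the verifier knows exactly one algorithm, and a token claiming anything else is rejected before any crypto runs.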
Defaults: A good security protocol has good defaults. But JWT doesn’t even get non-replayability right; it’s implicit, and there’s more than one way to do it.
Inband Signaling: Application data is mixed with metadata (any attribute not in the JOSE header is in the same namespace as the application’s data). Anything that can possibly go wrong, JWT wants to make sure will go wrong.
Complexity: It’s 2017 and they still managed to drag all of X.509 into the thing, and they indirect through URLs. Some day some serverside library will implement JWK URL indirection, and we’ll have managed to reconstitute an old inexplicably bad XML attack.
Needless Public Key: For that matter, something crypto people understand that I don’t think the JWT people do: public key crypto isn’t better than symmetric key crypto. It’s certainly not a good default: if you don’t absolutely need public key constructions, you shouldn’t use them. They’re multiplicatively more complex and dangerous than symmetric key constructions. But just in this thread someone pointed out a library — auth0’s — that apparently defaults to public key JWT. That’s because JWT practically begs you to find an excuse to use public key crypto.
These words occur in a JWT tutorial (I think, but am not sure, it’s auth0’s):
“For this reason encrypted JWTs are sometimes nested: an encrypted JWT serves as the container for a signed JWT. This way you get the benefits of both.”
There are implementations that default to compressing plaintext before encrypting.
There’s a reason crypto people table flip instead of writing detailed critiques of this protocol. It’s a bad protocol. You look at this and think, for what? To avoid the effort of encrypting a JSON blob with libsodium and base64ing the output? Burn it with fire.
I have a related but somewhat OT question. In one of the articles linked to by the article, they say this:
32 bytes of entropy from /dev/urandom hashed with sha256 is sufficient for generating session identifiers.
What purpose does the hash serve here besides transforming the original random number into a different random number? Surely the only reason to use hashing in session ID generation is if there’s no good RNG available in which case one might do something like hash(IP, username, user_agent, server_secret) to generate a unique token? (And in the presence of server-side session storage there’d be no point to including the secret in the hash because its presence in the session table would prove its validity.)
Yeah, if urandom is actually good, then hashing it serves no real purpose. (In fact if you want to get mathematical, it can only decrease the randomness, but luckily by an absolutely negligible amount). Certain kinds of less-than-great randomness can be improved by hashing (as a form of whitening), but no good urandom deserves to be treated that way.
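In Python, for instance, the no-hashing-needed version is a one-liner; `secrets` reads straight from the OS CSPRNG:

```python
import secrets

# 32 bytes (256 bits) of OS entropy, hex-encoded; no extra hashing step.
session_id = secrets.token_hex(32)
```

If the OS RNG is good, this is already uniform; piping it through SHA-256 would just be a no-op dressed up as security.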
The reason for that is PHP is weird. PHP hashes session entropy with MD5 by default. Setting it to SHA256 just minimizes the entropy reduction by this step. There is no “don’t hash, just use urandom” configuration directive possible (unless you’re rolling your own session management code, in which case, please just use random_bytes()).
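For reference, on pre-7.1 PHP the relevant php.ini directives look something like this (as noted above, `sha256` here only limits the damage of the mandatory hashing step, it doesn’t remove it):

```
; php.ini, PHP < 7.1: session IDs are always hashed, so the best you
; can do is pick a wide hash and feed it plenty of OS entropy.
session.entropy_file = /dev/urandom
session.entropy_length = 32
session.hash_function = sha256
```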
This is no longer the case in PHP 7.1.0, but that blog post is nearly two years old.
Thanks for that very thorough dissection of JWT. Are there web app frameworks/stacks that do have helpfully secure and well-engineered defaults that you’d recommend?
The post itself offers a suggestion (at the bottom): use libsodium.
The author refers to Fernet as a JWT alternative. https://github.com/fernet/spec/blob/master/Spec.md
However, Fernet is not nearly as comprehensive as JOSE and does not appear to be a suitable alternative.
Hah, it seems the article changed a few times, and not just the title…
And comments on https://datatracker.ietf.org/wg/cose/documents/ ?
What surprises me most about this is that the author is surprised about commodity hardware failing. I remember hardware breaking all the time when I was younger, quite a large number of rotating media failures (cd-r, floppy, magnetic disk) and a few motherboard failures from old workhorse computers, and came to take for granted that consumer electronics could fail.
I don’t keep any consumer electronics for as long as I did when I was younger, so perhaps the nature of the upgrade cycle insulates one from such infelicities.
It does sort of point out a distinction between building out your own server and renting cloud time. When building out a single server people often spend (lots of) extra money on “reliable” components, with the view that maybe it will help uptime. Whereas if you have 100000 servers in a data center you’re just going to expect a few to fail every hour no matter how “reliable” the components, so why spend the extra cash on something you’ll just replace anyway?
I wonder just what ESR intends to do after he converts all the CVS repos he can find. Is he going to force upload them and make the project use git?
Sure, I mean the only reason OpenBSD is still using CVS is because we couldn’t find anyone with enough RAM to convert our tree to git.
Wait, really? If http://www.openbsd.org/cvsync.html is accurate, then a machine with 100x the memory of the largest repo is under three bucks an hour on EC2 (r3.8xl, $2.80/hr). What sort of a conversion process is it?
No, not really.
Interesting, but the Java example is a caricature, and no functional equivalent example is provided for comparison.
If you’re curious about what people are up to in the FP world, you should learn Haskell. I used to be a Clojure and Common Lisp user; you’re not going to get exposed to the full breadth and depth of how far the field has come outside of Haskell.
Yes, I know that Haskell is probably the best candidate to learn FP nowadays. Thanks for the advice.
Can you delve deeper into learning Haskell vs. Clojure? I’m currently learning Clojure (http://www.braveclojure.com/) but would be willing to switch to Haskell given a good reason.
I do Clojure for a living. Don’t bother with Clojure - go straight to Haskell.
Your blurb at the end of this article helped a lot. Thanks :).
Here’s a question about that, though. (I learned Haskell about a decade ago and have dabbled since, and it’s really cool. I just picked up Clojure because Ruby’s DSL for Storm was a flustercluck and have been writing topologies in Storm using Clojure for a couple months.)
Are you recommending not learning Clojure for the educational experience of Haskell, or the practical?
If the latter, what about all the aspects of writing in a programming language that aren’t code?
Clojure doesn’t feel very mind-expanding to me; it just feels like Java with lambdas and parentheses and a much harder work ethic. Whereas I feel like I learn a lot writing Haskell.
But if I need to use any of a thousand Java libraries I can, and leiningen makes it super easy to import any library in Maven. And the usage of the more popular libraries means that other people may have run into similar bugs or limitations, which makes writing code that much easier. Also, the interop with Java (from pulling in maven packages to the FFI to and from Java) feels really seamless, in a way that Haskell’s FFI interop with arbitrary C code somewhere has never seemed.
I’d be interested to hear your thoughts
Those are part of the reasons I favor Haskell. The code isn’t the only reason that it’s delightful.
But if I need to use any of a thousand Java libraries I can, and leiningen makes it super easy to import any library in Maven.
You don’t need a thousand Java libraries. It’s highly unlikely you’re boiling the ocean. I’ve heard this stuff 40 times before. If you have a specific dependency you’re worried about not existing, say so. Otherwise, you’re subjecting yourself to FUD. :)
Also, the interop with Java (from pulling in maven packages to the FFI to and from Java) feels really seamless, in a way that Haskell’s FFI interop with arbitrary C code somewhere has never seemed.
C FFI in Haskell has seemed fine to me, but I guess if you’re not used to C that would do it.
I don’t consider FFI to Java a plus because I don’t need or want Java. It doesn’t do anything I couldn’t do better in Haskell. Whereas, C lets me do things I wouldn’t otherwise be able to do. That said, I never actually need the FFI. Pure Haskell is all I’ve needed.
I made a distributed k-ordered unique id service named Blacktip quite fast (faster than the original Flake by ~2x) just with simple, pure Haskell. If I wanted to push things a bit I might make some C wrappers for lower level APIs, but it’d be pretty minimal.
If the last time you poked at Haskell was a decade ago and all you did was dabble, you need to learn Haskell.
I’ve done a lot of Clojure and have moved over all my work to Haskell post-haste.
Here’s my guide for learning Haskell: https://github.com/bitemyapp/learnhaskell
Just learn both; Clojure is a pretty cool, fun and useful language; certainly worth learning! Haskell has some more in-depth type theory stuff; but well over half the fun of that requires you to care a fair bit about type theory. Which is fun; but is maybe not the whole point of programming :)
(I quite like both languages, and would be very happy to know them both well.)
Haskell has some more in-depth type theory stuff; but well over half the fun of that requires you to care a fair bit about type theory.
What? No. I don’t know crap about type-theory and I’m way happier using Haskell.
You are encouraging people to impoverish themselves. :\
I honestly don’t understand how your response is related to what I said …; did you read what I wrote?
I got the feeling you were implying Haskell would be less fun than Clojure if you didn’t care deeply about type theory.
I guess I was saying that; but I was trying to say “Half the fun of haskell is caring a lot about types and type theory, and that’s not for everyone, and maybe isn’t even the whole point of programming”.
That’s good to know. :)
I work at a (mostly) Rails startup but I’ve been dipping my toes into a few other languages that are faster so I can bring something valuable to the table. Elixir, Clojure, and Go are languages I’ve had my eyes on but Haskell has an advantage over the other functional languages in that it compiles to machine code.
They all have advantages and disadvantages. Do whatever seems the most fun and relevant for you :)
I’ve heard that MQTT is a nice protocol, but the article here doesn’t entertain any alternatives to MQTT, just a passing mention. Anyone with more experience know why to use MQTT versus other messaging protocols?
Hi, I’m the author. The reason I didn’t spend much time on the alternatives was (1) I was running out of space (lame, I know, but it’s a reality that authors have to keep in mind) and (2) MQTT is the most promising. My intention was to get devs thinking outside the box of “oh, I’ll just use HTTP” or “oh I’ll just open a TCP socket” and underline the problems with those approaches.
Many people pointed out that CoAP is an alternative. I plan to write another blog post soon that centers on CoAP. It’s an excellent M2M protocol, but it’s more of a lightweight alternative to HTTP, since its architecture is RESTful rather than publish/subscribe like MQTT. Also, architectures that use CoAP are typically inverted from what you normally think of with HTTP. All the other pub/sub options that people are using for M2M/IoT aren’t as well suited for the job as MQTT.
STOMP is immature and rather bulky. AMQP and XMPP are too heavy on the client side. Kafka scales well, but is also very heavy on the client side. Then there’s always the option of opening up a raw socket and writing to it; hopefully I’ve talked you down from that one, though ;) MQTT fits the domain well because it’s very resource-conscious and lets you ignore pesky problems like network disconnects.
You mention in your comments that XMPP “uses HTTP.” This is incorrect. Yes, we do have extensions for long-polling HTTP, but that is not the primary interface. Also, I would expect to see JSON transforms before too long.
Thanks, that’s informative. I plan to get my hands deeper into XMPP soon. Maybe then I’ll say less uneducated things about it :)