Reminds me of Format Transforming Encryption (FTE) which can encrypt data such that the ciphertext conforms to some regular expression. Its purpose was in subverting regex-based DPI engines used for network censorship, for example by making the ciphertext look like an ordinary HTTP connection.
A weakness of this approach (in evading detection) is that the ciphertext is oblivious to what it’s actually mimicking: the ciphertext wouldn’t necessarily conform to the protocol specification (e.g. the Content-Length header wouldn’t match up with the size of the body).
However, it seems quite applicable here since any cryptographic cipher can be turned into a makeshift CSPRNG by encrypting random noise
Interesting! That kinda reminds me of tor’s pluggable transports. At least it sounds like they try to achieve the same thing.
There is in fact a Tor pluggable transport that uses FTE: https://blog.torproject.org/tor-heart-bridges-and-pluggable-transports/
The project is here: https://github.com/kpdyer/fteproxy
Would it be sufficiently high-entropy to generate a random key, use “aaaaaaaaaaaaaaaaaaaaaaaaaa” as the input, encrypt it, and use the encrypted text as the password?
Yes, ciphertext is indistinguishable from random: https://en.wikipedia.org/wiki/Ciphertext_indistinguishability
However I’ve not attempted to generate passwords with FTE myself, so you can try it for yourself
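For what it’s worth, you don’t need FTE for that part; any modern cipher has the same property. Here’s a minimal Go sketch of the idea using only the standard library (AES-CTR; the fixed plaintext and the base64 output encoding are arbitrary choices, not anything FTE-specific):

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "encoding/base64"
    "fmt"
)

func main() {
    // Fresh random key and nonce; neither is stored or ever reused.
    key := make([]byte, 32) // AES-256
    nonce := make([]byte, aes.BlockSize)
    if _, err := rand.Read(key); err != nil {
        panic(err)
    }
    if _, err := rand.Read(nonce); err != nil {
        panic(err)
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        panic(err)
    }

    // Encrypt the fixed all-'a' input; with a uniformly random key the
    // CTR keystream, and therefore the ciphertext, is indistinguishable
    // from random bytes.
    plaintext := []byte("aaaaaaaaaaaaaaaaaaaaaaaaaa")
    ciphertext := make([]byte, len(plaintext))
    cipher.NewCTR(block, nonce).XORKeyStream(ciphertext, plaintext)

    // Encode the raw bytes into something you can type as a password.
    fmt.Println(base64.RawURLEncoding.EncodeToString(ciphertext))
}

That said, the output is no better than reading the same number of bytes straight from crypto/rand, so the construction is mostly of conceptual interest.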
Surprisingly similar to a post I wrote about this topic before. I landed on a design involving a key-value store, and I found that it’s much easier to explain leftover random data when the data storage is shared between multiple people
Usability and complexity are more existential threats to IPFS than any lack of privacy from content-addressing. Hashing the hash seems like just another tacked on afterthought
I have had the growing feeling that the “encrypt everything” push of the past few years has done nothing to improve safety for the average human being, while increasing centralization and making censorship easier than ever. How long before the next “huh, half the internet is down again” is caused not by a DDoS or a borked configuration at AWS but by a certificate revocation which everyone will claim is accidental, but which will just so happen to disrupt a major political rally in Western Morgravia? How long before I have to have my ID on file before any CA will talk to me?
I have had the growing feeling that the “encrypt everything” push of the past few years has done nothing to improve safety for the average human being
That’s an extraordinary and non-obvious claim, based on what?
Now we just need someone to package python + pip into a zip on GitHub and set up a bash script that installs it, just to install tailwind. (kidding, I do get that the goal is to get everything to also be implemented in python)
I’m not really sure I understand the desire to port projects that were relying on Node into Python, Ruby, etc. Node “won” the competition around how to compile front-end assets years ago. If Node requires too much fiddling around, why not improve the Node ecosystem, instead of porting a bunch of the logic to another platform?
It is open source code though, so people should work on whatever makes them happy inside.
https://deno.land/ is a great alternative to node
Looks cool, but please don’t import this; it’s a 3-line implementation
func Of[T any](value T) *T {
return &value
}
Very leftpad-esque
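For anyone wondering what it buys you: Go won’t let you take the address of a literal, which is the whole trick (handy for optional pointer fields). A sketch of the helper next to its dependency-free equivalent, assuming Go 1.18+ for generics:

package main

import "fmt"

// Copied from the package above: returns a pointer to its argument.
func Of[T any](value T) *T {
    return &value
}

func main() {
    // &42 is a compile error; the helper papers over that.
    timeout := Of(42)

    // The inline equivalent is one extra line, no dependency needed.
    n := 42
    timeout2 := &n

    fmt.Println(*timeout, *timeout2) // 42 42
}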
It’d be nice if there was a consistent standard for vendoring a subset of packages. That way you could have the benefits of code re-use, while still maintaining a reference to where the code came from, and avoiding the dangers of that code being modified by a malicious actor.
I wouldn’t use it for a one-liner like this, but it’d be nice to have for one-file rarely-changing packages so you could avoid adding bloat to the dependency graph.
For me the reason I use Windows over Linux nowadays is better game support and better GPU drivers.
I beg to differ. One question would be: what is substantially different about bash compared to other scripting languages that makes it more prone to ending up as a single-file script where different sections of the code are only indicated by comments?
I like my configuration to be “easily portable”; that is, copying a single file to a new machine is a lot less work than copying six files. And sure, there are a myriad ways to deal with this; I even wrote an entire script for this myself. But sometimes it’s just convenient to be able to copy/scp (or even manually copy/paste) a single file and just be done with it.
I used to use the multi-file approach, but went back to the single-file one largely because of this.
I also somewhat prefer larger files in general (with both config files and programming), rather than the “one small thing per file”-approach. Both schools of thought are perfectly valid, just a matter of personal preference. I can open my zshrc or vimrc and search and I don’t have to think about which file I have to open. I never cared much for the Linux “/etc/foo.d”-approach either, and prefer a single “/etc/foo.conf”.
How I personally use it is that the non-portable snippets go to ${BASHRC_D} instead. Having worked as a developer in projects with very heterogeneous stacks, I got fed up with the constant changes to ~/.bashrc that would have to be cleaned up sooner or later.
My usual workflow when I am working on a new machine temporarily is to copy only ~/.bashrc. Any additional config is added to ${BASHRC_D} as needed.
copying a single file to a new machine is a lot less work than copying six files
Is it? I have all of my configs in a git repo, so it’s a single command for me to git clone to a new machine. Copying a single file is maybe simpler if that’s the only operation that you do, but copying and versioning a single file is no easier than copying and versioning a directory. The bash config part of my config repo has a separate file or directory per hostname, so I can have things in there that only make sense for a specific machine, but everything is versioned as a whole.
I never cared much for the Linux “/etc/foo.d”-approach either, and prefer a single “/etc/foo.conf”.
For files that are edited by hand, this is very much a matter of personal preference. The big difference is for files that need to be updated by tools. It’s fairly trivial to machine-edit FreeBSD’s rc.conf in theory, because it’s intended to be a simple key-value store, but it’s actually a shell script, and so correctly editing it with a tool has a bunch of corner cases and isn’t really safe unless you have a full shell script parser and symbolic execution environment (even for a simple case such as putting the line that enables the service in an if block: how should a tool that uninstalls that service and cleans up the config handle it?). Editing rc.conf.d by hand is a lot more faff (especially since most files there contain only one line), but dropping a new file in there or deleting it is a trivial operation for a package installer to do.
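To make the tooling point concrete, here’s a rough Go sketch of what “enable a service” looks like for a package installer under the foo.d model: write or remove a one-line file, no shell parsing involved. The path and the variable naming convention here are illustrative, not any particular tool’s behaviour:

package main

import (
    "os"
    "path/filepath"
)

const confDir = "/etc/rc.conf.d" // illustrative location

// enableService drops a one-line file; no existing config is parsed or rewritten.
func enableService(name string) error {
    line := name + `_enable="YES"` + "\n"
    return os.WriteFile(filepath.Join(confDir, name), []byte(line), 0o644)
}

// disableService just deletes that file again.
func disableService(name string) error {
    return os.Remove(filepath.Join(confDir, name))
}

func main() {
    if err := enableService("nginx"); err != nil {
        panic(err)
    }
}

Doing the equivalent against a monolithic rc.conf means parsing (or guessing at) shell syntax, which is exactly the corner-case swamp described above.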
Same thing I’d say about Python: it’s an interpreted scripting language where multiple files are only loosely linked together and there’s no compilation or verification step. At least usually you have source files right next to each other but in this case they’re associated using environment variables. Just feels like overengineering.
(…) there’s no compilation or verification step
Still no difference from a single-file approach. So I’m afraid I fail to see how this is a relevant aspect when making such a choice.
At least usually you have source files right next to each other but in this case they’re associated using environment variables.
Environment variables like ${BASHRC_D} are nothing but a convenience. They could be replaced by local variables or sheer repetition with no downside. It is a matter of personal preference.
Just feels like overengineering.
There is no engineering involved in that at all, so calling it “overengineering” feels like overestimation :)
You can implement this yourself in nginx like this:
location /ip {
add_header Content-Type "application/json";
return 200 '{"host":"$server_name","ip":"$remote_addr","port":"$remote_port","server_ip":"$server_addr","server_port":"$server_port"}\n';
}
and you will get back a little JSON with everything you want. For 1-off manual requests it doesn’t matter what you use, but if you are writing code then just host it yourself, it’s not difficult.
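And if the consumer is code rather than a human, the self-hosted endpoint is just as easy to use. A small Go sketch against the location block above (the URL is a placeholder for wherever you deploy it):

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// Mirrors the JSON produced by the nginx snippet above.
type ipInfo struct {
    Host       string `json:"host"`
    IP         string `json:"ip"`
    Port       string `json:"port"`
    ServerIP   string `json:"server_ip"`
    ServerPort string `json:"server_port"`
}

func main() {
    // Placeholder URL; point it at your own server.
    resp, err := http.Get("https://example.com/ip")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var info ipInfo
    if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
        panic(err)
    }
    fmt.Println("external address:", info.IP)
}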
I don’t know much about nginx, but isn’t that essentially the very first optimization step taken?
I migrated to nginx and set up nginx to answer the requests by itself and removed the Python scripts.
Yes. I think the offer is that you can self-host this with just a copy and paste if you decide you don’t want to depend on icanhazip.
But is it really decentralized without a white paper, an IETF working group draft, and endless discussions on whether the format should be JSON or XML (with s-expression zealots sniping from the sidelines)?
Probably just me, to be honest.
(Look, I’m not a fan of XML, but so many of the issues people complain about with JSON are simply solved with XML. Like types.)
Go ships with type-safe JSON parsing: https://blog.golang.org/json
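For anyone who hasn’t used it, a small sketch of what “type-safe” means here in practice: you unmarshal into a struct with concrete field types, and a mismatch is an error rather than a silently coerced value (the struct and field names below are made up for illustration):

package main

import (
    "encoding/json"
    "fmt"
)

type config struct {
    Name    string `json:"name"`
    Retries int    `json:"retries"`
}

func main() {
    var c config

    // Well-typed input decodes into real Go types.
    if err := json.Unmarshal([]byte(`{"name":"api","retries":3}`), &c); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", c) // {Name:api Retries:3}

    // A string where an int is expected fails loudly instead of coming
    // back as a stringly-typed value.
    err := json.Unmarshal([]byte(`{"name":"api","retries":"three"}`), &c)
    fmt.Println(err)
}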
Of course. Right tool for the job and all that.
Contrary to popular rhetoric, XML and JSON are not competitors and solve wildly different problems. Each has its place.
That’s what content negotiation is for. If unspecified, text/plain output with just the IP address.
Yes, but the author didn’t share how that was done and I did. I’m not saying you shouldn’t or can’t use icanhazip, I’m saying if you are writing code to figure out an external address, you should not use icanhazip and friends, you should copy and paste this little snippet into your NGINX (or do similar in whatever server you are using) and host it yourself. It’s not difficult to do. If everyone bothered to do this, then this guy would have had way less heartache over the past decade+.
icanhazip and friends are great for 1-off manual requests, but relying on them and other free services in deployed code, when they are trivial to implement yourself, is likely a bad idea. Here, this guy worked HARD to make all of our lives easier, and for that I’m grateful, but we should have been doing the right thing and hosting stuff like this ourselves if we were doing anything other than manual one-request-a-day usage against it.
With the caveat that this must be deployed separately from the existing infra, on a public facing network and expose only the public facing interface. Otherwise, the public IP will actually be the internal one.
Well, like all things.. complications arise! We deploy it and don’t take any of these precautions. I want the address the client uses to connect TO ME, which is usually the public IP.
Your use case may be different and perhaps your caveats are warranted. You are never guaranteed a public IP, you are just guaranteed the IP the client used to connect. Usually they are the same thing, but not always, and the IP used to connect to google.com might be different still, even if you get back a public IP, depending on the network the client is on, etc. But this is all mostly true of icanhazip and friends also.
I would definitely put some tiny level of obfuscation in this just to mitigate the risk that some miscreant botnet author finds and uses my copy.
That’s certainly not a bad idea. Almost certainly doesn’t help at all if your code is open-sourced.
Yeah I was thinking of corporate contexts where it either wouldn’t be, or it would be trivial to add a very small patch to the version that I’m running.
Always vaguely annoyed when Cloudflare takes over more stuff, but I can understand not wanting to deal with all this bullshit when you’re just one person. If he reads this, thanks for the service :)
Not all of the interactions were positive, however. One CISO of a US state emailed me and threatened all kinds of legal action claiming that icanhazip.com was involved in a malware infection in his state’s computer systems. I tried repeatedly to explain how the site worked and that the malware authors were calling out to my site and I was powerless to stop it.
Shades of when CentOS hacked Oklahoma City’s website.
Always vaguely annoyed when Cloudflare takes over more stuff
I’m genuinely curious as to why? Cloudflare seems at least so far the least “evil” of large internet companies.
Cloudflare seems at least so far the least “evil” of large internet companies.
Still, I would prefer it to be more decentralized. Lots of Internet traffic is already going through CF.
The protocols that the Internet relies on have an inherent centralising effect on Internet services. While a decentralised Internet would be nice, I don’t know of any good proposals for making this happen while also solving the numerous problems to do with how to share power and control in a decentralised manner while still maintaining efficiency and functionality.
inb4 blockchain :eyeroll:
Unless cloudflare turns out to be working closely with no such agency. Then the decentralization crowd will be vindicated and we will still need to solve all the problems cloudflare solves for us.
Of course, the decentralisation crowd could solve them right now instead of waiting.
There’s always someone in the peanut gallery who’s vindicated, because the peanut gallery is rich in opinions. For any opinion, there’s someone who holds it, doesn’t volunteer or otherwise act on it, is vindicated on reddit/hn/…/here if something bad happens or can be alleged to happen, and blames the people who did volunteer.
Blaming those who volunteer is a shameful thing to do IMNSHO.
It looks like the site (though still managed by post author) migrated entirely to Cloudflare’s systems in 2020 or so.
I’m not entirely sure if I follow what the threat model is here. “Failure of key rotation results in lack of non-repudiation of communications and indeterminate potential for impersonation and man-in-the-middle attacks” seems hyperbolic at best. It just transfers the key during a migration, which seems like copying your existing pgp key to a new computer. I don’t see the problem really, and it also seems like a good trade-off (constant “key has changed” notifications don’t exactly make things more secure).
So unless I missed something … Signal Outrage Article #518. The Signal people must have a fun time dealing with this stuff.
Expecting similar alerts to be sent out to my existing chat threads upon phone changeover, I messaged a few of my more recent chats.
As I read it they sent it after the key changed? Ehm…
I think the point here is: should you be able to notice when your chat partner adds/removes devices to their account or not?
If it’s easy to silently add a device, an attacker who has a few minutes of access to your phone can scan a QR code and eavesdrop on all conversations.
Wasn’t very clear to me either; here’s what I got from it:
This issue is debatable, e2e with multiple devices is hard to pull off, and not many protocols/applications manage to do it in a usable way (Matrix/Element is the one I know which is the most usable while still secure).
I assume this happens with iCloud Backup(?)/Google Backup(?) active on the mobile devices, and results in the data/chat history/safety number restoring out of the blue.
If it happens also when not doing device migration, it does sound like a problem. Seems to me, if this is the case, that Apple/Google may be able to access your Signal conversations and impersonate you.
I’m hoping someone with more information on the issue(s) might explain better (Moxie, are you here?)
If it happens also when not doing device migration, it does sound like a problem.
Yeah, that would be a problem, potentially a huge one (depending on details), but unless I misunderstood something this isn’t the issue at all. All it seems like is “Signal does a cp of my key when I ask it to”, and having the key remain on your local phone after uninstallation has exactly the same threat model as having it on your phone when installing.
Actually, there is perhaps a tiny potential threat in that last one since keys may be recovered from discarded phones; although you’d expect that anyone would wipe their phone before reselling, and you do need to confirm your phone number again on reinstall in any case IIRC. It’s a very tiny problem at best.
And since clearing data does seem to remove the key, it doesn’t seem stored remotely on iCloud or whatnot. But “key is stored remotely” is an entirely different thing than what this article is about anyway.
A stolen key from a discarded phone would not allow an attacker to decrypt past communications due to the signal protocol’s forward secrecy scheme.
2019 Intel i9 16” MacBook Pro
Apple reinforced Intel’s “contextless tier name” marketing trick so much. It was always just “i9”, which sounds impressive, doesn’t it? The actual chip name is i9-9880H. That’s Coffee Lake, a mild refresh of a mild refresh of Skylake, still on the 14nm process, which has been around since 2014.
That is important info.
That said, they kind of deserved this with their abstruse product names. Up till the Pentium 4 or so, I could easily follow which model was the new one. But having a newer i5 be faster than an old i9, I hate that. The important info is then a cryptic number.
But the i9-9880H was launched in Q2 2019. It’s not like Apple is putting old chips in their laptops; the i9-9880H was about the best mobile 45W chip Intel offered at the time.
It’s just both the best Intel had to offer in 2019 and a refresh of Skylake on their 14nm process.
There’s a reason Apple and AMD are both surpassing Intel at the moment.
Note that it’s an i9-9880H inside a MacBook which has a cooling system that heavily prioritises quietness and slimness over performance. This is advantageous for the M1 since Intel chips are heavily dependent on thermals in order to reach and maintain boost clocks.
Zig’s cross-compilation story is the best I’ve ever seen. It’s so good I didn’t even think it would be possible. Even if Zig-the-language never gains any traction (which would be a tragedy), Zig-the-toolchain is already fantastic and will be around for a long time.
Go’s is good, don’t get me wrong, but Zig solves a much harder problem and does it so amazingly seamlessly.
To be honest, the difficulty of cross compilation is something I have never really understood. A compiler takes source code written in some human readable formalism, and produces binary code in some machine readable formalism. That is it. It’s frankly baffling, and a testament to a decades long failure of our whole industry, that “cross compilation” is even a word: it is after all just like compilation: source code in, machine code out. We just happen to produce machine code for other systems than the one that happens to host the compiler.
I see only two ways “cross” compilation can ever be a problem: limited access to target-specific source code, and limited access to the target platform’s specifications. In both cases, it looks to me like a case of botched dependency management: we implicitly depend on stuff that varies from platform to platform, and our tools are too primitive or too poorly designed to make those dependencies explicit so we can change them (like depending on the target platform’s headers and ABI instead of those of the platform the compiler runs on).
I would very much like to know what went wrong there. Why is it so hard to statically link the C standard library? Why do Windows programs need VCRedists? Can’t a program just depend on its OS’s kernel? (Note: I know the security and bloat arguments in favour of dynamic linking. I just think solving dependency hell is more important.)
Why is it so hard to statically link the C standard library?
Well, because glibc… Maybe musl will save us 😅
If you really want to go down that rabbit hole: https://stackoverflow.com/a/57478728
Good grief, glibc is insane. What it does under the hood is supposed to be an implementation detail, and really should not be affected by linking strategy. Now, this business about locales may be rather tricky; maybe the standard painted them into a corner: from the look of it, a C standard library may have to depend on more than the kernel¹ to fully implement itself. And if one of those dependencies does not have a stable interface, we’re kinda screwed.
When I write a program, I want a stable foundation to ship on. It’s okay if I have to rewrite the entire software stack to do it, as long as I have stable and reliable ways to make the pixels blink and the speaker bleep. Just don’t force me to rely on flaky dependencies.
[1]: The kernel’s userspace interface (system calls) is very stable. The stackoverflow page you link to suggests otherwise, but I believe they’re talking about the kernel’s internal interface, which was never considered stable (resulting in drivers having to be revised every time there’s a change).
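To make that concrete: on Linux, where the syscall ABI is the stable boundary, a program really can talk to the kernel with no libc in between. A tiny Go sketch (Go issues the syscall directly and, with cgo disabled, produces a statically linked binary on Linux):

package main

import "syscall"

func main() {
    // write(2) straight through the kernel's stable userspace interface;
    // no C library is involved.
    syscall.Write(1, []byte("hello from a raw syscall\n"))
}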
It’s worth noting (since your question was about generic cross-platform cross-compilation, and you mentioned e.g. Windows) that this comment:
The kernel’s userspace interface (system calls) is very stable.
is only really true for Linux among the mainstream operating systems. In Solaris, Windows, macOS, and historically the BSDs (although that may’ve changed), the official, and only stable, interface to the kernel, is through C library calls. System calls are explicitly not guaranteed to be stable, and (at least on Windows and Solaris, with which I’m most familiar) absolutely are not: a Win32k or Solaris call that’s a full-on user-space library function in one release may be a syscall in the next, and two separate syscalls in the release after that. This was a major, major issue with how Go wanted to do compilation early on, because it wanted The Linux Way to be the way everywhere, when in fact, Linux is mostly the odd one out. Nowadays, Go yields to the core C libraries as appropriate.
As long as I have some stable interface, I’m good. It doesn’t really matter where the boundary is exactly.
Though if I’m being honest, it kinda does: for instance, we want interfaces to be small, so they’re easier to stabilise and easier to learn. So we want to find the natural boundaries between applications and the OS, and put the stable interface there. It doesn’t have to be the kernel to be honest.
I agree it is over complicated, but it isn’t as simple as you are saying.
One answer is because many tools want to run code during the build process, so they need both compilers and a way to distinguish between the build machine and target machine. This does not need to be complicated, but it immediately breaks your idealized world view.
Another answer is our dependency management tools are so poor it is not easy to set up the required libraries to link the program for the target.
many tools want to run code during the build process
Like, code we just compiled? I see two approaches to this. We could reject the concept altogether, and cleanly separate the build process itself, which happens exclusively on the source platform, from tests, which happen exclusively on the target platform. Or we could have a portable bytecode compiler and interpreter, same as Jonathan Blow does with his language. I personally like going the bytecode route, because it makes it easier to have a reference implementation you can compare to various backends.
a way to distinguish between the build machine and target machine.
As far as I understand, we only need a way to identify the target machine. The build machine is only relevant insofar as it must run the compiler and associated tools. Now I understand how that alone might be a problem: Microsoft is not exactly interested in running MSVC on Apple machines… Still, you get the idea.
Another answer is our dependency management tools are so poor it is not easy to set up the required libraries to link the program for the target.
Definitely.
Like, code we just compiled?
There are two common cases of this. The first is really a bug in the build system: try to compile something, run it, examine its behaviour, and use that to configure the build. This breaks even the really common cross-compilation use case of trying to build something that will run on a slightly older version of the current system. Generally, these should be rewritten as either try-compile tests or run-time configurable behaviour.
The more difficult case is when you have some build tools that are built as part of the compilation. The one that I’m most familiar with is LLVM’s TableGen tool. To make LLVM support cross compilation, they needed to first build this for the host, then use it to generate the files that are compiled for the target, then build it again for the target (because downstream consumers also use it). LLVM is far from the only project that generates a tool like this, but it’s one of the few that properly manages cross compilation.
Oh, so that’s what you meant by distinguishing the build platform from the target platform. You meant distinguishing what will be built for the host platform (because we need it to further the build process) from the final artefacts. Makes sense.
Another example would be something like build.rs in Rust projects, though that seems less likely to cause problems. The Linux kernel build also compiles some small C utilities that it then uses during the build, so they have HOSTCC as well as CC.
The concept of a self-hosted language is fading away. The last truly self-hosted language might have been Pascal.
On its surface this sounds preposterous. Can you elaborate? I know of maybe a dozen self-hosted languages since Pascal so I think I must be misunderstanding.
Edit: I’m guessing you mean that not only is the compiler self-hosted, but every last dependency of the compiler and runtime (outside the kernel I guess?) is also written in the language? That is a much more limited set of languages (still more than zero) but it’s not the commonly accepted meaning of self-hosted.
The original Project Oberon kernel was written in assembly, but the newer version is written almost entirely in Oberon.
Some of the early Smalltalks were written almost entirely in Smalltalk, with a weird subset that had limited semantics but compatible syntax and could be compiled to machine code.
And of course LISP machines, where “garbage collection” means “memory management.”
It’s an interesting distinction even if the terminology isn’t what I’d use. There’s a trend right now among languages to hop on an existing runtime because rebuilding an entire ecosystem from first principles is exhausting, especially if you want to target more than one OS/architecture combo. Sometimes it’s as simple as just “compile to C and benefit from the existing compilers and tools for that language”. But it seems fitting that we should have a way to describe those systems which take the harder route; I just don’t know what the word would be.
limited access to the target platform’s specifications. […] it looks to me like a case of botched dependency management
This is exactly what’s going on. You need to install the target platform’s specifications in an imperative format (C headers), and it’s the only format they provide.
And it makes extreme assumptions about file system layout, which are all necessarily incorrect because you’re not running on that platform.
Go can cross-compile Go programs but cgo requires an external toolchain even natively; cross compiling cgo is a pain.
Zig compiles Zig and C from almost any platform to almost any platform pretty seamlessly.
Zig compiles Zig and C from almost any platform to almost any platform pretty seamlessly.
As I understand it, Zig doesn’t do much more than clang does out of the box. With clang + lld, you can just provide a directory containing the headers and libraries for your target with --sysroot= and specify the target with -target. Clang will then happily cross-compile anything that you throw at it. Zig just ships a few sysroots pre-populated with system headers and libraries. It’s still not clear to me that this is legal for the macOS ones, because the EULA for most of them explicitly prohibits cross compiling, though it may be fine if everything is built from the open source versions.
This is not the difficult bit. It’s easy if your only dependency is the C standard library but most non-trivial programs have other dependencies. There are two difficult bits:
The first is pretty easy to handle if you are targeting an OS that distributes packages as something roughly equivalent to tarballs. On FreeBSD, for example, every package is just a txz with some metadata in it. You can just extract these directly into your sysroot. RPMs are just cpio archives. I’ve no idea what .deb files are, but probably something similar. Unfortunately, you are still responsible for manually resolving dependencies. It would be great if these tools supported installing into a sysroot directly.
The second is really hard. For example, LLVM builds a tablegen tool that generates C++ files from a DSL. LLVM’s build system supports cross compilation and so will first build a native tablegen and then use that during the build. If you’re embedding LLVM’s cmake, you have access to this. If you have just installed LLVM in a sysroot and want to cross-build targeting it then you also need to find the host tablegen from somewhere else. The same is true of things like the Qt preprocessor and a load of other bits of tooling. This is on top of build systems that detect features by trying to compile and run something at build time - this is annoying, but at least doesn’t tend to leak into downstream dependencies. NetBSD had some quite neat infrastructure for dealing with these by running those things in QEMU user mode while still using host-native cross-compile tools for everything else.
As I understand it, Zig doesn’t do much more than clang does out of the box. With clang + lld, you can just provide a directory containing the headers and libraries for your target with --sysroot= and specify the target with -target. Clang will then happily cross-compile anything that you throw at it. Zig just ships a few sysroots pre-populated with system headers and libraries.
That’s what it does but to say that it “isn’t much more than what clang does out of the box” is a little disingenuous. It’s like saying a Linux distro just “packaged up software that’s already there.” Of course that’s ultimately what it is, but there’s a reason why people use Debian and Fedora and not just Linux From Scratch everywhere. That “isn’t much more” is the first time I’ve seen it done so well.
It solves the trivial bit of the problem: providing a sysroot that contains libc, the CSU bits, and the core headers. It doesn’t solve the difficult bit: extending the sysroot with the other headers and libraries that non-trivial programs depend on. The macOS version is a case in point. It sounds as if it is only distributing the headers from the open source Apple releases, but that means that you hit a wall as soon as you want to link against any of the proprietary libraries / frameworks that macOS ships with. At that point, the cross-compile story suddenly stops working and now you have to redo all of your build infrastructure to always do native compilation for macOS.
Ah, I’ve been wondering what people mean when they say that Clojure has a bad license. Thanks for this.
On a separate note, I don’t see the benefit of using the MIT license over an even more permissive license, like the Unlicense. I would love to hear some arguments for using the former, as I personally am quite unsure of how to license my own projects (they’ve either not been licensed at all, or use the Unlicense).
I think Google and other big corps don’t allow contributing to or using unlicensed projects because public domain is not legally well defined in some states (lawyer pedantry), which to me seems like a positive thing :^)
Personally I go with the Unlicense for one-off things and projects I don’t really want/need to maintain; MIT or ISC (a variant of MIT popular in the OCaml ecosystem) if I’m making a library or something I expect people to actually use, because of the legal murkiness of the Unlicense; and if I were writing something like the code to a game or some other end-user application, I’d probably use the GPLv3, for example if it was a mobile app, to discourage people from just repackaging it, adding trackers or ads, and dumping it on the play store.
Yes! “Copyleft is more appropriate to end-user apps” is my philosophy as well. Though actually I end up using the Unlicense for basically all the things.
legal murkiness of the Unlicense
Isn’t that kinda just FUD? The text seems good to me, but IANAL of course.
Isn’t that kinda just FUD?
Reading the other comments seems like it is, I guess I was just misinformed. I still prefer MIT because, as others have said, it’s more well known.
This is somewhat off-topic, but I never thought the ISC license was really popular in the OCaml ecosystem. For a crude estimate:
$ cd ~/.opam/repo/default/
$ grep -r 'license: "ISC"' . | wc -l
1928
$ grep -r 'license: "MIT"' . | wc -l
4483
Might be. It would be interesting to get some stats about language/package ecosystem and license popularity.
Here it is for Void Linux packages; not the biggest repo but what I happen to have on my system:
$ rg -I '^license' srcpkgs |
sed 's/license="//; s/"$//; s/-or-later$//; s/-only$//' |
sort | uniq -c | sort -rn
1604 GPL-2.0
1320 MIT
959 GPL-3.0
521 LGPL-2.1
454 BSD-3-Clause
392 Artistic-1.0-Perl, GPL-1.0
357 Apache-2.0
222 BSD-2-Clause
150 GPL-2
133 ISC
114 LGPL-3.0
104 Public Domain
83 LGPL-2.0
83 GPL-2.0-or-later, LGPL-2.1
63 GPL-3
50 MPL-2.0
47 OFL-1.1
41 AGPL-3.0
36 Zlib
31 BSD
26 GPL-2.0-or-later, LGPL-2.0
23 Unlicense
21 Artistic, GPL-1
20 Apache-2.0, MIT
19 ZPL-2.1
19 BSL-1.0
[...]
It groups the GPL “only” and “-or-later” in the same group, but doesn’t deal with multi-license projects. It’s just a quick one-liner for a rough indication.
This sounds like a nice scheme for choosing a license. Thanks for your explanations regarding choosing each one of them.
On a separate note, I don’t see the benefit of using the MIT license over an even more permissive license, like the Unlicense
It’s impossible to answer the question without context. No license is intrinsically better or worse than another without specifying what you want to achieve from a license. With no license, you prevent anyone from doing anything, so any license is a vector away from this point, defining a set of things that people can do with your code.
There are two obvious differences between the MIT license and the Unlicense: MIT is well-established and everyone knows what it means, so there’s no confusion about what a court will decide it means; and it requires attribution, so I can’t take an MIT-licensed file, put it in my program / library and pretend that I wrote it. Whether these are advantages depends on what you want to allow or disallow with your license.
[1] There’s a common misconception that the GPL and similar licenses require people to give back. They don’t, they require people to give forwards, which amounts to the same thing for widely-distributed things where it’s easy to get a copy but is not so helpful to the original project if it’s being embedded in in-house projects.
[2] Even then, YMMV. The FSF refused to pursue companies that were violating the LGPL for GNUstep. Being associated with the FSF was a serious net loss for the project overall.
I should have clarified that I don’t care about attribution. Thank you for the informative and well structured overview.
Looking at the text of Unlicense, it also does not contain the limitations of liability or warranty. That’s probably not a problem - when the BSD / MIT licenses were written there was a lot of concern about implied warranty and fitness for purpose, but I think generally that’s assumed to be fine for things that are given away for free.
You might want to rethink the attribution bit though. It can be really useful when you’re looking for a job to have your name associated with something that your employer is able to look at. It is highly unlikely that anyone will choose to avoid a program or library because it has an attribution clause in the license, so the cost to you of requiring attribution is negligible whereas the benefits can be substantial.
If you’re looking for people to contribute to your projects, that can have an impact as well.
I don’t care about attribution mainly for philosophical reasons. I dislike copyright as a concept and want my software to be just that, software. People should be able to use it without attributing the stuff to me or anyone else.
Attribution is more closely related to moral rights than IP rights, though modern copyright has subsumed both. The right of a creator to be associated with their work predates copyright law in Europe. Of course, that’s not universal: in China for a long time it was considered rude to claim authorship and so you got a lot of works attributed to other people.
Right, I don’t want to claim authorship of much of the stuff I create. I simply want to have it be a benefit to the people who use it. I don’t have a moral issue with not crediting myself, so I won’t.
Perhaps you would like the ZLib license, then? Unlike MIT, it does not require including the copyright and license text in binary distributions.
I’m no lawyer, but as I understand it, authorship is a “natural right” that cannot be disclaimed, at least within U.S. law. It is separate from copyright. The Great Gatsby is in the public domain, but that doesn’t mean that I get to say that I wrote it. You probably can’t press charges against me for saying so as an individual, but plagiarism is a serious issue in many industries, and may have legal or economic consequences.
My point is that the Unlicense revokes copyright, but that someone claiming to have created the work themselves may still face consequences of a kind. Whether that is sufficient protection of your attribution is a matter of preference.
My understanding is that it’s a lot more complex in the US. Authorship is under the heading of ‘moral rights’, but these are covered by state law and not federal. There are some weird things, such as only applying to statues in some states.
Not licensing makes the product proprietary; even when the source is publicly shown, no one can use it without your permission. IANAL, but the Unlicense (just like CC0) isn’t really legally binding in some countries (you cannot make your work public domain without dying and waiting). So MIT is not that bad a choice, as the only difference is that you need to be mentioned by the authors of the derivative work.
0-BSD is more public domain-like as it has zero conditions. It’s what’s known as a “public-domain equivalent license”.
https://en.wikipedia.org/wiki/Public-domain-equivalent_license
The Unlicense is specifically designed to be “effectively public domain” in jurisdictions that don’t allow you to actually just put something in the public domain, by acting as a normal license without any requirements.
That’s, like, the whole point of the Unlicense :) Otherwise it wouldn’t need to exist at all.
I have heard that the Unlicense is still sometimes not valid in certain jurisdictions. 0-BSD is a decent alternative as it’s “public-domain equivalent”, i.e. it has no conditions.
Right, I’ve heard there’s some legal issues with it before, thanks for reminding me.
EDIT: Looks like there’s no public domain problems with the Unlicense after all, so I’m not worried about this.
The whole point of the CC0 is to fully disclaim all claims and rights inherent to copyright to the fullest extent possible in jurisdictions where the concept of Public Domain does not exist or cannot be voluntarily applied. There’s very little reason to suspect that choosing the CC0 is less legally enforceable than MIT.
CC0 seems fine but is somewhat complex. I prefer licenses that are very simple and easy to digest.
I don’t see the benefit of using the MIT license over an even more permissive license, like the Unlicense
Purely pragmatically, the MIT license is just better known. Other than that: the biggest difference is that the MIT requires attribution (“The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.”), and the Unlicense doesn’t.
As for the concerns over “public domain”, IMHO this is just lawyer pedantry. The text makes it plenty clear what the intent is, and I see no reason why it shouldn’t be upheld in court. The gist of the first paragraph is pretty much identical to MIT, except with the “the above copyright notice and this permission notice shall be included in all copies” omitted. If it only said “this is public domain”, then sure, you could land in a conflict over what “public domain” means exactly, considering this isn’t a concept everywhere. But that’s not the case with the Unlicense.
I’ve released a lot of code under a dual MIT/Unlicense scheme. ripgrep is used by millions of people AFAIK (through VS Code) and is under this scheme. I have heard zero complaints. The purpose of such a thing is to make an ideological point with the Unlicense, while also providing the option to use something like the MIT which is a bit more of a known quantity. Prior to this, I was releasing code with just the Unlicense and I did receive complaints about it. IIRC, from big corps but also individuals from jurisdictions that don’t recognize public domain. It wasn’t so much that they definitively said they couldn’t use the Unlicense, but rather, that it was too murky.
IANAL although sometimes I play one on TV. In my view, the Unlicense is just fine and individuals and big corps who avoid it are likely doing it because of an overly conservative risk profile. Particularly with respect to big corps, it’s easy to see how the incentive structure would push them to do conservative things with respect to the law for something like software licenses.
While the dual licensing scheme seems to satisfy all parties from a usage perspective, I can indeed confirm that this prevents certain big corps from contributing changes back to my projects. Because those changes need to be licensable under both the MIT and Unlicense. To date, I do not know the specific reasons for this policy.
I never really understood how dual-licensing works, can you explain a bit? Do users pick the license that they want or can they even cherry pick which clauses of each license they want to abide by?
AIUI, you pick one of the licenses (or cascade the dual licensing scheme). That’s how my COPYING file is phrased anyway.
But when you contribute to a project under a dual licensing scheme, your changes have to be able to be licensed under both licenses. Otherwise, the dual license choice would no longer be valid.
As I state in the article, I don’t think the EPL is a “bad license.” Clojure Core uses the EPL for very good reasons — it’s just that most of those reasons are unlikely to apply to Random Clojure Library X.
EDIT: I had replied regarding The Unlicense, but I see other folks have done a more thorough job below, so I’m removing that blurb. Thanks all.
I should have expressed myself more clearly. I’ve heard people mention Clojure’s license as a downside to the language, and now that I’ve read your article I have an idea of what they’re talking about.
Unlicense
I recommend against using this license, because making the ambiguous license name relevant anywhere makes everyone’s life harder. It makes it hard to distinguish between “CC0’d” and “in license purgatory”:
“What is this code’s license?”
“It’s Unlicensed”
<Person assumes it’s not legally safe to use, because it’s unlicensed>
I wish that license would either rename or die.
OK, I have a question about this design.
First, the key is deterministically derived using the U2F device. (It’s always the same key.) That means the key could be stolen if you’re accidentally using a compromised SSH client, for instance. Unlike a key on a smart card or a Yubikey in PIV mode, where the root key never leaves the device.
Presumably to mitigate this risk, GitHub also requires a TOTP one-time token if you’re using U2F. You have to push the button on your device, and it spits out a one-time token that GitHub can verify.
But then what value does U2F add in the first place, if you still need to also use TOTP?
Maybe I’m misunderstanding something here.
The key is generated via FIDO2, and it’s not deterministic. With FIDO2 (the successor to U2F, with backwards compatibility), for registration the key takes as parameters the relying party name (usually ssh:// for ssh keys, https://yoursite.com for websites), a challenge for attestation, the wanted key algorithm, and a few extra optional parameters. The key responds with a KeyID (an arbitrary piece of data that should be provided to the key when wanting to use it), the public key data, and the challenge signed with that key. The KeyID usually holds the actual private key, encrypted with an internal private key of the security key, which is then decrypted to actually use it. It’s assumed that the KeyID is unique, therefore it should be (and generally is) generated using some sort of secure RNG on the security key.
The flow with ssh is quite simple: when you create an SSH key with a security key, a key gets generated on it, the KeyID gets stored as the private key file, and the public key file stores the returned public key. When connecting with SSH, a challenge is issued by the server you are authenticating to, which gets passed to the key with the appropriate KeyID, and the response is sent back to the server. No need for any additional TOTP tokens, which are less secure.
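A rough sketch of that round trip as data, for anyone who finds the prose hard to follow. All of these type and field names are made up for illustration and don’t correspond to any particular library’s API:

// Package fidosketch is an illustration only; no real library uses these names.
package fidosketch

// Registration: the host sends these parameters to the security key...
type registrationRequest struct {
    RelyingParty string // e.g. the "ssh://" identifier mentioned above, or "https://yoursite.com"
    Challenge    []byte // random bytes the key must sign, proving possession of the new key
    Algorithm    string // requested key algorithm
}

// ...and gets these back.
type registrationResponse struct {
    KeyID     []byte // opaque blob, typically the private key wrapped by a device-internal key
    PublicKey []byte // stored by the relying party (the .pub file, in the SSH case)
    Signature []byte // the challenge signed with the freshly generated key
}

// Authentication: the host hands the KeyID back along with the server's
// challenge, and the key returns a signature the server can verify against
// the stored public key.
type assertionRequest struct {
    RelyingParty string
    KeyID        []byte
    Challenge    []byte
}

type assertionResponse struct {
    Signature []byte
}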
First, the key is deterministically derived using the U2F device. (It’s always the same key.) That means the key could be stolen if you’re accidentally using a compromised SSH client, for instance. Unlike a key on a smart card or a Yubikey in PIV mode, where the root key never leaves the device.
How can this make sense? Surely the U2F device has access to a suitable CSPRNG.
Total outsider here, but my understanding is that Rust newcomers struggle with satisfying the compiler. That seems necessary because of the safety you get, so OK, and the error messages have a great reputation. I would want to design in, for each error, possible fixes that would compile, and a way to apply your chosen fix back to the source code. If that’s a tractable problem, I think it could help cut trial and error down to one step and give you meaningful examples to learn from.
Maybe add a rusty paperclip mascot…
Actually, a lot of the error messages do offer suggestions for fixes and they often (not always) do “just work”. It’s really about as pleasant as I ever would’ve hoped for from a low-level systems language.
That’s great! Is it exposed well enough to, say, click a button to apply the suggestion in an editor?
In some cases, yes. See https://rust-analyzer.github.io/
In a lot of cases, actually. It becomes too easy sometimes, because I don’t bother trying to figure out why it works.
Yeah, it seems to be. I often use Emacs with lsp-mode and “rust-analyzer” as the LSP server and IIRC, I can hit the “fix it” key combo on at least some errors and warnings. I’m sure that’s less true the more egregious/ambiguous the compile error is.
rusty paperclip mascot…
There is this but it doesn’t seem to have a logo, someone should make one!
In addition to reducing the load on the root servers, QNAME minimization improves privacy at each stage of a request.
quoting isc.org:
“Let’s say you want to visit a blog site at https://someblogname.bloghosting.com.pl. In order to determine which IP address to connect to to reach that link, your computer sends a request to your ISP’s resolver, asking for the full name - someblogname.bloghosting.com.pl, in this case. Your ISP (or whoever is running the network you are using) will ask the DNS root, and then the top-level domain (.pl in this case), and then the secondary domain (.com.pl), for the full domain name. In fact, all you are finding out from the root is “where is .pl?” and all you are asking .pl is “where is .com.pl?” Neither of these requests needs to include the full name of the website you are looking for to answer the query, but both receive this information. This is how the DNS has always worked, but there is no practical reason for this today.”
For BIND, it’s qname-minimization ( strict | relaxed | disabled | off ); for unbound, it’s qname-minimisation: yes.
About 51% of DNS resolvers do QNAME minimization now: https://dnsthought.nlnetlabs.nl/#qnamemin
+1 I am in the middle of switching to PowerDNS from BIND, so I thought I’d respond on how PowerDNS is doing.
For PowerDNS, both QNAME minimization and aggressive caching are enabled by default now.
What is the encryption for? In general, encryption is there to preserve two properties: confidentiality and integrity. The integrity is already handled by DNSSEC and has the nice property that the response is the same for everyone (more on this later). The confidentiality matters only when it leaks secret information. The fact that some ISP’s user has looked for a specific domain name may well leak private information (e.g. have they been looking up a site that shares information critical of the government) that can have serious real-world consequences. The fact that some ISP’s user has looked up a domain in Poland or a domain with a .com ending is not, I would suggest, leaking any information that people would care about being leaked. If my anonymity set is all of my DNS server’s users and the only information that leaks is that I’ve looked up some .com domain, I fundamentally don’t care.
The desire to have the same response for everyone is driven by the fact that, to scale up to the required performance, the root DNS servers that I know about (operated by VeriSign) pre-prepare packets in memory, update the destination addresses and the checksums, and then send the DMA request. They get phenomenal throughput and latency from this approach.
My slight worry about this argument for confidentiality comes from the fact that 97% of root DNS responses return NXDOMAIN. They are the result of typos in the domain. For example, if you omit the .com by accident and type example instead of example.com, then the root DNS will be queried with the second-level domain name, example. In this case, any passive observer of the traffic knows that you’ve typed example. That’s probably easy to work around in the querying servers by using encrypted DNS queries for any TLD that they haven’t seen before, but that only helps performance for the 3%. The TLDs typically have quite long TTLs, but when that expires you take a TLD completely off the Internet for users of the caching resolver if there’s a denial of service on the root servers.
The integrity is already handled by DNSSEC and has the nice property that the response is the same for everyone (more on this later).
DNSSEC suffers from complexity, architectural fragility, and extremely low adoption. In fact, DNSSEC’s failings are one of the biggest arguments in favour of DoH and DoT.
Like @david_chisnall said, they are mostly trying to solve different things. DNSSEC gives the large DNS providers (read: the roots) the ability to scale; DoH and DoT don’t. You are single-sourcing your DNS to an exact entity, just like with an ISP, except instead of your ISP knowing what you are looking up, it’s Google and Cloudflare (or whoever your DoH & DoT provider is, but they are the defaults). Who wouldn’t want to be Google/Cloudflare? They say they are not doing bad things with your information, but we don’t actually know that; we just have to trust them.
So you are keeping your ISP from seeing your lookups; you could have done that with a VPN/tunnel just as easily, without any DoH or DoT. DoH and DoT don’t fix the problem of a nation state/ISP seeing where you are going, as SNI headers are still not encrypted (last I checked). Of course this is only one of HTTP’s problems; many such exist. Information leakage on the internet happens ALL over the place, and while DoH and DoT mostly can help, they are not magic bullets.
DoH and DoT don’t solve the problem of your DNS provider lying to you. DNSSEC can, in theory (provided it was fully deployed, which it clearly isn’t).
Before DoH and DoT, your actual OS was generally 100% in charge of where your DNS requests went, but now it’s a guessing game as to who is answering a given DNS request. Web browsers have stolen control of that too, further proving that web browsers are Operating Systems in disguise.
The only reason DoH/DoT has gotten such adoption is that web browsers forced it on us.
I’m not against DoH or DoT, but they aren’t magical solutions to the problem(s), and obviously DNSSEC isn’t either.
The UDOO BOLT V8 has a Ryzen V1605B 4C/8T with Vega 8 graphics. The V3 has a 2C/4T Ryzen V1202B with Vega 3 graphics.
I hope they look at torrents. People like myself are dying to find some socially-valuable way of using symmetric gig home fiber connections and abundant storage/compute in the homelab. I’ve tried hosting Linux ISOs but my sharing ratio never goes above 1. Torrents could be a first-line cache before hitting the S3 bucket or whatever else, and I think they have an extremely cool intersection with the idea of reproducible builds itself. Heck, you could configure your PC to seed the packages you’ve installed, which would have a nice feedback loop between package popularity and availability!
Hmmm, looking at the cost breakdown they link, they use Fastly as a CDN/cache so most of their S3 costs are for storage, not transfer. They cite 30 TB of transfer a month out of the storage pool, vs 1500 TB of transfer per month served by the CDN. Looks like Fastly gives them the service for free (they estimate it would be €50k/month if they had to pay for it), so their bottleneck is authoritative storage.
Backblaze would be cheaper for storage, but it’s still an improvement of like 50%, not 500%.
Okay maybe this is getting a bit too architectural-astronaut, but why do you need authoritative storage for a reproducible build cache? If no peers are found then it’s your responsibility to build the package locally and start seeding it. Or there could even be feedback loops where users can request builds of a package that doesn’t have any seeders and other people who have set up some daemon on their server will have it pull down the build files, build it, and start seeding it. The last word in reproducible builds: make them so reproducible you can build & seed a torrent without downloading it from anybody else first!
This thread isn’t about a reproducible build cache. It includes things like older source dists which aren’t available upstream anymore, in which case the cache isn’t reproducible anymore.
It seems highly insecure to let random people populate the cache. It would be easy for a malicious actor to serve a different build.
They could host an authoritative set of hashes
This only works for content-addressed derivations, which are a tiny minority. The rest of them are hashed on their inputs, meaning that the output can change and the hash won’t, so what you propose wouldn’t work at the moment, not until CA derivations become the norm (and even then, it still wouldn’t work for old derivations)
You can’t corrupt torrents, since they’re addressed by their own file hash; see https://en.wikipedia.org/wiki/Magnet_URI_scheme
It’s not corruption of the file in the torrent that’s the problem, but swapping in a malicious build output while the nix hash would stay the same. This is possible in the current build model (see my comment above), and what we rely on is trust in the cache’s signing key, which cannot be distributed.
There are projects like trustix which try to address this, but they’re dormant as far as I can tell.
I wouldn’t even bother trying to build NixOS in a CI/CD context with torrent as a primary storage backend.
Why not? Would it be fine with you if http mirrors were still available?
Maybe because there would be increased latency for every single file/archive accessed? The idea of a community-provided mesh cache is appealing though, if the latency issue is mitigated.
Then I’d use only http, changing nothing for the project in terms of costs.
CI/CD (and NixOS) should be reproducible, but having a requirement on torrents throws that out of the window and makes it unpredictable. Yes, it will probably be fine most of the time, and sometimes be faster than everybody using the same HTTP cache.
But also, firewalling the CI/CD pipeline using torrents? That’s hard enough with CDNs…
There are different topologies for torrents. BitTorrent became popular for illegal file sharing (in spite of being a terrible design for that) and a lot of the bad reputation comes from that. In this model, all uploaders are ephemeral users, typically on residential connections. This introduces a load of failure modes. All seeders may go away before you finish downloading. One block may be present on only a slow seed that bottlenecks the entire network (Swarmcast was a much better design for avoiding this failure mode). Some seeders may be malicious and give you blocks slowly or corrupted.
In contrast, this kind of use (which, as I understand it, was the use case for which the protocol was originally designed) assumes at least one ‘official’ seed on a decent high-speed link. In the worst case, you download from that seed and (modulo some small differences in protocol overhead), you’re in no worse a situation than if you were fetching over HTTP. At the same time, if a seed is available then you can reduce the load on the main server by fetching from there instead.
For use in CI, you have exactly the same problems as any dependency where you don’t have a local cache (Ubuntu’s apt servers were down for a day a couple of months back and that really sucked for CI because they don’t appear to have automatic fail over and so the only way to get CI jobs that did apt updates to pass was to add a line to the scripts that patched the sources file with a different mirror). At the same time, it makes it trivial to provide a local cache: that’s the default behaviour for a BitTorrent client.