First of all, let’s remember that Github is a fully proprietary service. Using it to host the development of a free software makes no sense if you value freedom.
I’ve heard this argument too many times, and I grow increasingly tired of attempting to reason through it. How is this true… at all? GitHub lets me host practically unlimited code for free. If I value my software being free, this should be the only thing I am concerned with. GitHub is run by capitalists, shocker they’re doing a capitalism or two. There is literally no better way to spread your code to the world than uploading it to GitHub, partially because that’s where all the users are.
The bar for entry for GitHub is extremely low. I learned how to use GitHub’s desktop app almost a decade ago, far before I became comfortable with the git CLI. I’ve met too many people that are not technically adept yet still have a GitHub account and are capable of contributing to projects. I can’t say the same about GitLab, Codeberg, or SourceHut, even if I enjoy parts of those products more than GitHub.
By keeping your project on Github, you are encouraging new developers to sign up there, to create their own project there. Most importantly, you support the idea that all geeks/developers are somehow on Github, that it is a badge of pride to be there.
There are over 100 million users. The evidence that all the geeks/developers are on GitHub has been weighed, and it is overwhelmingly in GitHub’s favor.
Good code is written when people are focused, thinking hard about a problem while having the time to grasp the big picture. Modern Github seems to be purposely built as a tool to avoid people thinking about what they do and discourage them from writing anything but a new JavaScript framework.
I think this beautifully illustrates the author’s perspective of those millions of users on GitHub: they’re writing bad code that doesn’t need to exist. Modern GitHub has almost exclusively devoted itself to streamlining the contribution process and encouraging communities to form around software projects. There are plenty of features I’m not a fan of, and some obvious changes I would make, but overall GitHub is incredibly easy to teach and use. I would love to say the same about any free alternative, but I can’t.
I avoided GitHub for a long time, for many of the arguments in the article. I eventually got an account about 10 years ago because I learned an important lesson: pick your battles. I don’t like the fact that GitHub is a proprietary dependency for a load of F/OSS projects. I don’t like that it distorts the F/OSS ecosystem by making CI free for some platforms and expensive for others. But when I moved projects there, I got more contributors and more useful contributions. It’s trivial for a user to submit a small fix to a project on GitHub without already having a deep engagement with the project. That is not true of anything else, in part because of network effects. I consider that more Free Software existing should be a primary goal of the Free Software movement and I’m willing to compromise and use proprietary tools to make this happen. At the same time, I’d encourage folks who have a problem with this to keep working on alternatives and come up with some way of funding them as a public service.
GitHub takes down code when requested. It is rather difficult to imagine Free Software properly destroying the system of copyrighted code when we are willing to let copyright complainants take down any code which threatens that system. I don’t think that forges ought to solve this problem entirely, but forges should not be complicit in the reaction against Free Software.
[O]verall GitHub is incredibly easy to teach and use. I would love to say the same about any free alternative, but I can’t.
I recommend that you try out GitLab. If nothing else, it will refine your opinion.
Yes, basically every public forge will take down code when (legally) requested. America has the DMCA process for this; I’m not particularly familiar with German law, but Codeberg, a very strong proponent of free software, also calls out in their TOS:
(4) Content that is deemed illegal in Germany (e.g. by violating copyright or privacy laws) will be taken offline and may lead to immediate account suspension.
Fighting the man and destroying copyright is a nice concept, but it’s a concept that needs to be pushed socially and legally, not just by ignoring the whole thing altogether.
Copyleft is only one approach to fighting copyright, and forges are only one approach to sharing code. It’s easy enough to imagine going beyond forges and sharing code in peer-to-peer ecosystems which do not respect takedown requests; I’m relatively confident that folks share exfiltrated corporate code from past “leaks” via Bittorrent, for example.
What are forges to do? Not accept DMCA requests? Free Software will continue to be incapable of taking down the copyright system, because it works within said system. If you want to change that system, political activism is going to do a lot more than GPL ever could.
I’ve tried GitLab, SourceHut, Gitea, and a few others, and while I enjoy different parts of those products, I couldn’t possibly call them “user-friendly” the same way I can GitHub. https://about.gitlab.com/ is solely an advertisement for their “DevSecOps” platform - this is, of course, really cool, but someone without the necessary background will not care about it. Even though a lot of this is just marketing, that marketing is important in painting an image for potential users. “The place for anyone from anywhere to build anything” speaks volumes while “DevSecOps Solution” silences the room.
I recommend that you try out GitLab. If nothing else, it will refine your opinion.
I don’t understand why GitLab gets all the love. Because as a user of their hosted service, it is really Just. The. Same. Git, pull requests, wiki, project boards, discussions, CI.
If you want to host your own instance then yes of course .. you can’t do that with GitHub. But for the hosted version - I would say it is just as proprietary and locked-in as GitHub is.
I feel like every time I log into GitLab I find a new bug.
Don’t get me wrong; I’d still pick it over GitHub, but it’s far, far behind Gitea.
That’s not the bar that was set, though. All I’m suggesting is that GitLab is a free alternative to GitHub, and that (because GitLab is designed to clone GitHub’s UX) GitLab is as “incredibly easy to teach and use” as GitHub. I’ve used GitLab instances in the past, and believe this to be a reasonable and met bar. Neighboring comments recommend Gitea, which also seems like a fine free alternative, and it might also meet the bar; I’m not personally experienced enough with Gitea to have an opinion.
Economies of scale are hard to beat.
One login for all the projects on github.
Automatic back-links between issues lists across all repositories.
And I can get my data back: my commits are easy to clone. I get a copy of all my comments by email (not the default, but an available option).
Some stuff needs a GraphQL query to expatriate. Have to admit I don’t do that regularly, so I’m trusting them there.
Cheers to folks that take on this battle against centralization. I’m inclined to put my energy elsewhere.
My habit is to take the descriptive name, such as directory service, abbreviate heavily (ds), and add some entropy: ds45
It’s easy to type, not too hard to remember, and if the service changes, you can either do a backronym or just don’t bother and consider the name wholly arbitrary.
Going the other way is much more tractable. LaTeX is Turing-complete.
https://www.w3.org/2004/04/xhlt91/ is what I cobbled together when faced with LaTeX submission requirements. A quick search turns up several tools that look somewhat mature.
You can, using capability security, limit your vulnerability to code without reading it.
https://github.com/dckc/awesome-ocap https://en.m.wikipedia.org/wiki/Capability-based_security
I believe this is definitely the correct way to go about securing our devices. There is just too much code to audit, and to make matters worse, people routinely run code on their machines that is actively malicious (trying to spy on their users). If you consider that every line of code can contain a bug that may be abused, we’re in a world of hurt if we keep doing what we’re doing.
If you think about it, this is just a continuation of what we’ve been doing to secure things - remember back in the day when Windows ran everything at the same privilege level and any stupid application could cause a BSOD? We Linux and Unix users were pretty smug about the better security it offered. But we’ve been mostly stagnant while Windows caught up, and Mac, in one fell swoop, caught up as well with the switch to Darwin/OS X.
At least the OpenBSD folks are still trying to push the envelope, and their pledge system comes somewhat close to capabilities. Unfortunately, it requires one to trust the software itself, whereas hard capabilities don’t require such trust. In this day and age the model where you can trust the software itself is not really sustainable anymore.
Racket has an “equal-always?” primitive that equates values that will stay the same even if mutated.
This is so fuckin cool. One of those functions that makes you think “huh, why hasn’t anyone else done this before?”
Not to mention https://p.hagelb.org/equal-rights-for-functional-objects
It’s even cooler when it’s the default equality predicate in your language rather than one of seven or twelve or whatever.
One of the big things missing, IMO, of the Semantic Web is the notion of context. A fact, any fact, is going to be true in some contexts, and false in other contexts. The classic example of this is something like “parallel lines do not intersect” which is true in flat space (2D or 3D), but not (for example) on the surface of a sphere.
The knowledge bases I worked with (briefly) had encoded facts like “Tim Cook is the CEO of Apple”. But of course, that is only true for a certain context of time. Before that was Steve Jobs from 1997 to 2011. But the dumps generated from Wikipedia metadata didn’t really have that with much consistency, nor any means to add and maintain context easily.
Context in general is needed all over the place for reasoning:
A bare fact is only useful insofar as you know what contexts it applies to.
I don’t know a good way to represent this; a graph may not be ideal. Many facts share a context too. In 2003, Steve Jobs was an employee at Apple. Ditto for Jony Ive in 2003. And in 2004, etc.
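As a rough sketch of the shape of the problem (a made-up representation, not how Wikidata or Cyc actually model it), one naive approach is to attach an explicit validity interval to every fact, so the CEO example above can only be queried relative to a year:

```rust
// Hypothetical contextualized facts: each assertion carries a validity interval.
struct Fact {
    subject: &'static str,
    predicate: &'static str,
    object: &'static str,
    valid_from: Option<u32>, // year; None = unknown / open-ended
    valid_to: Option<u32>,
}

fn holds_in(fact: &Fact, year: u32) -> bool {
    fact.valid_from.map_or(true, |from| year >= from)
        && fact.valid_to.map_or(true, |to| year <= to)
}

fn main() {
    let facts = [
        Fact { subject: "Steve Jobs", predicate: "CEO of", object: "Apple",
               valid_from: Some(1997), valid_to: Some(2011) },
        Fact { subject: "Tim Cook", predicate: "CEO of", object: "Apple",
               valid_from: Some(2011), valid_to: None },
    ];
    for f in &facts {
        if holds_in(f, 2003) {
            println!("In 2003: {} was {} {}", f.subject, f.predicate, f.object);
        }
    }
}
```

Even this toy version shows the cost: every query now needs a context argument, and contexts other than time (source, geometry, belief) don’t fit a single interval field.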
How do our own brains organize these facts and the contexts they go with?
There was some work on context. SPARQL lets you say what context you want to query, for example. I forget what it’s called there…
n3 also had quoting. It was kind of a mess, though.
You’ve seen Guha’s thesis and cyc, yes? Fun stuff.
But as noted above, machine learning (GPT-3) has pretty much eclipsed symbolic reasoning on the one side. On the other, we have the stuff that’s no longer considered AI: type systems and SQL.
I’ve read a bit about Cyc and OpenCyc, yes. I haven’t read the book Building Large Knowledge-Based Systems by Lenat and Guha though.
I haven’t given up on the idea of probabilistic symbolic reasoning, but I realized I’m in the minority here.
I still imagine a system where, for example, you receive a document (such as news article) and it is translated into a symbolic representation of the facts asserted in the document. Using existing knowledge it can assign probabilistic truth values to the statements within the article. And furthermore be able to precisely explain the reasoning behind everything, because it could all be traced back to prior facts and assertions in the database.
I can’t help but think such a system ought to be more precise in reasoning as well as needing fewer compute resources. And being able to scale to much larger and more complicated reasoning tasks.
It’s a very good question. It’s too bad Windows doesn’t offer this, but there are operating systems that do. Genode, seL4, …
Previous discussion on lobste.rs. The creator, @slightknack, is also on lobste.rs!
Hey y’all, I was surprised to see this here today! I guess a few updates since the last time this was posted:
Currently we’re in the middle of a massive rewrite of the entire compiler pipeline. It currently lives in the big-refactor branch, and we’re trying to get it done by July to create the next release.
Since then I’ve done a lot of research and thinking on traits and effect systems, and think I have a neat unified approach that I’d like to start prototyping. We’ve also been working on the way Passerine integrates with Rust, and we hope to provide a safe sandboxed parallelizable runtime with a high-level API that integrates well with existing Rust libraries.
We’ve also been rewriting the macro system to allow for compile-time evaluation. This will be a much more powerful Lisp-style procedural macro system. Lisp is a powerful programming language for manipulating lists, which is why Lisp macros, which operate on lists, fit the language so well. Passerine aims to be a powerful programming language for manipulating ADTs, so by representing the language as an ADT to be operated on by macros, we hope to capture the same magic and power of Lisp’s macro system. (Note that the current rule-based macro system will still be available, just implemented in terms of the new one.)
The majority of the discussion around the development of the language happens on our Discord server[1]. We have meetings the first Saturday of each month with presentations on PLT, compiler engineering, and other neat stuff.
I’ve been working on an experimental BEAM-style runtime called Qualm that has a custom allocator that supports vaporization (the memory management technique Passerine uses, essentially tagged pointer runtime borrow checking.) I’m not sure it’ll be ready for the next release (as it requires type layouts to be specified, and the typechecker is currently WIP), but it is a nice demo for what I think is possible for the language.
I’m here to answer any questions you may have about the language! I’m based in Europe so I might not see them till tomorrow morning, but don’t be afraid to ask, even if you think it’s a silly question.
[1]: I tried starting a Matrix channel after people on lobste.rs brought it up last time. After a grand total of zero users had joined six months later, I went ahead and scrapped it. I love Matrix, so I might consider bridging the server in the future.
I’m very curious about the differences between what Passerine does and what Perceus does in terms of both compile-time and runtime memory management!
(For context, I’m working on a programming language that currently uses Perceus.)
Sorry for the late response. I started writing something, but like 2k words later I realized it should probably be like a blog post, and not a random comment. I’ll summarize the gist of the argument, and link to the blog post later once I’ve finished it:
So, as I’m sure you know, Perceus is a form of reference counting (limited to inductive types) that drastically reduces the number of reference count instructions that are produced. This makes reference counting more efficient, but Perceus is still reference counting at its core.
Vaporization uses a single tag bit to keep track of whether a reference is currently immutable shared or mutable owned. This is a bit different than reference counting, as the number of shared references is not kept track of.
When passing an object reference to a function, we set the reference tag to immutable shared if the reference is used again later in the caller’s scope. If this is the last use of a value, we leave it as-is, allowing for efficient in-place updates in linear code. To update an object, the reference we have to it must be mutable owned; if the reference is immutable shared instead, the runtime will make a copy of the portion required. All references returned from functions must be mutable owned; when a function returns, all other mutable owned references tied to that stack frame are deallocated.
In effect, a mutable owned reference is owned by a single stack frame; ownership can be passed up or down the call-stack on function call or return. When calling a function, we create a child stack frame that is guaranteed to have a shorter lifetime than the parent stack frame. Therefore, we can make as many immutable references as we’d like to data owned by parent stack frames, because all immutable references to that data will disappear when the child stack frame returns to the parent stack frame.
Both Perceus and Vaporization do not allow cyclic data structures to be created (without requiring users to manage circular references themselves). This operating assumption drastically limits the types of data structures that can exist (in terms of pointer graphs). Because object graphs can only be trees rooted at the stack (e.g. anything trivially serializable to JSON), it’s very simple to keep track of when things should be pruned from the heap.
Someone much smarter than I am described Vaporization as “using the stack as a topological sort of the acyclic object graph.” I’m not sure whether this statement is fully correct, but I think it captures the gist of the idea.
So, to answer your question, here’s what Vaporizations does with respect to compile-time and runtime memory management:
At compile-time, we annotate the last use of every variable in a given scope. When generating code, if we encounter a non-last-use variable, we emit an instruction to set the tag of the reference to shared immutable. (We also ensure that all closure captures are immutable.)
At runtime, we update reference tags as annotated at compile-time, and we create deep copies of (portions of) objects as required when converting references from immutable shared to mutable owned.
If we know type layouts, it’s possible to inline the code responsible for copying data and updating reference tags such that there is no runtime dependency. This makes Vaporization suitable for both static and dynamic languages alike.
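To make the tag-bit idea a little more concrete, here is a deliberately tiny Rust sketch - not Passerine’s actual runtime - where a boolean stands in for the pointer tag and writes through a shared reference copy first; allocation and frame-based deallocation are elided:

```rust
// Toy model of the scheme described above. It leaks on purpose: in the real
// scheme the owning stack frame deallocates its owned references on return.
#[derive(Clone)]
struct Obj {
    fields: Vec<i64>,
}

// One reference = one pointer plus one bit of state. A real implementation
// would pack the bit into the pointer itself.
struct Ref {
    ptr: *mut Obj,
    shared: bool, // false = "mutable owned", true = "immutable shared"
}

impl Ref {
    fn new(obj: Obj) -> Ref {
        Ref { ptr: Box::into_raw(Box::new(obj)), shared: false }
    }

    // Emitted by the compiler at every non-last use of a variable.
    fn mark_shared(&mut self) {
        self.shared = true;
    }

    // Writing requires ownership; a shared reference copies first (CoW).
    fn write(&mut self, i: usize, v: i64) {
        if self.shared {
            let copy = unsafe { (*self.ptr).clone() };
            self.ptr = Box::into_raw(Box::new(copy)); // fresh owned copy
            self.shared = false;
        }
        unsafe { (*self.ptr).fields[i] = v; }
    }
}

fn main() {
    let mut a = Ref::new(Obj { fields: vec![1, 2, 3] });
    a.mark_shared(); // `a` is used again below, so it becomes shared
    let mut b = Ref { ptr: a.ptr, shared: true }; // pass a shared view along
    b.write(0, 42); // a last use would skip the copy; here we must copy
    unsafe {
        println!("a[0] = {}, b[0] = {}", (*a.ptr).fields[0], (*b.ptr).fields[0]);
    }
}
```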
Hope this helps!
PS—Oh, I see you’re the author of Roc! I remember watching your talk “Outperforming Imperative with Pure Functional Languages” a while back, it was quite interesting!
Very helpful, thank you so much for the detailed explanation!
Also, glad you found the talk interesting. Feel free to DM me if you have any questions about Roc!
I’m interested in the run-time borrow checking idea. The part that makes sense to me is doing copy-on-write with refcounting: so you have pass-by-value semantics, but you can also do an in-place update when the refcount is 1.
But by “borrow checking”, do you mean you want to give the programmer a way to say: “destroy this value at the end of this scope; if I mess up and retain a reference to it, let me know”? As opposed to: “keep this alive as long as I have references to it”.
See my sibling answer for some more info on vaporization. We essentially use a single bit embedded in the pointer for the refcount, which can either be ‘1’ (mutable owned) or ‘more than 1’ (immutable shared).
All values not returned from a function and not passed in as parameters to a function will be destroyed at the end of a given scope. The only way to retain a reference to an object would be to return it, which makes it very hard to mess up and retain a reference to it.
While you’re open to big ideas, have you considered capability security?
One of the coolest things I’ve added to awesome-ocap lately is Newspeak, with a really cool module system:
Newspeak is an object-capability programming platform that lets you develop code in your web browser. Like Self, Newspeak is message-based; all names are dynamically bound. However, like Smalltalk, Newspeak uses classes rather than prototypes. The current version of Newspeak runs on top of WASM.
In general, we recommend regularly auditing your dependencies, and only depending on crates whose author you trust.
Or… use something like cap-std to reduce ambient authority, like access to the network.
My understanding is that linguistic-level sandboxing is not really possible. Capability abstraction doesn’t improve security unless capabilities are actually enforced at runtime, by the runtime.
To give two examples:
Deno enforces permissions at the runtime level: unless it is run with --allow-net, no dependency will be able to touch the network. At the same time, there are no linguistic abstractions to express capabilities. (https://deno.land/manual/getting_started/permissions)
Is there a canonical blog post explaining that you can’t generally add security to “allow-all” runtime by layering abstraction on top (as folks would most likely find a hole somewhere), and that instead security should start with adding unforgeable capabilities at the runtime level? It seems to be a very common misconception; cap-std is suggested as a fix in many similar threads.
Sandboxing is certainly possible, with some caveats.
You don’t need any runtime enforcement: unforgeable capabilities (in the sense of object capabilities) can be created with, for example, a private constructor. With a (package/module) private constructor, only your own package can hand out capabilities, and no one else is allowed to create them.
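A minimal Rust sketch of the private-constructor point (all names hypothetical): only the defining module can mint the capability, so a dependency can only “do network things” if its caller hands it the token.

```rust
// The field of NetCap is private, so code outside this module cannot forge
// one; it can only receive one from code that already holds it.
mod caps {
    pub struct NetCap(());

    impl NetCap {
        // Visible inside the defining crate only; a library crate could keep
        // this private and mint the capability once in a trusted entry point.
        pub(crate) fn mint() -> NetCap {
            NetCap(())
        }
    }
}

// A dependency's function: it can only "use the network" if the caller
// chooses to hand over the capability.
fn fetch(_net: &caps::NetCap, url: &str) -> String {
    format!("(pretend we fetched {url})")
}

fn main() {
    let net = caps::NetCap::mint();
    println!("{}", fetch(&net, "https://example.org"));
    // Without `net`, there is simply no way to call `fetch`.
}
```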
cap-std doesn’t help you ensure that deps are safe.
That is true, in the sense that no dependency is forced to use cap-std itself. But, if we assumed for a second that cap-std was the Rust standard library, then all dependencies would need to go through it to do anything useful.
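For a flavor of what that looks like with cap-std as it exists today (a sketch assuming its Dir::open_ambient_dir and read_to_string API; the file name is just an example), the only place ambient authority appears is when the first directory handle is opened:

```rust
use cap_std::ambient_authority;
use cap_std::fs::Dir;

fn main() -> std::io::Result<()> {
    // The one and only use of ambient authority: opening the root handle.
    let dir = Dir::open_ambient_dir(".", ambient_authority())?;
    // A dependency handed `&dir` can read inside it, but has no way to name
    // paths outside it, open sockets, or reach /proc/self/mem.
    let manifest = dir.read_to_string("Cargo.toml")?;
    println!("read {} bytes", manifest.len());
    Ok(())
}
```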
Nothing prevents a dep from, eg, using inline assembly to make a write syscall directly.
This can also be prevented by making inline assembly impossible to use without possessing a capability. You can do the same for FFI: all FFI function invocations have to take an FFI capability. With regards to the Rust-specific unsafe blocks, you can either do the same (capabilities) or compiler-level checks: no dependencies of mine can use unsafe blocks unless I grant them explicit permission (through a compiler flag, for example).
Is there a canonical blog post explaining that you can’t generally add security to “allow-all” runtime by layering abstraction on top […] and that instead security should start with adding unforgeable capabilities at the runtime level?
I would go the other way, and recommend Capability Myths Demolished, which shows that object capabilities are enough to enforce proper security and that they can support revocability.
With a (package/module) private constructor, only your own package can hand out capabilities, and no one else is allowed to create them.
This doesn’t generally work out in practice: linguistic abstractions of privacy are not usually sufficiently enforced by the runtime. In Java/JavaScript you can often use reflection to get the stuff you are not supposed to get. In Rust, you can just cast a number to a function pointer and call that.
I would sum it up as follows: languages protect their abstractions, and good languages make it impossible to accidentally break them. However, practical languages include escape hatches for deliberately circumventing abstractions. In the presence of such escapes, we cannot rely on linguistic abstractions for security. The Java story is a relevant case study: https://orbilu.uni.lu/bitstream/10993/38190/1/paper.txt.
Now, if you design a language with water-tight abstractions, this can work, but I’d probably call the result a “runtime” rather than a language. WASM, for example, can implement capabilities in a proper way, and Rust would run on WASM, using cap-std as an API for the runtime. The security properties won’t be in cap-std, they’ll be in WASM.
This can also be prevented by making inline assembly impossible to use without possesing a capability
I don’t think this general approach would work for Rust. In Rust, unsafe is the defining feature of the language. Moving along these lines would make Rust more like Java in terms of expressiveness, and probably won’t actually improve security (ie, the same class of exploits from the linked paper would work).
I would go the other way, and recommend Capability Myths Demolished
Thanks, going to read that, will report back if I shift my opinions!
EDIT: it seems that the paper is entirely orthogonal to what I am trying to say. The paper argues that the capability model is better than the ACL model. I agree with that! What I am saying is that you can’t implement the model at the language level. That is, I predict that even if Java used capability objects instead of the security manager, it would have been exploitable in more or less the same way, as exploits breaking ACLs would also break capabilities.
Go used to have a model where you could prohibit the use of package unsafe and syscall to try to get security. App Engine, for example, used this. But my understanding is that they ended up abandoning it as unworkable.
Your points are sharp. Note that there was an attempt to make Java capability-safe (Joe-E), and it ended up becoming E because taming Java was too much work. Note also that there was a similar attempt for OCaml (Emily), and it was better at retaining the original language’s behavior, because OCaml is closer than Java to capability-safety.
ECMAScript is almost capability-safe. There are some useful tools, and there have been attempts to define safe subsets like Secure ECMAScript. But you’re right that, just like with memory-safety, a language that is almost capability-safe is not capability-safe.
While you’re free to think of languages like E as runtimes, I would think of E as a language and individual implementations like E-on-Java or E-on-CL as runtimes.
MarkM’s thesis is great, but if you’re not in the mood to dive straight into 200 pages of text, there’s lots of related stuff to warm up with: https://github.com/dckc/awesome-ocap
Can any users familiar with both speak to a comparison with a Nix flake + direnv? Just trying to build my own mental model of Hermit.
I’ve used direnv, and Nix, but not Nix flake. Hermit is more akin to asdf.
It differs from Nix primarily in that there’s no installation required at all when using a Hermit-enabled repo. You just clone it down and Hermit will auto-install itself and any packages in the repo as they’re used.
The FAQ has a section comparing it to different package managers including Nix.
Hermit seems to carve out a smaller scope. In particular, it doesn’t model the process of building tools–just downloading them. And it doesn’t try to manage low level dependencies like libc nor higher level stuff like recreating pypi, npm, and crates.io
And it doesn’t try to provide security beyond https. No hashes, signatures, etc.
This is mostly accurate, except there are opt-in SHA-256 hashes.
Sort of…
They share the issue that, unlike apt (where if you get an option wrong, some C code tells you that you got an option wrong), nix and guix just pass the wrong option down into interpreted code, where you eventually get “string found where integer expected” or some such.
The difference is: guix’s error messages come from guile, a much more mature language runtime than nix.
I try nix once or twice a year, but I have to learn it all over again each time, and it’s rough.
I tried guix for the first time this past week, and even though I hit many issues, I found it much easier to diagnose and address them.
Full disclosure: I do have years of Scheme/Lisp experience, though none of it very recent. I have also done a non-trivial amount of Haskell/OCaml work. I like OCaml more than Scheme. But I hate “programming” shell scripts.
In guix, I see much less occasion to drop down out of scheme to shell scripting.
The work with Informal Systems on the Agoric smart contracts kernel is my first substantive work with TLA+. Connecting the implementation with the formalization via tests generated by a model checker is my favorite part. We don’t have it running in CI yet, but here’s hoping!
Is there anyone working on some better (verifiable?) approaches to crypto contracts? Or is the SotA still “you write solidity very very carefully”? I can’t imagine this approach will survive very long with various services/companies getting hacked or imploding every few weeks. Or at least don’t expect it to grow out of the on/cross chain defi speculation and pyramid schemes without changes.
At agoric.com, we’re working on smart contracts based on object capabilities, which pretty much takes the onlyOwner confused deputy risk off the table.
Partitioning risks is pretty natural with ocaps. The Zoe API with offer safety takes another huge class of risks off the table.
External security reviews, including some formal verification, are in progress.
Is there anyone working on some better (verifiable?) approaches to crypto contracts?
I think the Diem (formerly Libra) people are working on this:
Move is a next generation language for secure, sandboxed, and formally verified programming. Its first use case is for the Diem blockchain, where Move provides the foundation for its implementation. However, Move has been developed with use cases in mind outside a blockchain context as well.
The question presumes a burden to argue against a feature that doesn’t exist. Surely the burden is on those who would add a feature to argue that it’s required or at least cost-effective.
The first form impl didn’t support any attrs on the form elt, iirc. Then action was added so that the page could be at a different url than the query service. POST was originally motivated by limitations on URL / path lengths in servers, if I’m not mistaken…
Stepping back, POST is in some ways complete by itself (think of it as INVOKE rather than INSERT). GET is an optimization for caching and such.
What’s the argument for PUT or DELETE?
Although the original post was tongue-in-cheek, cap-std would disallow things like totally-safe-transmute (discussed at the time), since the caller would need a root capability to access /proc/self/mem (no more sneaking filesystem calls inside libraries!)
Having the entire standard library work with capabilities would be a great thing. Pony (and Monte too, I think) uses capabilities extensively in the standard library, which allows users to trust third party packages: if the package doesn’t use FFI (the compiler can check this) nor requires the appropriate capabilities, it won’t be able to do much: no printing to the screen, using the filesystem, or connecting to the network.
Yes. While Rust cannot be capability-safe (as explored in a sibling thread), this sort of change to a library is very welcome, because it prevents many common sorts of bugs from even being possible for programmers to write. This is the process of taming, and a tamed standard library is a great idea for languages which cannot guarantee capability-safety. The Monte conversation about /proc/self/mem still exists, but is only a partial compromise of security, since filesystem access is privileged by default.
Pony and Monte are capability-safe; they treat every object reference as a capability. Pony uses compile-time guarantees to make modules safe, while Monte uses runtime auditors to prove that modules are correct. The main effect of this, compared to Rust, is to remove the need for a tamed standard library. Instead, Pony and Monte tame the underlying operating system API directly. This is a more monolithic approach, but it removes the possibility of unsafe artifacts in standard-library code.
Yeah, I reckon capabilities would have helped with the security issues surrounding procedural macros too. I hope more new languages take heed of this, it’s a nice approach!
It can’t help with proc macros, unless you run the macros in a (Rust-agnostic) process-wide sandbox like WASI. Rust is not a sandbox/VM language, and has no way to enforce it itself.
In Rust, the programmer is always on the trusted side. Rust’s safety features are for protecting programs from malicious external inputs and/or programmer mistakes when the programmer is cooperating. They’re ill-suited for protecting against intentionally malicious parts of the same program.
We might trust the compiler while compiling proc macros, though, yes? And the compiler could prevent calling functions that use ambient authority (along with unsafe rust). That would provide capability security, no?
No, we can’t trust the compiler. It hasn’t been designed to be a security barrier. It also sits on top of LLVM and C linkers that also historically assumed that the programmer is trusted and in full control.
Rust will allow the programmer to break and bypass the language’s rules. There are obvious officially-sanctioned holes, like #[no_mangle] (this works in Rust too) and linker options. There are less obvious holes like hash collisions of TypeId, and a few known soundness bugs. Since security within the compiler was never a concern (these are bugs on the same side of the airtight hatchway), there’s likely many many more.
It’s like a difference between a “Do Not Enter” sign and a vault. Both keep people out, but one is for stopping cooperating people, and the other is against determined attackers. It’s not easy to upgrade a “Do Not Enter” sign to be a vault.
You can disagree with the premise of trusting the compiler, but I think the argument is still valid. If the compiler can be trusted, then we could have capability security for proc macros.
Whether to trust the compiler is a risk that some might accept, others would not.
But this makes the situation entirely hypothetical. If Rust was a different language, with different features, and a different compiler implementation, then you could indeed trust that not-Rust compiler.
The Rust language as it exists today has many features that intentionally bypass compiler’s protections if the programmer wishes so.
Between “do not enter” signs and vaults, a lot of business gets done with doors, even with a known risk that the locks that can be picked.
You seem to argue that there is no such thing as safe rust or that there are no norms for denying unsafe rust.
Rust’s safety is already often misunderstood. fs::remove_dir_all("/") is safe by Rust’s definition. I really don’t want to give people an idea that you could ban a couple of features and make Rust have safety properties of JavaScript in a browser. Rust has an entirely different threat model. The “safe” subset of Rust is not a complete language, and it’s closer to being a linter for undefined behavior than a security barrier.
Security promises in computing are often binary. What does it help if a proc macro can’t access the filesystem through std::fs, but can by making a syscall directly? It’s a few lines of code extra for the attacker, and a false sense of security for users.
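To illustrate how little it takes, here is a hedged sketch (assuming a Unix target and the libc crate as a dependency) of a write that never touches std::fs or std::io:

```rust
// Not an exploit, just a demonstration that "no std::fs" is not the same as
// "no I/O": the dependency calls the OS directly.
fn main() {
    let msg = b"hello from a raw write(2) syscall\n";
    unsafe {
        // fd 1 is stdout; this bypasses std::io and std::fs entirely.
        libc::write(1, msg.as_ptr() as *const libc::c_void, msg.len());
    }
}
```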
Ok, let’s talk binary security properties. Object Capability security consists of:
There are plenty of formal proofs of the security properties that follow… patterns for achieving cooperation without vulnerability. See peer-reviewed articles in https://github.com/dckc/awesome-ocap
This cap-std work aims to address #3. For example, with compiler support to deny ambient authority, it addresses std::fs.
Safe rust, especially run on wasm, is memory safe much like JS, yes? i.e. safe modulo bugs. Making a syscall requires using asm, which is not in safe rust.
Rust’s encapsulation is at the module level rather than object level, but it’s there.
While this cap-std and tools to deny ambient authority are not as mature as std, I do want to give people an idea that this is a good approach to building scalable secure systems.
I grant that the relevant threat model isn’t emphasized around rust the way it is around JS, but I don’t see why rust would have to be a different language to shift this emphasis.
I see plenty of work on formalizing safe rust. Safety problems seem to be considered serious bugs, not intentional design decisions.
In presence of malicious code, Rust on WASM is exactly as safe as C on WASM. All of the safety is thanks to the WASM VM, not due to anything that Rust does.
Safe Rust formalizations assume the programmer won’t try to exploit bugs in the compiler, and the Rust compiler has exploitable bugs. For example, symbol mangling uses a hash that has 1 in 2⁶⁴ chance of colliding (or less due to bday attack). I haven’t heard of anyone running into this by accident, but a determined attacker could easily compute a collision that makes their cap-approved innocent_foo() actually link to the code of evil_bar() and bypass whatever formally-proven safety the compiler tried to have.
Nix is often criticized for poor documentation. I think that’s because the UX sends the hapless user running for the docs so much more than other tools. The nix cli is a thin wrapper around a huge pile of interpreted code. If you get one of the flags wrong, you don’t get “the only options for that flag are A, B, and C”. You get “string found where int expected” in a stack trace.
I see quite a bit of interesting Ada/SPARK work.
I haven’t tried it out myself, but I appreciate the emphasis on safety and formal verification.
A user:sysadmin ratio of 1:1 has never been economical for email… not even before so much firepower went into spam. Now the economics are even worse.
Yeah, I’m always reading about the price. But my email server costs are:
per year.
With only 4 users (and I used to have more) that’s already breaking the price point of a lot of hosted solutions. I’m not actually doing this to save money, but I’m actually saving a little over Fastmail (last I checked) and I could be running on a cheaper box. The hosted Mailcow could actually be a little cheaper, for my case.
For what an anecdotal datapoint is worth, the last time I had to do any actual administrative work on my mail server outside of the occasional yum update was…[checks etckeeper history]…four years ago. And realistically it was maybe 30 minutes worth of work.
I’m more worried about the time I might have to spend when Google/Microsoft/… decide they don’t like my mails anymore, and I’m left figuring out why, racing against the clock.
True, but I was mostly riffing off the “but it’s cheaper to let someone host it”. Only if nothing goes wrong and you keep on writing tickets and emails or be on the phone with support.
Of course my time is not free - but I choose to learn about stuff like email and not get too rusty. I actually do get rusty because I realistically don’t touch it for anything than simple security upgrades.
I still like Zulip after about 5 years of use, e.g. see https://oilshell.zulipchat.com . They added public streams last year, so you don’t have to log in to see everything. (Most of our streams pre-date that and require login)
It’s also open source, though we’re using the hosted version: https://github.com/zulip
Zulip seems to be A LOT lower latency than other solutions.
When I use Slack or Discord, my keyboard feels mushy. My 3 GHz CPU is struggling to render even a single character in the browser. [1]
Aside from speed, the big difference between Zulip and the others is that conversations have titles. Messages are grouped by topic.
The history and titles are extremely useful for avoiding “groundhog day” conversations – I often link back to years old threads and am myself informed by them!
(Although maybe this practice can make people “shy” about bringing up things, which isn’t the message I’d like to send. The search is pretty good though.)
When I use Slack, it seems like a perpetually messy and forgetful present.
I linked to a comic by Julia Evans here, which illustrates that feature a bit: https://www.oilshell.org/blog/2018/04/26.html
[1] Incidentally, same with VSCode / VSCodium? I just tried writing a few blog posts with it, because of its Markdown preview plugin, and it’s ridiculously laggy? I can’t believe it has more than 50% market share. Memories are short. It also has the same issue of being controlled by Microsoft with non-optional telemetry.
+1 on zulip.
category theory https://categorytheory.zulipchat.com/ rust-lang https://rust-lang.zulipchat.com/
These are examples of communities that moved there and are way easier to follow than discord or slack.
Zulip is light years ahead of everything else in async org-wide communications. The way the messages are organized makes it an extremely powerful tool for distributed teams and cross-team collaboration.
The problems:
We used IRC and nobody except IT folks used it. We switched to XMPP and some of the devs used it as well. We switched to Zulip and everyone in the company uses it.
We self-host. We take a snapshot every few hours and send it to the backup site, just in case. If Zulip were properly federate-able, we could just have two live servers all the time. That would be great.
Is this actually a problem? I don’t think most people want federation, but easier SSO and single client for multiple servers gets you most of what people want without the significant burdens of federation (scaling, policy, etc.).
Sorry for a late reply.
It is definitely a problem. It makes it hard for two organizations to create shared streams. This comes up e.g. when an organization with Zulip for internal communications wants to contract another company for e.g. software development and wants them to integrate into their communications. The contractor needs accounts at the client’s company. Moreover, if multiple clients do this, the people working at the contracted company now have multiple scattered accounts at clients’ instances.
Creating a stream shared and replicated across the relevant instances would be way easier, probably more secure, and definitely more scalable than adding WAYF to the relevant SSOs. The development effort that would have to go into making the web client connect to multiple instances would probably also be rather high, and it would not be possible to perform it incrementally, unlike shared streams, which might simply have some features disabled (e.g. custom emojis) until a way forward is found for them.
But I am not well versed in the Zulip internals, so take this with a couple grains of salt.
EDIT: I figure you might be thinking of e.g. open source projects each using their own Zulip. That sucks and it would be nice to have an SSO service for all of them. Or even have them somehow bound together in some hypothetical multi-server client. I would love that as well, but I am worried that it just wouldn’t scale (performance-wise) without some serious thought about the overall architecture. Unless you are thinking about the Pidgin-style multi-client approach solely at the client level.
This is a little off topic, but Sublime Text is a vastly more performant alternative to VSCode.
Also off-topic: performant isn’t a word.
It is though. https://dictionary.cambridge.org/dictionary/english/performant
I feel like topic-first organization of chats, which Zulip does, is the way to go.
VSCode’s telemetry can be disabled
https://code.visualstudio.com/docs/getstarted/telemetry#_disable-telemetry-reporting
It still sends some telemetry even if you do all that
https://github.com/VSCodium/vscodium/blob/master/DOCS.md#disable-telemetry
That page is a “dark pattern” to make you think you can turn it off, when you can’t.
In addition, extensions also have their own telemetry, not covered by those settings. From the page you linked:
I’ve spent several minutes researching that, and, from the absence of clear evidence that telemetry is still being sent if disabled (which evidence should be easy to collect for an open codebase), I conclude that this is a misleading statement.
The way I understand it, VS Code is a “modern app”, which uses a boatload of online services. It does network calls to update itself, update extensions, search in the settings and otherwise provide functionality to the user. Separately, it collects gobs of data without any other purpose except data collection.
Telemetry disables the second thing, but not the first thing. But the first thing is not telemetry!
It took me a while, but the source of my claim is from VSCodium itself, and this blog post:
https://www.roboleary.net/tools/2022/04/20/vscode-telemetry.html
https://github.com/VSCodium/vscodium/blob/master/DOCS.md#disable-telemetry
Also, in 2021, they apparently tried to deprecate the old setting and introduce a new one:
https://news.ycombinator.com/item?id=28812486
https://imgur.com/a/nxvH8cW
So basically it seems like it was the old trick of resetting the setting on updates, which was again very common in the Winamp, Flash, and JVM days – dark patterns.
However it looks like some people from within the VSCode team pushed back on this.
Having worked in big tech, this is very believable – there are definitely a lot of well intentioned people there, but they are fighting the forces of product management …
I skimmed the blog post and it seems ridiculously complicated, when it just doesn’t have to be.
So I guess I would say it’s POSSIBLE that they actually do respect the setting in ALL cases, but I personally doubt it.
I mean it wouldn’t even be a dealbreaker for me if I got a fast and friendly markdown editing experience! But it was very laggy (with VSCodium on Ubuntu.)
Yeah, “It still sends some telemetry even if you do all that” is exactly what VS Codium claim. My current belief is that’s false. Rather, it does other network requests, unrelated to telemetry.
That is an … interesting … design choice.
At the risk of belaboring the point, it’s a dark pattern.
This was all extremely common in the Winamp, Flash, and JVM days.
The thing that’s sad is that EVERYTHING is dark patterns now, so this isn’t recognized as one. People will actually point to the page and think Microsoft is being helpful. They probably don’t even know what the term “dark pattern” means.
If it were not a dark pattern, then the page would be one sentence, telling you where the checkbox is.
I’d say that most people haven’t been exposed to genuinely user-centric experiences in most areas of tech. In fact, I’d go so far as to say that most tech stacks in use today are actually designed to prevent the development of same.
The thing that feels new is how non-user-centric development tools are nowadays. And the possibility of that altering the baseline perception of what user-centric tech looks like.
Note: feels; it’s probably not been overly-user-centric in the past, but they were a bit of a haven compared to other areas of tech that have overt contempt for users (social media, mobile games, etc).
How would you do this differently? The same is true about any system with plugins, including, eg, Emacs and Vim: nothing prevents a plug-in from calling home, except for the goodwill of the author.
Kinda proves the point, tbh. To prevent a plugin from calling home, you have to actually try to design the plugin API to prevent it.
I think the question stands: how would you do it differently? What API would allow plugins to run arbitrary code—often (validly) including making network requests to arbitrary servers—but prevent them from phoning home?
Good question! The first option is to not let them make arbitrary network requests, or to require the user to whitelist them. How often does your editor plugin really need to make network requests? The editor can check for updates and download data files on install for you. Whitelisting GitHub Copilot or whatever doesn’t feel like too much of an imposition.
Capability security is a general approach. In particular, https://github.com/endojs/endo
For more… https://github.com/dckc/awesome-ocap
More fun: you have to design a plugin API that doesn’t allow phoning home but does allow using network services. This is basically impossible. You can define a plugin mechanism that has fine-grained permissions and a UI that comes with big red warnings when things want network permissions though and enforce policies in your store that they must report all tracking that they do.
Traditionally, this is prevented by repos and maintainers who patch the package if it’s found to be calling home without permission. And since the authors know this, they largely don’t add such functionality in the first place. Basically, this article: http://kmkeen.com/maintainers-matter/ (http only, not https).
We don’t necessarily need mandatory technical enforcement for this, it’s more about culture and expectations.
I think that’s the state of the art in many ecosystems, for better or worse. I’d say:
I don’t know anything about the VSCode ecosystem, but I imagine that there’s a way to deal with say plugins that start scraping everyone’s credit card numbers out of their e-mail accounts.
Every ecosystem / app store-type thing has to deal with that. My understanding is that for the iOS and Android app stores, the process is pretty manual. It’s a mix of technical enforcement, manual review, and documented culture/expectations.
I’d also not rule out a strict sandbox that can’t make network requests. I haven’t written these types of plugins, but as others pointed out, I don’t really see why they would need to access the network. They could be passed the info they need, capability style, rather than searching for it all over your computer and network!
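As a rough sketch of what “capability style” could look like for editor plugins, here is a hypothetical Rust-flavored plugin API (none of this is VS Code’s actual API): the host passes each plugin only its input plus, optionally, a narrow network handle it explicitly granted.

```rust
// A plugin with no handle has nothing to phone home with.
struct HttpCap {
    allowed_hosts: Vec<String>, // decided by the host/user, not the plugin
}

impl HttpCap {
    fn get(&self, url: &str) -> Result<String, String> {
        if self.allowed_hosts.iter().any(|h| url.starts_with(h.as_str())) {
            Ok(format!("(pretend response from {url})"))
        } else {
            Err(format!("blocked: {url} is not on this plugin's allowlist"))
        }
    }
}

trait Plugin {
    // The plugin receives its data and capabilities explicitly.
    fn render_preview(&self, markdown: &str, http: Option<&HttpCap>) -> String;
}

struct MarkdownPreview;

impl Plugin for MarkdownPreview {
    fn render_preview(&self, markdown: &str, _http: Option<&HttpCap>) -> String {
        // A preview plugin never needed the network in the first place.
        format!("<p>{}</p>", markdown)
    }
}

fn main() {
    let preview = MarkdownPreview;
    println!("{}", preview.render_preview("hello *world*", None));

    // A plugin that genuinely needs a service gets a handle scoped to it.
    let cap = HttpCap { allowed_hosts: vec!["https://api.example.com".into()] };
    println!("{:?}", cap.get("https://api.example.com/v1/complete"));
    println!("{:?}", cap.get("https://telemetry.example.net/collect"));
}
```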
That’s exactly how it works: https://code.visualstudio.com/api/extension-guides/telemetry
Sure, but they don’t offer a “disable telemetry” setting.
What I’d do, would be to sandbox plugins so they can’t do any network I/O, then have a permissions system.
You’d still rely on an honour system to an extent; because plugin authors could disguise the purpose of their network operations. But you could at least still have a single configuration point that nominally controlled telemetry, and bad actors would be much easier to spot.
There is a single configuration point which nominally controls the telemetry, and extensions should respect it. This is clearly documented for extension authors here: https://code.visualstudio.com/api/extension-guides/telemetry#custom-telemetry-setting.