Threads for dckc

  1. 9

    Previous discussion on lobste.rs. The creator, @slightknack, is also on lobste.rs!

    1. 14

      Hey y’all, I was surprised to see this here today! I guess a few updates since the last time this was posted:

      • Currently we’re in the middle of a massive rewrite of the entire compiler pipeline. It currently lives in the big-refactor branch, and we’re trying to get it done by July to create the next release.

      • Since then I’ve done a lot of research and thinking on traits and effect systems, and think I have a neat unified approach that I’d like to start prototyping. We’ve also been working on the way Passerine integrates with Rust, and we hope to provide a safe sandboxed parallelizable runtime with a high-level API that integrates well with existing Rust libraries.

      • We’ve also been rewriting the macro system to allow for compile-time evaluation. This will be much more powerful lisp-style procedural macro system. Lisp is a powerful programming language for manipulating lists, which is why lisp macros, that operate on lists, fit the language so well. Passerine aims to be a powerful programming language for manipulating ADTs, so by representing the language as an ADT to be operated on by macros, we hope to capture the same magic and power of lisp’s macro system. (Note that the current rule-based macro system will still be available, just implemented in terms of the new one.)

      • The majority of the discussion around the development of the language happens on our Discord server[1]. We have meetings the first Saturday of each month with presentations on PLT, compiler engineering, and other neat stuff.

      • I’ve been working on an experimental BEAM-style runtime called Qualm that has a custom allocator that supports vaporization (the memory management technique Passerine uses, essentially tagged pointer runtime borrow checking.) I’m not sure it’ll be ready for the next release (as it requires type layouts to be specified, and the typechecker is currently WIP), but it is a nice demo for what I think is possible for the language.

      I’m here to answer any questions you may have about the language! I’m based in Europe so I might not see them till tomorrow morning, but don’t be afraid to ask, even if you think it’s a silly question.

      [1]: I tried starting a Matrix channel after people on lobste.rs brought it up last time. After a grand total of zero users had joined six months later, I went ahead and scrapped it. I love Matrix, so I might consider bridging the server in the future.

      1. 3

        I’m very curious about the differences between what Passerine does and what Perceus does in terms of both compile-time and runtime memory management!

        (For context, I’m working on a programming language that currently uses Perceus.)

        1. 3

          Sorry for the late response. I started writing something, but like 2k words later I realized it should probably be like a blog post, and not a random comment. I’ll summarize the gist of the argument, and link to the blog post later once I’ve finished it:

          So, as I’m sure you know, Perceus is a form of reference counting (limited to inductive types) that drastically reduces the number of reference count instructions that are produced. This makes reference counting more efficient, but Perceus is still reference counting at its core.

          Vaporization uses a single tag bit to keep track of whether a reference is currently immutable shared or mutable owned. This is a bit different than reference counting, as the number of shared references is not kept track of.

          When passing an object reference to a function, we set the reference tag to immutable shared if the reference is used again later in the caller’s scope. If this is the last-use of a value, we leave it as-is, allowing for efficient in-place updates in linear code. To update an object, the reference we have to it must be mutable owned; if the reference is immutable shared instead, the runtime will make a copy of the portion required. All references returned from functions must be mutable owned; when a function returns, all other mutable owned references tied to that stack frame are deallocated.

          In effect, a mutable owned reference is owned by a single stack frame; ownership can be passed up or down the call-stack on function call or return. When calling a function, we create a child stack frame that is guaranteed to have a shorter lifetime than the parent stack frame. Therefore, we can make as many immutable references as we’d like to data owned by parent stack frames, because all immutable references to that data will disappear when the child stack frame returns to the parent stack frame.

          Neither Perceus nor Vaporization allows cyclic data structures to be created (without requiring users to manage circular references themselves). This operating assumption drastically limits the kinds of data structures that can exist (in terms of pointer graphs). Because object graphs can only be trees rooted at the stack (e.g. anything trivially serializable to JSON), it’s very simple to keep track of when things should be pruned from the heap.

          Someone much smarter than I am described Vaporization as “using the stack as a topological sort of the acyclic object graph.” I’m not sure whether this statement is fully correct, but I think it captures the gist of the idea.

          So, to answer your question, here’s what Vaporization does with respect to compile-time and runtime memory management:

          • At compile-time, we annotate the last use of every variable in a given scope. When generating code, if we encounter a non-last-use variable, we emit an instruction to set the tag of the reference to shared immutable. (We also ensure that all closure captures are immutable).

          • At runtime, we update reference tags as annotated at compile-time, and we create deep copies of (portions of) objects as required when converting references from immutable shared to mutable owned.

          If we know type layouts, it’s possible to inline the code responsible for copying data and updating reference tags such that there is no runtime dependency. This makes Vaporization suitable for both static and dynamic languages alike.
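
          To make the tag-and-copy idea concrete, here is a minimal Rust sketch (my own illustration, not Passerine’s actual runtime: Ref, Node, mark_shared, and make_mut are made-up names, and a bool field stands in for the pointer tag bit):

          // Illustration only: `shared` plays the role of the tag bit described above.
          #[derive(Clone, Debug)]
          struct Node { label: String, children: Vec<Node> }

          struct Ref { shared: bool, value: Box<Node> }

          impl Ref {
              fn new(value: Node) -> Self { Ref { shared: false, value: Box::new(value) } }

              // What the compiler would emit before a non-last use: flip the tag.
              fn mark_shared(&mut self) { self.shared = true; }

              // Mutation needs a mutable owned reference; if the tag says shared,
              // copy first (a real runtime copies only the portion required, and
              // leaves the shared original in place for the other references).
              fn make_mut(&mut self) -> &mut Node {
                  if self.shared {
                      self.value = Box::new((*self.value).clone());
                      self.shared = false;
                  }
                  &mut self.value
              }
          }

          fn main() {
              let mut r = Ref::new(Node { label: "root".into(), children: vec![] });
              r.make_mut().label.push('!');   // last use in linear code: in-place update
              r.mark_shared();                // a later use exists, so tag as shared
              r.make_mut().label.push('?');   // mutation now copies before writing
              println!("{:?}", r.value);
          }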

          Hope this helps!

          PS—Oh, I see you’re the author of Roc! I remember watching your talk “Outperforming Imperative with Pure Functional Languages” a while back, it was quite interesting!

          1. 1

            Very helpful, thank you so much for the detailed explanation!

            Also, glad you found the talk interesting. Feel free to DM me if you have any questions about Roc!

        2. 2

          I’m interested in the run-time borrow checking idea. The part that makes sense to me is doing copy-on-write with refcounting: so you have pass-by-value semantics, but you can also do an in-place update when the refcount is 1.

          But by “borrow checking”, do you mean you want to give the programmer a way to say: “destroy this value at the end of this scope; if I mess up and retain a reference to it, let me know”? As opposed to: “keep this alive as long as I have references to it”.

          1. 1

            See my sibling answer for some more info on vaporization. We essentially use a single bit embedded in the pointer for the refcount, which can either be ‘1’ (mutable owned) or ‘more than 1’ (immutable shared).

            All values not returned from a function and not passed in as parameters to a function will be destroyed at the end of a given scope. The only way to retain a reference to an object would be to return it, which makes it very hard to mess up and retain a reference to it.
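
            To picture the single-bit “refcount” embedded in the pointer, here is a tiny illustration (not Qualm’s actual representation): aligned allocations never use the low bit of their address, so that bit is free to serve as the owned/shared tag.

            fn tag_shared(ptr: usize) -> usize { ptr | 1 }
            fn is_shared(ptr: usize) -> bool { ptr & 1 == 1 }
            fn untag(ptr: usize) -> usize { ptr & !1 }

            fn main() {
                let raw = Box::into_raw(Box::new(42u64)) as usize; // 8-byte aligned, low bit is 0
                let tagged = tag_shared(raw);
                assert!(is_shared(tagged) && !is_shared(raw));
                // Reconstruct the Box from the untagged address so the allocation is freed.
                let value = unsafe { Box::from_raw(untag(tagged) as *mut u64) };
                println!("shared? {}, value = {}", is_shared(tagged), value);
            }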

          2. 1

            While you’re open to big ideas, have you considered capability security?

            One of the coolest things I’ve added to awesome-ocap lately is Newspeak, with a really cool module system:

            Newspeak is an object-capability programming platform that lets you develop code in your web browser. Like Self, Newspeak is message-based; all names are dynamically bound. However, like Smalltalk, Newspeak uses classes rather than prototypes. The current version of Newspeak runs on top of WASM.

        1. 5

          In general, we recommend regularly auditing your dependencies, and only depending on crates whose author you trust.

          Or… Use something like cap-std to reduce ambient authority like access to the network.

          1. 8

            My understanding is that linguistic-level sandboxing is not really possible. Capability abstraction doesn’t improve security unless capabilities are actually enforced at runtime, by the runtime.

            To give two examples:

            • cap-std doesn’t help you ensure that deps are safe. Nothing prevents a dep from, eg, using inline assembly to make a write syscall directly.
            • deno doesn’t allow disk access by default. If you don’t pass --allow-net, no dependency will be able to touch the network. At the same time, there are no linguistic abstractions to express capabilities. (https://deno.land/manual/getting_started/permissions)

            Is there a canonical blog post explaining that you can’t generally add security to “allow-all” runtime by layering abstraction on top (as folks would most likely find a hole somewhere), and that instead security should start with adding unforgeable capabilities at the runtime level? It seems to be a very common misconception; cap-std is suggested as a fix in many similar threads.

            1. 2

              Sandboxing is certainly possible, with some caveats.

              You don’t need any runtime enforcement: unforgeable capabilities (in the sense of object capabilities) can be created with, for example, a private constructor. With a (package/module) private constructor, only your own package can hand out capabilities, and no one else is allowed to create them.

              cap-std doesn’t help you ensure that deps are safe.

              That is true, in the sense that no dependency is forced to use cap-std itself. But, if we assumed for a second that cap-std was the rust standard library, then all dependencies would need to go through it to do anything useful.

              Nothing prevents a dep from, eg, using inline assembly to make a write syscall directly.

              This can also be prevented by making inline assembly impossible to use without possessing a capability. You can do the same for FFI: all FFI function invocations have to take an FFI capability. With regards to the Rust-specific unsafe blocks, you can either do the same (capabilities) or compiler-level checks: no dependencies of mine can use unsafe blocks unless I grant them explicit permission (through a compiler flag, for example).
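
              As a hypothetical sketch of what “FFI invocations have to take an FFI capability” could look like (FfiCap and call_foreign are invented; no such mechanism exists in Rust today):

              pub struct FfiCap(());  // private field: only the program's trusted root can mint it

              pub fn call_foreign(_cap: &FfiCap, f: unsafe extern "C" fn()) {
                  // Holding an &FfiCap is the permission slip for running foreign code.
                  unsafe { f() }
              }

              unsafe extern "C" fn noop() {}

              fn main() {
                  let cap = FfiCap(()); // minted here, at the trusted root
                  call_foreign(&cap, noop);
              }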

              Is there a canonical blog post explaining that you can’t generally add security to “allow-all” runtime by layering abstraction on top […] and that instead security should start with adding unforgeable capabilities at the runtime level?

              I would go the other way, and recommend Capability Myths Demolished, which shows that object capabilities are enough to enforce proper security and that they can support revocation.

              1. 4

                With a (package/module) private constructor, only your own package can hand out capabilities, and no one else is allowed to create them.

                This doesn’t generally work out in practice: linguistic abstractions of privacy are not usually sufficiently enforced by the runtime. In Java/JavaScript you can often use reflection to get the stuff you are not supposed to get. In Rust, you can just cast a number to a function pointer and call that.
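
                For example, a minimal sketch of that escape hatch (benign here, but enough to show that module privacy is not a security boundary once raw addresses are in play):

                fn target() { println!("called through a pointer forged from a plain integer"); }

                fn main() {
                    // Launder the function's address through usize and call it again;
                    // malicious code can do the same with any address it can guess or leak.
                    let addr: usize = target as usize;
                    let f: fn() = unsafe { std::mem::transmute(addr) };
                    f();
                }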

                I would sum it up as follows: languages protect their abstractions, and good languages make it impossible to accidentally break them. However, practical languages include escape hatches for deliberately circumventing abstractions. In the presence of such escapes, we cannot rely on linguistic abstractions for security. The Java story is a relevant case study: https://orbilu.uni.lu/bitstream/10993/38190/1/paper.txt.

                Now, if you design a language with water-tight abstractions, this can work, but I’d probably call the result a “runtime” rather than a language. WASM, for example, can implement capabilities in a proper way, and Rust would run on WASM, using cap-std as an API for the runtime. The security properties won’t be in cap-std, they’ll be in WASM.

                This can also be prevented by making inline assembly impossible to use without possessing a capability

                I don’t think this general approach would work for Rust. In Rust, unsafe is the defining feature of the language. Moving along these lines would make Rust more like Java in terms of expressiveness, and probably won’t actually improve security (ie, the same class of exploits from the linked paper would work).

                I would go the other way, and recommend Capability Myths Demolished

                Thanks, going to read that, will report back if I shift my opinions!

                EDIT: it seems that the paper is entirely orthogonal to what I am trying to say. The paper argues that the cap model is better than the ACL model. I agree with that! What I am saying is that you can’t implement the model at the language level. That is, I predict that even if Java used capability objects instead of a security manager, it would have been exploitable in more or less the same way, as exploits breaking ACLs would also break capabilities.

                1. 3

                  Go used to have a model where you could prohibit the use of package unsafe and syscall to try to get security. App Engine, for example, used this. But my understanding is that they ended up abandoning it as unworkable.

                  1. 2

                    Your points are sharp. Note that there was an attempt to make Java capability-safe (Joe-E), and it ended up becoming E because taming Java was too much work. Note also that there was a similar attempt for OCaml (Emily), and it was better at retaining the original language’s behavior, because OCaml is closer than Java to capability-safety.

                    ECMAScript is almost capability-safe. There are some useful tools, and there have been attempts to define safe subsets like Secure ECMAScript. But you’re right that, just like with memory-safety, a language that is almost capability-safe is not capability-safe.

                    While you’re free to think of languages like E as runtimes, I would think of E as a language and individual implementations like E-on-Java or E-on-CL as runtimes.

              2. 2

                Why not both?

              1. 4

                MarkM’s thesis is great, but if you’re not in the mood to dive straight into 200 pages of text, there’s lots of related stuff to warm up with: https://github.com/dckc/awesome-ocap

                1. 3

                  Can any users familiar with both speak to how Hermit compares with a Nix flake + direnv? Just trying to build my own mental model of Hermit.

                  1. 4

                    I’ve used direnv, and Nix, but not Nix flake. Hermit is more akin to asdf.

                    It differs from Nix primarily in that there’s no installation required at all when using a Hermit-enabled repo. You just clone it down and Hermit will auto-install itself and any packages in the repo as they’re used.

                    The FAQ has a section comparing it to different package managers including Nix.

                    1. 2

                      Hermit seems to carve out a smaller scope. In particular, it doesn’t model the process of building tools–just downloading them. And it doesn’t try to manage low level dependencies like libc nor higher level stuff like recreating pypi, npm, and crates.io

                      And it doesn’t try to provide security beyond https. No hashes, signatures, etc.

                      1. 1

                        This is mostly accurate, except there are opt-in SHA-256 hashes.

                    1. 3

                      Do any of these shortcomings also exist in Guix?

                      1. 5

                        Sort of…

                        They share the issue that, unlike apt (where if you get an option wrong, some C code tells you that you got an option wrong), nix and guix just pass the wrong option down into interpreted code, where you eventually get “string found where integer expected” or some such.

                        The difference is: guix’s error messages come from guile, a much more mature language runtime than nix.

                        I try nix once or twice a year, but I have to learn it all over again each time, and it’s rough.

                        I tried guix for the first time this past week, and even though I hit many issues, I found it much easier to diagnose and address them.

                        Full disclosure: I do have years of scheme/lisp experience, though none of it very recent. I have also done a non-trivial amount of Haskell/OCaml work. I like ocaml more than scheme. But I hate “programming” shell scripts.

                        In guix, I see much less occasion to drop down out of scheme to shell scripting.

                      1. 1

                        The work with Informal Systems on the Agoric smart contracts kernel is my first substantive work with TLA+. Connecting the implementation with the formalization via tests generated by a model checker is my favorite part. We don’t have it running in CI yet, but here’s hoping!

                        1. 1

                          Is there anyone working on some better (verifiable?) approaches to crypto contracts? Or is the SotA still “you write solidity very very carefully”? I can’t imagine this approach will survive very long with various services/companies getting hacked or imploding every few weeks. Or at least don’t expect it to grow out of the on/cross chain defi speculation and pyramid schemes without changes.

                          1. 1

                            At agoric.com, we’re working on smart contracts based on object capabilities, which pretty much takes the onlyOwner confused deputy risk off the table.

                            Partitioning risks is pretty natural with ocaps. The Zoe API with offer safety takes another huge class of risks off the table.

                            External security reviews, including some formal verification, are in progress.

                            1. 1

                              Is there anyone working on some better (verifiable?) approaches to crypto contracts?

                              I think the Diem (formerly Libra) people are working on this

                              Move is a next generation language for secure, sandboxed, and formally verified programming. Its first use case is for the Diem blockchain, where Move provides the foundation for its implementation. However, Move has been developed with use cases in mind outside a blockchain context as well.

                            1. 2

                                The question presumes a burden to argue against a feature that doesn’t exist. Surely the burden is on those who would add a feature to argue that it’s required, or at least cost-effective.

                              The first form impl didn’t support any attrs on the form elt, iirc. Then action was added so that the page could be at a different url than the query service. POST was originally motivated by limitations on URL / path lengths in servers, if I’m not mistaken…

                              Stepping back, POST is in some ways complete by itself (think of it as INVOKE rather than INSERT). GET is an optimization for caching and such.

                              What’s the argument for PUT or DELETE?

                              1. 8

                                Although the original post was tongue-in-cheek, cap-std would disallow things like totally-safe-transmute (discussed at the time), since the caller would need a root capability to access /proc/self/mem (no more sneaking filesystem calls inside libraries!)

                                Having the entire standard library work with capabilities would be a great thing. Pony (and Monte too, I think) uses capabilities extensively in the standard library, which allows users to trust third party packages: if the package doesn’t use FFI (the compiler can check this) and doesn’t require the appropriate capabilities, it won’t be able to do much: no printing to the screen, using the filesystem, or connecting to the network.
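
                                For a flavour of what this looks like with cap-std itself, a small sketch (method names recalled from the cap-std docs; double-check the crate before relying on them):

                                use cap_std::ambient_authority;
                                use cap_std::fs::Dir;

                                fn main() -> std::io::Result<()> {
                                    // The one place ambient authority appears: the application's root
                                    // turns a path of its choosing into a Dir capability.
                                    let dir = Dir::open_ambient_dir(".", ambient_authority())?;
                                    // Everything below can only reach files under that directory;
                                    // there is no way to name /proc/self/mem from here.
                                    for entry in dir.entries()? {
                                        println!("{:?}", entry?.file_name());
                                    }
                                    Ok(())
                                }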

                                1. 3

                                  Yes. While Rust cannot be capability-safe (as explored in a sibling thread), this sort of change to a library is very welcome, because it prevents many common sorts of bugs from even being possible for programmers to write. This is the process of taming, and a tamed standard library is a great idea for languages which cannot guarantee capability-safety. The Monte conversation about /proc/self/mem still exists, but is only a partial compromise of security, since filesystem access is privileged by default.

                                  Pony and Monte are capability-safe; they treat every object reference as a capability. Pony uses compile-time guarantees to make modules safe, while Monte uses runtime auditors to prove that modules are correct. The main effect of this, compared to Rust, is to remove the need for a tamed standard library. Instead, Pony and Monte tame the underlying operating system API directly. This is a more monolithic approach, but it removes the possibility of unsafe artifacts in standard-library code.

                                  1. 3

                                    Yeah, I reckon capabilities would have helped with the security issues surrounding procedural macros too. I hope more new languages take heed of this, it’s a nice approach!

                                    1. 4

                                      It can’t help with proc macros, unless you run the macros in a (Rust-agnostic) process-wide sandbox like WASI. Rust is not a sandbox/VM language, and has no way to enforce it itself.

                                      In Rust, the programmer is always on the trusted side. Rust safety features are for protecting programs from malicious external inputs and/or programmer mistakes when the programmer is cooperating. They’re ill-suited for protecting programs from intentionally malicious parts of the same program.

                                      1. 2

                                        We might trust the compiler while compiling proc macros, though, yes? And the compiler could prevent calling functions that use ambient authority (along with unsafe rust). That would provide capability security, no?

                                        1. 5

                                          No, we can’t trust the compiler. It hasn’t been designed to be a security barrier. It also sits on top of LLVM and C linkers that also historically assumed that the programmer is trusted and in full control.

                                          Rust will allow the programmer to break and bypass the language’s rules. There are obvious officially-sanctioned holes, like #[no_mangle] (this works in Rust too) and linker options. There are less obvious holes like hash collisions of TypeId, and a few known soundness bugs. Since security within the compiler was never a concern (these are bugs on the same side of the airtight hatchway), there are likely many, many more.

                                          It’s like a difference between a “Do Not Enter” sign and a vault. Both keep people out, but one is for stopping cooperating people, and the other is against determined attackers. It’s not easy to upgrade a “Do Not Enter” sign to be a vault.

                                          1. 3

                                            You can disagree with the premise of trusting the compiler, but I think the argument is still valid. If the compiler can be trusted, then we could have capability security for proc macros.

                                            Whether to trust the compiler is a risk that some might accept, others would not.

                                            1. 3

                                              But this makes the situation entirely hypothetical. If Rust was a different language, with different features, and a different compiler implementation, then you could indeed trust that not-Rust compiler.

                                              The Rust language as it exists today has many features that intentionally bypass compiler’s protections if the programmer wishes so.

                                              1. 1

                                                Between “do not enter” signs and vaults, a lot of business gets done with doors, even with a known risk that the locks can be picked.

                                                You seem to argue that there is no such thing as safe rust or that there are no norms for denying unsafe rust.

                                                1. 3

                                                  Rust’s safety is already often misunderstood. fs::remove_dir_all("/") is safe by Rust’s definition. I really don’t want to give people an idea that you could ban a couple of features and make Rust have safety properties of JavaScript in a browser. Rust has an entirely different threat model. The “safe” subset of Rust is not a complete language, and it’s closer to being a linter for undefined behavior than a security barrier.

                                                  Security promises in computing are often binary. What does it help if a proc macro can’t access the filesystem through std::fs, but can by making a syscall directly? It’s a few lines of code extra for the attacker, and a false sense of security for users.

                                                  1. 1

                                                    Ok, let’s talk binary security properties. Object Capability security consists of:

                                                    1. Memory safety
                                                    2. Encapsulation
                                                    3. No powerful globals

                                                    There are plenty of formal proofs of the security properties that follow… patterns for achieving cooperation without vulnerability. See peer reviewed articles in https://github.com/dckc/awesome-ocap

                                                    This cap-std work aims to address #3. For example, with compiler support to deny ambient authority, it addresses std::fs.

                                                    Safe rust, especially run on wasm, is memory safe much like JS, yes? i.e. safe modulo bugs. Making a syscall requires using asm, which is not in safe rust.

                                                    Rust’s encapsulation is at the module level rather than object level, but it’s there.

                                                    While this cap-std and tools to deny ambient authority are not as mature as std, I do want to give people an idea that this is a good approach to building scalable secure systems.

                                                    I grant that the relevant threat model isn’t emphasized around rust the way it is around JS, but I don’t see why rust would have to be a different language to shift this emphasis.

                                                    I see plenty of work on formalizing safe rust. Safety problems seem to be considered serious bugs, not intentional design decisions.

                                                    1. 1

                                                      In presence of malicious code, Rust on WASM is exactly as safe as C on WASM. All of the safety is thanks to the WASM VM, not due to anything that Rust does.

                                                      Safe Rust formalizations assume the programmer won’t try to exploit bugs in the compiler, and the Rust compiler has exploitable bugs. For example, symbol mangling uses a hash that has a 1 in 2⁶⁴ chance of colliding (or better odds for an attacker, thanks to the birthday attack). I haven’t heard of anyone running into this by accident, but a determined attacker could easily compute a collision that makes their cap-approved innocent_foo() actually link to the code of evil_bar() and bypass whatever formally-proven safety the compiler tried to have.

                                  1. 3

                                    Nix is often criticized for poor documentation. I think that’s because the UX sends the hapless user running for the docs so much more than other tools. The nix cli is a thin wrapper around a huge pile of interpreted code. If you get one of the flags wrong, you don’t get “the only options for that flag are A, B, and C”. You get “string found where int expected” in a stack trace.

                                    1. 1

                                      Are genode people interested in writing new code in Rust instead of C++?

                                      1. 2

                                        You’d have to port Rust, then ask them.

                                        1. 3

                                          Someone did a drive-by port, but the language moved too much and it had to be removed.

                                        2. 1

                                          I see quite a bit of interesting Ada/SPARK work.

                                          I haven’t tried it out myself, but I appreciate the emphasis on safety and formal verification.

                                        1. 5

                                          A user:sysadmin ratio of 1:1 has never been economical for email… not even before so much firepower went into spam. Now the economics are even worse.

                                          1. 3

                                            Yeah, I’m always reading about the price. But my email server costs are:

                                            • 120 EUR for the vserver
                                            • ~10 EUR per domain, in this case 3 important ones

                                            per year.

                                            With only 4 users (and I used to have more) that’s already breaking the price point of a lot of hosted solutions. I’m not actually doing this to save money, but I’m actually saving a little over Fastmail (last I checked) and I could be running on a cheaper box. The hosted Mailcow could actually be a little cheaper, for my case.

                                            1. 5

                                              What you didn’t factor into cost:

                                              • Your time.
                                              1. 4

                                                For what an anecdotal datapoint is worth, the last time I had to do any actual administrative work on my mail server outside of the occasional yum update was…[checks etckeeper history]…four years ago. And realistically was maybe 30 minutes worth of work.

                                                1. 6

                                                  I’m more worried about the time I might have to spend when Google/Microsoft/… decide they don’t like my mails anymore, and I’m left figuring out why, racing against the clock.

                                                2. 1

                                                True, but I was mostly riffing off the “but it’s cheaper to let someone host it” argument. That only holds if nothing goes wrong; otherwise you keep on writing tickets and emails or end up on the phone with support.

                                                Of course my time is not free - but I choose to learn about stuff like email and not get too rusty. I actually do get rusty because I realistically don’t touch it for anything other than simple security upgrades.

                                            1. 33

                                              There’s a huge cultural problem around dependencies and a total lack of discipline for what features get used in most Rust projects.

                                              sled compiles in 6 seconds flat on my laptop, despite being a fairly complex Rust embedded database. Most database compile times are measured in minutes, even if they are written in C, C++ or Go.

                                              I feel that slow compilation times for a library are totally disrespectful to any users, who probably just want to solve a relatively simple issue by bringing in your library as a dependency anyway. But in the Rust ecosystem it’s not uncommon at all for a simple dependency that could have probably been written in 50 lines of simple code to pull in 75 dependencies to get the job done that you need it for. Pretty much all of my friends working in the Rust blockchain space have it especially bad since they tend to have to pull in something like 700 dependencies and spend 3 minutes just for linking for one commonly used dependency.

                                              Things I avoid to make compilation fast:

                                              • proc macros - these are horrible for compile times
                                              • build.rs same as above, also causes friction with tooling
                                              • deriving traits that don’t get used anywhere (side note, forcing users to learn your non-std trait is just as bad as forcing them to learn your non-std macro. It introduces a huge amount of friction into the developer experience)
                                              • generics for things I only use one concrete version of internally. conditional compilation is very easy to shoot yourself in the foot with but sometimes it’s better than generics for testing-only functionality (see the sketch just after this list).
                                              • dependencies for things I could write in a few dozen lines myself - the time saved for a project that I sometimes compile hundreds of times per day and have been building for over 4 years is a huge win. Everybody has bugs, and my database testing tends to make a lot of them pop out, but I can fix mine almost instantly, whereas it takes a lot of coordination to get other people to fix their stuff.
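
                                              To illustrate the conditional-compilation point from the list above, a minimal sketch (maybe_fail, FAIL_AT, and the failpoints feature are made-up names, not sled’s actual hooks): the test-only hook exists only in test or feature-gated builds, so release builds get one concrete code path with no generic parameter threaded through.

                                              #[cfg(any(test, feature = "failpoints"))]
                                              fn maybe_fail(site: &str) {
                                                  // A real crate might consult a fault-injection plan here.
                                                  if std::env::var("FAIL_AT").as_deref() == Ok(site) {
                                                      panic!("injected failure at {site}");
                                                  }
                                              }

                                              #[cfg(not(any(test, feature = "failpoints")))]
                                              fn maybe_fail(_site: &str) {} // compiles to nothing in release builds

                                              pub fn flush(bytes: &[u8]) -> std::io::Result<()> {
                                                  maybe_fail("pre-flush");
                                                  // ... the real write path would go here ...
                                                  let _ = bytes;
                                                  Ok(())
                                              }

                                              fn main() { flush(b"hello").unwrap(); }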

                                              Also, CI on every PR tends to finish in around 6 minutes despite torturing thousands of database instances with fault injection and a variety of other tests that most people only run once before a big release.

                                              Developer-facing latency is by far one of the most effective metrics to optimize for. It keeps the project feeling fun to hack on. I don’t feel dread before trying out my changes due to the impending waiting time. Keeping a project nice to hack on is what keeps engineers hacking on it, which means it’s also the most important metric for any other metrics like reliability and performance for any project that you hope to keep using over years. But most publicly published Rust seems to be written with an expiration date of a few weeks, and it shows.

                                              1. 11

                                                My take-away from the article is that open source allows different people to play different roles: the original dev got it working to their own satisfaction. Another user polished off some cruft. Everybody wins.

                                                I feel that slow compilation times for a library are totally disrespectful to any users …

                                                If someone writes software to solve a problem and shares it as open source, I don’t consider it disrespectful regardless of the code quality. Only if someone else is compelled to use it or is paying for it would the developer have any obligation, IMO.

                                                1. 10

                                                  Same experience here. When I started writing durduff, I picked clap for parsing CLI arguments. After a while, I used cargo-deps to generate a graph of dependencies and it turned out that the graph was dominated by the dependencies pulled in by clap. So I switched to getopts. It cut the build times by something like 90%.
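
                                                  For anyone curious, the getopts version is roughly this shape (adapted from the crate’s documented example; the usage string is just illustrative):

                                                  use getopts::Options;

                                                  fn main() {
                                                      let args: Vec<String> = std::env::args().collect();
                                                      let mut opts = Options::new();
                                                      opts.optflag("h", "help", "print this help menu");
                                                      opts.optopt("o", "output", "set output file name", "NAME");
                                                      let matches = match opts.parse(&args[1..]) {
                                                          Ok(m) => m,
                                                          Err(e) => { eprintln!("{e}"); return; }
                                                      };
                                                      if matches.opt_present("h") {
                                                          print!("{}", opts.usage("Usage: durduff [options] OLD NEW"));
                                                          return;
                                                      }
                                                      println!("output: {:?}", matches.opt_str("o"));
                                                  }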

                                                  Another example: ~~atty~~ term_size depends on libc, but when you look at its source code, it has a lot of code duplicated with libc: the constants which need to be passed to the ioctl are redefined, with conditional compilation, because they have different values on different platforms. It seems to be a common theme: wrapping libraries, while still duplicating work. I replaced ~~atty~~ term_size with a single call to libc (in the end I stopped doing even that). One less dependency to care about.

                                                  That said, I still think that waiting a couple seconds for a project as small as durduff to compile is too much. It also slows down syntastic in vim: it’s really irritating to wait several seconds for text to appear every time I open a rust file in vim. It’s even worse with bigger projects like rustlib.

                                                  As for avoiding generics: I use them a lot for testing things in isolation. Sort of like what people use interfaces for in Go. With the difference that I don’t pay the runtime cost for it. I’m not giving this one up.

                                                  BTW Thank you for flamegraph-rs! Last weekend it helped me find a performance bottleneck in durduff and speed the whole thing up three-fold.

                                                  EDIT: I got crates mixed up. It was term_size, not atty that duplicated code from libc.

                                                  1. 11

                                                    it turned out that the graph was dominated by the dependencies pulled in by clap.

                                                    Did you try disabling some or all of Clap’s default features?

                                                    With respect to the OP, it’s not clear whether they tried this or whether they tried disabling any of regex’s features. In the latter case, those features are specifically intended to reduce compilation times and binary size.

                                                    Another example: atty depends on libc, but when you look at its source code, it has a lot of code duplicated with libc. The constants which need to be passed to the ioctl, with conditional compilation, because they have different values on different platforms. It seems to be a common theme: wrapping libraries, while still duplicating work

                                                    Huh? “A lot”? Looking at the source code, it defines one single type: Stream. It then provides a platform independent API using that type to check whether there’s a tty or not.

                                                    I replaced atty with a single call to libc. One less dependency to care about.

                                                    I normally applaud removing dependencies, but it’s likely that atty is not a good one to remove. Unless you explicitly don’t care about Windows users. Because isatty in libc doesn’t work on Windows. The vast majority of the atty crate is specifically about handling Windows correctly, which is non-trivial. That’s exactly the kind of logic that should be wrapped up inside a dependency.

                                                    Now, if you don’t care about Windows, then sure, you might have made a good trade off. It doesn’t really look like one to me, but I suppose it’s defensible.

                                                    That said, I still think that waiting a couple seconds for a project as small as durduff to compile is too much. It also slows down syntastic in vim: it’s really irritating to wait several seconds for text to appear every time I open a rust file in vim.

                                                    It takes about 0.5 seconds for cargo check to run on my i5-7600 after making a change in your project. Do you have syntastic configured to use cargo check?

                                                    1. 4

                                                      Did you try disabling some or all of Clap’s default features?

                                                      I disabled some of them. It wasn’t enough. And with the more fireworky features disabled, I no longer saw the benefit of clap over getopts, when getopts has fewer dependencies.

                                                      Huh? “A lot”? Looking at the source code, it defines one single type: Stream. It then provides a platform independent API using that type to check whether there’s a tty or not.

                                                      Ok, I got crates mixed up. It was term_size (related) that did that, when it could just rely on what’s already in libc (for unix-specific code). Sorry for the confusion.

                                                      I normally applaud removing dependencies, but it’s likely that atty is not a good one to remove. Unless you explicitly don’t care about Windows users. Because isatty in libc doesn’t work on Windows.

                                                      Yes, I don’t care about Windows, because reading about how to properly handle output to the windows terminal and output that is piped somewhere else at the same time left me with the impression that it’s just too much of a pain.

                                                      It takes about 0.5 seconds for cargo check to run on my i5-7600 after making a change in your project. Do you have syntastic configured to use cargo check?

                                                      I’ll check when I get back home.

                                                      1. 21

                                                        I no longer saw the benefit of clap over getopts, when getopts has less dependencies.

                                                        Well, with getopts you start out of the gate with a bug: it can only accept flags and arguments that are UTF-8 encoded. clap has OS string APIs, which permit all possible arguments that the underlying operating system supports.
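
                                                        A tiny illustration of the difference using only std (nothing clap-specific): env::args() panics on a non-UTF-8 argument, while env::args_os() accepts everything the OS can pass:

                                                        use std::ffi::OsString;

                                                        fn main() {
                                                            let args: Vec<OsString> = std::env::args_os().skip(1).collect();
                                                            for arg in &args {
                                                                match arg.to_str() {
                                                                    Some(s) => println!("utf-8 argument: {s}"),
                                                                    None => println!("non-utf-8 argument: {:?}", arg),
                                                                }
                                                            }
                                                        }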

                                                        You might not care about this. But I’ve had command line tools with similar bugs, and once they got popular enough, end users invariably ran into them.

                                                        Now, I don’t know whether this bug alone justifies that extra weight of Clap. Although I do know that Clap has to go out of its way (with additional code) to handle this correctly, because dealing with OS strings is hard to do in a zero cost way.

                                                        Yes, I don’t care about Windows, because reading about how to properly handle output to the windows terminal and output that is piped somewhere else at the same time left me with the impression that it’s just too much of a pain.

                                                        I think a lot of users probably expect programs written in Rust to work well on Windows. This is largely because of the work done in std to provide good platform independent APIs, and also because of the work done in the ecosystem (including myself) to build crates that work well on Windows.

                                                        My argument here isn’t necessarily “you should support Windows.” My argument here is, “it’s important to scrutinize all costs when dropping dependencies.” Particularly in a conversation that started with commentary such as “lack of discipline.” Discipline cuts both ways. It takes discipline to scrutinize all benefits and costs for any given technical decision.

                                                        1. 15

                                                          Just wanted to say thanks for putting in the effort to support Windows. ripgrep is one of my favorite tools, and I use it on Windows as well as Linux.

                                                          1. 1

                                                            I checked and I have a line like this in my .vimrc:

                                                            let g:rust_cargo_check_tests = 1
                                                            

                                                            That’s because I was annoyed that I didn’t see any issues in test code only to be later greeted with a wall of errors when compiling. Now I made a small change and cargo check --tests took 5 seconds on my AMD Ryzen 7 2700X Eight-Core Processor.

                                                            Well, with getopts you start out of the gate with a bug: it can only accept flags and arguments that are UTF-8 encoded. clap has OS string APIs, which permit all possible arguments that the underlying operating system supports.

                                                            I’ll reconsider that choice.

                                                    2. 4

                                                      But in the Rust ecosystem it’s not uncommon at all for a simple dependency that could have probably been written in 50 lines of simple code to pull in 75 dependencies to get the job done that you need it for.

                                                      I don’t think I have experienced this. Do you have an example of this on crates.io?

                                                      1. 1

                                                        Not quite to that degree, but I’ve seen it happen. Though I’ve also seen the ecosystem get actively better about this – or maybe I just now have preferred crates that don’t do this very much.

                                                        rand does this to an extent by making ten million little sub-crates for different algorithms and traditionally including them all by default, and rand is everywhere, so I wrote my own version. num also is structured that way, though seems to leave less things on by default, and deals with a harder problem domain than rand.

                                                        The main example of gratuitous transitive dependencies I recall in recent memory was a logging library – I thought it was pretty_env_logger but can’t seem to find it right now. It used winconsole for colored console output, which pulls in the 10k lines of cgmath, which pulls in rand and num both… so that it can have a single function that takes a single Vector2D.

                                                        …sorry, this is something I find bizarrely fun. I should probably make more time for it again someday.

                                                      2. 1

                                                        Minimizing your dependencies has further advantages like making the overall system easier to understand or avoiding library update problems.

                                                        Part of this is simply insufficient tooling.

                                                        Rebuilding all your dependencies should be rare. In practice, it happens way too often, e.g. on every CI run without a better build system. That is madness. You can avoid it by e.g. using Nix or Bazel.

                                                        In terms of timing, I’d also love to understand why linking is quite slow - for me it’s often the slowest part.

                                                        But all in all, bloated compile times in dependencies would not be a major decision factor for me in choosing a library. Bloated link times or bloated compile times of my own crates are, since they affect my iteration speed.

                                                        That said, I think if you are optimizing the compile time of your crate, you are respecting your own time and that of your contributors. Time well spent!

                                                        1. 1

                                                          deriving traits that don’t get used anywhere

                                                          I take it you mean custom traits, and not things like Default, Eq/Ord, etc?

                                                          1. 9

                                                            check this out:

                                                            #[derive()]
                                                            pub struct S { inner: String }
                                                            

                                                            (do this in your own text editor)

                                                            1. 2dd (yank lines), 400p (paste 400 times)
                                                            2. %s/struct\ S/\=printf("struct S%d", line('.')) name all of those S’s to S + line number
                                                            3. time cargo build - 0.10s on my laptop
                                                            4. %s/derive()/derive(Debug)
                                                            5. time cargo build - 0.58s on my laptop
                                                            6. %s/derive(Debug)/derive(Debug, PartialOrd, PartialEq)
                                                            7. time cargo build - 2.46s

                                                            So, yeah, that deriving actually slows things down a lot, especially in a larger codebase.

                                                            1. 3

                                                              This is particularly annoying for Debug because either you eat the compile time or you have to go derive Debug on various types every time you want to actually debug something. Also if you don’t derive Debug on public types for a library then users of the library can’t do it themselves.

                                                              In languages like Julia and Zig that allow reflection at specialization time this tradeoff doesn’t exist. Eg in zig:

                                                              pub fn debug(thing: var) !void {
                                                                  const T = @TypeOf(thing);
                                                                  if (std.meta.trait.hasFn("debug")(T)) {
                                                                      // use custom impl if it exists
                                                                      thing.debug();
                                                                  } else {
                                                                      // otherwise reflect on type
                                                                      switch (@typeInfo(T)) {
                                                                          ...
                                                                      }
                                                                  }
                                                              }
                                                              

                                                              This function will work on any type but will only get compiled for types to which it is actually applied in live code so there’s no compile time overhead for having it available. But the reflection is compiled away at specialization time so there is no runtime overhead vs something like derive.

                                                              1. 2

                                                                Some numbers from a fairly large project:

                                                                $ cargo vendor
                                                                $ rg --files | grep -E '*.rs' | xargs wc -l | sort -n | tail -n 1
                                                                 1234130 total
                                                                $ rg --files | grep -E '*.rs' | xargs grep -F '#[derive' | grep -o -E '\(|,' | wc -l
                                                                22612
                                                                

                                                                If we extrapolate from your example a minimum of 2ms extra compile time per derive, this is adding >45s to the compile time for a debug build. But:

                                                                $ cargo clean && time cargo build
                                                                Finished dev [unoptimized + debuginfo] target(s) in 20m 58s
                                                                
                                                                real	20m58.636s
                                                                user	107m34.211s
                                                                sys	10m57.734s
                                                                
                                                                $ cargo clean && time cargo build --release
                                                                Finished release [optimized + debuginfo] target(s) in 61m 25s
                                                                real	61m25.930s
                                                                user	406m27.001s
                                                                sys	11m30.052s
                                                                

                                                                So number of dependencies and amount of specialization are probably the low hanging fruit in this case.

                                                                1. 1

                                                                  Doh, bash fail.

                                                                  $ rg --files -0 | grep -zE '\.rs$' | wc -l --files0-from=- | tail -n 1
                                                                  2123768 total
                                                                  $ rg --files -0 | grep -zE '\.rs$' | xargs -0 cat | grep '\[derive' | grep -oE '\(|,' | wc -l
                                                                  22597
                                                                  

                                                                  Same conclusion though.

                                                                2. 2

                                                                  Experiment independently reproduced, very nice results. I never realized this was significantly expensive!

                                                                  1. 1

                                                                    Thank you for this reply. It’s absolutely beautiful. You made an assertion, and this backs it up in a concise, understandable, and trivially reproducible way.

                                                                3. 0

                                                                  Without build.rs how are crate authors going to mine Bitcoin on your computer?

                                                                  1. 2

                                                                    With a make install target, just like the olden days.

                                                                1. 1

                                                                  Do Google, Facebook, Apple, and co plan to be in business in, say, 25 or 30 years? Do the billionaires plan to live that long?

                                                                  Or do they care about anyone who does? I’ve had a good run, but I’ve got kids, and I’d hate for ours to be the last generation that leaves the place better for our kids – or heck, even livable.

                                                                  How long till the insurance underwriters stop playing along with the charade?

                                                                  It’s taking too long to get a government that represents the people rather than the corporations and the richest, but eventually it’s going to impact them too, no? Do they really expect to escape to Mars? Can they keep their heads in the sand much longer? I read something persuasive that climate change is in the hands of a few dozen billionaires. Many of us are less than six degrees of separation away, no?

                                                                  Climate change impacts the developing world harder/sooner. The first world has some $7 trillion in oil in the ground, and they’re going to be reluctant to write it off. Eventually I suspect the developing world is going to pick up sticks and stones and the other side is going to use that as an excuse to hit back and then where are we?

                                                                  p.s. confession: as to the headline question: I spent some time in the rchain.coop community as a way of doing something with my angst about this stuff. I suppose there’s some chance it’s part of the solution, but it’s an awfully long shot.

                                                                  1. 3

                                                                    I have a process for taking python scripts from prototype to production that I think strikes a better balance.

                                                                    I wrote it up: Practical production python scripts

                                                                    the commit log makes a reasonable summary:

                                                                    • only run when invoked as script
                                                                    • ocap discipline: explicit access to argv, stdout
                                                                    • quick-n-dirty arg parsing
                                                                    • factor out fizzbuzz() from main()
                                                                    • module doctest of usage
                                                                    1. 1

                                                                      Thanks for sharing. I have added a link to your post at the bottom of mine.