1. 2

    This is from 2015 so I guess there wasn’t a lot of uptake?

    1. 4

      Loads of stuff was using a restricted subset of JSON similar to this from the very beginning. I’d bet most JSON APIs stick to exclusively UTF-8, no invalid Unicode, no duplicate keys, strings for numbers which may not be representable in 64-bit floats, base64-encoded strings for binary data, the must-ignore policy, etc. It’s just nice to have this stuff standardized, so that people can validate that their JSON parsers accept everything in I-JSON and protocol designers can ensure that they only output I-JSON.
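
      As a rough illustration, here’s a Python sketch of the kind of normalization such APIs do before serializing (the helper name is mine, not from the RFC):

        import base64, json

        def to_ijson(value):
            # Hypothetical helper: coerce values into I-JSON-friendly forms.
            if isinstance(value, bytes):
                return base64.b64encode(value).decode("ascii")  # binary as base64 string
            if isinstance(value, int) and abs(value) >= 2**53:
                return str(value)  # not exactly representable in a 64-bit float
            if isinstance(value, dict):
                return {str(k): to_ijson(v) for k, v in value.items()}
            if isinstance(value, list):
                return [to_ijson(v) for v in value]
            return value

        # allow_nan=False rejects NaN/Infinity, which I-JSON forbids.
        print(json.dumps(to_ijson({"id": 2**60, "blob": b"\x00"}), allow_nan=False))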

      I’d bet that if we analyzed all JSON web traffic, we’d find that a huge amount of it follows I-JSON (though probably mostly unintentionally).

      1. 1

        given that all valid I-JSON is valid JSON, you might never know :-)

      1. 7

        The slogan I’ve heard for this is Testing is a Software Design Activity. That is, it’s not just about correctness – the more important goal could be figuring out the natural structure of your software.

        Googling gives this blog post:

        https://toranbillups.com/blog/archive/2011/06/14/Test-Driven-Development-Is-A-Design-Activity/

        Though I don’t know where I heard it from first. I tend to agree, especially with the bits about mock objects. I wrote a few years ago that testing calcifies interfaces, which means you should be careful about what interfaces you test against, because those will last:

        http://www.oilshell.org/blog/2017/06/24.html#free-tests

        If you find yourself changing lots of tests whenever you refactor, that slows you down and indicates that you tested against unstable interfaces, which makes the tests less compelling as an indicator of correctness.

        I tend to think of example code for the API as the better test. That is, I try not to use mock objects, because those are sort of a “cheat” around the fact that your code isn’t modular. If you test without mock objects, it often leads you to a Unix-style, loosely coupled, modular design. So in that sense I agree that testing and modularity are closely related. They are about interfaces and protocols.
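
        To make that concrete, here’s a minimal sketch (the API is hypothetical): the test is literally the example you’d put in the README, exercising only the public surface, with no mocks:

          class KeyValueStore:
              """Tiny stand-in for the API under test."""
              def __init__(self):
                  self._d = {}
              def put(self, key, value):
                  self._d[key] = value
              def get(self, key):
                  return self._d.get(key)

          def test_readme_example():
              # No mocks: the test is just the documented usage of the stable interface.
              store = KeyValueStore()
              store.put("k", "v")
              assert store.get("k") == "v"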

        1. 4

          This is a good subject for an article, and looks like good info, but the opening sentence makes me think it’s going to be bad. There’s some truth there but it lacks a lot of subtlety – slow and fast are relative to the application, etc. IMO a better opening sentence would be something like: “I hit this performance wall in my Python code, tried to speed it up with C, and was surprised”. What happened here? Why is this slow? etc.

          FWIW the way I think of it is that Python is 10-50x slower than native code, usually closer to 10x. So it’s about 1 order of magnitude. You have around 9 orders of magnitude to play on a machine; a distributed system might make that 11 to 14. Lots of software is 1 to 3 orders of magnitude too slow, regardless of language.

          Also, the most performance-intensive Python app I ended up optimizing in C++ was mainly for reasons of memory, not speed. You get about an order of magnitude improvement there too.

          1. 10

            Your estimate of 10x seems WAY too low in my experience. It obviously depends on the use case; I/O-bound programs will be much closer because you’re not actually waiting for the language, but number-crunching code tends to be closer to 100x than 10x.

            I just did a quick test: a loop with 100,000,000 function calls in C and in Python. The C loop ran in 0.2 seconds; the Python program in 17.2 seconds. That’s an 86x difference. (Yes, the C code was calling a function from another TU, using a call instruction. The compiler didn’t optimize away the loop.)

            I also implemented the naive recursive Fibonacci function in both C and Python. The C version calculated fib(40) in 0.3 seconds. The Python version calculated fib(40) in 42.5 seconds. That’s a 142x difference.

            I implemented the function to do 1 + 2 + 3 + ... + n (basically factorial but for addition) in C and Python, using the obvious iterative method. C did the numbers up to 1,000,000,000 in 0.41 seconds. Python did it in one minute 38 seconds. That’s a 245x difference.
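
            (For anyone who wants to reproduce this, the Python side of those microbenchmarks is roughly the following sketch; I’ve scaled the inputs down so it finishes quickly, and exact numbers will vary by machine:)

              import time

              def fib(n):  # naive recursive Fibonacci, as in the test above
                  return n if n < 2 else fib(n - 1) + fib(n - 2)

              def sum_to(n):  # the iterative 1 + 2 + ... + n test
                  total = 0
                  for i in range(1, n + 1):
                      total += i
                  return total

              t0 = time.perf_counter(); fib(30)
              print("fib(30):", time.perf_counter() - t0, "s")

              t0 = time.perf_counter(); sum_to(10_000_000)
              print("sum_to(10M):", time.perf_counter() - t0, "s")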

            Don’t get me wrong, Python is a fine language. I use it a lot. It’s fast enough for most things, and tools like numpy let you do number crunching fairly quickly in Python (by doing the number crunching in C instead of in Python). But Python is ABSOLUTELY a slow language, and depending on what you’re doing, rewriting your code from Python to C, C++ or Rust is likely to make it hundreds of times faster. I have personally experienced, many times, that my Python code is analyzing some large dataset in hours while C++ would’ve done it in seconds or minutes.
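
            To illustrate the numpy point (a rough sketch; the ratio varies by machine and workload), the same reduction done in a pure-Python loop vs. in C via numpy:

              import time
              import numpy as np

              n = 10_000_000

              t0 = time.perf_counter()
              total_py = sum(range(n))  # pure-Python loop, one bytecode at a time
              py_secs = time.perf_counter() - t0

              t0 = time.perf_counter()
              total_np = int(np.arange(n, dtype=np.int64).sum())  # the loop runs in C
              np_secs = time.perf_counter() - t0

              assert total_py == total_np
              print("speedup: %.0fx" % (py_secs / np_secs))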

            You have around 9 orders of magnitude to play on a machine

            This is very often false. Games often spend many milliseconds on physics simulation; one order of magnitude is the difference between 60 FPS and 6 FPS. Non-game physics simulations can often take minutes; you don’t want to slow that down by a couple of orders of magnitude. Analyzing giant datasets can take hours; you really don’t want to slow that down by a few orders of magnitude.

            1. 5

              Sure, I said 10 - 50x, but you can say 10 - 100x or 10 - 200x if you want. I measured exactly the fib use case at over 100x here:

              https://www.oilshell.org/release/0.8.10/benchmarks.wwz/mycpp-examples/

              Those are microbenchmarks though. IME 10-50x puts you more in the “centroid” of the curve. You can come up with examples on the other side too.

              I’d say it’s closer to 10x for the use cases that people actually use Python for. People don’t use it to write 60 fps games, because it is too slow for that in general.

              But this is all beside the point… If the post had included the subtleties that you replied with, then I wouldn’t quibble. My point is that making blanket statements without subtlety distracts from the main point of the article, which is good.

              1. 5

                But my point is that the subtleties aren’t required, because (C)Python just is a slow language. It doesn’t have to be qualified. Its math operations are slow, its function calls are slow, its control flow constructs are slow, its variable lookups are slow, it’s just slow by almost any metric compared to JITs and native code. If the article had started with a value judgement, like “Python is too slow”, I would agree with you, but “Python is slow” seems extremely defensible as a blanket statement.

                1. 5

                  Well I’d say it’s not a useful statement. OK let’s concede it’s slow for a minute – now what? Do I stop using it?

                  A more helpful statement is to say what it’s slow relative to, and what it’s slow for.

                  To flip it around, R is generally 10x slower than Python (it’s pretty easy to find some shockingly slow code in the wild; it can be optimized if you have the right knowledge). It’s still the best tool for many jobs, and I use it over Python, even though I know Python better. The language just has a better way of expressing certain problems.

              2. 4

                There’s actually another dimension of slowness that I see people often forget about when making comparisons like this. Due to the GIL, you’re essentially limited to a single core when your Python code is running, but a C/Rust/Go/Haskell program can use all the available cores within a single process. This means that in addition to the 10-100x speedup you get from using those languages, you have another 10-100x of room for vertical scaling within a single process, for a combined 100-10000x. Of course, you can run multiple Python processes on the same hardware, or run them across a cluster of single-core instances, but you’re not in a single process anymore, which means it’s much harder to share memory, and you have new inter-process architectural challenges which might limit what you can do.
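
                A quick way to see the GIL effect (illustrative sketch; timings will vary): the same pure-Python CPU work on 4 threads runs roughly serially, while 4 processes actually use 4 cores:

                  import multiprocessing, threading, time

                  def burn(n):  # pure-Python CPU work; holds the GIL while running
                      while n:
                          n -= 1

                  def timed(make_worker):
                      workers = [make_worker() for _ in range(4)]
                      t0 = time.perf_counter()
                      for w in workers: w.start()
                      for w in workers: w.join()
                      return time.perf_counter() - t0

                  if __name__ == "__main__":
                      N = 10_000_000
                      print("threads:  ", timed(lambda: threading.Thread(target=burn, args=(N,))))
                      print("processes:", timed(lambda: multiprocessing.Process(target=burn, args=(N,))))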

                For example, if I write my web application backend in Haskell, I can expect to vertically scale it quite a lot, and depending on the use case, I might even decide to stick with a single-process model permanently, where I know that all the requests will arrive at the same process, so I can take advantage of the available memory for caching and I can keep ephemeral state local to the process, greatly simplifying my programming model. Single-process concurrency is much simpler than distributed concurrency, after all. If I wrote that backend in Python, I would have to design for multi-process from the start, because SQLAlchemy would very soon bottleneck on the GIL while generating my SQL queries…

            1. 7

              I find inheritance very useful for structuring my own code. However, where it seems to fall down is when you make a big framework class and let other people inherit from it. Code that uses this mechanism across module boundaries doesn’t seem to grow gracefully.

              Also, I find the most natural uses of inheritance have 1 to 3 virtual/overridden methods and maybe 5-10 overall. If you see 5 overridden methods and 20-30 overall, then IMO that’s a design smell.


              I think the Go designers said that “the power of an interface is inversely proportional to the number of methods it has”. I find that to be true of inherited methods.

              You can use inheritance like interfaces in small or medium codebases, where you can still do global refactoring. (It’s definitely true that some large codebases and ecosystems have horrible inheritance hierarchies that can never be changed, and that’s something you really want to avoid IMO.)

              Go interfaces are nice but they also have a surprisingly tricky runtime implementation. Inheritance is simpler (and yes performs better!)

              1. 1

                One thing I realized with a little more thinking is that I no longer seem to use inheritance hierarchies more than 1 level deep. When you have something 2 levels deep it seems better to factor it into composition and inheritance, or composition only. This feels like “interfaces” a bit, since interfaces don’t really have a hierarchy.

                Also I’d say that implementation inheritance probably only happens in 20% or less of the code. I don’t think the whole program should be architected around inheritance… It’s just one tool, and composition is the “default”.
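
                A sketch of what I mean (the classes are hypothetical): one level of interface-like inheritance, and composition for everything else:

                  class Codec:  # one-level, interface-like base
                      def encode(self, data: bytes) -> bytes:
                          raise NotImplementedError

                  class IdentityCodec(Codec):
                      def encode(self, data: bytes) -> bytes:
                          return data

                  class Uploader:
                      # Composition as the default: an Uploader *has* a codec, it isn't one.
                      def __init__(self, codec: Codec):
                          self.codec = codec
                      def upload(self, data: bytes) -> bytes:
                          return self.codec.encode(data)

                  print(Uploader(IdentityCodec()).upload(b"payload"))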

              1. 12

                I got a lot out of reading DJB’s daemontools a number of years ago.

                https://cr.yp.to/daemontools.html

                https://github.com/daemontools/daemontools/tree/master/src

                It’s good if you want to see how to write simple and reliable C code in a very careful and minimalist way. This paper has some thoughts on DJB’s style of secure C coding:

                https://blog.acolyer.org/2018/01/17/some-thoughts-on-security-after-ten-years-of-qmail-1-0/

                DJB also notably uses shell and C together to minimize privilege.

                I think you can start at any file with a main(), as it is a small collection of utilities, loosely joined. The overall design is as important as the code.


                Another good read is CPython. There are definitely things I don’t like about it, but it’s been well maintained by a small-ish group of people for 30 years now, which is incredible.

                It’s not a project where one person does everything. I think that’s a good contrast to DJB’s style, which is more about keeping everything small so that one person can vouch for correctness and security.

                It’s obviously important to the world, which makes it worth reading. But I would also say that the code is significantly easier to read than its contemporaries: Perl, Ruby, PHP, R, and arguably Lua. (I have looked at all of them to varying degrees, as well as many other language implementations)

                It’s extremely modular and globally coherent. Seeing how PyObject and PyTypeObject work together actually taught me a lot about the Python language, even after I had programmed in it for ~15 years!
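
                You can glimpse the PyObject/PyTypeObject relationship from pure Python, since type() more or less exposes the ob_type pointer:

                  x = 42
                  print(type(x))        # <class 'int'>: the PyTypeObject for int
                  print(type(type(x)))  # <class 'type'>: types are themselves objects
                  print((1).__add__(2)) # method lookup goes through the type, like C slots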

                I’m not sure you can start in one place by reading CPython; I think it’s easier to write your own Python-C extension, and that may give you a hint of how the interpreter works. It’s very simple, open to extension, and dynamically typed. C sort of lends itself to this dynamically typed architecture which tends to “grow well”. There are a lot of things about CPython that could be more optimal locally, but I think it has a lot of global coherence and that’s one reason why it has lasted.


                Another good read is xv6, which is the modernized source for v6 Unix, and taught at MIT. It’s extremely easy to compile and modify, which is rare for an OS. I added a command line tool to it and ran it in QEMU, and it was easy (I think it also taught me how to run QEMU :) ). It’s good for understanding where C and Unix came from.


                As for Python code, I got a lot out of this, but it’s NOT easy to read. It’s just small. If you know Python well then it’s fun to figure out the puzzle of how they did it: http://www.tinypy.org/

                There’s also a Python bytecode compiler in Python here that is interesting because it’s very short and Lispy:

                https://github.com/darius/tailbiter

                It definitely reminded me that you can write Python with a Lisp accent :) :) Very cool and short.

                accompanying article: https://codewords.recurse.com/issues/seven/dragon-taming-with-tailbiter-a-bytecode-compiler

                1. 8

                  Another good read is CPython

                  Seconded. The CPython implementation is quite straightforward. It doesn’t use too many tricks to improve its speed, which means the code is easier to read than hyper bummed implementations. Speaking of which, the same is certainly true for Scheme48 which was also written for clarity and with simplicity in mind.

                  I found SBCL to be a treasure trove of solid code as well, since here too most of the system is implemented in Lisp itself. It’s a bit more complex to navigate as it’s very big, but I found it very valuable to study when I was reading up on bignum implementations.

                  1. 1

                    I agree and like that it’s straightforward, though one exception is ceval.c, the main bytecode interpreter loop. It is really long and full of macros and obscure control flow. Not very readable IMO, which is why I started hacking on the Python versions.

                    I think some code generation could simplify things (even though it also adds another level of indirection). That is not too uncommon for bytecode interpreters; I think one of the JS engines, JavaScriptCore I believe, uses a Ruby DSL to express the bytecode instruction set. Apparently PHP has a whole lot of indirection and code generation there but I haven’t looked closely. I think bytecode loops are just awkward for C! (although I guess no other language really does better, including C++ as far as I can tell)
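
                    The awkward shape is visible even in a toy version (sketched here in Python rather than C): a big dispatch over opcodes around an explicit stack, which is what ceval.c has to spell out with macros:

                      def run(code):
                          # Toy stack-machine loop; real interpreters add jumps, frames,
                          # and error handling, which is where the mess comes from.
                          stack = []
                          for op, arg in code:
                              if op == "PUSH":
                                  stack.append(arg)
                              elif op == "ADD":
                                  b, a = stack.pop(), stack.pop()
                                  stack.append(a + b)
                              elif op == "MUL":
                                  b, a = stack.pop(), stack.pop()
                                  stack.append(a * b)
                          return stack.pop()

                      print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]))  # 20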

                    1. 2

                      Ruby itself also uses codegen in its bytecode interpreter: https://github.com/ruby/ruby/blob/master/insns.def

                      1. 1

                        Oh yeah I think I have peeked at that file before! Definitely looks cleaner than how CPython has done it.

                  2. 2

                    If you’re a little intimidated to read CPython yourself and would like a ‘guided tour’, Philip Guo has an excellent set of lectures where he just goes through the code piece-by-piece. He goes from ‘CPython is just a bunch of .c and .h files’ to ‘you create an iterator from a generator object by calling PyObject_SelfIter, which just increments a ref counter and returns itself.’
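
                    That last fact is easy to check from Python itself, since generators are their own iterators (which is what PyObject_SelfIter implements at the C level):

                      def gen():
                          yield 1

                      g = gen()
                      assert iter(g) is g  # iter() on a generator returns the same object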

                    1. 1

                      I neglected to mention a shell codebase :) Aboriginal Linux is defunct but its goal was to be the smallest Linux system that can rebuild itself. (In that sense it’s similar to recent Guix bootstrapping efforts.)

                      http://landley.net/aboriginal/

                      And it’s all written in shell. It’s like a mini-Linux from scratch. Linux from Scratch is also worthwhile though it takes forever to do, whereas Aboriginal is small.

                      So in a sense I think Aboriginal gives you a better idea of how to build Linux from scratch – how to build and configure a kernel, and what’s in user space and how to assemble it. It also gave me more of an idea of how embedded devs think and code, which is considerably different from how server-side / desktop / web / etc. developers do.

                      It’s much clearer than say Debian, which is a bunch of shell-make-custom-tool-package-manager gobbledygook. Aboriginal is pure shell. It’s closer to a program than a bunch of scripts grown over time.

                    1. 47

                      Some of them seem to have concluded that the superiority of print debugging is some kind of eternal natural law.

                      This looks like a strawman to me. I think that everyone would prefer an interactive, powerful, time-machine debugger. I use print debugging whenever it’s not worth the trouble of dealing with actually existing debuggers. I’m proficient enough in GDB, but not in PDB, so I print debug whenever I have to deal with Python.

                      Saying “but debuggers could be better” is not that interesting. My reaction is “99% of everything in computing - hypertext, screen sharing, data transfer etc. could be better, but that is not the position from which I get to decide what to use.”

                      1. 5

                        This looks like a strawman to me. I think that everyone would prefer an interactive, powerful, time-machine debugger.

                        I definitely know people who sneer at the mere concept of a debugger. Or even print debugging. They just want to read and understand the code directly.

                        1. 15

                          Beware of bugs in the above code; I have only proved it correct, not tried it.

                          – Knuth

                          1. 11

                            The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

                            – Kernighan

                            1. 7

                              Not to knock Kernighan, but he said that decades ago. We should have figured out something more effective while being just as easy to use and just as free by now.

                              1. 7

                                What has dramatically changed with debuggers over these decades? I used the interactive debugger in Xcode recently: it could as well be Turbo Pascal.

                                1. 7

                                  The problem is Unix debuggers haven’t really caught up to Turbo Pascal. My experiences with 90’s Visual C++ have been nicer than gdb.

                                  1. 3

                                    The author of the post is explicitly advertising reversible debugging with rr. I guess he didn’t do it that well :)

                                    When you hit a crash or invalid state, you can go backwards and find what caused it.

                                    As I understand it, GDB does have reversible debugging, but it’s inefficient. rr works on x86 and Linux and is efficient. It’s impressive, but so far I’m getting by with regular GDB and IntelliJ. I would like to get in the flow with rr in the future.

                                    1. 1

                                      In my experience GDB’s reverse debugging is not inefficient, it just doesn’t work well. It cannot handle SSE instructions, doesn’t record the whole process tree, has issues with side effects, etc. The two things I like about it compared to RR are that you can choose to only “record” a small part of the program rather than the whole execution (it’s not actually a recording; as far as I understand it just forks the debuggee), and that it works on more architectures than just x86_64. But these two advantages do not outweigh all the disadvantages it has when compared to RR, in my opinion.

                                2. 4

                                  Sometimes our thoughts lead to questions, and answering those questions leads to more thoughts and more questions. Being able to answer many questions without having to rerun the entire program each time leads to more thoughts with less effort, leading to more effective debugging.

                                  Debuggers are great is what I’m saying

                                  1. 3

                                      Things like rr, which is a recent thing, don’t work on my AMD hardware. So for the rest of us, most debuggers give you the info in the reverse order… usually you need to work your way back to the problem, so they don’t avoid rerunning the program.

                                    1. 3

                                        Things like rr, which is a recent thing, don’t work on my AMD hardware.

                                        Yeah, incompatibility with AMD hardware is a serious drawback. However, initial support for AMD CPUs has been merged in the past year, so things are improving on that front.

                                    2. 1

                                        I agree about debuggers. I think Kernighan and Pike’s take on debugging (and more) has been informative to me. I use a debugger almost daily, but I think sometimes a well-placed print statement is all one needs.

                                3. 2

                                  In a perfect (statically-analyzed) world debugging would be a last resort. I use print statements to contextualize myself far less often in a TypeScript or Haskell codebase than I do in say, a Python codebase where the input/output types of a function are not always documented, even at the library level.

                                  1. 1

                                      Is this because they don’t understand debuggers or because they have “transcended” them?

                                    1. 4

                                        It’s because they find debuggers cause them to focus on the small problem of what is happening instead of the big-picture design problem that resulted in the bad behaviour, I think.

                                1. 4

                                  I’d say this is among the first 10 papers you should read in distributed systems: it’s short, communicates an important idea, and is highly cited; the author did a lot of important work in the field, and won a Turing Award a few years ago:

                                  Time, clocks, and the ordering of events in a distributed system

                                  https://scholar.google.com/scholar?cluster=2180534073950948375&hl=en&as_sdt=0,5

                                  https://lamport.azurewebsites.net/pubs/time-clocks.pdf

                                  I’d also recommend “Fallacies of Distributed Computing” for getting in the right mindset for building: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

                                  Papers that make connections to adjacent fields are also good, like operating systems:

                                  Your computer is already a distributed system. Why isn’t your OS?

                                  https://www.usenix.org/legacy/event/hotos09/tech/full_papers/baumann/baumann.pdf

                                  For the adjacent field of programming languages, I would look up the original MapReduce and Spark papers (and follow their citations if you like). Everyone who works on distributed systems has to understand something about programming languages. If you don’t then you get languages written in YAML (not really joking about that)

                                  Computer networking is another adjacent field; I don’t have a good paper recommendation there, but maybe someone else does. Maybe a textbook will cover the connections. I try to read about the things that “won” (like TCP/IP) and also a little about the things that didn’t.

                                  1. 17

                                    Unfortunately, OpenRC maintenance has stagnated: the last release was over a year ago.

                                    I don’t really see this as a bad thing.

                                    1. 12

                                      Also, wouldn’t the obvious choice be to pick up maintenance of OpenRC rather than writing something brand new that will need to be maintained?

                                      1. 10

                                         There is nothing really desirable about OpenRC, and it simply does not support required features like supervision. Sometimes it’s better to start fresh, or in this case with the already existing s6/s6-rc, which is built on a better design.

                                        1. 3

                                          There is nothing really desirable about openrc

                                          I’d say this is a matter of opinion, because there’s inherent value in simplicity and systemd isn’t simple.

                                          1. 5

                                             But why compare the “simplicity” to systemd instead of to something actually simple? Compared to OpenRC’s design, with its shell wrapping, a simple supervision design with a way to express dependencies outside of the shell script is a lot simpler. The daemontools-like supervision systems have no boilerplate in shell scripts, and they provide good features: tracking PIDs without pid files (and therefore reliably signaling the right processes), restarting services if they go down, and a nice, reliable way to collect the stdout/stderr logs of those services.

                                            Edit: this is really what the post is about, taking the better design and making it more user friendly and implementing the missing parts.

                                        2. 3

                                          the 4th paragraph

                                          This work will also build on the work we’ve done with ifupdown-ng, as ifupdown-ng will be able to reflect its own state into the service manager allowing it to start services or stop them as the network state changes. OpenRC does not support reacting to arbitrary events, which is why this functionality is not yet available.

                                          also, the second to last graf

                                          Alpine has gotten a lot of mileage out of OpenRC, and we are open to contributing to its future maintenance while Alpine releases still include it as part of the base system, but our long-term goal is to adopt the s6-based solution.

                                           so, they are continuing to maintain OpenRC while Alpine still requires it, but it doesn’t meet their needs, hence they are designing something new

                                        3. 3

                                          I was thinking the same thing.

                                          I have no sources, but when was the last time OpenBSD or FreeBSD had a substantial change to their init systems?

                                          I don’t know enough to know why there’s a need to iterate so I won’t comment on the quality of the changes or existing system.

                                          1. 12

                                            To my knowledge, there’s serious discussion in the FreeBSD community about replacing their init system (for example, see this talk from FreeBSD contributor and previous Core Team member Benno Rice: The Tragedy of systemd).

                                            And then there’s the FreeBSD-based Darwin, whose launchd is much more similar to systemd than to either BSD init or SysVinit to my knowledge.

                                            1. 4

                                              this talk from FreeBSD Core Team member Benno Rice: The Tragedy of systemd).

                                              This was well worth the watch/listen. Thanks for the link.

                                            2. 8

                                              I believe the last major change on FreeBSD was adding the rc-order stuff (from NetBSD?) that allowed expressing dependencies between services and sorting their launch order so that dependencies were fulfilled.

                                              That said, writing a replacement for the FreeBSD service manager infrastructure is something I’d really, really like to do. Currently devd, inetd, and cron are completely separate things and so you have different (but similar) infrastructure for running a service:

                                              • At system start / shutdown
                                              • At a specific time
                                              • In response to a kernel-generated event
                                              • In response to a network connection

                                              I really like the way that Launchd unifies these (though I hate the fact that it uses XML property lists, which are fine as a human-readable serialisation of a machine format, but are not very human-writeable). I’d love to have something that uses libucl to provide a nice composable configuration for all of these. I’d also like an init system that plays nicely with the sandboxing infrastructure on FreeBSD. In particular, I’d like to be able to manage services that run inside a jail, without needing to run a service manager inside the jail. I’d also like something that can set up services in Capsicum sandboxes with libpreopen-style behaviour.

                                              1. 1

                                                I believe the last major change on FreeBSD was adding the rc-order stuff (from NetBSD?) that allowed expressing dependencies between services and sorting their launch order so that dependencies were fulfilled.

                                                Yep, The Design and Implementation of the NetBSD rc.d system, Luke Mewburn, 2000. One of the earlier designs of a post-sysvinit dependency based init for Unix.

                                                1. 1

                                                  I’ve been able to manage standalone services to run inside a jail, but it’s more than a little hacky. For fun a while back, I wrote a finger daemon in Go, so I could keep my PGP keys available without needing to run something written in C. This runs inside a bare-jail with a RO mount of the homedirs and not much else and lots of FS restrictions. So jail.conf ended up with this in the stanza:

                                                  finger {
                                                          # ip4.addr, ip6.addr go here; also mount and allow overrides
                                                          exec.start = "";
                                                          exec.stop = "";
                                                          persist;
                                                          exec.poststart = "service fingerd start";
                                                          exec.prestop = "service fingerd stop";
                                                  }
                                                  

                                                   and then the service file does daemon -c jexec -u ${runtime_user_nonjail} ${jail_name} ${jail_fingerd} ...; the tricky bit was messing with the internals of rc.subr to make sure that pidfile management worked correctly, with the process-finding logic handling that the jail is not “our” jail:

                                                  jail_name="finger"
                                                  jail_root="$(jls -j "${jail_name}" path)"
                                                  JID=$(jls -j ${jail_name} jid)
                                                  jailed_pidfile="/log/pids/fingerd.pid"
                                                  pidfile="${jail_root}${jailed_pidfile}"
                                                  

                                                  It works, but I suspect that stuff like $JID can change without notice to me as an implementation detail of rc.subr. Something properly supported would be nice.

                                                2. 2

                                                  I think the core issue is that desktops have very different requirements than servers. Servers generally have fixed hardware, and thus a hard-coded boot order can be sufficient.

                                                  Modern desktops have to deal with many changes like: USB disks being plugged in (mounting and unmounting), Wi-Fi going in and out, changing networks, multiple networks, Bluetooth audio, etc. It’s a very different problem

                                                  I do think there should be some “server only” init systems, and I think there are a few meant for containers but I haven’t looked into them. If anyone has pointers I’d be interested. Desktop is a complex space but I don’t think that it needs to infect the design for servers (or maybe I’m wrong).

                                                  Alpine has a mix of requirements I imagine. I would only use it for servers, and its original use case was routers, but I’m guessing the core devs also use it as their desktops.

                                              1. 16

                                                 I think its speed is one of the things which makes apk (and therefore Alpine) so well suited to containers.

                                                It used to be that the slowness of apt wasn’t a huge issue. You would potentially have to let apt spin in the background for a few minutes while upgrading your system, and, once in a blue moon when you need a new package right now, the longer-than-necessary wait isn’t a huge issue. But these days, people spin up new containers left right and center. As a frequent user of Ubuntu-based containers, I feel that apt’s single-threaded, phase-based design is frequently a large time cost. It’s also one of the things which makes CI builds excruciatingly slow.

                                                1. 4

                                                  distri really can’t happen fast enough… the current state of package management really feels stuck in time.

                                                  1. 1

                                                     I feel like speed could be a non-issue if the repository state was “reified” somehow. Then you could cache installation as a function, like

                                                    f(image state, repo state, installation_query) -> new_image_state
                                                    

                                                    This seems obvious but doesn’t seem like the state of the art. (I know Nix and guix do better here, but I also need Python/JS/R packages, etc.)
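
                                                     As a sketch of that “reified” idea (the function and names are hypothetical): if the repo state is pinned, installation is a pure function, so its result can be cached by hashing the inputs:

                                                       import hashlib, json

                                                       def install_cache_key(image_digest, repo_snapshot, packages):
                                                           # Hypothetical: pin base image + repo snapshot + package set.
                                                           blob = json.dumps([image_digest, repo_snapshot, sorted(packages)]).encode()
                                                           return hashlib.sha256(blob).hexdigest()

                                                       # Same inputs -> same key -> reuse the cached layer instead of reinstalling.
                                                       print(install_cache_key("sha256:base...", "repo@2021-05-01", ["gcc", "make"]))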

                                                    The number of times packages are installed in a container build seems bizarre to me. And it’s not even that; right now every time I push to a CI on Travis and sourcehut it installs packages. It seems very inefficient and obviously redundant. I guess all the CI services run a package cache for apt and so forth, but I don’t think that is a great solution. I use some less common package managers like CRAN, etc.

                                                    1. 2

                                                      Part of it is no doubt that hosted CI platforms don’t do a great job of keeping a consistent container build cache around. You usually have to manually manage saving and restoring the cache to some kind of hosted artifact repository, and copying it around can add up to a nontrivial chunk of your build time.

                                                      At my previous job, that was a big part of our motivation for switching to self-hosted build servers: with no extra fussing, the build servers’ local Docker build caches would quickly get populated with all the infrequently-changing layers of our various container builds.

                                                    2. 1

                                                      This sounds reasonable, until you realise it means that containers are constantly being rebuilt rather than just persisted and loaded when needed.

                                                      1. 3

                                                         Yeah, but they are. Look at any popular CI: Travis CI, CircleCI, builds.sr.ht, probably many, many others. They all expect you to specify some base image (usually Debian, Ubuntu or Alpine), a set of packages you need installed on top of the base, and some commands to run once the packages are installed. Here’s an example of the kind of thing which happens for every commit to Sway: https://builds.sr.ht/~emersion/job/496138 - spin up an Alpine image, install 164 packages, then finally start doing useful work.

                                                        I’m not saying it’s good, but it’s the way people are doing it, and it means that slow package managers slow things down unreasonably.

                                                        1. 2

                                                          If you’re rebuilding your OS every time you want to test or compile your application, it’s not the package manager making it slow, no matter what said package manager does.

                                                        2. 1

                                                          Persistence can be your enemy in testing environments.

                                                          1. 2

                                                            Sure re-deploy your app, but rebuild the OS? I understand everybody does it all the time (I work in the CI/CD space), but that doesn’t mean it’s a good idea.

                                                      1. 76

                                                         Imagine you lived in a country that has a strange tradition where children, once a year, go to strangers’ homes and are given candy. Then someone, in order to study the safety of this tradition, decides to give out candies laced with a mild and provably non-lethal toxin to children. This someone has a fool-proof plan to inform the children’s parents before anyone gets hurt. Not all parents test candies for toxins, but enough do – since things like this can happen, and parents in this country take safety reasonably seriously. One parent detected this toxin in the children’s candies. All the parents were informed and said candies were thrown out. No harm, no foul?

                                                        Imagine you lived in a country where no neighbors can be trusted. Imagine you worked in a low trust environment. Imagine stopping the OSS model because none of the contributors can be trusted.

                                                        That’s not the kind of world we want to operate in.

                                                        1. 32

                                                          I think this sums up why I felt a bit sick about that whole story. It undermines the community and is essentially antisocial behaviour disguised as research. Surely they could have found a way to prove their point in a more considerate way.

                                                          1. 8

                                                            Surely they could have found a way to prove their point in a more considerate way.

                                                            Could you propose some alternative approaches? As the saying goes, POC || GTFO, so I suppose the best way to prove something’s vulnerability is a harmless attack against it.

                                                            The kernel community appears to assume good faith in every patch they receive from random people across the Internet, and this time they get mad when the researchers from UMN prove this wishful assumption to be false. On the other hand, cURL goes to great lengths to prevent the injection of backdoors. The kernel is clearly more fundamental than any userland utilities, so either the cURL developers are unnecessarily cautious against supply chain attacks, or the kernel hackers are overly credulous.

                                                            1. 16

                                                              Another possible approach is to ask the lead maintainers if you can perform such an experiment. Linux has a large hierarchy and I think the top level maintainers pull huge patch sets as a bundle.

                                                               If they had permission to use an unrelated e-mail address then it could be pretty much as good. Honestly, I would think a umn.edu address would give more credence to a patch, since it seems like it’s from someone at a reputable institution.

                                                              Of course they might not agree, in which case you don’t have consent to do the research.

                                                              1. 18

                                                                This. You ask for permission. Talk to the kernel maintainers, explain your research and your methods, and ask if they want to participate. You can do things like promise a maximum number of bogus patches and a timeframe where they may occur, so people know they won’t get deluged with crap for the rest of time. You could even make a list of email addresses the patches will come from ahead of time and hand it to someone trustworthy involved in the kernel project who won’t be reviewing those patches directly, so once the experiment is over they can easily revert all the bad patches even if the researcher is hit by a bus in the mean time. It’s not that hard to conduct this sort of research ethically, these researchers just didn’t do it.

                                                                1. 6

                                                                   That’s a fair point, but I want to point out that the non-lead reviewers still unknowingly participate in the research, so that’s still not super ethical to them. Doing so merely shifts the moral pressure to the lead maintainers, who need to decide whether or not to “deceive” the rest of the community.

                                                                  But yeah, only lead reviewers can revert commits and have enough influence in the tech world, so getting their permission is probably good enough.

                                                                  1. 6

                                                                     A top comment in a cousin thread on HN suggests that, with proper procedure, AFAIU all reviewers could actually be informed. The trick seems to be to then wait long enough (e.g. weeks or more) and send the patches from diverse emails (collaborating with some submitters outside your university). There should also be some agreed-upon way of retracting the patches. The comment claims that this is how it’s done in the industry, for pen testing or other “wargames”.

                                                                2. 5

                                                                  In the subsystems that I’ve contributed to, I imagine that it would be possible to ask a maintainer for code review on a patchset, with phrasing like, “I am not suggesting that this be merged, but I will probably ask you to consider merging it in the future.” After the code review is given, then the deception can be revealed, along with a reiterated request to not merge the patches.

                                                                  This is still rude, though. I don’t know whether it’s possible to single-blind this sort of study against the software maintainers without being rudely deceptive.

                                                                  1. 2

                                                                    I think you could ask them if you can anonymously submit some patches sometime over the next few months and detail how some of them will contain errors that you will reveal before merging.

                                                                    They might say no, but if they say yes it’s a reasonably blind test, because the maintainer still won’t know which patches are part of the experiment and which are not.

                                                                    Another way to do it would be to present the study as something misleading but also do it in private and with compensation so that participants are not harmed. Say you just want to record day-in-the-life stuff or whatever and present them with some patches.

                                                                    Finally, you could look at patches historically and re-review them. Some existing patches will have been malicious or buggy and you can see if a more detailed review catches things that were missed.

                                                              2. 17

                                                                This research was clearly unethical, but it did make it plain that the OSS development model is vulnerable to bad-faith commits. I no longer feel what was probably a false sense of security, running Linux. It now seems likely that Linux has some devastating back doors, inserted by people with more on their minds than their publication records.

                                                                1. 15

                                                                   This is something every engineer, and every human, needs to be aware of at some point. Of course, given enough effort, you can fool another human into doing something wrong. You can send anthrax spores via mail, you can fool drivers into driving off a cliff with carefully planted road signs, you can fool a maintainer into accepting a patch with a backdoor. The reason it doesn’t happen all the time is that most people are not in fact dangerous sociopaths who have no problem causing real harm just to prove their point (whatever that is).

                                                                   The only societal mechanism we have for rare incidents such as this one is that they usually get uncovered eventually, either by overzealous reviewers or even by having caused some amount of harm. That we’re even reading about patches being reverted is a sign that this imperfect mechanism has in fact worked in this case.

                                                                2. 2

                                                                   This country’s tradition is insanely dangerous. The very fact that some parents already tested candy is evidence that there were attempts to poison children in the past — and we don’t know how many of those attempts actually succeeded.

                                                                   So, if we assumed that the public outcry from this event led to all parents testing all the candy, or changing the tradition altogether, then doing something like this would result in more overall good than evil.

                                                                  1. 10

                                                                    Meanwhile in real life, poisoned Hallowe’en candy is merely an urban legend: According to Snopes, “Police have never documented actual cases of people randomly distributing poisoned goodies to children on Halloween.”

                                                                    The very fact that some parents already tested candy is the evidence that there was some attempts to poison children in the past

                                                                    Not really. Again in the real world, hospitals run candy testing services in response to people’s fears, not actual risks. From the same Snopes article: “Of several contacted, only Maryland Hospital Center reported discovering what seemed to be a real threat — a needle detected by X-ray in a candy bar in 1988. … In the ten years the National Confectioners Association has run its Halloween Hot Line, the group has yet to verify an instance of tampering”.

                                                                1. 1

                                                                  Has anyone embedded a WASM engine, whether Wasm3 or otherwise? I’d be interested in hearing experiences.

                                                                  1. 4

                                                                    I’ve used wasmtime from Rust a couple times, though not for anything large. I made a language that compiled to wasm and used wasmtime to run its test cases. The interface is Rust-unsafe but pretty easy: load a wasm module, look up a function by name, give it a signature, and then you can just call it like any other function. Never got complicated enough to do things like pass pointers around though, or make host functions accessible to the wasm code.

                                                                    1. 4

                                                                       I worked on a production “hosted function” system, first in Go with Life, then rewrote it in Rust with Lucet. It was very nice to work with.

                                                                      1. 3

                                                                         Microsoft Flight Simulator (2020) somewhat embeds a Wasm engine. Addons are written in Wasm and compiled to native code using inNative, an LLVM frontend for Wasm, which solves problems with multi-platform support and restrictions on JIT compilation on consoles. The PC builds embed inNative’s JIT, which uses LLVM’s JIT interface, for faster development edit/test cycles.

                                                                        1. 3

                                                                           That’s sensational! I had been wondering when we were going to improve on Lua for modding.

                                                                          I’m also somewhat keen on the idea of sandboxing native libraries that are called from high-level languages; if the performance overhead can be brought suitably low, I would really like to be relatively safe from library segfaults (especially for image processing).

                                                                        2. 1

                                                                         What do you mean by embedded? Like, in an app, or on some hardware device?

                                                                          1. 4

                                                                            Just in a C++ app :) Actually I have an idea to embed WASM in https://www.oilshell.org/ . One use case is to solve some bootstrapping problems with dev tools. For example, if you have a tool written in a native language like a parser generator, then it’s somewhat of a pain for people to either build those, or for the maintainer to distribute binaries for them (especially if they change often).

                                                                            So it seems natural to write a shell script and call out to an “embedded” arch-independent binary in those cases. (Though this probably won’t happen for a long time.)

                                                                            (BTW the work on wasm3 seems very cool, I looked at the code a bit, and hope to learn more about WASI)

                                                                            1. 1

                                                                               I think wasm3 is perfect for this scenario, especially if you realize that wasm “plugins” can be written in a variety of languages: C/C++, Rust, TinyGo, AssemblyScript, Swift…

                                                                              1. 1

                                                                                Yes the polyglot nature is very natural for shell :) How stable is WASI now?

                                                                                 Is it easy to compile and run some C code like this with wasm3? Can I just use clang and musl libc, or are there some other tools? Any examples to start from? I have run wasm in the browser but I didn’t compile any C.

                                                                                 #include <stdlib.h>
                                                                                 #include <string.h>
                                                                                 #include <unistd.h>
                                                                                 
                                                                                 int main(int argc, char** argv) {
                                                                                    char buf[1024];
                                                                                    read(0, buf, sizeof(buf));          // read from stdin
                                                                                    write(2, argv[0], strlen(argv[0])); // write argv[0] to stderr
                                                                                 
                                                                                    char *p = getenv("PATH");
                                                                                    if (p) write(1, p, strlen(p));      // write PATH to stdout
                                                                                    return 0;
                                                                                 }
                                                                                

                                                                                 So I want to call main directly; I guess I need a wasm stub that calls it?

                                                                                 I think I want to provide only an argv/ENV/stdin/stdout/stderr interface to simulate a sandboxed C program. I’m not sure I want binary blobs loaded into the shell to be able to read and write arbitrary files. The files should be opened in the shell, like this:

                                                                                my-wasm-program.wasm <input.txt >output.txt
                                                                                

                                                                                This also has some bearing on incremental computation like Make, e.g. knowing the inputs and outputs precisely from shell, rather than having to analyze C code or WASM code.

                                                                                1. 1

                                                                                     This is exactly what you want. You can compile C to WASI easily using wasienv. Also, whether to allow FS access is a matter of runtime configuration. Stdin/stdout are open by default, but can also be blocked.

                                                                                  1. 2

                                                                                    Hm so how do I embed it in an application and use the C API? I looked at the README.md, the doc/ folder, and this header:

                                                                                    https://github.com/wasm3/wasm3/blob/main/source/wasm3.h

                                                                                    I don’t see any C code examples?

                                                                                    In contrast the Python binding has an example in the README:

                                                                                    https://github.com/wasm3/pywasm3

                                                                                    1. 2

                                                                                      Good idea. I’ll create some kind of tutorial ;)

                                                                                      1. 0

                                                                                        I don’t see any C code examples?

                                                                                        Check out this: https://github.com/wasm3/wasm3/blob/main/docs/Cookbook.md
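
                                                                                        In the meantime, here’s a minimal embedding sketch pieced together from the names in wasm3.h and the Cookbook (signatures can shift between versions, and the exported add function is a made-up example, so treat this as a sketch rather than gospel):

                                                                                        #include <stdint.h>
                                                                                        #include <stdio.h>
                                                                                        #include "wasm3.h"
                                                                                        
                                                                                        // Sketch only: names are from wasm3.h, but check your version's
                                                                                        // header for exact signatures. Assumes the module exports
                                                                                        // add(i32, i32) -> i32.
                                                                                        int run_add(const uint8_t* wasm, uint32_t len) {
                                                                                            IM3Environment env = m3_NewEnvironment();
                                                                                            IM3Runtime runtime = m3_NewRuntime(env, 64 * 1024, NULL);
                                                                                        
                                                                                            IM3Module module;
                                                                                            M3Result err = m3_ParseModule(env, &module, wasm, len);
                                                                                            if (!err) err = m3_LoadModule(runtime, module);  // runtime takes ownership
                                                                                        
                                                                                            IM3Function add;
                                                                                            if (!err) err = m3_FindFunction(&add, runtime, "add");
                                                                                            if (!err) err = m3_CallV(add, 2, 3);
                                                                                        
                                                                                            int32_t sum = 0;
                                                                                            if (!err) err = m3_GetResultsV(add, &sum);  // results API in recent wasm3
                                                                                            if (err) fprintf(stderr, "wasm3: %s\n", err);
                                                                                            else     printf("add(2, 3) = %d\n", sum);
                                                                                        
                                                                                            m3_FreeRuntime(runtime);
                                                                                            m3_FreeEnvironment(env);
                                                                                            return err ? 1 : 0;
                                                                                        }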

                                                                          1. 3

                                                                            Do you think they will ever let WASM access the DOM API?

                                                                            1. 6

                                                                              The day WASM can access the DOM directly is the day the last line of JavaScript ever will be written. I kid, but also not totally :-)

                                                                              1. 8

                                                                                I don’t see how they’re going to solve the GC problem. If you have DOM manipulation by untrusted code sent over the network, then you really want GC.

                                                                                And once you add GC to WASM, it’s basically like the JVM, and that means it’s better for certain languages than others. It’s already biased toward certain languages (C and Rust, due to the lack of GC), but I think it will be even more so with GC, because GC requires rich types, knowledge of pointers, etc., and right now WASM has a very minimal set of types (i32, i64, f32, f64).

                                                                                1. 4

                                                                                  Could the browser just kill the tab process if it exceeds some memory threshold? I don’t understand why GC is necessary

                                                                                  1. 3

                                                                                    Unfortunately that would limit the browser to roughly content-only pages, in which case you don’t need WASM. Think Google Maps (and pages that embed Google Maps), Protonmail, games, etc. And anything that uses a “SPA” architecture, which is for better or worse increasingly common.

                                                                                    All those are long-lived apps and need GC. GC is a global algorithm, spanning languages. Web browsers use GC for C++ too, when JS (or in theory WASM) hold references to DOM objects: https://trac.webkit.org/wiki/Inspecting%20the%20GC%20heap

                                                                                    1. 1

                                                                                      I see, so the concern isn’t with a rogue WASM app causing bad performance in the other browser tabs, it is about being unable to write the performant WASM app at all without GC?

                                                                                      1. 1

                                                                                        If you didn’t have GC, a browser tab could allocate all the memory on your computer, and many would! The GC is necessary to reclaim memory so it can be used by other tabs / programs.

                                                                                        It’s very common to allocate in a loop. That’s no problem in Python and JavaScript because the GC will pause in the middle of the loop and take care of it. In C, you usually take care to reuse the allocation, which is what makes the code “more detailed”.
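
                                                                                        To make that concrete, here’s a sketch of the reuse pattern (a hypothetical line-counting loop; getline() is POSIX):

                                                                                        #define _POSIX_C_SOURCE 200809L
                                                                                        #include <stdio.h>
                                                                                        #include <stdlib.h>
                                                                                        
                                                                                        // Count lines without allocating per iteration: getline() grows and
                                                                                        // reuses `buf` across the loop, so there's one (amortized) allocation
                                                                                        // and a single free() at the end -- the bookkeeping a GC would
                                                                                        // otherwise do for you mid-loop.
                                                                                        int count_lines(FILE *f) {
                                                                                            char *buf = NULL;
                                                                                            size_t cap = 0;
                                                                                            int n = 0;
                                                                                            while (getline(&buf, &cap, f) != -1)
                                                                                                n++;
                                                                                            free(buf);
                                                                                            return n;
                                                                                        }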

                                                                                        I have some first hand experience with this because I wrote a shell which does not deallocate anything. That’s not good enough! :) It may actually be OK for small scripts, but there are long running programs in shell too. You can write a loop that reads every line of a file, etc.

                                                                                        So I’m writing a garbage collector to fix that: http://www.oilshell.org/blog/2021/03/release-0.8.8.html#the-garbage-collector-works-on-a-variety-of-examples

                                                                                        Right now the practice for WASM is to either write in C or Rust and manually deallocate – or ship a GC over the network with the program, which isn’t ideal for a number of reasons.

                                                                                        1. 3

                                                                                          But you can easily consume all the memory anyway just by making an array and continually growing it, or a linked list, or whatever. So what’s the difference?

                                                                                          Like, wouldn’t it be enough for the WASM code to have some way of telling the browser’s GC its refcount for each object it’s holding on to so it doesn’t get GCed out from under it?

                                                                                          1. 1

                                                                                            That’s done with weak refs for guest objects owned by JS objects, but there’s nothing to handle the other direction AFAIK.

                                                                                    2. 3

                                                                                      I have noticed recent versions of Safari (at least on arm64) do this. First you get a little warning underneath the tab bar saying “This tab is using significant amounts of memory — closing it may improve responsiveness” (paraphrased). It doesn’t actually go ahead and kill it for you for quite some time, but I’ve noticed that e.g. on putting the computer to sleep and waking it up again, a tab so-marked gets reloaded. It is a little annoying, but it doesn’t come up very often to begin with.

                                                                                    3. 2

                                                                                      Agreed, it’s a tricky thing, particularly given how including a GC or other runtime niceties in compiled code bloats the downloaded asset for each site. So I also can’t imagine that they intend to do nothing.

                                                                                      1. 5

                                                                                        Yeah I haven’t been following closely, but it seems like the WASM GC+types enhancements are bigger than all of WASM itself to date (e.g. there are at least 2 complete WASM interpreters that are all of 3K lines of C code; that would no longer be possible).

                                                                                        It’s possible to do, but it’s not a foregone conclusion that it will happen, or be good!

                                                                                        I’d also say that manipulating the DOM is inherently dynamic, at least with the way that web apps are architected today. I say that because (1) DOM elements are often generated dynamically and (2) the values are almost all strings (attributes like IDs and classes, elements, contents of elements, CSS selectors, etc.).

                                                                                        Writing that kind of code in a statically typed language is not likely to make it any better or safer. You’d probably want something other than the DOM in a more static language. I’d also go as far as to say that JS is better than most dynamic languages at these kinds of tasks, simply because it was designed for it and has libraries/DSLs for it. Python or Lua in the browser sounds good until you actually try to rewrite the code …

                                                                                      2. 1

                                                                                        Why can’t GC be optional? “You can turn on the GC, but then you have to comply with this additional set of requirements about using rich types, informing the GC of pointers, etc.”

                                                                                        Edit: this actually seems like it must work, since it is essentially the existing “ship a GC over the network” solution, except that you don’t have to actually pay the bandwidth to ship it over the network because it’s already in the browser. Unless I’m missing something, which I definitely could be!

                                                                                        1. 1

                                                                                          Here’s an idea:

                                                                                          • you can hold references to DOM nodes. Accessing properties requires going through accessor functions that null-coalesce basically

                                                                                          • if you just have a reference, it can get GC’d out from under you (accessors will guard from the dereference tho)

                                                                                          • however, the reference can be reference counted. You can increment the reference count, decrement it (see the Python FFI). You can of course memory leak like this. But making it optional means you can also try and be clever.

                                                                                          • no handling of cyclical issues. You wanna memory leak? Go for it. Otherwise implement GC yourself

                                                                                          Reference-counted GC doesn’t involve stop-the-world pauses, and since you likely won’t be linking DOM elements together, cycles would be much rarer.
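
                                                                                          A sketch of what that could look like from the C side (every dom_* name here is invented for illustration, with toy stubs standing in for the host):

                                                                                          #include <stdio.h>
                                                                                          #include <string.h>
                                                                                          
                                                                                          // Hypothetical API -- every dom_* name is invented for illustration.
                                                                                          typedef unsigned int dom_ref;  // opaque handle into the host's GC'd heap; 0 = null
                                                                                          
                                                                                          // Toy stubs so the sketch runs; a real host would implement these in
                                                                                          // the browser and expose them as WASM imports.
                                                                                          static int collected[16];
                                                                                          static dom_ref dom_query(const char *sel) { (void)sel; return 1; }
                                                                                          static void dom_ref_inc(dom_ref n) { collected[n] = 0; }  // pin: host won't collect
                                                                                          static void dom_ref_dec(dom_ref n) { (void)n; }           // unpin: host may collect
                                                                                          static int dom_get_text(dom_ref n, char *out, int cap) {  // accessor guards the deref
                                                                                              if (collected[n]) return -1;  // node was GC'd out from under us
                                                                                              strncpy(out, "hello", cap);
                                                                                              out[cap - 1] = '\0';
                                                                                              return 0;
                                                                                          }
                                                                                          
                                                                                          int main(void) {
                                                                                              dom_ref header = dom_query("#header");
                                                                                              if (!header) return 0;
                                                                                              dom_ref_inc(header);  // cf. Py_INCREF in the Python FFI
                                                                                              char buf[256];
                                                                                              if (dom_get_text(header, buf, sizeof buf) == 0)
                                                                                                  printf("text: %s\n", buf);
                                                                                              dom_ref_dec(header);  // forget this and you leak
                                                                                              return 0;
                                                                                          }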

                                                                                          1. 1

                                                                                            WASM already has a feature called “reference types”, which allows WASM programs to contain garbage collected references to DOM objects. Chrome and Firefox added support for this feature last summer. I don’t know all the details, but you can google it.

                                                                                        2. 2

                                                                                          I thought you could already do this. What about this package https://github.com/koute/stdweb for accessing the DOM using Rust that is compiled to WASM?

                                                                                          1. 3

                                                                                            That basically bridges to JS to do the actual DOM manipulation. I don’t remember the exact details.

                                                                                        3. 2

                                                                                          Misread this as “the doom API”

                                                                                          …I’m fairly sure doom has been ported to wasm, anyhow

                                                                                          1. 1

                                                                                            I mean, you already can? You just have to add the bindings yourself. But that makes sense because not every wasm blob will want the same access or in the same way…

                                                                                          1. 3

                                                                                            I think this is why interoperability is really important. Otherwise we get everyone re-inventing the wheel.

                                                                                            It’s better to have 10 hero projects that work together and add up to 10x the value, rather than have 10 hero projects that do the same thing in slightly different, incompatible ways!

                                                                                            It’s definitely true that you can go faster / write fewer bugs with your own code. Kinda unfortunate too but I don’t know a way around it. I think this is especially true in open source where you’re doing it for fun and not being paid.

                                                                                              1. 27

                                                                                                I’d recommend a NUC here. I’ve tried using an RPi 1, and then an RPi 3 as desktops, but both were painful compared to a NUC, which was drama-free. I’ve never had any problems with mainstream Linux on mine. IIRC, it comes with either SATA or M.2.

                                                                                                1. 4

                                                                                                  I’ve also used an Intel compute stick when traveling. It has the added benefit of not needing an hdmi cable.

                                                                                                  1. 2

                                                                                                    It has its benefits, but it was slow when it came out five years ago… I used one for a conference room and it really is disappointing. A NUC would have been better. Harder to lose if you do take it traveling, too.

                                                                                                  2. 3

                                                                                                    I agree with this: If you don’t want a laptop, a very small form factor PC is a better choice than a more barebones SBC for use as a general-purpose PC. The NUC is great, though there’s some similar alternatives on the market too.

                                                                                                    I have a Zotac ZBOX from a little while ago. It has a SATA SSD, Intel CPU and GPU, and works great in Linux. In particular it has two gigabit NICs and wifi, which has made it useful to me for things like inline network traffic diagnosis, but it’s generally useful as a Linux (or, presumably, Windows) PC.

                                                                                                    The one I own has hdmi, displayport, and vga, making it compatible with a wide selection of monitors. That’s important if you’re expecting to use random displays you find wherever you’re going to. It also comes with a VESA bracket so it can be attached to the back of some computer monitors, which is nice for reducing clutter and cabling.

                                                                                                    1. 2

                                                                                                      Never heard of a NUC before now but I can agree that trying to use an RPi as a desktop is unpleasant.

                                                                                                      1. 1

                                                                                                        Yeah the Pi CPUs are very underpowered, it’s not even a fair comparison. They’re different machines for different purposes. I would strongly recommend against using a Pi as your primary Linux development machine.

                                                                                                        I think this is the Raspberry Pi 4 CPU, at 739 / 500:

                                                                                                        https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A72+4+Core+1500+MHz&id=3917

                                                                                                        And here’s the one in the NUC I bought for less than $500, at 7869 / 2350:

                                                                                                        https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-8260U+%40+1.60GHz&id=3724

                                                                                                        So it’s 4-5x faster single-threaded (2350 vs 500), and over 10x faster overall (7869 vs 739)! Huge difference.

                                                                                                        One of them is 1500 MHz and the other is 1600 MHz, but there’s a >10x difference in compute. So never use clock speed to compare CPUs, especially when the architectures are different!

                                                                                                      2. 2

                                                                                                        Yeah I just bought 2 NUCs to replace a tower and a mini PC. They’re very small, powerful, and the latest ones seem low power and quiet.

                                                                                                        The less powerful NUC was $450, and I got a portable 1920x1080 monitor for $200, so it’s much cheaper than a laptop, and honestly pretty close in size! And the CPU is good, about as powerful as the best desktop CPUs you could get circa 2014:

                                                                                                        https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-8260U+%40+1.60GHz&id=3724

                                                                                                        old CPU which was best in class in a tower in 2014: https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-4790+%40+3.60GHz&id=2226

                                                                                                        (the more powerful one was $800 total and even faster: https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-10710U+%40+1.10GHz&id=3567 although surprisingly not that much faster)

                                                                                                        This setup, along with a keyboard and trackball, is very productive for coding. I’m like the OP and don’t like using a laptop. IMO the keyboard and monitor shouldn’t be close together for good posture.

                                                                                                        In contrast the tower PC in 2014 was $700 + ~$300 in upgrades, and the monitor from ~2006 was $1000 or more. Everything is USB-C too on the NUC/monitor setup which is nice.

                                                                                                        I guess my tip is to not upgrade your PC for 7-10 years and you’ll be pleasantly surprised :) USB-C seems like a big improvement.

                                                                                                        1. 4

                                                                                                          Yeah I just bought 2 NUCs to replace a tower and a mini PC. They’re very small, powerful, and the latest ones seem low power and quiet.

                                                                                                          NUCs are great machines, but they are definitely not quiet. Because of their blower-style fan, they become quite loud as soon as the CPU is just a bit under load. Audio proof: https://www.youtube.com/watch?v=rOkyFLrPc3E&t=341s

                                                                                                          1. 2

                                                                                                            So far I haven’t had a problem, but it’s only been about 3 weeks.

                                                                                                            The noise was the #1 thing I was worried about, since I’m sensitive to it, but it seems fine. For reference, I replaced the GPU fan in my 2014 Dell tower because it was ridiculously noisy, and I have a 2012-era Mac Mini clone that is also ridiculously noisy when idle. The latter is honestly 10x louder than the NUC when idle, and I have them sitting side by side now.

                                                                                                            The idle noise bothers me the most. I don’t have any usage patterns where you are running with high CPU for hours on end. Playing HD video doesn’t do much to the CPU; that appears to be mostly GPU.

                                                                                                            I’m comparing against a low bar of older desktop PCs, but I also think MacBook Airs have a similar issue – the fan spins really loud when you put them under load. For me that has been OK. (AdBlock goes a long way on the MacBooks, since ad code in JS is terrible and often pegs the CPU.)


                                                                                                            I think the newer CPUs in the NUCs are lower power too. Looking at the CPU benchmarks above, the 2014 Dell i7 is rated at 84 W TDP. The 2020 i5 is MORE powerful, and rated at 10 W TDP-down and 25 W TDP-up.

                                                                                                            I’m not following all the details, but my impression is that while CPUs didn’t get that much faster in the last 7 years, power usage went down dramatically, and with it the need to spin up fans. That matches what I’ve experienced so far.

                                                                                                            I should start compiling a bunch of C++ and running my open source release process to be sure. But honestly I don’t know of any great alternative to the NUCs, so I went ahead and bought a second one after using the first one for 3 weeks. They’re head and shoulders above my old PCs in all dimensions, including noise, which were pretty decent at the time.

                                                                                                            I think the earlier NUCs had a lot of problems, but it seems (hopefully) they’ve been smoothed out by now. I did have to Google for a few Ubuntu driver issues on one of them and edit some config files. The audio wasn’t reliable on one of them until I manually changed a config with Vim.

                                                                                                        2. 1

                                                                                                          I have also been using a NUC for a year now, and it works well. A lot of monitors also allow you to screw the NUC to its back, decluttering your desk.

                                                                                                          Just watch out: it has no speakers of its own!

                                                                                                        1. 46

                                                                                                          To me Rust’s strings “clicked” when I realized that C also has two kinds of strings:

                                                                                                          1. char* from malloc that you must call free() on.
                                                                                                          2. char* not from malloc, or pointing to the middle of someone’s allocation, that you must never call free() on.

                                                                                                          Rust uses String for case 1, and &str for case 2.

                                                                                                          C uses char* for both, giving the illusion that they’re interchangeable, but they’re not — you’re going to leak or crash if you mix them up.
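
                                                                                                          A tiny sketch of the trap (hypothetical code, not from any real program):

                                                                                                          #include <stdlib.h>
                                                                                                          #include <string.h>
                                                                                                          
                                                                                                          int main(void) {
                                                                                                              char *owned = strdup("hello world");  // case 1: you must free() this
                                                                                                              char *borrowed = owned + 6;           // case 2: middle of the allocation --
                                                                                                                                                    //         never free() this
                                                                                                              // Both are plain char*, so nothing stops you from calling
                                                                                                              // free(borrowed), which corrupts the heap. Rust puts the
                                                                                                              // distinction in the types: `owned` would be a String,
                                                                                                              // `borrowed` a &str, and mixing them up doesn't compile.
                                                                                                              free(owned);
                                                                                                              return 0;
                                                                                                          }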

                                                                                                          1. 1

                                                                                                            Yeah, C++ added string_view for the case where you don’t own it, and string is something you must deallocate. But it’s still hard to use, and people say string_view combined with certain coercions is a recipe for leaks or use-after-free.

                                                                                                            1. 1

                                                                                                              Indeed, welcome at last, C++. To me, string_view is the end of the age-old code review fights over whether function arguments should be const std::string& or const char* (when you don’t need a mutable string). As a replacement for these, I don’t think string_view is any more of a recipe for leaks or use-after-free than those alternatives already were, so I wouldn’t say it’s hard to use. Just s/const std::string&/std::string_view/g on your code base, pretty much, for a pretty safe & easy performance win.

                                                                                                          1. 3

                                                                                                            There are 3 lobst.ers links referenced in this announcement! Thanks for all the stories :)

                                                                                                            1. 10

                                                                                                              I wonder about several courses of action.

                                                                                                              We could double down on the concept of copyleft. Treating corporations as people has led to working-for-hire, a systematic separation of artists from their work. We could not just extend the Four Freedoms to corporations, but also to artists, by insisting that corporations are obligated to publish their code; they could stop publishing their code only if they stopped taking that code from artists and claiming it as their own. A basic transitional version of this is implemented by using licenses like AGPLv3, and it does repel some corporations already.

                                                                                                              We could require decentralization. Code is currently delivered in a centralized fashion, with a community server which copies code to anybody, for free. This benefits corporations because of the asymmetric nature of their exploitation; taking code for free is their ideal way of taking code. Systems like Bittorrent can partially remedy the asymmetry by requiring takers to also be sharers.

                                                                                                              We could work to make languages easier to decompile and analyze. This partially means runtime code auditing, but it also means using structured language primitives and removing special cases. In the future, corporations could be forced to choose between using unpleasant eldritch old languages or using languages that automatically publish most of their API and logic in an auditable and verifiable manner to any requestor. And they’ll do anything we recommend, as long as it comes with an ecosystem; look at how many folks have actively embraced C, PHP, JS, etc. over the years.

                                                                                                              1. 15

                                                                                                                  I don’t think there is anything that can be accomplished by messing around with licenses; in fact, trying to keep licenses sane and not too exotic is the one good thing big tech has done, in my opinion.

                                                                                                                  What’s missing is something different that can break the “get VC money”, “acquire users”, “build a moat”, “return money to VC” dance. I personally have no idea what that could be. The one thing I know is that it produces unlovable software, and that there are enough people out there who can do better with a fraction of the money.

                                                                                                                I also don’t think that the answer lies in more Free software zealotry.

                                                                                                                1. 14

                                                                                                                  What’s missing is something different that can break the “get vc money”, “acquire users”, “build a moat”, “return money to vc” dance. I personally have no idea what that could be.

                                                                                                                  I would encourage you to look into the model of worker-owned cooperatives.

                                                                                                                  1. 8

                                                                                                                    I would encourage you to look into the model of worker-owned cooperatives.

                                                                                                                      I’ve seen this work for consultancies. Everyone involved has to actually be of like mind, though, and that can be harder to manifest and sustain than it first appears.

                                                                                                                    1. 4

                                                                                                                      Why would that change the quality of the software? In order to do it right, you’d need money coming in and a small enough team, and if you have a small team and money coming in you can probably make quality software, no matter who holds the company shares.

                                                                                                                    2. 8

                                                                                                                      I appreciate your thoughts. I agree that things could be better.

                                                                                                                      As the saying goes, “if there were no copyright, then there would be no need for copyleft.” (Who first said this? I think it’s from an RMS essay.) The current focus on copyleft is because of the judo which licensing allows us to leverage against corporations. As another saying goes, “corporations are golems before the law;” licenses are made of the same substance as corporations, and form effective weapons.

                                                                                                                        There are two implicit ideas in your post. I don’t fully understand them and I don’t want to put words in your mouth or construct strawmen, so instead I’ll have to be vague. The first idea starts with compensating folks for producing code. Since compensation requires a market of purchasers, and software can be copied at marginal cost, there is a natural misalignment of incentives: the producers want to be paid for the labor of production, which tends to increase; but the purchasers want to pay only for the cost of copying, which tends to decrease.

                                                                                                                      The second idea is that the politics of programming languages are important. On one hand, anybody can create a programming language. On the other hand, there are tendencies for big things to get bigger, including communities expressing themselves with popular languages. But on the gripping hand, every new language is built from the tools of old languages. Rather, it’s a question of which possible new languages we choose to build, and which lessons we choose to learn from the past, and those choices are political since people write expressive code in order to communicate with each other.

                                                                                                                      The answer to breaking the cycle of capitalism involves either democratizing ownership of the cloud hardware, or democratizing the development and maintenance of the cloud software. The only thing which keeps the capitalists in control is the ownership of property. Free Software is optional but hard to avoid if we want to do anything about it.

                                                                                                                      1. 3

                                                                                                                        The current focus on copyleft is because of the judo which licensing allows us to leverage against corporations. As another saying goes, “corporations are golems before the law;” licenses are made of the same substance as corporations, and form effective weapons.

                                                                                                                          True, but I think this is a war being fought on many more levels, and IMO, as effective as a license can be, Free software is losing on all the other fronts. One example is branding: “Open Source” is a much more popular term than “Free software”, and surely big tech has helped make that happen.

                                                                                                                        My point is that big tech has already learned how to win against licenses and it’s through marketing and a myriad of other activities. The FSF from my perspective has no chance at beating that unless it becomes willing to rebuild itself from the ground up, and we’re seeing that that’s not the case.

                                                                                                                        The answer to breaking the cycle of capitalism involves either democratizing ownership of the cloud hardware, or democratizing the development and maintenance of the cloud software.

                                                                                                                        Democratizing the development of software IMO has little to do with capitalism nowadays, and a lot more with being competent at shaping up communities around “principled” software projects and by keeping software simple and clean, so that new generations can quickly ramp up and fight against bad software. I leave it to you to judge how GNU is doing in that regard.

                                                                                                                        1. 4

                                                                                                                          “Open Source” is a much more popular term than “Free software” and surely big tech has helped make that happen.

                                                                                                                          The change happened a little earlier, but it’s not surprising that corporations would endorse a corporate-friendly bastardization of a community-grown concept. That’s what it means to be exploitative.

                                                                                                                          My point is that big tech has already learned how to win against licenses and it’s through marketing and a myriad of other activities.

                                                                                                                          You have no evidence with which to support this assertion. I intentionally linked to my pile of evidence that corporations systematically avoid certain licenses, including licenses which cover popular software with ecosystems of users. As I have previously explained:

                                                                                                                          The goal is to enumerate those licenses which are well-known, as a matter of folklore and experience, to designate Free Software which corporations tend to avoid. None of the information that I linked in my answer is new information, but it is cited and sourced so that folks cannot deny the bulk of the thesis for lack of evidence.

                                                                                                                          The FAANGs are indebted to GNU/Linux, for example, and while they have made efforts to get rid of GNU userlands, they are not yet ready to get rid of Linux. As I said at the beginning of the thread, we asked corporations to use C, and they used C; they chose irrationally because they’re not actually capable of technical evaluations, and this shackled them to our kernel of choice.

                                                                                                                          Democratizing the development of software IMO has little to do with capitalism nowadays…

                                                                                                                          This will have to be where we agree to disagree. You have pointed out twice, in your own words, that capitalism matters to producing software. First, you noted that the current cycle is driven by venture capitalists; these are the same sorts of capitalists that, a century ago, were funding colonial projects and having political cartoons made about them. Second, you surely would admit that it’s not possible to develop software without development hardware, which forms operating capital; whoever owns the computers has great control over software developers.

                                                                                                                      2. 6

                                                                                                                        There are other capitalistic software endeavors that are considerably more gentle than the current VC insanity. For example, the SQLite folks. I’m not even sure it’s incorporated. But Dr. Hipp is definitely doing it for the money.

                                                                                                                        Edit: looked it up, SQLite is incorporated as a limited partnership of a small number of people. Corporate contribution results in a very well-defined boundary of “what you get”. The “funny” thing about SQLite is that it’s unlicensed: SQLite is in the public domain.

                                                                                                                        1. 1

                                                                                                                          But Dr. Hipp is definitely doing it for the money

                                                                                                                          Eh this is very simplistic, and I’m not sure how you can be so certain that you know the motives of others. Do you know him?

                                                                                                                          If you look around, you can find various origin stories behind sqlite, which shed some light on the matter. Any project, and particularly large/long-lived ones, are going to have a mix of motivations, and they can change over time. Money could be one reason but it’s certainly not the whole story.

                                                                                                                          1. 2

                                                                                                                            Dr. Hipp has said so himself in some of his lectures. So, sure, he could be lying or saying it for dramatic effect, but I’m going to take him at his word, and I find it hard to believe that “making money” is zero percent of his incentive for doing it. That money wasn’t the only motivation is exactly my point: “making money” does not automatically taint a project, and in fact in many cases it’s a good signal that you are at least building something that someone wants. We are just living in times where other societal superstructures favor the type of capitalism that Loris is talking about. My personal take is that it’s ironic that some of the factors that brought about what we have now were conceived specifically to restrict or “strategically guide” capitalism, and have either spectacularly backfired or had some gnarly unintended (but perfectly predictable, if you were listening to the right people) consequences.

                                                                                                                        2. 2

                                                                                                                          My theory is that saying the problem is “profit motive” is almost right - the fundamental problem is trying to sell anything other than “what the user wants”, and receiving money from anywhere except directly from the user.

                                                                                                                          For instance, the “try free” button mentioned in the article is usually from someone trying to fund software development with cloud-services revenue. Cloud services revenue is not the software (or rather, it’s the software plus some other stuff), so they need to maintain the not-software that is not necessarily what the users need, and that distracts and gets in the way.

                                                                                                                          Ads/tracking, open core, all fall into the fundamental problem of prioritizing not-software over software.

                                                                                                                          So basically, I’m saying the future is patreon or liberapay or a libre app store.

                                                                                                                          There are two main ways we can make this happen:

                                                                                                                          1. We make paying for Free Software more convenient. There’s a lot of low-hanging fruit here. For instance, open up F-Droid on your phone, and look for an app called Phonograph. It’s GPL3, and offers a paid version ($5) called Phonograph Pro. P Pro is available from GitHub (if you compile it yourself) or the Google Play store, but not from F-Droid. F-Droid doesn’t support purchasing Free Software nor conditionally-available binaries, see. Selling Free Software is about selling convenience, so we damn well better make it convenient to buy Free Software. But more than that, it’s hard to figure out who or where to give money or even if it’s possible. I like Mesa; if I want to give them money I should be able to do so before the random impulse wears off.

                                                                                                                          And to go even further, if we’re ambitious, in the long term we should try to handle identity and payment on the desktop (which, come to think of it, is too long for this paragraph or post; I’ll gladly elaborate though) so as to make it easier for people to pay.
                                                                                                                          
                                                                                                                          2. We should foster an attitude of “if you like it, put money towards it. Anything.” Because IIRC, currently only 0.01%-ish of users donate money. That is insanely low.

                                                                                                                          This is super weird and tightropey, since freedoms aren’t supposed to be conditional and realistically Free Software is fundamentally tied to voluntarism, and we really don’t want to make room for people to justify proprietary software by saying “well you ought to be paying anyway, and as long as you’re paying you’re not losing anything anyway”.

                                                                                                                          So, we need people to voluntarily pay within an order of magnitude or two of what the proprietary alternatives receive. I don’t see how anyone can sustainably compete on quality with Google, unless their revenue is at least 1% of Google’s. I just don’t see a primarily volunteer-programmer project ever scaling that high.

                                                                                                                          1. 1

                                                                                                                            Yeah, I’m with you there. I’m searching for an alternative path as well for https://arbor.chat. It takes money to grow your software, but there has to be a better model for funding than the traditional one. We’re thinking we might establish a nonprofit that accepts donations, but also provides a hosted set of infrastructure with a sourcehut-style subscription. I’d love to talk more about this kind of thing with anyone who is interested.

                                                                                                                            1. 2

                                                                                                                              As both an owner in a free software small business and also a small-time investor with a software freedom bent, I’m very interested in these kinds of topics and more collaboration between the people/projects/companies trying to find the way.

                                                                                                                              1. 1

                                                                                                                                I found this to be an interesting approach: https://squidfunk.github.io/mkdocs-material/insiders/

                                                                                                                                It seems like it’s working for them. For a theme, it seems to have quite a bit of financial support.

                                                                                                                                1. 1

                                                                                                                                  Thanks for sharing that! I’m not yet sure how I feel about the approach taken, but it’s certainly a very interesting data point.

                                                                                                                            2. 7

                                                                                                                              We could double down on the concept of copyleft. Treating corporations as people has led to working-for-hire, a systematic separation of artists from their work. We could not just extend the Four Freedoms to corporations, but also to artists, by insisting that corporations are obligated to publish their code; they could stop publishing their code only if they stopped taking that code from artists and claiming it as their own. A basic transitional version of this is implemented by using licenses like AGPLv3, and it does repel some corporations already.

                                                                                                                              Doubling-down on the concept of copyleft is basically the agenda of the free software movement, which is the thing that @kristoff states is a “disaster on too many fronts and its leadership has failed so badly that I don’t even want to waste words discussing it”. I don’t think it’s obvious that the free software movement has failed - certainly not so obvious that it’s not worth words discussing it. But certainly it’s the case that lots of software is not published under copyleft licenses, some free and some non-free. If the free software movement is a failure so long as anyone at all is publishing non-free software or even free but non-copyleft software, then sure, it’s a failure so far; but that seems like an awfully stringent requirement for success.

                                                                                                                              We could require decentralization. Code is currently delivered in a centralized fashion, with a community server which copies code to anybody, for free. This benefits corporations because of the asymmetric nature of their exploitation; taking code for free is their ideal way of taking code. Systems like Bittorrent can partially remedy the asymmetry by requiring takers to also be sharers.

                                                                                                                              We already have this. Redis is a BSD-licensed piece of free software whose source code is publicly-available here on GitHub. Anyone can legally fork this and redistribute it, without asking anyone’s permission and without even doing all that much work. If GitHub deplatforms the project for any reason, it’s very easy to set up alternative git hosting on some other service. If someone really doesn’t like the fact that the official redis website has too big of a try free button, nothing is stopping them from setting up a website for their own fork of redis that doesn’t have that button.

                                                                                                                              We could work to make languages easier to decompile and analyze. This partially means runtime code auditing, but it also means using structured language primitives and removing special cases. In the future, corporations could be forced to choose between using unpleasant eldritch old languages or using languages that automatically publish most of their API and logic in an auditable and verifiable manner to any requestor. And they’ll do anything we recommend, as long as it comes with an ecosystem; look at how many folks have actively embraced C, PHP, JS, etc. over the years.

                                                                                                                              A lot of organizations using unpleasant eldritch old languages are stable and stodgy ones that have been around for decades, and aren’t necessarily even for-profit corporations. MUMPS is primarily used by hospitals, and COBOL has plenty of use in banks and government bureaucracies. A lot of the reason for this is that these organizations have software requirements that don’t change very much, and have made the trade-off that having a software stack that few people understand is better than updating that software stack and risking introducing bugs. Corporations that haven’t gotten big and institutional yet have more incentive to use newer technology stacks - and if they refuse to anyway and that choice contributes to the company failing in the marketplace, whatever, it’s just one more of many failed companies.

                                                                                                                              1. 1

                                                                                                                                We already have [decentralization]. Redis is a BSD-licensed piece of free software whose source code is publicly-available here on GitHub. Anyone can legally fork this and redistribute it, without asking anyone’s permission and without even doing all that much work. If GitHub deplatforms the project for any reason, it’s very easy to set up alternative git hosting on some other service.

                                                                                                                                This is the “we have food at home” fallacy. To use words more carefully: GitHub is the “community server” from which “code is currently delivered in a centralized fashion”. You are saying that if one point of centralization vanishes, then the community can establish another. Yes, but it takes time and effort, and the community is diminished in the meantime; removing those centralized points is damage to the communities.

                                                                                                                                A properly-decentralized code-delivery service would not be so fragile. It would not have any Mallory who could prevent a developer from obtaining code, save for those folks in control of the network topology. (A corollary is that network topologies should be redundantly connected and thickly meshed, with many paths, to minimize the number of natural Mallory candidates.) Any developer who wanted to use a certain library would only need to know a cryptographic handle in order to materialize the code.

                                                                                                                                Note that these services would only work as long as a majority of participants continue to share-alike all code. So corporations have a dilemma: Do they join in the ecosystem and contribute proportional resources to maintaining the service while gaining no control over it, or do they avoid the ecosystem and lose out on using any code which relies upon it? Of course they could try to cheat the network, but cryptography is a harsh mistress and end-to-end-encrypted messages are black boxes.
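
                                                                                                                                To make the “cryptographic handle” part concrete, here’s a minimal content-addressing sketch (assuming OpenSSL’s one-shot SHA256(); the sha256- handle format is invented for illustration):

                                                                                                                                #include <stdio.h>
                                                                                                                                #include <openssl/sha.h>
                                                                                                                                
                                                                                                                                // Hash some code bytes into a handle. Anyone holding the handle can
                                                                                                                                // fetch the bytes from any peer and verify they got the right ones.
                                                                                                                                static void code_handle(const unsigned char *code, size_t len,
                                                                                                                                                        char out[2 * SHA256_DIGEST_LENGTH + 1]) {
                                                                                                                                    unsigned char digest[SHA256_DIGEST_LENGTH];
                                                                                                                                    SHA256(code, len, digest);
                                                                                                                                    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
                                                                                                                                        sprintf(out + 2 * i, "%02x", digest[i]);
                                                                                                                                }
                                                                                                                                
                                                                                                                                int main(void) {
                                                                                                                                    const unsigned char code[] = "int main(void) { return 0; }";
                                                                                                                                    char handle[2 * SHA256_DIGEST_LENGTH + 1];
                                                                                                                                    code_handle(code, sizeof(code) - 1, handle);
                                                                                                                                    printf("sha256-%s\n", handle);  // publish this instead of a hosting URL
                                                                                                                                    return 0;
                                                                                                                                }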

                                                                                                                                1. 3

                                                                                                                                  This is the “we have food at home” fallacy. To use words more carefully: GitHub is the “community server” from which “code is currently delivered in a centralized fashion”. You are saying that if one point of centralization vanishes, then the community can establish another. Yes, but it takes time and effort, and the community is diminished in the meantime; removing those centralized points is damage to the communities.

                                                                                                                                  GitHub isn’t the community server. There is no the community. Lots of separate open-source projects with their own communities exist, and they can individually choose to host the authoritative version of their code on whatever git platform they want, whether that’s GitHub, Gitlab, Gitea, the ssh-based hosting built into git, or some other option.

                                                                                                                                  I agree that if a given open-source project deliberately chooses to host their code and issues and documentation and so on on GitHub, rather than on a platform that they have control over, they are vulnerable to community disruption and damage if GitHub decides to stop serving them. And insofar as GitHub is popular, lots of projects exist that are making this choice. I agree that this is a bad idea, and that these projects shouldn’t do this. Personally, I no longer host my own open-source code on GitHub, and I only interact with it in order to contribute to projects that do use it.

                                                                                                                                  But getting a lot of separate organizations to individually switch away from a useful-but-nonfree software platform to a free one that maybe doesn’t have as much UI polish is a hard collective-action problem (it’s pretty much the same problem as getting people to switch from Mac OS or Windows to Linux on their desktop computers). You can’t compel large numbers of people to value freedom from GitHub’s disruptive product choices over the value they currently get from GitHub, and you can’t compel a bunch of different people to do the work of switching off GitHub all at once.

                                                                                                                                  A properly-decentralized code-delivery service would not be so fragile. It would not have any Mallory who could prevent a developer from obtaining code, save for those folks in control of the network topology. (A corollary is that network topologies should be redundantly connected and thickly meshed, with many paths, to minimize the number of natural Mallory candidates.) Any developer who wanted to use a certain library would only need to know a cryptographic handle in order to materialize the code.

                                                                                                                                  Radicle is a great idea, I’m a fan. If some project currently using GitHub as their authoritative git repo decided to switch to Radicle and abandon their GitHub-based infrastructure, I think that would be great.

                                                                                                                              2. 3

                                                                                                                                One route that has been under-explored is to pay for software distribution.

                                                                                                                                On some level, software has the same problem as music: copying it is trivially easy. Whether the source is open or not doesn’t matter if distribution is made convenient enough that people are willing to pay for it.

                                                                                                                                1. 2

                                                                                                                                  I like to call this model “libre-non-gratis,” and there has been a small but strong set of examples over the years. Conversations (an Android app) is one currently active example.

                                                                                                                              1. 3

                                                                                                                                Related: Problems With the test Builtin: What Does -a Mean?

                                                                                                                                The POSIX spec did indeed make things cleaner; the last section quotes it and gives some style advice.
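
                                                                                                                                If I remember the style advice right, it boils down to avoiding the overloaded -a/-o operators inside test; a rough sketch of the usual rewrite:

                                                                                                                                ```sh
                                                                                                                                # Ambiguous: depending on the operands, test may parse -a as the
                                                                                                                                # binary AND operator or as the unary "file exists" check.
                                                                                                                                [ "$x" -a "$y" ] && echo both

                                                                                                                                # Unambiguous POSIX style: one condition per bracket, combined
                                                                                                                                # with the shell's own && operator.
                                                                                                                                [ "$x" ] && [ "$y" ] && echo both
                                                                                                                                ```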

                                                                                                                                1. 4

                                                                                                                                  zsh has also long tried this, but the main issue is that people still want to write bash scripts ;-)

                                                                                                                                  1. 2

                                                                                                                                    I think the osh/oil split is a good (less cumbersome?) way to try to manage this need.

                                                                                                                                    1. 1

                                                                                                                                      Does it still work now? How do I opt in to this in zsh?

                                                                                                                                      1. 5

                                                                                                                                        I believe it is even on by default! setopt NO_SH_WORD_SPLIT (i.e., sh-style word splitting stays off unless you setopt SH_WORD_SPLIT).

                                                                                                                                        The equivalent of Oil’s @split function is written $=variable.

                                                                                                                                        argv() { print ${(qq)@}; }
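
                                                                                                                                        A quick interactive check of the difference, using the argv helper above (it prints each of its arguments singly-quoted; % is the zsh prompt):

                                                                                                                                        ```zsh
                                                                                                                                        % var='a b c'
                                                                                                                                        % argv $var      # zsh does not field-split unquoted $var by default
                                                                                                                                        'a b c'
                                                                                                                                        % argv $=var     # $= forces sh-style splitting for this one expansion
                                                                                                                                        'a' 'b' 'c'
                                                                                                                                        ```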

                                                                                                                                        1. 1

                                                                                                                                          Thank you.

                                                                                                                                      2. 1

                                                                                                                                        The HN thread has some detail on that: https://news.ycombinator.com/item?id=26686201

                                                                                                                                        zsh doesn’t split, but it still omits empty strings, unless you opt out of that.
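
                                                                                                                                          A quick illustration of that elision, with the argv helper from upthread:

                                                                                                                                          ```zsh
                                                                                                                                          % empty=''
                                                                                                                                          % argv before $empty after      # the unquoted empty word is elided
                                                                                                                                          'before' 'after'
                                                                                                                                          % argv before "$empty" after    # quoting keeps it
                                                                                                                                          'before' '' 'after'
                                                                                                                                          ```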