1. 3

    If the string is smaller than your chunk size, you can still do chunk-at-a-time processing by doing aligned loads and masking off the excess data. Loading out-of-bounds memory only crashes if you actually cross a page boundary and hit an unmapped page. Page sizes and load sizes are powers of two, so aligned loads never cross page boundaries.

    Naturally, there is no way to express this in system programming languages since the UB police arrived, but compilers seem to leave you alone if you use intrinsics.
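
    A minimal sketch of the trick in C with SSE2 intrinsics (assuming x86-64, 4 KiB pages, and 16-byte chunks; the function name is illustrative):

      #include <emmintrin.h>
      #include <stdint.h>

      /* Load the aligned 16-byte chunk containing s. The aligned load can't
         cross a page boundary, so it can't fault even if the string ends
         mid-chunk; the caller masks off bytes outside the string. */
      static __m128i load_chunk_containing(const char *s)
      {
          uintptr_t base = (uintptr_t)s & ~(uintptr_t)15;
          return _mm_load_si128((const __m128i *)base);
      }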

    1. 2

      You could keep valgrind happy by rounding your allocations up and memset()ing the last few bytes to 0.
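
      Something like this sketch, assuming 16-byte chunks:

        #include <stdlib.h>
        #include <string.h>

        /* Round the allocation up to the chunk size and zero the tail, so
           chunked reads past len stay in bounds and read defined bytes. */
        char *alloc_padded(size_t len)
        {
            size_t padded = (len + 15) & ~(size_t)15;
            char *buf = malloc(padded);
            if (buf)
                memset(buf + len, 0, padded - len);
            return buf;
        }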

      1. 2

        Note that this depends on your architecture and is not true on CHERI (or any other environment that gives byte-granularity memory safety). We had to disable the transforms in LLVM that assumed this was possible.

        It’s also undefined behaviour in C to read past the end of an object, so you need to be quite careful that your compiler doesn’t do unexpected things.

      1. 4

        It’s odd that this doesn’t mention crypto.getRandomValues(new Uint8Array(32)) as a way to get a seed.

        Otherwise very nice and I like that it explicitly explains RNGs as Moore machines. :)

        1. 1

          Thanks for spotting this. I originally had some rules for myself that restricted any crypto functions that provide entropy (as well as Math.random) so I could talk about entropy in general + browser entropy sources.

          This got lost in the editing process and I’ve added a note 👍

          1. 2

            Thanks. I will admit that using getRandomValues in a blog post titled “Without Math.random” might feel like cheating… but I think the stuff about how PRNGs can be implemented was the most interesting portion anyway. <3

        1. 2

          Is there a fork which deliberately ignores SI’s decision to pretend that “cycle” is unitless, makes 1 radian = 1/2pi cycles, and makes Hz = 1/cycle? :)

          Edit: a very small thing, but I like the title. Whimsical and amusing without being clickbaity.

          1. 7

            I think this starts well and speaks to some real problems, but I’m less than convinced by the conclusion. Usernames that can be typed are useful, and in several ways. I’ve attempted to sort these examples in ascending order of elementarity.

            • They can be roundtripped through clipboards and email/notepad/etc apps
            • Legacy software understands them
            • They can be processed by generic tools
            • They’re compatible with assistive technology
            • They can be roundtripped through physical writing
            • Humans can remember and copy them

            I’m not sure where to put it, but a point that’s relevant to me is that it’s much faster for me to type things that don’t involve roundtrips. There are a few reasons for that, and some of them—like the fact that most interfaces are pretty laggy—are eminently fixable, but one feels fundamental: I’m usually typing a few words ahead of what I’m reading back, if I’m reading back at all. Using an interactive username selector, even a hypothetical perfect one, breaks the pipeline, so to speak.

            1. 7

              The way Discord (and Blizzard) address both use cases is mentioned offhand in the article. IMO it’s really nice. You pick a friendly username (e.g. “Richard”). This does not have to be unique. The system automatically generates a few decimal digits to make it unique (e.g. “Richard#123456”). The sequence of digits is usually 4 digits long, but can be longer if a given friendly username is common.
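
              For flavor, the allocation side might look roughly like this sketch (in C; is_taken is a hypothetical lookup against the user database, and I have no idea what Discord actually does):

                #include <stdlib.h>

                /* Hypothetical: nonzero if name#disc is already registered. */
                extern int is_taken(const char *name, long disc);

                /* Pick a random free discriminator: start at 4 digits and widen
                   when a friendly name is so common that short tags keep colliding. */
                long pick_discriminator(const char *name)
                {
                    for (long limit = 10000; limit <= 1000000000L; limit *= 10)
                        for (int tries = 0; tries < 64; tries++) {
                            long d = rand() % limit;
                            if (!is_taken(name, d))
                                return d;
                        }
                    return -1; /* give up; name is absurdly popular */
                }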

              In “local” contexts like a chat channel, the friendly username is almost always unique so you can identify me by just that.

              In “global” contexts like adding a friend, you’re asked for the full globally-unique handle with the digits too.

              1. 9

                The Blizzard system is also great because other users’ numbers are always hidden, and you can’t send someone a friend request without either playing a game with them or knowing their secret number. That makes it a really nice spam prevention tool as well – you have this secret part of your identity that is both a disambiguator for your username but also a small secret that lets you be selective about who gets to send you messages. I’m a huge fan of the system and miss it when I play other social games!

                1. 1

                  I forgot that aspect. Yeah, that’s nice.

                2. 2

                  It’s good that the globally-unique handles are human- and print-friendly, but a little disappointing to me that Discord’s UI for mentioning people is still roundtrippy.

                  1. 2

                    Oof, yeah, I didn’t get this part quite right - Discord’s UI for mentions doesn’t actually work as nicely with local name uniqueness as I was thinking.

                    I believe that in principle it could but it isn’t quite that nice.

              1. 1

                I wonder if you could get a few bits improvement in the error bounds by using a fused multiply add to get more precision at the range reduction step?
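
                Something like a Cody-Waite reduction can use fma() so the high part of x - k*(pi/2) is computed with a single rounding (a sketch; the constants are the usual double-precision splits, and it isn’t tuned for huge arguments):

                  #include <math.h>

                  /* Reduce x to r with x = k*(pi/2) + r, r in about [-pi/4, pi/4].
                     fma(-k, pio2_hi, x) loses no bits to the big cancellation. */
                  double reduce_pio2(double x, int *quadrant)
                  {
                      const double inv_pio2 = 0x1.45f306dc9c883p-1;  /* 2/pi */
                      const double pio2_hi  = 0x1.921fb54442d18p+0;  /* pi/2, high bits */
                      const double pio2_lo  = 0x1.1a62633145c07p-54; /* pi/2, low bits */
                      double k = nearbyint(x * inv_pio2);
                      double r = fma(-k, pio2_hi, x);
                      r = fma(-k, pio2_lo, r);
                      *quadrant = (int)((long)k & 3);
                      return r;
                  }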

                1. 25

                  A few things that were missed: POSIX / the Single Unix Specification require vi. That’s a big part of why it’s ubiquitous: every *NIX system that aims for POSIX compliance needs some vi clone. *BSD shipped nvi, because it’s small and they have a separation of base system and third-party things, so people who want a richer editor can install one (including vim) and only pay a tiny cost for nvi being installed. Linux distros picked vim. I have no idea why that was (early Linux distro creators liked it? They assumed everyone would want a richer vi clone?), but once it became the standard Linux vi, everyone learned it. I was hoping that the article would give some insight into this.

                  The first shell and editor that you use seem very sticky. I still use bash and vim on pretty much every system, even when they’re not the default on a system (neither is on FreeBSD). The SSH factor is still the main thing keeping me in vim. If the VS Code remote extension weren’t so tied to x86/Linux, I’d probably use that, but since it doesn’t work with ARM, MIPS, FreeBSD, and so on, I can’t rely on it everywhere and so I don’t want to have to switch between tools.

                  1. 5

                    I also find this post very sparse on information. For example, it doesn’t really make sense of the path from punch cards to line editors. Line editors existed because early terminals were not the “glass terminals” we have now, but hard-copy terminals, where you interacted with a printout, one line appearing at a time as you typed in commands. At that point a full-screen editor makes no sense; there wasn’t even a screen.

                    1. 3

                      Another reason was that other options (such as Emacs) used to cost a lot of money and effort to get installed, because there were no simple package-distribution mechanisms as we now have.

                      1. 12

                        And if you go even further back, to the days when it was really just “vi” instead of “vim”, Emacs was FAR more resource-intensive, which mattered a lot on shell hosts you were sharing with a bunch of other users. When I was in college, we joked it stood for Eight Megs And Constantly Swapping. Obviously not an issue at all these days but that was one of the things that steered me away from Emacs when I was first learning to use UNIX.

                        1. 5

                          When I studied at KTH in Sweden in the early 90s the Sun dickdiskless workstations we used had 2 windows open when an X session started: a shell (I believe it might have been tcsh but this was before I actually cared about these things) and an Emacs session. All documentation regarding how to edit text assumed Emacs.

                          I guess my Emacs usage started there. I played with Emacs-like editors on the 386 I had at the time (running Windows 3.1) just to keep the keybindings.

                          I know enough vi to be able to edit a local apt-source list to install Emacs ;)

                          1. 1

                            That probably confirms the point David was trying to make: defaults stick with us.

                            Lovely “typo” and how you left it in btw :)

                          2. 2

                            I was actually thinking of the pre-vim times; AFAIK it was already possible to distribute the Emacs source via FTP when vim started to become popular.

                            1. 1

                              emacs - eight megs and constantly swapping

                              1. 3

                                My first PC had 2 megs so I couldn’t run Emacs on it - nor install Debian, for that matter.

                          3. 2

                            *BSD shipped nvi, because it’s small and they have a separation of base system and third-party things, so people who want a richer editor can install one (including vim) and only pay a tiny cost for nvi being installed.

                            One other minor point - *BSD ships with nvi because the original vi (although originally Berkeley-developed) contained encumbered AT&T-copyrighted source code. nvi is a re-implementation of vi by Keith Bostic, based partially on elvis, and initially included with 4.4BSD.

                            1. 2

                              The first shell and editor that you use seem very sticky. I still use bash and vim on pretty much every system, even when they’re not the default on a system (neither is on FreeBSD).

                              I’d imagine you’re half-right about this. If only vi is available on some system, many choose to install vim. But I think it’s less common to even notice that the shell is not bash.

                              1. 4

                                The first thing I do after installing Void Linux is change the shell from dash to bash and vi to kakoune.

                                Dash is just unusable as a day-to-day interactive shell. It’s probably not meant to be one.

                                After learning kakoune I can’t stand using any kind of vi.

                                1. 3

                                  How many years did you use vi?

                                  I’ve tried to use kakoune but there are many vi(m)-isms which stop me. It’s not the plugins; I use like 2.

                                  For me, vim-sneak is the biggest one for movement. Can kakoune do similar movement?

                                  1. 3

                                    I don’t know. I’ve been using Linux since Mandrake Linux almost 18 years ago and always used vim because it was the default. I just recently started programming, so I never used vim intensely in the past; it was always simple tasks, so I never acquired too much knowledge of it. I just wrote some bash scripts.

                                    When I tried kakoune it simply felt more natural to me, and the instant feedback and multiple cursors are wonderful. That’s when I started learning programming.

                                    I don’t know about vim-sneak. Looking briefly at the website, it seems that you perform a search and it highlights the results and you can jump around? If I search for “as” I can jump to the third instance by typing ‘3n’. If I hold shift it selects all the instances on the way. So 3 + shift + n will select the next 3 instances.

                                    1. 2

                                      I used Vim daily for like twenty years before trying Kakoune, and after a couple of months (which, I admit, were a bit awkward and frustrating) I can’t go back.

                                      I don’t know of any equivalent to vim-sneak (although there is an easymotion plugin) but because of the way Kakoune deals with selections instead of cursors movement has a very different feel — it’s not just about moving the cursor to the right place, but also moving the anchor (the other end of the selection) to the right place at the same time. When I started out, it felt a little bit like playing that Snake game on old Nokia phones, always being conscious of where your tail was.

                                      My suspicion is that a Kakoune equivalent of vim-sneak wouldn’t be nearly as useful as it is in Vim. On the other hand, it’s pretty easy to build Kakoune plugins, and that might be a decent tutorial project.

                                    2. 2

                                      As far as I remember the ksh in OpenBSD was surprisingly usable as a daily driver despite having only like 3 features: line editing, tab completion of command names and tab completion of file names.

                                      1. 0

                                        This is a tangent but for anyone with a passing knowledge of Greek or French etymology “kakoune” is an unfortunate name.

                                        1. 4

                                          Here, broaden your linguistic horizons a bit.

                                          1. 1

                                            Interesting. Is “{en|in}close” specific to how the editor works? I thought it was a modal editor with different verb/noun nomenclature than vim’s.

                                            As much as I’d like to learn Japanese, learning French was hard enough, and I can’t unsee kakoune as an (ungrammatical) gloss on “the shitty one”.

                                            1. 5

                                              Kakoune’s major innovation relative to vi(m) is specifying the object before the action, which also neatly enables (SublimeText-style) multi-select. You enclose some text and then operate on it. Japanese puts nouns before verbs too.

                                              But on the shell, it’s usually just called kak. So, yeah, kind of a cuss… kek!

                                          2. 2

                                            My understanding is that the primary author of Kakoune is from French-speaking New Caledonia, where the word means “a hard punch”. I don’t know if that’s a meaning borrowed from some Polynesian language, or a novel invention, but I’m sure the author was aware of the variant reading you’re thinking of.

                                        2. 1

                                          I think that depends on what the other shell is. OS X used to ship with tcsh as the default (and the rest of the BSD family still does). Installing bash was one of the first things I did on a Mac and apparently this behaviour was so common that Apple switched to bash as the default shell some time around 10.3ish. Solaris, last time I used it, shipped with an incredibly minimal POSIX shell by default, so bash was one of the first things I installed. If I use a system where the default shell is zsh, for example, I typically don’t care.

                                        3. 1

                                          Linux distros picked vim.

                                          Not all of them! My beloved Slackware defaults vi to elvis, and vim is a separate command. I actually kinda like elvis - it has a kind of WYSIWYG HTML mode that is somewhat cool, among other little things. Lightweight too.

                                          Though vim is my main editor for bigger jobs.

                                        1. 7

                                          I agree with Drew’s general sentiment here, but note that linkers can only remove code at the function level, and the number of functions a module uses is not a great indicator of the amount of underlying code.

                                            As an example, I maintain a small C runtime library. printf() is a common function for programs to use. But since linking is at the function level, there’s no way to remove code for format specifiers that the program is not using. Since the library doesn’t know what the output device is, code for all potential output devices needs to be included. Since my C runtime runs on Windows, that means character encoding support for UTF-16, UTF-8 and others, as well as VT processing code, including low-level console calls.
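
                                            To make the granularity concrete, here’s a sketch (exactly what gets dropped depends on toolchain flags like -ffunction-sections and -Wl,--gc-sections):

                                              #include <stdio.h>

                                              void never_called(void) { puts("dead"); } /* whole unused function: droppable */

                                              int main(void)
                                              {
                                                  /* Only %d is used, but linking is per-function, so printf comes in
                                                     whole: float formatting, wide output paths, and so on. */
                                                  printf("%d\n", 42);
                                                  return 0;
                                              }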

                                          I’d expect the same general effect to be present in other libraries, including UI libraries. Even if the program knows it’s not going to perform certain operations on a window, the library is going to create an entire window with all of the data structures to support those operations. Things like C++ are particularly evil because once an object with virtual function pointers is loaded, the compiler is going to resolve those function pointers and all of their dependencies whether they are ever called or not.

                                            At $WORK this drives me crazy, because we have common static libraries that, when used, can add 300KB to 3MB of code to a program, even if only one or two functions are used.

                                          1. 9

                                            You have a good point. The library’s interface basically needs to be designed from the beginning for dead code elimination. One thing I like about newer languages like Rust and Zig, with their powerful compile-time metaprogramming features, is that you can often do this kind of design without sacrificing developer convenience. I suppose the same is true of modern C++ as well. The reason why printf is such a perfect counter-example is that C doesn’t have the language features to allow the developer convenience of printf without sacrificing dead code elimination.

                                            This reminds me of the last time I played with wxWidgets. A statically linked wxWidgets hello-world program on Windows was about 2.5 MB. I didn’t dig very deeply into this, but it seems that at least part of the problem is that wx’s window procedure automatically supports all kinds of features, such as printing and drag-and-drop, regardless of whether you use them. I suppose a toolkit designed for small statically linked executables would require the application developer to explicitly enable support for these things. And the window procedure, instead of having a giant switch statement, would do something like looking up the message ID in a map and dispatching to a callback. So when an application enabled support for, say, drag and drop, the necessary callbacks would be added to that map.
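
                                              A toy version of that idea in C (a Win32-flavored sketch; not what wx actually does):

                                                #include <windows.h>

                                                typedef LRESULT (*handler_fn)(HWND, WPARAM, LPARAM);

                                                /* Only messages the app registers get non-default handling, so
                                                   handlers for unused features are never referenced and can be
                                                   linked out. */
                                                static handler_fn handlers[WM_USER];

                                                void on_message(UINT msg, handler_fn fn)
                                                {
                                                    if (msg < WM_USER)
                                                        handlers[msg] = fn;
                                                }

                                                LRESULT CALLBACK wnd_proc(HWND h, UINT msg, WPARAM w, LPARAM l)
                                                {
                                                    if (msg < WM_USER && handlers[msg])
                                                        return handlers[msg](h, w, l);
                                                    return DefWindowProc(h, msg, w, l); /* default for the rest */
                                                }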

                                            1. 2

                                              Rust’s formatting machinery isn’t very easy to do DCE on either. https://jamesmunns.com/blog/fmt-unreasonably-expensive/

                                                The formatting machinery has to make the unfortunate call of either heavy monomorphization or heavy dynamic dispatch. If your executable is inevitably going to make lots of calls to the formatter, the dynamic dispatch approach results in less code duplication, but it makes it harder to do dead code elimination…

                                              1. 1

                                                Tangentially, it is very noticeable in the JS ecosystem that some libs have a lot of effort put into making tree shakers succeed at eliminating their code. By default, not so much.

                                              2. 3

                                                I agree with Drew’s general sentiment here, but note that linkers can only remove code at the function level, and the number of functions a module uses is not a great indicator of the amount of underlying code.

                                                I don’t think that’s the case if you compile with ‘-flto’. I’d assume the code generator is free to inline calls and remove things that can be stripped at the call site.

                                                1. 2

                                                  BTW, ‘-flto’ is one of the great reasons to use static linking. It can turn suboptimal APIs (those using enum values for setters/getters, like glGet()) into something decent by removing the jump tables.
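
                                                    For instance (a sketch with hypothetical names):

                                                      struct ctx { int width, height; };
                                                      enum param { PARAM_WIDTH, PARAM_HEIGHT };

                                                      int ctx_get(const struct ctx *c, enum param p)
                                                      {
                                                          switch (p) {
                                                          case PARAM_WIDTH:  return c->width;
                                                          case PARAM_HEIGHT: return c->height;
                                                          }
                                                          return -1;
                                                      }

                                                      /* With -flto, a cross-module ctx_get(c, PARAM_WIDTH) can be
                                                         inlined and folded to a plain load of c->width. */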

                                                  1. 1

                                                      Totally agree that link-time code generation is a huge improvement in terms of the amount of dead code elimination that can occur. But at the same time, note the real limitations: it can inline getters and setters, and strip code from a function call with a constant argument of a primitive data type, but can it strip code from printf? What happens with virtual function pointers - is it going to rearrange in-memory structures when it notices particular members are never accessed? The real challenge linking has is that the moment it hits a condition it can’t resolve with certainty, all of the dependencies of that code get brought in.

                                                      Put another way, instead of looking at what the linker can and can’t do, look at what actually happens. How large is a statically linked hello-world program with Qt? Gtk? wxWidgets? Today, it’s probably fair to ask about a statically linked Electron program, which won’t strip anything because the compiler can’t see which branches the dynamically loaded HTML or JS is going to use. What would get really interesting is to use a coverage build and measure the fraction of code that actually executes, and I’ll bet with conventional UI toolkits that number is below 10%.

                                                      It really looks to me like the size and complexity of code is increasing faster than the linker’s ability to discard the code, which is the real reason all systems today are using dynamic linking. Drew’s points about the costs are legitimate, but we ended up dynamically linking everything because in practice static linking results in a lot of dead code.

                                                    1. 2

                                                      Well, printf() is one of those bad APIs that postpone to runtime what could be determined at edit or compile time. But what’s the overhead of printf() in something like musl?

                                                      $ size a.out
                                                         text	   data	    bss	    dec	    hex	filename
                                                        14755	    332	   1628	  16715	   414b	a.out
                                                      

                                                      I think I can afford printf() and its dependencies being statically-linked.

                                                      1. 1

                                                        Can you afford it with UI libraries? Printf is an example of what can happen - it’s not the only case.

                                                        1. 3

                                                            Many GUI programs out there bundle a private copy of Qt (or even Chromium, via Electron). Because they do it as a .so, they do it without dead code elimination.

                                                            And as we tend towards Snaps and Flatpaks for packaging open-source applications, the practice is spreading through the open-source application world.

                                                          So, empirically, it seems like we decided we could afford it. Static linking just makes it cheaper.

                                                2. 1

                                                  It’s true that linking to some symbols can have an outsized effect on dead code elimination, stdio being the (in)famous case, but on the whole this is the exception rather than the rule.

                                                1. 18

                                                  Worth reading to the end just for the totally evil code snippet.

                                                  It was kind of foreshadowed to be evil when the author named it “skynet.c” I guess.

                                                  1. 4

                                                      Reminds me of the Java code we used to see around 2000.

                                                      With a RuntimeException try-catch at the top that would just print the exception and continue like nothing happened.

                                                      How many bad bugs, and how much data corruption and weirdness, did that practice cause?

                                                    1. 1

                                                        How is that any different from Kubernetes and “just restart it”? It’s mostly the same practice ultimately, though with a bit more cleanup between failures.

                                                      1. 2

                                                        I guess it depends on whether you keep any app state in memory. If you’re just funnelling data to a database maybe not much difference.

                                                    2. 2

                                                        Now I start to wonder what the correct code should look like (as opposed to jumping 10 bytes ahead).

                                                        Read DWARF to figure out the next instruction?

                                                        Embed a disassembler to decode the faulty instruction’s length?

                                                      1. 4

                                                        Increment the instruction pointer until you end up at a valid instruction (i.e., you don’t get SIGILL), of course ;)

                                                        1. 6

                                                          I have code that does this by catching SIGILL too and bumping the instruction pointer along in response to that. https://github.com/RichardBarrell/snippets/blob/master/no_crash_kthxbai.c
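
                                                            In miniature, the mechanism is something like this (an x86-64 Linux sketch; the linked file is the real thing):

                                                              #define _GNU_SOURCE
                                                              #include <signal.h>
                                                              #include <stdio.h>
                                                              #include <ucontext.h>

                                                              /* On SIGILL, blindly bump the instruction pointer one
                                                                 byte and retry. Strictly for entertainment. */
                                                              static void skip_bad_byte(int sig, siginfo_t *info, void *ctx)
                                                              {
                                                                  (void)sig; (void)info;
                                                                  ucontext_t *uc = ctx;
                                                                  uc->uc_mcontext.gregs[REG_RIP] += 1;
                                                              }

                                                              int main(void)
                                                              {
                                                                  struct sigaction sa = {0};
                                                                  sa.sa_sigaction = skip_bad_byte;
                                                                  sa.sa_flags = SA_SIGINFO;
                                                                  sigaction(SIGILL, &sa, NULL);
                                                                  __asm__ volatile ("ud2"); /* guaranteed-invalid opcode */
                                                                  puts("carried on regardless");
                                                              }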

                                                          1. 2

                                                            Brilliant. I’m simultaneously horrified and amused.

                                                          2. 1

                                                            SIGILL

                                                            That’d be a pretty great nerdcore MC name.

                                                          3. 1

                                                            If you want to skip the offending instruction, à la Visual Basic’s “on error resume next”, you determine the instruction length by looking at the code and then increment by that.

                                                            Figuring out the length requires understanding all the valid instruction formats for your CPU architecture. For some it’s almost trivial; AVR, say, has 16-bit instructions with very few exceptions for stuff like absolute call. For others, like x86, you need a fair bit of logic.

                                                            I am aware that the “just increment by 1” suggestions elsewhere in the thread are intended as a joke. However, I still think it’s instructive to say that incrementing blindly might lead you to start decoding in the middle of an instruction. This might still be a valid instruction, especially for dense instruction set encodings. In fact, jumping into the middle of operands was sometimes used on early microcomputers to achieve compact code.

                                                          4. 2

                                                            Here’s a more correct approach: https://git.saucisseroyale.cc/emersion/c-safe

                                                            1. 1

                                                              Just don’t compile it with -pg :)

                                                            1. 10

                                                          It should also be mentioned that the .io TLD’s nameservers themselves are poorly run and fall off the internet way more often than most TLDs’. I believe offhand that if you want to offer a four-ish-nines SLA, using .io for your domain name would by itself have blown your downtime budget over the last few years. e.g. https://hackernoon.com/stop-using-io-domain-names-for-production-traffic-b6aa17eeac20

                                                              1. 3

                                                                Although if you’re in tech I would highly advise mentioning that you need more money if there’s an open office in your next interview (somewhere around 20% as that’s what the estimates say they’re saving in the short term).

                                                                I really like the sound of this (surprise!), but I don’t see how you could make a plausible argument for it. The employer is using an open office to save costs, so paying you more negates the benefit to the employer. Why would they do that? Particularly as open offices are now the norm (at least in my experience) you’re not likely to have a BATNA on this front.

                                                                1. 2

                                                              The argument is: “Cost-saving to the detriment of your employees is not reasonable, so you can be cost-neutral and I will suffer through, or you can deal me out with some form of shared office or personal space”.

                                                                  Basically, why should the employer unilaterally save on costs? they’re already trying to get you for as cheap as they can, this is a negotiation tactic.

                                                                  I know it costs them roughly 20%, so you can pass on the savings to me or I will pass on your “opportunity”. We have a lot of power in tech right now, we should absolutely be using this to improve working conditions, especially as that will actually make some of us more productive.

                                                                  1. 2

                                                                    So the argument depends on the employer thinking you’re >20% more valuable than the next applicant. I could see that working in some markets. It seems more plausible to me if I were already an employee and the office layout was changed to open plan.

                                                                    Is this a common attitude to negotiations in the US? It seems uncomfortably confrontational to me. I wonder if it is a cultural thing. Or I am terrible at negotiating and am imposter-syndroming my way out of non-trivial chunks of cash.

                                                                    1. 1

                                                                      I’m not from the US, so I wouldn’t know if it’s a common attitude or not.

                                                                      I understand why you’d think it’s confrontational. And it can be if it’s coming to blows.

                                                                  I’m just trying to convey that they should not think that they can get away with ‘saving’ on you. If they’re willing to pay upwards of $3k for a device that makes you more productive, but put you in a meatgrinder to save a buck, it’s a little bit hypocritical; the point you’d be making is that there is an added cost for them, it’s not just all savings.

                                                                      1. 2

                                                                        if they’re willing to pay upwards of $3k for a device that makes you more productive, but put you in a meatgrinder to save a buck it’s a little bit hypocritical

                                                                        If you want to be overly charitable: some people can’t get their head around the fact that open-plan offices are not a net cost savings. Maybe they just never had anyone point this out to them, or maybe it just bounced off their brain. Your bringing this up in negotiation is making them aware of a cost (to themselves) that they’d previously incorrectly assumed was zero.

                                                                        1. 1

                                                                          I’m not from the US, so I wouldn’t know if it’s a common attitude or not.

                                                                          Sorry for the assumption.

                                                                  1. 2

                                                                    The £ cost sounds negligible when compared to the time cost. £25/mo for something that you spend 28hr/week on is a rounding error.

                                                                    I’d say when you’re spending that much time on something, any way in which you can throw money at it to improve the impact is a no brainer.

                                                                    1. 23

                                                                      10 selling points, no downside, that’s fishy. No mention of the lack of static typing, for example. No mention of Kotlin or Scala. That sounds like someone who’s a bit too enthusiastic to acknowledge the shortcomings that any tool has.

                                                                      1. 11

                                                                        The lack of static typing as a downside is, and will always be, subjective. As Rich Hickey famously says, “What’s true of every bug? 1) it passed your unit tests 2) it passed the type checker.”

                                                                        Spec, as a structural testing mechanism, is, I believe, generally understood to be a preferable alternative to static typing, not an afterthought bolted on to appease devs who demand static typing.

                                                                        1. 10

                                                                          I don’t understand what the quote is trying to say. “Every bug” is still fewer bugs if type bugs are ruled out by a type checker.

                                                                          1. 3

                                                                            My reading is that static typing isn’t a silver bullet. That’s an oversimplification. See the video below for a bit more context surrounding that particular quote, and maybe some rationale behind Clojure’s approach to typing and the role that Spec plays.

                                                                            https://www.infoq.com/presentations/Simple-Made-Easy/

                                                                          2. 5

                                                                            It’s amusing to me that people accept this as a critique of static typing in general rather than a critique of certain hilariously-bad type systems like that of C or Java.

                                                                            I am trying to think of the last time I encountered a bug in a piece of software written in a language with a not-terrible static type system, but … I can’t think of one.

                                                                            1. 4

                                                                              Awesome. I’ll take your word for it. What is a language with a not-terrible type system? Haskell?

                                                                              1. 4

                                                                                Right.

                                                                                I mean obviously part of the reason I haven’t run into many bugs is that there just aren’t as many programs written using good type systems. Programs written in Haskell or OCaml or Elm still have bugs, but they only have certain classes of bugs. Just because you can’t get rid of every bug doesn’t mean it’s pointless to get rid of the ones you can, and Rich’s punchy quote seems to imply that this line of reasoning is invalid.

                                                                                1. 2

                                                                                  I see what you’re saying. And I agree that, on the surface, that quote seems like it’s dumping on static typing entirely, but I don’t think that’s the case. In the talk in which Hickey drops that quote he expands on it a bit, to a degree that my simply slapping it into a response doesn’t do justice. You’re the Leiningen guy, you know Clojure, you’re presumably familiar with the talk.

                                                                                  My takeaway was that static typing, like unit tests, catches bugs. Certain classes of bugs (as you mention above). However, in some complex and complicated systems, those classes of bugs aren’t the primary concern. I’ve never worked in that kind of system, but I like what I’ve seen of clojure and I haven’t found this particular line of reasoning disagreeable.

                                                                                  1. 4

                                                                                    You’re the Leiningen guy, you know Clojure, you’re presumably familiar with the talk.

                                                                                    Yep, I’ve seen the talk. It feels to me like he already decided he doesn’t like static typing because of bad experiences with Java and uses that experience to make straw-man arguments against all static type systems in general. I can imagine the existence of a system for which “bugs that no type system can catch” are the primary concern, but I’ve never encountered such a system myself. (I have worked with plenty of systems where the cost of the type system is higher than the benefit of squashing those bugs, but that’s a very different argument than Rich’s.)

                                                                                    1. 3

                                                                                      Yep, I’ve seen the talk. It feels to me like he already decided he doesn’t like static typing because of bad experiences with Java and uses that experience to make straw-man arguments against all static type systems in general

                                                                                      There seems to be at least some evidence that he knows Haskell pretty well, he just doesn’t really publicize it.

                                                                                      I think it’d be really funny if he keynoted ICFP with a talk on algebraic effects and never mentions the talk ever again. Will never happen, but a guy can dream.

                                                                                      1. 3

                                                                                        I’m pretty skeptical that anyone who “knows Haskell pretty well” would produce the “Maybe Not” talk. More generally, my experience is that anyone willing to invest the time and energy into learning Haskell tends to be more tuned-in to the actual tradeoffs one makes when using the language, and it’s clear that his emotionally-framed talking points overlap very little with what actual Haskell programmers care or think about. Of course, it could be the case that he does know the language pretty well and simply talks about it like a frustrated novice to frame emotional talking points to appeal to his Clojure true-believers, but this seems far-fetched.

                                                                                        1. 3

                                                                                          IMO the “Maybe Not” talk gets more flak than it deserves. Function subtyping is a valid data representation, and Maybe Int -> Int can be represented as a subtype of Int -> Maybe Int. Haskell chooses not to allow that representation, and that is a way in which the type system is arguably incomplete (in the “excludes valid programs” sense.)

                                                                                          1. 4

                                                                                            You’ll have to work hard to convince me that Rich Hickey was arguing on the level of critiquing the “function sub-typing” capabilities of Haskell’s type system vs. the more prosaic “static typing bad!” bullshit he falls back on again and again. Stated straightforwardly, his argument is basically “Maybe is bad because it means you will break the expectations of the caller when you need to introduce it to account for optionality.” And, I suppose it is in fact terrible when you don’t have a type-checker and depend on convention and discipline instead. So, Rich Hickey 1, static typing 0, I guess?

                                                                                            As to “Maybe Not” getting more flak than it deserves…yeah, we’ll have to agree to disagree. (And I’ll note here I’m really surprised that you in particular are taking this position considering how often I see you on here deeply engaging with such a broad variety of academic computer science topics, without needing to use strawmen or appeal to emotion to argue a position.)

                                                                                            For example, “Maybe/Either are not type system’s or/union type.” Okay. How do you even argue with that? I don’t even really understand what he’s trying to assert. Does he not believe the Curry-Howard correspondence is valid? For that matter, which type system? Will I get lambasted by the apologists for not understanding his super subtle point, yet again? Meh.

                                                                                            Someone who was honestly and deeply engaged with the ideas he spends so much time critiquing wouldn’t be babbling nonsense like “you would use logic to do that, you wouldn’t need some icky category language to talk about return types” or “type system gook getting into spec”…god forbid!

                                                                                            I’ll give him this though: Rich Hickey is doing a great job of convincing the people who already agree with him that static typing is bad.

                                                                                            (Edited to fix some links and a quote)

                                                                                            1. 3

                                                                                              As to “Maybe Not” getting more flak than it deserves…yeah, we’ll have to agree to disagree. (And I’ll note here I’m really surprised that you in particular are taking this position considering how often I see you on here deeply engaging with such a broad variety of academic computer science topics, without needing to use strawmen or appeal to emotion to argue a position.)

                                                                                              Thank you! I should note that just because I deeply engage with a lot of different CS topics doesn’t mean I won’t flail around like an idiot occasionally, or even often.

                                                                                              For example, “Maybe/Either are not type system’s or/union type.” Okay. How do you even argue with that? I don’t even really understand what he’s trying to assert. Does he not believe the Curry-Howard correspondence is valid? For that matter, which type system? Will I get lambasted by the apologists for not understanding his super subtle point, yet again? Meh.

                                                                                              Let me see if I can explain this in a less condescending way than the talk. Let’s take the type Maybe Int, which I’ll write here as Maybe(ℤ). From a set theory perspective, this type is the set {Just x | x ∈ ℤ} ∪ {Nothing}. There is an isomorphism from the maybe type to the union type ℤ ∪ {Nothing}. Let’s call this latter type Opt(ℤ). Opt(ℤ) is a union type in a way that Maybe(ℤ) is not, because we have ℤ ⊆ Opt(ℤ) but not ℤ ⊆ Maybe(ℤ): 3 ∈ ℤ, 3 ∉ Maybe(ℤ). Again, we have Just 3 ∈ Maybe(ℤ), and an isomorphism that maps Just 3 ↦ 3, so in theory this isn’t a problem.

                                                                                              The problem is that Haskell’s type system makes design choices that make that isomorphism not a valid substitution. In fact, I don’t think Haskell even has a way to represent Opt(ℤ), only types isomorphic to it. Which means that we can’t automatically translate between “functions that use Opt(ℤ)” and “functions that use Maybe(ℤ)”. Take the functions

                                                                                              foo :: Maybe Int -> Int
                                                                                              foo Nothing = 0
                                                                                              foo (Just x) = x*x
                                                                                              
                                                                                              -- I don't think this is possible in Haskell, just bear with me
                                                                                              bar :: Opt Int -> Int
                                                                                              bar Nothing = 0
                                                                                              bar x = x * x
                                                                                              

                                                                                              Is map foo [1..10] type-safe? Not in Haskell, because map foo has type [Maybe Int] -> [Int] and [1..10] has type [Int]. Is map bar [1..10] type-safe? In a type system that supported “proper” union types, arguably yes! ℤ ⊆ Opt(ℤ), so map bar is defined for all [Int]. So maybe types emulate useful aspects of union types but, in the Haskell type system, don’t have all the functionality you could encode in union types.

                                                                                              Now there are two common objections to this:

                                                                                              1. Haskell has very good reasons for doing things this way. This is 100% true. But it’s true because Haskell has a lot of goals with this type system and being able to encode this particular union type requires us to have proper function subtyping, which would absolutely be a nightmare to combine with everything else in Haskell. But nonetheless shows that Haskell is only exploring one part of the possible space of type systems, and there are valid things that it chooses not to represent. “Proper” union types are one of these things.
                                                                                              2. You can easily write a shim function to make map foo type-safe. This is usually people’s objection to this talk. And most of the time you can do this. But this is just a form of emulation, not reproducing the core idea. It’s similar to how in OOP you can “emulate” higher-order functions with the strategy pattern. But it’s not a perfect replacement. For any given emulation I can probably construct an example where your emulation breaks down and you have to try something slightly different. Maybe it doesn’t work if I’m trying to compose a bunch of fmaps.

                                                                                              This is why I think the talk is underrated. There are a lot of genuinely interesting ideas here, and I get the impression Rich Hickey has thought a lot of this stuff through, but I think the talk is hampered by presenting these ideas to a general Clojure audience and not a bunch of type-theory nerds.

                                                                                              1. 1

                                                                                                I don’t follow: map foo [1..10] wouldn’t even typecheck; it’s not even wrong to say it’s not typesafe (edit: and apologies if that’s what you meant, I don’t mean to beat you over the head with correct terminology, I just honestly didn’t get it). And while it’s well and fine that there’s an isomorphism between {Just x | x ∈ ℤ} and ℤ, it’s not clear to me what that buys you. You still have to check your values to ensure that you don’t have Nothing (or in the case of Clojure, nil), but in Haskell, because I have algebraic data types, I can build abstractions on top to eliminate boilerplate. Your Opt example doesn’t present as better or highlight the power of this isomorphism. Why do I even care that I can’t represent this isomorphism in Haskell? I’m afraid your post hasn’t clarified anything for me.

                                                                                                As far as the talk, I think that if he had some really interesting ideas to share, he’d be able to explain them to type theory nerds in the same talk he gives to his “core constituency.”

                                                                                                At this point, I have trouble considering the output of someone who, no matter how thoughtful they may be, has made it clear that they are hostile to certain ideas without justifying that hostility. There is plenty of criticism to be made of Haskell and type theory without taking on the stance he has taken, which is fundamentally “type theory and type systems and academic PLT is not worth your time to even consider, outside of this narrow range of topics.” If he was a random crank that’d be fine, but I think that because of his position, his presentations do real harm to Clojure programmers and anyone else who hears what he’s saying without having familiarity with the topics he dismisses, because it shuts them off to a world of ideas that has real utility even if it’s very much distinct from the approach he’s taken. It also poisons the well for those of us in the community who have spent serious time in these other worlds and who value them, which is a tremendous shame, because Rich Hickey does have a lot of great ideas and ways of presenting them intuitively. So regardless of how eloquently you may be able to translate from Rich Hickey-ese, what I object to fundamentally is his intellectual attitude, moreso than his ideas, many of which I agree with.

                                                                                                Thank you! I should note that just because I deeply engage with a lot of different CS topics doesn’t mean I won’t flail around like an idiot occasionally, or even often.

                                                                                                Well understood from personal experience. ;-)

                                                                                                1. 2

                                                                                                  I don’t follow: map foo [1..10] wouldn’t even typecheck; it’s not even wrong to say it’s not typesafe (edit: and apologies if that’s what you meant, I don’t mean to beat you over the head with correct terminology, I just honestly didn’t get it).

                                                                                                  That’s what I meant, it wouldn’t typecheck. Brain fart on my part.

                                                                                                  And while it’s well and fine that there’s an isomorphism between {Just x | x ∈ ℤ} and ℤ, it’s not clear to me what that buys you. You still have to check your values to ensure that you don’t have Nothing (or in the case of Clojure, nil) … Your Opt example doesn’t present as better or highlight the power of this isomorphism.

                                                                                                  Let’s try a different tack. So far we have

                                                                                                  foo :: Maybe Int -> Int
                                                                                                  bar :: Opt Int -> Int
                                                                                                  

                                                                                                  Now I give you three black-box functions:

                                                                                                  aleph :: Int -> Maybe Int
                                                                                                  beis :: Int -> Opt Int
                                                                                                  gimmel :: Int -> Int
                                                                                                  

                                                                                                  foo . aleph typechecks, as does bar . beis. foo . gimmel doesn’t typecheck. I think all three of those we can agree on. Here’s the question: what about bar . gimmel? In Haskell that wouldn’t typecheck. However, we know that Int ⊆ Opt Int. gimmel’s codomain is a subset of bar’s domain. So bar must be defined for every possible output of gimmel, meaning that bar . gimmel cannot cause a type error.

                                                                                                  This means that because Haskell cannot represent this isomorphism, there exist functions that mathematically compose with each other but cannot be composed in Haskell.

                                                                                                  Why do I even care that I can’t represent this isomorphism in Haskell?

                                                                                                  Mostly this is a special case of function subtyping, which I don’t think Haskell supports at all? So if function subtyping makes your problem domain more elegant, it’d require workarounds here.

                                                                                                  1. 1

                                                                                                    To be clear I understood your point initially about function subtyping being a thing that maybe Haskell can’t represent, and I apologize for making you come up with multiple creative examples to try to illustrate the idea (but I appreciate it)!

                                                                                                    So if function subtyping makes your problem domain more elegant, it’d require workarounds here.

                                                                                                    What remains unclear for me–if we’re treating this as a proxy for Rich Hickey’s argument–is how this demonstrates the practical insufficiency of Maybe, which is his main pitch. I am happy to acknowledge that there are all kinds of limitations to Haskell’s type system, this is no surprise. What I don’t yet understand is why this is a problem wrt Maybe!

                                                                                                    In any case, thank you for the thoughtful responses.

                                                                            2. 3

                                                                              Static typing, in my uninformed opinion, is less about catching bugs than it is about enforcing invariants that, as your application grows, become the bones that keep some sort of “tensegrity” in your program.

Without static typing your application is way more likely to collapse into a big ball of mud as it grows larger and none of the original engineers are working on it anymore. From this perspective I suppose the contract approach is largely similar in its anti-implosion effect.

                                                                              However, type theory offers a whole new approach to not only programming but also mathematics and I think there is a lot of benefit we still haven’t seen from developing this perspective further (something like datafun could be an interesting protobuf-esque “overlay language” for example).

On the other hand, dynamically typed programming (I think) peaked with Lisp, and Clojure is a great example of that. A lightweight syntax that is good for small exploratory things has a lot of value and will always be useful for on-the-fly configuration. That doesn’t change that the underlying platform should probably be statically typed.

                                                                            3. 6

                                                                              Static typing is mentioned in section 8 in regards to the spec library, and Scala is named in the Epilogue as a mature alternative.

                                                                              1. 3

“In this article we have listed a number of features that positively separates Clojure from the rest.” Well, it seems the author thinks the article addresses Scala as well as Java, even though Scala is only named in that previous sentence.

                                                                                Spec is no alternative to static typing, as far as I know. Isn’t it just runtime checks, and possibly a test helper? Scala and kotlin both have decent (or even advanced) type systems. I think some of the points are also advantages over kotlin and scala (repl and simplicity/stability, respectively), but the choice is not as black and white as depicted in OP.

                                                                                1. 2

                                                                                  Spec isn’t static, but it can provide much of the same functionality as a type system like Scala’s, just that it does so at runtime rather than compile time. In addition, it can be used to implement design-by-contract and dependent types and it can also generate samples of the specified data for use in testing. It’s not the same but it is an alternative.
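A tiny sketch of what that looks like in practice (the spec name is made up; generating samples additionally requires the test.check library):

(require '[clojure.spec.alpha :as s])

(s/def ::port (s/int-in 1 65536))   ;; a runtime-checked "type"

(s/valid? ::port 8080)              ;; => true
(s/explain ::port 0)                ;; prints why the value fails
(s/exercise ::port 3)               ;; generates sample values for tests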

                                                                              2. 6

                                                                                Yeah #8 and #9 (and arguably #7) really are just downsides that are framed as if they were upsides. “There’s no static typing, but here’s what you can do instead!” and “The startup time is really slow; here are some ways to work around that problem!”

                                                                                1. 2

                                                                                  I read #9 as ‘by default, clojure is compiled, like java. But here’s a way to get instant(-ish) startup time, which is impossible with java.’

                                                                                  1. 2

                                                                                    Being compiled has nothing to do with how fast something starts; planck is compiled as well.

                                                                                    1. 1

                                                                                      clojure is compiled, like java

                                                                                      That is, clojure’s startup time characteristics are similar to java’s.

                                                                                      1. 2

                                                                                        uh … no?

Clojure code can’t run without loading the Clojure runtime, which is implemented as an enormous pile of Java bytecode. A “hello world” in Java only has to load a single bytecode file, whereas a comparable Clojure program has to load almost all of clojure.jar before it can even begin to do anything.

                                                                                2. 4

                                                                                  That sounds like someone who’s a bit too enthusiastic to acknowledge the shortcomings that any tool has.

Immediately after reading the article, I agreed with this comment. After re-reading it more critically, I think the issue isn’t that he is too enthusiastic so much as that comparison isn’t the point of the piece. Outside of the brief jab in the second-to-last sentence (“that positively separates Clojure from the rest”), to me this doesn’t read as a comparison between all the potential successors, just something aimed at getting people interested in trying Clojure.

As someone who hasn’t written Clojure before, but really enjoys Lisp-based languages, I found it to be a helpful overview of the language. The fact that there are no negatives listed doesn’t deter me from believing the positives any more than if I were looking at a new computer and the spec sheet didn’t list other companies that make a better product. It just makes me want to follow his last sentence and see for myself:

                                                                                  … the best way to learn Clojure is to actually use it.

                                                                                  1. 5

As a fan of static typing, I would not advertise Java’s implementation of it as a feature. More of an anti-feature. It doesn’t track whether a field or variable of a reference type can be null. It doesn’t support algebraic data types.

                                                                                    For contrast, I would advertise the static type systems in Kotlin and Scala as features.

                                                                                    1. 2

                                                                                      There is little difference between Java’s type system and Kotlin’s type system.

                                                                                      1. 4

                                                                                        Kotlin has literally the exact two features I just mentioned. Java does not.

                                                                                        1. 1

Yes, and those two features make little difference – so it’s weird saying “Java bad, Kotlin good”.

                                                                                          1. 2

Explicit nullability tracking is no small deal. Never again accidentally causing a NullPointerException is liberating.

ADTs mean you can implement everything in the “make illegal states unrepresentable” approach.
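A sketch of both features (illustrative types only):

// Nullability lives in the type: a String? must be checked before use.
fun len(s: String?): Int = s?.length ?: 0

// A sealed hierarchy behaves like an ADT; `when` is checked for exhaustiveness.
sealed class Result
data class Ok(val value: Int) : Result()
data class Err(val message: String) : Result()

fun describe(r: Result): String = when (r) {
    is Ok -> "ok: ${r.value}"
    is Err -> "error: ${r.message}"
}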

                                                                                            1. 1

                                                                                              That’s a bit like saying “look how crazy fast this 486 is .. compared to a 386!”.

                                                                                        2. 2

                                                                                          Doesn’t Kotlin have algebraic data types and pattern matching?

                                                                                          1. 1

                                                                                            Yes, barely, and no.

                                                                                    1. 1

                                                                                      Interesting to me because in a past life (about 5ish years ago) I was working at a web advertising company. The company was in the process of switching from having statistics in MySQL and counting things with queries like SELECT hour, count(*) FROM hits WHERE customerId=? GROUP BY hour, to storing everything in Cassandra instead with counters precomputed for all of the roll-ups (e.g. one counter for each (customerId, hour) pair).

I say it was in that process because the company died ignominiously, being bought out by a competitor AFAIK for a rather modest sum. I don’t think this was Cassandra’s fault though. ;)

                                                                                      The thing I’d be super interested to know, which this article didn’t get into at all (*), is why the use of counter columns was so much more expensive for them than regular upserts. Never benchmarked it myself but at the time I was told that Cassandra counter increments were cheaper than regular upserts - precisely because of the crappy (non-idempotent, prone to under- or over-counting) semantics.

                                                                                      (* No blame for not digging into the root cause, it’d probably take a bunch of effort.)
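For readers unfamiliar with the feature, a counter roll-up along those lines would look roughly like this in CQL (the schema is my guess at their layout):

CREATE TABLE hits_by_hour (
    customer_id int,
    hour        timestamp,
    hits        counter,
    PRIMARY KEY (customer_id, hour)
);

-- Counters can only be incremented or decremented, never set:
UPDATE hits_by_hour SET hits = hits + 1
 WHERE customer_id = 42 AND hour = '2020-01-01 13:00:00';

The non-idempotence falls out of that last statement: replaying it on a timeout double-counts, and dropping it under-counts.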

                                                                                      1. 13

                                                                                        I probably won’t express this quite right, but to me this is one of the Big Ideas of golang, which is talked about a bit less than - say - channels or lack of generics :-)

                                                                                        We write code, with an implicit sequence. “perform this request to get a response, do this operation on it and return it to our caller”.

                                                                                        That works fine in a single-threaded world. Until we want to do other things at the same time, and we realise we’ve made a blocking I/O request to fetch the result.

                                                                                        So we don’t want to do that. So various syntactic sugar is invented for “ok, schedule this request and then get on with other work. When the request arrives back, run this bit of code and then arrange things so we can return our calculated value to our caller, whom you will have resumed”.

                                                                                        That sugar might be async/await, it might be a chain of callbacks, it might be promises. Heaven help you, it might be a continuation. But what they are all striving to convey is that simple “do this, get that, return something”, in a sequence.

                                                                                        The golang runtime allows you to write exactly this. Behind the scenes, it will spawn an OS thread to sit in the blocking syscall, while the runtime gets on with your other work. There is no “colour of function” problem.

                                                                                        I know it is simply “green threads and OS threads with a managed thread pool”. I also know that “threading is hard”. Golang encourages promiscuous use of green threads (goros) and it seems to work. Sometimes something which seems like a difference in degree is actually a difference in kind.

But this idea – that the programmer simply expresses “this, then that” and the runtime removes most of the obstacles – is, I think, very powerful.
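A minimal sketch of what I mean (URLs are placeholders): the fetch reads as plain sequential blocking code, and the runtime multiplexes the goroutines for you.

package main

import (
	"fmt"
	"net/http"
	"sync"
)

// Reads as "do this, get that, return something".
func status(url string) string {
	resp, err := http.Get(url) // blocks only this goroutine
	if err != nil {
		return err.Error()
	}
	defer resp.Body.Close()
	return resp.Status
}

func main() {
	var wg sync.WaitGroup
	for _, u := range []string{"https://example.com", "https://example.org"} {
		wg.Add(1)
		go func(u string) { // cheap green thread
			defer wg.Done()
			fmt.Println(u, status(u))
		}(u)
	}
	wg.Wait()
}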

                                                                                        1. 2

                                                                                          I’m on board with this.

Python already had, before async was added, a green-threading system that worked kinda okay within Python’s well-known limitations such as the existence of a GIL.

from gevent import monkey
monkey.patch_all()  # patches thread, socket, etc. to cooperate with greenlets
import thread  # and go use threads
                                                                                          

PyPy made gevent reasonably faster by bringing some of the perf-critical bits into the interpreter, AIUI.

                                                                                        1. 0

                                                                                          It mentions needing Google Play Services (for notifications?). Does anyone know if it works (perhaps with reduced functionality) without Google Play services? Contemplating a Pixel 3a running Graphene

                                                                                          1. 2

                                                                                            Not really what you’re asking but I’m running the latest version (5.600) from F-Droid on GrapheneOS without Google Play services. This one works at least.

                                                                                            I do have the most minimal microg installed for running a Gcam port.

                                                                                            1. 1

                                                                                              Huh? Where do you see that? It’s working fine for me without Google Play Services on my OnePlus 6T running LineageOS! I’m sure the same would be true for a Pixel 3a running Graphene.

                                                                                              1. 1

“So why did we not update the app? It’s a combination of things. A major factor was the API level requirement by Google Play.”

                                                                                                Did I mis-understand? Likely… :) Good to know though, thanks!

                                                                                                1. 3

                                                                                                  I think that sentence is referring to a Google Play (store) requirement (last year) that apps bump their targetSdk versions in AndroidManifest.

The main breaking change this resulted in is that you have to add explicit runtime requests for some permissions that used to be granted at install time via AndroidManifest.xml.
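For example, with the AndroidX support libraries the runtime request looks roughly like this (CAMERA is just a stand-in permission):

import android.Manifest
import android.content.pm.PackageManager
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Still declared in AndroidManifest.xml, but with a modern
        // targetSdk it must also be requested while the app is running:
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(
                this, arrayOf(Manifest.permission.CAMERA), 0 /* requestCode */)
        }
    }
}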

                                                                                                  1. 3

                                                                                                    Ah, the “API target level” is an Android app configuration option that determines what Android APIs you can use in your app. Google Play now requires that all apps increase this to a new minimum version. It has nothing to do with Google Play Services.

                                                                                                    1. 1

Additionally, when Google decides to make a non-backwards-compatible change, they always do so by having the change take effect only when the app’s declared targetSdk version is >= the version in which the change was introduced.

The second half of this is that a while later the Google Play store starts rejecting new apps with older targetSdk values, so app devs don’t get to just leave everything on the oldest targetSdk forever.

                                                                                                    2. 1

                                                                                                      Ah… Thanks all!

                                                                                                1. 2

                                                                                                  I await the Rock Paper Shotgun review with bated breath.

                                                                                                  1. 4

Very difficult to read on mobile; it does this weird whole-screen flickering on scroll in Android Chrome.

                                                                                                    1. 11

                                                                                                      I was hoping for a real discussion of using Elixir at significant scale, alas. This is an intro article from an agency that, while employing a number of Elixir heavy hitters, mostly deals in agency-scale projects.

                                                                                                      1. 1

                                                                                                        I was hoping for the same. I could not really find content about how to scale up a gen_server. If somebody has any good resources it would be great to see it here.

                                                                                                        1. 4

There’s no “scaling up” a genserver; a genserver is one process just like any other on the VM. To scale past a bottleneck where one genserver can’t process its messages fast enough (a common Elixir/Erlang scaling problem), you need to figure out how to use more than one genserver to do the work. That means you can’t call the genservers by name anymore, but the process Registry is good for that.

                                                                                                          I’ve been using a pattern lately where, in the genserver’s init(args) function, I register the process under a name that is passed in the args parameter. Then I launch the genserver under a supervisor. If the process dies, its entry will be removed from the registry, and when the supervisor launches another genserver to replace it, that server will register itself automatically. I like it so far.
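A sketch of that pattern (module and registry names are hypothetical; assumes a unique-keys Registry started under the same supervision tree):

defmodule MyApp.Worker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  @impl true
  def init(name) do
    # Register self() under the passed-in name. If this process dies,
    # the Registry entry disappears with it; the supervisor's replacement
    # child runs init/1 again and re-registers automatically.
    {:ok, _} = Registry.register(MyApp.Registry, name, nil)
    {:ok, %{name: name}}
  end
end

# Callers then address a worker through the registry:
# GenServer.call({:via, Registry, {MyApp.Registry, "worker-1"}}, :ping)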

                                                                                                          1. 2

                                                                                                            You’ve just posted a very short introduction to the concept of how to scale up the use of gen_server(s, plural) 😸

                                                                                                            1. 2

If your gen_server does not need a lot of state to start up (or that need can be sidestepped with persistent_term), it may be easier to use simple_one_for_one, spawn a new process for each unit of work, and have the source of the request monitor the process to handle retries. This works especially well when the inputs are large binaries, since those are reference-counted rather than copied, so starting up becomes cheaper and GC is now just the process destruction.

                                                                                                              Routing is difficult and sometimes it is best just to skip it altogether, especially when spawning is really really cheap.

                                                                                                              timer:tc(fun() -> Self = self(), Pid = spawn(fun() -> Self ! ok end), receive ok -> ok end end).
                                                                                                              
                                                                                                        1. 2

                                                                                                          Confused. What would the Linux equivalent to “base installation” be? Perl is included in pretty much every relevant Linux distro out there.

                                                                                                          1. 3

                                                                                                            There isn’t one. In FreeBSD the base system contains everything that the FreeBSD project themselves want to take responsibility for directly. Boot loader, kernel, libc, at least one shell, the login daemon, sshd, a C compiler. Everything you need to get the system running and at least minimally useful and self hosting.

                                                                                                            Everything in the base system is checked into a single repo. The base system has its own mechanisms for updating (used to be a thing called freebsd-update, dunno if it changed since).

                                                                                                            Third party software (which is most useful software, since the set of things you might want to do is much bigger than what the FreeBSD project would ever want to put in their own tree) is installed & updated via a separate mechanism called packages. FreeBSD Ports is a repo of instructions for how to build packages.

As a rough approximation, the base system takes care of stuff in / and /usr, and packages all install into /usr/local and /opt.

                                                                                                            Unlike in Linux where the kernel (Linux) and libc (glibc, musl, etc) are written by different sets of people who don’t necessarily even like each other, in FreeBSD the kernel and userland are all worked on together. It’s possible to land atomic patches that simultaneously add a new syscall and add support for it to libc and man pages for it, for example.

                                                                                                            You could call the minimal set of packages that come with a default Debian install its “base” but it’s still not the same I think because they all upgrade separately.

                                                                                                            1. 3

                                                                                                              Yeah, Debian has two or three related concepts of a base system.

                                                                                                              There is the set of packages with Essential: yes — basically, the package manager and the utilities the package manager depends on for its operation. The package manager won’t even let you remove these, because if you did, you wouldn’t have anything left to reinstall them with. This amounts to around 20 packages (including perl-base, as a matter of fact, since debconf uses it).

                                                                                                              Then there is the set of packages with Priority: required. This is another 50 or so packages that the package manager can live without, but the system administrator probably can’t. This includes libc6 (nice to have, right?), mount, passwd, procps, and e2fsprogs (provides fsck). The essential and required packages together form minbase.

Then there’s the set of packages with Priority: important, which adds about another 100. These include bzip2 and xz (minbase only gets you gzip), the tools for loading and unloading kernel modules, less, vim-tiny, sudo, ping, and… unavoidably on a modern version, systemd and dbus. It also includes python3, making it the other language you’re pretty much guaranteed to find. minbase plus important equals base, which is the minimum thing you can install from install media (but you can debootstrap to minbase for containers or embedded or whatever, if you know what you’re doing).

                                                                                                              Anyway, as you say, most of that stuff is third-party, not created by the Debian maintainers themselves. Although more than the usual amount of scrutiny goes into anything that’s marked as important enough to get into base.
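(If you want to poke at exactly what lands in each set, debootstrap will build a minbase tree for you; the suite and target path here are arbitrary.)

$ sudo debootstrap --variant=minbase stable /srv/minbase http://deb.debian.org/debian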

                                                                                                          1. 4

Nice writeup. I’ve always heard that Git doesn’t handle files (with names), but handles objects. How does that relate to this? Are file names just tags to an ‘object’, for which you change the tag on rename? And does committing make git resolve these names to these objects first?

                                                                                                            1. 11

(Simplified) Git has multiple kinds of objects: one is a blob of content, addressed by its hash; another is a tree, which is a list of file names each associated with a blob’s hash; and yet another is a commit, which is a commit message, a tree addressed by its hash, and zero or more parents addressed by their hashes.

                                                                                                              These are all immutable, so you don’t change a tag, you create a new commit with a new tree and whose parent is the “previous” commit, and you make that your active commit (HEAD) which is again just addressing the commit object by its hash.

                                                                                                              Renames are a function of presentation of the data, if you ask it to look at two trees (do a diff) and one has a file a and the other has a file b and they both point at the same blob (their contents have the same hash), git is going to infer that they were renamed (whether that’s what happened or not).

                                                                                                              1. 3

                                                                                                                Oh hey does that mean that git deduplicates its storage of identical files for free? (Obvs not in the working tree, but in the .git directory.) Since they’ll have the same hash, it can just have the same blob referred to from multiple points in a single tree?

                                                                                                                1. 2

                                                                                                                  Yep.
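You can convince yourself of this from the hashing scheme: a blob’s id is just the SHA-1 of a tiny header plus the raw content, so identical content always lands in the same object. A quick Python sketch:

import hashlib

def blob_id(data: bytes) -> str:
    # Git hashes "blob <size>\0" followed by the content itself.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Matches `echo hello | git hash-object --stdin`
print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a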

                                                                                                              2. 3

                                                                                                                This utility is quite nice to explore the underlying data structure:

                                                                                                                $ git ls-tree HEAD
                                                                                                                <snip>
                                                                                                                100644 blob 5caf2e89168505c24ad1e3146fd029929f27487a	main.go
                                                                                                                040000 tree d0357c0f78bab0bd5dbb19f7d805bcb987ce74a6	man
                                                                                                                040000 tree 1ce4d49aa464dfdfe0314b0937e2a203dacdc96e	nix
                                                                                                                100644 blob 0959aae462cbec0d6e1cd1d7691f1262350989ee	rc.go
                                                                                                                <snip>
                                                                                                                $ git ls-tree HEAD man/
                                                                                                                100644 blob b5b49633b7fe4cb364b476ad7255575e4e515765	man/direnv-stdlib.1
                                                                                                                100644 blob 57ff9cb23b73219eeac2317c2d4f52ed0cdbaf59	man/direnv-stdlib.1.md
                                                                                                                100644 blob b4a2fa2e806593c80dfbf5b0ad325303635ca74a	man/direnv.1
                                                                                                                100644 blob e180e462681bf41c458c47e85470cd2e882c3899	man/direnv.1.md
                                                                                                                100644 blob 763d8b9e0383ca9f2ae6d1433aaafbad1753f406	man/direnv.toml.1
                                                                                                                100644 blob 1487278964fd7d98c1200c01cbd020ab0953647e	man/direnv.toml.1.md
                                                                                                                

                                                                                                                see also git cat-file

                                                                                                                1. 1

                                                                                                                  Git stores a directory as a list of (<name>, <hash>) pairs. The hash of that list is stored in the parent directory (along with the directory name).

                                                                                                                  When you edit app/foo.sh and commit, foo.sh gets a new hash. The listing for app includes this new hash. The root directory entry for app also gets a new hash by the same process.
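You can watch that propagation with rev-parse (paths reuse the example above):

$ git rev-parse HEAD:app HEAD:app/foo.sh   # tree and blob ids before the edit
$ # ...edit app/foo.sh and commit...
$ git rev-parse HEAD:app HEAD:app/foo.sh   # both ids have changed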