1. 9

    Perhaps unfortunately, I think this article is more interesting for the assumptions the author makes or glosses over than for its actual content. A few I noticed were:

    • That there’s no metadata header required per object in memory (there must be some somewhere, as you’ll need it both for knowing which fields in an object the GC has to trace and for runtime type casts)
    • That other languages that experience fragmentation do so simply because they don’t use a “modern” allocator, and that it’s not a tradeoff between space utilisation, allocation / deallocation speed and runtime overhead.

    The Hacker News comments have a lot more detail on the mistakes, too, if that’s your jam.

    1. 7

      Agreed. I mostly agree with the author (and like him I prefer Go to Java), but there’s a lot I think he overlooks or gets wrong.

      • He did actually talk about the object metadata in Java; that’s the Klass* in the OpenJDK header. But Go must need something similar, even if it’s statically typed, since the GC doesn’t know compile-time types.
      • He mentions Java objects having to store mutexes. It’s true that synchronized was one of Java’s biggest mistakes, but to my knowledge the HotSpot runtime stores mutexes in a side table not in the object header.
      • He mentions cache coherency, but not that relocating objects is terrible for cache performance.
      • “Sooner or later you need to do compaction, which involves moving data around and fixing pointers. An Arena allocator does not have to do that.” Yes, but on the flip side, an arena allocator keeps the whole arena in memory as long as even one object in it is still in use — this can cause memory bloat, requiring you to track down how pointers are escaping.
      • “it’s likely to be more efficient to allocate memory using a set of per-thread caches, and at that point you’ve lost the advantages of a bump allocator.” My understanding is that you get around that with per-thread bump allocators (there’s a rough sketch just after this list).
      • “Modern memory allocators such as Google’s TCMalloc or Intel’s Scalable Malloc do not fragment memory.” — There’s no such thing as a non-fragmenting allocator, unless all your allocations are the same size. Modern allocators just fragment less.
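
      To make that last point a bit more concrete, here’s roughly the shape of a per-thread bump allocator, sketched in ordinary Go (purely illustrative: Go’s runtime does something along these lines internally with its per-thread caches, and none of these names are real APIs):

      // bumpArena is a toy bump allocator. Give each thread (or goroutine) its
      // own arena and allocation needs no synchronisation at all.
      type bumpArena struct {
          buf  []byte // one big chunk grabbed up front
          next int    // offset of the first free byte
      }

      func (a *bumpArena) alloc(n int) []byte {
          if a.next+n > len(a.buf) {
              return nil // chunk exhausted; a real allocator would grab a new chunk here
          }
          p := a.buf[a.next : a.next+n]
          a.next += n // just bump the offset; there is no per-object free
          return p
      }
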
      1.  

        Yes, but on the flip side, an arena allocator keeps the whole arena in memory as long as even one object in it is still in use — this can cause memory bloat, requiring you to track down how pointers are escaping.

        You can still selectively free unused pages in a nearly-empty arena to save RSS. But that’s a tricky latency/throughput tradeoff since madvise() is so expensive.
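
        For concreteness, the release itself is a single call on Linux; all the difficulty is in the bookkeeping that decides which page-aligned ranges are genuinely unused. A minimal sketch in Go, assuming that bookkeeping already exists:

        import "syscall"

        // releasePages hands a page-aligned, currently-unused slice of an arena back
        // to the OS. The address range stays reserved, but the RSS goes down.
        func releasePages(unused []byte) error {
            // MADV_DONTNEED is the classic (and expensive) choice; MADV_FREE is the
            // lazier alternative where available.
            return syscall.Madvise(unused, syscall.MADV_DONTNEED)
        }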

      2.  

        Go’s GC needs to know the start and size for all objects, and where pointers are in them. Start and size are easy for objects up to 32 KB, which are allocated in size classes in a slab-style allocator; given a memory address, you can determine what slab class it’s in, which gives you both start and size using no per-object metadata. For pointers, the simple version is that Go keeps a bitmap of where there are active pointers in a given area (page, etc) of memory. The pointer alignment requirements mean that this bitmap can be quite dense. As an optimization, there are special slab classes for ‘size class X and contains no interior pointers’, which need no bitmaps at all.

        The important thing about these is that while they both require metadata, it’s large-scale metadata (and it’s aggregated together in memory, which probably helps efficient access as compared to chasing pointers to per-type information for each object).

        One corollary of this is that Go’s GC doesn’t actually know the type of any object in memory. It might not even know the exact size, since some size classes are ranges of sizes.

        (I don’t know exactly how Go implements objects larger than 32 KB, but I can think of various plausible approaches.)
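
        To illustrate how an address alone can give you both start and size, here is a deliberately simplified sketch (the chunk size and the lookup structure are made up; the real runtime’s span bookkeeping is more involved):

        const chunkSize = 64 << 10 // pretend the heap is carved into 64 KB chunks

        // sizeForChunk records the single object size used within each chunk.
        var sizeForChunk = map[uintptr]uintptr{}

        // objectStartAndSize recovers an object's start and size from any pointer
        // into it, with no per-object header at all.
        func objectStartAndSize(p uintptr) (start, size uintptr) {
            base := p - p%chunkSize   // round down to the chunk boundary
            size = sizeForChunk[base] // every object in this chunk has this size
            if size == 0 {
                return 0, 0 // not a chunk we know about
            }
            start = base + (p-base)/size*size // round down to an object boundary
            return start, size
        }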

      1. 21

        We can see that very practically using standard linux / programming tools. While you can use cat to show the data of a file, it won’t work for directories

        That’s a relatively recent addition. Early versions of UNIX exposed directories just like any other file. Before the APIs were added for traversing directories in a filesystem-independent way, userspace would just open directories and parse them directly. Even on modern systems, the ‘data can be stored only in leaf nodes’ part is very dependent on the filesystem and the definition of data. In NTFS and HFS+, I think you can store alternate data streams on directories and on most *NIX systems you can set extended attributes on directories.

        If you’re thinking of files and folders using their actual analog equivalents

        This is why I like to differentiate between a directory and a folder. A directory is (like a telephone directory) a key-value store that indexes from some human-friendly names to something a computer uses. On the original UNIX filesystems, it was exactly that: a file that used a fixed-sized record structure to map from names to inode numbers.
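
        If I remember the layout right, the original on-disk record was all of 16 bytes; written out as a Go struct purely for illustration:

        // A V7-style directory entry: a 2-byte inode number followed by a
        // 14-byte, NUL-padded name. An inode number of 0 marks an unused slot.
        type v7DirEnt struct {
            Ino  uint16
            Name [14]byte
        }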

        In contrast, a folder is a UI abstraction that represents a container of documents.

        The fact that folders are typically implemented with a 1:1 mapping to directories is an implementation detail. In BFS, for example, both folders and saved searches were implemented as directories.

        The fact that inner nodes in a file system hierarchy cannot hold data is limiting. In many use cases, it’d be natural to have nodes that can hold data AND have children. Because file systems don’t allow this

        The problem is not that filesystems don’t allow this; it’s that there are no portable interfaces to filesystems that allow it. People want to be able to deploy their persistent data structures onto ext4, ZFS, UFS2, APFS, HFS+, NTFS, and sometimes even FAT32 filesystems and access them with SMB, WebDAV, or NFS (or AFS or AFP) network shares. This means that the set of things that you can use from any given filesystem is the intersection of the features provided by all of these things.

        I was expecting from the title for the author to realise that filesystems are not trees, but apparently this didn’t happen. Filesystems are DAGs if you’re lucky, graphs if you’re not. Hard links make a filesystem a DAG because the same file can appear in multiple directories. Hard links on HFS+ are allowed (if you’re a sufficiently privileged user) to point to directories, which allows the filesystem to be an arbitrary graph (the ‘sufficiently privileged’ part is because a lot of things break if they are, and so this privilege is granted only to things like the Time Machine daemon that promise not to create cycles). Junctions / reparse-points on NTFS can also create cycles, as can symlinks (though in both cases, at least you know that you’re potentially entering a cycle).

        1. 3

          We can see that very practically using standard linux / programming tools. While you can use cat to show the data of a file, it won’t work for directories

          That’s a relatively recent addition. Early versions of UNIX exposed directories just like any other file. Before the APIs were added for traversing directories in a filesystem-independent way, userspace would just open directories and parse them directly. Even on modern systems, the ‘data can be stored only in leaf nodes’ part is very dependent on the filesystem and the definition of data. In NTFS and HFS+, I think you can store alternate data streams on directories and on most *NIX systems you can set extended attributes on directories.

          IIRC it still worked on FreeBSD versions as recently as 11. In most traditional Unix filesystems (at least I believe it’s the case for [UF]FS and ext{,2,3,4}) a directory is stored much the same way as a file; the only difference is a flag indicating that the blocks pointed to by the inode store directory entries rather than file content. The Rust equivalent is probably closer to:

          use std::rc::Rc;

          // a directory entry maps a name to an inode
          struct DirEnt {
              name: String,
              node: Rc<Inode>,
          }

          // an inode holds either file content or a list of directory entries
          enum Inode {
              File(Vec<u8>),
              Dir(Vec<DirEnt>),
          }
          

          Note the Rc as well, because an inode might be pointed to by more than one directory entry.

          1. 1

            Can confirm this worked in recentish FreeBSDs. Made some nice file read vulnerabilities funnier to exploit :)

            1. 1

              Modern Unix filesystems use more complicated on-disk structures for at least large directories, because they want to be able to get a specific directory entry (or determine that it doesn’t exist) without potentially having to scan the entire directory. The increased complication of the actual structure of directories and directory entries is one reason why Linux moved to forbidding reading directories as regular files.

              (Some directories haven’t been readable as regular files for a long time. One example is directories on NFS mounts; from the beginning of NFS in the 80s, you could only look at them with NFS directory operations.)

              1. 1

                The increased complication of the actual structure of directories and directory entries is one reason why Linux moved to forbidding reading directories as regular files.

                Did Linux ever allow it? I was under the impression that it had been distinguishing feature between it and the BSDs quite early on.

                1. 1

                  Based on looking at the source code of 0.96c on www.tuhs.org, it appears that early Linux allowed reading directories with read(). ext_file_read() in fs/ext/file.c appears to specifically allow reads of S_ISDIR() inodes, although 0.96c also has readdir() and friends.

            1. 4

              Very interesting, but the question remains: Why did 4BSD add a stack size limit?

              My guess: 4BSD also introduced some kind of shared memory threading. Unbounded stack growth is incompatible with multithreading in C (since threads in the same address space would run into each other’s stacks), so they would need to limit stack sizes.

              The fact that Unix didn’t have a stack size limit early on is a good illustration of how shared memory threading is counter to the Unix model. (the Unix model is processes passing messages)

              1. 12

                The fact that Unix didn’t have a stack size limit early on is a good illustration of how shared memory threading is counter to the Unix model. (the Unix model is processes passing messages)

                I feel like people put too much stock in there being a single, coherent “UNIX model” that hasn’t chiefly been assembled to justify the limitations of early machines and early software. Even if processes had been limited to a single thread forever, you would also have had to eschew all memory mappings other than brk and the thread stack. Also, there was a stack limit: when you bumped into the brk moving the other way in what was by modern standards an unimaginably small address space.

                1. 1

                  I feel like people put too much stock in there being a single, coherent “UNIX model” that hasn’t chiefly been assembled to justify the limitations of early machines and early software

                  Ah, actually operating systems before Unix (such as Multics) had shared memory. Unix deliberately chose to not have shared memory, it wasn’t a restriction of the available hardware and software.

                  1. 3

                    Ah, actually operating systems before Unix (such as Multics) had shared memory. Unix deliberately chose to not have shared memory, it wasn’t a restriction of the available hardware and software.

                    Multics was, as far as I recall, developed on and for machines with a larger address space and in general more capacity than the original PDP-7 and PDP-11 systems that were the target for UNIX. I’d love to see a citation for choosing not to have shared memory on purpose as a permanent design goal, and not merely as an artefact of an early software implementation that had not yet grown a number of the features that make it useful. Recall that early UNIX also used whole-process swapping as a means for time sharing, which is also relatively simple to implement but not desirable as an end state.

                    1. 3

                      The Multics design of sharing memory through segments could work on any system, it wasn’t expensive in terms of hardware resources. I don’t have a citation on hand, but these facts:

                      • segments (shared memory) were the core abstraction of Multics
                      • Unix was famously designed in reaction to the failure of Multics
                      • Unix does the exact opposite of Multics when it comes to sharing memory (i.e. Unix doesn’t have it at all)

                      are pretty suggestive. Note also that Unix only got shared memory many years later after the original designers stopped working on it, and also that Unix was already deployed and used at many sites long before it got shared memory. (so shared memory is not really in the class of “features that make it useful”)

                2. 4

                  Although I don’t know for sure, I think it matters that 4BSD was one of the earliest Unixes that ran on 32-bit architectures, which permitted large memory spaces, and ran in a semi-hostile environment (undergraduate students running student programs). You hardly needed memory limits with V7 on a PDP-11, which had at most 64 KB for your stack and data; a VAX provided far more room for unfortunate memory usage events. For the record, 4BSD still had no threads and as far as I know no explicit shared memory of any form (whether System V shared memory or the modern mmap). Many things about V7 Unix are very simple and minimal compared to today, so not having process limits is not much of a surprise.

                  (I’m the author of the linked-to entry.)

                  1. 3

                    Unbounded stack growth with a single thread just means it runs into the heap. AFAIK this was a “feature” of hardware of the time - on x86 you’d have SS==DS, both grow from opposite ends of the same segment, and programs can decide for themselves how to apportion that memory. But without some kind of boundary enforced by memory protection, when they collide the result is memory corruption that the kernel can’t protect against.

                    So I’ll bet the limit is because it’s desirable to have a boundary, and the limit was chosen to be extremely large, so that hitting it indicates runaway stack consumption rather than an actual constraint. 4 MB is a lot of stack, particularly for that era.

                    1. 3

                      As I recall, 4BSD was the first UNIX system to support a paged MMU. It introduced the mmap system call, which was used to create anonymous memory mappings, shared memory mappings, and perform memory-mapped I/O. This meant that you had a fragmented address space for the first time.

                      On traditional UNIX (note: the following paragraph contains gross oversimplifications), your memory abstraction consisted of three / four segments in the address space (which may or may not have been enforced by an MMU, depending on the target). These were code (+ data from globals, sometimes in a separate segment sometimes part of the code segment), heap, and stack. The binary was loaded at the bottom of the address space. The stack started at the top of the address space. The heap grew from the top of the binary. On systems with a segmented MMU, each of these was one (or more) segments with permissions. The brk and sbrk system calls moved the line for ‘memory that can be used for heap’ up and for ‘memory that can be used for the stack’ down. Without an MMU, these told the kernel how much memory it needed to write out on context switch and allowed it to raise an error if the stack and heap segments overlapped.

                      With a paged MMU, the address space could contain arbitrary mappings at arbitrary locations. You could still grow the stack with sbrk, but you might fail long before you ran out of address space because some other mapping was situated below the stack. The kernel therefore needed to track the size of the address space reservation for the stack, to prevent anything else being mapped there. Once you have a thing whose size you need to track, the obvious next step is to allow the size to be configurable. You already need to be able to track dynamic sizes for other kinds of VM object (e.g. file-backed mappings) so you don’t lose any space (and you do gain on generality) by making the stack something with a per-process configurable size.

                      1. 2

                        While BSD did introduce paged memory as part of their VAX port, the mmap() system call only came years later, in SunOS 4 (as far as I can determine). 4BSD still had a linear data space map with code / data / heap at the bottom, growing up, and stack at the top, growing down. Also, sbrk() doesn’t change the stack; both it and brk() affect the heap.

                    1. 18

                      This is one of the things I wanted to write in response to https://lobste.rs/s/ezqjv5/i_m_not_sure_unix_won but haven’t really been able to come up with a coherent response. Anyone who believes that “in the good old days” UNIX was a monolithic system where programs could be easily run on different UNIXes wasn’t there. Hell, even if you stuck with one vendor (Sun), you would have a hell of a time upgrading from SunOS to Solaris, not to mention, HP-UX, AIX, SCO UNIX (eugh), IRIX and many others. Each had their “quirks” and required a massive porting effort.

                      1. 3

                        Hi, author of that original post. You’re definitely not wrong, unfortunately. My concern with that original post was the fact that Linux was heading in the same direction of doing its own thing, rather than POSIX or the Unix-way. We had a chance to do it better, with hindsight this time.

                        (Whether the Unix-way ever truly existed is another point I’m willing to concede!)

                        Having had time to think about it more, Linux does deserve more credit than I gave for it. By and large, porting Linux stuff to BSD now is easier than some of the commercial Unixen of late (yes, I was there, if only for a few years). But it does feel like we’re slowly going backwards.

                        1. 6

                          As a flip side to that, I think that getting away from POSIX and “The UNIX way” (whatever that means), is actually moving forwards. “The UNIX way” was conceived in the days when the standard interface was a jumped up printer, and 640KB of RAM was “enough for anyone”. Computers have exploded in capability since then, and “The UNIX way” was seeming outdated even 30 years ago (The UNIX-HATERS mailing list started in 1987). If you told Dennis Ritchie and Ken Thompson in the 70s that their OS would power a computer orders of magnitude more powerful than the PDP-11, and then told them it would fit in your pocket… Well, I dunno, Ken Thompson is still alive, ask him.

                          Anyways… My point is that the philosophical underpinnings of “The UNIX Way” have been stretched to the breaking point for a long time now, and arguably, for computer users, rather than developers, it broke a long time ago and they went to Windows or Mac. It’s useful as a metaphor for the KISS principle, but it just doesn’t match how people interface with Operating Systems today.

                          1. 2

                            The Bell Labs people did do ‘Unix mark II’ in the late 1980s and early 1990s in the form of Plan 9. It was rather different from Unix while retaining the spirit (in many people’s view) and its C programming environment definitely didn’t attempt to stick to POSIX (although it did carry a number of elements forward). This isn’t the same as what they might do today, of course, but you can view it as some signposts.

                            1. 1

                              My apologies, I thought the Unix way/Unix philosophy/etc were widely understood. Probably the most famous of these was Doug McIlroy’s “Make each program do one thing well.” Even if we’re building orders of magnitude more complexity today, I think there are still lessons to that approach.

                              I agree we have to move with the times, but thus far the reinventions have looked like what Henry Spencer warned about: reinventing UNIX, poorly.

                              1. 1

                                “Make each program do one thing well.”

                                The precept is violated by a program like ls. Why does it have different options for sorting by size, ctime etc? Isn’t it more flexible to simply pipe that through sort?

                                sort itself has a -u option, unneeded as you can just filter it through uniq. Yet it’s a feature in both GNU and (Open)BSD versions.

                                1. 1

                                  Are we at the splitting hairs or yak shaving stage now? I guess yaks can have Split Enz, like a Leaky Boat.

                                  My original post was that it was disingenuous to say “Unix won” when Linux did. @mattrose disagreed, saying that the past wasn’t a cross-platform utopia either (true, alongside his quote from famed Unix-fan Bill Gates). I opined we had the opportunity to do better this time, but Linux is making the same mistakes to the detriment of OSs like BSD. Heck, even macOS. Also, that those Unix guys had good ideas that I assert have stood the test of time despite the latest in a long line of attempts to replace them. The machine I’m writing this on now is proof.

                                  Se a vida é. Wait, that was the Pet Shop Boys, not Split Enz.

                                  1. 2

                                    Proponents of “the Unix way” espouse a strange dichotomy: they propose that the philosophy is superior to all competitors, and decry that the competitors are trouncing it in the market[1].

                                    Something has to give. Perhaps the penchant for it is an aesthetic preference, nothing more.

                                    [1] both in the economic one, and the marketplace of ideas.

                                    1. 2

                                      I totally understand the penchant for “the UNIX way”, and actually share it. It makes everything really simple. UNIX, at its base, is sending streams of text from one file-type object to another. It makes “do one thing well” really easy, as it enables feeding the output of one program into the input of another, and so you can write programs that fit into that kind of pipeline. Even with non-text streams you can write that kind of pipeline, like gstreamer does, but…

                                      From a user perspective, it’s a nightmare. Instead of having to know one program, you have to know 10 or more to do the same thing, and there is no discoverability. With Excel or its equivalent, I can easily select a column of numbers and get its sum. The easiest “UNIX way” of doing the equivalent is something like cat file.csv | awk -F"," '{ s+=$3 } END { print s }', and it took me a while to figure out how to invoke awk to do that, which required me to know that

                                      1. awk exists
                                      2. awk is good at splitting text into columns
                                      3. awk takes an input field-delimiter option

                                      And that is completely outside of all the awk syntax that I needed to actually write the command.

                                      This is why “the UNIX way” is being trounced in the market. When there’s a more complex, but user-friendly option, the conceptually simpler option is crowded out.

                                      The trick is to try and keep in mind the Einstein aphorism “Everything should be as simple as it can be, but not simpler”.

                            2. 2

                              My personal experience is with the shared GPU drivers; one pile of C code used to work for both Linux and BSD. The main change was the move into the kernel. As GPUs required more kernel-side scaffolding to boot, kernel-side memory management for GPU buffers, etc. the code had to start specializing towards kernel interfaces.

                              In general, a lot of code has moved down into kernels. Audio and video codec awareness, block-device checksumming, encryption services, and more.

                          1. 17

                            I know the author doesn’t want to use Tailscale, but they’re really, and I can’t stress this enough, really good. However, I understand that cost is a concern — perhaps headscale, an open source reimplementation of the coordination server (the proprietary part of Tailscale), could be used instead.

                            1. 1

                              Or you can use good old OpenVPN. For remote access to a university network it’s more than sufficient. It’s an old, somewhat clunky tool, but it does the job.

                              1. 7

                                Most VPN technology, OpenVPN included, has the idea of ‘sessions’. Sessions are great in some ways but not great in others, because sessions can get broken and then you have to start over, which can often cut off any existing connections you have over the VPN session (such as ongoing ssh connections). WireGuard is appealing partly because it is completely session-less (and as a result can roam freely; your client can shift IPs without the WireGuard connection exploding). If we could provision WireGuard, I suspect this would make it a better experience for some of our users.

                                (I’m the author of the linked-to entry.)

                                1. 3

                                  I’m not sure I know exactly what problems you’re trying to solve, but you might be interested in innernet as a self-hosted wireguard provisioning option.

                                  1. 1

                                    What do you think of

                                    https://github.com/seashell/drago

                                    1. 2

                                      Something like Drago could eventually automate provisioning clients, but it’s hard to tell how it will evolve as it gets developed more, and the tricky (and time consuming) bit is supporting a UI and integration with WireGuard clients on all of the major platforms (Windows, macOS, iOS, Android, and ideally Linux). Drago also seems to support more flexibility than we’d use, which might be a drawback in practice.

                                  2. 3

                                    For some reason, I can’t get OpenVPN to generate wireguard certs.

                                    And for that matter, OpenVPN usually relies on a local CA to generate OpenVPN certs, which is an exciting premise of its own.

                                    1. 1

                                      wireguard used to have a line like “Don’t even attempt to generate anything with non-wireguard tools” - which at the time was really annoying for one use case I had…

                                1. 26

                                  This post talks about the downsides but does not acknowledge the underlying problem that is being corrected, namely that “go get” today does two completely different things: (1) add or update dependencies of a program, and (2) install binaries.

                                  Nearly all the time, you only mean to do one of these things, not both. If you get in the habit of using “go get” to update your dependencies, you might run a command like “go get path/…” and be surprised when it installs “path/internal/ls” to your $HOME/bin directory. We have had people claim that’s a security issue, and whether that’s true or not, it’s certainly a usability issue. In this example, you only want the dependency change and are getting the install as an unwanted side-effect. After the transition, the rule will be simple: “go get only changes your dependency versions”.

                                  On the other hand, we have commands in READMEs that say things like “go get -u rsc.io/2fa”. Today that means to fetch the latest version of rsc.io/2fa, then upgrade all its dependencies, recursively, producing a build configuration that may never have been tested, and then build the binary and install it. In this example, you only want the install and are getting the “update dependencies” as an unwanted side effect. If instead you use “go install rsc.io/2fa@latest” (which works today in Go 1.16), you get the latest tagged version of rsc.io/2fa, using exactly the dependency versions it declares and was tested with, which is what you really want when you are trying to install a binary. We didn’t just make a gratuitous change to the way the command is spelled: we fixed the semantics too.

                                  The double meaning of “go get” was introduced in GOPATH mode many years ago. It is my fault, and all I can say is that it seemed like a good idea at the time, especially in the essentially unversioned context of GOPATH-based development. But it’s a mistake, and it is a big enough mistake to correct for the next million users of Go.

                                  The post claims this is an abrupt deprecation schedule, but the change was announced in the Go 1.16 release notes and will take effect in Go 1.18. Once Go 1.17 is released, the two supported versions of Go will be Go 1.16 and Go 1.17, meaning all supported Go versions will know what “go install path@version” means, making it time to start nudging people over to using it with prints from the go command. This two-cycle deprecation schedule, where we make sure that we don’t encourage transition work until all supported versions of Go (the last two) support the new behavior, is our standard procedure for this. I suspect that Go 1.18 will keep the warning print, so you’d actually get two releases of warning prints as well.

                                  We try very hard not to make breaking changes in tool behavior. This one is necessary, and we’ve done everything we can to make sure it is a smooth transition. The new behavior should be obvious - the tool tells you what is going on and what you need to do differently - and easy to learn.

                                  1. 2

                                    I know it’s too late to change anything now, but in retrospect, would it have been better to leave go get in the old GOPATH world and introduce the module version of go get under a different name?

                                    1. 1

                                      A quick note: “go get” does three things today, not two. The third is that it clones the version control repository for some Go package or module (into $GOPATH/src). In the process of doing this, it reaches through vanity import paths and works out the correct VCS system and clone path to use. As far as I know, in the non-GOPATH world there is currently no way to do this yourself, either for programs or for other packages. It would be nice at the very least to have a command that resolved through vanity import paths to tell you the underlying thing to clone, and with what.

                                      (Cloning this way also locks you to the current location of the repository. One advantage of vanity import paths is that people can change them, and have, and you will automatically pick up that change.)

                                      The other thing that is not possible in the non-GOPATH world is to install the very latest VCS version of a program (at least in any convenient way). As noted, the @latest syntax is the latest tagged version, not the latest VCS version. This means that if you want to have someone test your very latest development version, you need to either tag it or tell them to clone the repo and build it themselves; you can’t give them a simple, generic ‘go install …@…’ command to do it. (You can give them a VCS version, but there are various problems with this.)

                                      1. 1

                                        The other thing that is not possible in the non-GOPATH world is to install the very latest VCS version of a program (at least in any convenient way).

                                        You can use “go install …@branchname” to get the latest commit for whatever branch.

                                        1. 1

                                          Oh oops, that’s my mistake. I missed that in the Version queries section of the specification.

                                          (If reaching for the specification strikes people as odd, I can only plead that I couldn’t readily find documentation of this in the “go” help and “go help modules” did point me to the specification.)

                                    1. 4

                                      I wanted to argue with this essay but I find that I can’t. “Free as in beer” is certainly why my university adopted open source operating systems; by 1999, proprietary Unixes (usually on proprietary hardware) were no longer cost competitive with x86 machines running open source. A philosophy of being free or open source had nothing to do with it. I don’t think we’re alone in this.

                                      1. 4

                                        I think the “free as in beer” economics can’t be overlooked when considering why profit-driven businesses adopted open source software. I mean, business-friendliness is the origin of the term “open source.”

                                        But the power of “free as in freedom” in motivating people to put open software up for consideration can’t be ignored. It’s ahistorical to pretend the rhetoric wasn’t persuasive and “moving as in a movement.”

                                        Writing off the contributions of thousands as “a lot of competent geeks, plus a lot of precocious children” is … 🤷🏾‍♂️

                                        1. 2

                                          I think that’s definitely true when the competitor is other COTS software, but in the world of bespoke (or heavily tailored) software, which at least used to be about 90% of the industry, there’s very little difference between the two kinds of free. Free-as-in-freedom-to-modify-and-redistribute means free from vendor lock in, which is a massive financial incentive.

                                          I think a lot of the drive towards F/OSS is really about that: companies don’t like vendor lock-in from any of their suppliers. They either want a compatible alternative to exist (often the mere fact of open source alternatives’ existence has helped here, when organisations want to negotiate a good deal) or they want to ensure that no single entity captures their supply.

                                          I also suspect that the success of a few major players in the open source ecosystem is harming this. Red Hat (IBM) has a disproportionate amount of influence over the desktop Linux ecosystem, to the extent that they’ve managed to push core dependencies on quite unpopular projects that they have more-or-less total control over. This means that building on today’s desktop Linux means practical lock-in to an IBM-controlled ecosystem. Even if the license says that, in theory, someone else can fork the components that they control, they’ve managed to build sufficient inertia and employ all of the people that know the codebases well and so that would be difficult.

                                        1. 7

                                          Some of the dormant shells are more ‘mostly finished’ than completely dormant. The one I know about for sure is rc; the modern Unix versions are pleasantly usable, although they still don’t have job control. I believe there’s another version of rc in the Plan 9 ports collection.

                                          1. 2

                                            For rc, I’d suggest looking here: https://github.com/benavento/rc

                                            The plan 9 version of rc largely doesn’t need the interactive features, because they’re outsourced to the window system. The unix version needs them, and most ports don’t add them. This one does.

                                            And yes – it largely is done. There are some changes committed occasionally in 9front, but for the most part, we seem pretty happy with it.

                                            1. 1

                                              The version of rc I personally use has a number of changes cherry-picked from Bert Münnich’s features, which are on top of the base Byron Rakitzis version. Bert Münnich has a number of highly useful features (including quite good command completion); the repo is here.

                                              (Correct command completion for rc is harder than it looks because rc allows command paths like ‘a/b’, which search for that through $PATH. Most people probably don’t use that rc feature, but I do in order to namespace my personal commands and scripts.)

                                          1. 2

                                            Why can’t statically linked Go programs look up hostnames? Is it because of how security is set up on OpenBSD?

                                            1. 10

                                              In general, full support for looking up hostnames requires calling the platform’s C library functions. Go currently only does this in dynamically linked programs; in statically linked programs it uses a pure Go resolver that supports only a subset of the possible hostname lookup features.

                                              (I’m the author of the linked-to article.)

                                              1. 5

                                                Not sure about OpenBSD specifically, but on a lot of *NIX systems name lookup is handled via NSS, which dynamically loads libraries for libc to use: this allows you to extend name lookup without recompiling. Solaris also uses library loading to be able to get locale support in libc.

                                                It’s interesting that this doesn’t mention macOS. All Go programs on macOS broke a while ago because the syscall parameters for gettimeofday changed and Go didn’t go via libSystem and so didn’t pick up the change.

                                              1. 3

                                                I think this debate needs a bit more data, so I’m considering making a “call to arms for Python OO enthusiasts” at some point, my plan is:

                                                • make a github repo, people raise PRs with their code (do I require tests?)
                                                • their code should be the most idiomatic OO, least refactorable into data + functions
                                                • no operator overloading specific stuff
                                                • max 500 lines + I’m only going to look at 5 submissions - I’ve not got all the time in the world
                                                • I attempt to expunge the OO and present my results - the mob decides

                                                Is this fair or useful or just dumb?

                                                1. 3

                                                  I’d play. I mostly agree with your article but think there’s a better case to be made for OO with immutable value objects. Meaning: I think I could write object solutions that would be more or less equivalent to any bag-of-functions approach, both in readability and verbosity.

                                                  1. 1

                                                    I don’t think I could fit a readable submission into 500 lines, but a common (and classical) use for OO is polymorphic evaluation of simple abstract syntax trees for small expression/rule languages. Each different meaningful term or syntax element is a class, and every class has an .Eval() method (usually they take a context object as the argument). To evaluate an expression or a rule, you .Eval() the top level object for the entire rule’s AST, and it .Eval()s its children as appropriate, and so on. Nothing has to know anything about what all of the different terms, operators, elements, and so on are, which makes it quite easy to add more, tinker with the internal implementations, and so on.

                                                    The nicest Python non-OO version I can think of would be to have the AST nodes be bag-of-data objects (different data for different types of nodes) with a common ‘type’ field that named their type. Then you would have a big dict mapping types to their evaluation functions; these functions would take the node bag-of-data object and a context object as arguments. You could also implement a giant ‘eval’ function that knew all of the node types and how to evaluate each one, but that gets ugly.

                                                  1. 8

                                                    Tomorrow seems to be a very bad day for all those poor souls, who didn’t have time/resources to switch to py3 yet. Fortunately enough it can be easily fixed with pip<21 but it will definitely add additional grey hairs to some heads.

                                                    1. 7

                                                      As one of those poor souls, thanks. We have eight years of legacy code that Just Works and so seldom gets touched, and a major 3rd party framework dependency that hasn’t updated to Python 3 either. We just got permission and funding to form a new engineering sub-group to try to deal with this sort of thing, and upper management is already implicitly co-opting it to chase new shinies.

                                                      1. 9

                                                        Python 3.0 was released in 2008. I personally find it hard to feel sympathy for anyone who couldn’t find time in the last twelve years to update their code, especially if it’s code they are still using today. Even more so for anyone who intentionally started a Python 2 project after the 3.0 ecosystem had matured.

                                                        1. 9

                                                          Python 2.7 was released in 2010. Python 3.3 in 2012. Python 2.6 last release was in 2013. Only from this date people could easily release stuff compatible with both Python 2 and Python 3. You may also want to take into consideration the end of support date of some of the distributions shipping Python 2.6 and not Python 2.7 (like Debian Squeeze, 2016).

                                                          I am not saying that 8 years is too fast, but Python 3.0 release date is mostly irrelevant as the ecosystem didn’t use it.

                                                          1. 7

                                                            Python 3.0 was not something you wanted to use; it took several releases before Python 3 was really ready for people to write programs on. Then it took longer for good versions of Python 3 to propagate into distributions (especially long term distributions), and then it took longer for people to port packages and libraries to Python 3, and so on and so forth. It has definitely not been twelve years since the ecosystem matured.

                                                            Some people do enough with Python that it’s sensible for them to build and maintain their own Python infrastructure, so always had the latest Python 3. Many people do not and so used supplied Python versions, and may well have stable Python code that just works and they haven’t touched in years (perhaps because they are script-level infrastructure that just sits there working, instead of production frontend things that are under constant evolution because business needs keep changing).

                                                            1. 4

                                                              Some of our toolchain broke in the last few weeks. We ported to python3 ages ago, but chunks of infrastructure still support both, and some even still default to 2. The virtualenv binary in Ubuntu 18.02 does that; and that’s a still-supported Ubuntu version, and the default runner for GitHub CI.

                                                              I think python2-related pain will continue for years to come even for people who have done the due diligence on their own code.

                                                              1. 4

                                                                Small tip regarding virtualenv: Since python 3.3 virtualenv comes bundled as the venv module in python, so you can just use python -m venv instead of virtualenv, then you are certain it matches the python version you are using.

                                                                1. 1

                                                                  virtualenv has some nice features which do not exist for venv. One of the examples is activate_this.py script, which can be used for configuration of remote environment, similar to what pytest_cloud does.

                                                                  1. 1

                                                                    virtualenv has some nice features which do not exist for venv

                                                                    Huh, thanks for pointing that out. I haven’t been writing so much Python in the last few years, and I totally thought venv and virtualenv were the same thing.

                                                              2. 4

                                                                Consider, at a minimum, the existence of PyPy; PyPy’s own position is that PyPy will support Python 2.7 forever because PyPy is written in RPython, a strict subset of Python 2.7.

                                                                Sympathy is not required; what you’re missing out on is an understanding that Python is not wholly under control of the Python Software Foundation. By repeatedly neglecting PyPy, the PSF has effectively forced them to create their own parallel Python 2 infrastructure; when PyPI finally makes changes which prevent Python 2 code from deploying, then we may see PyPy grow even more tooling and possibly even services to compensate.

                                                                It is easy for me to recognize in your words an inkling of contempt for Python 2 users.

                                                                1. 21

                                                                  Every time you hop into one of these threads, you frame it in a way which implies you think various entities are obligated to maintain a Python 2 interpreter, infrastructure for supporting Python 2 interpreters, and versions of third-party packages which stay compatible with Python 2, for all of eternity.

                                                                  Judging from that last thread, you seem to think I am one of the people who has that obligation. Could you please, clearly, state to me the nature of this obligation – is its basis legal? moral? something else? – along with its origin and the means by which you assume the right to impose it on me.

                                                                  I ask because I cannot begin to fathom where such an obligation would come from, nor do I understand why you insist on labeling it “contempt” when other people choose not to maintain software for you, in the exact form you personally prefer, for free, forever, anymore.

                                                                  1. 2

                                                                    Your sympathy, including any effort or obligation that you might imagine, is not required. I don’t know how to put it any more clearly to you: You have ended up on the winning side of a political contest within the PSF, and you are antagonizing members of the community who lost for no other reason than that you want the political divide to deepen.

                                                                    Maybe, to get some perspective, try replacing “Python 2” with “Perl 5” and “Python 3” with “Raku”; that particular community resolved their political divide recently and stopped trying to replace each other. Another option for perspective: You talk about “these threads”; what are these threads for, exactly? I didn’t leave a top-level comment on this comment thread; I didn’t summon you for the explicit purpose of flamewar.

                                                                    Finally, why not reread the linked thread? I not only was clearly the loser in that discussion, but I also explained that I personally am not permanently tied to Python 2, and that I’m trying to leave the ecosystem altogether in order to avoid these political problems. Your proposed idea of obligation towards me is completely imagined and meant to make you seem like a victim.

                                                                    Here are some quotes which I think display contempt towards Python 2 and its users, from the previous thread (including your original post) and also the thread before that one:

                                                                    If PyPy wants to internally maintain the interpreter they use to bootstrap, I don’t care one way or another. But if PyPy wants that to also turn into broad advertisement of a supported Python 2 interpreter for general use, I hope they’d consider the effect it will have on other people.

                                                                    Want to keep python 2 alive? Step up and do it.

                                                                    What do you propose they do then? Extend Python 2 support forever and let Python 2 slow down Python 3 development for all time?

                                                                    That’s them choosing and forever staying on a specific dependency. … Is it really that difficult for Python programmers to rewrite one Python program in the newer version of Python? … Seems more fair for the project that wants the dependency to be the one reworking it.

                                                                    The PyPy project, for example, is currently dependent on a Python 2 interpreter to bootstrap and so will be maintaining their own either for as long as PyPy exists, or for as long as it takes to migrate to bootstrapping on Python 3 (which they seem to think is either not feasible, or not something they want to do).

                                                                    He’s having a tantrum. … If you’re not on 3, it’s either a big ball of mud that should’ve been incrementally rewritten/rearchitected (thus exposing bad design) or you expected an ecosystem to stay in stasis forever.

                                                                    I’m not going to even bother with your “mother loved you best” vis a vis PyPy.

                                                                    You’re so wrapped up in inventing enemies that heap contempt on you, but it’s just fellow engineers raising their eyebrows at someone being overly dramatic. Lol contempt. 😂😂😂

                                                                    If I didn’t already have a long history of knowing other PyPy people, for example, I’d be coming away with a pretty negative view of the project from my interactions with you.

                                                                    What emotional word would you use to describe the timbre of these attitudes? None of this has to do with maintainership; I don’t think that you maintain any packages which I directly require. I’m not asking for any programming effort from you. Indeed, if you’re not a CPython core developer either, then you don’t have the ability to work on this; you are also a bystander. I don’t want sympathy; I want empathy.

                                                                    1. 6

                                                                      You have ended up on the winning side of a political contest within the PSF, and you are antagonizing members of the community who lost for no other reason than that you want the political divide to deepen.

                                                                      And this is where the problem lies. Your behavior in the previous thread, and here, makes clear that your approach is to insult, attack, or otherwise insinuate evil motives to anyone who disagrees with you.

                                                                      Here are some quotes which I think display contempt towards Python 2 and its users

                                                                      First of all, it’s not exactly courteous to mix and match quotes from multiple users without sourcing them to who said each one. If anyone wants to click through to the actual thread, they’ll find a rather different picture of, say, my engagement with you. But let’s be clear about this “contempt”.

                                                                      In the original post, I said:

                                                                      The PyPy project, for example, is currently dependent on a Python 2 interpreter to bootstrap and so will be maintaining their own either for as long as PyPy exists, or for as long as it takes to migrate to bootstrapping on Python 3 (which they seem to think is either not feasible, or not something they want to do).

                                                                      You quoted this and replied:

                                                                      This quote is emblematic of the contempt that you display towards Python users.

                                                                      I remain confused as to what was contemptuous about that. You yourself have confirmed that PyPy is in fact dependent on a Python 2 interpreter, and your own comments seem to indicate there is no plan to migrate away from that dependency. It’s simply a statement of fact. And the context of the quote you pulled was a section exploring the difference between “Python 2” the interpreter, and “Python 2” the ecosystem of third-party packages. Here’s the full context:

                                                                      Unfortunately for that argument, Python 2 was much more than just the interpreter. It was also a large ecosystem of packages people used with the interpreter, and a community of people who maintained and contributed to those packages. I don’t doubt the PyPy team are willing to maintain a Python 2 interpreter, and that people who don’t want to port to Python 3 could switch to the PyPy project’s interpreter in order to have a supported Python 2 interpreter. But a lot of those people would continue to use other packages, too, and as far as I’m aware the PyPy team hasn’t also volunteered to maintain Python 2 versions of all those packages.

                                                                      So there’s a sense in which I want to push back against that messaging from PyPy folks and other groups who say they’ll maintain “Python 2” for years to come, but really just mean they’ll maintain an interpreter. If they keep loudly announcing “don’t listen to the Python core team, Python 2 is still supported”, they’ll be creating additional burdens for a lot of other people: end users are going to go file bug reports and other support requests to third-party projects that no longer support Python 2, because they heard “Python 2 is still supported”, and thus will feel entitled to have their favorite packages still work.

                                                                      Even if all those requests get immediately closed with “this project doesn’t support Python 2 anymore”, it’s still going to take up the time of maintainers, and it’s going to make the people who file the requests angry because now they’ll feel someone must be lying to them — either Python 2 is dead or it isn’t! — and they’ll probably take that anger out on whatever target happens to be handy. Which is not going to be good.

                                                                      This is why I made comments asking you to consider the effect of your preferred stance on other people (i.e., on package maintainers). This is why I repeated my point in the comments of the previous thread, that an interpreter is a necessary but not sufficient condition for saying “Python 2 is still supported”. I don’t think these are controversial statements, but apparently you do. I don’t understand why.

                                                                      I also still don’t understand comments of yours like this one:

                                                                      Frankly, I think that you show your hand when you say “really important packages like NumPy/SciPy.” That’s the direction that you want Python to go in.

                                                                      Again, this is just a statement of fact. There are a lot of people using Python for a lot of use cases, and many of those use cases are dependent on certain domain-specific libraries. As I said in full:

                                                                      So regardless of whether I use them or not, NumPy and SciPy are important packages. Just as Jupyter (née IPython) notebooks are important, even though I don’t personally use them. Just as the ML/AI packages are important even though I don’t use them. Just as Flask and SQLAlchemy are important packages, even though I don’t use them. Python’s continued success as a language comes from the large community of people using it for different things. The fact that there are large numbers of people using Python for not-my-use-case with not-the-libraries-I-use is a really good thing!

                                                                      Your words certainly imply you think it’s a bad thing that there are, for example, people using NumPy and SciPy, or at least that you think that’s a bad direction for Python to go in. I do not understand why, and you’ve offered no explanation other than to hand-wave it as “contempt” and “denigration”.

                                                                      But really the thing I do not understand is this:

                                                                      You have ended up on the winning side of a political contest within the PSF

                                                                      You seem to think that “the PSF” and/or some other group of people or entities in the Python world are your enemy, because they chose to move to Python 3 and to stop dedicating their own time and resources to maintaining compatibility with and support for Python 2. The only way that this would make any sense is if those entities had some sort of obligation, to you or to others, to continue maintaining compatibility with and support for Python 2. Hence I have asked you for an explanation of the nature and origin of that obligation so that I can try to understand the real root of why you seem to be so angry about this.

                                                                      Admittedly I don’t have high hopes for getting such an explanation, given what happened last time around, but maybe this time?

                                                                      1. 4

                                                                        Your behavior in the previous thread, and here, makes clear that your approach is to insult, attack, or otherwise insinuate evil motives to anyone who disagrees with you.

                                                                        As Corbin has said themselves multiple times, they are not a nice person. So unfortunately you can’t really expect anything better than this.

                                                              3. 2

                                                                Why will tomorrow be a bad day? pip will continue to work; they’re just going to stop releasing updates.

                                                                1. 1

                                                                From my OpenStack experience: many automated gates could go south, because they could do something like pip install pip --upgrade, hence dropping support for py2. I know that whoever is involved in this conundrum should know better and should introduce some checks. But I also know that we’re all human, hence prone to making errors.

                                                                  1. 2

                                                                    pip install pip --upgrade should still work, unless the pip team screwed something up.

                                                                    When you upload something to PyPI, you can specify a minimum supported Python version. So Python 2.7 users will get the latest version that still supports Python 2.

                                                                    And indeed, if you go to https://pypi.org/project/pip/ you will see “Requires: Python >= 3.6”, so I expect things will Just Work for most Python 2 users.
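
                                                                    For concreteness, here’s a minimal sketch of how a project declares that minimum version (the package name is made up); setuptools turns python_requires into the Requires-Python metadata that PyPI serves and pip checks:

                                                                        # setup.py for a hypothetical package; the key line is python_requires,
                                                                        # which becomes the Requires-Python metadata that pip consults.
                                                                        from setuptools import setup

                                                                        setup(
                                                                            name="example-package",
                                                                            version="1.0",
                                                                            python_requires=">=3.6",
                                                                        )

                                                                    A reasonably recent pip running under Python 2.7 should then skip releases carrying that marker and install the newest release that still allows 2.7.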

                                                              1. 4

                                                                It’s worth mentioning that the history of the MIPS ISA is more complex than this article makes it out to be. MIPS (and the company involved) was originally an independent ISA and CPU maker that scored design wins in both DEC and SGI machines, but had problems keeping up in the early 1990s. DEC developed its own Alpha architecture and SGI bought MIPS to secure the CPUs it was using. One view of the spin-out of MIPS from SGI in the late 1990s was as a tacit admission by SGI that MIPS CPUs could no longer compete as things stood and so controlling MIPS was no longer essential or even useful for SGI.

                                                                1. 9

                                                                Interesting tangential thought:

                                                                I saw this article and the URL and was like, “oh, it’s the guy who hates Go again!” But when I looked at the actual website, this author writes a significant amount about Go, the vast majority of it not negative. It seems that the users of lobsters are only interested in posting and upvoting anti-Go articles, and it is we who are biased.

                                                                  1. 4

                                                                    I’m the author of the linked-to article, and as you sort of noticed I’m pretty fond of Go (and use it fairly frequently). If I wasn’t, I wouldn’t write about it much or at all. I suspect that part of the way my writing about Go has come out the way it has is that I think it’s a lot easier to write about problems (including mistakes that you can make) than about things that are good and just work.

                                                                    (We have one core program in our fileserver environment that’s written in Go, for example, but it’s pretty boring; it sits there and just works, and Go made writing it straightforward, with simple patterns.)

                                                                    1. 3

                                                                    I’m quite fond of Go myself, and I enjoy reading your articles even when I thought you only talked about the problems! 😂

                                                                      I think negative language news (especially regarding “newer” languages) has more success on this site, so there’s a secondary filtering happening as well.

                                                                    2. 0

                                                                      who’s the guy who hates Go?

                                                                    1. 2

                                                                      Is there no mode that would share the physical network port but tag all IPMI traffic with a VLAN you configure?

                                                                      1. 6

                                                                        Many HPE servers have a dedicated network port for the iLO card but can also optionally share one of the regular network ports if needed. When in shared mode, you can indeed configure a VLAN tag for the management traffic, which can be different to the VLAN tag used by the host operating system normally.

                                                                        1. 1

                                                                        Unfortunately, in the same way that chris explained that any compromised host might be able to switch the device’s IPMI mode from dedicated to shared, using a VLAN for segregation has a similar problem. If the compromised host adds a sub-interface with the tagged VLAN to its networking stack, it can gain network access to the entire IPMI VLAN.

                                                                          1. 2

                                                                            In addition, there are other annoyances with using a shared interface. Because the OS has control of the NIC it can reset the PHY. If the PHY is interrupted while, for example, you’re connected over Serial over LAN or a virtual KVM, you lose access. If you’re lucky, that’s temporary. If you’re really unlucky, the OS can continually reset the PHY, making IPMI access unusable. A malicious actor could abuse this to lock out someone from remote management.

                                                                            That can’t happen when you use a dedicated interface for IPMI (other than explicit IPMI commands sent over /dev/ipmi0). Generally switching a BMC from dedicated mode to shared mode requires a BIOS/UEFI configuration change and a server reset.

                                                                            (Speaking from experience with shared mode and the OS resetting the NIC. The malicious actor is merely a scenario I just dreamt up.)

                                                                            1. 1

                                                                              Indeed, although I suspect in many cases these IPMI modules are already accessible from the compromised host over SMBus/SMIC or direct serial interfaces anyway - possibly even with more privileged access than over the network. That’s how iLOs and DRACs can have their network and user/group settings configured from the operating system.

                                                                              1. 4

                                                                                The increased risk mostly isn’t to the compromised host’s own IPMI; as you note, that’s more or less under the control of the attacker once they compromise the host (although network access might allow password extraction attacks and so on). The big risk is to all of the other IPMIs on the IPMI VLAN, which would let an attacker compromise their hosts in turn. Even if an attacker doesn’t compromise the hosts, network access to an IPMI often allows all sorts of things you won’t like, such as discovering your IPMI management passwords and accounts (which are probably common across your fleet).

                                                                                (I’m the author of the linked to article.)

                                                                                1. 3

                                                                                  The L2 feature you are looking for is called a protected port. This should be available on any managed switch, but I’ll link to the cisco documentation:

                                                                                  https://www.cisco.com/en/US/docs/switches/lan/catalyst3850/software/release/3.2_0_se/multibook/configuration_guide/b_consolidated_config_guide_3850_chapter_011101.html

                                                                                  1. 1

                                                                                In a previous life at a large hosting provider, we used this feature on switch ports that were connected to servers using our managed backup services.

                                                                        1. 2

                                                                        Doesn’t using proxies address this, on top of giving a performance and security boost?

                                                                          1. 1

                                                                            Using proxies just pushes the problem back one layer; now you need to maintain the proxy’s TLS configuration instead of your web server’s TLS configuration, but it still has to be maintained. Or you outsource maintaining it to Cloudflare. You still can’t walk away from the server entirely and let it keep running quietly.

                                                                            (I’m the author of the linked-to entry.)
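
                                                                            As a small illustration of the kind of thing that has to be kept current, here’s a rough sketch in Python (purely illustrative; the file names are placeholders) of pinning a server’s TLS floor at 1.2. Whatever piece of software holds this knob, proxy or origin server, someone has to revisit it when the next deprecation comes along:

                                                                                import ssl

                                                                                # Illustrative server-side TLS context.
                                                                                ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
                                                                                ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse TLS 1.0/1.1 clients
                                                                                ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")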

                                                                            1. 1

                                                                              “In the era of HTTP, you could have set up a web server in 2000 and it could still be running today, working perfectly well”

                                                                              “And now you have to keep reasonably up to date with web server software, TLS libraries, and TLS configurations on an ongoing basis, because I doubt that the deprecation of everything before TLS 1.2 will be the last such deprecation”

                                                                            This is what I’m addressing. It seemed like the folks you talk about wanted these servers to keep running. These diverse and interesting setups. Then HTTPS’s varying needs get in the way. So we rely on a mature proxy whose developers and/or ecosystem handle all of that, so that HTTP (or whatever it proxies to) keeps working without all that work. Then the rest can keep focusing on the non-HTTPS stuff that sits behind the proxies. There are existing tools for that.

                                                                              “Another, more relevant side of this is that it’s not going to be possible for people with web servers to just let them sit.”

                                                                            This part remains true. Looking at the big picture, it probably was and always will be true for a lot of things in tech and life, just due to how our environments change constantly, whether offline or online. If anything, we should be pleasantly surprised when something we build still works online five years later without changes. Even more so as the pace of change and the extra complexity increase over time.

                                                                          1. 8

                                                                            I think that coco and others who mentioned it are certainly correct that the package format, by itself, doesn’t improve the security situation. We need better support in Unix distributions for sandboxing of end-user applications. These package formats do help with that, in that by bundling all the dependencies inside the package, you relieve the distribution from the burden of identifying the dependencies and copying them in. Of course, to do it properly you would want the distribution to enforce the details of the sandbox configuration, rather than relying on the package to do it.

                                                                            As with containers for server software, this approach has drawbacks. If the distribution isn’t providing the dependencies, it also doesn’t have visibility into whether they’re up-to-date with all security patches. So these package formats as they stand actually take control away from end users… I can’t say I’m a fan. In other words, distributions should want to be involved in identifying the dependencies and copying them into any sandbox that’s being used.

                                                                            I do understand the benefit to having a format that works across distributions, but I really think that benefit is almost entirely to software publishers, and not to end users or to distribution maintainers. At least in the server world, system administrators do get some benefits from containers, but I don’t think that applies here.

                                                                            Since I do have higher-than-average security needs, I actually don’t ever use Flatpak, since the security trade-off would be unacceptable to me. I regard it as an important security feature to have binaries that are built by my distribution’s tooling, not by the software publisher. That means there are apps I can’t use at all, but that’s just how it is sometimes.

                                                                            I will take the opportunity, in passing, to plug NixOS. There is more work to be done wrt sandboxing of graphical apps, but you can already write a Nix package config which will build a Flatpak binary based on a Nix config, rather than on an upstream, publisher-provided recipe. Nix is far better than any other system that exists today, as far as easy configuration of dependencies. I hope that as people think about better solutions to these problems, they’ll keep the Nix tooling in mind and build on top of it when it makes sense to.

                                                                            1. 1

                                                                              I agree with the view that the first level benefits are primarily to software creators. Software creators get to build one artifact that will work on all Linuxes, and they get to distribute that artifact through a single place (where users can discover it, at least theoretically) instead of many. People like Canonical benefit indirectly through more software being available for Linux and perhaps through running the Snap store which leads to getting to take a cut of any purchases through it. People using Linux theoretically benefit through more software being available for it (and for their distribution), while small Linux distributions might benefit through more software being usable on them.

                                                                              (I’m the author of the linked to article.)

                                                                              1. 2

                                                                                I mean, this trend is specifically taking resources away from efforts to support packaging software properly for distributions. Users who understand their security needs have less software available as a result of it.

                                                                            1. 1

                                                                              I always thought the short 1U servers were for shallower racks, like the network boxes you often see on walls.

                                                                              1. 3

                                                                                The normal Dell rails for their short 1U servers are still full length (although perhaps you can get special shorter rails if you ask), so I don’t think they’d fit in short racks. Also, I believe that short telco racks are often only front-mount (with no back set of posts), which doesn’t work for server rails. My best guess about why Dell appears to like shorter 1U servers is ‘less metal for the case’, which presumably translates to cost savings.

                                                                                (I’m the author of the linked to entry.)

                                                                              1. 2

                                                                                I’m currently deep within the third world described in the article: internal client/server TLS. Since the hosts are already within a private network, it’s unreasonable to purchase a unique certificate for every server host on the network.

                                                                                My best two options seem to be:

                                                                                1. Dynamic self-signed certificates created at server start up. Publish certificate to centralized & trusted location that clients can read from.
                                                                                2. Distributing a single certificate to entire server pool, signed by an implicitly trusted internal CA.
                                                                                1. 4

                                                                                  The standard approach seems to be an internal CA with some sort of automated certificate issuing mechanism (and often trusting only the internal CA, not any public CAs). This does require the automated CA stuff, but I believe there are open source projects for that. If that was too much work, I would be inclined to treat the situation like basic SSH, with a self signed certificate created on startup somehow (either centrally and then distributed, or locally and then published).

                                                                                  (SSH can also use the ‘internal CA’ route, of course, with server host keys being trusted because they’re signed.)
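
                                                                                If it helps, here’s a minimal sketch of the ‘generate a self-signed certificate on startup’ option using the third-party cryptography package (the host name and lifetime are just placeholders); publishing the result somewhere clients trust is then the remaining work:

                                                                                    import datetime

                                                                                    from cryptography import x509
                                                                                    from cryptography.x509.oid import NameOID
                                                                                    from cryptography.hazmat.primitives import hashes
                                                                                    from cryptography.hazmat.primitives.asymmetric import rsa

                                                                                    # Generate a fresh key and a certificate that signs itself.
                                                                                    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
                                                                                    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "server01.internal")])
                                                                                    now = datetime.datetime.utcnow()
                                                                                    cert = (
                                                                                        x509.CertificateBuilder()
                                                                                        .subject_name(name)
                                                                                        .issuer_name(name)          # self-signed: issuer == subject
                                                                                        .public_key(key.public_key())
                                                                                        .serial_number(x509.random_serial_number())
                                                                                        .not_valid_before(now)
                                                                                        .not_valid_after(now + datetime.timedelta(days=365))
                                                                                        .add_extension(
                                                                                            x509.SubjectAlternativeName([x509.DNSName("server01.internal")]),
                                                                                            critical=False,
                                                                                        )
                                                                                        .sign(key, hashes.SHA256())
                                                                                    )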

                                                                                  1. 1

                                                                                    We do have an internal CA, so I will probably go that route to get maximum coverage at sites we host. Unfortunately, clients can choose to host themselves and therefore will not trust our internal CA, leaving them to their own devices.

                                                                                    This service is very core to the company, so failing to form a secure connection means failing to ingest important data. I may end up having to go to a hybrid approach in the end.

                                                                                    1. 1

                                                                                    At least for our product at work (cloud-first with on-prem option), the TLS scheme used in “the wild” sometimes meshes badly with the internal CAs used by the on-prem customers. The “stumbling block” is often browsers like Chrome, which can’t easily be convinced to trust an internal CA.

                                                                                    2. 2

                                                                                    You want option 3, like @cks mentioned: each service gets its own cert signed by your internal CA[1]. You would do the same with SSH[2], except obviously it’s per node for SSH instead of per service. Hashicorp Vault[0] will help manage all of this for you.

                                                                                      0: https://www.vaultproject.io

                                                                                      1: https://www.vaultproject.io/docs/secrets/pki/

                                                                                      2: https://www.vaultproject.io/docs/secrets/ssh/signed-ssh-certificates/

                                                                                    1. 24

                                                                                      In some cases, I have a great deal of sympathy for the author’s point.

                                                                                      In the specific case of the software that triggered this post? Not so much. The author IS TALKING ABOUT A SENDMAIL MILTER when they say that

                                                                                      Python 2 is only legacy through fiat

                                                                                      No. Not in this case. An unmaintained language/runtime/standard library is an absolute environmental hazard in the case of a sendmail milter that runs on the internet. This is practically the exact use case that it should absolutely be deprecated for, unless you’re prepared to expend the effort to maintain the language, runtime and libraries you use.

                                                                                      This isn’t some little tool reading sensor data for an experiment in a closed environment. It’s processing arbitrary binary data from untrusted people on the internet. Sticking with this would be dangerous for the ecosystem and I’m glad both python and linux distro maintainers are making it painful for someone who wants to.

                                                                                      1. 2

                                                                                        A milter client doesn’t actually process arbitrary binary data from the Internet in a sensible deployment; it encapsulates somewhat arbitrary binary data (email messages and associated SMTP protocol information that have already passed some inspection from your MTA), passes it to a milter server, and then possibly receives more encapsulated binary data and passes it to the MTA again. The complex binary milter protocol is spoken only between your milter client and your milter server, in a friendly environment. To break security in this usage in any language with safe buffer handling for arbitrary data, there would have to be a deep bug that breaks that fundamental buffer safety (possibly directly, possibly by corrupting buffer contents so that things are then mis-parsed at the protocol level and expose dangerous operations). Such a deep break is very unlikely in practice because safe buffer handling is at the core of all modern languages (not just Python but also eg normal Rust) and it’s very thoroughly tested.

                                                                                        (I’m the author of the linked-to blog entry.)
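
                                                                                        To make the ‘safe buffer handling’ point concrete, here’s a rough sketch of reading one milter protocol packet in Python, based on my understanding of the framing (a 4-byte network-order length that covers a one-byte command plus its data); the socket handling and size limit are just illustrative:

                                                                                            import struct

                                                                                            MAX_PACKET = 1024 * 1024   # arbitrary sanity limit for this sketch

                                                                                            def recv_exact(sock, n):
                                                                                                # Keep calling recv() until exactly n bytes have arrived.
                                                                                                buf = b""
                                                                                                while len(buf) < n:
                                                                                                    chunk = sock.recv(n - len(buf))
                                                                                                    if not chunk:
                                                                                                        raise EOFError("connection closed mid-packet")
                                                                                                    buf += chunk
                                                                                                return buf

                                                                                            def read_packet(sock):
                                                                                                # Read one packet and return (command byte, data bytes).
                                                                                                (length,) = struct.unpack("!I", recv_exact(sock, 4))
                                                                                                if length == 0 or length > MAX_PACKET:
                                                                                                    raise ValueError("implausible packet length: %d" % length)
                                                                                                body = recv_exact(sock, length)
                                                                                                return body[0:1], body[1:]

                                                                                        Nothing here trusts the peer’s length field beyond a sanity check, and all the byte shuffling happens in ordinary Python bytes objects, which is where the language’s buffer safety does the work.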

                                                                                        1. 2

                                                                                          I guess I haven’t thought about one where it would be safe… the last one I worked on was absolutely processing arbitrary binary data from the internet, by necessity. It was used for encrypting/decrypting messages, and on the inbound side, it was getting encrypted message streams forwarded through from arbitrary remote endpoints. The server could do some inspection, but that was very limited. Pinning it to some arbitrary library version for processing the message structures would’ve been a disaster.

                                                                                          That’s my default frame of reference when I think of a milter… it processes information either on the way in or way out that sendmail doesn’t know how to and therefore can’t really sanitize.

                                                                                          1. 1

                                                                                            For us, our (Python) milter client sits between the MTA and a commercial anti-spam system that talks the milter protocol, so it gets a message blob and some metadata from the MTA, passes it off to the milter server, then passes whatever the milter server says about the email’s virus-ness and spam-ness back to the MTA. This is probably a bit unusual; most Sendmail milter clients are embedded directly into an MTA.

                                                                                            If our milter client had to parse information out of the message headers and used the Python standard library for it, we would be exposed to any bugs in the email header parsing code there. If we were making security related decisions based on header contents (even things like ‘who gets how much spam and virus checking’), we could have a security issue, not just a correctness or DoS/crash one (and crashes can lead to security issues too).

                                                                                            (We may be using ‘milter client’ and ‘milter server’ backward from each other, too. In my usage I think of the milter server as the thing that accepts connections, takes in email, and provides decisions through the protocol; the milter clients are MTAs or whatever that call up that milter server to consult it (and thus may be eg email servers themselves). What I’m calling a milter server has a complicated job involving message parsing and so on, but a standalone client doesn’t necessarily.)

                                                                                            1. 2

                                                                                              Mine was definitely in-process to the MTA. (I read “milter” and drew no client/server distinction, FWIW. I had to go read up just now to see what that distinction might even be.) Such a distinction definitely wasn’t a thing I had to touch in the late 2000s when I wrote the milter I was thinking about as I responded.

                                                                                              The more restricted role makes me think about it a little differently, but it’d still take some more thinking to be comfortable sitting on a parsing stack that was no longer maintained, regardless of whether my distro chose to continue shipping the interpreter and runtime.

                                                                                              Good luck to you. I don’t envy your maintenance task here. Doubly so considering that’s most certainly not your “main” job.

                                                                                        2. 1

                                                                                        Yeah, it’s a good thing they do; it’s not the distro maintainers’ fault that Python 2 became deprecated.

                                                                                        1. 5

                                                                                        It’s got to be a keyword; that’s how it can work together with not (also a keyword) to form is not.

                                                                                          Also there’s a quite ubiquitous use of is that justifies its inclusion; compare a is None to a == None to not a… only by testing the identity can you know for certain that None was passed.
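
                                                                                        A small illustration of why the three checks aren’t interchangeable (the class is deliberately contrived):

                                                                                            class Chameleon:
                                                                                                # Contrived: claims to be equal to everything, including None.
                                                                                                def __eq__(self, other):
                                                                                                    return True

                                                                                            c = Chameleon()
                                                                                            print(c == None)   # True  -- __eq__ can lie
                                                                                            print(c is None)   # False -- identity cannot be faked

                                                                                            for value in (0, "", [], None):
                                                                                                # 'not value' is True for all of these, but only one is actually None.
                                                                                                print(not value, value is None)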

                                                                                          1. 2

                                                                                            You put that very clearly, here and in the other comments; I learned something today.

                                                                                            One other ubiquitous use of is: type inspection, like if type(a) is MyClass or if type(a) is str.

                                                                                            (Some of the time isinstance(a, MyClass) will also do, but if you want to exclude the possibility of subclass instances, only a type identity check suffices.

                                                                                            Ooonn the other hand, one could also argue that is tempts people into checking for identity when a subclass would also suffice; and that this may needlessly prevent interoperation with otherwise-valid subclasses. Hm. I really like the keyword, though, especially compared to ===)
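
                                                                                            For example (class names made up):

                                                                                                class Base:
                                                                                                    pass

                                                                                                class Sub(Base):
                                                                                                    pass

                                                                                                obj = Sub()
                                                                                                print(isinstance(obj, Base))   # True  -- subclasses count
                                                                                                print(type(obj) is Base)       # False -- exact type only
                                                                                                print(type(obj) is Sub)        # True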

                                                                                            1. 5

                                                                                              Note that you don’t normally need is for this sort of type inspection, as type(a) == MyClass works fine pretty much always. I think the only time it wouldn’t work correctly is if one of the classes involved had a metaclass that overrode __eq__ and did something perverse with it. I think that the cases where you really need is are uncommon; you need to care about object identity and be in a situation where something is overriding == in a way that affects the comparison you care about. Python definitely should have something that checks object identity in the way that is does, but I don’t know if it needs to be a tempting keyword instead of a built-in function.

                                                                                              (I’m the author of the linked to article.)
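
                                                                                              As an aside, the standard library’s operator module does already expose the functional form, for whatever that’s worth (the variables here are just for show):

                                                                                                  import operator

                                                                                                  sentinel = object()
                                                                                                  x = sentinel
                                                                                                  print(operator.is_(x, sentinel))   # equivalent to: x is sentinel
                                                                                                  print(operator.is_not(x, None))    # equivalent to: x is not None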

                                                                                              1. 2

                                                                                                Every language must have a means of checking both identity and equality … they are fundamentally different things. The choice to make both of them an operator is quite common; the only real distinction Python makes by choosing == and is over, say, == and === is the promotion to also being a keyword.

                                                                                                And just because you find it to be uncommon does not mean that it is, in fact, uncommon. I use is all the time, for instance with enum.Enum members, to properly handle default values that need to be mutable, and to identify sentinel values in iterators (usually None, but not if None is a legitimate value of the iterator) … it’s also essential to things like the Singleton pattern in Python.

                                                                                                Moreover you’re throwing out the baby with the bathwater … sure, exactly as you said you can use type(a) == MyClass pretty much anywhere you might use type(a) is MyClass, but why would you? Why necessarily invoke the performance cost of equality comparison? The underlying cost of is is near zero, but by using == you force lookup and execution of MyClass.__eq__ and, if that returns False or NotImplemented you further force lookup and execution of type(a).__eq__ … both of which could be implemented in such a way as to be unbearably time consuming.

                                                                                                See the question of identity is, precisely, “are these the same thing” rather than “do these share the same attributes” … they’re fundamentally different questions, and when all you care about is identity, why bother asking for equality?
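
                                                                                                A typical sketch of the sentinel case (the names are just for illustration):

                                                                                                    _MISSING = object()   # unique sentinel; nothing else can be identical to it

                                                                                                    def pop_setting(settings, key, default=_MISSING):
                                                                                                        # default=None wouldn't work here, because None is a
                                                                                                        # perfectly legitimate stored value.
                                                                                                        try:
                                                                                                            return settings.pop(key)
                                                                                                        except KeyError:
                                                                                                            if default is _MISSING:
                                                                                                                raise
                                                                                                            return default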

                                                                                            2. 2

                                                                                              Arguing that it has to be a keyword because otherwise you would have to write it differently is a weird argument to me. It just means that it would be written not is(a, b) instead, which might not read as nicely, but that’s a different argument.

                                                                                              1. 7

                                                                                                Perhaps I should have been more specific: is needs to be either a reserved keyword or an operator (in effect it’s both, though it’s not truly an operator because, like and, or, and not, it cannot be overloaded) precisely because Guido et al. intended it to be used as a syntactic construct… it cannot be a function because it would then have to be used with the call syntax, be able to be assigned to a name, and be overloadable in user code. There is no desire for an overloadable identity operation; indeed, allowing it to be overloaded would break fundamental language semantics. The same is true of None, which is why it’s also a keyword, though it could just be a globally available variable… in Python the argument for promoting to an operator is “should this bind values around it into an operation” and for promotion to a keyword is “should the meaning of this never even potentially change during runtime” and (optionally) “should this modify other keywords” … in the case of is the answer to all of those is “yes”, so the only option was to make it a keyword.

                                                                                                1. 1

                                                                                                  It seems to me like the same arguments would mean that isinstance should also be a keyword though?

                                                                                                  1. 3

                                                                                                    Sure, it could be extended to any number of things — Python 2 didn’t see True or False as keywords, but that protection was extended to them in Python 3 — but would it add any semantic value to do so, by stripping call semantics from it and allowing it to be used as an operator or to directly interact with other keywords?

                                                                                                    Some keywords have been downgraded (most notably print), but that was because there was added value in doing so, and also a colorable argument for overloading existed. The simple fact is that identity testing is a fundamental operation — much more so than introspecting class phylogeny — and the benefits of being a keyword far outweigh the minor confusion it can cause in relative newbies, while there’d be no advantage to removing it except for people who don’t know why it’s fundamental.

                                                                                                    In other languages you’ve got equality as == and identity as ===… frankly we’ve got it easy with is.

                                                                                                2. 2

                                                                                                  is being built in means you know it can’t be overridden.

                                                                                                  It being a function would be ugly