1. 1

    I’m surprised that they still keep recompressing pack files to be bigger and bigger. IIUC the main benefit of large packfiles is a large surface for delta compression. However, most of the repo won’t delta-compress with the rest of the repo, so why do you need to stuff it into one file? It seems like once files are a GiB or so, there is little benefit to making them larger.

    I guess we will learn more about this when they complete the additional work mentioned at the bottom of the post 🙂

    1. 3

      However, most of the repo won’t delta-compress with the rest of the repo, so why do you need to stuff it into one file?

      Commits are snapshots, not diffs, so multiple versions of a file can be delta-compressed very well.

      Of course, there are always edge cases. But for most git repos, delta compression works quite efficiently.

      1. 2

        It would be ideal if one could tell git what constitutes a component in a large monorepo, so git could produce packfiles per component. But we probably won’t see that for years.

        1. 1

          If you want git to know about the parts, why use a monorepo at all?

          1. 1

            One of the reasons is discovery. Whenever I joined a new company, what helped me the most in figuring out how things work was going through the code. It’s way easier to do it in a single repo than in tens/hundreds of them.

      1. 1

        The biggest issue I have with What3Words is that it’s based on a language with some bizarre pronunciation rules. That is, for non-native speakers, correctly pronouncing English words is quite hard. Thus, it has limited use globally, where the majority of people have no idea how to pronounce the coordinates. Even worse, I could see how a British person could be completely misunderstood anywhere in the Balkans.

        1. 2

          Lojban might be a better-specified language, but it fails at the most important job of a language: uniting people from disparate places and walks of life.

          English does a pretty good job of that.

          1.  

            One of the analyses showed that it doesn’t control for homophones properly, so even if you are a native speaker, there are words for nearby locations that can sound too similar. It doesn’t matter if the words are pronounced differently by folks who are native speakers of another language, as long as they’re not pronounced in a way that can be confused easily with another word. Unfortunately, it fails spectacularly on this metric.

          1. 11

            Because this thing direly needs a TL;DR:

            The core flaw is the misclassification of a bare-bones, script-based application as “not a bundle”, which allows code to run that would normally have been blocked by your corporate overlords.

            1. 1

              Yeah, this article presents such great findings and a really thorough root-cause analysis, but it’s written in such a bad way. It reads like a History Channel show: “and soon we will show you this. That thing, soon we will show you that…”

            1. 4
              1. 24

                This is a weird comment. I checked the link, and my understanding is that, due to the way SQLite parses declared types, FLOATING POINT would be treated as INTEGER because the declared type contains “INT” (at the end of “POINT”). Who writes “floating point” as a type? That’s not even in the standard.

                I read that tweet as classical “let’s shit about stuff others wrote so I look smart”. SQLite has its flaws, just like every software written, but it is a really good piece of software.

                1. 6

                  I mean, I’m not sure what the tweet is saying, but you can insert a string “foo” into a column declared “int” no problem in SQLite.
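
                  Both behaviours are easy to reproduce, for what it’s worth. A minimal sketch using the rusqlite crate (the binding choice is mine; any SQLite client shows the same):

                  use rusqlite::Connection;

                  fn main() -> rusqlite::Result<()> {
                      let conn = Connection::open_in_memory()?;
                      // "FLOATING POINT" contains "INT", so the column gets INTEGER affinity.
                      conn.execute("CREATE TABLE t (x FLOATING POINT, y INT)", [])?;
                      // 2.0 converts losslessly to an integer, so it is stored as one;
                      // the string 'foo' goes into the INT column without complaint.
                      conn.execute("INSERT INTO t VALUES (2.0, 'foo')", [])?;
                      let (tx, ty): (String, String) = conn.query_row(
                          "SELECT typeof(x), typeof(y) FROM t",
                          [],
                          |row| Ok((row.get(0)?, row.get(1)?)),
                      )?;
                      println!("x: {tx}, y: {ty}"); // prints: x: integer, y: text
                      Ok(())
                  }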

                  1. 3

                    I read that tweet as classical “let’s shit about stuff others wrote so I look smart”. SQLite has its flaws, just like every software written, but it is a really good piece of software.

                    I wrote this tweet, it’s just a joke, riffing on the popular How SQLite is Tested article which gets shared often. SQLite does have an absurd implementation of a “type affinity” system which takes the spelling of a type into consideration. I think we can all have a good laugh about that and also appreciate how unbelievably high-quality the software itself is.

                    I think your comment was the classic “this other guy is doing the classic asshole thing” asshole thing ;)

                    1. 2

                      Well, sure, but people tend to take certain jokes seriously if it lines up with their confirmation biases, inadvertently turning the jokes into actual criticism.

                    2. 1

                      Wouldn’t it be more reasonable to die than to try to guess what the programmer meant?

                    3. 3

                      Ran into issues with this just this week. Never again

                      1. 1

                        Perhaps you have multiple apps using the same db?

                      2. 2

                        Usually I’m kinda thankful if databases stay away from type systems, because I’d rather have a minimal one than a wrong one, but what really hurts is that even those types are gone as soon as you use a database function.

                      1. 8

                        A shame to build this kind of thing with LVM2. ZFS could have been designed for this use case: It can create a block device, snapshot it, create clones (in constant time), modify them by applying more layers, and store only the deltas.
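
                        For reference, the whole lifecycle described above is only a few zfs commands. A sketch driving the CLI from Rust (the pool name tank, the volume names, and the size are my assumptions):

                        use std::process::Command;

                        // Thin wrapper over the zfs CLI; assumes ZFS is installed,
                        // a pool named `tank` exists, and we have the rights to use it.
                        fn zfs(args: &[&str]) -> std::io::Result<std::process::ExitStatus> {
                            Command::new("zfs").args(args).status()
                        }

                        fn main() -> std::io::Result<()> {
                            zfs(&["create", "-V", "10G", "tank/base"])?;       // block device (zvol)
                            zfs(&["snapshot", "tank/base@v1"])?;               // point-in-time snapshot
                            zfs(&["clone", "tank/base@v1", "tank/machine1"])?; // constant-time copy-on-write clone
                            Ok(())
                        }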

                        1. 3

                          The licensing issues around ZFS make it a non-starter for a lot of places.

                          1. 2

                            There are no licensing issues around ZFS; there are licensing issues around ZFS on Linux. That sounds like a reason to build your infrastructure on FreeBSD, rather than a reason to use LVM2.

                            1. 1

                              What does the FreeBSD container ecosystem look like? Vanishingly small compared to what is available for Linux. Picking FreeBSD to get ZFS in a container runtime system is the tail wagging the dog. As nice as ZFS is, you can still build cool stuff with LVM2/DM.

                              1. 1

                                As I understand it, fly.io uses a lot of Linux-specific technologies - Amazon’s Firecracker, WireGuard, eBPF, etc. It’s not impossible to build something that runs Docker containers based on FreeBSD, but switching fly.io to FreeBSD would not be a trivial undertaking.

                                1. 1

                                  If I understood the blog post correctly, they’re running the containers in a separate VM. There’s no reason why that VM couldn’t be Linux even if the VM that’s exporting the block devices is FreeBSD.

                              2. 1

                                Honest question about dishonesty: if a company never advertises they use ZFS, how can licensing issues affect them? How would anyone detect they used ZFS?

                                1. 1

                                  The licensing issues only apply to open-source distribution. A company can use whatever they want as long as they don’t distribute the software under a license that clashes with that of ZFS.

                                  1. 1

                                    Disgruntled staff leaking info? Server logs from package downloads? Discovery in an unrelated lawsuit?

                              1. 17

                                  We’re sadly missing a tag for incident stories, as these carry quite a lot of information and opportunities to learn.

                                1. 2

                                    I’m using home-manager (Nix): https://github.com/knl/dotskel. Even though I would like to avoid wasting 6 minutes manually installing apps by spending 6 hours failing to automate it, I haven’t got around to it. Thus, I keep a list of the apps that I need to install (the README is essentially a runbook).

                                    In the past I had positive experiences with the JAMF stack, where I would just push a bunch of scripts to configure the machines. Almost declarative, since there are many scripts available online :)

                                  1. 1

                                      I’ve only heard of that game, and looking at the website doesn’t give much more information. What exactly is this trying to solve? What are all these terms (turn/river/combos/flop/…)?

                                    1. 3

                                      It’s the most popular variant of Poker, https://en.wikipedia.org/wiki/Texas_hold_'em

                                    1. 1

                                      I know I’m late to the party, but wasn’t there some system that offered 5x speedup over MySQL for lobste.rs?

                                          After a bit of digging, it turns out it was 2 years ago:

                                          I don’t know how feasible it would be to use it (PhD-thesis-powered work tends to receive less attention after the main author graduates), but it might be an alternative…

                                      1. 1

                                        Isn’t this:

                                            if let Ok(sockpath) = std::env::var("SOCKPATH") {
                                                // Serve over a Unix domain socket when SOCKPATH is set…
                                                use tokio::net::UnixListener;
                                                use tokio_stream::wrappers::UnixListenerStream;
                                                let listener = UnixListener::bind(sockpath).unwrap();
                                                let incoming = UnixListenerStream::new(listener);
                                                server.run_incoming(incoming).await;
                                            } else {
                                                // …otherwise bind a TCP port on all interfaces.
                                                server.run(([0, 0, 0, 0], port)).await;
                                            }
                                        

                                        running against the whole premise of the approach? Specifically, the second paragraph says:

                                        Mostly to prevent you from messing up and accidentally exposing your backend port to the internet.

                                        Yet with the code above, forgetting to set SOCKPATH exposes the port to the internet. I really like the approach of preventing/limiting the risks of configuration and deployment errors. Maybe running the server on 127.0.0.1 by default would be a better approach?
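
                                        Something like this for the fallback branch (my sketch; everything else as in the snippet above):

                                        } else {
                                            // Defaulting to loopback means a forgotten SOCKPATH degrades to a
                                            // locally-reachable server rather than one exposed to the internet.
                                            server.run(([127, 0, 0, 1], port)).await;
                                        }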

                                        1. 4

                                          Is this AppleScript for people who prefer Lua?

                                          1. 3

                                            No, because AppleScript endpoints need to be provided by the application. Hammerspoon allows for much more, treating apps like black boxes. Sure, you can direct some keypresses to them, but otherwise it doesn’t talk to apps.

                                          1. 7

                                              I manage my dotfiles with git and home-manager, so I have no real use case for this, but for someone new to dotfiles this looks promising.

                                            1. 1

                                              Could it still be useful for config files that can’t be read-only?

                                              1. 2

                                                Those are just in my git repository in my home directory.

                                                1. 2

                                                      Yes, you can use a link: https://github.com/knl/dotskel/blob/main/home.nix#L17.

                                                      That essentially creates a matryoshka of symbolic links, where the last one points to the checked-out file. This is because, for Nix, all files have to be in the Nix store. However, home-manager uses a clever trick: the stored content is a pointer to the file outside of the Nix store.

                                                      I’ve used chezmoi in the past, but switched to home-manager as its model is easier for me to understand.

                                              1. 3

                                                  I’m still using Macs as both my personal and professional computers, but I’ve been wondering what’s next on both fronts. First, my personal machine is 10 years old, and getting a new Mac is expensive. Second, it’s getting frustratingly hard to use Macs as development machines when code has to run on Linux. Despite their unixy nature, they are diverging hard… getting Valgrind to work is nigh impossible, and many packages that work nicely on Linux are still lagging on Macs. I believed that Nix would solve my issue of easily bringing dependencies to projects, but getting the Mac side to work is starting to take more and more time…

                                                1. 18

                                                  This is exactly the sort of reason I use magit. The interactive rebase is phenomenal. I regularly reorder, fine-tune and touch up a series of commits before pushing.

                                                  1. 10

                                                      I came to the comments just to say this. Sometimes people look at good GUI git tools (like magit) and scratch their heads and ask ‘why?’. This is the reason why.

                                                    1. 2

                                                        I have recently started using magit, but I don’t find it significantly different from normal interactive rebase? The keybinds are a bit nicer.

                                                      1. 2

                                                          I use magit-commit-instant-fixup a lot. There are equivalents in regular interactive rebase, but in my experience they’re nowhere near as fast. (I sometimes do Instant Fixup very frequently while cleaning up a branch with a number of atomic commits, not wanting to break the order or create a bunch of “fixup” commits to rebase in one go later.)

                                                        I think the main advantage is being able to edit, stage, fixup in the same interface without any significant context switch.

                                                        1. 1

                                                            Completely agree, magit is just amazing because it’s so well thought through and doesn’t require context switching. For me, the fact that I can interact with the version control system using the same verbiage (keystrokes) as the editor helps so much.

                                                        2. 1

                                                            I stage chunks and then make a commit; that’s my magit solution to the original post. I also use smerge and ediff3 for merge and rebase conflicts. I had to read the magit manual to learn all the functionality, maybe that’ll help you?

                                                          1. 1

                                                            What’s the advantage over good old git gui, and maybe a good graphical merger like kdiff3?

                                                            1. 2

                                                              Part of the advantage comes from keeping everything in emacs, and is therefore self-reinforcing (i.e. keybindings baked into muscle memory, etc.) I use Meld a bit sometimes for “overview” diffs, but if you already know emacs then the workflow of resolving conflicts with magit-ediff-resolve is super fast - full keyboard shortcuts, fully integrated in the editor, etc.

                                                      1. 3

                                                        I’m a bit puzzled by this article and I might be missing something. In the given example, depending on the type of the machine (big endian/little endian) one has to use different extraction methods for the uint32 in the network order. That’s exactly the use case for ifdef, if I were to build a binary for different architectures.

                                                        1. 7

                                                          In the given example, depending on the type of the machine (big endian/little endian) one has to use different extraction methods for the uint32 in the network order.

                                                          Not at all – if you read the example carefully, the author is making the point that, depending on the type of the peripheral (not the host machine!) you can extract the uint32 once, straight into native format, regardless of what the native format is.

                                                          That is, if you need to read a uint32_t, you can either:

                                                          a) Read it straight into a uint32_t on the host and swap the bytes as needed depending on host and peripheral byte order, or

                                                          b) Read it into an array of 4 uint8_ts, at which point the only variable in this equation is the peripheral order (because the result of data[0] << 0 | data[1] << 8 | data[2] << 16 | data[3] << 24 doesn’t depend on host order)
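
                                                          For concreteness, here is approach b) transcribed into Rust (my sketch; the discussion above is in C terms):

                                                          // Endianness-independent extraction: the host's byte order never appears.
                                                          // Little-endian peripheral:
                                                          fn u32_from_le_stream(data: &[u8; 4]) -> u32 {
                                                              (data[0] as u32) | (data[1] as u32) << 8 | (data[2] as u32) << 16 | (data[3] as u32) << 24
                                                          }

                                                          // Big-endian peripheral: only the indices flip.
                                                          fn u32_from_be_stream(data: &[u8; 4]) -> u32 {
                                                              (data[3] as u32) | (data[2] as u32) << 8 | (data[1] as u32) << 16 | (data[0] as u32) << 24
                                                          }

                                                          (std’s u32::from_le_bytes/u32::from_be_bytes encapsulate exactly this.)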

                                                          In terms of performance, things are a teeny tiny bit less black-and-white than the author makes it seem, depending on how smart the underlying compiler is and on how good the underlying architecture is at shifting bytes, unaligned access and the like.

                                                          But in terms of code quality my experience matches the author’s – code that takes route a) tends to end up pretty messy. This is particularly problematic if you’re working on small systems with multiple data streams, from multiple peripherals, sometimes with multiple layers of byte swapping (e.g. peripherals have their own byte order, then the bus controller at the MCU end can swap the bytes for you as well, and the one little-endian peripheral on that bus gives you 12-bit signed integers).

                                                          This is likely why the author hasn’t mentioned man byteorder, as @viraptor suggested. There’s no shortage of permissively-licensed byteorder-family functions for these systems if you’re not writing against a Unix-y system, but in these cases – where you get data from different peripherals, with different native byte orders, over different buses – the concept of “network” ordering is a little elusive. If you’re on a little-endian core you do ntoh conversions for big-endian peripherals, but what do you do for little-endian peripherals? Presumably, not “htoh” (note for confused onlookers: there’s no htoh ;-)), you leave the result as is, but in that case your code isn’t portable to big-endian cores. The hton*/ntoh* family of functions implicitly relies on the relationship between network and host order, which works okay when the network byte order is clear and homogeneous, but – as the author of this post points out – it breaks down as soon as you deal with external byte streams of multiple endiannesses.

                                                          (Edit: this is a point that Rob Pike, and others from the Plan 9 team, have made over the years. I thought this was someone echoing that point but lol, turns out this is Pike’s blog?)

                                                          1. 1

                                                            If you’re on a little-endian core you do ntoh conversions for big-endian peripherals, but what do you do for little-endian peripherals?

                                                            In that case, use the more modern endian(3) interface: https://linux.die.net/man/3/endian

                                                            htobe32 / htole32 have you covered. htonl is just nicer in cases where you don’t give people a choice - network is network, don’t think about which one it is specifically.

                                                            1. 7

                                                              The author’s argument is that portable code should be endianness-independent, not that it should handle endianness with syntactic sugar of the right flavour. The “modern” (meh, they’re about 20 years old at this point?) alternatives work around the ambiguous (and insufficiently diverse) typing of the original API but don’t exhibit all the desirable properties of the version that Pike proposes.

                                                        1. 29

                                                            I agree with the points raised in the article. One thing to add is that many tools built in academia are there to support the paper, not to take on a life of their own. Given the short life of a thesis, there is not much one could do to make these tools gain much traction (all examples in the article show this).

                                                            There is another weird thing in our industry — I’ve seen many companies embracing the latest shiny tools and frameworks (k8s, react, containers, …), yet when it comes to thinking about and implementing ways to speed up developer work, by improving build times or reorganizing dependencies, that is always a task for the intern, the least suitable person to handle such a thing.

                                                          1. 10

                                                              yet when it comes to thinking about and implementing ways to speed up developer work, by improving build times or reorganizing dependencies, that is always a task for the intern, the least suitable person to handle such a thing.

                                                            That has not been my experience. I’ve been tasked with that sort of work many times, and given permission to pursue it many more. Currently a senior UI engineer at Microsoft, but previously worked at a 500 person EdTech company for several years, and had a brief, but enjoyable stint at Todoist. All three were very happy to give that work to an experienced developer.

                                                            Have I had managers who were averse to that? Yes. But it has been the exception.

                                                            1. 3

                                                              Agreed with this. My experience at both large and small tech firms has been that when a senior person wanted to work on developer productivity, it was almost always welcomed. Maybe it’s different at companies whose product is something other than what the developers are building, though.

                                                            2. 7

                                                              The problem in academia is that it is controlled by where the professor can get money from. There is a lot more grant money for making shiny new tools than for maintaining and adapting old tools, and your academic life depends really strongly on how much grant money you can bring in (determines how many students you can support, how many conferences you can send your students to, and what equipment you can give your students). (New papers are also indirectly about grant money. You need new papers to bring in grant money, and you can’t write new papers simply by maintaining or updating existing tools).

                                                              1. 3

                                                                I agree, and it’s also about promotion criteria; it would be hard to get tenure if all you ever worked on was maintaining old tools. I think the solution is to create a new tenure-track academic role which is evaluated, promoted, and funded based on the maintenance and creation of useful open-source artifacts (rather than the traditional goal of the discovery of novel knowledge).

                                                            1. 1

                                                              Perhaps I’m missing something obvious, but is the code mentioned in the article available anywhere? It would be nice to see the same program in the four different languages side by side.

                                                                1. 2

                                                                  Yup, it’s available here: https://github.com/zserge/glob-grep (it was linked in the middle of the article, easy to miss for skimmers :))

                                                                  1. 1

                                                                    Yep, I just skipped right to the first language header. My bad. Thank you.

                                                                1. 6

                                                                  Interesting quote in there:

                                                                  rsync.net has no firewalls and no routers. In each location we connect to our IP provider with a dumb, unmanaged switch.

                                                                  I like the pragmatic, let’s keep it as simple as possible approach.

                                                                  1. 6

                                                                    I wish there were a Linux distro that allowed me to simply install the Rust versions over the unsafe ones.

                                                                    Much easier to see what breaks in practice, instead of trying to chase 100% bug compatibility.

                                                                    1. 10

                                                                      This looks like a cool tool, but I would hesitate to call any llvm based jit “safe” in the rust sense of the word.

                                                                      1. 7

                                                                        author here, I just want to echo this sentiment, but point out some subtleties.

                                                                        There’s some unsafe code in the runtime, and then all of the JIT code (particularly LLVM) should really be considered unsafe.

                                                                        By default, however, frawk is using Cranelift to JIT the code. Cranelift is a pure-rust project, so I’d expect it to be safer to use than LLVM. Still, JITs like the one in frawk are going to be inherently unsafe. Even Cranelift is providing you with a low-level builder API that doesn’t check the generated code is memory-safe, so running that generated code is still unsafe (both in the Rust sense, but also in the colloquial sense I’d say).

                                                                        1. 4

                                                                          Even if it’s just compiling in a single back end for LLVM, there is vastly more unsafe C++ code in the ‘safe’ Rust version than in a typical C++ implementation of awk.

                                                                          1. 2

                                                                            Eh, how much unsafe C++ code do you think there is in LLVM?

                                                                            1. 3

                                                                              Around 10 MLoC. Nothing in LLVM uses the .at (bounds-checked) accessors instead of operator[] (not bounds-checked), for example. Nothing in LLVM is safe in the Rust sense of the word. Most of the unsafe things are hidden in classes like PointerIntPair, which hides an integer in the low bits of a pointer (using the result would require unsafe in Rust), but there are a lot of abstractions in LLVM that are built on things that would not be permitted in safe Rust.
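
                                                                              For a flavour of the trick, here’s a toy Rust version of that pointer/int packing (my sketch, not LLVM’s actual code). Writing the tagged integer is safe; it’s using the recovered pointer that has to go through unsafe:

                                                                              use std::mem::align_of;

                                                                              // Stash a 2-bit tag in the low bits that alignment guarantees are zero.
                                                                              fn pack<T>(ptr: *mut T, tag: usize) -> usize {
                                                                                  debug_assert!(align_of::<T>() >= 4 && tag < 4);
                                                                                  ptr as usize | tag
                                                                              }

                                                                              // Recovering the raw pointer is safe to write, but dereferencing it is
                                                                              // `unsafe`: the compiler can no longer vouch that it points at a valid T.
                                                                              fn unpack<T>(packed: usize) -> (*mut T, usize) {
                                                                                  ((packed & !3) as *mut T, packed & 3)
                                                                              }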

                                                                        2. 2
                                                                          1. 1

                                                                            In NixOS you can do this. Technically, on any Linux distro “enriched” with Nix you could do it.