1. 2

    I would love to practice, but I’m usually exhausted by work, so I try not to think about code in my free time. I realize this will make me a worse programmer over time, relative to others. After all, all the best programmers I know code a lot outside of work.

    1. 2

      I realize this will make me a worse programmer over time, relative to others. After all, all the best programmers I know code a lot outside of work.

      I disagree, I think striving for balance in your life and making time for family, friends, relationships, life admin, and non-technical hobbies benefits your health and ultimately makes you a balanced individual. Being balanced makes you a better human being.

      1. 1

        Yes, it makes you more well-rounded. But someone who programs for more hours, even without much intentionality, is going to be a better programmer over time.

    1. 2

      I’m going to the New York Film Festival! Excited to see Benedetta and Titane on the big screen, with the restored Kummaty on a smaller one. The lineup this year is phenomenal and I’ve been clearing my schedule for it.

      I would like to also spend some time writing poems, maybe revise some of my older ones.

      1. 4

        I worked on a cross-platform Windows/Mac desktop app built in C++ using the Chromium Embedded Framework (CEF), which essentially gives you a web frame to render HTML/CSS/JS in. The UI was a small layer on top of a very complex logic layer. It gave us some level of “coordinated featurefulness”. A simple change like adding a button or changing text was always replicated on both platforms. We were able to use the company stylesheets for the UI, just as on the web - so this was good for brand consistency. But some simple and most complex things required significant custom work. Simple things like the default margins of windows sometimes meant changing stylesheets per platform. More complex things like accessibility and desktop notifications were totally custom and required weeks of work. Accessibility work required us to learn how the OS screenreader APIs worked, which ones Chrome implemented, and which ones were recognized by screenreaders. There’s a default, built-in one on Mac (VoiceOver), but on Windows we just picked the most popular one (JAWS). Overall, it was a mixed bag. Using CEF didn’t make the hard things easier, but made the easy things easy.

        1. 2

          More complex things like accessibility and desktop notifications were totally custom and required weeks of work. Accessibility work required us to learn how the OS screenreader APIs worked, which ones Chrome implemented, and which ones were recognized by screenreaders. There’s a default, built-in one on Mac (VoiceOver), but on Windows we just picked the most popular one (JAWS). Overall, it was a mixed bag. Using CEF didn’t make the hard things easier, but made the easy things easy.

          I’m pretty sure that CEF still made accessibility easier than it would have been if you had used a custom GUI toolkit or even implemented a custom control using Win32 or Cocoa. It’s unfortunate that you had to test and work around screen-reader-specific quirks, but doing that using HTML and ARIA is still a lot easier than implementing the platform-specific accessibility APIs.

        1. 28

          Perhaps related: I find that a certain amount of redundancy is very useful for error checking, both by the speaker/author and listener/reader. This might take several forms:

          • I repeat myself often when giving talks or writing, so you never have to jump too far back to an earlier point in time / space. If I say something that is very similar but slightly different than before, I’m either doing it to draw attention to an intentional change, or I’ve made a mistake! Either way, people are paying attention closely to the difference.

          • When writing math, I never rely on symbols or words alone, but a mixture of both. The rule I follow is that you should be able to understand the sentence even if most of the symbols are deleted. This is invaluable for catching mistakes, and also for establishing notation – using words alone can be imprecise, and using symbols alone might be disorienting for a student or for someone familiar with slightly different conventions. The document should be skimmable, and every theorem should be more or less self-contained.

          • When writing code, I prefer to comment each logical block with some explanation of “what, why, how”, even if an educated reader might be able to infer from context. The point is that if their inference about the code does not match my comments, then they can confidently conclude the code has a bug, and it’s not just some mysterious hack they should be afraid of changing. Yes, this means maintaining comments along with the code, but I find that to be good practice anyway.
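
          For instance, a minimal sketch of that commenting style (a made-up `load_config` helper, just to show the shape):

          ```python
          def load_config(path: str) -> dict:
              # What: read KEY=VALUE pairs from `path` into a dict.
              # Why: the files have no section headers and we want zero dependencies,
              #      so we parse by hand instead of reaching for a full config library.
              # How: skip blank lines and '#' comments, split on the first '=' only,
              #      so values are allowed to contain '=' themselves.
              config = {}
              with open(path) as f:
                  for line in f:
                      line = line.strip()
                      if not line or line.startswith("#"):
                          continue
                      key, _, value = line.partition("=")
                      config[key.strip()] = value.strip()
              return config
          ```

          If a reader’s mental model of the parsing disagrees with those three comments, one of the two has a bug, and that mismatch is exactly the signal I want.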

          A good way to summarize might be: When communicating, it’s important to keep in mind the predictions / assumptions your audience will make about your speech / writing. They’ll spend a lot of cognitive effort working through the things you say that don’t match their predictions. So, make sure you only surprise them with the new information you’re trying to communicate, rather than potential distractions like unfamiliar notation or syntax. (non-standard notation is ok and sometimes preferable, but the meaning should be evident from context!)

          1. 9

            The rule I follow is that you should be able to understand the sentence even if most of the symbols are deleted.

            As someone bad at math, I appreciate you.

            1. 12

              Heavy use of symbols is an underappreciated roadblock for a lot of people who might otherwise be able to grasp math concepts, I think.

              As someone who didn’t study math beyond basic calculus, I’ve had the experience of being confronted with a page full of unfamiliar glyphs and having no idea what to make of it, but then understanding the idea just fine once someone translated it to English for me.

            2. 5

              In one concrete example, I try to never use “former” and “latter” to refer to previous things because it requires a jump. You can usually summarize the options in a small phrase.

              1. 3

                I wonder how many, like me, have to go through mental gymnastics to separate “former” and “latter”. I know latter comes after former because of the resemblance to “later”.

                1. 2

                  I use the same mnemonic!

                  Though I wouldn’t go so far as calling it gymnastics (for me personally), I do think it inherently causes a pause to backtrack and re-parse.

                  Another point for the Finnish language, which has a tighter coupling to all the other words referring to something earlier or later. I can’t remember how this is properly expressed in German or French, but Swedish also gets this quite right.

                  English just makes it unnecessarily difficult.

                  1. 2

                    I thought the English construction was somehow derived from French (or Norman, rather) but apparently it’s from Old English if this blog post is to be believed: https://www.grammarly.com/blog/former-vs-latter/

              2. 2

                This is an underrated statement. I find that the “avoid needless words” sentiment does more harm than good, even in fiction, and can be disastrous in technical writing.

                Human speech naturally has some redundancy in it, and the higher the probability of errors and the higher the cost of those errors, the more redundancy is required. Clarity and redundancy aren’t mutually exclusive; sometimes clarity is redundancy.

                1. 1

                  The point is that if their inference about the code does not match my comments, then they can confidently conclude the code has a bug, and it’s not just some mysterious hack they should be afraid of changing.

                  Nitpicking: what if it’s the comments that are incorrect or out of sync with the code? With both code and comment, now you have two sources of truth…

                  1. 9

                    Exactly! If the code doesn’t do what the comments say it does, whoever is reading will file an issue and the matter will be sorted out. This redundancy is really helpful for hunting down bugs and avoids two problematic scenarios:

                    1. If someone has changed the code but not the comment, they probably didn’t do a thorough enough job reading the surrounding context to actually understand the impact their changes will have on the rest of the code. So they’ve either introduced a bug, or they forgot to verbally explain the reason for their changes, which itself warrants a bug report.

                    2. I feel that code takes on just as much (or more) technical debt when the documentation is missing as when the documentation is incorrect. Incorrect documentation (as long as it is colocated with the code it describes) can act as a big red flag that something is wrong. When documentation is missing, anyone who attempts to modify the code will be afraid to make any changes, afraid that the suspicious lines really do serve some mysterious purpose that they just don’t understand yet. If someone is afraid to make changes, they might instead add code to wrap the mysterious bits.

                    Some caveats:

                    • It does take some discipline to work this way, and I haven’t tried enforcing this style in large codebases / large teams. I have seen it work well on smaller projects with one or two like-minded collaborators. It’s especially useful on solo projects where I might work on the project in bursts of a few days once every few months. This way I document my in-progress thought process and can be more confident that local changes to the code will work as intended, even if I haven’t touched the code in three months.

                    • For documentation that is not written inline with the code itself, and intended to be consumed by users of the software, not developers, I feel differently about missing vs incorrect documentation. In this scenario, it’s much harder to keep the documentation in sync, and incorrect documentation is more dangerous.

                    1. 4

                      I think that’s the point, just like parity bits on memory. At least you know something isn’t right.

                  1. 12

                    Oh God, I haven’t finished so many things that I’ve long lost count of them. I have a long series of projects in various states of unfinished. I don’t really regret it, most of them are unfinished because there was something I really wanted to do, and I did it, and the rest of the project was just an excuse to do that particular thing. Others, especially those that I did in a professional context, were cut short by budget and/or time constraints. But it’s fun to reminisce about them. In no particular order, some of the things I started and never finished in the last 15 years or so include:

                    • Single-board computers, based on various processors (6809, Z80, 8086, we-don’t-need-no-stinking-microprocessor-I’ll-just-wire-my-own – the Z80-based one is the one that sorta made it to the breadboard stage). But I did build/play with various parts that I didn’t understand well enough – like clock generators or the most expensive programmable interrupt controller in history, an 8259-ish clone that “ran” on a Zynq-7000 development board because that’s what I had lying around at work. Honestly, the biggest reason why none of these got finished is that I didn’t really want to build a whole computer, I just wanted something with front-panel switches. I have some killer front panel designs, I just don’t have a computer to plug them into :-D.
                    • Sort of in the same vein, an emulator. I must have started dozens of them but never finished one. It’s one of those goals I never accomplished but one day it’s gonna happen.
                    • A debugger/monitor for small (e.g. MSP430) systems (context: I was working on an operating system for that kind of device at the time – that one was actually finished, put in boxes and sold and all – and I wanted to test/prototype various pieces of peripheral code/drivers without the whole OS behind me, but I also wanted to be able to poke at things in memory in an interactive manner and so on, and debugger support at the time was really bad on some of the platforms we needed, including the MSP430). It sort of happened but I never used it enough to polish the rough edges. It was actually useful and interesting – at the expense of a little flash space, you got an interactive debugger of sorts over a serial port that allowed you to “load” (eh), run and edit small programs off of a primitive filesystem. Realistically, it was mostly a waste of time: this wasn’t a microcomputer, it “ran” on MCUs inside various gadgets. The time it took to “port” it to a new one vs. what you got in return just wasn’t worth it.
                    • An SDR-based radiotelescope. I had a pair of Ettus Research SDR boxes more or less all to myself for a few months and I could play with them more or less at will as long as I didn’t break them, but the company I was at went under before I got to try anything (my knowledge of antennae was, uh, I’d say rudimentary but that would probably be overselling it). I did get to write some antenna positioning code that I later integrated into some real-life firmware at $work so it wasn’t all wasted.
                    • A Star Trek meets Rogue, uh, I’d say rogue-like? Unfortunately implementing all the cool things (random story generators! random races with political intrigue and all! Gandalf-like figures roaming the galaxy!) was way more fun than implementing the actual game so I ended up with 40,000 lines of Java that spit galaxy news into a log file and nothing else. I learned a lot about sparse matrices though – that was actually the whole reason why I wanted to get into it in the first place (tl;dr I wanted to model something that supported tens of thousands of star systems with millions of ships and so on) – and of all the projects in this list, it’s the one that would’ve probably been easier to make into something cool. I tried to restart it at some point, then I learned about Dwarf Fortress and I honestly couldn’t see the point anymore :-).
                    • A software synthesiser that tried to use some pretty advanced physical models to generate wind instrument sounds. Unfortunately I got so bogged down in the modelling side of things that by the time I had some basic prototypes, integrating them into a program worth using wasn’t really fun anymore, and I also couldn’t (still can’t…) really play any wind instrument so my understanding of these things was limited. I later tried to do a more ambitious synthesiser for a harp (tl;dr also software-driven but it used lasers instead of strings) for my final year university project but that never happened, and while I have a crude hardware prototype tucked in a closet somewhere, I never got around to writing any of the hard parts of the software. The biggest problem I had, and the main reason why this didn’t get anywhere, is that I just didn’t understand enough about real-time audio processing to get something useful. I still don’t.
                    • An Amiga Workbench clone for Wayland. By the time enough of the cool features got implemented (e.g. multiple screens) I got so fed up with Wayland and Linux in general that I never wanted to finish it. Various bits and pieces, like an Amidock clone, got to a usable(-ish) state. This is the only project in this list that I didn’t really enjoy. I was already fed up with these things when I started it, I just didn’t really want to admit it. I don’t want to say anything about how this one could be improved and why I failed at it because I’m quite bitter over these things, but tl;dr I’d rather have all my teeth pulled out and swallow them than touch any of that stuff again.

                    There were others, much smaller, these are the cool ones.

                    All in all I think I finished very few of the side projects I started but I learned a lot out of all of them and many of them came in handy when doing stuff I actually got paid for. I have zero regrets for not finishing them. It’s important to finish some things but not all of them.

                    Reading a long list of projects that failed sounds a bit like a long list of failures but really, they weren’t. I achieved most of my goals. If I had an infinite supply of free time I could probably finish the ones that were never finished because they were too ambitious for my level of knowledge at the time (e.g. the wind instruments thingie) but there are so many cool things that I don’t know how to make that it kinda feels pointless to use my free time doing the ones that I now know how to make.

                    (Edit: I guess the point I’m trying to make is that no time spent hacking on something cool is truly lost, no matter what comes out of it in the end, and no matter how modest or grand the ambitions behind it. There’s a whole side project hustle mill going on these days and this whole “don’t send us a resume, show us your Github profile” thing and I think it’s a con, and all it’s doing is making people afraid of doing things in their spare time, because they treat these things the way they treat projects they do at work. Computing was my hobby long before it became my profession, and it still is; finishing something “successfully” is beside the point when it comes to these things – their function is fulfilled as soon as a line of code is written or a piece of schematic is drawn, and that brings me joy all by itself. Don’t fall into the trap of taking these things more seriously than you ought to. Most of us spend at least 8 hours/day agonising over whether something will be finished successfully or not – unless you enjoy that part of the job, there’s no reason to take it home with you.)

                    1. 3

                      I, too, have looked at Dwarf Fortress and concluded I couldn’t possibly top it. Perhaps the way to approach something like that is to bite off a chunk of DF and try to make it better, more realistic, more complex, or more fun. Of course, a lot of the magic of DF is the interconnectedness of the complex systems. But I can imagine one person making a very complex two-person battle/dueling system, or a complex home decorator, or a terrain generator that goes higher into the sky or deeper into the earth, or a DF with birds instead of dwarves.

                      1. 2

                        An Amiga Workbench clone for Wayland. By the time enough of the cool features got implemented (e.g. multiple screens) I got so fed up with Wayland and Linux in general that I never wanted to finish it. Various bits and pieces, like an Amidock clone, got to a usable(-ish) state. This is the only project in this list that I didn’t really enjoy. I was already fed up with these things when I started it, I just didn’t really want to admit it. I don’t want to say anything about how this one could be improved and why I failed at it because I’m quite bitter over these things, but tl;dr I’d rather have all my teeth pulled out and swallow them than touch any of that stuff again.

                        Holy cow this sounds cool. I’m trying to envision what this even would look like.

                        Specifically because as you’re well aware I’m sure an Amiga “screen” was kind of a different animal from anything that exists in a modern desktop context, and I don’t know how you’d enforce that kind of sliding behavior with modern windowing systems.

                        I just recently saw this project which bundles a fully emulated Amiga system into a Visual Studio Code package so you can compile, debug and run your Amiga code from a modern environment.

                        1. 2

                          Specifically because as you’re well aware I’m sure an Amiga “screen” was kind of a different animal from anything that exists in a modern desktop context, and I don’t know how you’d enforce that kind of sliding behavior with modern windowing systems.

                          It’s been done before (to some degree) on X11 as well, see e.g. AmiWM (I think? I might be misremembering it, but I think AmiWM supported sliding screens. e16 had support for something like this a long time ago but I don’t recall if it could do split-screen). I only implemented a very rudimentary prototype which worked sort of like the famous spinning cube thing, except instead of mapping each desktop surface on a spinning cube, I just mapped it on different screen sections. I wasn’t really planning on adding it, so it was more of a hack I cobbled together; it only worked under some basic scenarios, but I’m sure it can be done with a little patience.

                      1. 3

                        I wish he took the last step to figure out how many characters that is given an alphabet you can easily type on a keyboard.

                        If we define that as the 95 displayable ascii characters, would that be bitlength/sqrt(95), or am I mixed up? So for the 327 suggestion, that’d be 34 randomly selected characters?

                        Of course, I get a simple number is probably not a good idea because people wouldn’t pick a truly random 34 character password in practice unless they’re using dice or a password generator, so that’d be irresponsible if he was quoted out of context.

                        1. 1

                          I think it would be bitlength/log2(95), because log2(95) gives you how many bits are needed to represent 95 possibilities. log2(95) ~= 6.57. So 327/6.57 ≈ 50 displayable ASCII characters, and for the higher estimate of 405.3 bits, ~62 characters.
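
                          A quick sketch of that arithmetic (assuming every character is drawn uniformly at random from the 95 printable ASCII characters):

                          ```python
                          import math

                          # Each uniformly random printable-ASCII character carries log2(95) ~= 6.57 bits.
                          BITS_PER_CHAR = math.log2(95)

                          def chars_needed(target_bits: float) -> int:
                              """Minimum number of random printable-ASCII characters for the target entropy."""
                              return math.ceil(target_bits / BITS_PER_CHAR)

                          print(chars_needed(327))    # 50
                          print(chars_needed(405.3))  # 62
                          ```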

                          1. 3

                            Passwords of the same length using the same charset can have different levels of randomness, and thus different entropy measurements. For this reason, I’d recommend using a tool like zxcvbn (many implementations exist) or KeePassXC to calculate password entropy; they take into account things like dictionary words and non-random patterns. I’ve generally found KeePassXC to give much better results than zxcvbn.

                            I do understand the need to connect this to “actual” passwords, so I updated the article with some sample passwords. Diff.

                            Thanks for the feedback.

                            @kmm

                        1. 16

                          It’s not tech, or at least not… what we mean by tech around here, but I’m going to make an exception about this because I have a pretty fun story to tell about it:

                          Today, my kids just assume that the sum of all human knowledge is available with a single search or a “hey Alexa” so the world’s mysteries are less mysterious and they become bored by the Paradox of Choice.

                          So back when I was a kid I’d read Jules Verne’s “The Mysterious Island” and “In Search of the Castaways”. And that raised a very interesting problem for me: I wanted to know if Tabor island really existed. This was a bit of a tricky thing with Jules Verne’s books, because they were pretty well researched. Some of the places were obviously fictitious (Lincoln Island in The Mysterious Island, for example, and a bunch of stuff around the North Pole and so on) but most of them were absolutely real places.

                          So I embarked on a two-year quest that was completely inconclusive. I had a map at home, of course, and it didn’t show it. But it also didn’t show a lot of other small islands that I knew were real, and I figured that was because it was a large-scale map. I borrowed an atlas from the school library, which had a somewhat more detailed map of the Pacific region, and it didn’t show up there either, but then again those maps were small and not that detailed, since the Pacific is a long way from us.

                          So I pestered my father to get me a book about the Pacific Ocean and its islands. Thankfully, nerdiness runs in the family. A few weeks later he got me three books, two of which had pretty good maps. That was a dead end though: most Pacific islands worth writing a book about are in one of the major archipelagos, whereas Tabor would have been a lone island. I read all three but didn’t find anything.

                          So I asked one of my mother’s colleagues (she teaches elementary school and after enough pestering she introduced me to one of her colleagues who taught geography). She told me she doesn’t know of any island with that name. I came prepared, of course – I knew the latitude and longitude from the book! – and she said it doesn’t ring a bell but that doesn’t mean much, there are thousands of islands all over the world and she doesn’t know them all, either. But she took me to her office where she had a huge map that covered an entire wall, and we looked for it and we didn’t find anything.

                          That sort of laid it to rest – I was pretty sure Tabor island didn’t exist – but, still being at an age where you’re dreaming of chasing pirates and all that, I sort of kept a tiny glimmer of hope that maybe it’s a real place. After all, I didn’t read anything that said it’s not there – all I had as confirmation was a bunch of maps drawn beyond the Iron Curtain in the early 60s (at best), which got a bunch of things in remote areas of the world wrong anyway.

                          Fast forward a few years and now my computer has a Winmodem in its guts (I was in… 7th grade, I think). I stumble upon some discussion about Jules Verne’s books on a forum and I remember my quest for Tabor Island. So I go to altavista (yep)…

                          …and bam, two minutes later I’m staring at three pages that explain what phantom reefs are, how Maria Theresa is one of them, and how there are a bunch of other phantom islands like that, and how they tried to look for it several times, most recent one in the 1970s, and it wasn’t found.

                          Two minutes to settle a question that I’d previously spent two years seeking an answer to. This is the power of the Internet.

                          1. 2

                            Two minutes to settle a question that I’d previously spent two years seeking an answer to. This is the power of the Internet.

                            The question isn’t settled. Not found != not exist.

                            1. 2

                              Well, I guess one could say it was only conclusively settled in the age of Google Earth. But there’s a big difference in certainty between “I couldn’t find it on a map” and “two expeditions in the 1950s and 1970s looked for it at a time when everyone already suspected it doesn’t exist and they didn’t find it, either”.

                              I’m not going to deny that I still do hope it exists, though :).

                            2. 2

                              The scale was amazing. The web would probably have fitted on a fairly small stack of CD-ROMs when Encarta first came out. My computer had a 60 MB hard disk, Encarta was over ten times the size of all of the programs and data I had installed. The computer I got in 1996ish (the first one I owned with a CD-ROM drive) had a 1GiB disk, so only slightly larger than Encarta.

                              Wikipedia, ignoring talk pages and history, is around 14GiB compressed. That’s around 20 times the size of Encarta and most pages don’t have any video. The amount of data that is available in a quick search these days is just staggering.

                              1. 4

                                I don’t want to take anything away from Wikipedia, it’s great, but I also think people have forgotten what a traditional reference is like.

                                For example if the subject is a British person (or is related to the former British Empire at all) the Oxford Dictionary of National Biography articles are still streets ahead of the majority of Wikipedia’s. All written by academic historians, all fact checked by the editors. It’s really good. And free to almost everyone in the UK - just enter your council library card number.

                                1. 2

                                  I can’t wait for the Oxford Dictionary of National Biography to get into the public domain so we can incorporate sections of it wholesale into Wikipedia. :) This is already done with old versions of the Encyclopaedia Britannica.

                                2. 2

                                  It was! Even in 1998, when the web had already grown a lot, Encarta came on two (or four?) CDs and it seemed huge.

                                  And it especially seemed huge when compared to what was available on paper. I didn’t have an encyclopedia at home but I did have something called an encyclopedic dictionary – a thick, 2000-page book printed in minuscule letters, that included some full-color illustrations and a lot of charts. At some point I’d read something about Denmark in some school magazine and I wanted to know more about it – and everything I could find, and knew, about Denmark at the time was condensed in the 200-word or so entry in that dictionary, a 3-by-5” black-and-white picture of Copenhagen also in the dictionary, plus whatever I could gather from the maps in an atlas (the names of about a dozen cities, I think?). I’d read those 200 words enough times that I probably knew them by heart at some point. I looked for a book about it but didn’t find anything for a few weeks.

                                  Then I got a hold of Encarta 98 which had several pages about Denmark, including one about history. And a recording of a traditional song. And images from all over that country. And it took me all of thirty seconds to find it.

                                  By 1998 the web was definitely large enough that it wouldn’t fit on a CD anymore, but there were few places on the WWW that had as much information in one place as Encarta did, and covering such a diversity of formats.

                              1. 5

                                Great article!

                                Given the page table overhead of allocating large amounts of virtual memory, does anyone know if people actually use virtual memory to implement sparse arrays?

                                Virtual memory is fascinating. We take it for granted, but it had to be invented at some point. And, there were several precursors to page-based virtual memory, too. Perhaps we’ll move away from virtual memory?

                                That’s the premise of this article: “The Cost of Software-Based Memory Management Without Virtual Memory”

                                “While exact area and power consumption are difficult to quantify, we expect that removing (or simplifying) support for address translation can have a net positive impact since current translation infrastructure uses as much space as an L1 cache and up to 15% of a chip’s energy”

                                “Modern, performance-critical software is considered non-functional when swapping, so it is avoided at all cost.” We learn about swapping as one of the primary reasons for using virtual memory in college, but then in the real world it’s essentially not used at all.

                                1. 6

                                  I don’t think swap being unused is necessarily true - it depends on your use case. Swap is fantastic when, for example, you have a bunch of daemons running in the background, and you want them to be running, but you rarely need them. In that case those programs can be paged out to swap and that memory can be used for something better, like the disk cache. The boost you get from the disk cache far outstrips the hit you take by swapping the daemon back in, because you don’t have to perform the latter operation very often.

                                  I think really the issue is predictability. On my laptop I have a big swap partition to enable the above effect, but in production I don’t because it makes the system easier to understand. IIRC I even go as far as to disable overcommit in production because, again, having it on makes the system less predictable and therefore less reliable. If something gets OOM killed on my laptop it’s annoying; if something gets OOM killed in prod, something just went down.

                                  This fundamental trade-off between predictability/complexity and optimization comes up in other places too. For example: free space under ext4 is trivial to understand. But under ZFS or btrfs? Incredibly complicated, especially when it comes to deletion (i.e. “if I delete this, how much space will I actually get back”). You can delete a snapshot with 1TB of data in it and end up freeing <10 MB because the snapshot was taken only a minute ago. How much space a 5 MB file will take depends on how well its contents compress. Under btrfs this is even affected by where in the filesystem you write the file, because different parts of the filesystem can have different levels of redundancy. And blocks might be deduplicated, too. There is a separation and disconnection between the logical filesystem that userspace sees and the physical disk media that the filesystem driver sees that simply didn’t exist in e.g. ext4. And this can potentially cause big problems because tools like df(1) examine the logical filesystem and expect that that’s equivalent to the physical filesystem.

                                  1. 2

                                    This is awesome. Thanks for writing all this!

                                    1. 2

                                      Sure thing :-) I’m glad you enjoyed it. I also found your original article fascinating. Sometimes I wonder about posting long things like this because a) I wonder if I’m getting way too detailed/if I just like talking too much and b) I have a lot of experience as a technologist, but almost exclusively as a hobbyist (as opposed to in industry) - “production” for me is mostly a single server running in my house that I administer by hand. I always wonder if I have some glaring blind spot that’s going to make me say something silly. So it’s super nice to see that at least some other people thought it made sense :D

                                      As a side note to just tack onto the end of my original comment: all of the ZFS/btrfs examples I listed above are about things that you actually can calculate/understand if you know where to look, but I just thought of an example where (AFAIK) that’s not the case: under ZFS at least, how do you answer the question, “if I delete both these snapshots, how much space will I free?” If both snapshots are identical, but you deleted a 1GB file since they were both taken, zfs(8) will report that deleting either snapshot will free 0 bytes. And yet, deleting both will free 1GB. This is fundamentally harder to present in UI because instead of “if you perform x, y will happen”, it is “if you perform x, y will happen; if you perform a, b will happen, but if you perform x and a, then y, b, and some third effect will happen”. Unix CLIs really like outputting tabular data, where rows are things (disks, datasets, etc.) and columns are properties about that thing. But given that kind of tabular format, it is virtually impossible to usefully express the kind of combinatorial effect I’m describing here, especially because given x operations that could be performed (e.g. destroying a particular snapshot), listing the combinatorial effects would require outputting ℙ(x) rows.

                                      Again, this is all AFAIK. (If someone knows how to answer this question, please correct me because I actually have this problem myself and would like to know which snapshots to delete! Maybe it can be done with channel programs? I don’t know much about them.)

                                      1. 2

                                        It’s a fascinating UI problem you bring up wrt combinatorial options. Perhaps an explorable UI with a tree of options would make sense. Maybe there needs to be a tool just for the purpose of freeing up space that calculates the best options for you.

                                        1. 2

                                          A tree UI is actually a phenomenal idea. Whenever I tried to think of a solution to this problem before, the best I could come up with was usually a program that would let you simulate different filesystem operations (perhaps it would actually run the real operations in-kernel, but in a “fake” ZFS txg that was marked to never be committed to disk?) and then interrogate the results. You’d be guessing and checking at the different combinations, but at least you’d actually get a solid answer.

                                          The problem with having a tool to calculate the “best” way is that what’s “best” is incredibly subjective. Maybe I have a few snapshots taking up a large amount of space, but those backups are incredibly valuable to me so I’d rather free up smaller amounts of space by destroying many more smaller but less important snapshots. I really do think a tree UI would work well though, especially if it had a good filtering system - that would help alleviate the power set explosion problem.

                                    2. 2

                                      You’re absolutely correct that swapping has a role to play. I just don’t think it’s used much in high-performance, low-latency systems - like web servers. Also: the disk cache is my hero <3.

                                      The tradeoff between optimization and predictability is a good point. Another example I see in my work is how much more complexity caching adds to the system. Now, you have to deal with staleness and another point of failure. There’s even a more subtle issue with positive and negative caching. If someone sends too many uncached requests, you can overload your database, no matter your caching setup.

                                      1. 2

                                        Yeah, this is a great point. I think the predictability problem in both our examples is closely related to capacity planning. Ideally you’re capacity planning for the worst case scenario - loads of uncacheable requests, or lots of undeduplicatable/incompressible/etc. data to store. But if you can handle that worst case scenario, why bother optimizing at all?

                                        I think all this optimization is not really buying you additional performance or additional storage space, which is how we often think about it. It’s buying you the ability to gamble on not hitting that worst case scenario. The reward for gambling is “free” perf/storage wins… but you’re still gambling. So it’s not actually free because you’re paying for it in risk.

                                        Side note: we love the disk cache! <3

                                    3. 5

                                      Thank you!

                                      does anyone know if people actually use virtual memory to implement sparse arrays?

                                      I just searched “virtual memory sparse array” and found this real life example of this being useful with numpy: https://stackoverflow.com/a/51763775/1790085
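
                                      A minimal sketch of the trick (assuming a Linux box where the big zeroed allocation is backed by lazily-faulted pages, so untouched entries never consume physical memory):

                                      ```python
                                      import numpy as np

                                      # Reserves ~8 GB of virtual address space; physical pages are only
                                      # committed when individual entries are actually written.
                                      sparse = np.zeros(10**9, dtype=np.float64)

                                      sparse[0] = 1.0
                                      sparse[123_456_789] = 2.0
                                      sparse[999_999_999] = 3.0

                                      # Only a handful of 4 KiB pages are resident; the rest stay untouched.
                                      print(sparse[123_456_789])  # 2.0
                                      ```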

                                      That’s the premise of this article

                                      Sounds like a really interesting paper!

                                      We learn about swapping as one of the primary reasons for using virtual memory in college, but then in the real world it’s essentially not used at all.

                                      I’ll point out that the other, possibly more important feature of virtual memory is memory permissions. The efficiency benefits of physical memory sharing are also significant (the canonical example is sharing libc between every process.) Virtual memory is also important for implementing copy-on-write semantics; sometimes the kernel needs to write to userspace memory and relies on a page fault to tell if this is a CoW page.

                                      I’ll have to see if the paper talks about replacing these.

                                      1. 2

                                        They propose hardware support for memory permissions - extra metadata for each memory location, with the hardware doing checks before allowing access. They argue that this can still be a net win given how much chip space is given to virtual memory infrastructure.

                                        It should be possible to share physical memory without virtual memory, no? A process simply gets permission to read shared code segments, like libc code. Of course, it would need its own copy of libc global state. There might need to be an extra bit of indirection from libc code to the global state if the physical addresses of the state may vary across programs (they could be in the same place under virtual memory).

                                        Since they argue swapping could be implemented at the application layer, perhaps copy-on-write could be too?

                                      2. 3

                                        people actually use virtual memory to implement sparse arrays

                                        I think sometimes but not often. If you only use virtual memory for your sparse array, you don’t know which entries are set and which aren’t. Without knowing that, you can’t skip reading and multiplying the unset entries.

                                        Iirc Linux has a flag for turning off checks in mmap() that normally prevent you allocating silly numbers of pages in one go. The description of it says it’s there for some scientific programs that wanted unlimited overcommit.

                                        The memory overhead of potentially having an entire page allocated for a single entry, when the entries are spread out, may be unwelcome. On the other hand sometimes people work with sparse matrices that have large contiguous dense chunks embedded in a colossal sea of zeroes.

                                        e.g. scipy has several completely separate representations available for sparse matrices https://docs.scipy.org/doc/scipy/reference/sparse.html

                                        1. 2

                                          Dense chunks in an ocean of zeros would be a good case for using virtual memory for sparse arrays. I hadn’t thought of that.

                                          1. 3

                                            FWIW I don’t mean to imply that virtual memory is necessarily a great way to implement that, just that the “entire page for one number” thing doesn’t bite you so hard when your matrices look like that.

                                            I think you’re still likely to benefit from a representation where you write down where each dense block is + a dense array holding all the entries.
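
                                            Something like this toy version of that representation (made-up block offsets, and a linear scan just to keep it short):

                                            ```python
                                            import numpy as np

                                            # Start index of each dense block -> the block's entries.
                                            # Anything not covered by a block is implicitly zero.
                                            blocks = {
                                                1_000_000: np.arange(4, dtype=np.float64),
                                                750_000_000: np.ones(1024, dtype=np.float64),
                                            }

                                            def lookup(i: int) -> float:
                                                for start, block in blocks.items():
                                                    if start <= i < start + len(block):
                                                        return float(block[i - start])
                                                return 0.0

                                            print(lookup(1_000_002))  # 2.0
                                            print(lookup(42))         # 0.0, somewhere in the sea of zeroes
                                            ```

                                            A real implementation would keep the block offsets sorted and binary-search them, but the shape is the same.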

                                      1. 40

                                        I want a Wikipedia for time and space. I want people to collaborate on a globe, marking territories of states, migrations of peoples, important events, temperatures, crop yields, trade flows. I want to pick which layers I want to see at one time. This sort of thing would give us a clearer, more integrated view of history, and show how connected we all are. A volcano eruption in Mexico led to crops failing around the world. New World crops led to an explosion of people in southern China, a previously less inhabited place. A Viking woman met natives in present-day Canada and also visited Rome on pilgrimage.

                                        1. 4

                                          Excellent idea. I have an idea for something similar but less… featured. My idea is about time and space tagged news.

                                          1. 3

                                            I also think about one from time to time - in my view, kinda like Wikimapia for history; I usually wonder how country borders and armies could be represented.

                                            1. 3

                                              The Seshat global history databank is a bit similar to this (great) idea.

                                              “systematically collects what is currently known about the social and political organization of human societies and how civilizations have evolved over time”

                                            1. 5

                                                My technological dream: laws are written in a restricted, runnable language (let us say, Prolog) and validated for conflicts with other laws before they get ratified.

                                                And looking at the predicates in the predicate.pl file, it seems a (minimalist) vocabulary of validations (well, predicates…) was developed, which is a great effort per se, in my opinion. :)

                                              1. 4

                                                  Assuming laws will never be written in anything but words with their own legal formalities, I think it’s more plausible that there will be NLP “burden detection” on entities that are extracted from existing law, then reviewed and tagged by humans. As I understand it, existing work in this area bumps up against the fact that there is a lot of manual work reviewing the output of a tagged legal document to determine the accuracy of what is binding upon which people in which situations.

                                                  However, once past a critical mass of tagged and reviewed documents, I can see how things like “validated” and “in conflict” could be done in a more automated fashion. American law ends up being what is passed, plus of course the text of the judicial rulings and precedent. If only everything from the full text of the Constitution to the first bill passed by Congress, all the way through to today’s explosion of daily judicial decisions, were available in a single place for that sort of NLP to work against.

                                                Neat links, thought about this too much. Of course the tools lawyers use probably already have approximations of what I’m describing but they’re not at all free. :)

                                                https://en.wikipedia.org/wiki/An_act_to_regulate_the_time_and_manner_of_administering_certain_oaths

                                                ^^ first act passed by congress

                                                https://www.thestrangeloop.com/2019/improving-law-interpretability-using-nlp.html

                                                ^^ really good presentation on this subject

                                                1. 3

                                                  Someone did codify French tax law to find discontinuities. We should do this sort of thing as a matter of course for mathematical laws like those for taxes.

                                                  https://blog.merigoux.fr/en/2019/12/20/taxes-formal-proofs.html
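
                                                    A toy illustration of the kind of discontinuity such a formalization can surface (a made-up two-bracket schedule, nothing to do with the actual French code):

                                                    ```python
                                                    def naive_tax(income: float) -> float:
                                                        # Applies a single rate to the whole income: crossing the threshold
                                                        # by one unit makes the total tax jump discontinuously.
                                                        rate = 0.10 if income <= 10_000 else 0.30
                                                        return income * rate

                                                    def marginal_tax(income: float) -> float:
                                                        # Taxes each bracket separately: continuous in income.
                                                        return 0.10 * min(income, 10_000) + 0.30 * max(income - 10_000, 0.0)

                                                    print(naive_tax(10_000), naive_tax(10_001))        # ~1000 vs ~3000
                                                    print(marginal_tax(10_000), marginal_tax(10_001))  # ~1000 vs ~1000.3
                                                    ```

                                                    The marginal version is what a well-behaved statute should encode; the naive one is the kind of jump a formal model makes easy to spot.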

                                                1. 52

                                                  I have a very serious suspicion that the industry’s move to flat designs – for everything, not just website UI elements, which the article analyzes – is, in part, economically-motivated. It’s not a quality judgement per se (although I certainly don’t like flat UIs, but people who grew up with them might think differently), but it’s an observation I’ve made over several years.

                                                  About 20 years ago, when I got my first gig in the computer industry, one of the people I worked with was an icon designer at a media agency. He did other things besides drawing icons, of course, but whenever the agency he worked with needed an icon for an application, or a set of buttons and icons for a website (this was the peak of the Flash era so this was very common), he was the man.

                                                  For particularly challenging contracts – e.g. OS X applications, which had those amazing, detailed icons – it could sometimes take two or three weeks for several icons to be drawn and the “winner” to be picked and retouched. The price for all that effort was the kind of money that, provided you outsource it someplace cheap, can get you a whole mobile app written today. And that was the sort of time frame (and budget!) involved in any major design project.

                                                  It was also very hard to string things together. The agency this guy worked with made a lot of money out of what they called “design kits” – ready-made buttons, icons and whatnot that were specifically designed so that they were easy to customize. That was because getting inexperienced people to make a good UI out of photorealistic buttons and 3D buttons usually got you mad bling and crashes, rather than an image of the cyberfuture.

                                                  But that was also at a time when you could charge $25 for a shareware application and it sounded like a decent price (and this was 2002 dollars, not 2020 dollars). Most applications today – sold for $1.99 on an AppStore or, more commonly, in exchange for personal data that may or may not turn into profit eventually – are developed on a budget that 2002-era design would gobble up in no time.

                                                  So of course most icons are basically a blob of colour with an initial on them and all buttons are flat rectangles with an anonymous, black-and-white symbolic icon on them: the kind of time frames and budgets involved don’t leave room for anything more advanced than that. This not only takes care of the money problem, it also makes it a lot easier to work with “design kits”, and also makes it less likely for brofounders to get mad bling instead of something that they can at least demo to potential investors when, instead of paying experienced designers at market rates, they hire college dropouts for peanuts.

                                                  I don’t want to claim that this is a universal explanation. I’m sure that, especially after Apple and Google made it fashionable, a lot of companies adopted flat design in good faith, and that a lot of experienced designers adopted it both in good faith and for their own subsistence (I’m still in touch with that guy, and while he also thinks most flat designs are crap, he will also gladly come up with one for his paying customers).

                                                  But I also doubt that flat design is all about good usability and user friendliness, especially since, as soon as you ask about usability studies (real ones, not “I asked twenty people on the hallway oh and I asked my dad and my mom and my sister and they’re really not computer nerds”) and numbers everyone starts ranting about how these things never paint a realistic picture of usage patterns and usability has a certain Zen that can’t be captured by raw numbers and experiments and we don’t, like, really understand how perception and the human brain work…

                                                  1. 13

                                                    This comment itself deserves to be an article or at least a story on lobste.rs.

                                                    1. 8

                                                      National Instruments produces the LabView visual programming language. Every operator and function pretty much has its own icon. See an example of a LabView program. They had full-time icon designer(s) to make those. It’s a hard job, especially for niche, technical, or abstract concepts. How would you create an icon for a parallel while loop?

                                                      Users could create custom icons for their “functions”. One interesting consequence is that library writers could create visually distinctive, branded icons for their functions.

                                                      1. 3

                                                        It’s cool that you mention LabView because one of the reasons why their icons are so great is that, if you’ve done non-virtual instrumentation before, most of them make sense. They aren’t some abstract symbol thing – for example, variable inputs have tiny beveled buttons and display on, what, 48x48px icons? They kinda look like the real-life devices or blocks that they represent. And those that are just black and white symbols tend to follow decades-old standards for electrical symbols, logical diagrams etc., which everyone can look up, or knows by now. (Not because they’re intuitive, btw, or because they follow some unconscious interpretative impulses that have been ingrained in our brains since time immemorial but because they drill them into you through four years of engineering school :) ).

                                                      2. 7

                                                        The sensation of downward pressure being placed on the economics in the context of creating software has many parallels:

                                                        • why hire experienced polyglots? Just use JS and hire 2x as many people
                                                        • why make a native app? Just use Electron and eat 3x the resources
                                                        • why create an architecture? Just force everyone to use redux and hope for the best
                                                        • why design a UI? Just throw bootstrap on it and tweak the colors
                                                        • why write integration tests when you could be shipping?

                                                        The goal is always the same: commoditization.

                                                        1. 5

                                                          I’m not so sure about this. A lot of the “flat UI trend” was started by Microsoft and Google – others followed. And you can make “UI packs” which are not flat icons with just as much ease.

                                                          My own theory is that it’s essentially a confusion between graphic design and user interface design. People who do either are often just called “designers”, but they’re quite different disciplines: one is about making things look good, the other about making them work well.

                                                          I’ve worked with quite a few designers over the years who made great looking designs which … were not all that great to use, and not infrequently got very defensive about small changes to make it easier to use “because muh art”.

                                                          This is not intended as a condemnation of graphical designers, but graphical design is an art skill whereas UI design is an engineering skill. Ideally, both should work together to create a great-looking UI that works well.

                                                          Why are companies like Google making unusable flat-UI designs with light grey, hairline-thin text on a white background? I don’t know … In principle they should have enough money to hire UI designers and should be knowledgeable enough to know the difference between graphic and UI design. I guess fashion is one hell of a drug?

                                                          1. 1

                                                            And you can make “UI packs” which are not flat icons with just as much ease.

                                                            It’s even easier – but they’re also pretty useless. What would you customize about it? A while back there was an entire debate about the right way to represent buttons: “clean” vs. “outlined” bevel (i.e. just the bevel, as Windows 95 did it, or a black outline around the bevel, a la Amiga’s MUI?), curved vs. non-curved lit edges, harsh vs. soft gradients and so on, and each one fit one kind of application better than another, at least in aesthetic terms. I don’t know if those discussions made sense or not, who knows, maybe it was just in order to make it look like the money you paid was worth it :). But what would a “design pack” consist of, when all you get to customize about a button are the color, padding, and font?

                                                            You can sort of see that in tools like oomox (https://github.com/themix-project/oomox). It’s not exactly a design kit, and it answers a completely different problem (GTK being obtuse), but it’s close enough.

                                                          2. 1

                                                            So of course most icons are basically a blob of colour with an initial on them and all buttons are flat rectangles with an anonymous, black-and-white symbolic icon on them: the kind of time frames and budgets involved don’t leave room for anything more advanced than that.

                                                            not even text? we are stuck at pictograms now?!

                                                            1. 1

                                                              Yes, orz

                                                              1. 1

                                                                You’re kidding, but six or seven years ago, when I last worked on a consumer device, the design team insisted that there should be as little text as possible. First, because text looks intimidating and is not user-friendly. Second, because graphical representations would let us establish a stronger brand presence – our own graphical design language, as opposed to generic text, would make our identity more recognizable. The gizmo’s PM had no strong opinions on the first point, but loved the second one; plus, as little text as possible meant as little internationalization as possible – so, yep, it was pictograms everywhere.

                                                                1. 2

                                                                  Pictograms don’t always make localisation easy. The classic example of this is the choice a bunch of companies made in the late ‘80s to use an owl as the help symbol. Owls are traditionally associated with wisdom in European culture, but in China they’re associated with bravery and in Latin America with black magic and evil. Localisation translated the text, but they got a lot of support calls from people who didn’t realise that the brave / evil animal was the one you clicked on for help. Translating words is generally much easier than translating cultural symbols. A lot of pictures have a variety of different cultural meanings, and picking a replacement that matches the particular association that you meant can be a tricky problem.

                                                                  1. 3

                                                                    I think I brought up the exact same example with the owl, and all I got was “well we’re not gonna use an owl then, we’re going to make sure we use only symbols with non-controversial meanings” :).

                                                                    That department, despite masquerading as an engineering group, was effectively the cargo cult arm of the Steve Jobs Fan Club, and virtually all decisions were made by non- or semi-technical people based on whether or not they thought Apple would have done it the same way. I could write… well, maybe not a book, because my patience (which I’d previously thought to be virtually never-ending) lasted less than a year, but at least two or three chapters, about all the hilarious disasters that this process resulted in. Thankfully, most of the products I worked on during that time didn’t even make it past the prototyping phase, so the world only got to laugh at one of them, I think.

                                                                    (Edit: FWIW, it’s very likely that I did a lousy job at advocating for these things, too. By the time that happened I had long lost any kind of patience or trust in the people making these decisions, or their higher-ups, plus at the end of the day it really wasn’t my job, either. I heard it was pretty hard to argue with Steve Jobs when he was alive, too; arguing with a handful of people possessed by his spirit was an uphill battle that I most certainly lost.)

                                                            1. 27

                                                              I don’t get why people think Microsoft should discuss this on GitHub? Nobody who has a GitHub account at Microsoft is qualified to speak on a trademark issue. The reality is that their legal department seems to have dropped the ball and the only thing the devs can really say to this is “It’s been through legal” even if they were wrong. It’s not like they can ignore a trademark, so we will probably see a rebrand pretty soon.

                                                              1. 10

                                                                  It’s also not unheard of for Microsoft’s legal department to miss a German trademark. Modern UI was called Metro UI at some point, too (https://www.techpluto.com/metro-ui-renamed/)

                                                                1. 2

                                                                    SkyDrive had to be renamed OneDrive because of a trademark dispute with Sky Group in the UK. Google had to rename Team Drives to Shared Drives for a similar reason. Big company lawyers often get trademarks wrong. If you’re using such common words, they’re likely to be in use by other companies, too.

                                                                2. 8

                                                                  I would have said “I didn’t know that, I’ll double check about this with legal” instead of “perhaps the Linux project should change its name […]”

                                                                  No reason to be an ass about it.

                                                                  1. 11

                                                                      The person who was an ass about it didn’t work for Microsoft, and was banned from commenting on Microsoft repos for 7 days in line with the repo’s code of conduct. In addition, a Microsoft employee can’t admit to knowing or not knowing about the trademark on behalf of the company. If they did, legal would have their ass and they’d probably get the company in trouble. Remember, anything you say can and will be used against you.

                                                                    1. 4

                                                                      If I can say “I didn’t know about that” and that can be construed to mean “Microsoft didn’t know about that,” then that’s bananas. Not every person at a company knows every single thing. If I were speaking on behalf of Microsoft, all I would be saying is “at least one person at Microsoft does not know about the existing Maui project.” And I’m not claiming I will do anything about it, just that I personally will attempt to learn about it by asking someone else at the company.

                                                                      Either way, in this position I would probably ask someone with more context how to respond before firing off something on GitHub. That could be exactly what happened here. So I see your point.

                                                                      1. 5

                                                                        If I can say “I didn’t know about that” and that can be construed to mean “Microsoft didn’t know about that,” then that’s bananas.

                                                                        Yet you feel the need to express in your profile that opinions are your own and not Google’s?

                                                                        1. 2

                                                                          Yet you feel the need to express in your profile that opinions are your own and not Google’s?

                                                                          Signatures to that effect go back to 1980s Usenet at the very least.

                                                                          It’s likely a cargo cult but it’s one with a long history.

                                                                          1. 2

                                                                            On Lobste.rs I’m not constantly on guard about what I say or how I say it. I’m sure I’ve written comments that could be interpreted as an opinion held by Google, or Google employees in general. I really could not tell you if I’ve written “we…” anything or not. I don’t think about it and I don’t care to. But I sure wouldn’t say “we didn’t know about this trademark” when speaking as a Google employee on a Google repo that Google pays me to work on.

                                                                          2. 1

                                                                              It’s not so much about whether they knew about it or not. It’s more about acknowledging that there is a trademark issue. If he had acknowledged the other trademark, then they would be admitting to negligence from the get-go, severely limiting their options in court or settlement negotiations. Their best course of action, legally speaking, is to not acknowledge the other trademark’s existence. It looks pretty dumb from a PR point of view, but legal doesn’t care about our opinions of the company as long as it saves them a couple million from a lawsuit.

                                                                      2. 3

                                                                            Why do you think Microsoft’s projects should be any different from any other project on GitHub? Most projects on GitHub use the platform for all kinds of changes. The fact that their legal and PR departments don’t seem to be aware of GitHub doesn’t mean that the developer who filed the issue is doing it wrong; it means that MS has to catch up with the way software projects work today.

                                                                      1. 5

                                                                            We have SEO because people make money when people visit websites, and that traffic comes mostly through search engines. Until web commerce is dead or people stop using search engines, SEO will not dry up. Of course it’s wet where there is water!

                                                                        1. 2

                                                                              There are also plenty of “SEO” examples from before the internet – for example, “Acme” being a popular company name because it put you higher up in the telephone book’s alphabetical listing.

                                                                          1. 2

                                                                            AAA

                                                                            1. 1

                                                                              This is why Bezos liked the name “Amazon”

                                                                          1. 53

                                                                              UUID keys shred index cache friendliness and price/performance, so take the author’s claims of “high scale” with a nice chunk of salt.

                                                                            B-tree-based databases that have put any effort into optimizing their index layout will employ prefix encoding and suffix truncation to reduce the space consumed in the index layer significantly, even with wasteful high-entropy approaches like this, but leaf nodes are only able to employ prefix encoding, which is pretty low impact on high-entropy workloads like this. And leaf nodes will make up over 99% of all nodes in the tree in most cases.

                                                                              The main property the author is getting at (unique, low-coordination identifiers) can be achieved in a way that plays much better with decent b-tree indexes: assign a very short prefix to each local node and have it generate monotonic IDs locally. This plays nicely with prefix encoding and avoids the redundant bloat that drives up cost and latency. Another approach is to just batch ID allocations to servers by having them claim an ID range periodically and then allocate from that. 16 bytes is a huge amount of data to waste on uniqueness. For context, with 8 bytes you can dish out nearly 6 billion unique IDs per second for 100 years. Since this article is talking about “high scale”, the UUID approach is totally inappropriate for indexing dozens of terabytes and up from a storage efficiency perspective. High scale is when TCO matters.
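
                                                                              To make the prefix-plus-local-counter idea concrete, here’s a rough Python sketch (my own illustration, not something from the article; the class name and the 16/48-bit split are arbitrary choices): a short node prefix packed together with a locally monotonic counter into a single 8-byte key.

                                                                                import itertools

                                                                                class NodeLocalIds:
                                                                                    """Sketch only: a 16-bit node prefix plus a 48-bit local counter packed
                                                                                    into one 8-byte integer. Keys minted by the same node share a prefix and
                                                                                    arrive in increasing order, which is what b-tree prefix encoding likes."""
                                                                                    def __init__(self, node_id: int):
                                                                                        assert 0 <= node_id < (1 << 16)
                                                                                        self.node_id = node_id
                                                                                        self.counter = itertools.count()

                                                                                    def next_id(self) -> int:
                                                                                        n = next(self.counter)
                                                                                        assert n < (1 << 48)   # ~281 trillion ids per node before exhaustion
                                                                                        return (self.node_id << 48) | n

                                                                              Two calls on the same node give adjacent keys such as 0x0007000000000000 and 0x0007000000000001, so nearby keys compress well and land on the same index pages.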

                                                                              You trade away a lot of tools that can increase the robustness of your applications by going down this path at all, though. The author touts the frontend not having to wait on the database server as a nice property, and sometimes it is, but just know that you are getting into the territory of what a database does a lot better than you or your engineers probably can: guaranteeing the consistency of contents through transactions, foreign-key constraints, etc… Presumably your business uses that data for something useful, so it can be advantageous to have it exist in a non-conflicted state from time to time.

                                                                              If you pay attention to things like index size, it’s not unreasonable to expect a round trip to a relational database in the same DC to complete in under 500-700 microseconds. That isn’t very much compared to how long your users are already waiting to establish a TLS connection with your web app and for it to slap everything together and ship it across the internet. But users definitely notice when their shit disappears, and this general approach to data modeling is a very sharp knife that begs for data loss unless the people writing the logic actually understand how databases achieve the desired correctness properties – and it’s pretty rare that they do.

                                                                            1. 1

                                                                                I don’t buy his speed argument either. You’d still have to wait for the server to do data validation and permission checks (jeez, I certainly hope they’re not relying on client-side data validation alone!), including cross-record consistency checks which can’t be done on the client. So the server may still respond with either “permission denied” or validation errors.

                                                                              1. 1

                                                                                Let me summarize what I think you said so I can respond to it.

                                                                                  One concern is that 16-byte UUIDs are too big and would consume extra disk space (and therefore in-memory cache for those pages). Since UUIDs’ prefixes are random, prefix compression won’t be very helpful. With sequential IDs, you get better latency because you can fit more data in the in-memory index, and better disk utilization because each row takes less space.

                                                                                  You’re also saying that while UUIDs save you from having to ask for a new ID, that’s only a small part of ensuring data consistency (especially between entities in your data model). You still have to do more complex validation when manipulating the database anyway. The small latency saving of not asking another service to provide a UUID is not worth ignoring these other consistency checks. (UUIDs may lull you into thinking you don’t have to deal with these problems.)

                                                                                I think the points above are valid and make sense.

                                                                                  UUIDs have other benefits though. Let’s say you have a streaming ingestion system. You need to stamp every incoming message with an ID. Relying on a centralized service to give you an ID per message is too slow; even requesting extents of message IDs requires the presence of the ID service at startup. In this way, UUIDs can increase decoupling between systems and improve availability. You may still have dependencies on other services for quota/permissions checking, but you don’t have to rely on the ID service.

                                                                                  The ULID spec mentioned elsewhere seems to give you some of the best of both worlds. Its prefix is based on time, so it has good prefix compression (still not as good as auto-increment IDs). It can be generated anywhere, so it can improve availability.
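
                                                                                  For illustration only, a toy version of that layout in Python (this is just the general shape – a 48-bit millisecond timestamp followed by 80 random bits – not the actual ULID spec, which encodes the 128 bits as 26 Crockford base32 characters):

                                                                                    import os
                                                                                    import time

                                                                                    def ulid_like() -> bytes:
                                                                                        """Toy ULID-style id: 6-byte millisecond timestamp prefix + 10 random bytes."""
                                                                                        ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)   # 48 bits of time, enough until roughly the year 10889
                                                                                        return ts_ms.to_bytes(6, "big") + os.urandom(10)    # 16 bytes total, sorts roughly by creation time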

                                                                                So we sorta have a tradeoff between storage and availability here.

                                                                                1. 2

                                                                                    ID generation can be batched to have amortized near-zero cost, and then you don’t need to burn all those extra bytes for ULIDs. Batch size can be chosen to allow servers to prefetch ID batches long before they need them, giving a nice ops runway for addressing any issues with the generator before services drain what they’ve already claimed.

                                                                                  Thoughtful architecture saves money and reduces operator stress.
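
                                                                                    A rough sketch of what that batching could look like (the claim_range callback, the batch size, and the 20% low-water mark are all made-up placeholders): claim a contiguous range from the central sequence, hand ids out locally, and fetch the next range while there’s still plenty of runway left.

                                                                                      import threading

                                                                                      class BatchedIds:
                                                                                          """Sketch only: hand out ids from a locally claimed range, and grab the
                                                                                          next range early (inline here for brevity; a real version might do it in
                                                                                          the background) so a sick id service is noticed long before ids run out."""
                                                                                          def __init__(self, claim_range, batch_size=100_000):
                                                                                              self._claim = claim_range                  # callable(n) -> first id of a fresh n-id range
                                                                                              self._batch = batch_size
                                                                                              self._low_water = max(1, batch_size // 5)  # prefetch with ~20% of the range left
                                                                                              self._lock = threading.Lock()
                                                                                              self._next_id, self._end = self._refill()
                                                                                              self._prefetched = None

                                                                                          def _refill(self):
                                                                                              start = self._claim(self._batch)           # the only round trip to the id service
                                                                                              return start, start + self._batch

                                                                                          def next_id(self) -> int:
                                                                                              with self._lock:
                                                                                                  if self._prefetched is None and self._end - self._next_id <= self._low_water:
                                                                                                      self._prefetched = self._refill()
                                                                                                  if self._next_id == self._end:         # current range exhausted: swap in the prefetched one
                                                                                                      self._next_id, self._end = self._prefetched
                                                                                                      self._prefetched = None
                                                                                                  out = self._next_id
                                                                                                  self._next_id += 1
                                                                                                  return out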

                                                                                  1. 1

                                                                                      I agree with what you said. Still, even prefetching batches introduces a dependency on the ID service. It’s possible to avoid this with UUIDs. YMMV with how much this actually helps if you have other reasons to depend on the ID service (perhaps it’s a database that stores configs you need).

                                                                              1. 5

                                                                                  Universally unique identifier. Absolutely unique once you generate it ANYWHERE. It is possible because of using a large (128-bit) random number.

                                                                                Randomness doesn’t imply uniqueness.

                                                                                1. 7

                                                                                  Not only that, but most v4 UUIDs use six bits to say, essentially, “I’m a v4 UUID,” meaning that only 122 of the bits consist of random data. That’s why most UUIDs you see have a third “group” that starts with a 4 and a fourth group that starts with 8, 9, A, or B.

                                                                                  (Normally it would be excessively pedantic to point this out, but if you’re going to go to the trouble of writing an entire article about this, these missing bits of knowledge count against your credibility a little.)
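
                                                                                    You can see those fixed bits for yourself with Python’s standard-library uuid module:

                                                                                      import uuid

                                                                                      u = uuid.uuid4()
                                                                                      s = str(u)              # e.g. 'f47ac10b-58cc-4372-a567-0e02b2c3d479'
                                                                                      assert u.version == 4
                                                                                      assert s[14] == "4"     # version nibble: the third group always starts with 4
                                                                                      assert s[19] in "89ab"  # variant bits: the fourth group starts with 8, 9, a or b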

                                                                                  1. 2

                                                                                    And the author is a CTO. This is frustrating.

                                                                                    1. 1

                                                                                        Randomness doesn’t guarantee uniqueness, but it is good enough for real-world applications. (Assuming each client has access to a good RNG library and a good implementation of UUIDs. You can’t get the MAC address in JS, for example, if you want to use the version of UUIDs that incorporates MAC addresses.)
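
                                                                                        A quick birthday-bound estimate (my own back-of-the-envelope numbers, not anything from the article) of why 122 random bits is “good enough”:

                                                                                          # Probability of any collision among n random 122-bit v4 UUIDs is roughly n**2 / 2**123.
                                                                                          n = 10**12                # a trillion ids
                                                                                          print(n * n / 2**123)     # ~9.4e-14 -- vanishingly small, assuming a decent RNG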

                                                                                    1. 3

                                                                                      I don’t think this author is benchmarking correctly. It’s not possible to use an on-disk B-tree to look up things in 49 microseconds. A single disk read is like 10 ms. A single read on a fast SSD takes like 200 microseconds.

                                                                                      1. 2

                                                                                          It’s amazing that you only have to go up to 10^80 to hit the number of atoms in the universe. So if you use one atom to signify one digit of a googolplex, you’ll run out of atoms by the time you’ve written out one 100-billion-billionth of its digits.

                                                                                          We outrun the real world far sooner than we exhaust our representational technology.
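
                                                                                          Checking that arithmetic (my numbers, using the usual ~10^80 estimate for atoms in the observable universe):

                                                                                            digits_in_a_googolplex = 10**100    # a googolplex is 10**(10**100), i.e. a 1 followed by 10**100 zeros
                                                                                            atoms_in_the_universe = 10**80
                                                                                            print(atoms_in_the_universe / digits_in_a_googolplex)   # 1e-20: one part in 100 billion billion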

                                                                                        1. 59

                                                                                            This is a great example of how programmers dress up arbitrary opinions to make them look like they were rationally derived. The cognitive dissonance that led to writing that C is not supported but the near-complete superset C++ is, or that Rust should not be supported because end developers do not use Rust, must have been nearly overwhelming.

                                                                                          1. 18

                                                                                            The cognitive dissonance that led to writing that C is not supported

                                                                                            Huh? The doc clearly says that C is approved both for end development and for use in the source tree. It’s not “recommended”, because apparently they’d rather use C++, but that’s not the same as not supporting it.

                                                                                            1. 10

                                                                                              None of our current end-developers use Rust.

                                                                                              … says more about Google than anything else.

                                                                                              1. 21

                                                                                                I read it as, “the biggest problem with Rust is that we didn’t make it, so how good could it possibly be?”

                                                                                                1. 3

                                                                                                  We interviewed a guy the other day who told me that he knows “insider sources” in Google and that they are abandoning Go for all new project starts and moving entirely to Rust.

                                                                                                  EDIT: to be clear, I didn’t believe this guy. I was relating the story because it would be funny that they don’t support Rust for Fuchsia but supposedly are abandoning Go in favor of it…

                                                                                                  1. 8

                                                                                                    This is not true.

                                                                                                    1. 3

                                                                                                      I know. We didn’t hire the guy either.

                                                                                                      1. 1

                                                                                                        o, sorry :(

                                                                                                        1. 1

                                                                                                          What do you hypothesize his motivation was for such a weird fabrication? Was he proselytizing Rust in an interview? Were you hiring for Rust devs and this just… came out in the interview as a point of conversational commentary?

                                                                                                          Just curious. Seems a weird thing for a candidate to raise during an interview!

                                                                                                          1. 3

                                                                                                            I think he was trying to sound smart and “in the know.” He brought it up when we mentioned that some of our code is in Go, and mentioned that some of his personal projects were in Rust.

                                                                                                            Like I said above, we ended up not extending him an offer, for a lot of reasons.

                                                                                                            1. 1

                                                                                                              Also very easy to check?

                                                                                                        2. 0

                                                                                                          Hopefully not. I don’t want to deal with the influx of twenty-somethings fresh from university who believe starting at Google instantly added 20 points to their (already enlarged) IQ.

                                                                                                      2. 5

                                                                                                        It makes sense to support C++ and not C. C++ makes it easier to write safer code (RAII) and provides a higher abstraction level (templates).

                                                                                                        1. 4

                                                                                                          Those are useful features, but that’s the subset of C++ that isn’t “C functions, compiled with the C++ compiler, and avoiding undecorated casts”. If I’m allowed to use C++ then I’m allowed to use C.

                                                                                                          My original point remains: “it makes sense” is not “here are our axioms, and here are the conclusions derived from those axioms”. Programming language choice is usually - and rightly - a popularity contest based on what the team finds interesting, and I see no difference here when I look beyond the dressing.

                                                                                                          I compare this to an experience I had at FB, where the systems developers chose to question their language choice (mostly C++, some Java). Arguments were put forward in favour of and against Rust, D, Go, and maybe some other things that I’ve forgotten. Then the popularity contest was run in the form of a vote, and C++ won, so they decided to double down on C++. Not because [list of bullet points], but because it won the popular vote.

                                                                                                          1. 5

                                                                                                            Idiomatic C++ is qualitatively different from C, though the language is technically a superset. That in itself is enough to have a group disallow C but allow C++.

                                                                                                            You are right that it’s an imprecise process. But, popularity is a feature in itself. A popular language has more developers, more help information online, and more libraries. Why wouldn’t popularity be an important feature?

                                                                                                      1. 1

                                                                                                              Larry advocated modeless computing and succeeded in popularizing that paradigm. I wonder what the world would have looked like if modes had won out. Modes let you switch contexts, which I find fairly natural, but add some mental state you have to track. Modeless computing requires more meta keys. This has become a nuisance on Macs, with their 4-5 meta keys.

                                                                                                        Anyone have any ideas on what alternate futures modal computing would have led to?

                                                                                                        1. 4

                                                                                                          Anyone have any ideas on what alternate futures modal computing would have led to?

                                                                                                          The future of programming is vi switching in and out of stomp mode on a human face, forever.

                                                                                                        1. 26

                                                                                                                  One flipside of this is that, as a consultant, I’m incentivized to not talk publicly about my rates. If they’re publicly available, then companies can use that in bargaining against me. I try to charge as much as the company is willing to pay. Let’s say that’s 10 dollars a day.¹ If they know that I normally charge clients 8 dollars, they know I’m probably willing to settle for 8 dollars, and they can refuse unless I lower to the standard rate. They’re confident I’d rather lower my rate than lose the business entirely, saving them money.

                                                                                                          The downside to this is that consultants don’t know how much other consultants charge. There was a Twitter thread a while back where a high-profile speaker raised her keynote fee to 5k plus T&E, and several people chimed in that she was vastly undercharging.

                                                                                                          ¹ (no I’m not charging 10 dollars a day, it’s at least 15.)

                                                                                                          1. 6

                                                                                                                    If you think that your salary is too low, there’s also an incentive not to let your prospective next employers know. I’ve made that mistake once, and it won’t help you get good offers.

                                                                                                            1. 1

                                                                                                              Yeah. It’d be nice to disown my comment about my salary just so this couldn’t happen to me. Is that an option, @pushcx?

                                                                                                              1. 1

                                                                                                                The disown link appears next to my comments if I go to “Your Threads.”

                                                                                                                1. 1

                                                                                                                  Interesting, I don’t see it. I’ll check the codebase for what causes it to appear.

                                                                                                                  1. 1

                                                                                                                            Looks like it is time-bound? That would make sense, as I also can’t disown some comments from two days ago anymore. But I certainly had this button for some time after creating them. So it should take at least 1 day to trigger?!

                                                                                                                            Edit: It doesn’t make any sense – I can still disown comments from 29 days ago, but not from 2?

                                                                                                                    1. 2

                                                                                                                              It’s deletable for a period (currently 14 days), then disownable. Disowning is seen as a way to let people walk away from old conversations without turning those discussions into an unreadable mess, not as a way to post anonymously.

                                                                                                            2. 6

                                                                                                                      yeah, definitely - i think i didn’t write this as clearly as i could have - the point i was trying to make is that how and with whom you talk about your salary matters, and talking about it 1:1 with a co-worker is probably much more useful than just dumping how much you make on twitter. this is a good example of why that can be.

                                                                                                              1. 3

                                                                                                                        Twitter has such a variety of content and outrage that it’s often not a great place for many things, but for fora (such as this) that focus on actionable stuff for practitioners, I think these datapoints can still be illuminating.

                                                                                                                        Unionization in tech is a pipe dream, and I think a flawed idea in many ways, but the first step to any sort of progress is people sharing their salary information. Limiting that information to your coworkers is, I think, a good way to breed resentment and upset a working environment, and it also kinda reinforces a sort of provincialism that is already systemic in the tech industry: a bunch of GOOG or MSFT folks in NYC or SF or Seattle kvetching about their perf bonuses and kitchenette cutbacks to each other may be missing the bigger picture of flyover country and other nations.

                                                                                                              2. 3

                                                                                                                        This is a really good point. I’ve noticed that recruiters are very aggressive in asking about your current compensation. They would love to be able to look up your Twitter handle and figure out how much you’re making, so they can work out the lower bound that will make you switch jobs. Instead of putting it out on Twitter, share information with your peers through private channels. Unlike consulting, regular jobs have enough information available through tools like levels.fyi to know what you’re worth, so posting about it on public channels is harmful to you.

                                                                                                                1. 3

                                                                                                                  Information asymmetry prevents efficient price discovery and leads to one party making more money from the other. :)

                                                                                                                  1. 1

                                                                                                                    as a consultant, I’m incentivized to not talk publicly about my rates. If they’re publicly available, then companies can use that in bargaining against me.

                                                                                                                    Then thank you for taking a risk with this comment!

                                                                                                                    no I’m not charging 10 dollars a day, it’s at least 15.

                                                                                                                    I see what you did there ;)