1. 7

    I think this misses one approach: write the build script in the main language of the project (Java in this case). This requires a bit of code to kick off the bootstrap spiral, which I think is a #! for Java 11+ and a tiny shell script to compile and then invoke before that.

    More generally, I wish language build systems had the following capabilities:

    • building the language’s notion of libraries/binaries/other artifacts with a set of rigid conventions
    • a CLI affordance for running an arbitrary custom program in the language as a part of the build
    • which is restricted to the end projects (such that dependencies can’t have custom builds)

    It should be easy to bootstrap arbitrary custom build almost out of nothing; when that is solved, you don’t need a second language to orchestrate the build.

    1. 1

      Is there some recommended post which explains the overall algorithm for getting from a math formula to canvas with pixels?

      1. 3

        SILE doesn’t mess about much with pixels, the default canvas output is (vector based) PDF. Here is a minimalist example. If you put the following in a text file called math.sil:

        \begin{document}
        \script[src=packages/math]
        \math{2 + 2 = 4}
        \end{document}
        

        Then assuming you have the default math font Libertinus Math on your system, the process of rendering it from the command line looks like this:

        $ sile math.sil
        SILE v0.12.0 (Lua 5.4)
        <math.sil>
        options.family = Libertinus Math
        [1]
        

        That output shows it successfully rendered 1 page which you will find in a file called math.pdf with your formula drawn on the page canvas (not in pixels, but you get the idea). In this tweet you can see a side-by-side of a more advanced math input and output.

        The project readme and manual not only have installation instructions but also two ways to run sile without installing it at all (via Docker or Nix). Packages are available for Homebrew, Ubuntu, Arch Linux, NixOS, and others.

        Note the input can also be XML, like this math.xml you can render with sile math.xml:

        <sile>
          <script src="packages/math"></script>
          <math>2 + 2 = 4</math>
        </sile>
        
        1. 1

          Ah, I realized that I am being confusing here, sorry! What I am interested in is the algorithm the tool uses internally to position math glyphs, not the algorithm to use the tool as a black box.

          I understand (in very general terms) how we go from a paragraph of utf8 encoded text to a picture of a paragraph (selecting glyphs for symbols/ligatures, arranging glyphs into a line, line-breaking and hyphenating it into a paragraph). What I don’t understand even on a conceptual level is how to do this with a math formula, where the input is a tree, and not a sequence, and the output is two dimensional, instead of being just a sequence of lines.

          1. 3

            In that case you probably already understand more than you think you do. Rendering math is not so very different from rendering text. But at the same time you probably underestimate how complicated it is to “shape” input text into a regular paragraph. Consider what happens when a word in the paragraph is drawn with a bigger font. The height of the entire line has to grow to accommodate it, right? What happens if the line is all n’s (“nnnnnn…”) vs. if it has letters with ascenders and descenders (“afdgL…”)? The line has to move anything above or below that word just a touch more to accommodate the shape of those letters. Down to the letter level, positions of letters change relative to each other. Kerning space between v and a might be different than m and a. Those minute adjustments may end up changing where a line wraps. Naive implementations (M$ Word, browsers) only make an attempt to adjust in one direction: content only ever gets pushed later and later, previous lines do not get adjusted to work better with later ones. More advanced typesetting (LaTeX, SILE, etc.) will analyze whole paragraphs or pages to find the best combination of breaks and space adjustments to fit everything together. Most languages are even more complex, for example many languages have parts that stack on top of each other to form characters (Javanese example).

            All of this means that the process of shaping text (even non-math text) as you know it already involves 2D space and lots of positioning rules based on letter, accent, syllable, word, line, and other group relationships. This process even for a normal single word (much less paragraph) is already much more complex than people suspect. See for example Text Rendering Hates You.

            Once you have all the tooling in place to do that properly, math isn’t a far reach. All the parts are already there. For SILE’s implementation it just builds on tooling that already exists. On a very general level, MathML is a widely used language for marking up pieces of a formula and describes how each piece relates to others around it. The symbols and glyphs themselves then have context (for example “inside a subscript” or “a function symbol 4 lines tall”) and the font (with OpenType math features) takes care of serving up the right glyph shapes for that context. SILE then takes those glyph shapes, uses the MathML descriptions for how bits of a formula relate, and translates it to sizes and positions for the boxes it already uses to position bits of text.
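A rough sketch of the "push content later" versus "analyze the whole paragraph" contrast from the comment above, in Python. It assumes monospaced text (one unit per character) and a made-up cost: the squared leftover space on every line except the last. Real engines such as TeX or SILE work with stretchable glue, penalties and hyphenation, so this is only an illustration and not either tool's algorithm; all function names are invented for the example.

```python
from functools import lru_cache


def greedy_breaks(words, width):
    """First-fit: keep adding words to the current line, never revisit earlier lines."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return [" ".join(line) for line in lines]


def optimal_breaks(words, width):
    """Whole-paragraph: pick the breaks with the lowest total squared slack."""
    n = len(words)

    def line_len(i, j):
        # Length of words[i:j] joined by single spaces.
        return sum(len(w) for w in words[i:j]) + (j - i - 1)

    @lru_cache(maxsize=None)
    def best(i):
        # Returns (cost, break indices) for the best layout of words[i:].
        if i == n:
            return 0, ()
        result = None
        for j in range(i + 1, n + 1):
            length = line_len(i, j)
            if length > width and j > i + 1:
                break  # more words would only make this line longer
            slack = max(width - length, 0)
            line_cost = 0 if j == n else slack ** 2  # the last line is free
            rest_cost, rest_breaks = best(j)
            total = line_cost + rest_cost
            if result is None or total < result[0]:
                result = (total, (j,) + rest_breaks)
        return result

    _, breaks = best(0)
    lines, start = [], 0
    for end in breaks:
        lines.append(" ".join(words[start:end]))
        start = end
    return lines


def raggedness(lines, width):
    """Sum of squared leftover space on every line except the last."""
    return sum((width - len(line)) ** 2 for line in lines[:-1])
```

With the words `aaa bb cc ddddd` and a width of 6, the greedy version gives `aaa bb / cc / ddddd` (raggedness 16), while the whole-paragraph version moves `bb` down to get `aaa / bb cc / ddddd` (raggedness 10): an earlier line changes to make a later one fit better.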

            1. 1

              Thank you for the thorough reply, it is really helpful!

              “a function symbol 4 lines tall”

              That was the key for me! So by the time we get to SILE, we already have such “structural” information. And that information presumably can be computed from formula AST, and is independent of particular font, size and various other presentation properties. For some reason, I thought that determining that “4 lines tall” needs to be done in SILE, but now I see that that can be done independently.

              1. 2

                Correct, by the time final placement of glyphs is done lots of questions have been answered outside of SILE. SILE reads the input, figures out the font size, paper properties etc., groups things based on their relation in MathML, then asks the font (via a font shaper, HarfBuzz) for relevant info. For example it might ask “How tall is a 4 line integral symbol in Libertinus Math at 16pt?”, and it would get an answer back, then use that height to know where to typeset the related box of glyphs in a subscript.
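To make the "tree in, positioned boxes out" idea from this exchange concrete, here is a small Python sketch. It is not SILE's code: the metrics table, the 0.7 script scale and the 3.5pt superscript shift are all invented stand-ins for the answers a real font and shaper would give, and the tuple-based tree is a hypothetical simplification of MathML. The point is only that each node asks for the sizes of its children and then places them relative to its own origin.

```python
from dataclasses import dataclass

# Hypothetical metrics in points: (width, height above baseline, depth below).
# In a real system these answers would come from the font via a shaper.
GLYPH_METRICS = {"x": (5.0, 4.5, 0.0), "2": (5.0, 6.5, 0.0), "+": (6.0, 5.0, 0.5)}
SCRIPT_SCALE = 0.7  # assumed shrink factor for superscripts
SUP_SHIFT = 3.5     # assumed upward shift of the superscript baseline


@dataclass
class Box:
    width: float
    height: float  # extent above the baseline
    depth: float   # extent below the baseline
    glyphs: list   # (char, x, y, scale) relative to the box origin


def glyph(char, scale=1.0):
    w, h, d = GLYPH_METRICS[char]
    return Box(w * scale, h * scale, d * scale, [(char, 0.0, 0.0, scale)])


def shift(box, dx, dy):
    return [(c, x + dx, y + dy, s) for c, x, y, s in box.glyphs]


def hbox(boxes):
    """Set boxes side by side on a shared baseline."""
    x, glyphs = 0.0, []
    for b in boxes:
        glyphs += shift(b, x, 0.0)
        x += b.width
    return Box(x, max(b.height for b in boxes), max(b.depth for b in boxes), glyphs)


def superscript(base, sup):
    """Place an already-scaled superscript box to the right of base, raised."""
    glyphs = base.glyphs + shift(sup, base.width, SUP_SHIFT)
    return Box(base.width + sup.width,
               max(base.height, SUP_SHIFT + sup.height),
               max(base.depth, sup.depth - SUP_SHIFT),
               glyphs)


def layout(node, scale=1.0):
    """Recursively turn a formula tree into one box of positioned glyphs."""
    kind = node[0]
    if kind == "sym":
        return glyph(node[1], scale)
    if kind == "row":
        return hbox([layout(child, scale) for child in node[1:]])
    if kind == "sup":
        return superscript(layout(node[1], scale),
                           layout(node[2], scale * SCRIPT_SCALE))
    raise ValueError(f"unknown node kind: {kind}")


# x^2 + 2 as a tree
formula = ("row", ("sup", ("sym", "x"), ("sym", "2")), ("sym", "+"), ("sym", "2"))
box = layout(formula)
```

The result is a flat list of glyphs with x/y offsets plus the overall width, height and depth of the formula, which is the same kind of box a paragraph builder already knows how to place on a line.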

      1. 43

        Here’s a fun anecdote (which is one of the many anecdotes that eventually educated my guess that UI intuitiveness is largely bullshit).

        In my corner of the world, we don’t have folders. I mean we don’t use them. Neither me, nor my computer-using peers from like 25+ years ago, when we were just learning about computers, had ever seen a real-life folder. The first time I actually saw one, for real, was at some point in 2008 or so, I think, when an American prof was visiting the research lab I was working at and he had a few with him.

        We do use things that are kind of like folders, but they look nothing like the icon. To make matters worse, one of the words we use to refer to those things that are kind of like folders is the same one used to translate the English word ‘file’. There is no direct equivalent for ‘file’ – we’d just call it a document instead, or, if it’s literally just one page, the literal translation of the word we have is ‘paper’.

        None of the mainstream operating systems were localized for our country until the mid ’00s or so. So a whole generation of computer users, including mine, grew up using computers that:

        • Used the term “folder” for something that we’d never seen and which we would’ve called a “file”
        • Used the term “file” for something you put into “folders”, which was basically the other way ’round for us (a file is something you put things in, not a thing you put in something else!)

        It took about 30 seconds to explain what each thing was and we all went our merry ways and used computers productively – which we still do, for that matter. The “local” equivalent for “file” now actually has a double meaning: it still denotes what Americans would call a folder (albeit a special kind of folder, more or less – specifically, one whose content is sorted in a particular way), but it’s understood that, if you’re referring to it in a computer context, it means something else.

        The fact that so many companies are spending so much time trying to make files and folders and their organization more intuitive is, IMHO, just another symptom of so many companies in our industry having run out of relevant things to do in exchange for the money you pay them, so they settle for changing the part that’s easiest to bikeshed and most visible when it’s modified – i.e. the interface.

        What this article describes as novel – people just dumping all their files in a single directory, typically the desktop, since it’s the one you can reach with the least clicks, or maybe “Documents” if you have enough things – basically describes how everyone who’s not a computer nerd has been using personal computers since hard drives became cheap enough to render floppies relevant only for file transfer. So… roughly 30 years or so? In these 30 years, I’ve seen maybe half a dozen neatly-organised collections of folders and files, all of them belonging to programmers like myself, or people in other technical fields (e.g. various flavours of engineering). At best, most people will maybe have some folders on their desktop called “Work”, “Personal” and “Shit”, which they never refer to as “folders” but as “Work”, “Personal”, and “Shit”. That’s the apex of document organisation.

        The one difference that even cheaper local storage, along with cloud storage has brought, is that concepts like “multiple drives” are now pretty much irrelevant for many computer users.

        Also, I like to point out things like these:

        “As much as I want them to be organized and try for them to be organized, it’s just a big hot mess,” Vogel says of her files. She adds, “My family always gives me a hard time when they see my computer screen, and it has like 50 thousand icons.”

        every time someone in UX wants to sell me on a “clean” and “polished” interface that can show like twelve icons at a time, at most, all of them huge and with acres of space between them. It’s incredible how a field that literally has “user” in its name is so incredibly disconnected from how users, uhm, use computers.

        1. 16

          a file is something you put things in, not a thing you put in something else!

          This is actually true in English too. Trying to teach people about “files” and “directories” (or, later, “folders”) in the ‘90s was really hard: no-one could understand why the document was called a file when a file is a thing you store documents in. I used to describe it as a “file” for bytes, or something like that, but I don’t really know why they called it a file.

          1. 19

            I think it may date to mainframe operating systems or COBOL where what a “file” contained was “records”.

          2. 7

            What I find interesting is the disconnect between real-world organisation and the attempts of translating this (and failing) into 2D on a screen. I have the desktop organisation problem of “icons accumulating” -> moved into “world domination 3” and creating a 4 when that is too full. Neither are explored for what they are – it is either “browse as thumbnails” or grep and nothing in between. “Minimalism” didn’t work, “skeuomorphism” didn’t work, ontologies didn’t work. xdg-user-dirs makes me want to punch someone – there’s more documents in Downloads than in Documents.

            At the same time there is, at the least, a few hundred ‘data stores’ in my lab in various bins, and if I close my eyes and think a little bit, I can mentally walk through what is on the cupboards in the bathroom and most everything in the kitchen and closets with fair accuracy. Nothing of this translates to the desktop.

            1. 6

              I found that while ontologies don’t work, as in, they don’t work for every case, they do work in certain situations and they emerge organically when needed.

              Personal example: while my downloads/documents are a dumping ground, my actual documents (invoices, contracts, etc.) are uploaded with relevant tags and dates to Zoho docs.

              Business: almost every company I’ve seen which has a shared drive has a pretty good tree of documents not handled by a specialised services. For example invoices/plans per customer, documentation per project, etc. I’m not aware of any of them getting someone to plan/organise it. It’s just what happens naturally when you have a team of people who need to refer to those files daily.

              1. 5

                there’s more documents in Downloads than in Documents.

                On some systems this is a consequence of web browsers configured to download all files in a fixed place (eg. ~/Downloads). I always make sure to disable this option and get the browser to ask me where to place each document. They often go to /tmp, which will be wiped on reboot, but for the ones I plan to keep I have to place them in the directory tree right away.

                I understand that this “everything goes to Downloads” option was chosen as a default because most users don’t have an established folder hierarchy they care about – so asking them to place each document is a burden to them – but it also reinforces this tendency to have dumping grounds instead of structure. I wonder what a good UI design to nudge people towards more organization (when useful) would be.

                1. 5

                  RISC OS has always worked like that: when you ask to save a new file (from any program, not just a browser) there is no default location. It just shows a little save box with the file icon and name, and you have to drag it to a folder somewhere:

                  https://www.riscosopen.org/wiki/documentation/show/Quick%20Guide:%2011.%20Drag%20to%20Save

                  I tried to get Linux to work like that a long time ago (http://rox.sourceforge.net/desktop/node/66.html) but it didn’t catch on, and most Linux desktops copied the Windows UI where everything ends up in a big unordered mess by default.

                  1. 2

                    I tried to get Linux to work like that a long time ago (http://rox.sourceforge.net/desktop/node/66.html) but it didn’t catch on

                    This is great as one of the options. If I already have a file manager open on the given folder, I would be happy if I could just drag the icon from an application to save the file. But in other cases, when I have no file manager window open, I want to save the file through the standard save dialog, because starting a file manager and dragging the icon will be cumbersome and annoying.

                    I would appreciate such a draggable icon especially in the screenshot application. However it does not fully replace the save dialog.

                    1. 3

                      because starting a file manager and dragging the icon will be cumbersome and annoying

                      On RISC OS the file manager is always running, and you usually just keep open the folder(s) for the project you’re working on. For power users, the drag bit can get annoying; I actually wrote a little utility (called TopSave) that made it save to the top-most folder in the window stack if you press Return in a save box with no existing path.

                      You really don’t want a file-manager inside the save box though because:

                      1. It takes up a load of room, possibly covering the folder you want to drag to.
                      2. It will inevitably open in the wrong place, whereas you probably already have the project folder open (and if not, you can get to it faster using your regular desktop shortcuts).
                      3. If the filer is in the save box, then it goes away as soon as you finish the save operation. Then you can’t easily use the same folder for the next part of the project (possibly with a different application).

                      The Unix shell actually feels similar in some ways, as you typically start by cding to a project directory and then running a bunch of tools. The important point is that the directory persists, and you apply multiple tools to it, rather than documents living inside apps.

                      1. 1

                        Disclaimer: I wrote a utility for Windows to deal with this in the stock file dialogs: https://github.com/NattyNarwhal/OpenWindows

                2. 5

                  This is one of the things I miss about “proper” spatial file managers, clunky as they were. The file-and-folder organisation method inherently retains the limits of the real-life equivalent that it drew inspiration from which, limited though it may be, is actually pretty reliable (or should I say “was” already?) – IMHO if it was good enough to develop antibiotics or send people to the Moon, it’s probably good enough for most people today, too.

                  But in addition to all of those limits, it also ended up with a few of its own, that greatly diminished its usefulness – such as the fact that all folders look the same, or that they have no spatial cues. There’s no efficient computer equivalent for “grab the red folder called ‘World Domination through Model Order Reduction’ from that big stack on the left of the middle shelf” – what would require sifting through four or five folders at the top of a stack devolves into a long sequence of clicks and endless scrolling.

                  Users unsurprisingly replicated the way these things are used IRL, with some extra quirks to work around the additional limitations, like these huge “World Domination” folders for things that you probably want to keep but aren’t really worth the effort of properly filing inside a system that not only makes it hard to file things in the first place, but makes it even harder to retrieve them afterwards.

                  Spatial file managers alleviated at least some of that, not very well, but better than not at all. They fell out of use (for a lot of otherwise valid reasons, though many of them relevant mostly for low-res screens) quite quickly, unfortunately. The few that remain today are pretty much useless in spatial mode because their “clean” interfaces don’t lend themselves easily to browsing more than a handful of items at a time. It was not even close to the efficiency of cupboards and shelves, but it was a little better.

                  Most of the software industry went the other way and doubled down on filing instead, drawing inspiration from libraries, and giving people systems that worked with mountains of metadata and rigid hierarchical systems, on top of which they sprinkled tags and keyword searches to alleviate the many problems of rigid hierarchical systems. IMHO this is very much short-sighted: it works for libraries and librarians because filing things is literally part of a librarian’s job, and librarians have not just a great deal of experience managing books & co. but also an uncanny amount of specialised education and training, they don’t just sit in the library looking at book covers. And even they rely enormously on spatial cues. Besides being mostly unworkable for people who aren’t librarians, most of these systems are also pretty inefficient when dealing with information which is unlike that which goes in a library – well-organised, immutable (books may have subsequent editions but you don’t edit the one in a library) bundles of information on a fixed set of topics, with carefully-curated references to other similar works, which you have to find on demand for other people for the next 30 years or so. Things you work with on a daily basis are nothing like that.

                  1. 4

                    But in addition to all of those limits, it also ended up with a few of its own, that greatly diminished its usefulness – such as the fact that all folders look the same, or that they have no spatial cues.

                    One of the nice things about OS/2’s Workplace Shell was that you could customize the appearance of folders — not just the icon, but also the background of its open window, and probably some other things I don’t remember. Pretty sure that MacOS 8 (maybe later versions of System 7?) let you at least color-code folders.

                    1. 6

                      Mac OS has supported custom folder icons since 1991, and special folders like home, Downloads, etc. have special icons by default. Colors have been supported since about 1987, but lately they’ve been repurposed as tags, and these days the folder itself isn’t colored, there’s just a colored dot next to it.

                      Pre-X, folders had persistent window positions and sizes, so when you reopened a folder it kept the same place onscreen. This really helped you use visual memory. Unfortunately the NeXT folks never really “got” that, and this behavior was lost in 10.0.

                      1. 4

                        Yep! On the Linux side, Konqueror and I think Nautilus up to a point allowed this, too, IIRC Finder dropped it a long time ago. Most file managers dropped it, lest users would commit design heresy and ruin the consistency of the UI by indulging in such abominable sin as customising their machines. Most contemporary file managers just have basic support for changing folder icons (and quite poorly – I don’t know of any popular manager that adequately handles network mounts or encrypted folders, for example).

                      2. 1

                        I have a few ongoing “experiments” in this space. They are quite slow moving as they all take 90% engine-development, 10% implementing the concept. Experiment is a bit of a misnomer as the budget for modelling, generalisation and qualitative user studies is quite ehrm, anaemic.

                        simple - A damage control form of the ‘most of the software industry’ form.

                        1. User-defined namespaces (so a tag / custom root).
                        2. Indexing and searching is done per/namespace and not the finder-i-can’t-find-her. Don’t want to search my company docs when it is my carefully curated archives of alt.sex.stories-repository I am after.
                        3. Navigation-map, forcing a visual representation to be sampled for each document, stitched together into larger tilemaps.

                        wilder - A form of what the mobile phone- space does (ignoring Android EXTERNAL_STORAGE_SDCARD etc.)

                        1. Application and data goes together (me suggesting coupling? wth..) VM packaged, I might need 4 versions of excel with absolutely no routable network interfaces way too often. Point is, the software stays immutable, VMM snapshot / restore becomes data storage controls. “File Association” does not bleed outside the VM.
                        2. Application (or guest additions for the troublemakers) responsible for export/import/search.
                        3. Leverage DnD/Clipboard like semantics. I take it you are familiar, but for the sake of it - DnD etc. involve type negotiation already: source presents sets of possible export types, sink filters that list, best match is sent. Replace the sink with a user-chosen interface (popup, context-sensitive trigger, whatever) .

                        all-the-way-to-11: That it would take me this long to mention AR/VR.

                        1. The VR WM I have was built with this in mind, a workspace (or well, safe-space) is a memory palace.
                        2. The layouter (WM policy) is room-scale (Vive and so on).
                        3. Each model added represents either a piece of data as is, or as an expandable iconic representation of something from the simple/wilder cases.
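The DnD/clipboard-style type negotiation mentioned in the "wilder" list above (source offers a set of export types, sink filters the list, best match is sent) can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the type strings are example values, and "best match" is assumed here to mean "first accepted type in the source's order of preference", which is one possible policy among several.

```python
def negotiate(source_types, sink_accepts):
    """Return the first type offered by the source that the sink accepts, or None.

    source_types: types the source can export, in the source's order of preference.
    sink_accepts: set of types the sink (or a user-chosen interface) can take.
    """
    for offered in source_types:
        if offered in sink_accepts:
            return offered
    return None


def transfer(exporters, sink_accepts):
    """Run the negotiation, then produce only the chosen representation.

    exporters maps a type to a zero-argument function, so nothing is
    serialised until a common type has been agreed on.
    """
    chosen = negotiate(list(exporters), sink_accepts)
    if chosen is None:
        raise LookupError("no common type between source and sink")
    return chosen, exporters[chosen]()


# Example source: a chart that can export itself three ways, best first.
chart_exporters = {
    "image/svg+xml": lambda: "<svg/>",
    "image/png": lambda: b"\x89PNG",
    "text/plain": lambda: "chart",
}
```

Swapping the sink for a popup or other user-chosen interface, as the comment suggests, would only change where `sink_accepts` comes from; the negotiation step itself stays the same.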
                        1. 1

                          Indexing and searching is done per/namespace

                          Why have ‘namespace’ as built in instead of an arbitrary user-definable tag?

                          forcing a visual representation to be sampled for each document

                          Probably some interesting things to be done with text viz, following cantordust. But I don’t know if you can get it to be both distinctive and stay stable as a document changes. And of course the whole zoo of other non-image formats—audio, zip/tar/, iso, subtitles, executables, random noise, …—need to be handled. And if you’re not careful with your processing that’s a DOS.

                          1. 2

                            Why have ‘namespace’ as built in instead of an arbitrary user-definable tag?

                            Externally defined tag so that it can be combined with system services, e.g. mounting daemon triggered arcan_db add_appl_kv arcan ns_some_guid some_user_tag

                            Probably some interesting things to be done with text viz, following cantordust. But I don’t know if you can get it to be both distinctive and stay stable as a document changes. And of course the whole zoo of other non-image formats—audio, zip/tar/, iso, subtitles, executables, random noise, …—need to be handled. And if you’re not careful with your processing that’s a DOS.

                            You mean like Senseye? quite certain that one went a lot further than cantor :-P. I didn’t exactly stop working on it – just stopped publicising/open sourcing.

                      3. 4

                        Nothing of this translates to the desktop

                        You wander into TikTok and hit a specific icon and scroll for 3 pages and the thing you want is now on screen.

                        Poor Unix. All this time with a single root while users gravitate to stuff stored under multiple roots/apps.

                      4. 4

                        The “folder” metaphor was already sort of niche when folks at Xerox PARC invented it for the Star in the late 70s. People who worked in offices used them, but I’m sure a lot of the US population didn’t. And the metaphor never worked for hierarchies anyway. Still, it was useful for its initial target audience.

                        Humans just aren’t good at mental models of hierarchies or recursive structures. We use them anyway, of course, because they’re essential, but they don’t come naturally.

                        1. 6

                          You’ve made me realize I actually use folders like real folders. I have a mostly flat Documents folder, and the only folders inside it exist to group a small amount of related documents. Taxes 2021, Camera Manuals, and so on and so forth. Nothing nested more than 1 level deep, no folders used as “categories” or any other kind of hierarchical concept.

                          Finding Generic Form Name.pdf without context of a folder would be so annoying. I definitely don’t rename things, so they have to at least be in folders.

                          1. 4

                            In my corner of the world, we don’t have folders. I mean we don’t use them. Neither me, nor my computer-using peers from like 25+ years ago, when we were just learning about computers, had ever seen a real-life folder.

                            As I understand it, “folder” means this sort of thing, but what do you call this sort of thing, which is more common (even today)? In e.g. Dutch there are separate words for this (“map” or “ordner” for the second one, “map” typically used to translate “folder”), but I’m not sure about English?

                            Either way, I think this doesn’t really matter; it’s essentially about the mental model of a hierarchical file structure, and whether you call it “folder” with an etymology some people may not follow or something else isn’t all that important.

                            I don’t think hierarchies are all that unintuitive; there are many (simple) ones in every-day life: in a library you have “fiction” further subdivided in categories, and “science” further divided in categories, etc. On a restaurant menu it’s the same: “starter/vegetarian/[..]”. In Amazon.com there’s a whole structure for products, etc. These are essentially not all that different.

                            1. 6

                              That sort of thing we call a binder.

                              1. 2

                                Vegetarian / vegan is more like a tag than a component of a hierarchy. Same with other common menu tags like spicy, gluten-free, and so on. They can apply to any menu item regardless of category.

                                On Amazon I rarely use the category hierarchy, and stuff I’m looking for often legitimately falls under multiple categories in the hierarchy.

                                1. 2

                                  In the Nordic languages and Icelandic, we also say “mappe”/“mapp”/“mappa” for a folder, but I would say that’s the name of the first physical thing. The second thing is definitely a “ringperm” (no computing analogy).

                                  But what about a file? We have the word “fil”, which in its physical form is the same tool as an English “file” – the prison escape tool. Just unambiguous. Maybe an unfortunate analogy, but at the same time so nonsensical that there is no confusion – people take it from context, and you can always say computer file (“datafil”) to be precise. Edit: LOL, you do the same in Dutch: “computerbestand”

                                  1. 2

                                    Oh, that second kind of thing is even cooler: the word we use for it is a portmanteau of the words used for “library” and “shelf”. Due to its extensive use in public administration, this object is so loathed that I doubt anyone would try to use it in an interface, except maybe in order to sabotage their own company :-P.

                                2. 4

                                  That’s a very interesting perspective. If you don’t mind sharing, what is your native tongue?

                                  FWIW I use a hierarchical structure in my documents and a date-based structure for photos. I can’t bear the idea I need to run a program that eats dozens of gigabytes of disk space and more than an entire CPU core to index all of those things just so I can press a shortcut and type in a few letters of the file I am looking for. I hate to say it, but with my minimal investment in a mental model, the man pages for ‘find’ and ‘rg’, and a graphical preview in my file browser have largely eliminated the ongoing cost of file indexing for me. I started on computers with a Z80 and an 80386 processor. “Waste” is a hard coded thing to shed at every opportunity for me.

                                  1. 3

                                    one of the many anecdotes that eventually educated my guess that UI intuitiveness is largely bullshit

                                    In humans, intuition is bullshit. Everything is learned.

                                    Somebody quipped “The only intuitive interface is the nipple; after that it’s all learned.” It might have been Bruce Ediger. Doesn’t matter who it was: turns out that humans don’t have much of an intuition for nipples, either. Breastfeeding techniques need to be learned – babies have an instinct to suck on something that’s tickling their lower lip, but mothers have no instincts about it at all. There is a chain of teaching that goes back to, very likely, the first primates that held their babies off the ground – and we only know it exists, rather than being “intuitive” or “instinct”, because of the successful marketing efforts of formula companies in the twentieth century.

                                    1. 3

                                      There is no direct equivalent for ‘file’ – we’d just call it a document instead

                                      I might be misunderstanding something (not a native English speaker, and reading between the lines), but it seems that you think that “document” and “file” are synonyms in English in their “physical” sense.

                                      In English, “file” is first of all a verb meaning to organize or to submit: hence “filing cabinet”.

                                      As a noun, file is literally a folder :)

                                      Definition of file (Entry 5 of 8)

                                      1 : a device (such as a folder, case, or cabinet) by means of which papers are kept in order

                                      https://www.merriam-webster.com/dictionary/file#other-words

                                      1. 4

                                        it seems that you think that “document” and “file” are synonyms in English in their “physical” sense.

                                        Ah, no, I only meant this in the “computer” sense :). Way back (this was in the age of Windows 98, pretty much) when I tried to explain what folders and files were, the question that always popped up was “this is just an image I drew in Paint, how is this a file”, followed closely by “is this a file, as Windows Explorer claims it is, or a document, as Word calls it?”. Hence this… weird thing.

                                    1. 4

                                      One rarely mentioned technique for dealing with errors: remove the possibility of the error at all.

                                      Using the article’s example

                                      // library
                                      pub fn count_words(lines: &mut dyn Iterator<Item = String>) -> u32 { 
                                          ...
                                      }
                                      
                                      // binary 
                                      fn main() {
                                          ...
                                          let mut err = Ok(());
                                          let mut iter = reader
                                              .lines()
                                              .map_while(|line| line.map_err(|it| err = Err(it)).ok());
                                          let wordcount = wordcount::count_words(&mut iter);
                                          err?;
                                          ...
                                      }
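                                      A self-contained sketch of the same trick, for anyone who wants to run it. The body of count_words and the run helper (standing in for main) are assumptions for illustration; the original elides both.

```rust
use std::io::{self, BufRead, Cursor};

// Library side: it only sees an iterator of lines and knows nothing about I/O errors.
// The body is an assumption; the original elides it.
pub fn count_words(lines: &mut dyn Iterator<Item = String>) -> u32 {
    lines.map(|line| line.split_whitespace().count() as u32).sum()
}

// Binary side: hypothetical helper standing in for `main`.
pub fn run(reader: impl BufRead) -> io::Result<u32> {
    let mut err = Ok(());
    let mut iter = reader
        .lines()
        // Stop at the first failed read and stash the error on the side.
        .map_while(|line| line.map_err(|it| err = Err(it)).ok());
    let wordcount = count_words(&mut iter);
    // Surface the stashed error, if any, before trusting the count.
    err?;
    Ok(wordcount)
}

fn main() -> io::Result<()> {
    let n = run(Cursor::new("one two\nthree\n"))?;
    println!("{n}"); // prints 3
    Ok(())
}
```

                                      The library signature stays infallible, and the caller decides what to do with a read error after the fact. The price is that the count computed before the failure is thrown away.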
                                      
                                      1. 7

                                        This is typical for “actor” libraries in languages with async/await. It gives you a very different programming model than that of Erlang: a model where it’s easier to express concurrent processes, but much harder to manage concurrent access to the state correctly (https://matklad.github.io/2021/04/26/concurrent-expression-problem.html).

                                        Prescriptivist in me screams that calling this “actor model” is wrong, but realist feels that it’s more productive to use something like “strict actor model” for things like Erlang or https://uazu.github.io/stakker

                                        1. 4

                                          I feel like in Erlang/Elixir you would just not put the authorization in an async block. The whole point of “actors” is that it’s a “unit of concurrency”. If you hadn’t put authorization in an async block, then it would be obvious that nothing could reenter. (In elixir, if you do a stdlib Task.async call that spawns a thread still nothing could reenter, so long as your await is in the same function block)

                                          I guess that’s not the case in swift? If you perform an async call, it instantiates suspend-back-to-the-actor points that you didn’t ask for? Man, what a footgun.

                                          1. 2

                                            There are two different approaches to re-entrancy in actors, and IMHO each comes with its own footguns.

                                            In the Erlang style (which I admittedly have not used), actors can easily deadlock by calling each other in a cycle. That kind of nastiness is one of the reasons I switched to using (async-based) actors.

                                            1. 6

                                              I’ve been doing elixir for four years now and I can count on my hands the number of times I’ve deadlocked, and 0 times in prod.

                                              Part of it is that genserver comes with a sensible timeout default, and part of it is that you just usually aren’t writing concurrent code; your concurrency is much, much higher level and tied to the idea of failure domains.

                                              If you’re running into deadlocks that often in BEAM languages (especially elixir, which gives you task, which shouldn’t deadlock unless you do something way too clever) you’re probably overusing concurrency.
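                                              For readers who haven’t seen the failure mode being discussed, here is a rough sketch of a call cycle and of how a timeout on the blocking call turns the deadlock into an error. It uses Rust threads and std channels as a stand-in for BEAM processes; the Call type and the actor and call_cycle functions are made up for illustration and are not any library’s API.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

// A synchronous "call": the request carries a channel for the reply.
struct Call {
    reply_to: Sender<u32>,
}

// Hypothetical actor: before serving its own mailbox it makes a blocking
// call to its peer, much like one server calling another from a callback.
fn actor(
    inbox: Receiver<Call>,
    peer: Sender<Call>,
    timeout: Duration,
) -> Result<u32, RecvTimeoutError> {
    let (reply_tx, reply_rx) = channel();
    // We keep our own `reply_tx` alive so the wait below can only end with
    // a reply or a timeout, never with a disconnect.
    let _ = peer.send(Call { reply_to: reply_tx.clone() });
    // While blocked here, nobody reads `inbox`. Without the timeout, two
    // actors doing this to each other would wait forever.
    let answer = reply_rx.recv_timeout(timeout)?;
    // Reached only if the peer replied: serve whatever queued up meanwhile.
    for call in inbox.try_iter() {
        let _ = call.reply_to.send(answer);
    }
    Ok(answer)
}

// Wire two actors so that each one's peer is the other.
fn call_cycle(
    timeout: Duration,
) -> (Result<u32, RecvTimeoutError>, Result<u32, RecvTimeoutError>) {
    let (a_tx, a_rx) = channel();
    let (b_tx, b_rx) = channel();
    let a = thread::spawn(move || actor(a_rx, b_tx, timeout));
    let b = thread::spawn(move || actor(b_rx, a_tx, timeout));
    (a.join().unwrap(), b.join().unwrap())
}

fn main() {
    let (a, b) = call_cycle(Duration::from_millis(100));
    println!("a: {a:?}, b: {b:?}");
}
```

                                              Both actors should come back with a timeout error instead of hanging, which is roughly what a default call timeout buys you: the cycle still exists, but it shows up as a failure you can see and restart from.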

                                              1. 3

                                                Actors can’t deadlock unless you allow selective receive (which is not part of Hewitt’s original model), but it’s difficult to imagine a pragmatic actor implementation that doesn’t allow it.

                                            2. 4

                                              Prescriptivist in me screams that calling this “actor model” is wrong, but realist feels that it’s more productive to use something like “strict actor model” for things like Erlang or https://uazu.github.io/stakker

                                              It’s funny because apparently the creator of the actor model says that Erlang doesn’t implement it and the creators of Erlang agree.

                                              Edit: Eh, that might have come off as snide. I mean to say it’s funny because it feels like a no true Scotsman fallacy.

                                              “Akka implemented the actor model in Scala.”

                                              “Well, that’s a library! Swift made it a part of the language!”

                                              “Swift allows suspend points within its actors. Erlang is the strict actor model”.

                                              “No, Erlang doesn’t even implement the actor model! But Pony does!”

                                            1. 5

                                              Define your destructors outside the class declaration so they don’t get inlined by the compiler in both functions — the caller and the callee that returns the optional — to avoid binary size increase.

                                              As we recently found out, that’s a problem for Rust as well: https://github.com/rust-lang/rust/issues/88438

                                              1. 3

                                                Interesting. Both languages have similar challenges.

                                                One huge advantage of Rust is that the compiler can statically determine a destructor is not necessary after moving a value. That helps with the problem I’ve described in the article.

                                                1. 1

                                                  AFAIK the code to statically determine whether a destructor needs to be run after a move in Rust is:

                                                  return false;
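                                                  To spell out the joke with a runnable sketch: a moved-from value never has its destructor run at the old location, so there is nothing to decide. When the move is conditional, the compiler keeps track of whether the value is still live at scope exit (as far as I know, with a hidden runtime flag). The Noisy type and the function names below are made up for illustration.

```rust
use std::cell::Cell;

// A value whose destructor bumps a shared counter.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Takes ownership; the destructor runs when `_n` goes out of scope here.
fn consume(_n: Noisy<'_>) {}

// Returns how many times the destructor ran for a single value.
fn drops_after_move(move_it: bool) -> u32 {
    let drops = Cell::new(0);
    {
        let n = Noisy { drops: &drops };
        if move_it {
            // After this move `n` is gone; its destructor ran inside `consume`.
            consume(n);
        }
        // End of scope: `n` is dropped here only if it was not moved above.
    }
    drops.get()
}

fn main() {
    println!("{} {}", drops_after_move(true), drops_after_move(false)); // prints 1 1
}
```

                                                  Either way the destructor runs exactly once, and there is no moved-from shell left behind that would need destroying.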
                                                  
                                              1. 3

                                                Context: error resilient parsing is the secret sauce of modern IDEs. It’s a simple idea, but it isn’t well known. This video shows how this is done in rust-analyzer. It’s a part of an on-going series, but should be relatively understandable in isolation. Ideally, this should be a separate blog post, but I thought this would be interesting to some folks as is!

                                                1. 2
                                                  1. 1

                                                    This is a fantastic presentation! I’ve been reading the first part over the weekend. Would you post it as a separate story? If you do, please link from here so I don’t forget to upvote. (And if for some reason you don’t want to, I’ll post it.)

                                                      1. 1

                                                        Great!

                                                  1. 13

                                                    Genuine comment (never used Nix before): is it as good as it seems? Or is it too good to be true?

                                                    1. 51

                                                      I feel like Nix/Guix vs Docker is like … do you want the right idea with not-enough-polish-applied, or do you want the wrong idea with way-too-much-polish-applied?

                                                      1. 23

                                                        Having gone somewhat deep on both this is the perfect description.

                                                        Nix as a package manager is unquestionably the right idea. However, Nix the language made some choices that turned out to be regrettable in practice.

                                                        Docker works and has a lot of polish but you eat a lot of overhead that is in theory unnecessary when you use it.

                                                      2. 32

                                                        It is really good, but it is also full of paper cuts. I wish I had this guide when learning to use nix for project dependencies, because what’s done here is exactly what I do, and it took me many frustrating attempts to get there.

                                                        Once it’s in place, it’s great. I love being able to open a project and have my shell and Emacs have all the dependencies – including language servers, postgresql with extensions, etc. – in place, and have it isolated per project.

                                                        1. 15

                                                          The answer depends on what you are going to use Nix for. I use NixOS as my daily driver. I am running a boring Plasma desktop. I’ve been using it for about 6 years now. Before that, I used Windows 7, a bit of Ubuntu, a bit of macOS, and Arch. For me, NixOS is a better desktop than any of the others, by a large margin. Some specific perks I haven’t seen anywhere else:

                                                          NixOS is unbreakable. When using windows or arch, I was re-installing the system from scratch a couple of times a year, because it inevitably got into a weird state. With NixOS, I never have to do that. On the contrary, the software system outlives the hardware. I’ve been using what feels the same instance of NixOS on six different physical machines now.

                                                          NixOS allows messing with things safely. That’s a subset of the previous point. In Arch, if I installed something temporarily, it inevitably left some residue on the system. With NixOS, I install random one-off software all the time, I often mix stable, unstable, and head versions of packages together, and that just works and is easy to roll back via an entry in the boot menu.

                                                          NixOS is declarative. I store my config on GitHub, which allows me to hop physical systems while keeping the OS essentially the same.

                                                          NixOS allows per-project configuration of environment. If some project needs a random C++ package, I don’t have to install it globally.

                                                          Caveats:

                                                          Learning curve. I am a huge fan of various weird languages, but “getting” NixOS took me several months.

                                                          Not everything is managed by NixOS. I can use configuration.nix to say declaratively that I want Plasma and a bunch of applications. I can’t use NixOS to configure plasma global shortcuts.

                                                          Running random binaries from the internet is hard. On the flip side, packaging software for NixOS is easy — unlike Arch, I was able to contribute updates to the packages I care about, and even added one new package.

                                                          1. 1

                                                            NixOS is unbreakable. When using windows or arch, I was re-installing the system from scratch a couple of times a year, because it inevitably got into a weird state. With NixOS, I never have to do that. On the contrary, the software system outlives the hardware. I’ve been using what feels the same instance of NixOS on six different physical machines now.

                                                            How do you deal with patches for security issues?

                                                            1. 8

                                                              I don’t do anything special, just run “update all packages” command from time to time (I use the rolling release version of NixOS misnamed as unstable). NixOS is unbreakable not because it is frozen, but because changes are safe.

                                                              NixOS is like git: you create a mess of your workspace without fear, because you can always reset to known-good commit sha. User-friendliness is also on the git level though.

                                                              1. 1

                                                                Ah I see. That sounds cool. Have you ever found an issue on updating a package, rolled back, and then taken the trouble to sift through the changes to take the patch-level changes but not the minor or major versions, etc.? Or do you just try updating again after some time to see if somebody fixed it?

                                                                1. 4

                                                                  In case you are getting interested enough to start exploring Nix, I’d personally heartily recommend also exploring the Nix Flakes “new approach”. I believe it fixes most pain points of “original” Nix; two exceptions not addressed by Flakes are secrets management (which will have to wait for a different time) and documentation quality (which for Flakes is currently even poorer than that of “Nix proper”).

                                                                  1. 2

                                                                    I didn’t do exactly that, but, when I was using non-rolling release, I combined the base system with older packages with a couple of packages I kept up-to-date manually.

                                                            2. 9

                                                              It does what it says on the box, but I don’t like it.

                                                              1. 2

                                                                I use Nixos, and I really like it, relative to how I feel about Unix in general, but it is warty. I would definitely try it, though.

                                                              1. 2

                                                                I am curious about this definition:

                                                                enum class byte : unsigned char {} ;
                                                                

                                                                Does it give you an ability to “access raw memory occupied by other objects” by virtue of inheriting from char, or is some compiler special casing required? If I make my own byte which inherits from char, will this opt me out of TBAA?

                                                                1. 2

                                                                  No, std::byte gets special dispensation, like [unsigned] char, to free itself from strict aliasing in the standard. If you do it yourself, it won’t work: https://gcc.godbolt.org/z/Whs1Y9drE

                                                                  1. 2

                                                                    This is not a real inheritance, this is a specification of the underlying type (which can only be a fundamental integral type). The main use is to prevent the change of the underlying representation when changing the enumerator set/values. And, yes, you still need an explicit cast.

                                                                    EDIT: See http://eel.is/c++draft/dcl.enum, specifically verses 5-10.

                                                                  1. 9

                                                                    Is there some write up which spells out Alpine+Rust technical problem exactly? From this article, I infer the following:

                                                                    • Alpine has two year support cycle, and they need to stick to the same version of the compiler throughout the cycle.
                                                                    • Rust, however, releases every 6 weeks, and only officially supports the latest stable. Eg, a security issue found in the compiler will be backported to the current stable, but not to the stable from two years ago.

                                                                    Would this be a fair summary of the problem?

                                                                    1. 31

                                                                      It sounds like it. Note that Clang has the same problem: LLVM has a release every 6 months and upstream supports only the latest version. In FreeBSD, we maintain our own backports of critical issues and move the base system’s toolchain to newer versions forward in minor releases (a major release series has about a five-year support lifecycle).

                                                                      This is not ideal from a stability perspective because sometimes a newer clang can’t compile things an older clang could, but this also needs to be balanced against the desire of people to actually compile stuff: If we didn’t ship a base system compiler that supported C++17 in FreeBSD 12 (EOL 2024) then by the end of its support lifecycle the base system compiler would be a waste of space and everyone would use one from ports (most of the stuff that I work on has been C++17 for a while and is moving to C++20 at the moment).

                                                                      Note that the reason that we have clang in the base system at all is that POSIX requires a cc binary. Without that, we’d probably move to supporting clang only in ports (where we carry multiple versions and anyone who wants to install an old one can, putting up with the bugs in the old one instead of the bugs in the new one).

                                                                      I don’t have a huge amount of sympathy with a lot of these complaints because they often come from policies that the packaging system is enforcing. You want to maintain a complete set of packages for two years with security backports? That means that you are committing to back-porting security fixes for two years. If you’re RedHat then you might have customers who are willing to do that (so that they can then run Alpine / Debian / Ubuntu in containers because nothing in their host OS is sufficiently recent to be useable) but if you don’t then you need to ask yourself why you want to make that commitment. Can you relax it and say that you’ll bump the version of the Rust compiler?

                                                                      Packaging policies should first be driven by what is possible, then by what users want. Promising to do the impossible or promising to stick to some arbitrary standards that users don’t actually care about doesn’t help anyone.

                                                                      1. 17

                                                                        I don’t have a huge amount of sympathy with a lot of these complaints because they often come from policies that the packaging system is enforcing. You want to maintain a complete set of packages for two years with security backports? That means that you are committing to back-porting security fixes for two years.

                                                                        Exactly! As someone who’s been on the receiving end of, “Bbbbut but we promised client X we would provide these guarantees even though we have no way to enforce them and it’s not our lane! We (and by ‘we’ I mean you) have to find a way to provide said guarantees!”

                                                                        Note that in the OP, and unlike in my anecdote, it looks like Alpine is doing the right thing here by moving those packages they can’t support into community, so kudos on making the hard uncomfortable calls.

                                                                        1. 3

                                                                          I don’t have a huge amount of sympathy with a lot of these complaints because they often come from policies that the packaging system is enforcing. You want to maintain a complete set of packages for two years with security backports? That means that you are committing to back-porting security fixes for two years.

                                                                          Of course it’s a policy that someone made up. The point is that it’s a useful policy for users. I think software developers in general often have quite different interests from the people who run their software—they want to move fast and break things, while as a sysadmin I just don’t have the bandwidth to keep up with everything breaking all the time. Distros don’t backport just for the sake of it; they’re bridging a gap between what FOSS developers want to make and what users need.

                                                                          But also, in general, they do backport. I think it’s reasonable to complain when upstream projects make that intractably difficult, though, to the extent that it’s reasonable for anyone to complain about something that’s free. As a user who relies on stable distros, if they weren’t complaining for me, I’d be complaining myself.

                                                                          1. 5

                                                                            The point is that it’s a useful policy for users

                                                                            No, the point is that it may be useful in some cases for some users in some contexts and probably was useful for a large subset of visible users at the time that the policy was created.

                                                                            I think software developers in general often have quite different interests from the people who run their software—they want to move fast and break things, while as a sysadmin I just don’t have the bandwidth to keep up with everything breaking all the time

                                                                            As a developer, I want access to the latest versions of my tools and libraries. That’s easy.

                                                                            As a sysadmin, I don’t want security vulnerabilities in the things that I deploy. I also don’t want unexpected changes that break things for my users. In the LTS model, these two constraints very often come into conflict. Security vulnerabilities are easy to fix in the mainline branch but if you want back-ports then someone needs to do that work. If you want volunteers to do that, then you’re asking them to do work. If you’re RedHat (IBM) then you’ve got a load of customers who are paying you to pay engineers to do that. If you’re FreeBSD or Alpine? You have users demanding it but not being willing to do the work or pay for it, so you have to ask why you’re devoting effort to it (in the case of FreeBSD, most of the big companies that use it run -HEAD so don’t care about this at all).

                                                                            As a user who relies on stable distros, if they weren’t complaining for me, I’d be complaining myself.

                                                                            How much are you / your employer paying (and who are they paying) to ensure that you have support for a stable distro?

                                                                          2. 2

                                                                            the reason that we have clang in the base system at all is that POSIX requires a cc binary

                                                                            I thought it has more to do with the base/ports split and the “base builds base” tradition. It’s not impossible to just ship system images with a pkg-installed llvm :)

                                                                            (Also who cares about that aspect of POSIX. Not the Linux distros where you have to pacman -Sy gcc, haha)

                                                                        1. 31

                                                                          Python is as messy, if not messier, than Node

                                                                          I’d say Node is, like, not messy at all. Due to Node being younger, its ecosystem developed and matured in an age where per-project isolated local dependencies were becoming the norm.

                                                                          Python is arguably uniquely messy due to the invention of virtualenv. I’m not aware of any other language community where the default way to make isolated projects for ages was a thing that made a whole local prefix with symlinks to the interpreter and whatnot and a script you’d have to run to bring it into $PATH. Python’s closest cousin (in terms of 2000s web dev evolution anyway), Ruby, did it all correctly relatively early on — the Bundler experience is just like npm/cargo/stack/etc.

                                                                          But forget virtualenv, the real messiness of Python packaging is the freaking layering. Distutils, setuptools, pip, virtualenv (or the new project-based managers) — it’s all there and which part is handled by which layer gets confusing quickly.

                                                                          1. 10

                                                                            The thing that worries me is that I don’t think the core Python devs get it: the packaging situation is so bad that it very well may kill the language. It should be all hands on deck, this is the most important thing for us to fix. But I don’t see that happening at all, and instead there’s fiddling with esoterica like walrus and match…

                                                                            1. 13

                                                                              So one interesting twist here is that many core devs work for big companies that use their own build and deployment systems for Python. (e.g. I worked with probably a dozen core Python devs at Google many years ago.) So they may not feel the issue on a daily basis. I certainly feel it more now that I do more open source, although I was aware of it back then whenever I had to install NumPy, etc.

                                                                              From what I hear Jane St. is in a similar situation with OCaml and OPAM. They sponsor the open source package manager, but they don’t actually use it themselves, because they use a monorepo like Google. A monorepo means “no version constraint solving”, which simplifies the problem drastically (version constraint solving is an NP-complete problem).


                                                                              I also think the problem is more complicated than “core devs don’t get it”. It’s more like the solutions being very constrained by what happened in the past. For a long time the import system / algorithm itself was a very big wart and there was a huge effort to clean it up.

                                                                              I will say that I think most of these problems were known before Python 3, and I wish there was a hard break then, but that was already 12-15 years ago at this point. And then people would have probably liked the 2->3 transition even less, etc.

                                                                              1. 5

                                                                                So one interesting twist here is that many core devs work for big companies that use their own build and deployment systems for Python. (e.g. I worked with probably a dozen core Python devs at Google many years ago.) So they may not feel the issue on a daily basis. I certainly feel it more now that I do more open source, although I was aware of it back then whenever I had to install NumPy, etc.

                                                                                This is definitely the case with my cloudy overlords, although I will say that this may be changing. I think some people are recognizing that there is wisdom in not rolling their own and allowing devs to leverage familiar interfaces for packaging.

                                                                              2. 4

                                                                                For what it’s worth, this is the #1 reason I keep not trying Python. It’s just a huge headache and I can’t care enough about it.

                                                                                1. 4

                                                                                  I’ve been using Python professionally since 2010, I used to absolutely love it, and I’ve finally just reached the point where I no longer consider it an acceptable choice for a greenfield project of any scale, no exceptions. Someone will always come around to suggest that it’s open source, and you should fix what you don’t like, but it’s been my experience that the community culture has settled into a state that no project or ecosystem ever recovers from, which is when issues start being responded to with justifications along the lines of “this can’t change because of fundamental technical deficiency X, which because of its magnitude we’ve decided to stop treating as a technical deficiency in favor of treating it as a fundamental invariant of the universe, in spite of ample evidence that there are better ways to do it.” Either that technical deficiency is truly unsurmountable, in which case the tool is de jure broken and I have no good reason to use it, or that technical deficiency is surmountable but there will never be any will to fix it, in which case the tool is de facto broken and I have no good reason to use it. I feel deep sadness about this, but at this point there is too little that is exceptional about Python to justify putting effort into fixing what’s broken. No tool lasts forever, maybe we should just accept that it’s time to sunset this one.

                                                                                  1. 0

                                                                                    That’s a great point, and a good answer to the “Why don’t you just fix it yourself” that gets thrown at you any time you complain about an open source project, especially established ones like Python.

                                                                                    Python especially has this cultish mentality of “All is perfect, how dare you suggest otherwise”.

                                                                                2. 3

                                                                                  The thing that worries me is that I don’t think the core Python devs get it: the packaging situation is so bad that it very well may kill the language. It should be all hands on deck, this is the most important thing for us to fix. But I don’t see that happening at all, and instead there’s fiddling with esoterica like walrus and match…

                                                                                  Python’s governance is pretty transparent. Do you have concrete suggestions for improvement? If you do, consider coming up with even a proof of concept implementation and creating a PEP.

                                                                                  Be the change you want to see in the world :)

                                                                                3. 8

                                                                                  the real messiness of Python packaging is the freaking layering. Distutils, setuptools, pip, virtualenv (or the new project-based managers) — it’s all there

                                                                                  Yup +100 to this … This is why I sometimes download tarballs with shell scripts and use “python setup.py build” instead. That’s only one layer :)

                                                                                  That approach doesn’t work all the time, e.g. if you have a big NumPy stack, or a big web framework with transitive dependencies.

                                                                                  On the other hand, if it works, then you know exactly what your dependencies are, and you can archive the tarballs somewhere for reproducible builds, etc.

                                                                                  1. 3

                                                                                    the Bundler experience is just like npm/cargo/stack/etc

                                                                                    That’s mostly splitting hairs, but for me there’s a big difference with cargo when it comes to experience: cargo run / cargo build just work, while bundle exec and npm run require running the install command manually first.

                                                                                    1. 1

                                                                                      There’s a lot that node gets right simply by virtue of being a younger language and community than Python or Ruby by a long shot, so it could benefit from observing the effect of critical decisions over time.

                                                                                      Unfortunately that lack of maturity can express itself in various ways, some merely cosmetic, some less so.

                                                                                      Overall even as someone like myself who doesn’t love the language, having now learned it I can appreciate that there’s a lot of interesting work going into that community that bears watching and learning from, whatever your first choice in programming languages may be.

                                                                                    1. 7

                                                                                      Great article, it clarified my own thinking!

                                                                                      One interesting issue here is how to merge the two. Say, you are a distro, and you want to package (by compiling from source) a bunch of programs, some of which are written in Rust. You probably want there to be a single canonical version of regex or rustls crate (module in article’s terminology) which is used by all Rust programs. I don’t think we know how to do that properly though!

                                                                                      The simplest approach is to just trust each program’s lockfile, but then you get duplication if one lockfile says regex 1.0.92 and another uses regex 1.0.93. I think that’s the approach used by Nix to package Rust.

                                                                                      A different approach is to package each individual module as a separate package, but that is a lot of work, pollutes the package namespace (when users search for a program, they see a bunch of libraries), and hits the impedance mismatch between the flexible versioning of a module manager and the rigid versioning of a program manager. I think that’s the approach used by Nix to package Haskell.

                                                                                      1. 4

                                                                                        One interesting issue here is how to merge the two.

                                                                                        I come from the C/C++ background (so shared libraries) and Debian instead of Nix and to me it seems the right approach is to use module manager for development and program manager for end-user delivery. To support this, the module manager should allow “detaching” the package manager part from the “package manager-build system” combo used during development and replacing it with the program manager’s package manager. Then the folks working on the program manager (e.g., Debian developers) can decide which version(s) of the regex library to package and programs that depend on it. That’s the approach we’ve adopted in build2.

                                                                                      1. 5

                                                                                        One thing about Ada I’d like explained better is dealing with safety of dynamically allocated memory. The usual answer I see is “you just avoid allocating memory” which is underwhelming. Everyone already avoids allocating memory, so this feels like dodging the question.

                                                                                        1. 3

                                                                                          I must say, after I’ve learned some Ada I also became somewhat more skeptical about Ada’s safety even without memory allocation. Ada has uninitialized variables, data races and mutable aliasing-related type confusion out of the box — you actually need SPARK to guarantee even program integrity.

                                                                                          I don’t know how big of a deal that is in practice: intuitively, it still seems way, way better than C’s collection of dangerously sharp objects.

                                                                                          1. 1

                                                                                            Yes, it’s nowhere near as safe as the evangelists say. Its semantics are equivalent in power to any C-style language. Of course it’s better than C because it avoids a lot of undefined behavior and has fewer “features” designed to circumvent the type system, but semantically it is very weak. It’s no wonder that Ada is being replaced by C++ and not some other language with safer inherent semantics. The kind of C++ static analysis tools that Ada users rely on already dull C++’s sharp corners, and the difference in semantic power is minimal.

                                                                                            1. 1

                                                                                              Ada definitely isn’t safe, but it’s harder to shoot myself in the foot than with C or C++.

                                                                                              mutable aliasing-related type confusion

                                                                                              Thanks, that’s an interesting example which still works on FSF GNAT 10.3! That’s a very atypical combination of infrequently used features (variant record, fields with vtables in a variant, defaulted discriminant, global state with a procedure operating on that state) to get there. I’m keeping that on hand to remind myself how I can be bitten by it. I don’t think that expression, M := should error at compile-time, discriminants are supposed to only be bound once, but only since it has a defaulted discriminant, the unassigned value M : Magic is allowed and also that assignment.

                                                                                              uninitialized variables

                                                                                              The compiler is usually pretty good about ensuring you initialize things if you forget, with a few exceptions, like the one mentioned above. The one time this bit me was in a defaulted aggregate, where I set all remaining fields in a record to their “default” values, and one didn’t have one: Var := (FieldA => ValueA, FieldB => ValueB, others => <>). pragma Normalize_Scalars is supposed to help with this.

                                                                                              data races

                                                                                              Ada’s protected objects help deal with data races in a pretty interesting way, which can include arbitrarily complex conditional guards on access, in addition to operating like a mutex. You have to make sure your data is set up that way though.

                                                                                              “you just avoid allocating memory”

                                                                                              Yeah, it seems really, really weird to hear people say that, and I was originally very skeptical about it. After a while of working in Ada, you realize it’s similar to C++, where containers and other resources are usually wrapped in RAII (controlled) types and you just forget about it. The few times I’ve needed allocations, I’ve used a RAII-based reference-counted pointer. The other reason you don’t deal with many allocations is that you can return variable-length arrays with their bounds on the stack, and implementations are required to check for stack overflow.

                                                                                            2. 2

                                                                                              Ada is not memory safe in the presence of dynamic memory deallocation. There is no solution to this problem in Ada. Even the deallocation function is named Unchecked_Deallocation.

                                                                                            1. 20

                                                                                              If you’re going to use (much more expensive) ref-counted heap objects instead of direct references, you might as well be using Swift. (Or Nim, or one of the other new-ish native-compiling languages.) Rust’s competitive advantage is the borrow checker and the way it lets you use direct pointers safely.

                                                                                              The author should at least have pointed out that there’s a significant runtime overhead to using their recommended technique.

                                                                                              1. 20

                                                                                                No, this is a fatalistic take, almost like “if you use dyn you may as well use Python”.

                                                                                                Swift’s refcounting is always atomic, but Rust can also use the faster non-atomic Rc. Swift has a few local cases where it can omit redundant refcounts, but Rust can borrow an Rc’s content and avoid all refcounting within a scope, even if the object’s usage is complex, and that’s a guarantee that doesn’t depend on a Sufficiently Smart Compiler.

                                                                                                Swift doesn’t mind doing implicit heap allocations, and all class instances are heap-allocated. Rust doesn’t allocate implicitly and can keep more things on the stack. Swift uses dynamic dispatch quite often, even in basic data structures like strings. In Rust direct inlineable code is the norm, and monomorphisation is a guarantee, even across libraries.

                                                                                                So there’s still a lot more to Rust, even if you need to use Arc in a few places.
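
                                                                                                A minimal sketch of the borrowing point above, using only the standard library. The helper name total_len and the sample data are made up for illustration: once you hold an Rc, you can hand out plain borrows of its content, and code working through those borrows never touches the reference count.

                                                                                                ```rust
                                                                                                use std::rc::Rc;

                                                                                                // Hypothetical helper: it takes a plain slice borrow, so it knows
                                                                                                // nothing about Rc and cannot change the reference count.
                                                                                                fn total_len(words: &[String]) -> usize {
                                                                                                    words.iter().map(|w| w.len()).sum()
                                                                                                }

                                                                                                fn main() {
                                                                                                    let shared: Rc<Vec<String>> =
                                                                                                        Rc::new(vec!["borrow".to_string(), "checker".to_string()]);

                                                                                                    // Cloning the Rc is the only place the (non-atomic) count is bumped.
                                                                                                    let other_owner = Rc::clone(&shared);
                                                                                                    assert_eq!(Rc::strong_count(&shared), 2);

                                                                                                    // Deref coercion turns &Rc<Vec<String>> into &[String]; the count
                                                                                                    // is the same before and after the call.
                                                                                                    let n = total_len(&shared);
                                                                                                    assert_eq!(n, 13);
                                                                                                    assert_eq!(Rc::strong_count(&shared), 2);

                                                                                                    drop(other_owner);
                                                                                                    assert_eq!(Rc::strong_count(&shared), 1);
                                                                                                }
                                                                                                ```

                                                                                                The same pattern works with Arc; the difference is only that cloning and dropping an Arc use atomic operations.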

                                                                                                1. 9

                                                                                                  Uhu. It seems to me that there are two schools of thought here.

                                                                                                  One says: .clone(), Rc and RefCell to make life easier.

                                                                                                  The other says: the zen of Rust is ownership: if you express a problem as a tree with clear ownership semantics, then the architecture of your entire application becomes radically simpler. Not every problem has a clean ownership mapping, but most problems do, even if it might not be obvious from the start.

                                                                                                  I don’t know which approach is better for learning Rust. For writing large-scale production apps, I rather strongly feel that the second one is superior. Arcs and Mutexes make the code significantly harder to understand. The last example, a struct where every field is an Arc, is a code smell to me: I always try to push arcs outwards in such cases, and have an Arc of struct rather than a struct of arcs.

                                                                                                  It’s not that every Arc and mutex is a code smell: on the contrary, there’s usually a couple of Arcs and Mutexes at the top level which are the linch-pin of the whole architecture. Like, the whole rust-analyzer is basically an Arc<RwLock<GlobalState>> plus cancellation. But just throwing arcs and interior mutability everywhere makes it harder to note these central pieces of state management.
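
                                                                                                  To make the “Arc of struct rather than struct of arcs” shape concrete, here is a small sketch. The GlobalState fields and the record_request function are invented for illustration and are not taken from rust-analyzer; the point is only that the struct holds plain fields and the sharing plus locking happens once, at the top.

                                                                                                  ```rust
                                                                                                  use std::sync::{Arc, RwLock};
                                                                                                  use std::thread;

                                                                                                  // Hypothetical application state: plain fields, no per-field Arc or Mutex.
                                                                                                  #[derive(Default)]
                                                                                                  struct GlobalState {
                                                                                                      open_files: Vec<String>,
                                                                                                      requests_served: u64,
                                                                                                  }

                                                                                                  // Hypothetical worker step: takes the single lock, updates both fields
                                                                                                  // together, and releases the lock when the guard goes out of scope.
                                                                                                  fn record_request(state: &Arc<RwLock<GlobalState>>, file: &str) {
                                                                                                      let mut guard = state.write().unwrap();
                                                                                                      guard.open_files.push(file.to_string());
                                                                                                      guard.requests_served += 1;
                                                                                                  }

                                                                                                  fn main() {
                                                                                                      // The one Arc<RwLock<..>> at the top level is the shared piece of state.
                                                                                                      let state = Arc::new(RwLock::new(GlobalState::default()));

                                                                                                      let handles: Vec<_> = (0..4)
                                                                                                          .map(|i| {
                                                                                                              let state = Arc::clone(&state);
                                                                                                              thread::spawn(move || record_request(&state, &format!("file{}.rs", i)))
                                                                                                          })
                                                                                                          .collect();
                                                                                                      for handle in handles {
                                                                                                          handle.join().unwrap();
                                                                                                      }

                                                                                                      let guard = state.read().unwrap();
                                                                                                      assert_eq!(guard.requests_served, 4);
                                                                                                      assert_eq!(guard.open_files.len(), 4);
                                                                                                  }
                                                                                                  ```

                                                                                                  With a struct of arcs, the two fields could be locked and updated independently, so a reader could observe a pushed file without the matching counter increment; with one lock around the whole struct that intermediate state is not observable.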

                                                                                                  1. 3

                                                                                                    I’ve always felt that the order of preference for new code is:

                                                                                                    1. make it work
                                                                                                    2. make it pretty
                                                                                                    3. make it fast/resource-efficient

                                                                                                    (some people may choose to wedge in “make it correct” somewhere there, but I think that’s either mostly a pipe dream or already part of 1.)

                                                                                                    That would mean that you always use the easiest possible techniques in phases 1 and 2 and in phase 3 do something more clever but only if the easy techniques turned out to be a bottleneck.

                                                                                                    I’m guessing the easiest technique in Rust terms would be copying a lot.

                                                                                                    1. 2

                                                                                                      I tend to agree about that ordering, but I’ve also found that heap allocation and copying is frequently a bottleneck, so much so that I keep it in mind even in steps 1-2. (Of course this applies to pretty low-level performance sensitive code, but that’s the kind of domain Rust gets used for.)

                                                                                                    2. 3

                                                                                                      I completely agree. If you don’t need precise control over memory, including the ability to pass around refs to memory safely, then the sane choice is to use a well-designed garbage collected language.

                                                                                                      Maybe you’re building something where half needs to control memory and the other half doesn’t. I guess something like this could make sense then.

                                                                                                      1. 2

                                                                                                        Swift isn’t exactly “available” on many Linux distributions due to its overengineered build system. The same goes for dotnet. Both of these languages are extremely fickle and run many versions behind the latest stable release offered on the natively supported OS (macOS for Swift and Windows for dotnet).

                                                                                                        To build Rust is comparatively sane and a breath of fresh air.

                                                                                                        1. 1

                                                                                                          Well, there is OCaml of course. On Linux with a reasonable machine, compiling it from scratch with a C toolchain should take just a handful of minutes. Of course, setting up the OCaml platform tools like opam, dune, etc., will take a few minutes more.

                                                                                                      1. 6

                                                                                                        The thing which comes to mind is https://github.com/apenwarr/redo; it’s not make and not toy, but interesting!

                                                                                                        1. 1

                                                                                                          I second this, it’s such a pleasure to use once you grok it. I used it for the experimental setup of my master’s thesis, where it turned out to be a huge help.

                                                                                                          Although I have to admit that having granularity only down to the file level was sometimes not enough: a simple change in one file could cause a rerun of the entire experiment even though the code in question had not changed at all. But all in all I recommend it, and I am a bit annoyed that I haven’t set it up for my current work.

                                                                                                        1. 7

                                                                                                          it takes advantage of compilation model based on proper modules (crates)

                                                                                                          Hang on a second, crates are not ‘proper’ modules. Crates are a collection of modules. If any file in the crate changes, the entire crate needs to be recompiled.

                                                                                                          ‘Proper’ modules are ones where the unit of compilation is the file, and the build can be parallelized and made much more incremental.

                                                                                                          The first advice you get when complaining about compile times in Rust is: “split the code into crates”.

                                                                                                          When files are compilation units, you split the code into files, which is standard development practice.

                                                                                                          Keeping Instantiations In Check…If you care about API ergonomics enough to use impl trait, you should use inner trick — compile times are as big part of ergonomics, as the syntax used to call the function.

                                                                                                          I think something has gone off the rails if, in the normal course of writing code, you need to use sophisticated techniques like these to contain build times. Hopefully it gets better in the future.

                                                                                                          Great article though.

                                                                                                          1. 10

                                                                                                            Yeah, agree that calling something “proper” without explaining your own personal definition of proper is wrong. What I’ve meant by “proper modules” is more-or-less two things:

                                                                                                            • code is parsed/typechecked only once (no header files)
                                                                                                            • there are explicit up-front dependencies between components, there’s no shared global namespace a-la PYTHONPATH or CLASSPATH

                                                                                                            So, I wouldn’t say that pre C++20 compilation model has “proper modules”.

                                                                                                            That being said – yes, that’s a very important point that the (old) C++ way of doing things is embarrassingly parallel, and that’s a huge deal. One of the problems with Rust builds is that, unlike C++, they are not embarrassingly parallel.

                                                                                                            I’d actually be curious to learn what’s the situation with C++20 – how template compilation actually works with modules? Are builds still as parallel? I’ve tried reading a couple of articles, but I am still confused about this.

                                                                                                            And yeah, it would be better if, in addition to crates being well-defined units with specific interfaces, it were possible to naively process every crate’s constituent modules in parallel.

                                                                                                            I think something has gone off the rails if, in the normal course of writing code, you need to use sophisticated techniques like these to contain build times.

                                                                                                            To clarify, there’s an or statement, in normal application code one should write just

                                                                                                            pub fn read(path: &Path) -> io::Result<Vec<u8>> {
                                                                                                              let mut file = File::open(path)?;
                                                                                                              let mut bytes = Vec::new();
                                                                                                              file.read_to_end(&mut bytes)?;
                                                                                                              Ok(bytes)
                                                                                                            }
                                                                                                            

                                                                                                            There’s no need to make this a template at all, unless you are building a library.
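
                                                                                                            For contrast, here is a sketch of what the library-style version with the “inner trick” could look like. It reuses the body above; the AsRef<Path> bound and the nested function named inner are my assumptions about the usual shape of the trick, not a quote from the article. Only the thin generic shell is instantiated per caller type, while the real work lives in a non-generic function that is compiled once.

                                                                                                            ```rust
                                                                                                            use std::fs::File;
                                                                                                            use std::io::{self, Read};
                                                                                                            use std::path::Path;

                                                                                                            // Generic, ergonomic signature: accepts &str, String, PathBuf, &Path, ...
                                                                                                            pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
                                                                                                                // Non-generic inner function: there is a single copy of this code,
                                                                                                                // no matter how many different P types callers use.
                                                                                                                fn inner(path: &Path) -> io::Result<Vec<u8>> {
                                                                                                                    let mut file = File::open(path)?;
                                                                                                                    let mut bytes = Vec::new();
                                                                                                                    file.read_to_end(&mut bytes)?;
                                                                                                                    Ok(bytes)
                                                                                                                }
                                                                                                                // The generic part only converts the argument and delegates.
                                                                                                                inner(path.as_ref())
                                                                                                            }

                                                                                                            fn main() -> io::Result<()> {
                                                                                                                // Demo file name is arbitrary; it is written to the system temp dir.
                                                                                                                let path = std::env::temp_dir().join("inner_trick_demo.txt");
                                                                                                                std::fs::write(&path, b"hello")?;

                                                                                                                // Two different argument types, one shared `inner`.
                                                                                                                assert_eq!(read(&path)?, b"hello".to_vec());
                                                                                                                assert_eq!(read(path.to_str().unwrap())?, b"hello".to_vec());
                                                                                                                Ok(())
                                                                                                            }
                                                                                                            ```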

                                                                                                            But I kinda disagree with the broader assertion. Imo, in Rust you absolutely should care about compile times when writing application code, the same way you should care what to put in a header file when writing C++. I think we simply don’t know how to make a language which is both fast to run and fast to compile. If you choose Rust, you choose a bunch of accidental complexity, including slower build times. If you don’t care about performance that much, you probably should choose a different language.

                                                                                                            That being said, I would love to read some deeper analysis of D performance though – my understanding is that it, like C++ and Rust, chose the “slow compiler” approach, but at the same time compiles as fast as Go? So maybe we actually do know how to build languages that are both fast to run and fast to compile, just not too well?

                                                                                                            1. 2

                                                                                                              I’d actually be curious to learn what’s the situation with C++20 – how template compilation actually works with modules?

                                                                                                              I believe pretty much like in Rust: templates are compiled to some intermediate representation and then used during instantiation.

                                                                                                              Are builds still as parallel?

                                                                                                              No, again the situation is pretty much like in Rust: a module interface is compiled into BMI (binary module interface; equivalent to Rust’s crate metadata) and any translation unit that imports said module cannot start compiling before the BMI is available.

                                                                                                              I also agree that C++20 module’s approximate equivalent in Rust is a crate (and not a module).

                                                                                                              BTW, a question on monomorphization: don’t arguments’ lifetimes also turn functions into templates? My understanding is that while in C++ we have type and value template parameters, in Rust we also have lifetime template parameters, which turn Rust into an “almost everything is a template” kind of language. But perhaps I am misunderstanding things.

                                                                                                              1. 5

                                                                                                                No, again the situation is pretty much like in Rust: a module interface is compiled into BMI (binary module interface; equivalent to Rust’s crate metadata) and any translation unit that imports said module cannot start compiling before the BMI is available.

                                                                                                                Thank you! This is much more helpful (and much shorter) than the articles I’ve mentioned.

                                                                                                                BTW, a question on monomorphization: don’t arguments’ lifetimes also turn functions into templates?

                                                                                                                That’s an excellent question! One of the invariants of Rust compiler is lifetime parametricity – lifetimes are completely erased after type checking, and code generation doesn’t depend on lifetimes in any way. As a special case, “when the value is dropped” isn’t affected by lifetimes. Rather the opposite – the drop location is fixed, and compiler tries to find a lifetime that’s consistent with this location.

                                                                                                                So, while in the type system type parameters, value parameters and lifetimes are treated similarly, when generating machine code types work like templates in C++, and lifetimes roughly like generics in Java. That’s the reason why specialization takes so much time to ship – it’s very hard to make specialization not depend on lifetimes.
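
                                                                                                                A small sketch of how this shows up in user code (the function names are made up, and this illustrates an observable consequence rather than the compiler internals): a function that is generic only over a lifetime can be turned into one function pointer that serves every lifetime, whereas a function generic over a type has to be instantiated for a concrete type before you can take its pointer.

                                                                                                                ```rust
                                                                                                                // Generic over a lifetime only.
                                                                                                                fn first_word<'a>(text: &'a str) -> &'a str {
                                                                                                                    text.split_whitespace().next().unwrap_or("")
                                                                                                                }

                                                                                                                // Generic over a type: each T used gets its own instantiation.
                                                                                                                fn size_of_pointee<T>(_value: &T) -> usize {
                                                                                                                    std::mem::size_of::<T>()
                                                                                                                }

                                                                                                                fn main() {
                                                                                                                    // One pointer, valid for all lifetimes 'a at once.
                                                                                                                    let f: for<'a> fn(&'a str) -> &'a str = first_word;

                                                                                                                    static LONG_LIVED: &str = "static text";
                                                                                                                    assert_eq!(f(LONG_LIVED), "static");
                                                                                                                    {
                                                                                                                        // A much shorter lifetime goes through the very same pointer.
                                                                                                                        let short_lived = String::from("local text");
                                                                                                                        assert_eq!(f(&short_lived), "local");
                                                                                                                    }

                                                                                                                    // For the type-generic function we must pick T per pointer.
                                                                                                                    let g: fn(&u8) -> usize = size_of_pointee::<u8>;
                                                                                                                    let h: fn(&u64) -> usize = size_of_pointee::<u64>;
                                                                                                                    assert_eq!(g(&0), 1);
                                                                                                                    assert_eq!(h(&0), 8);
                                                                                                                }
                                                                                                                ```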

                                                                                                              2. 1

                                                                                                                Well your definition is interesting, because I actually don’t (exactly) agree.

                                                                                                                • code is parsed/typechecked only once (no header files)

                                                                                                                In OCaml you have ‘header’ or rather interface files, which are parsed/typechecked only once. They’re the equivalent of C++ precompiled headers except the compiler does it automatically.

                                                                                                                • there are explicit up-front dependencies between components, there’s no shared global namespace a-la PYTHONPATH or CLASSPATH

                                                                                                                Again in OCaml, there are dependencies between modules, but they are implicit. They just fail the build if modules are not compiled in the right order with the right dependencies. Fortunately the build system (dune) takes care of all that.

                                                                                                                Also OCaml has a shared global namespace–all source code files automatically become globally-visible modules within the project. Again fortunately the build system provides namespacing within projects to prevent name clashes.

                                                                                                                Another example of ‘proper modules’ is Pascal’s units, which actually satisfies both your above criteria (no header files, and explicit dependencies between units), and provides embarrassingly-parallel compilation.

                                                                                                                I think we simply don’t know how to make a language which is both fast to run and fast to compile.

                                                                                                                That may well be true.

                                                                                                                D

                                                                                                                From what I’ve heard, D compiles fast. And I assume it runs fast too. OCaml is pretty similar e.g. in some benchmarks it has similar single-core performance to Rust.

                                                                                                                1. 1

                                                                                                                  Yeah (and here I would probably be crucified), I’d say that OCaml doesn’t have proper modules :-) Compilation in some order into a single shared namespace is not modular.

                                                                                                                  Rust’s approach with an explicit DAG of crates which might contain internal circular dependencies is much more principled. Though, it’s sad that it lost crate interface files at some point.

                                                                                                                  1. 1

                                                                                                                    I’d say that OCaml doesn’t have proper modules :-)

                                                                                                                    Heh, OK then ;-)

                                                                                                                    Compilation in some order into a single shared namespace is not modular.

                                                                                                                    That’s exactly what Rust crates end up doing. It just shifts the concept of ‘namespace’ into the crate names. Same with Java, C#, etc. Rust:

                                                                                                                    use serde::{Serialize, Deserialize};
                                                                                                                    

                                                                                                                    OCaml:

                                                                                                                    Serde.Serialize.blabla
                                                                                                                    
                                                                                                                    1. 2

                                                                                                                      It just shifts the concept of ‘namespace’ into the crate names

                                                                                                                      Not exactly: Rust crates don’t have names. The name is a property of the dependency edge between two crates. The same crate can be known under different names in two of its reverse dependencies, and that same name can refer to different crates in different crates.

                                                                                                                      1. 1

                                                                                                                        I think this is a distinction without a difference. Rust crates have names. They’re defined in the Cargo.toml file. E.g. https://github.com/serde-rs/serde/blob/65e1a50749938612cfbdb69b57fc4cf249f87149/serde/Cargo.toml#L2

                                                                                                                        [package]
                                                                                                                        name = "serde"
                                                                                                                        

                                                                                                                        And these names are referenced by their consumers.

                                                                                                                        The same crate can be known under different names in two of its reverse dependencies

                                                                                                                        But they have to explicitly rename the crate (see https://stackoverflow.com/a/51508848/20371 ), which makes it a moot point.

                                                                                                                        and that same name can refer to different crates in different crates.

                                                                                                                        Same in OCaml. Different projects can have toplevel modules named Config and have them refer to different actual modules. If there is a conflict it will break the build.

                                                                                                                        1. 2

                                                                                                                          If there is a conflict it will break the build.

                                                                                                                          The build will work in Rust. If, eg, serde some day publishes serde version 2.0, then a project will be able to use both serdes at the same time.

                                                                                                                          So, eg, you can depend on two libraries, A and B, which both depend on serde. Then, if serde published 2.0 and A updates but but B does not, your build will continue to work just fine.

                                                                                                                          Not sure if OCaml can do that, but I think Java (at least pre modules)/C/Python can’t, without some hacks.

                                                                                                                          1. 1

                                                                                                                            That is a design choice. OCaml makes the choice to not allow duplicate dependencies with the same name.

                                                                                                              3. 4

                                                                                                                What do you mean when you say ‘proper’ modules?

                                                                                                                I understand the author to mean a compilation unit, that is, a boundary around which you can count on compilation being separable (and therefore likely also cacheable). In C and C++, that happens to be a file (modulo some confusion about the preprocessor). In Rust, it is a collection of files called a crate. Once you accept that, everything else you say about file-based compilation units holds for Rust crates too: you can parallelize compilation of crates, you can cache compiled crates, and you get faster compile times from splitting code into crates.

                                                                                                                1. 1

                                                                                                                  I defined ‘proper’ modules in the second paragraph of my comment.

                                                                                                                1. 12

                                                                                                                  The post sorely misses a benchmark measuring the actual performance difference. Given that message_is_compressed(message) is probably not message.startswith("{"), the actual JSON parsing should completely dominate the check, and I would be surprised if this indeed makes a measurable difference.
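For anyone who wants to measure this, here is a rough sketch of such a benchmark. Everything in it is an assumption for illustration: the post does not show how message_is_compressed works, so the version below is a hypothetical stand-in that looks at the first byte of a zlib stream, and the message shape and the parse_lbyl/parse_eafp/benchmark names are made up.

```python
import json
import timeit
import zlib


def message_is_compressed(message: bytes) -> bool:
    # Hypothetical stand-in for the post's check: zlib.compress() output
    # at the default settings starts with the byte 0x78.
    return message[:1] == b"\x78"


def parse_lbyl(message: bytes):
    # Look before you leap: run the check on every message, then parse.
    if message_is_compressed(message):
        message = zlib.decompress(message)
    return json.loads(message)


def parse_eafp(message: bytes):
    # Ask for forgiveness: parse first, decompress only if parsing fails.
    try:
        return json.loads(message)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return json.loads(zlib.decompress(message))


def benchmark(number: int = 20_000) -> dict:
    # Time the check alone and both strategies on an uncompressed message,
    # which is the common case described in the thread.
    plain = json.dumps({"id": 1, "tags": ["a", "b"], "body": "x" * 200}).encode()
    return {
        "check only": timeit.timeit(lambda: message_is_compressed(plain), number=number),
        "lbyl": timeit.timeit(lambda: parse_lbyl(plain), number=number),
        "eafp": timeit.timeit(lambda: parse_eafp(plain), number=number),
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f}s")
```

The numbers will depend on the real check and the real message sizes, so this only gives a template for the measurement, not a result.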

                                                                                                                  1. 3

                                                                                                                    True, I felt the same while writing the post. It is from a year ago and was prompted by a chat with a friend, so I didn’t have the chance to measure it.

                                                                                                                    Will try to measure it next time. But the point is that, even unmeasured, a string scan easily adds up in overhead over millions of messages. For ~0.02% of the cases?

                                                                                                                    1. 4

                                                                                                                      What do you mean by “a string scan”? What was the nature of the check to see if a message was compressed? Was it trialing an unpacking of the message?

                                                                                                                      I ask because you’ve brought it up a lot of times, once in reference to checking a single char at the start of the string, and it feels like you’ve been bitten by something related hard enough that you’ve got a visceral reaction here, rather than measuring things out?

                                                                                                                  1. 16

                                                                                                                    Regarding this comment on your website:

                                                                                                                    My goto style when dealing with python dictionaries. In my latest one, I was building a dictionary of lists: try dict[k].append(element), except dict[k] = [element]. Glad to know the EAFP fancy name that I can throw around my peers. And thanks for the good insights on the performance cuts this approach entails.

                                                                                                                    Use a defaultdict. If the key does not exist it defaults to a given type.

                                                                                                                    from collections import defaultdict
                                                                                                                    
                                                                                                                    dict_of_list = defaultdict(list)
                                                                                                                    dict_of_list[key].append(element)
                                                                                                                    

                                                                                                                    It is more readable and faster!
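As a quick side-by-side sketch (the sample pairs are made up for illustration), the try/except version from the quoted comment and the defaultdict version build the same dictionary. This only shows that the results match; it does not measure speed.

```python
from collections import defaultdict

# Made-up sample data: (key, element) pairs to group by key.
pairs = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]

# EAFP version from the quoted comment: append, and create the list on KeyError.
eafp = {}
for key, element in pairs:
    try:
        eafp[key].append(element)
    except KeyError:
        eafp[key] = [element]

# defaultdict version: a missing key is filled with list() on first access.
dict_of_list = defaultdict(list)
for key, element in pairs:
    dict_of_list[key].append(element)

print(eafp)                # {'fruit': ['apple', 'pear'], 'veg': ['kale']}
print(dict(dict_of_list))  # {'fruit': ['apple', 'pear'], 'veg': ['kale']}
```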

                                                                                                                    1. 8

                                                                                                                      Or setdefault method for the dict built-in: d.setdefault(key, []).append(element)

                                                                                                                      1. 5

                                                                                                                        That causes the Python VM to build a new empty list for each element, and to make two method calls instead of one. This adds up when looping, and may fragment the heap.

                                                                                                                        >>> import dis
                                                                                                                        >>> def asd(): return d.setdefault(key, []).append(element)
                                                                                                                        >>> dis.dis(asd)
                                                                                                                          1           0 LOAD_GLOBAL              0 (d)
                                                                                                                                      2 LOAD_METHOD              1 (setdefault)
                                                                                                                                      4 LOAD_GLOBAL              2 (key)
                                                                                                                                      6 BUILD_LIST               0
                                                                                                                                      8 CALL_METHOD              2
                                                                                                                                     10 LOAD_METHOD              3 (append)
                                                                                                                                     12 LOAD_GLOBAL              4 (element)
                                                                                                                                     14 CALL_METHOD              1
                                                                                                                                     16 RETURN_VALUE
                                                                                                                        >>> def asd(): return d[key].append(element)
                                                                                                                        >>> dis.dis(asd)
                                                                                                                          1           0 LOAD_GLOBAL              0 (d)
                                                                                                                                      2 LOAD_GLOBAL              1 (key)
                                                                                                                                      4 BINARY_SUBSCR
                                                                                                                                      6 LOAD_METHOD              2 (append)
                                                                                                                                      8 LOAD_GLOBAL              3 (element)
                                                                                                                                     10 CALL_METHOD              1
                                                                                                                                     12 RETURN_VALUE
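The same effect can be seen without reading bytecode. In this sketch, counted_list is a hypothetical helper that counts how often a fresh list is requested; the keys are made up.

```python
from collections import defaultdict

calls = {"setdefault": 0, "defaultdict": 0}


def counted_list(label):
    # Hypothetical helper: builds a list and records which approach asked for it.
    calls[label] += 1
    return []


keys = ["a", "b", "a", "a", "b"]

plain = {}
for key in keys:
    # The default argument is evaluated on every iteration, hit or miss.
    plain.setdefault(key, counted_list("setdefault")).append(1)

lazy = defaultdict(lambda: counted_list("defaultdict"))
for key in keys:
    # The factory runs only when the key is missing.
    lazy[key].append(1)

print(calls)  # {'setdefault': 5, 'defaultdict': 2}
```

Five lookups over two distinct keys cost five list constructions with setdefault, but only two with defaultdict.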
                                                                                                                        

                                                                                                                        I believe defaultdict simply overrides the __missing__ method of dict, effectively making it an ‘ask for forgiveness’ implementation.
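A minimal sketch of that mechanism, assuming nothing beyond the documented __missing__ hook: ListDict is a hypothetical stand-in written for illustration, not how defaultdict is actually implemented.

```python
from collections import defaultdict


class ListDict(dict):
    # Hypothetical stand-in for defaultdict(list), for illustration only.
    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        value = self[key] = []
        return value


d = ListDict()
d["x"].append(1)
d["x"].append(2)
print(d)             # {'x': [1, 2]}
print(d.get("y"))    # None, because get() does not go through __missing__
print(hasattr(defaultdict, "__missing__"))  # True
print(hasattr(dict, "__missing__"))         # False
```

Only the d[key] lookup triggers the hook, so get() and the in operator still treat the key as absent.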