1. 4

    I feel kinda jaded. Every so often somebody will say ‘look at this horrifying bit of c syntax (that we’re stuck with because some 1970s compiler supported it by accident)’; or awful ioctl archeology, horrifying vt * or … but it doesn’t really faze me anymore because I’ve seen it a thousand times. (Adam and bjorn, else-thread, I’m sure can say the same.) God, what a mess computers are. I guess this is ‘acceptance’.

    1. 43

      Here’s a fun anecdote (which is one of the many anecdotes that eventually educated my guess that UI intuitiveness is largely bullshit).

      In my corner of the world, we don’t have folders. I mean we don’t use them. Neither me, nor my computer-using peers from like 25+ years ago, when we were just learning about computers, had ever seen a real-life folder. The first time I actually saw one, for real, was at some point in 2008 or so, I think, when an American prof was visiting the research lab I was working at and he had a few with him.

      We do use things that are kind of like folders, but they look nothing like the icon. To make matters worse, one of the words we use to refer to those things that are kind of like folders is the same one used to translate the English word ‘file’. There is no direct equivalent for ‘file’ – we’d just call it a document instead, or, if it’s literally just one page, the literal translation of the word we have is ‘paper’.

      None of the mainstream operating systems were localized for our country until the mid ’00s or so. So a whole generation of computer users, including mine, grew up using computers that:

      • Used the term “folder” for something that we’d never seen and which we would’ve called a “file”
      • Used the term “file” for something you put into “folders”, which was basically the other way ’round for us (a file is something you put things in, not a thing you put in something else!)

      It took about 30 seconds to explain what each thing was and we all went our merry ways and used computers productively – which we still do, for that matter. The “local” equivalent for “file” now actually has a double meaning: it still denotes what Americans would call a folder (albeit a special kind of folder, more or less – specifically, one whose content is sorted in a particular way), but it’s understood that, if you’re referring to it in a computer context, it means something else.

      The fact that so many companies are spending so much time trying to make files and folders and their organization more intuitive is, IMHO, just another symptom of so many companies in our industry having run out of relevant things to do in exchange for the money you pay them, so they settle for changing the part that’s easiest to bikeshed and most visible when it’s modified – i.e. the interface.

      What this article describes as novel – people just dumping all their files in a single directory, typically the desktop, since it’s the one you can reach with the least clicks, or maybe “Documents” if you have enough things – basically describes how everyone who’s not a computer nerd has been using personal computers since hard drives became cheap enough to render floppies relevant only for file transfer. So… roughly 30 years or so? In these 30 years, I’ve seen maybe half a dozen neatly-organised collections of folders and files, all of them belonging to programmers like myself, or people in other technical fields (e.g. various flavours of engineering). At best, most people will maybe have some folders on their desktop called “Work”, “Personal” and “Shit”, which they never refer to as “folders” but as “Work”, “Personal”, and “Shit”. That’s the apex of document organisation.

      The one difference that even cheaper local storage, along with cloud storage, has brought is that concepts like “multiple drives” are now pretty much irrelevant for many computer users.

      Also, I like to point out things like these:

      “As much as I want them to be organized and try for them to be organized, it’s just a big hot mess,” Vogel says of her files. She adds, “My family always gives me a hard time when they see my computer screen, and it has like 50 thousand icons.”

      every time someone in UX wants to sell me on a “clean” and “polished” interface that can show like twelve icons at a time, at most, all of them huge and with acres of space between them. It’s incredible how a field that literally has “user” in its name is so incredibly disconnected from how users, uhm, use computers.

      1. 16

        a file is something you put things in, not a thing you put in something else!

        This is actually true in English too. Trying to teach people about “files” and “directories” (or, later, “folders”) in the ‘90s was really hard: no-one could understand why the document was called a file when a file is a thing you store documents in. I used to describe it as a “file” for bytes, or something like that, but I don’t really know why they called it a file.

        1. 19

          I think it may date to mainframe operating systems or COBOL where what a “file” contained was “records”.

        2. 7

          What I find interesting is the disconnect between real-world organisation and the attempts at translating this (and failing) into 2D on a screen. I have the desktop organisation problem of “icons accumulating” -> moved into “world domination 3” and creating a 4 when that is too full. Neither are explored for what they are – it is either “browse as thumbnails” or grep and nothing in between. “Minimalism” didn’t work, “skeuomorphism” didn’t work, ontologies didn’t work. xdg-user-dirs makes me want to punch someone – there’s more documents in Downloads than in Documents.

          At the same time there are, at the least, a few hundred ‘data stores’ in my lab in various bins, and if I close my eyes and think a little bit, I can mentally walk through what is on the cupboards in the bathroom and most everything in the kitchen and closets with fair accuracy. Nothing of this translates to the desktop.

          1. 6

            I found that while ontologies don’t work, as in, they don’t work for every case, they do work in certain situations and they emerge organically when needed.

            Personal example: while my downloads/documents are a dumping ground, my actual documents (invoices, contracts, etc.) are uploaded with relevant tags and dates to Zoho docs.

            Business: almost every company I’ve seen which has a shared drive has a pretty good tree of documents not handled by a specialised service. For example invoices/plans per customer, documentation per project, etc. I’m not aware of any of them getting someone to plan/organise it. It’s just what happens naturally when you have a team of people who need to refer to those files daily.

            1. 5

              there’s more documents in Downloads than in Documents.

              On some systems this is a consequence of web browsers configured to download all files in a fixed place (eg. ~/Downloads). I always make sure to disable this option and get the browser to ask me where to place each document. They often go to /tmp, which will be wiped on reboot, but for the ones I plan to keep I have to place them in the directory tree right away.

              I understand that this “everything goes to Downloads” option was chosen as a default because most users don’t have an established folder hierarchy they care about – so asking them to place each document is a burden to them – but it also reinforces this tendency to have dumping grounds instead of structure. I wonder what a good UI design to nudge people towards more organization (when useful) would be.

              1. 5

                RISC OS has always worked like that: when you ask to save a new file (from any program, not just a browser) there is no default location. It just shows a little save box with the file icon and name, and you have to drag it to a folder somewhere:

                https://www.riscosopen.org/wiki/documentation/show/Quick%20Guide:%2011.%20Drag%20to%20Save

                I tried to get Linux to work like that a long time ago (http://rox.sourceforge.net/desktop/node/66.html) but it didn’t catch on, and most Linux desktops copied the Windows UI where everything ends up in a big unordered mess by default.

                1. 2

                  I tried to get Linux to work like that a long time ago (http://rox.sourceforge.net/desktop/node/66.html) but it didn’t catch on

                  This is great as one of the options. If I already have opened a file manager with a given folder, I would be happy if I can just drag the icon from an application to save the file. But in other cases when I have no file manager window opened, I want to save the file through the standard save dialog, because starting a file manager and dragging the icon will be cumbersome and annoying.

                  I would appreciate such a draggable icon especially in the screenshot application. However, it does not fully replace the save dialog.

                  1. 3

                    because starting a file manager and dragging the icon will be cumbersome and annoying

                    On RISC OS the file manager is always running, and you usually just keep open the folder(s) for the project you’re working on. For power users, the drag bit can get annoying; I actually wrote a little utility (called TopSave) that made it save to the top-most folder in the window stack if you press Return in a save box with no existing path.

                    You really don’t want a file-manager inside the save box though because:

                    1. It takes up a load of room, possibly covering the folder you want to drag to.
                    2. It will inevitably open in the wrong place, whereas you probably already have the project folder open (and if not, you can get to it faster using your regular desktop shortcuts).
                    3. If the filer is in the save box, then it goes away as soon as you finish the save operation. Then you can’t easily use the same folder for the next part of the project (possibly with a different application).

                    The Unix shell actually feels similar in some ways, as you typically start by cding to a project directory and then running a bunch of tools. The important point is that the directory persists, and you apply multiple tools to it, rather than documents living inside apps.

                    1. 1

                      Disclaimer: I wrote a utility for Windows to deal with this in the stock file dialogs: https://github.com/NattyNarwhal/OpenWindows

              2. 5

                This is one of the things I miss about “proper” spatial file managers, clunky as they were. The file-and-folder organisation method inherently retains the limits of the real-life equivalent that it drew inspiration from which, limited though it may be, is actually pretty reliable (or should I say “was” already?) – IMHO if it was good enough to develop antibiotics or send people to the Moon, it’s probably good enough for most people today, too.

                But in addition to all of those limits, it also ended up with a few of its own, that greatly diminished its usefulness – such as the fact that all folders look the same, or that they have no spatial cues. There’s no efficient computer equivalent for “grab the red folder called ‘World Domination through Model Order Reduction’ from that big stack on the left of the middle shelf” – what would require sifting through four or five folders at the top of a stack devolves into a long sequence of clicks and endless scrolling.

                Users unsurprisingly replicated the way these things are used IRL, with some extra quirks to work around the additional limitations, like these huge “World Domination” folders for things that you probably want to keep but aren’t really worth the effort of properly filing inside a system that not only makes it hard to file things in the first place, but makes it even harder to retrieve them afterwards.

                Spatial file managers alleviated at least some of that, not very well, but better than not at all. They fell out of use (for a lot of otherwise valid reasons, though many of them relevant mostly for low-res screens) quite quickly, unfortunately. The few that remain today are pretty much useless in spatial mode because their “clean” interfaces don’t lend themselves easily to browsing more than a handful of items at a time. It was not even close to the efficiency of cupboards and shelves, but it was a little better.

                Most of the software industry went the other way and doubled down on filing instead, drawing inspiration from libraries, and giving people systems that worked with mountains of metadata and rigid hierarchical systems, on top of which they sprinkled tags and keyword searches to alleviate the many problems of rigid hierarchical systems. IMHO this is very much short-sighted: it works for libraries and librarians because filing things is literally part of a librarian’s job, and librarians have not just a great deal of experience managing books & co. but also an uncanny amount of specialised education and training, they don’t just sit in the library looking at book covers. And even they rely enormously on spatial cues. Besides being mostly unworkable for people who aren’t librarians, most of these systems are also pretty inefficient when dealing with information which is unlike that which goes in a library – well-organised, immutable (books may have subsequent editions but you don’t edit the one in a library) bundles of information on a fixed set of topics, with carefully-curated references to other similar works, which you have to find on demand for other people for the next 30 years or so. Things you work with on a daily basis are nothing like that.

                1. 4

                  But in addition to all of those limits, it also ended up with a few of its own, that greatly diminished its usefulness – such as the fact that all folders look the same, or that they have no spatial cues.

                  One of the nice things about OS/2’s Workplace Shell was that you could customize the appearance of folders — not just the icon, but also the background of its open window, and probably some other things I don’t remember. Pretty sure that MacOS 8 (maybe later versions of System 7?) let you at least color-code folders.

                  1. 6

                    Mac OS has supported custom folder icons since 1991, and special folders like home, Downloads, etc. have special icons by default. Colors have been supported since about 1987, but lately they’ve been repurposed as tags, and these days the folder itself isn’t colored, there’s just a colored dot next to it.

                    Pre-X, folders had persistent window positions and sizes, so when you reopened a folder it kept the same place onscreen. This really helped you use visual memory. Unfortunately the NeXT folks never really “got” that, and this behavior was lost in 10.0.

                    1. 4

                      Yep! On the Linux side, Konqueror and I think Nautilus up to a point allowed this, too, IIRC Finder dropped it a long time ago. Most file managers dropped it, lest users would commit design heresy and ruin the consistency of the UI by indulging in such abominable sin as customising their machines. Most contemporary file managers just have basic support for changing folder icons (and quite poorly – I don’t know of any popular manager that adequately handles network mounts or encrypted folders, for example).

                    2. 1

                      I have a few ongoing “experiments” in this space. They are quite slow moving as they all take 90% engine-development, 10% implementing the concept. Experiment is a bit of a misnomer as the budget for modelling, generalisation and qualitative user studies is quite ehrm, anaemic.

                      simple - A damage control form of the ‘most of the software industry’ form.

                      1. User-defined namespaces (so a tag / custom root).
                      2. Indexing and searching is done per/namespace and not the finder-i-can’t-find-her. Don’t want to search my company docs when it is my carefully curated archives of alt.sex.stories-repository I am after.
                      3. Navigation-map, forcing a visual representation to be sampled for each document, stitched together into larger tilemaps.

                      wilder - A form of what the mobile phone space does (ignoring Android EXTERNAL_STORAGE_SDCARD etc.)

                      1. Application and data goes together (me suggesting coupling? wth..) VM packaged, I might need 4 versions of excel with absolutely no routable network interfaces way too often. Point is, the software stays immutable, VMM snapshot / restore becomes data storage controls. “File Association” does not bleed outside the VM.
                      2. Application (or guest additions for the troublemakers) responsible for export/import/search.
                      3. Leverage DnD/Clipboard-like semantics. I take it you are familiar, but for the sake of it - DnD etc. involve type negotiation already: source presents sets of possible export types, sink filters that list, best match is sent. Replace the sink with a user-chosen interface (popup, context-sensitive trigger, whatever).

                      all-the-way-to-11: That it would take me this long to mention AR/VR.

                      1. The VR WM I have was built with this in mind, a workspace (or well, safe-space) is a memory palace.
                      2. The layouter (WM policy) is room-scale (Vive and so on).
                      3. Each model added represents either a piece of data as is, or as an expandable iconic representation of something from the simple/wilder cases.

                      1. 1

                        Indexing and searching is done per/namespace

                        Why have ‘namespace’ as built in instead of an arbitrary user-definable tag?

                        forcing a visual representation to be sampled for each document

                        Probably some interesting things to be done with text viz, following cantordust. But I don’t know if you can get it to be both distinctive and stay stable as a document changes. And of course the whole zoo of other non-image formats—audio, zip/tar/, iso, subtitles, executables, random noise, …—need to be handled. And if you’re not careful with your processing that’s a DOS.

                        1. 2

                          Why have ‘namespace’ as built in instead of an arbitrary user-definable tag?

                          Externally defined tag so that it can be combined with system services, e.g. mounting daemon triggered arcan_db add_appl_kv arcan ns_some_guid some_user_tag

                          Probably some interesting things to be done with text viz, following cantordust. But I don’t know if you can get it to be both distinctive and stay stable as a document changes. And of course the whole zoo of other non-image formats—audio, zip/tar/, iso, subtitles, executables, random noise, …—need to be handled. And if you’re not careful with your processing that’s a DOS.

                          You mean like Senseye? Quite certain that one went a lot further than cantor :-P. I didn’t exactly stop working on it – just stopped publicising/open sourcing.

                    3. 4

                      Nothing of this translates to the desktop

                      You wander into TikTok and hit a specific icon and scroll for 3 pages and the thing you want is now on screen.

                      Poor Unix. All this time with a single root while users gravitate to stuff stored under multiple roots/apps.

                    4. 4

                      The “folder” metaphor was already sort of niche when folks at Xerox PARC invented it for the Star in the late 70s. People who worked in offices used them, but I’m sure a lot of the US population didn’t. And the metaphor never worked for hierarchies anyway. Still, it was useful for its initial target audience.

                      Humans just aren’t good at mental models of hierarchies or recursive structures. We use them anyway, of course, because they’re essential, but they don’t come naturally.

                      1. 6

                        You’ve made me realize I actually use folders like real folders. I have a mostly flat Documents folder, and the only folders inside it exist to group a small amount of related documents. Taxes 2021, Camera Manuals, and so on and so forth. Nothing nested more than 1 level deep, no folders used as “categories” or any other kind of hierarchical concept.

                        Finding Generic Form Name.pdf without context of a folder would be so annoying. I definitely don’t rename things, so they have to at least be in folders.

                        1. 4

                          In my corner of the world, we don’t have folders. I mean we don’t use them. Neither me, nor my computer-using peers from like 25+ years ago, when we were just learning about computers, had ever seen a real-life folder.

                          As I understand it, “folder” means this sort of thing, but what do you call this sort of thing, which is more common (even today)? In e.g. Dutch there are separate words for this (“map” or “ordner” for the second one, “map” typically used to translate “folder”), but I’m not sure about English?

                          Either way, I think this doesn’t really matter; it’s essentially about the mental model of a hierarchical file structure, and whether you call it “folder” with an etymology some people may not follow or something else isn’t all that important.

                          I don’t think hierarchies are all that unintuitive; there are many (simple) ones in every-day life: in a library you have “fiction” further subdivided in categories, and “science” further divided in categories, etc. On a restaurant menu it’s the same: “starter/vegetarian/[..]”. In Amazon.com there’s a whole structure for products, etc. These are essentially not all that different.

                          1. 6

                            That sort of thing we call a binder.

                            1. 2

                              Vegetarian / vegan is more like a tag than a component of a hierarchy. Same with other common menu tags like spicy, gluten-free, and so on. They can apply to any menu item regardless of category.

                              On Amazon I rarely use the category hierarchy, and stuff I’m looking for often legitimately falls under multiple categories in the hierarchy.

                              1. 2

                                In the Nordic languages and Icelandic, we also say “mappe”/“mapp”/“mappa” for a folder, but I would say that’s the name of the first physical thing. The second thing is definitely a “ringperm” (no computing analogy).

                                But what about a file? We have the word “fil”, which in its physical form is the same tool as an English “file” – the prison escape tool. Just unambiguous. Maybe an unfortunate analogy, but at the same time so nonsensical that there is no confusion – people take it from context, and you can always say computer file (“datafil”) to be precise. Edit: LOL, you do the same in Dutch: “computerbestand”

                                1. 2

                                  Oh, that second kind of thing is even cooler: the word we use for it is a portmanteau of the words used for “library” and “shelf”. Due to its extensive use in public administration, this object is so loathed that I doubt anyone would try to use it in an interface, except maybe in order to sabotage their own company :-P.

                              2. 4

                                That’s a very interesting perspective. If you don’t mind sharing, what is your native tongue?

                                FWIW I use a hierarchical structure in my documents and a date-based structure for photos. I can’t bear the idea I need to run a program that eats dozens of gigabytes of disk space and more than an entire CPU core to index all of those things just so I can press a shortcut and type in a few letters of the file I am looking for. I hate to say it, but with my minimal investment in a mental model, the man pages for ‘find’ and ‘rg’, and a graphical preview in my file browser have largely eliminated the ongoing cost of file indexing for me. I started on computers with a Z80 and an 80386 processor. “Waste” is a hard coded thing to shed at every opportunity for me.

                                1. 3

                                  |one of the many anecdotes that eventually educated my guess that UI intuitiveness is largely bullshit

                                  In humans, intuition is bullshit. Everything is learned.

                                  Somebody quipped “The only intuitive interface is the nipple; after that it’s all learned.” It might have been Bruce Ediger. Doesn’t matter who it was: turns out that humans don’t have much of an intuition for nipples, either. Breastfeeding techniques need to be learned – babies have an instinct to suck on something that’s tickling their lower lip, but mothers have no instincts about it at all. There is a chain of teaching that goes back to, very likely, the first primates that held their babies off the ground – and we only know it exists, rather than being “intuitive” or “instinct”, because of the successful marketing efforts of formula companies in the twentieth century.

                                  1. 3

                                    There is no direct equivalent for ‘file’ – we’d just call it a document instead

                                    I might be misunderstanding something (not a native English speaker, and reading between the lines), but it seems that you think that “document” and “file” are synonyms in English in their “physical” sense.

                                    What “file” means in English is, first, a verb meaning to organize, or to submit: there’s “filing cabinet”.

                                    As a noun, file is literally a folder :)

                                    Definition of file (Entry 5 of 8)

                                    1 : a device (such as a folder, case, or cabinet) by means of which papers are kept in order

                                    https://www.merriam-webster.com/dictionary/file#other-words

                                    1. 4

                                      it seems that you think that “document” and “file” are synonyms in English in their “physical” sense.

                                      Ah, no, I only meant this in the “computer” sense :). Way back (this was in the age of Windows 98, pretty much) when I tried to explain what folders and files were, the question that always popped up was “this is just an image I drew in Paint, how is this a file”, followed closely by “is this a file, as Windows Explorer claims it is, or a document, as Word calls it?”. Hence this… weird thing.

                                  1. 1

                                    From the linked video:

                                    The stack [interpreter] has done three instructions, whereas the register [interpreter] has only done two instructions

                                    Generally, stack machines need more instructions, in order to shuffle data around. You see this especially with getlocal and storelocal (or equivalent) instructions, which are completely obviated on a register machine.
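
                                    To make the instruction-count point concrete, here is a toy sketch in C (the opcodes and encoding are invented for illustration – they are not the ones from the video) interpreting “c = a + b” both ways:

                                    #include <stdio.h>

                                    enum { S_GETLOCAL, S_ADD, S_SETLOCAL, S_HALT };   /* toy stack VM    */
                                    enum { R_ADD, R_HALT };                            /* toy register VM */

                                    int main(void) {
                                        int locals[3] = { 2, 3, 0 };                   /* a, b, c */

                                        /* Stack VM: operands are shuffled through the stack, so "c = a + b"
                                           takes four instructions, two of them just moving locals around. */
                                        int scode[] = { S_GETLOCAL, 0, S_GETLOCAL, 1, S_ADD, S_SETLOCAL, 2, S_HALT };
                                        int stack[8], sp = 0, n = 0;
                                        for (int pc = 0; scode[pc] != S_HALT; n++) {
                                            switch (scode[pc]) {
                                            case S_GETLOCAL: stack[sp++] = locals[scode[pc + 1]]; pc += 2; break;
                                            case S_ADD:      sp--; stack[sp - 1] += stack[sp];    pc += 1; break;
                                            case S_SETLOCAL: locals[scode[pc + 1]] = stack[--sp]; pc += 2; break;
                                            }
                                        }
                                        printf("stack VM:    c = %d after %d instructions\n", locals[2], n);

                                        /* Register VM: one three-operand instruction, no getlocal/storelocal. */
                                        locals[2] = 0;
                                        int rcode[] = { R_ADD, 2, 0, 1, R_HALT };      /* c = a + b */
                                        n = 0;
                                        for (int pc = 0; rcode[pc] != R_HALT; n++) {
                                            switch (rcode[pc]) {
                                            case R_ADD:
                                                locals[rcode[pc + 1]] = locals[rcode[pc + 2]] + locals[rcode[pc + 3]];
                                                pc += 4; break;
                                            }
                                        }
                                        printf("register VM: c = %d after %d instructions\n", locals[2], n);
                                        return 0;
                                    }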

                                    And regarding ‘reasonable encoding’, there are much more compact encodings. There are also much more extensive encodings (which you almost certainly want). ‘Increment’ and ‘add immediate’ are almost certainly things you want, on either style of vm.

                                    [compiler is not cheap]

                                    Both truffle/graal and jitter are able to automatically generate a compiler from a ‘cheap’ interpreter. I believe pypy does similarly. Additionally, compilers are not actually so hard as the video makes them out to be; a very dumb compiler is similarly complex to a very smart interpreter, and similarly performant as well.

                                    1. 1

                                      Would you have a link for Jitter? I searched, but not sure what I am finding is related.

                                      1. 2
                                        1. 1

                                          Thanks!

                                    1. 1

                                      Threads don’t exist in standard C++

                                      praise Boehm

                                      1. 6

                                        I think this also demonstrates the importance of iterating on your language design early – which Swift did in an aggressive way in versions 2 and 3, to the point that the 1.0 language looks more like Kotlin or something. I think it also points out that backward compatibility in every aspect can’t be sacred.

                                        There is an anecdote by the famous K or R, about how one of C’s warts was apparent to them in the mid-seventies, “but by then, the compiler was used on over a dozen sites, and we couldn’t distribute a breaking change”. Instead, such warts spread to millions of sites over the course of decades.

                                        1. 4

                                          Yeah, to me the takeaway is that if you’re going to make a “popular” language, you will need an automatic code rewriting tool for the early versions, or else you’ll be stuck with a lot of junk when you do stabilize.

                                          1. 4

                                            There is an anecdote by the famous K or R, about how one of C’s warts was apparent to them in the mid-seventies, “but by then, the compiler was used on over a dozen sites, and we couldn’t distribute a breaking change”. Instead, such warts spread to millions of sites over the course of decades.

                                            I believe that was about the use of tabs in make, by Stuart Feldman.

                                            1. 2

                                              That would explain why I never found a reference.

                                              1. 1

                                                And fixing it could well have killed it.

                                            1. 2

                                              ‘the DragonFlyBSD scheduler will use hyperthread pairs for correlated clients and servers’

                                              Out of curiosity, do any other operating systems do this as well?

                                              1. 1

                                                I wonder if this is a good idea. Hyperthreaded pairs share compute and provide a benefit only in case of an empty pipeline due to stalling memory operations. Maybe using nearby distinct cores instead would be better? Same CCD for AMD, nearby cores on the same ringbus for ringbus Intels, same column/row for mesh Intels.

                                                1. 1

                                                  If async: shared L1 makes shmem ops much faster.

                                                  If sync: whenever server is working, that means client is waiting for it, and vice versa.

                                              1. 5

                                                Dragonfly has come a long way since; now they’re trading blows with Linux on the performance front, despite the tiny team, particularly when contrasting it with Linux’s huge developer base and massive corporate funding.

                                                This is no coincidence; it has to do with SMP leveraged through concurrent lockfree/lockless servers instead of filling the kernel with locks.

                                                1. 3

                                                  This comparison, which seems pretty reasonable, makes it look like it’s still lagging behind.

                                                  1. 7

                                                    What I don’t like about Phoronix benchmark results generally is that they lack depth. It’s all very well to report MP3 encoding test running for 32 seconds on FreeBSD/DragonflyBSD and only 7 seconds on Ubuntu, but that raises a heck of a question: why is there such a huge difference for a CPU-bound test?

                                                    Seems quite possible that the Ubuntu build is using specialised assembly, or something like that, which the *BSD builds don’t activate for some reason (possibly even because there’s an overly restrictive #ifdef in the source code). Without looking into the reason for these results, it’s not really a fair comparison, in my view.

                                                    1. 3

                                                      Yes. This is well worth a read.

                                                      Phoronix has no rigour; it’s a popular website. A benchmark is useless if it is not explained and defended. I have no doubt that the benchmarks run in TLA were slower under freebsd and dragonflybsd, but it is impossible to make anything of that if we do not know:

                                                      1. Why

                                                      2. What is the broader significance

                                                      1. 4

                                                        The previous two comments are fair, but at the end of the day it doesn’t really change that LAME will run a lot slower on your DragonflyBSD installation than it does on your Linux installation.

                                                        I don’t think these benchmarks are useless, but they are limited: they show what you can roughly expect in the standard stock installation, which is what the overwhelming majority of people – including technical people – use. This is not a “full” benchmark, but it’s not a useless benchmark either, not for users of these systems anyway. Maybe there is a way to squeeze more performance out of LAME and such, but who is going to look at that unless they’re running some specialised service? I wouldn’t.

                                                    2. 1

                                                      This comparison, newer and from the same website, makes it look like it’s the system that’s ahead (see geometric mean @ last page).

                                                      Not that I’m a fan of that site’s benchmarks.

                                                      1. 2

                                                        I haven’t done the math, but it seems like most of DragonFlyBSD’s results come from the 3 “Stress-NG” benchmarks, which incidentally measures “Bogo Ops/s”.

                                                        Here’s the benchmark page: https://openbenchmarking.org/test/pts/stress-ng

                                                        I don’t know why Phoronix uses a version called 0.11.07 when the latest on the page seems to be 1.4.0, but maybe that’s just a display issue.

                                                        1. 1

                                                          Christ @ benchmarking with Bogo anything.

                                                  1. 11

                                                    Ah.

                                                    My pet hobby horse.

                                                    Let me ride it.

                                                    It’s a source of great frustration to me that formal methods academia, compiler writers and programmers are missing the great opportunity of our life time.

                                                    Design by Contract.

                                                    (Small Digression: The industry is hopelessly confused by what is meant by an assert. And subtle disagreements about what is meant or implied by different programmers are an unending source of programmers talking past each other).

                                                    You’re welcome to your own opinion, but for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment.

                                                    By an assert in the following I mean, it’s a programmer written boolean expression, that if it ever evaluates to false, the programmer knows that the preceding code has an unknown bug that can only be fixed or handled by a new version of the code.

                                                    If it evaluates to true, the programmer fully expects the subsequent code to work and that code will fully rely on the assert expression being true.

                                                    In fact, if the assert expression is false, the programmer is certain that the subsequent code will fail to work, so much so, there is no point in executing it.

                                                    So going back to DbC and formal methods.

                                                    Seriously. Writing postconditions is harder than just writing the code. Formal methods are way harder than just programming.

                                                    But we can get 90% of the benefit by specializing the postconditions to a few interesting cases…. aka. Unit testing.

                                                    So where can Formal Methods really help?

                                                    Assuming we’re choosing languages that aren’t packed with horrid corner cases… (eg. Signed integer overflow in C)…

                                                    Given a Design by Contract style of programming, where every function has a bunch of precondition asserts and a bunch of specializations of the postconditions……
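
                                                    Concretely, the style I mean looks roughly like this (a made-up C example; the function and the checks are just for illustration):

                                                    #include <assert.h>
                                                    #include <stddef.h>

                                                    /* Preconditions: if either assert fires, the caller has a bug, and the
                                                       code below makes no attempt to cope with that case. */
                                                    static int max_of(const int *buf, size_t len) {
                                                        assert(buf != NULL);
                                                        assert(len > 0);

                                                        int best = buf[0];
                                                        for (size_t i = 1; i < len; i++)
                                                            if (buf[i] > best)
                                                                best = buf[i];
                                                        return best;
                                                    }

                                                    /* The full postcondition ("the result is >= every element and equal to
                                                       one of them") is harder to write than the code itself, so we check a
                                                       few specialisations of it instead -- i.e. a unit test. */
                                                    int main(void) {
                                                        int a[] = { 3, 9, 2 };
                                                        assert(max_of(a, 3) == 9);

                                                        int b[] = { -5 };
                                                        assert(max_of(b, 1) == -5);
                                                        return 0;
                                                    }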

                                                    My dream is a future where formal methods academics team up with compiler writers and give us…

                                                    • Assuming that every upstream assert expression is true, if it can be determined that any downstream assert will fail, the compile will fail with a useful warning.
                                                    • Where for every type, the programmer can associate an invariant expression, and the compiler will attempt to verify that it is true at the end of the constructor, and the start and end of every public method and at the start of the destructor. If it can’t, it will fail the compile with a warning.
                                                    • Wherever a type is used, the invariant expression can be used in this reasoning described above.

                                                    So far, you might say, why involve the compiler? Why not a standalone linter?

                                                    Answer is simple… allow the optimizer to rely on these expressions being true, and make any downstream optimizations and simplifications based on the validity of these expressions.

                                                    A lot of optimizations are based on dataflow analysis; if the analysis can be informed by asserts, and the analysis can check the asserts, and be made more powerful and insightful by relying on these asserts… then we will get a massive step forward in performance.

                                                    My experience of using a standalone linter like splint… is it forces you to write in a language that is almost, but not quite like C. I’d much rather whatever is parsed as a valid (although perhaps buggy) program in the language by the compiler, is parsed and accepted as a valid program by the linter (although hopefully it will warn if it is buggy), and vice versa.

                                                    I can hear certain well known lobste.rs starting to scream about C optimizers relying on no signed integer overflow, since that would be, according to the standard, undefined, and result in generated assembler that produces surprised-pikachu-faced programmers.

                                                    I’m not talking about C. C has too much confused history.

                                                    I’m talking about a new language that out of the gate takes asserts to have the meaning I describe and explains carefully to all users that asserts have power, lots and lots of power, to both fail your compile AND optimize your program.

                                                    1. 5

                                                      As someone who has been using Frama-C quite a lot lately, I can’t but agree with this. There’s potential for a “faster than C” language that is also safer than C because you have to be explicit with things like overflow and proving that some code can’t crash. Never assume. Instead, prove.
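
                                                      As a small made-up example, an ACSL contract like this turns “it won’t overflow” from an assumption into an obligation that the WP plugin (frama-c -wp) has to actually prove:

                                                      #include <limits.h>

                                                      /* The requires clause states the "no signed overflow" condition explicitly;
                                                         callers must be proven to satisfy it, and the body must be proven to meet
                                                         the ensures clause. */
                                                      /*@ requires 0 <= x && 0 <= y && x + y <= INT_MAX;
                                                          ensures \result == x + y;
                                                      */
                                                      int checked_add(int x, int y)
                                                      {
                                                          return x + y;
                                                      }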

                                                      1. 3

                                                        for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment

                                                        Didactic/pedagogical critique: in such a case, it may be more appropriate to introduce a new term rather than using one which has a common lay meaning.

                                                        My dream is a future where formal methods academics team up with compiler writers and give us […]

                                                        Sounds a lot like symbolic execution.

                                                        1. 3

                                                          using one which has a common lay meaning.

                                                          Do assertions have a different meaning than the one given here?

                                                          1. 4

                                                            I have colleagues for whom it means, “Gee, I didn’t think that input from outside the system was possible, so I want to know about it if I see it in unit test, and log it in production, but I must still handle it as a possibility”.

                                                            When I personally put in an assert at such a point, I mean, “a higher layer has validated the inputs already, and such a value is by design not possible, and this assert documents AND checks that is true, so in my subsequent code I clearly don’t and won’t handle that case”.

                                                            I have also seen debates online where people clearly use it to check for stuff during debugging, and then assume it is compiled out in production and hence has no further influence or value in production.
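
                                                            Roughly the difference, in code (SOFT_ASSERT is just a macro I made up for illustration, not from any library):

                                                            #include <assert.h>
                                                            #include <stdio.h>

                                                            /* Meaning 1: "shouldn't happen, log it, but still handle it". */
                                                            #define SOFT_ASSERT(cond) \
                                                                do { if (!(cond)) fprintf(stderr, "unexpected: %s\n", #cond); } while (0)

                                                            int clamp_percentage(int value) {
                                                                SOFT_ASSERT(0 <= value && value <= 100);
                                                                if (value < 0)   value = 0;       /* the "impossible" case is still handled */
                                                                if (value > 100) value = 100;
                                                                return value;
                                                            }

                                                            /* Meaning 2 (mine): a higher layer already validated this, so the code
                                                               below simply does not handle the out-of-range case at all. */
                                                            int scale_percentage(int value) {
                                                                assert(0 <= value && value <= 100);
                                                                return (value * 255) / 100;       /* relies on the assert being true */
                                                            }

                                                            int main(void) {
                                                                printf("%d %d\n", clamp_percentage(140), scale_percentage(40));
                                                                return 0;
                                                            }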

                                                            1. 1

                                                              GTest (Google Test), which I’ve recently had to use for C++ school assignments, refers to this in their macro names as EXPECT. Conditions whose failure is fatal are labelled with ASSERT. This makes intuitive sense to me: if you expect something to be true you accept its potential falsehood, whereas when you assert something to be true you reject its potential falsehood.

                                                              1. 1

                                                                TIL! Thanks! I only knew about assertions from the contract perspective.

                                                              2. 1

                                                                A common design choice is that assertions are evaluated in test environments, but not production. In that case, plus a test environment that you’re not confident fully covers production use cases, you might use assertions for hypotheses about the system that you’re not confident you can turn into an error yet.

                                                                I’m not sure that’s a good idea, but it’s basically how we’ve used assertions at my current company.

                                                              3. 2

                                                                Alas, my aim there is to point out people think there is a common lay meaning…. but I have been involved in enough long raging arguments online and in person to realize… everybody means whatever they damn want to mean when they write assert. And most get confused and angry when you corner them and ask them exactly what they meant.

                                                                However, the DbC meaning is pretty clear and for decades explicitly uses the term “assert”… except a lot of people get stuck on their own meaning of “assert” and conclude DbC is useless.

                                                                Sounds a lot like symbolic execution.

                                                                Ahh, there is that black or white thinking again that drives me nuts.

                                                                Symbolic execution and program proving is a false aim. The halting problem and horrible things like busy beavers and horrible fuzzy requirements at the UX end of things make it certain that automated end to end program proving simply will never happen.

                                                              That said, it can be incredibly useful. It’s limited for sure, but within its limits it can be extraordinarily valuable.

                                                              Odds of symbolic execution handling a full production-scale system end to end? Not a chance.

                                                              However, it will be able to reason from assert A to assert B that given A, B will fail in these oddball corner cases… i.e. you have a bug. Hey, that’s your grandfather’s lint on steroids!

                                                              4. 2

                                                                You might find the Lean theorem proving language meets some of your requirements. As an example:

                                                                structure Substring :=
                                                                ( original : string )
                                                                ( offset length : ℕ )
                                                                ( invariant : offset + length ≤ original.length )
                                                                

                                                                In order to construct an instance of this Substring type, my code has to provide proof of that invariant proposition. Any function that consumes this type can rely on that invariant to be constrained by the compiler, and can also make use of that invariant to prove proofs about the function’s postcondition.

                                                              1. 2

                                                                @hwayne ping! I am sure you have things to say.

                                                                1. 7

                                                                  Fun fact, lobsters doesn’t actually notify you when you do that. But I saw the question and I should really write up my thoughts!

                                                                  1. 3

                                                                    Yes, please do! This thread needs your thoughts.

                                                                    1. 3

                                                                      This is going long and will probably have to be a newsletter

                                                                      1. 1

                                                                        I am looking forward to these thoughts and / or a newsletter!

                                                                        1. 1

                                                                          I look forward to it… but please address the very real world issue of DbC interacting (in both directions) with compilers, linters and optimizers and unit tests.

                                                                          I’ve been writing code in longish list of languages, from mainframe days to deeply embedded systems, in government scientific research institutes to commercial and industrial.

                                                                          I promise you, as I described above this is the route for formal methods to make a huge, and growing and very real impact on the world.

                                                                    1. 2

                                                                    Nice, I think there was a patch for chibicc that added inline asm support. The last bits are compiling a libc and kernel with either chibicc or cproc, and then we can have a full linux userland + kernel built without a C compiler implemented in C++.

                                                                      1. 3

                                                                        I mean, it sounds like there’s good reasons that GCC and LLVM are written in C++…

                                                                        1. 5

                                                                        Maybe it’s the same “arbitrary constraints are fun” mindset that drives a lot of stuff around here?

                                                                          Me, I’m working on an implementation of Doom in BCPL whose source code will not contain the letter “e”. Because reasons!

                                                                          1. 5

                                                                            To quote the cproc author:

                                                                            However, if you are asking why I’d want to be able to build the oasis userspace with a compiler written in 7000 lines of C as opposed to one written in 5 million lines of C++, then I don’t really know how to answer.

                                                                            Anyway…

                                                                            Do you regularly compile gcc or clang from source? Do you know how annoying it is to require more than 16GB of ram and 30 minutes a pop? Do you know how unnecessarily hard this makes bootstrapping a package tree/distro/architecture?

                                                                            It’s not arbitrary to want to reduce overall complexity of your computer system - especially removing projects too large for small teams to realistically maintain.

                                                                          If you can maintain your system with a small team you can sidestep huge amounts of bureaucracy and tailor the solutions to your own problems. I don’t think it’s arbitrary to want to avoid depending on projects out of your control.

                                                                            1. 3

                                                                              Those are valid points.

                                                                              However, I’d still worry whether these little compilers generate high quality code. Can they compete with all the work that goes into optimization in Clang/GCC? Saving time and/or space in the built product is worth some pain while building.

                                                                              Oh, and do they offer sanitizers? IMHO developing in C/++ these days without using address and UB sanitizers is foolhardy. Those things save my bacon so often. (Of course that only applies to the dev & test cycle, not deployment.)

                                                                              1. 4

                                                                                However, I’d still worry whether these little compilers generate high quality code.

                                                                                cproc generates code about on par with -O2 and sometimes -O1.

                                                                                Oh, and do they offer sanitizers? IMHO developing in C/++ these days without using address and UB sanitizers is foolhardy.

                                                                                I use these tools too, it would be nice to add, and there is no evidence that it would be impossible to do so in a reasonable amount of effort and code. As an example, tcc already comes with a bounds checking flag https://bellard.org/tcc/tcc-doc.html#Bounds.

                                                                                I would also say I think having simpler options doesn’t mean I advocate throwing away the advanced options when they make sense.

                                                                            2. 1

                                                                            that’s awesome!

                                                                            3. 1

                                                                              such as? The only reason I can think of is because if you already use C++, it makes sense to have your C++ compiler in C++, and a C compiler (with a lot of baggage) sort of falls out by accident when you do this.

                                                                              To quote the cproc author:

                                                                              However, if you are asking why I’d want to be able to build the oasis userspace with a compiler written in 7000 lines of C as opposed to one written in 5 million lines of C++, then I don’t really know how to answer.

                                                                            4. 3

                                                                              we can have a full linux userland + kernel built without a C compiler implemented in C++

                                                                              Already done, with tcc. And doesn’t cproc still lack its own preprocessor?

                                                                              1. 2

                                                                                I don’t think tcc can build the linux kernel anymore.

                                                                                1. 2

                                                                                  Indeed, due to missing asm goto. Clang had the same problem until a year or two ago.

                                                                            1. 1

                                                                              Is it just me, or is unveil a terrible choice of name? It normally means “remove a veil”, “disclose” or “reveal”. Its function is almost exactly the opposite - it removes access to things! As the author says:

                                                                              Let’s start with unveil. Initially a process has access to the whole file system with the usual restrictions. On the first call to unveil it’s immediately restricted to some subset of the tree.

                                                                              Reading the first line of the man page I can see how it might make sense in some original context, but this is the opposite of the kind of naming you want for security functions…

                                                                              1. 3

                                                                                Is it just me, or is unveil a terrible choice of name? It normally means “remove a veil”, “disclose” or “reveal”. Its function is almost exactly the opposite - it removes access to things!

                                                                                It explicitly grants access to a list of things, starting from the empty set. If it’s not called, everything is unveiled by default.

                                                                                1. 3

                                                                                  I am not a native speaker, so I cannot comment if the verb itself is a good choice or not :)

As a programmer who uses unveil() in his own programs, the name makes total sense. You basically unveil selected paths to the program. If you then change your code to work with other files, you also have to unveil these files to your program.

                                                                                  1. 2

OK, I understand - it’s only on the first use that it actually restricts (while also unveiling the given path); after that it just continues to unveil.

                                                                                  2. 2

                                                                                    “Veiling” is not a standard idea in capability theory, but borrowed from legal practice. A veiled fact or object is ambient, but access to it is still explicit and tamed. Ideally, filesystems would be veiled by default, and programs would have to statically register which paths they intend to access without further permission. (Dynamic access would be delegated by the user as usual.)

                                                                                    I think that the main problem is that pledges and unveiling are performed as syscalls after a process has started, but there is no corresponding phase before the process starts where pledges are loaded from the process’s binary and the filesystem is veiled.

                                                                                    1. 1

                                                                                      Doing it as part of normal execution implements separate phases of pledge/unveil boundaries in a flexible way. The article gives the example of opening a log file, and then pledging away your ability to open files, and it’s easy to imagine a similar process for, say, a file server unveiling only the public root directory in between loading its configuration and opening a listen socket.
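A minimal sketch of that file-server shape, assuming OpenBSD’s unveil(2)/pledge(2); the paths and promise strings here are purely illustrative:

#include <err.h>
#include <unistd.h>

int
main(void)
{
	/* ...load configuration, which may live anywhere on the filesystem... */

	/* first unveil() call: from here on, only this subtree is visible, read-only */
	if (unveil("/var/www/htdocs", "r") == -1)
		err(1, "unveil");
	if (unveil(NULL, NULL) == -1)	/* lock the unveil list */
		err(1, "unveil");

	/* ...open the listen socket... */

	/* then pledge away everything except what serving files needs */
	if (pledge("stdio rpath inet", NULL) == -1)
		err(1, "pledge");

	/* ...serve requests... */
	return 0;
}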

                                                                                      1. 1

                                                                                        I think that the main problem is that pledges and unveiling are performed as syscalls after a process has started, but there is no corresponding phase before the process starts where pledges are loaded from the process’s binary and the filesystem is veiled.

Well, the process comes from somewhere. Having a chain-loader process/executable that sanitises the inherited environment and sets things up for the next one fits well with the established execution model. pledge(2) is explicitly prepared for this with its second argument, execpromises.
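For concreteness, a rough sketch of such a chain-loader built on pledge’s execpromises argument (the promise strings are illustrative, not a recommendation):

/* chainload: apply execpromises, then exec the real program */
#include <err.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	if (argc < 2)
		errx(1, "usage: chainload program [args...]");

	/* the first argument restricts this loader; the second one carries
	 * over to the program exec'd below */
	if (pledge("stdio exec", "stdio rpath inet") == -1)
		err(1, "pledge");

	execvp(argv[1], argv + 1);
	err(1, "execvp");
}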

                                                                                        1. 2

You could put it in e.g. an ELF header, or fs-level metadata (like suid), which also fits well with the existing execution model.

                                                                                          Suid is a good comparison, despite being such an abomination, because under that model the same mechanism can double as a sandbox.

The chainloader approach is good, but complexity becomes harder to wrangle with explicit pledges if you want to do djb-style designs with many communicating processes. On the other hand, file permissions are distant from the code, and do not have an answer for ‘I need to wait until runtime to figure out what permissions I need’.

                                                                                          1. 1

Not going too far into the static/dynamic swamp shenanigans (say, setting a different PT_INTERP and dlsym:ing out a __constructor pledge/unveil) - there are two immediate reasons why I’d prefer not to see it as a file-meta property.

1. Filesystem legacy is not pretty, and accidental stripping of the metadata on a move to an incompatible filesystem would fail silent-but-dangerous (stripping suid is not dangerous, whereas stripping a pledge setup is).
2. Pledge violations go kaboom; then you need to know that this is what happened (dmesg etc.) and you land in core_pattern-like setups. A chain-loader, meanwhile, takes on the responsibility of attribution/communication, so X11 gets its dialog or whatever, a tty (isatty()) gets an fprintf, others get a syslog, and so on.
                                                                                      2. 1

                                                                                        Like Linux’s unshare

                                                                                      1. 4

This post made me wonder - is there serious, production-level software being written in array languages these days? APL was created in different times and I understand it did have some use. But outside of gimmicks and fun, does anyone now work with K, J and the others? (And not just to keep legacy apps alive.) What would be the reasons to do that?

                                                                                        1. 11

                                                                                          Yes. K is widely used in the trading and finance world. Dyalog APL is a modern commercial vendor with at least enough paying customers to support the company – I don’t know what the numbers are. And J has at least some customers, I get the feeling fewer but don’t know for sure. These languages are definitely not just gimmicks.

                                                                                          1. 5

                                                                                            Do you know why they’re chosen over languages with more available experience / support / better integrated IDEs?

                                                                                            1. 9

                                                                                              For q/kdb+, there’s a huge amount of momentum and a very specific niche. Nobody is writing huge applications in q (okay, some people are, but not many). Instead, for example, they’re writing very specialized programs that analyze a billion stock trades and try to find patterns. The language is designed to make exactly that kind of thing concise and fast.

                                                                                              1. 6

                                                                                                What lorddimwit said for K. Also this page might help: https://kx.com/resources/use-cases/.

In the case of APL/J I’m not sure what the story is today. One part of the mythology was that a small team of experienced APL/J developers could be more productive than a team 5-10x the size in a mainstream language, because you are operating at a much higher level of abstraction and writing much less code. I’m sure such teams did exist, but even as a huge fan of these languages I am very skeptical that the claim generalizes to developers at large. First, because these languages are much harder to learn fluently than mainstream languages. Second, I think it’s hard to separate out the “language” effect from the effects of “you just had a very smart group of people” and “everyone is highly fluent in a single language” and “everyone is passionate about the language.”

                                                                                                Nevertheless, that is one of the selling points, and that central value proposition is still reflected on Dyalog’s homepage:

                                                                                                Dyalog delivers an APL-based development environment that allows both subject matter experts and IT specialists to efficiently convert ideas into software solutions.

                                                                                                1. 5

                                                                                                  these languages are much harder to learn fluently than mainstream languages.

                                                                                                  When I was a kid, a friend showed me some APL he was writing, and also the “game of life” example. So I got hooked, and ended up writing about a thousand lines of APL. APL was easy. Not much to it. Functional programming was harder for me to learn (as a kid): the idea of recursive functions was a tough nut for me to crack. And this was before the era of modern, hyper complex programming languages, like Rust and C++, which are many orders of magnitude more complicated than APL.

                                                                                                  C++ has float vs int, unsigned vs signed, char, short, int, long, long long vs int8_t, int16_t, int32_t, int64_t, etc, with complex conversion rules, and with different laws of arithmetic for different number types. 7/2==3 but 7.0/2.0==3.5, and so on. APL just has numbers, that’s it.

                                                                                                  C++ has data abstraction with classes, single inheritance vs multiple inheritance, private vs protected vs public inheritance, constructors, destructors, and on and on. APL has a fixed set of primitive data types with no way to define new ones. You learn those basic data types (which are very simple) and you are done.

                                                                                                  C++ has 14 ways to initialize a variable, APL has one. C++ has something like 23 different levels of operator precedence, which few people are ever able to memorize, APL has maybe 2 levels (verbs vs adverbs).

                                                                                                  As a professional programmer, I’ve written hundreds of thousands of lines of C++. I still have no idea what SFINAE means, and I don’t care. I just use google, online C++ references, and copy/paste snippets like everybody else. I gave up hope years ago of understanding C++ in the deep way that I used to understand APL, or even C.

                                                                                                  The claim that APL is “much harder to learn” is absurd, if you just compare language complexity. So this claim needs a lot of caveats and qualifications. The word “fluently” is used as a qualification. Okay, what’s different about APL is that you use different idioms to write code than you do in mainstream languages. If you have already invested years of your life to become fluent in hypercomplex mainstream idioms, then you may feel that you have to start over from scratch to learn and use APL fluently, which is a different experience than switching from one mainstream language to another. Mainstream languages tend to be mostly the same, with minor differences in syntax, but they all mostly support the same idioms.

                                                                                                  1. 3

                                                                                                    Yes, C++ is a much bigger language.

                                                                                                    If you have already invested years of your life to become fluent in hypercomplex mainstream idioms, then you may feel that you have to start over from scratch to learn and use APL fluently, which is a different experience than switching from one mainstream language to another. Mainstream languages tend to be mostly the same, with minor differences in syntax, but they all mostly support the same idioms.

                                                                                                    This is basically where I am coming from. I think if you took a young person with no prior experience and taught them APL or J in high school (which has been done) you won’t have a big problem.

                                                                                                    But for an experienced programmer “array thinking” is a huge paradigm shift. Learning J, for me, was much harder than learning Haskell, for example. There is a much steeper “unlearning” process too.

                                                                                                    You only have to learn about 50 primitives, but many of those primitives don’t map 1-1 to standard functions in other languages (eg, J’s copy verb # and the filtering idiom #~ based on it). And to be fluent, there are many combinations and idioms you have to learn as well, not to mention hooks and forks and longer trains.

                                                                                                    Getting to the point where you can frame a problem as an array problem, and quickly write up an idiomatic solution in the same amount of time that you could do the same in Python, say – that’s what I’m claiming is going to take much longer.

                                                                                                    1. 2

                                                                                                      APL has maybe 2 levels (verbs vs adverbs).

                                                                                                      • control structures

                                                                                                      • (line) separators

                                                                                                      • guards

                                                                                                      • copulae

                                                                                                      • verbs

                                                                                                      • adverbs

                                                                                                      • stranding

                                                                                                      • parentheses, brackets, braces

And all the demons of regular precedence show up again in numeric literals. For instance: 1e¯2j3 is (1e(¯2))j3; NARS and J have as many as 4 levels.

                                                                                                      Still orders of magnitude simpler than most other languages, though.

                                                                                                      1. 1

                                                                                                        Footnote: adverbs are accompanied by conjunctions (or, to use the traditional vocabulary, operators may be monadic or dyadic).

                                                                                                  2. 1

In my very much newbie view: performance (automatic SIMD, although not automatic multi-core parallelism) and succinctness. (There is really something magical about the core of your program taking e.g. 8 characters instead of 200 lines. The function itself is frequently shorter than the name you’d give it.)

                                                                                                2. 2

There is a list of companies using those languages here.

                                                                                                  1. 1

                                                                                                    I’d consider Julia to be a successor of APL. It is widely used in scientific computing and seems to be starting to displace Fortran for a bunch of things.

                                                                                                  1. 2

You may also find my build scripts (written for FreeBSD, but probably compatible) of interest.

                                                                                                    1. 1

You didn’t seem to add hostdefs and netdefs. I also had to fix jmf and pcre a bit. I have it here: https://github.com/jxy/jsource/tree/freebsd-j9, currently 1040 commits behind upstream. Some tests fail (for example, x:2^_53 is incorrect), but otherwise it works fine.

                                                                                                    1. 18

                                                                                                      Neat idea. I’m not sure this is a captcha, but rather just a rate limiter.

                                                                                                      1. 13

                                                                                                        So much this. A proof-of-work scheme will up the ante, but not the way you think. People need to be able to do the work on the cheap (unless you want to put mobile users at a significant disadvantage) and malware/spammers can outscale you significantly.

                                                                                                        Ever heard of parasitic computing? TLDR: It’s what kickstarted monero. Any website (or an ad in that website) can run arbitrary code on the device of every visitor. You can even shard the work, do it relatively low-profile if you have the scale. Even if pre-computing is hard, with ad networks and live-action during page views an attacker can get challenges solved just-in-time.

                                                                                                        1. 9

                                                                                                          The way I look at it, it’s meant to defeat crawlers and spam bots; they attempt to cover the whole internet, they want to spend 99% of their time parsing and/or spamming, but if this got popular enough to prompt bot authors to take the time to actually implement WASM/WebWorkers or a custom Scrypt shim for it, they might still end up spending 99% of their time hashing instead.

Something tells me they will probably give up and start knocking on the next door down the lane. And if I can force bot authors to invest in a $1M USD+ /year black hat “distributed computing” project so they can more effectively spam Cialis and Michael Kors Handbags ads, maybe that’s a good thing? I never made $1M a year in my life and probably never will; I would be glad to be able to generate that much value, though.

                                                                                                          If it comes down to a targeted attack on a specific site, captchas can already be defeated by captcha farm services or various other exploits (https://twitter.com/FGRibreau/status/1080810518493966337). Defeating that kind of targeted attack is a whole different problem domain.

This is just an alternative approach that puts the thumbscrews on the bot authors in a different way, without requiring the user to read, stop and think, submit to surveillance, or even click on anything.
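To make the mechanism concrete, here is a toy sketch of the general proof-of-work idea. The scheme under discussion apparently uses scrypt in the browser; FNV-1a below is only a stand-in so the sketch stays self-contained:

#include <stdint.h>
#include <stdio.h>

/* stand-in hash; a real deployment would use a memory-hard function */
static uint64_t fnv1a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* brute-force a nonce whose hash has `difficulty` leading zero bits */
static uint64_t solve(const char *challenge, int difficulty)
{
	char buf[256];
	for (uint64_t nonce = 0;; nonce++) {
		int n = snprintf(buf, sizeof buf, "%s:%llu", challenge,
		    (unsigned long long)nonce);
		if ((fnv1a(buf, (size_t)n) >> (64 - difficulty)) == 0)
			return nonce;
	}
}

int main(void)
{
	/* the server verifies a single hash; the client has to try ~2^20 of them */
	printf("nonce = %llu\n",
	    (unsigned long long)solve("example-challenge", 20));
	return 0;
}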

                                                                                                          1. 9

                                                                                                            This sounds very much like greytrapping. I first saw this in OpenBSD’s spamd: the first time you got an SMTP connection from an IP address, it would reply with a TCP window size of 1, one byte per second, with a temporary failure error message. The process doing this reply consumed almost no resources. If the connecting application tried again in a sensible amount of time then it would be allowed to talk to the real mail server.

                                                                                                            When this was first introduced, it blocked around 95% of spam. Spammers were using single-threaded processes to send mail and so it also tied each one up for a minute or so, reducing the total amount of spam in the world. Then two things happened. The first was that spammers moved to non-blocking spam-sending things so that their sending load was as small as the server’s. The second was that they started retrying failed addresses. These days, greytrapping does almost nothing.

                                                                                                            The problem with any proof-of-work CAPTCHA system is that it’s asymmetric. CPU time on botnets is vastly cheaper than CPU time purchased legitimately. Last time I looked, it was a few cents per compromised machine and then as many cycles as you can spend before you get caught and the victim removes your malware. A machine in a botnet (especially one with an otherwise-idle GPU) can do a lot of hash calculations or whatever in the background.

Something tells me they will probably give up and start knocking on the next door down the lane. And if I can force bot authors to invest in a $1M USD+ /year black hat “distributed computing” project so they can more effectively spam Cialis and Michael Kors Handbags ads, maybe that’s a good thing?

It’s a lot less than $1M/year that they spend. All you’re really doing is pushing up the electricity consumption of folks with compromised computers. You’re also pushing up the energy consumption of legitimate users. It’s pretty easy to show that this will result in a net increase in greenhouse gas emissions; it’s much harder to show that it will result in a net decrease in spam.

                                                                                                            1. 2

                                                                                                              These days, greytrapping does almost nothing.

                                                                                                              postgrey easily kills at least half the SPAM coming to my box and saves me tonnes of CPU time

                                                                                                              1. 1

                                                                                                                The problem with any proof-of-work CAPTCHA system is that it’s asymmetric. [botnets hash at least 1000x faster than the legitimate user]

                                                                                                                Asymmetry is also the reason why it does work! Users probably have at least 1000x more patience than a typical spambot.

I have no idea what the numbers shake out to / which is the dominant factor, and I don’t really care; the point is that I can still make the spammers’ lives hell & get the results I want right now (humans only past this point) even though I’m not willing to let Google/CloudFlare fingerprint all my users.

                                                                                                                If botnets solving captchas ever becomes a problem, wouldn’t that be kind of a good sign? It would mean the centralized “big tech” panopticons are losing traction. Folks are moving to a more distributed internet again. I’d be happy to step into that world and work forward from there 😊.

                                                                                                              2. 5

                                                                                                                captchas can already be defeated by […] or various other exploits (https://twitter.com/FGRibreau/status/1080810518493966337)

                                                                                                                An earlier version of google’s captcha was automated in a similar fashion: they scraped the images and did a google reverse image search on them!

                                                                                                                1. 3

                                                                                                                  I can’t find a link to a reference, but I recall a conversation with my advisor in grad school about the idea of “postage” on email where for each message sent to a server a proof of work would need to be done. Similar idea of reducing spam. It might be something in the literature worth looking into.

                                                                                                                  1. 3

There’s Hashcash, but there are probably other systems as well. The idea is that you add an X-Hashcash header with a comparatively expensive hash of the content and some headers, making bulk emails computationally expensive.

It never really caught on; I used it for a while years ago, but I haven’t received an email with this header since 2007 (I just checked). It’s apparently used in Bitcoin nowadays according to the Wikipedia page, but it started out as an email thing. Kind of ironic, really.

                                                                                                                    1. 1

                                                                                                                      “Internet Mail 2000” from Daniel J. Bernstein? https://en.m.wikipedia.org/wiki/Internet_Mail_2000

                                                                                                                  2. 2

That is why we can’t have nice things… It is really heartbreaking how almost any technological advance can and will be turned to something evil.

                                                                                                                    1. 1

                                                                                                                      The downsides of a global economy for everything :-(

                                                                                                                  3. 3

                                                                                                                    Captchas are essentially rate limiters too, given enough determination from abusers.

                                                                                                                    1. 4

Maybe. The difference I would draw is that a captcha attempts to assert that the user is human, whereas this scheme does not.

                                                                                                                      1. 2

                                                                                                                        I mean, objectively, yes. But, since spammers are automating passing the “human test” captchas, what is the value of that assertion? Our “human test” captchas come at the cost of impeding actual humans, and are failing to protect us from the sophisticated spammers, anyway. This proposed solution is better for humans, and will still prevent less sophisticated attackers.

If it can keep me from being frustrated that there are 4 pixels on the top-left tile that happen to actually be part of the traffic light, then by all means, sign me the hell up!

                                                                                                                  1. 4

There had been talk of adding a Zig-style defer to C at one point, and I’d pay real American money to have a *? type to distinguish nullable pointers.

                                                                                                                    This isn’t to belittle the work of the Committee, more just “C23 is great, here’s something for C25.”

                                                                                                                    1. 3
                                                                                                                      1. 2

I mean, that works. But god help you if you ever accidentally use return instead of Return.

Also, having to put Deferral at the start of every single scope you want to be able to use Defer in is a bit stupid.

                                                                                                                        I’d prefer an actual defer statement in the language. It’d be much cheaper at runtime too, because the compiler would just know what code to run on return, rather than having to iterate through a list of (non-standard) label pointers.

                                                                                                                        1. 2

                                                                                                                          god help you if you ever accidentally use return instead of Return

                                                                                                                          Perhaps a good idea to #define return Return, if using it extensively.

                                                                                                                          having to put Deferral in the start of every single scope you want to be able to use Defer in is a bit stupid

                                                                                                                          Indeed. Though note it is per-function, not per-scope.

                                                                                                                          much cheaper at runtime too, because the compiler would just know what code to run on return, rather than having to iterate through a list of (non-standard) label pointers

                                                                                                                          If control flow is complex, the compiler will have to do exactly the same thing. If control flow is simple, I expect it to be folded down to the same thing—conditional constant propagation is whack!

                                                                                                                          I’d prefer an actual defer statement in the language

                                                                                                                          Me too!

                                                                                                                        2. 1

                                                                                                                          __attribute__((cleanup)) and no custom Return required :)

                                                                                                                          1. 1

                                                                                                                            Yes, but that is nonstandard. Also: death to gobject!

                                                                                                                            1. 1

                                                                                                                              __attribute__((cleanup)) works well when the only thing you want your deferred functions to do is to somehow free an object when it goes out of scope. It’s useless for other use cases like unlocking a mutex at the end of the scope or anything else which doesn’t fit the extremely narrow scope of __attribute__((cleanup)).

                                                                                                                              Also, it requires defining a function out-of-line, with a name and everything, which is both more work for the programmer and, in many cases, harder to read than a proper defer.

                                                                                                                              1. 2

                                                                                                                                I wrote some macros that wrapped __attribute__((cleanup)) for handling locks in the Objective-C runtime many years ago and have not had to modify them. This code needs to be exception safe, because there are places where Objective-C or C++ will throw exceptions through C stack frames. The out-of-line function is implemented once and used everywhere where you’d want to use a lock, you just use the same kind of pattern that you’d use with RAII in C++.
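A minimal sketch of that kind of wrapper, with made-up names (not the runtime’s actual macros):

#include <pthread.h>

/* cleanup handler: receives a pointer to the annotated variable */
static void unlock_scoped(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* lock now; the cleanup handler unlocks on every exit from the scope */
#define SCOPED_LOCK(mtx) \
	pthread_mutex_t *scoped_lock_ __attribute__((cleanup(unlock_scoped))) = (mtx); \
	pthread_mutex_lock(scoped_lock_)

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void update_shared_state(void)
{
	SCOPED_LOCK(&lock);
	/* ...critical section... */
}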

                                                                                                                                If you want to run something that’s not just a function, then your compiler is going to end up doing closure construction in the middle, at which point you probably want to just expose closures in your language. And at that point, you may as well use C++ where it’s trivial to write a trivial Defer template class that takes a lambda in its constructor and invokes it in the destructor, something like this:

// RAII helper: stores the callable and invokes it when the object goes out of scope
#include <utility>   // for std::move

template<typename T> struct Defer
{
        T fn;
        Defer(T &&f) : fn(std::move(f)) {}
        ~Defer() { fn(); }
};
                                                                                                                                

                                                                                                                                In general, I’d consider something like this to be an antipattern though. Running arbitrary deferred code makes it very hard to reason about control flow. If you encapsulate cleanup code, either in destructors if you have them or in macros that wrap __attribute__((cleanup)) if that’s all that you have, then the set of things that can run on return / unwind is small and well defined.

                                                                                                                                1. 1

                                                                                                                                  at which point you probably want to just expose closures in your language

                                                                                                                                  Well, funny that you mention ObjC — coincidentally, clang brought Objective-C closures to C with -fblocks

                                                                                                                                  1. 2

                                                                                                                                    I am very aware of this, having written a blocks runtime before Apple open sourced theirs.

Blocks are a mess for C, because C doesn’t have any notion of a destructor or any equivalent. A block that captures an Objective-C object pointer will release it when the closure is destroyed, dropping its refcount and allowing it to be deallocated. A block that captures a C++ object will run its destructor when the block is destroyed. This can be used in C++ with smart pointers so that an object either has ownership transferred to the block with a unique pointer or has a reference owned by the block with a shared pointer. When the block is destroyed, all captured resources are destroyed.

                                                                                                                                    In contrast, when a block captures a C pointer, there is no place to deallocate it. The block itself is reference counted, so can happily capture heap objects and reuse them across multiple invocations, but it cannot then destroy them implicitly on deallocation. This means that blocks are fine in C for downward funargs (i.e. you can pass a block down the stack as long as it isn’t captured) but you can’t pass a block up the stack or have it captured by a function that it is passed down to.

                                                                                                                                    You can’t fix that without introducing something like a destructor into C and, as with so many other things in C, why would you bother when you can just use C++ and have mature support for all of these things in existing compilers?

                                                                                                                                    1. 1

<sarcasm>Clearly, blocks in C should instead take an IUnknown* and decrement the reference count.</sarcasm>

                                                                                                                          2. 2

                                                                                                                            pay real American money to have a *? type to distinguish nullable pointers

                                                                                                                            Also: @ for non-nullable pointers, # for length-accompanied slices. A man can dream.

                                                                                                                            1. 2

                                                                                                                              @ for non-nullable pointers

I suppose they could borrow (no pun intended) C++’s reference syntax…

                                                                                                                              1. 8

                                                                                                                                Please don’t. Call-by-reference is a horrible idea. It should be obvious at the call site if the callee can mutate the passed variable.

                                                                                                                                And in C++ I cannot have, for instance, a reference to a reference. And it’s easy to accidentally turn a reference into a value, and then back into a reference but one which no longer refers to the original object. What’s wanted is an honest-to-god pointer, but one with minimal type-level assurance that it always points to one object.

                                                                                                                            2. 2

Did the talks on defer stop? There was a presentation on the proposal by Jens Gustedt and Robert Seacord on the 20th of March, 2021: https://www.youtube.com/watch?v=Y74i_1khQX8 I guess they’re still going for it.

For others who might be interested in alternatives, the following quote is from the “Related work” section of the documentation for my cedro C pre-processor, which has, among other features, an unrestricted defer-style feature with no limit on the number or complexity of deferred code. I presented it a month ago here on Lobsters: https://lobste.rs/s/18axic/c_programming_language_extension_cedro

                                                                                                                              Apart from the already mentioned «A defer mechanism for C», there are macros that use a for loop as for (allocation and initialization; condition; release) { actions } [a] or other techniques [b].

                                                                                                                              [a] “P99 Scope-bound resource management with for-statements” from the same author (2010), “Would it be possible to create a scoped_lock implementation in C?” (2016), ”C compatible scoped locks“ (2021), “Modern C and What We Can Learn From It - Luca Sas [ ACCU 2021 ] 00:17:18”, 2021

                                                                                                                              [b] “Would it be possible to create a scoped_lock implementation in C?” (2016), “libdefer: Go-style defer for C” (2016), “A Defer statement for C” (2020), “Go-like defer for C that works with most optimization flag combinations under GCC/Clang” (2021)

                                                                                                                              Compilers like GCC and clang have non-standard features to do this like the __cleanup__ variable attribute.
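A minimal sketch of the for-loop pattern from [a] above, used as a scoped lock (the names are illustrative):

#include <pthread.h>

/* runs the body exactly once, with lock/unlock in the loop header */
#define WITH_LOCK(mtx) \
	for (int done_ = (pthread_mutex_lock(mtx), 0); !done_; \
	     done_ = (pthread_mutex_unlock(mtx), 1))

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

void increment(void)
{
	WITH_LOCK(&lock) {
		counter++;
	}
	/* caveat: a break, goto, or return inside the body skips the unlock */
}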

                                                                                                                              1. 4

                                                                                                                                Robert on twitter:

                                                                                                                                Definitely not going to make C23. We need to publish a TR/TS first.

                                                                                                                            1. 10

                                                                                                                              Are they finally going to fix the abomination that is C11 atomics? As far as I can tell, WG14 copied atomics from WG21 without understanding them and ended up with a mess that causes problems for both C and C++.

                                                                                                                              In C++11 atomics, std::atomic<T> is a new, distinct type. An implementation is required to provide a hardware-enforced (or, in the worst case, OS-enforced) atomic boolean. If the hardware supports a richer set of atomics, then it can be used directly, but a std::atomic<T> implementation can always fall back to using std::atomic_flag to implement a spinlock that guards access to larger types. This means that std::atomic<T> can be defined for all types and be reasonably efficient (if you have a futex-like primitive then, in the uncontended case it’s almost as fast as T and in the contended state it doesn’t consume much CPU time or power spinning).

Then WG14 came along and wanted to define _Atomic(T) to be compatible with std::atomic<T>. That would require the C compiler and C++ standard library to agree on data layout and locking policy for things larger than the hardware-supported atomic size, but it’s still feasible. Then they completely screwed up by making all of the arguments to the functions declared in stdatomic.h take a volatile T* instead of an _Atomic(T)*. For historical reasons, the representations of volatile T and T have to be the same, which means that _Atomic(T) and T must have the same representation and there is nowhere that you can stash a lock. The desire to make _Atomic(T) and std::atomic<T> interchangeable means that C++ implementers are stuck with this.

Large atomics are now implemented by calls to a library, but there is no way to implement this in a way that is both fast and correct, so everyone picks fast. The atomics library provides a pool of locks and acquires one keyed on the address. That’s fine, except that most modern operating systems allow virtual addresses to be aliased and so there are situations (particularly in multi-process situations, but also when you have a GC or similar doing exciting virtual memory tricks) where simple operations on _Atomic(T) are not atomic. Fixing that would require asking the OS if a particular page is aliased before performing an operation (and preventing it from becoming aliased during the operation), at which point you may as well just move atomic operations into the kernel anyway, because you’re paying for a system call for each one.

                                                                                                                              C++20 has worked around this by defining std::atomic_ref, which provides the option of storing the lock out-of-line with the object, at the expense of punting the determination of the sharing set for an object to the programmer.

Oh, and let’s not forget the mtx_timedlock fiasco. Ignoring decades of experience in API design, WG14 decided to make the timeout for a mutex the wall-clock time, not the monotonic clock. As a result, it is impossible to write correct code using C11’s mutexes, because the wall-clock time may move arbitrarily. You can wait on a mutex with a 1ms timeout and discover that the clock was wrong; if it gets reset in the middle of your ‘get time, add 1ms, timedwait’ sequence, you’re now waiting a year (more likely, you’re waiting multiple seconds and now the tail latency of your distributed system has weird spikes). The C++ version of this API gets it right and allows you to specify the clock to use; pthread_mutex_timedlock got it wrong and ended up with platform-specific work-arounds. Even pthreads got it right for condition variables; C11 predictably got it wrong.
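For concreteness, the sequence being criticised looks roughly like this (a sketch, not code from the post):

#include <threads.h>
#include <time.h>

mtx_t m;	/* assume mtx_init(&m, mtx_timed) happened elsewhere */

int lock_with_1ms_timeout(void)
{
	struct timespec deadline;

	timespec_get(&deadline, TIME_UTC);	/* wall clock, not monotonic */
	deadline.tv_nsec += 1000000;		/* + 1ms */
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	/* if the wall clock is stepped between the timespec_get() above and
	 * the call below, the wait can be arbitrarily longer (or shorter)
	 * than the intended 1ms */
	return mtx_timedlock(&m, &deadline);
}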

                                                                                                                              C is completely inappropriate as a systems programming language for modern hardware. All of these tweaks are nice cleanups but they’re missing the fundamental issues.

                                                                                                                              1. 3

                                                                                                                                Then they completely screwed up by making all of the arguments to the functions declared in stdatomic.h take a volatile T* instead of an _Atomic(T)*. For historical reasons, the representation of volatile T and T have to be the same, which means that _Atomic(T) and T must have the same representation and there is nowhere that you can stash a lock.

                                                                                                                                I’m not too familiar with atomics and their implementation details, but my reading of the standard is that the functions in stdatomic.h take a volatile _Atomic(T) * (i.e. a pointer to volatile-qualified atomic type).

                                                                                                                                They are described with the syntax volatile A *object, and earlier on in the stdatomic.h introduction it says “In the following synopses: An A refers to one of the atomic types”.

                                                                                                                                Maybe I’m missing something?

                                                                                                                                1. 2

                                                                                                                                  Huh, it looks as if you’re right. That’s how I read the standard in 2011 when I added the atomics builtins to clang, but I reread it later and thought that I’d initially misunderstood. It looks as if I get to blame GCC for the current mess then (their atomic builtins don’t require _Atomic-qualified types and their stdatomic.h doesn’t check it).
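For concreteness, a small hypothetical snippet showing the distinction:

#include <stdatomic.h>

_Atomic int counter;	/* equivalently: _Atomic(int) counter; */
int plain;

void hit(void)
{
	/* fine under either reading: the object is _Atomic-qualified */
	atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);

	/* atomic_fetch_add(&plain, 1);
	 * not valid if the generic functions require _Atomic arguments,
	 * though, as noted above, GCC's stdatomic.h doesn't check this */
}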

                                                                                                                                  Sorry WG14, you didn’t get atomics wrong, you just got mutexes and condition variables wrong.

                                                                                                                                  That said, I’ve no idea why they felt the need to make the arguments to these functions volatile and _Atomic. I am not sure what a volatile _Atomic(T)* actually means. Presumably the compiler is not allowed to elide the load or store even if it can prove that no other thread can see it?

                                                                                                                                  1. 1

                                                                                                                                    I’ve no idea why they felt the need to make the arguments to these functions volatile and _Atomic

I’ve no idea; but a guess: they want to preserve the volatility of arguments to atomic_*. That is, it should be possible to perform operations on variables of volatile type without losing the ‘volatile’. I will note that the C++ atomics contain one overload with volatile and one without. But if that’s the case, why the committee felt they could get away with being polymorphic wrt type, but not with being polymorphic wrt volatility, is beyond me.

                                                                                                                                    There is this stackoverflow answer from a committee member, but I did not find it at all illuminating.

                                                                                                                                    not allowed to elide the load or store even if it can prove that no other thread can see it?

                                                                                                                                    That would be silly; a big part of the impetus for atomics was to allow the compiler to optimize in ways that it couldn’t using just volatile + intrinsics. Dead loads should definitely be discarded, even if atomic!


                                                                                                                                    One thing that is clear from this exchange: there is a massive rift between specifiers, implementors, and users. Thankfully the current spec editor (JeanHeyd Meneide, also the author of the linked post) seems to be aware of this and to be acting to improve the situation; so we will see what (if anything) changes.

                                                                                                                                    1. 3

                                                                                                                                      One thing that is clear from this exchange: there is a massive rift between specifiers, implementors, and users. Thankfully the current spec editor (JeanHeyd Meneide, also the author of the linked post) seems to be aware of this and to be acting to improve the situation; so we will see what (if anything) changes.

                                                                                                                                      It’s not really clear to me how many implementers are left that care:

                                                                                                                                      • MSVC is a C++ compiler that has a C mode. The authors write in C++ and care a lot about C++.
                                                                                                                                      • Clang is a C++ compiler that has C and Objective-C[++] modes. The authors write in C++ and care a lot about C++.
• GCC includes C and C++ compilers with separate front ends. It’s primarily written in C, so historically the authors have cared a lot about C, but new code is moving to C++, so the authors increasingly care about C++.

That leaves things like PCC, TCC, and so on, plus a few surviving 16-bit microcontroller toolchains, as the only C implementations that are not C++ compilers with C as an afterthought.

I honestly have no idea why someone would choose to write C rather than C++ these days. You end up writing more code, and you carry a higher cognitive load just to get things like ownership right (even if you use nothing from C++ other than smart pointers, your life is significantly better than that of a C programmer). You don’t get generic data structures, and you don’t even get more efficient code, because the compilers are all written in C++ and so care about C++ optimisation: it directly affects the compiler writers.
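For instance (a small sketch of my own), even the most boring RAII wrapper removes a whole class of “who frees this?” bookkeeping that C callers have to get right by convention:

    #include <cstdio>
    #include <memory>

    // The handle is closed exactly once, on every path out of the function,
    // because ownership lives in the type rather than in a comment.
    void dump(const char* path) {
        std::unique_ptr<std::FILE, int (*)(std::FILE*)> f(std::fopen(path, "r"), &std::fclose);
        if (!f) return;
        for (int c; (c = std::fgetc(f.get())) != EOF;)
            std::putchar(c);
    }   // std::fclose runs here automatically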

                                                                                                                                      C++ is not seeing its market eroded by C but by things like Rust and Zig (and, increasingly, Python and JavaScript, since computers are fast now). C fits in a niche that doesn’t really exist anymore.

                                                                                                                                      1. 2

                                                                                                                                        I honestly have no idea why someone would choose to write C rather than C++ these days.

                                                                                                                                        For applications, perhaps, but for libraries and support code, ABI stability and ease of integration with the outside world are big ones. It’s also a much less volatile language in ways that start to really matter if you are deploying code across a wide range of systems, especially if old and/or embedded ones are included.

Avoiding C++ (and especially bleeding-edge revisions of it) avoids a lot of real-life problems, risks, and hassles. You lose out on a lot of power, of course, but for some projects the kind of power that C++ offers isn’t terribly important, while the ability to easily run on systems 20 years old or 20 years into the future might be. There’s definitely a sort of irony in C being the real “write once, run anywhere” victor, but… in many ways it is.

                                                                                                                                        C fits in a niche that doesn’t really exist anymore.

                                                                                                                                        It might not exist in the realm of trendy programming language debates on the Internet, but we’re having this conversation on systems largely implemented in it (UNIX won after all), so I think it’s safe to say that it very much exists, and will continue to for a long time. That niche is just mostly occupied by people who don’t tend to participate in programming language debates. One of the niche’s best features is being largely insulated from all of that noise, after all.

                                                                                                                                        It’s a very conservative niche in a way, but sometimes that’s appropriate. Hell, in the absolute worst case scenario, you could write your own compiler if you really needed to. That’s of course nuts, but it is possible, which is reassuring compared to languages like C++ and Rust where it isn’t. More realistically, diversity of implementation is just a good indicator of the “security” of a language “investment”. Those implementations you mention might be nichey, but they exist, and you could pretty easily use them (or adapt them) if you wanted to. This is a good thing. Frankly I don’t imagine any new language will ever manage to actually replace C unless it pulls the same thing off. Simplicity matters in the end, just in very indirect ways…

                                                                                                                                        1. 4

                                                                                                                                          For applications, perhaps, but for libraries and support code, ABI stability and ease of integration with the outside world are big ones. It’s also a much less volatile language in ways that start to really matter if you are deploying code across a wide range of systems, especially if old and/or embedded ones are included.

                                                                                                                                          I’d definitely have agreed with you 10 years ago, but the C++ ABI has been stable and backwards compatible on all *NIX systems, and fairly stable on Windows, for over 15 years. C++ provides you with some tools that allow you to make unstable ABIs for your libraries, but it also provides tools for avoiding these problems. The same problems exist in C: you can’t add a field to a C structure without breaking the ABI, just as you can’t add a field to a C++ class without breaking the ABI.
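A trivial illustration of that last point (my example, not the parent’s): adding one field changes size and layout identically in C and C++, and any caller built against the old definition is silently broken:

    #include <cstdio>

    struct PointV1 { double x; double y; };            // the ABI you shipped
    struct PointV2 { double x; double y; double z; };  // the same struct after adding a field

    int main() {
        // Code compiled against PointV1 allocates and copies 16 bytes; code using
        // PointV2 expects 24. Mixing the two across a library boundary corrupts data,
        // whether the header is consumed as C or as C++.
        std::printf("v1: %zu bytes, v2: %zu bytes\n", sizeof(PointV1), sizeof(PointV2));
    }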

                                                                                                                                          I should point out that most of the things that I work on these days are low-level libraries and C++17 is the default tool for all of these.

You lose out on a lot of power, of course, but for some projects the kind of power that C++ offers isn’t terribly important, while the ability to easily run on systems 20 years old or 20 years into the future might be.

Neither C nor C++ guarantees this. In my experience, old C code needs just as much updating as C++ code, and it’s often harder to do because C code does not encourage clean abstractions. This is particularly true when talking about running on new platforms. From my personal experience: we and another group have both recently written memory allocators; ours is written in C++, theirs in C. This is what our platform and architecture abstractions look like: clean, small, and self-contained. Theirs? Not so much. We’ve ported ours to CHERI, where the hardware enforces strict bounds and permissions on pointers, with quite a small set of changes. That was made possible (and maintainable, given that most of our targets don’t have CHERI support) by the fact that C++ lets us define pointer wrapper types that describe the high-level semantics of the associated pointer and a state machine for which transitions are permitted. Porting theirs would require invasive changes.
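To give a flavour of what “pointer wrapper types” means here, a toy sketch of the general idea (not the allocator’s actual code; the names and the read-only/read-write split are invented for illustration):

    template <typename T>
    class ReadOnlyPtr {
    public:
        explicit ReadOnlyPtr(const T* p) : p_(p) {}
        const T& operator*() const { return *p_; }   // reads only; no mutation possible
    private:
        const T* p_;
    };

    template <typename T>
    class ReadWritePtr {
    public:
        explicit ReadWritePtr(T* p) : p_(p) {}
        T& operator*() const { return *p_; }

        // The only permitted transition is dropping rights, never regaining them.
        ReadOnlyPtr<T> as_readonly() const { return ReadOnlyPtr<T>(p_); }
    private:
        T* p_;
    };

Porting to a target that enforces bounds or permissions in hardware then means changing what these wrappers store and check, rather than auditing every raw pointer in the code base.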

                                                                                                                                          It might not exist in the realm of trendy programming language debates on the Internet, but we’re having this conversation on systems largely implemented in it (UNIX won after all), so I think it’s safe to say that it very much exists, and will continue to for a long time.

                                                                                                                                          I’m writing this on a Windows system, where much of the kernel and most of the userland is C++. I also post from my Mac, where the kernel is a mix of C and C++, with more C++ being added over time, and the userland is C for the old bits, C++ for the low-level new bits, and Objective-C / Swift for the high-level new bits. The only places either of these systems chose C were parts that were written before C++11 was standardised.

                                                                                                                                          Hell, in the absolute worst case scenario, you could write your own compiler if you really needed to.

This is true for ISO C. In my experience (based in part on building a new architecture designed to run C code in a memory-safe environment, and on working to define a formal model of the de-facto C standard), there is almost no C code that is actually ISO C. The language is so limited that anything nontrivial ends up using vendor extensions. ‘Portable’ C code uses a load of #ifdefs so that it can use two or more different vendor extensions. There’s a lot of GNU C in the world, for example.
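The familiar shape of that kind of ‘portable’ code, sketched from memory (the macro and function names here are invented):

    // Pick whichever vendor extension happens to be available.
    #if defined(__GNUC__) || defined(__clang__)
    #  define LIKELY(x)  __builtin_expect(!!(x), 1)
    #  define NORETURN   __attribute__((noreturn))
    #elif defined(_MSC_VER)
    #  define LIKELY(x)  (x)
    #  define NORETURN   __declspec(noreturn)
    #else
    #  define LIKELY(x)  (x)
    #  define NORETURN
    #endif

    #include <cstdlib>

    NORETURN void die() { std::abort(); }

    int checked(int x) {
        if (LIKELY(x >= 0)) return x;
        die();
        return -1;   // unreachable; keeps compilers without noreturn support quiet
    }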

Reimplementing GNU C is definitely possible (Clang, ICC, and XLC all did it, with varying levels of success), but it’s hard, to the extent that none of the three achieves 100% compatibility, in the sense of being able to compile, for example, all of the C code in the FreeBSD ports tree out of the box. They actually have better compatibility with C++ codebases, especially post-C++11 codebases (most of the C++ codebases that don’t work are ones doing things so far outside the standard that they have notes like ‘works with G++ 4.3 but not 4.2 or 4.4’ in their build instructions).

                                                                                                                                          More realistically, diversity of implementation is just a good indicator of the “security” of a language “investment”. Those implementations you mention might be nichey, but they exist, and you could pretty easily use them (or adapt them) if you wanted to.

There are a few niche C compilers (e.g. PCC and TCC), but almost all of the mainstream C compilers (MSVC, GCC, Clang, XLC, ICC) are C++ compilers that also have a C mode. Most of them are either written in C++ or are being gradually rewritten in C++, and most of the effort in these ‘C’ compilers is focused on improving C++ support and performance.

By 2018, C++17 was pretty much universally supported by C++ compilers. We waited until 2019 to move to C++17 for a few stragglers, and we’re now pretty confident about being able to move to C++20. The days when a new standard took 5+ years to gain support are long gone for C++; even a decade ago, C++11 got full support across the board before C11 did.

                                                                                                                                          If you want to guarantee good long-term support, look at what the people who maintain your compiler are investing in. For C compilers, the folks that maintain them are investing heavily in C++ and in C as an afterthought.

                                                                                                                                          1. 3

                                                                                                                                            I’d definitely have agreed with you 10 years ago, but the C++ ABI has been stable and backwards compatible on all *NIX systems, and fairly stable on Windows, for over 15 years. C++ provides you with some tools that allow you to make unstable ABIs for your libraries, but it also provides tools for avoiding these problems. The same problems exist in C: you can’t add a field to a C structure without breaking the ABI, just as you can’t add a field to a C++ class without breaking the ABI.

The C++ ABI is stable now, but the problem is binding it from other languages (i.e. try binding a mangled symbol), because C is the lowest common denominator on Unix. Of course, with C++ you can define a C-level ABI and just use C++ for everything.
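i.e. something along these lines (an illustrative sketch; the names are made up): the exported surface is plain C, and everything behind it is ordinary C++:

    // widget.h - the public, C-callable ABI
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct widget widget;      // opaque handle; layout never exposed

    widget* widget_create(int size);
    void    widget_destroy(widget* w);

    #ifdef __cplusplus
    }
    #endif

    // widget.cpp - the implementation is free to use any C++ it likes
    #include <vector>

    struct widget {
        std::vector<int> data;         // invisible to C and to other languages
    };

    extern "C" widget* widget_create(int size) {
        return new widget{std::vector<int>(size)};   // int converts to the vector's size type
    }

    extern "C" void widget_destroy(widget* w) {
        delete w;
    }

Any foreign-function interface that can call C (which is essentially all of them) can call this, while the implementation keeps its containers, smart pointers, and so on.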

                                                                                                                                            edit

Reimplementing GNU C is definitely possible (Clang, ICC, and XLC all did it, with varying levels of success), but it’s hard, to the extent that none of the three achieves 100% compatibility, in the sense of being able to compile, for example, all of the C code in the FreeBSD ports tree out of the box. They actually have better compatibility with C++ codebases, especially post-C++11 codebases (most of the C++ codebases that don’t work are ones doing things so far outside the standard that they have notes like ‘works with G++ 4.3 but not 4.2 or 4.4’ in their build instructions).

It’s funny that no one ever complains about GNU’s extensions to C being so prevalent that they make implementing other C compilers hard, yet people lose their minds over, say, a Microsoft extension.

                                                                                                                                            1. 2

The C++ ABI is stable now, but the problem is binding it from other languages (i.e. try binding a mangled symbol), because C is the lowest common denominator on Unix. Of course, with C++ you can define a C-level ABI and just use C++ for everything.

                                                                                                                                              That depends a lot on what you’re binding. If you’re using SWIG or similar, then having a C++ API can be better because it can wrap C++ types and get things like memory management for free if you’ve used smart pointers at the boundaries. The binding generator doesn’t care about name mangling because it’s just producing a C++ file.

                                                                                                                                              If you’re binding to Lua, then you can use Sol2 and directly surface C++ types into Lua without any external support. With something like Sol2 in C++, you write C++ classes and then just expose them directly from within C++ code, using compile-time reflection. There are similar things for other languages.
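Roughly like this, if memory serves (details approximate, so treat this as a sketch and check the sol2 docs; the Player type is invented):

    #include <sol/sol.hpp>   // sol2 v3 single header

    struct Player {
        int health = 100;
        void heal(int n) { health += n; }
    };

    int main() {
        sol::state lua;
        lua.open_libraries(sol::lib::base);

        // Expose the C++ type directly; the glue is generated at compile time.
        lua.new_usertype<Player>("Player",
            "health", &Player::health,
            "heal",   &Player::heal);

        lua.script(R"(
            p = Player.new()
            p:heal(20)
            print(p.health)   -- 120
        )");
    }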

If you’re trying to import C code into a vaguely object-oriented scripting language, then you need to implement an object model in C and then write code that translates from your ad-hoc object model into the scripting language’s one. You have to explicitly write all of the memory-management logic in the bindings, because in C it’s an API contract, whereas in C++ it’s part of the type system.

                                                                                                                                              From my personal experience, binding modern C++ to a high-level language is fairly easy (though not quite free) if you have a well-designed API, binding Objective-C (which has rich run-time reflection) is trivial to the extent that you can write completely generic bridges, and binding C is possible but requires writing bridge code that is specific to the API for anything non-trivial.

                                                                                                                                              1. 1

Right; I suspect it’s actually better with a binding generator, or in environments where you have to write native binding code (e.g. JNI/PHP). It’s just annoying for the ad-hoc cases (e.g. .NET P/Invoke).

                                                                                                                                                1. 2

                                                                                                                                                  On the other hand, if you’re targeting .NET on Windows then you can expose COM objects directly to .NET code without any bridging code and you can generate COM objects directly from C++ classes with a little bit of template goo.

                                                                                                                                2. 2

Looks like Hans Boehm is working on it, as mentioned at the bottom of the article. They are apparently “bringing it back up to parity with C++”, which should fix the problems you mentioned.

                                                                                                                                  1. 4

That link is just Hans adding a <cstdatomic> header to C++ that provides #define _Atomic(T) std::atomic<T>. This ‘fixes’ the problem by letting you build C code as C++; it doesn’t fix the fact that C is fundamentally broken and can’t be fixed without breaking backwards source and binary compatibility.
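The entire mechanism, as I read it, is roughly:

    // <cstdatomic> (approximately): let C11 atomic declarations parse as C++.
    #include <atomic>
    #define _Atomic(T) std::atomic<T>

    // A shared C header that declares
    //   _Atomic(int) refcount;
    // is now seen by a C++ translation unit as
    //   std::atomic<int> refcount;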

                                                                                                                                1. 10

It seems strange that the elementaryOS developers would try to deny others the same rights that were essential in allowing them to make and distribute elementaryOS in the first place!

                                                                                                                                  The right they wanted to deny others was the right to distribute CDs using the name ‘elementaryOS’, which could reflect on them as an entity. They restricted nobody’s right to do anything with the software itself.

                                                                                                                                  1. 4

                                                                                                                                    The right they wanted to deny others was the right to distribute CDs using the name ‘elementaryOS’, which could reflect on them as an entity.

                                                                                                                                    But that was already covered by GPLv2:

                                                                                                                                    If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors’ reputations.

                                                                                                                                    And on

                                                                                                                                    They restricted nobody’s right to do anything with the software itself.

                                                                                                                                    that is not what I understand by

                                                                                                                                    Both distros have rules on their various subreddits (/r/elementaryos and /r/zorinos) that users cannot post links and sometimes information on how to build your own is removed. If someone builds their own .iso or shares the information to do so, they will have their post deleted and be banned.

To me, it seems they are limiting people’s ability to distribute a variation of their GPL’ed product, which goes against the GPLv2 (and v3) text:

                                                                                                                                    Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients’ exercise of the rights granted herein.

But in the end, I think the problem is more a moral one (don’t delete posts) than a licensing one.

                                                                                                                                    <sarcasm> Maybe a GPLv4 will address this kind of problem. </sarcasm>

                                                                                                                                    1. 4

This is basically the less serious version of the RHEL agreement. Red Hat distributes GPL binaries if you sign a support contract with them. Even though Red Hat distributes the source to all of their programs, the support contract comes with very strict conditions against redistributing those binaries, and doing so basically voids the contract.

                                                                                                                                      Many people say that this goes against the spirit of the GPL, but I’m not one of them.

                                                                                                                                      1. 2

                                                                                                                                        Many people say that this goes against the spirit of the GPL […]

As far as I understand GPLv2 §3, Red Hat’s restriction violates the GPL:

                                                                                                                                        You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above […]

                                                                                                                                        1. 4

You are allowed to redistribute the binaries. Red Hat does not prevent you from distributing the binaries; all they do is stop providing support if they find out you have done so.

                                                                                                                                  1. 1

                                                                                                                                    Direct (“exact”) IEEE-754 comparison

                                                                                                                                    This can also be seen as a special case of tolerant comparison (any of relative/absolute/ulp), where the tolerance happens to be 0.
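Concretely (a sketch of my own using absolute and relative tolerances; a ULP-based check would have the same property): setting both tolerances to 0 makes the test collapse into plain ==, at least for finite values.

    #include <algorithm>
    #include <cmath>

    // Tolerant comparison. With abs_tol == 0 and rel_tol == 0 the only way to
    // succeed is |a - b| <= 0, i.e. exact IEEE-754 equality (for finite inputs).
    bool nearly_equal(double a, double b, double abs_tol, double rel_tol) {
        const double diff = std::fabs(a - b);
        return diff <= abs_tol ||
               diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
    }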

                                                                                                                                      1. 17

This is just “Java is a miserable nightmare to program in”, and then it descends into parroting the party line on Unix philosophy, with nonsensical statements to back it up. “printf” vs. “System.out.println” is not a great reason.

                                                                                                                                        1. 25

                                                                                                                                          Yup. I’ve worked in large Java code bases, large Python code bases, and large C/C++ code bases. The reasons given in this article are imagined nonsense.

                                                                                                                                          • Remember syntax, subroutines, library features: absolutely no evidence to support this. In all my years I have yet to see even a correlation between the use of IDEs and the inability to remember something about the language. Even if the IDE is helping you out, you should still be reading the code you write. (And if you don’t, then an IDE is not the problem.) This claim is a poor attempt at an insult.
• Get to know your way around a Project: If anything, IDEs make this simpler and better. I worked in GCC/binutils without tags or a language server for a while, and let me just say that without them, finding a declaration or definition with grep is much less efficient.
• Avoid long IDE startup time / Achieve better system performance: Most people I know who use IDEs shut them down or switch projects about once a week, if that. This is just whining.
                                                                                                                                          • Less Code, Better Readability: Tell this to the GCC and Binutils developers, who almost certainly didn’t use an IDE yet still managed to produce reams of nearly unreadable code. Yet another nonsense claim from the “Unix machismo” way of thinking.

                                                                                                                                          The other points made in the article are just complaints about Java and have nothing to do with an IDE.

                                                                                                                                          1. 6

Avoid long IDE startup time / Achieve better system performance: Most people I know who use IDEs shut them down or switch projects about once a week, if that. This is just whining.

It’s also not even true. In particular, tmux and most terminals top out at a few MB/s of throughput and stop responding to input when maxed out, so if you accidentally cat a huge file you might as well take a break. Vim seems to be O(n^5) in line length and drops to seconds per frame if you open a few MB of minified JSON, and neovim (i.e. vim as a DIY IDE) is noticeably slower at basically everything, even before you start adding plugins. Never mind that the thing actually slowing my PC down is the five web browsers we have to run now anyway.

                                                                                                                                            1. 2

Vim seems to be O(n^5) in line length and drops to seconds per frame if you open a few MB of minified JSON

Obscenely long lines are not a very realistic use pattern; minified JSON and JS are a rare exception. Vim uses a paging system that deals very well with large files as long as they have reasonable line lengths (this includes compressed/encrypted binary data, which will usually contain a newline byte every 100-200 bytes). I just opened a 3 GB binary archive in vim; it performed well, stayed responsive, and used only about 10 MB of memory.

                                                                                                                                              1. 3

A modern $100 SSD can read a few MB in about a millisecond, $100 of RAM can hold thousands of copies of that file, and a $150 CPU can memcpy it in well under a millisecond.

If they did the absolute dumbest possible implementation, on a bad computer, it would still be orders of magnitude faster than it is now.

                                                                                                                                              2. 2

                                                                                                                                                Oh yes, I didn’t even mention this seemingly ignored fact. I can’t speak for Vim and friends, but Emacs chokes horribly on large files (much less so with M-x find-file-literally) and if there are really long lines, which are not as uncommon as you might think, then good luck to you.