1. 13
    1. 12

      sharing of high-level abstractions of data between documents or applications

      COM? :D

      Seriously though, this is a fundamentally hard problem. Every app represents high-level concepts in its own way, and the higher-level the concept, the more differences there are.

      1. 2

        The thing about this suggestion is that it shows how hard a design problem an OS is. The particular facilities an OS provides make a kind of sense, but there are lots of other things that it’s natural to want, and which quite a few people have attempted to add on top of the basic OS functionality.

        It’s interesting that memory allocation is split between the OS and whatever language one uses. Exactly how and why is an interesting problem.

      2. 1

        Yeah, when I read “Text is just one example. Pictures are another. You can probably think of more. Our operating systems do not support sharing of high-level abstractions of data between documents or applications.” I was like “umm copy/paste?”

        There are ways to share that stuff. Copy/paste (and drag-and-drop and related mechanisms) actually kind of do solve it. And the formats you use there can be written to disk too - .bmp, .rtf, and .wav files directly represent the Windows clipboard formats.

        Like I agree there are weaknesses and a lot of reimplementations, but it is plainly false to say there is no support.

    2. 8

      Windows and OS X are just as bad. They, too, offer no higher-level model than byte sequences to their applications.

      Windows richly and deeply supports Unicode. From the C runtime down to the public APIs of the OS. https://docs.microsoft.com/en-us/cpp/c-runtime-library/unicode-the-wide-character-set

      1. 7

        Also OLE.

        The author seems unfamiliar with innovations in the Windows ecosystem.

        1. 4

          Pretty much all interest in OLE and COM was lost years ago - for better or for worse, developers want to treat Windows like Unix.

          1. 2

            Yeeeep. Those who don’t learn from history…

      2. 4

        “A wide character is a 2-byte multilingual character code.”

        Representing Unicode code points as 2 bytes (little- or big-endian?) and referring to them as “characters” is already a sign of disaster.
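
        To make the gripe concrete, here’s a small Python sketch (my example, not the commenter’s) showing a single code point that doesn’t fit in one 2-byte unit:

        ```python
        # Code points above U+FFFF need two UTF-16 code units (a surrogate pair),
        # so "a character is 2 bytes" stops being true for them.
        s = "\U0001F600"                      # one code point: 😀
        utf16 = s.encode("utf-16-le")

        assert len(s) == 1                    # one character to Python
        assert len(utf16) // 2 == 2           # but two UTF-16 "wide characters"
        print(hex(int.from_bytes(utf16[0:2], "little")))   # 0xd83d (high surrogate)
        print(hex(int.from_bytes(utf16[2:4], "little")))   # 0xde00 (low surrogate)
        ```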

        1. 1

          Turns out when you make these design decisions in 1996 or whatever they end up flawed and it’s hard to change them.

    3. 3

      It’s funny that when you use native applications (like Mail and Calendar on a Mac) instead of cross-platform lowest-common-denominator things like Thunderbird, these problems go away.

      1. 4

        Windows, Linux+XDG, and macOS all have better abstractions than “get a byte, get a byte, get a byte-byte-byte”. But they’re not portable, and they don’t work over the Internet, so they don’t get used.

    4. 2

      Separating encoding out into the OS means that text strings that are reliable on one system are unreadable on another. Fundamentally, they have to be combined into a rich text system; a structured text statement. Anyway, as other posters point out, this is something that Windows has essentially solved.

      Something like HTTP & HTML, with its specification of encoding as structure on the text, is the Unix solution. The author is pushing encoding into the system layer, when it belongs with the data layer.

    5. 1

      There is only one representation of text: UTF-8. The rest are broken binary blobs.
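
      For what it’s worth, a quick Python illustration (my example) of why people say this: UTF-8 is a strict superset of ASCII and needs no byte-order mark, so the same bytes mean the same thing everywhere:

      ```python
      # ASCII text is already valid UTF-8, byte for byte.
      assert "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")

      # Non-ASCII code points become unambiguous multi-byte sequences,
      # while UTF-16 output depends on a byte-order mark.
      print("é".encode("utf-8"))    # b'\xc3\xa9'
      print("é".encode("utf-16"))   # b'\xff\xfe\xe9\x00' (BOM + little-endian)
      ```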

    6. 1

      Which definition of the word ‘factored’ is being used in the title of this article?

    7. 1

      Programs should transport data between themselves using structured data formats like TSV, s-expressions (or JSON (or XML))

      1. 12

        Everyone, including Microsoft, KDE, GNOME, IBM and the W3C, tried representing everything in XML in the early 2000s, and everything ran smoothly; then in 2005 Google made AJAX search autocomplete, and XML suddenly fell out of fashion (I still see arguments like “XML is not mobile-friendly”).

        The popularity of AJAX (a political rename of XMLHttpRequest) and jQuery gave us JSON, which can’t even represent dates and times. And because it doesn’t have comments and has oddly strict rules, such as “no trailing comma”, it’s not human-writable, so we got YAML, along with reference manuals on how to write strings in it.
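
        The date/time complaint is easy to demonstrate (a minimal Python sketch; serializing as an ISO 8601 string is the common workaround, not part of the JSON spec):

        ```python
        import json
        from datetime import datetime

        # JSON has no date/time type, so the standard library refuses to guess.
        try:
            json.dumps({"when": datetime(2005, 2, 18)})
        except TypeError as e:
            print(e)   # Object of type datetime is not JSON serializable

        # The usual convention: encode as an ISO 8601 string, re-parse by hand.
        print(json.dumps({"when": datetime(2005, 2, 18).isoformat()}))
        # {"when": "2005-02-18T00:00:00"}
        ```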

        TSV does not define escaping rules for newlines and tabs, so in practice it becomes a family of incompatible TSV formats.
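
        A two-line Python sketch of what goes wrong: without an agreed escaping rule, a tab inside a field silently changes the column count:

        ```python
        # Three fields in, four fields out: the embedded tab is
        # indistinguishable from a field separator.
        row = ["name", "a value\twith a tab", "42"]
        line = "\t".join(row)

        assert len(line.split("\t")) == 4   # the reader now sees an extra column
        ```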

        S-expressions are cool when it’s EDN, because EDN is clojurified XML, but “true” Lispers hate Clojure and its three types of brackets.

        Protocol Buffers are glorified C struct casting for webscale companies, for when you need to move hordes of personal data between NoSQL databases and JSON.stringify() leads to disk swapping. Even for OpenStreetMap planet dumps it makes little difference whether you consume them as compressed XML or compressed PBF, if you know how to stream XML.

        Given this depressing state of affairs, it’s just simpler to define your own format with funny syntax (pick your favorite era of computing history; take inspiration from COBOL, for example) and write a parser for it, especially when nearly every language nowadays has a parsec-style library.

      2. 4

        It would still have the unnecessary overhead of encoding and parsing. I’ve been playing around with writing a backwards-compatible stdio.h alternative that uses shared memory to pass data between programs, falling back to streams if the other process doesn’t support it. Still very alpha, and there’s no point in finishing it if the shared-memory idea doesn’t pay off in performance.
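
        A rough sketch of the shared-memory idea using Python’s standard library (`multiprocessing.shared_memory`, Python 3.8+); this is my illustration, not the stdio.h project described above:

        ```python
        from multiprocessing import shared_memory

        data = b"hello, other process"

        # "Writer" side: create a named segment and copy the bytes in.
        shm = shared_memory.SharedMemory(create=True, size=len(data))
        shm.buf[:len(data)] = data

        # "Reader" side: attach by name; no serialization, no copy through a pipe.
        peer = shared_memory.SharedMemory(name=shm.name)
        print(bytes(peer.buf[:len(data)]))   # b'hello, other process'

        peer.close()
        shm.close()
        shm.unlink()
        ```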

        Another point, which the CLOSOS paper suggests in connection with s-expressions, is to have operating systems without processes and pipes: since address spaces are large enough nowadays, a higher-level language can serve as the protection mechanism, with all memory in the OS shared in one address space. Interesting, but it would probably require some more thinking through.

        1. 1

          directly sharing memory between separate processes would probably require a capability system to keep things in check

          1. 3

            I think that’s the point of using a higher-level language: if there is no direct memory dereferencing, and if I can’t create my own references, then any reference I hold is effectively its own capability.

            1. 2

              Then what would prevent me from manipulating the generated code from an existing binary to reach into another program’s address space and mess things up? You still need runtime protection because otherwise a binary running on your CPU can basically do anything.

              1. 2

                I imagine that the binaries would be controlled by the OS in that case, and not editable by user code.

                It’s actually pretty doable. Downloaded executables could be in some kind of source or bytecode form, and the OS could then convert that to binary AOT or JIT — much like JavaScript and WebAssembly today.

                1. 1

                  Hard disks and other operating systems still exist. The only way to make this work is runtime support.

                  1. 1

                    I wouldn’t think that editing the hard drive from another OS would be in-scope, since clearly that would be done by someone with physical control.

          2. 2

            directly sharing memory between separate processes would probably require a capability system to keep things in check

            Microkernels have existed for a while. In theory, IPC is more or less a solved problem; check out the L4 family, in particular seL4 for its cap system; see also the IPC system used on Horizon (the Nintendo 3DS operating system) and its services API.

            1. 2

              Plus, seL4 has CAmkES to glue the processes together.
