1. 18
    1. 5

      You might have a look at the following paper for a generalisation of the idea:

      Standard Object Out: Streaming Objects with Polymorphic Write Streams

      It only deals with the output case; a follow-up for “standard object in” is in the works.

      1. 2

        Hm what do you like about it? Reading the abstract scared me off…

        • “the object-oriented architectural style” – It feels like they’re making an implicit assumption that objects need to be serialized. But I reject that idea, and the industry has rejected it too – data needs to be serialized, not objects. In the sense of Rich Hickey’s “functions and data”.
          • Objects have code. Serializing the code gives you security problems. Not serializing the code breaks encapsulation. So when you need serialization, model the domain with data rather than objects. (Note I use OOP all over Oil, but I also follow this guideline. They’re not mutually exclusive.)
          • CORBA was a failure but protobufs/thrift are a relative success.
          • Java serialization also has big problems because of OO assumptions; see https://cr.openjdk.java.net/~briangoetz/amber/serialization.html .
        • “fails completely if there are cycles in the object graph” – the Java proposal/critique above also rejects this
        • triple dispatch – not sure what this is but I don’t even like double dispatch, i.e. the visitor pattern. There are some threads on the #oil-dev Zulip about concrete patterns I’ve had with that.
        • “chaining messages along the streams’ inheritance chain” – Do streams need inheritance chains? Go has polymorphic streams (Reader / Writer) but no inheritance.
        1. 1

          Hmm…I like about it that I’ve been using it with great success for around 2 decades, and I have to admit I don’t understand your criticisms.

          First, this has virtually nothing to do with Java-style object serialisation, and of course objects have to be serialised if you want to either persist them or pass them to another process. Your distinction between objects and data is one I don’t buy, and no, industry has not agreed to this and invoking Rich Hickey also doesn’t help…most of the claims he makes are…how to put this politely…not well supported by evidence, logic or even basic definitions.

          Typical implementations of the visitor pattern can be problematic. Double dispatch ≠ the visitor pattern, so your extrapolation starts off with having no basis and then goes off into the weeds from there. In my experience and opinion based on that experience, PWSs tend to solve the problems with the visitor pattern. YMMV.

          Nothing “needs” inheritance. However, refinement, or programming by difference, is incredibly useful if you have (or would like to reuse) something that works almost the way you need it, and don’t want to start from scratch. The paper describes how simple polymorphism in the stream is not sufficient.

          1. 1

            This seems like a needlessly aggressive response. I doubt @andyc realised you linked your own paper. Clearly they took the time to get a good idea of what it’s about, clearer than I just managed in 30 minutes of reading it. I feel that deserves a more friendly reply even if their stance questions the basis of your work.

            Personally, I’d love to read a less technical article on this architecture that isn’t targeted at experts in object oriented programming. I’d be curious to understand the trade-offs along the spectrum between streaming text and streaming objects. Plain structured data like JSON is something I’m comfortable with here, while YAML with anchors already gets messy in my head, never mind object structures with inheritance and code.

          2. 1

            I don’t understand why you think this paper is relevant to shell. The shell language doesn’t have objects. [1]

            If you’re going to drop a link, then the burden is on you to explain why you think it’s relevant in a few sentences. And yes I didn’t realize it was your own paper – that would have been nice to mention.


            Also, I would have liked to know what specifically you are using, and in what domain. You’ve provided almost no information.

            It sounds like at the very least we’re working in very different domains. The way I think of shell is very influenced by distributed systems. It has almost all the same problems.

            In all the distributed systems I’ve dealt with (mainly at Google), serializing data is one of the most common operations, but serializing objects causes versioning and upgrade headaches and has been rejected. It breaks encapsulation and object invariants due to lack of versioning. “Real” distributed systems can’t be upgraded all at once.

            In my experience, object serialization has very narrow use cases:

            • You CAN send objects from a version X of a program to version X of the same program.
            • You can’t reliably send objects from program A to program B. I prefer to think of programs / nodes in a distributed system as “accidentally adversarial” – it makes the overall system more reliable and secure. This also applies to two or more programs communicating through pipes.
            • You can’t reliably send objects from version X to version Y of the same program.

            Maybe if your domain is changing very slowly you can get away with it, but in my experience it’s been more trouble than it’s worth. I program mostly in OO languages but I’ve never used object serialization in ANY program ever!
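
            A minimal sketch of the "serialize data, not objects" idea, assuming JSON as the wire format. The writer and reader functions and the field names (`name`, `retries`) are hypothetical, made up only for illustration; they stand in for version X and version Y of the same program.

```python
import json

# Hypothetical "version X" writer: emits plain data, not an object with code.
def write_v1(name):
    return json.dumps({"name": name})

# Hypothetical "version Y" writer: adds a field that version X never knew about.
def write_v2(name, retries):
    return json.dumps({"name": name, "retries": retries})

# A reader that accepts both versions: missing fields get a default,
# and any field it does not know about is simply ignored.
def read_job(payload):
    record = json.loads(payload)
    return {"name": record["name"], "retries": record.get("retries", 0)}

old = read_job(write_v1("backup"))
new = read_job(write_v2("backup", 3))
```

            Because the payload is only data, the reader decides how to interpret it, so the two versions do not need to share class definitions or be upgraded at the same time.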


            [1] Oil may grow some sort of “record” in the medium term (in addition to dictionaries), and maybe some kind of module bundling code and data in the very long term. But that has very little to do with JSON support. The Oil interpreter is very OO so I have nothing against objects (unlike Hickey), and I’ve argued in many places that OOP and FP have converged in many important senses.

            1. 1

              why you think it’s relevant

              From the (brief) post: “…for a generalisation of the idea…” I think generalisations are relevant.

              Section 4.1 of the paper shows how the use of a restricted subset (property lists, equivalent to the JSON object/data model) is a special case of the more general technique presented, and the limitations of that approach.

              doesn’t have objects … The Oil interpreter is very OO … [implemented in Python]

              This seems to be the/a fundamental place where we are talking past one another. You keep making a distinction between objects and data as fundamentally different categories that I simply don’t buy, and for some reason what you did is on the “data” side of this and what I did is on the “object” side of this.

              While it is possible to construct differences, I don’t think the idea of fundamentally disjoint categories holds up to even superficial scrutiny. For example, what is JSON? JavaScript Object Notation. Your interpreter is implemented in Python, so even if you only model the restricted JSON subset, all of it is going to be represented using objects. And once again, using a restricted subset is a special (and limited) case of the technique presented in the paper.

              Object serialization

              You keep bringing up object serialisation. Why? PWSs do not use “object serialisation” in the sense you use (like Java object serialisation). Object serialisation is not mentioned in the abstract and certainly if it were just Java style object serialisation, it would hardly rate a paper at a research conference.

              Of course, you can use PWSs for object serialisation (and that is listed in the “Applications” section), but in this instance PWSs actually solve the exact problems you list, which are largely due to the tight coupling, in turn due to the lack of polymorphism in Java-style object serialisation. With PWSs, the external data format is separate from the object model, and can vary independently. (And yes, the object model that is used by the yajl library you use is very much an object model).

              The triple dispatch + chaining allow you to build different PWSs (“data format adapters” if you will) with comparatively little effort, and thus make it comparatively easy to support different data formats and different object versions.

              So basically, you can get the stability you seem to associate with the “data oriented” approach, but without actually having to restrict your data model.

    2. 4

      Built-in JSON support seems like a great feature to draw in some users!

      I tried following along, and had some issues. Perhaps you could add a note to the start of the article stating how the shell should be set up? Should I be running oil or osh? This is with 0.7.pre8 built on Guix – it’s possible that the Guix packaging does something that is at fault here.

      • can’t find the = command, within either oil or osh
      • within oil: the line sharray=( foo.txt *.py ) fails because shopt parse_equals is on
      • it was a bit hard to interactively figure out shopt, all of these failed:
        • shopt -h
        • shopt --help
        • help shopt
        • man shopt

        Plain shopt output then let me guess that calling shopt -u parse_equals might do the trick, and it did.
      • (not a problem, but see later) once I had sharray, echo $sharray failed, and directed me successfully to call echo @sharray instead
      • next I tried osh instead; the sharray= assignment now worked out of the box, so I felt I was on the right track
      • however, = myvar once again couldn’t find =
      • also, somewhat unexpectedly, while echo $sharray failed with the same error as in oil, this time echo @sharray didn’t actually “work”
      1. 2

        Thanks, this is great feedback!

        • I plan to change pp to = for pretty-printing, and was overzealous in the doc :-/ I changed it back to pp for now.
        • Added declare sharray=(...) as this works in both OSH and Oil. (Oil allows bare assignment like x = ‘foo’, so it disallows the old style x=foo).
        • help builtin is future work, sorry about that! We made some progress on it in the 0.7.pre6 release.
        • Yes, in bin/osh, echo @array isn’t special, because it’s meant to be strictly compatible with bash/POSIX, where you don’t need to quote @. This syntax occurs in a few command line tools so I don’t want to break them. When you opt into Oil you will have to write '@quoted' for a literal @.

        http://www.oilshell.org/preview/doc/json.html

        https://github.com/oilshell/oil/commit/7ef11b82c9b7dd1703b8223004b85960600d06af


        And I also realize the shell sessions on this page should be syntax highlighted! I’m planning that but would rather get some docs out first.


        Update: I changed the “pretty print expression” syntax in Oil from pp EXPR to = EXPR. Everything on the RHS of = is always an expression.

        https://github.com/oilshell/oil/commit/81cd5c106c59e88f4f2d5b32fac0155248de2f34

    3. 2

      I’m looking for feedback on this! You can try it with the latest release: https://www.oilshell.org/release/0.7.pre8/

    4. 1

      I guess this is more or less inevitable today, but as a staunch JSON hater I feel a bit let down by Oil shell here. I just hope the json support is kept as a sort of accessory “compatibility” layer, but the native and natural format will always remain plain text.

      1. 7

        You might “hate” JSON because it’s overused in certain contexts, e.g. it’s not good for config files.

        But there are many good uses of JSON too. Other observations:

        • JSON fits very well with shell’s philosophy because it’s “versionless” and “polyglot”.
        • Yaml and Toml both desugar to JSON. Oil will have support for a few other serialization formats, but not yaml or toml because you can just convert them to JSON with a separate tool, and then pipe that as JSON into Oil.

        Oil will probably also have support for HTML and some kind of CSV/TSV. Those are also basically “versionless” formats, and they do jobs that plain text or JSON doesn’t do well.

        1. 1

          You can’t losslessly convert floats with special values from YAML or TOML to JSON.

          It would be nice to support a format that can handle floats and binary data properly.

          Being able to pipe images around as more than just bytes could be really nice, for example.

          1. 2

            Yeah, it would be nice to have some kind of higher level support for binary data in Oil. But I don’t know what it would be like. If you have some use cases / experience, let me know: https://github.com/oilshell/oil/wiki/Where-To-Send-Feedback

          2. 2

            Being able to pipe images around as more than just bytes could be really nice, for example.

            You can already do that. This doesn’t need any “special support” by the shell. Binary pipes between processes work really well and have negligible overhead. Source: I work in image processing and I pipe binary images along filters for a living. What kind of further “shell support” are you thinking about, for that case?

          3. 1

            You can’t losslessly convert floats with special values from YAML or TOML to JSON.

            Could you elaborate? Are you referring to NaN, -0 and the like?

            It would be nice to support a format that can handle binary data properly.

            It’s fairly hard to do that nicely in a format your text editor will support.

            1. 2

              NaN, Inf and -Inf are all unencodable by spec in JSON because of dumb security issues in JS. -0 might be unencodable, I’m not sure. These are useful values sometimes.
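
              A small sketch of how this shows up in practice, using Python's standard `json` module as one example encoder (other libraries may behave differently). The helper name `strict_dumps` is made up for illustration. With `allow_nan=False` the encoder rejects values that strict JSON cannot express, while negative zero does round-trip in this implementation.

```python
import json
import math

# With allow_nan=False, json.dumps refuses values that JSON has no syntax for.
def strict_dumps(value):
    try:
        return json.dumps(value, allow_nan=False)
    except ValueError:
        return None  # not representable in strict JSON

results = {
    name: strict_dumps(value)
    for name, value in [("nan", math.nan), ("inf", math.inf), ("-inf", -math.inf)]
}

# Negative zero does have a JSON spelling and keeps its sign on a round trip here.
neg_zero = json.loads(strict_dumps(-0.0))
neg_zero_sign = math.copysign(1.0, neg_zero)
```

              Note that Python's default (`allow_nan=True`) emits `NaN` and `Infinity` tokens instead of failing, which is output that strict JSON parsers may reject.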

              Binary data doesn’t have to be text editor friendly, the outputs of tar or a soundwave generator are not text editor friendly but are still useful to pipe.

              Not all data is adequately representable as text.

              (And you can always use xxd and vim, if you like ;) )

      2. 3

        I think richer objects than “a big string” are good and useful.

        JSON is a pretty poor format (no comments, can’t encode float values NaN, Inf or -Inf, expensive parsing, verbose, etc), but it is also quite popular and basically good enough for most purposes.

        1. 1

          I think richer objects than “a big string” are good and useful.

          Of course! And plain text is not just a big string (well, you can argue that it is, and you can do the same for json). Plain text is a sequence of lines, and each line is a sequence of fields. This is quite popular and basically good enough for most purposes.

          1. 1

            Heh.

            The issue is that this encoding is not really true for many processes.

            There are header and footer and division lines in output. Fields may contain whitespace that may be indistinguishable from field delimiters. Fields are not named or typed.

            And you have to work this out laboriously for each tool. Because the interfaces are decidedly non-standard.

            Using a format designed from the start for machine representation of a wide variety of values is advantageous.
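
            A minimal sketch of the delimiter problem, assuming a made-up listing line with a size followed by a file name that contains a space. The line and the field names (`size`, `name`) are hypothetical, not the output of any real tool.

```python
import json

# A made-up listing line: a size, then a file name that contains a space.
line = "4096 my notes.txt"

# Whitespace splitting yields three fields instead of the intended two.
naive_fields = line.split()

# The same record as JSON keeps the name intact, and each field is named and typed.
record = json.loads(json.dumps({"size": 4096, "name": "my notes.txt"}))
```

            Here `naive_fields` has three elements, so a consumer has to know tool-specific rules to reassemble the name, while `record["name"]` comes back whole and `record["size"]` comes back as a number.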