1. 23
  1. 9

    This pattern of implementing things in a higher-level / easier language and then lowering for performance is also a common one in the computer vision space! It’s often much easier to prototype things in something like MATLAB before moving to OpenCV with C++. The MATLAB version lets you make sure that the concepts work or are even worth pursuing further, and then you go to OpenCV when you want it to actually run fast and be usable in other contexts.

    1. 7

      Also, in the Rust world this makes me think of the often-recommended pattern of avoiding references, cloning liberally, sprinkling RefCell and Rc all over the place, and only once you’ve got it working and right, starting to reduce the unnecessary allocations.

      1. 3

        Yes for sure, I think we need more of this to make reliable systems! I mentioned the analogy to TeX here:


        And how the Go compiler was translated from C to Go. It’s very similar because it’s not arbitrary C – you can’t translate say bash or sqlite to Go with their translator. Similarly our translator is not for arbitrary Python – it’s for the (statically typed) subset that we use.

        1. 2

          This is all very cool! I’ll add that in the formal methods world, there’s also a fun approach of model-based code generation, where you write some model that you prove useful properties of, then generate code from that model, and prove that the generated code maintains all the stuff you proved about the model!

          I’ve actually contributed to a project that did this, though I was a developer and target language expert and definitely not one of the formal methods people doing the proofs. Fun stuff!

          1. 3

            Yes for sure! That is kind of what I was getting at with the C++ bits at the end.

            When many people see C++ now, they think “ewww” unsafe. But the point is that if the C++ is generated from the Python “model”, then it retains its semantics!

            As the post says, you can’t express use-after-free, double-free, or any kind of memory unsafety in Python, so the generated C++ won’t have it.

            I guess it’s hard to explain without examples …

            1. 1

              Whoa - do you have anything else to share about a project like that? Any posts or anything?

              This is the exact approach that I’m taking with a programming language that I’m developing. It feels like a workflow that actually has a ton of potential, especially for people who care about quality (that’s how I ended up thinking about this).

              1. 1

                Unfortunately no this wasn’t an open source project and I’m not able to say any specifics about it.

                Your project looks really cool though! I’d like to add it to my list of languages implemented in Rust if that’s alright with you!

                1. 2

                  Ah, I understand. Adding to your list is cool with me!

          2. 2

            This pattern of implementing things in a higher-level / easier language and then lowering for performance is also a common one in the computer vision space!

            Why not implement in a high-level but also performant language like Rust, OCaml, Swift, etc.?

            1. 6

              Nothing beats the expressiveness of using a DSL or a language that’s more suited to a particular domain. The languages you mentioned are all general purpose, which means they have a ceiling on how expressive they can be for any given domain.

              1. 4

                Yes exactly, thank you for the great answers! :) That’s what I meant in the post by “getting leverage”, if the “middle language” fits the problem domain.

                I find that if you program mostly in one language it’s hard to see this … You “think” in that language; it becomes your world. But if you use many languages then you always see the possibility of fitting the language to the domain more closely.

                1. 1

                  Sure, but writing in one of them has got to win over writing in python/Ruby first and then rewriting in C++…

                  1. 5

                    Maybe. I know this was in a reply to a comment talking about writing both versions, but this article is about generating the lower-level code. So it’s not being implemented twice.

                2. 2

                  I don’t do computer vision stuff anymore, but the answer is almost certainly lack of libraries. MATLAB and OpenCV have a lot of APIs that any other language would need to match to be usable, not to mention the amount of existing research code in MATLAB or OpenCV which you can just use if you’re working in the same language.

                  1. 3

                    Also I was doing this work ~8 years ago, in 2014, so Rust wasn’t even 1.0 yet.

              2. 9

                Out of curiosity I took a look at your existing C++ and it’s pretty clear to me it’s written by someone who has no experience in modern C++ development (int for sizes everywhere, naked pointers to dynamically allocated memory passed around even though exceptions are also used, no awareness of where to copy and where to move, etc). I believe you will have a hard time finding a good C++ developer who would be willing to carry on in this style. IMO, your best option would be to find a good C++ developer, give them the spec for the shell and conformance tests, and let them re-implement it from scratch.

                1. 6

                  Also I just added this to “Good Signs” on the “job listing”:

                  If you think our C++ is ugly, that means you have ideas on how to make it better! What exists is a proof of concept, designed to show the strategy will work and can perform well. There are many improvements that can be made. If you’re convinced a complete rewrite is necessary, then please make a case that it’s feasible, justified by a survey of the code.

                  Also note that the dumb_alloc part is going to be thrown away in favor of gc_heap.cc, etc. There is already some rewrite inherent in the project, though maybe it wasn’t clear from the blog post.

                  1. 5

                    Some of this is leaking from the constraints of the generated code. The generated code uses an intersection of the type systems of MyPy and C++, and then we link against the hand-written C++. So the hand-written C++ can be ugly. Moreover it has to be rewritten to use the garbage collector.
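For illustration, here is a hypothetical sketch of the “intersection of MyPy and C++” idea, not Oil’s actual generated code: a function in the statically typed Python subset lowers to a C++ function of the same shape (std::string stands in for Oil’s Str* type here).

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration of the MyPy/C++ "type intersection" idea:
// only code that type-checks in both worlds is allowed.
//
// Statically typed Python source:
//
//   def repeat(s: str, n: int) -> str:
//       return s * n
//
// lowers to a C++ function of the same shape (std::string is a
// simplified stand-in for Oil's Str* type):
std::string repeat(const std::string& s, int n) {
  std::string out;
  for (int i = 0; i < n; ++i) out += s;
  return out;
}
```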

                    I mentioned the dumb_alloc caveat in the post. It is a proof of concept. It wasn’t obvious this would work from the start!

                    I would be fine with a complete rewrite, and the entire goal is to find someone who’s a better C++ programmer than me !!! (FWIW I have written medium amounts of C++ from scratch that runs continuously in production, though it’s definitely not my main language)

                    However I also want to be respectful of the donations / grants and take the shortest path to a good implementation.

                    It will save ~5 years if you reuse the executable spec rather than rewriting by hand :) The compilers are extremely short (< 10K lines) while the shell would easily be ~150K-200K lines of hand-written C++ (or Rust).

                    As mentioned, I also encourage parallel efforts in other languages (and other people have started them)

                    Also I accept patches and more than 50 people have contributed in the past. We can probably use size_t in many places.

                    On the other hand, there shouldn’t be any copy/move because the generated C++ uses a garbage-collected heap shaped like statically typed Python. The source program only passes pointers around.

                    1. 5

                      If it’s machine generated, I’d expect it to be using smart pointers. If you’re coming from a ref-counted language then just lowering all references to std::shared_ptr will work. If you’re using a GC, then you can use your own gc_ptr&lt;T&gt; smart pointer, and now you have an automatic way of tracking roots on the stack, so you don’t need to use conservative GC and all of the exciting non-determinism that entails.
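A minimal sketch of how such a gc_ptr&lt;T&gt; could track roots precisely (the names here are made up for illustration; a real implementation would live inside the GC):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the gc_ptr<T> idea: each gc_ptr on the stack registers the
// address of its pointer slot with the GC's root list on construction
// and removes it on destruction, so the collector can enumerate (and,
// for a moving GC, update) every live root without conservative scanning.
static std::vector<void**> g_roots;  // stand-in for the GC's root set

template <typename T>
class gc_ptr {
 public:
  explicit gc_ptr(T* p = nullptr) : p_(p) {
    g_roots.push_back(reinterpret_cast<void**>(&p_));
  }
  ~gc_ptr() { g_roots.pop_back(); }  // stack discipline: strict LIFO
  gc_ptr(const gc_ptr&) = delete;    // keep the sketch simple
  gc_ptr& operator=(const gc_ptr&) = delete;
  T* get() const { return p_; }
  T& operator*() const { return *p_; }
 private:
  T* p_;
};

std::size_t live_root_count() { return g_roots.size(); }
```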

                      1. 1

                        Yes that was an early attempt. Instead of generating List<Str*>* (an extremely common type in the program), I generated code like:


                        Besides being pretty ugly to read, I couldn’t get it to compile! I discovered that shared_ptr has some other template parameters that somehow don’t compose (?)
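For a sense of what that looked like, here is a hypothetical reconstruction of the shared_ptr spelling of List&lt;Str*&gt;* (Str and List are simplified stand-ins, not Oil’s actual runtime types or generated code):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reconstruction of the shared_ptr experiment: every raw
// pointer in the common type List<Str*>* becomes a std::shared_ptr.
// Str and List are simplified stand-ins for Oil's runtime types.
struct Str { std::string s; };
template <typename T> struct List { std::vector<T> items; };

// Raw-pointer spelling (what the translator emits today):
//   List<Str*>* words;
// shared_ptr spelling of the same type:
using StrRef = std::shared_ptr<Str>;
using StrList = std::shared_ptr<List<StrRef>>;

StrList make_words() {
  auto words = std::make_shared<List<StrRef>>();
  words->items.push_back(std::make_shared<Str>(Str{"hello"}));
  words->items.push_back(std::make_shared<Str>(Str{"world"}));
  return words;
}
```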

                        It’s possible that there was some constraint in my code generator that caused this. I don’t remember the exact issue, but I remember discussing it with the author of the Fish shell (written in C++) in late 2020. And apparently he didn’t have an answer either.

                        It is probably worth another try, so I filed this issue for posterity: https://github.com/oilshell/oil/issues/1105

                        At the time, I also watched some CppCon videos about the common implementations of shared_ptr. From what I remember it is quite bloated, especially in the presence of those recursive types. I believe the Cheney collector we will be using is much lighter, although I don’t have measurements.

                        I also looked at Herb Sutter’s GC smart pointer experiment, although I don’t think I played with the code .. I haven’t seen any codebases that use such techniques in production, but I’d be interested in references.

                        I definitely want to get more input from C++ experts involved … Although as I mention here I think an ideal person could be someone with a lot of C++ experience who also uses Rust: https://old.reddit.com/r/ProgrammingLanguages/comments/tt49kb/oil_is_being_implemented_middle_out/i2y1h4y/

                        1. 2

                          C++’s shared_ptr is very general, which sadly precludes an optimised implementation. It is non-intrusive, and so handles both objects created with std::make_shared and objects allocated separately that are later managed by std::shared_ptr. In the former case, it will do a single allocation to contain the refcount and the object; in the latter case it will allocate them separately. I say refcount, but it actually needs two, because it also supports weak references and so maintains a count of weak references too. When the strong reference count reaches 0, the object’s destructor is called and, if the object is in a separate allocation, then it is also freed. When the weak reference count reaches zero, the control block is freed. 99% of the time that I use std::shared_ptr, I don’t care about weak references, and so it bugs me slightly that I have to pay for them. It also annoys me that the separate allocation means that std::enable_shared_from_this stores a weak reference to the control block, rather than just embedding the control block in the object, so I end up needing three words per object to store one object’s state.
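The two-count behaviour described above can be seen in a few lines: the strong count drives the destructor, while an outstanding weak_ptr keeps the control block (and its counts) alive after the object is gone.

```cpp
#include <cassert>
#include <memory>

// Demonstrates shared_ptr's two counts: when the strong count hits
// zero the object is destroyed, but the control block stays alive as
// long as any weak_ptr still references it.
struct Tracker {
  static int live;
  Tracker() { ++live; }
  ~Tracker() { --live; }
};
int Tracker::live = 0;

std::weak_ptr<Tracker> make_expired_weak() {
  auto strong = std::make_shared<Tracker>();
  std::weak_ptr<Tracker> weak = strong;
  return weak;  // strong dies here: object destroyed, control block kept
}
```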

                          All of that said, I generally start writing code using std::shared_ptr, std::vector, std::unordered_map, std::unordered_set, and even std::regex. This gives me a similar set of abstractions to something like Python and makes it very easy to optimise later when I discover that one of these is a bottleneck. The compile + run time is often similar to the run time for Python for short programs and much better the second time I run it.

                          I believe both WebKit’s JavaScriptCore and V8 use a smart pointer for tracking stack roots for GC so that C++ code can use JavaScript objects fairly natively. This kind of model also lets you use a GC that can pin objects or relocate them. If your fields are smart pointers that return a different smart pointer for access, that can explicitly pin the objects while you’re using them, but while they’re in fields they’re just values that the GC knows about and can update.

                          I don’t know how common cycles are in your code and whether you need anything more than reference counting though. If you don’t need concurrent collection then a tracing GC might be faster since you don’t pay anything during execution (on the other hand, if you’re single threaded, non-atomic ref-count manipulation is incredibly cheap). For a shell, you might be able to do a GC sweep before each wait call and completely hide GC pauses behind execution of other programs. The nice thing about generating C++ code that uses smart pointers is that it’s trivial to change the memory management policy by changing the using directive that defines the smart pointer type that you’re using.
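That last point might look like this (Ptr and NewObj are made-up names for illustration): generated code only ever names the alias, so swapping the memory-management policy is a change to one header, not to the generated code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Sketch of the swappable-policy idea: generated code refers only to
// Ptr<T> and NewObj<T>(), so retargeting from std::shared_ptr to, say,
// a hand-written gc_ptr<T> means editing these two definitions only.
template <typename T> using Ptr = std::shared_ptr<T>;

template <typename T, typename... Args>
Ptr<T> NewObj(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}

// What the generated code then looks like:
struct Counter { int n; };
Ptr<Counter> make_counter(int start) { return NewObj<Counter>(Counter{start}); }
```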

                          1. 2

                            Yeah I remember the intrusive/non-intrusive distinction and extra weak count from those CppCon videos … If it had compiled I might have gone with it, but that was the source of my strong feeling that GC would be more compact and thus perform better. Especially when you take into account that we have many similar List instantiations with different element types, and occasionally nested ones like List&lt;List&lt;T&gt;&gt;, which produces a template code explosion. There’s always the possibility of writing our own smart pointer but I didn’t feel up for it …

                            Oil’s code probably doesn’t have that many cycles, but I would want some kind of tool to verify/debug if that were the case …

                            The stack root issue did seem to be the hardest to solve and the least documented. The best blog post I found was this one:


                            (which is linked from https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job)

                            I wish there were a survey of what JavaScriptCore, V8, and SpiderMonkey do. I looked at the v8 and SpiderMonkey code. It’s not that easy to figure out the invariants … I couldn’t find good comments or design docs.

                            Right now the collector can potentially run whenever an allocation is made, and at no other time. I took some inspiration from the small collector in Julia’s femtolisp (bootstrap language).

                            The wait() trick might be fun – there’s a lot of room for fun optimizations once the whole thing is working.

                            Sounds like I need to write some more blog posts … I wanted to do smart pointers for the stack roots problem, and I talked to Max Bernstein a little about it, who is working on a Python VM at Facebook.

                            But since we’re generating code, I actually just went with a very simple scheme like this:

                            Str* myfunc(Str* a, Str* b) {
                              Str* mylocal = nullptr;
                              // Register stack roots before running any code in this function;
                              // the constructor adds them to the root list, and the destructor
                              // pops them off when the frame exits.
                              gc_heap::StackRoots roots({&a, &b, &mylocal});
                              // code that allocates and thus could trigger a collection ...
                            }
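A minimal sketch of how such a StackRoots helper could work (hypothetical; not Oil’s actual gc_heap code): the constructor pushes the address of each local pointer onto a global root list, and the destructor pops them, so the collector always sees exactly the live frame locals.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Sketch of a StackRoots RAII helper: registers the addresses of local
// pointer variables on construction and unregisters them on destruction.
// The collector would walk g_root_list and treat each entry as a slot
// holding a heap pointer it may trace (and update, if objects move).
static std::vector<void*> g_root_list;

class StackRoots {
 public:
  StackRoots(std::initializer_list<void*> roots) : n_(roots.size()) {
    for (void* r : roots) g_root_list.push_back(r);
  }
  ~StackRoots() { g_root_list.resize(g_root_list.size() - n_); }
 private:
  std::size_t n_;
};

std::size_t num_roots() { return g_root_list.size(); }
```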

                            I actually don’t remember why I abandoned smart pointers, since that was the initial solution, based on looking at SpiderMonkey, v8, talking to Max, etc. A downside of smart pointers in my mind is that stepping through the code in the debugger becomes more annoying, and introspection with the debugger is affected.

                            However I don’t think that is the whole reason. It is possible it was a limitation of our translator again. It is using MyPy in a somewhat hacky way. I can probably dig it up from a year ago …

                            Anyway I am waiting to hear back on this NLNet grant for 50K euros … hopefully that will kick things off, along with some blog posts about how the C++ works.

                            For reference here is most of the stuff in the metalanguage that has bearing on garbage collection:

                            • Str type (like Python)
                            • Generic containers
                              • List&lt;T&gt;
                              • Dict&lt;K, V&gt;
                              • Tuple2&lt;A, B&gt;, Tuple3&lt;A, B, C&gt;, … up to Tuple5 (doesn’t need to be more general)
                            • Classes, with inheritance and virtual functions
                              • Zephyr ASDL is a special case of this – it models sum types as inheritance
                            • Control flow:
                              • exceptions
                              • scoped construction and destruction: Python’s “with” is mapped to C++ constructors/destructors
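The last bullet can be sketched as follows (ctx_Redirect is a made-up name for illustration, not Oil’s actual code): Python’s with-block becomes a C++ scope, with __enter__ in the constructor and __exit__ in the destructor.

```cpp
#include <cassert>

// Sketch of mapping Python's "with" onto C++ scoped construction and
// destruction (hypothetical class name, not Oil's actual code):
//
//   with ctx_Redirect(fd):   # __enter__ runs
//       ...                  # body
//                            # __exit__ runs, even on exceptions
static int g_redirect_depth = 0;

class ctx_Redirect {
 public:
  explicit ctx_Redirect(int fd) : fd_(fd) { ++g_redirect_depth; }  // __enter__
  ~ctx_Redirect() { --g_redirect_depth; }                          // __exit__
 private:
  int fd_;
};

int with_block_demo() {
  {
    ctx_Redirect ctx(2);  // start of the "with" block
    if (g_redirect_depth != 1) return -1;
  }  // destructor runs here, restoring state
  return g_redirect_depth;
}
```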

                            The memory layout is somewhat inspired by reading about OCaml’s runtime. It’s basically a more homogeneous graph of memory blocks, in contrast to the more heterogeneous MyPy/C++ types.

                            So I still need to explain more and figure out administrative things … But any referrals would be very helpful. In some sense this is straightforward engineering, but it spans quite a few subsystems, and the ideal person would know C++ well, but not be ONLY a C++ engineer. I think the design is a little more holistic since we’re doing everything ourselves and not using libraries. (I think we may need to ditch the MyPy dependency and write our own simple type checker, which I researched …)

                            1. 1

                              Hi again! FWIW, when building our VM, I don’t recall the smart pointers (Handle & co) being annoying.

                  2. 4

                    Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.


                    I’ve been reading posts on oil (off and on) for years now, and this is the first time I’ve “gotten it,” and only because of “Language Oriented Programming.” So, I’m glad you finally found the words.

                    However, I’m curious why Greenspun’s Tenth Rule applies so well, and why you’re trying to bend Python to do DSLs, when anything in the Lisp family is already a million times better suited, and self-contained within its own single compiler, for writing DSLs?

                    1. 4

                      This is just so awesome and inspiring to read. I want to first share, I’ve never ever been interested in building a programming language or in PL theory. I really thought it was a solved field, and we’re just making incremental improvements. But a few things recently have led me to think about the workflow described here, that being creating a language at the center of your workflow which is a DSL for your architecture.

                      Again, as someone who had no prior interest in PL design, this is the most important concept for me, and it’s the one that I think might actually be compelling for other people too:

                      Oil is many times smaller than bash in terms of its source code.

                      Even in the most well-factored architectures there can be both boilerplate and cognitive overhead. Take something simple like REST vs. GraphQL in a web app. Whichever one you choose, you have to write a whole bunch of code for each new piece of data that you add to the domain model. That’s code that has nothing to do with the domain, aka “accidental complexity.” Building a language allows you to generate that code from a specification of the essential complexity alone (or at least your approximation of that ideal via your language design). This is the “two halves” that are referenced in this post: the domain language, and the specifications that use that language.

                      I totally agree with the author, I’d rather focus on the specification level, i.e. focusing on the essential behavior of the application. And modern web development is full of nothing but tedium and boilerplate. There’s a push toward simplification of architectures again (e.g. LiveWire / Hotwire and moving back to more work on the server), but this still doesn’t seem like the simplicity ideal.

                      So, I started playing around with creating a language based on that idea, with a goal of building web applications. It’s still really immature, so I almost feel guilty bringing it up, but why not: https://github.com/amw-zero/sligh. Having a language to express the core specification of an app opens up a lot of doors, like generating a single property-based test that checks that the generated code is semantically equivalent to the specification. My thinking is that this can eliminate a ton of testing effort as well, since so much of it is spent on state spaces that are larger than necessary, due to the extra concerns the implementation level carries. This of course isn’t a new idea; formally verified projects such as seL4 have been verifying that implementations refine / simulate specifications for years.

                      Anyway - really cool to see someone thinking about and sharing a similar idea, and have it work out on a larger project. Good luck with the funding goal and keep posting!

                      1. 3

                        Yes I am a stickler about code size! I guess you could call me traumatized from working with hundreds of thousands of lines of C and C++ professionally …

                        The funny thing is that we think of these things as “DSLs” now, but eventually they will just become “programming”.

                        It used to be that a “for loop” was an exotic concept in “automatic programming” (I should dig up some citations for this). Assembly language was “manual programming”, and “for loops” were magic that saved you all this boilerplate !

                        Yes the web space is very interesting right now in terms of language design. (And I think underrepresented on lobste.rs) It’s fascinating to me how many new language projects have sprung up – Svelte as you mention on your page, Elm, Hack on the server side, etc.

                        I also find the web very Unix-y in terms of its combination of DSLs – HTML, JS and CSS all interleaved. And now TypeScript and more.

                        I’d definitely like to read some blog posts about how the language is designed! And particularly comparisons to other solutions in the space. I found that these kinds of surveys of other projects really clarified my thinking about the shell, and what I want to do. Explaining and defending it in lobste.rs / reddit / HN threads definitely makes you sharpen the ideas and arguments! :) It’s a lot of work but it’s been worth it.