It could be interesting to see if it’s possible to use GraalPython. It’s probably not compatible enough yet, but that could be solved. My guess (from Truffle’s results with other dynamic languages like Ruby) is that it would be faster than any other approach, e.g. by being able to inline across the Python/C boundary.
Cinder:
It also bundles Static Python, a type-checked and type-optimized subset of Python.
There are also some vague internal reasons like “we tried PyPy and it wasn’t promising on our workload”.
That’s interesting, as it implies that Cinder compares favorably to both CPython and PyPy on some representative benchmark. Would you be willing to submit a benchmark upstream to PyPy’s list of benchmarks? They generally treat slowness as a bug and would be interested in understanding.
I don’t know if it’s that simple. It was a multi-person effort at least 5 years ago to port our workload to PyPy and I think all people involved have since moved on to greener pastures.
I don’t mean to say that PyPy was necessarily slow; the problems could have been in the forking model, or memory consumption, or taking some time to warm up, or not being able to optimize across the C extension boundary… something in that realm.
PyPy may well have been faster, even, but it’s an enormous task to try and internalize the whole runtime and see where we can make improvements :)
Who wrote this? By “…when I wrote SSB” I’m guessing it’s Dominic Tarr?
Mobile platforms are also autonomy robbing simply because they are not cross platform. You have to develop software twice simply because people need to use them on different brands of phone. And again if someone wants to use it on a regular computer. That’s just a silly waste of time.
Hey, at least there are only two major mobile OSs, as opposed to at least three on “regular” computers. (And all mobile OSs are POSIX compliant.) And it is not a “silly waste of time” to tailor an app to a platform so it supports platform features and integrates coherently with the platform UX.
I’m not clear what the point of this article is. The web takes away autonomy but so do native apps, somehow WASM will fix it, stay tuned for more. … ?
There’s a bit of irony here coming from the SSB project which spends an inordinate amount of time dealing with how its state serialization format is tied to the Node runtime.
Like the way their signatures rely on the exact behavior of JS’s JSON.stringify function, which makes it inordinately difficult to write a compatible implementation of SSB in any other language.
Also, there are a bunch of cross-platform toolkits. Yes, the UX isn’t as good, but it can be done where it is considered economical or otherwise desirable.
Interesting. This seems similar to LuaJIT’s DynAsm or the DynAsm Rust crate, which is used in the Wasmer runtime, except instead of incrementally pushing machine code onto a stack/Vec, it’s pushing blocks of code and then fixing them up? I’m amazed the performance difference is that large. I’ll have to keep reading…
A big difference is that they rely on the compiler to generate optimal code instead of having programmers write assembly, which is probably a big part of the perf difference.
Yes, the large library of compiler-generated templates adequately explains superior runtime performance, but I’m amazed that the compile-time performance is that much better than Wasmer Single Pass.
text file in dropbox + shell script that opens the file and inserts date -Iseconds, under which I write the notes.
https://superuser.com/questions/782063/vim-append-date-to-end-of-line might be useful for you
I write a blog using a static site generator that supports markdown. I write the blog mostly for myself: if I can’t explain something well, do I really know it? The markdown helps with writing and editing quickly, and is convenient to translate to HTML.
If I am taking notes in a class or during a meeting, I take notes by hand in an engineering notepad and transcribe the useful parts into Quip/Google Docs/a website (depends on the audience) later.
Yeah, the wiki assembles to a 1200-byte ROM, and generates about 500 pages in 400ms on a 12-year-old laptop.
https://github.com/XXIIVV/oscean/blob/main/src/oscean.tal
It can be assembled with the self-hosted assembler
omg HI DEVINE! you’re such a huge inspiration of mine. it’s amazing to see you comment on my post :D
i love hundred rabbits to death. it’s what has inspired me to take my life into my own hands. i’m sure you’re aware, but rekka made my website’s pufferfish logo :3
so cool to see you. i hope you’re doing well!
Rekka showed me your website and I loved it instantly, I was so happy to see your little pufferfish show up on HN/Lobsters yesterday :) Looking forward to seeing how it evolves.
We have this little webring, if you’re ever keen to join, lemme know!
yes i am very keen! feel free to contact me via email! (or this thread is fine too but i check lobsters pretty randomly :D)
Also, I originally thought this would be a post about https://redbean.dev/
Out of curiosity I took a look at your existing C++ and it’s pretty clear to me it’s written by someone who has no experience in modern C++ development (int for sizes everywhere, naked pointers to dynamically allocated memory passed around even though exceptions are also used, no awareness of where to copy and where to move, etc.). I believe you will have a hard time finding a good C++ developer who would be willing to carry on in this style. IMO, your best option would be to find a good C++ developer, give them the spec for the shell and conformance tests, and let them re-implement it from scratch.
Also I just added this to “Good Signs” on the “job listing”
If you think our C++ is ugly! That means you have ideas on how to make it better. What exists is a proof of concept, designed to show the strategy will work and can perform well. There are many improvements that can be made. If you’re convinced a complete rewrite is necessary, then please make a case that it’s feasible, justified by a survey of the code.
Also note that the dumb_alloc part is going to be thrown away in favor of gc_heap.cc, etc. There is already some rewrite inherent in the project, though maybe it wasn’t clear from the blog post.
Some of this is leaking from the constraints of the generated code. The generated code uses an intersection of the type systems of MyPy and C++, and then we link against the hand-written C++. So the hand-written C++ can be ugly. Moreover, it has to be rewritten to use the garbage collector.
I mentioned the dumb_alloc caveat in the post. It is a proof of concept. It wasn’t obvious this would work from the start!
I would be fine with a complete rewrite, and the entire goal is to find someone who’s a better C++ programmer than me !!! (FWIW I have written medium amounts of C++ from scratch that runs continuously in production, though it’s definitely not my main language)
However I also want to be respectful of the donations / grants and take the shortest path to a good implementation.
It will save ~5 years if you reuse the executable spec rather than rewriting by hand :) The compilers are extremely short (< 10K lines) while the shell would easily be ~150K-200K lines of hand-written C++ (or Rust).
As mentioned, I also encourage parallel efforts in other languages (and other people have started them)
Also I accept patches, and more than 50 people have contributed in the past. We can probably use size_t in many places.
On the other hand, there shouldn’t be any copy/move because the generated C++ uses a garbage-collected heap shaped like statically typed Python. The source program only passes pointers around.
If it’s machine generated, I’d expect it to be using smart pointers. If you’re coming from a ref-counted language then just lowering all references to std::shared_ptr will work. If you’re using a GC, then you can use your own gc_ptr&lt;T&gt; smart pointer, and now you have an automatic way of tracking roots on the stack, so you don’t need to use conservative GC and all of the exciting non-determinism that entails.
Yes, that was an early attempt. Instead of generating List&lt;Str*&gt;* (an extremely common type in the program), I generated code like:
shared_ptr<List<shared_ptr<Str>>>
Besides being pretty ugly to read, I couldn’t get it to compile! I discovered that shared_ptr has some other template parameters that somehow don’t compose (?)
It’s possible that there was some constraint in my code generator that caused this. I don’t remember the exact issue, but I remember discussing it with the author of the Fish shell (written in C++) in late 2020. And apparently he didn’t have an answer either.
It is probably worth another try, so I filed this issue for posterity: https://github.com/oilshell/oil/issues/1105
At the time, I also watched some CppCon videos about the common implementations of shared_ptr. From what I remember it is quite bloated, especially in the presence of those recursive types. I believe the Cheney collector we will be using is much lighter, although I don’t have measurements.
I also looked at Herb Sutter’s GC smart pointer experiment, although I don’t think I played with the code .. I haven’t seen any codebases that use such techniques in production, but I’d be interested in references.
I definitely want to get more input from C++ experts involved … Although as I mention here I think an ideal person could be someone with a lot of C++ experience who also uses Rust: https://old.reddit.com/r/ProgrammingLanguages/comments/tt49kb/oil_is_being_implemented_middle_out/i2y1h4y/
C++’s shared_ptr is very general, which sadly precludes an optimised implementation. It is non-intrusive, and so handles both objects created with std::make_shared and objects allocated separately that are later managed by std::shared_ptr. In the former case, it will do a single allocation to contain the refcount and the object; in the latter case it will allocate them separately. I say refcount, but it actually needs two, because it also supports weak references and so maintains a count of weak references too. When the strong reference count reaches 0, the object’s destructor is called and, if the object is in a separate allocation, it is also freed. When the weak reference count reaches zero, the control block is freed. 99% of the time that I use std::shared_ptr, I don’t care about weak references and so it bugs me slightly that I have to pay for them. It also annoys me that the separate allocation means that std::enable_shared_from_this stores a weak reference to the control block, rather than just embedding the control block in the object, so I end up needing three words per object to store one object’s state.
All of that said, I generally start writing code using std::shared_ptr, std::vector, std::unordered_map, std::unordered_set, and even std::regex. This gives me a similar set of abstractions to something like Python and makes it very easy to optimise later when I discover that one of these is a bottleneck. The compile + run time is often similar to the run time for Python for short programs, and much better the second time I run it.
I believe both WebKit’s JavaScriptCore and V8 use a smart pointer for tracking stack roots for GC so that C++ code can use JavaScript objects fairly natively. This kind of model also lets you use a GC that can pin objects or relocate them. If your fields are smart pointers that return a different smart pointer for access, that can explicitly pin the objects while you’re using them, but while they’re in fields they’re just values that the GC knows about and can update.
I don’t know how common cycles are in your code and whether you need anything more than reference counting though. If you don’t need concurrent collection then a tracing GC might be faster since you don’t pay anything during execution (on the other hand, if you’re single threaded, non-atomic ref-count manipulation is incredibly cheap). For a shell, you might be able to do a GC sweep before each wait call and completely hide GC pauses behind execution of other programs. The nice thing about generating C++ code that uses smart pointers is that it’s trivial to change the memory management policy by changing the using directive that defines the smart pointer type that you’re using.
Yeah I remember the intrusive/non-intrusive distinction and extra weak count from those CppCon videos … If it had compiled I might have gone with it, but that was the source of my strong feeling that GC would be more compact and thus perform better. Especially when you take into account that we have many similar List types (instantiated with several element types, occasionally nested like List&lt;List&lt;…&gt;&gt;), which produces a template code explosion. There’s always the possibility of writing our own smart pointer but I didn’t feel up for it …
Oil’s code probably doesn’t have that many cycles, but I would want some kind of tool to verify/debug if that were the case …
The stack root issue did seem to be the hardest to solve and the least documented. The best blog post I found was this one:
https://blog.mozilla.org/javascript/2013/07/18/clawing-our-way-back-to-precision/
(which is linked from https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job)
I wish there were a survey of what JavaScriptCore, V8, and SpiderMonkey do. I looked at the v8 and SpiderMonkey code. It’s not that easy to figure out the invariants … I couldn’t find good comments or design docs.
Right now the collector can potentially run whenever an allocation is made, and at no other time. I took some inspiration from the small collector in Julia’s femtolisp (bootstrap language).
The wait() trick might be fun – there’s a lot of room for fun optimizations once the whole thing is working.
Sounds like I need to write some more blog posts … I wanted to do smart pointers for the stack roots problem, and I talked to Max Bernstein a little about it, who is working on a Python VM at Facebook.
But since we’re generating code, I actually just went with a very simple scheme like this:
Str* myfunc(Str* a, Str* b) {
  Str* mylocal = nullptr;
  // register stack roots before running any code in this function
  gc_heap::StackRoots({&a, &b, &mylocal});  // add them to the root list, then pop them off later
  // code that allocates and thus could collect ...
}
I actually don’t remember why I abandoned smart pointers, since that was the initial solution, based on looking at SpiderMonkey, v8, talking to Max, etc. A downside of smart pointers in my mind is that stepping through the code in the debugger becomes more annoying, and introspection with the debugger is affected.
However I don’t think that is the whole reason. It is possible it was a limitation of our translator again. It is using MyPy in a somewhat hacky way. I can probably dig it up from a year ago …
Anyway I am waiting to hear back on this NLNet grant for 50K euros … hopefully that will kick things off, along with some blog posts about how the C++ works.
For reference here is most of the stuff in the metalanguage that has bearing on garbage collection:
The memory layout is somewhat inspired by reading about OCaml’s runtime. It’s basically a more homogeneous graph of memory blocks, in contrast to the more heterogeneous MyPy/C++ types.
So I still need to explain more and figure out administrative things… But any referrals would be very helpful. In some sense this is like straightforward engineering, but it also appears that it does span quite a few subsystems, and the ideal person would know C++ well but not be ONLY a C++ engineer. I think the design is a little more holistic since we’re doing everything ourselves and not using libraries. (I think we may need to ditch the MyPy dependency and write our own simple type checker, which I researched …)
Hi again! FWIW, when building our VM, I don’t recall the smart pointers (Handle & co) being annoying.
I did this a while back. It’s not as detailed as this post, but it’s implemented in Pyodide, so you can explore it online.
I think that the link to Skybison is broken.
Have you looked at PyPy’s translation process? The RPython toolchain also builds a collection of basic blocks from bytecode.
Thanks for the tip! Fixing it now. I’ll take a look at PyPy…
EDIT: Oh, very neat. I should add a link to that somewhere.
I’m building a similar project for our internal compiler IRs. I hope to have a public web UI for it soon.
There is also https://github.com/smarr/are-we-fast-yet
Consider: https://kdl.dev/
The syntax can be a little weird
Of course, using children for literals is overly verbose. It’s only necessary when nesting arrays or objects into objects; for example, the JSON object
{"foo": [1, 2, {"bar": 3}], "baz":4}
can be written in JiK as:
object {
(foo)array 1 2 {
object bar=3
}
(baz)- 4
}
It wasn’t clear to me whether the partial executer is upstream or if it’s something that you maintain. I’d love to have it on by default for -Oz.
PartialExecuter is currently only part of Cheerp, an open-source LLVM fork that targets JS+Wasm. The pass is robust in theory and already passes a big chunk of the LLVM test suite, but I expect some amount of work will be needed to upstream it; that’s the proper destination at some point in the future.
Do you have any particular codebase/example where this could be of impact?
Not especially, but the printf example seems quite compelling. If a program doesn’t call setlocale then being able to trim all locale-related code from libc would be a big win for a load of tiny statically linked tools. Embedded systems would probably see a similar win to WebAssembly.
Is this applicable to other compilers & runtimes too or are you taking advantage of something already in LLVM? Haven’t read it yet
Implementation is very much LLVM-specific, but the same logic can be applied to basically any compiler.
I use https://gandi.net. Never had a problem.
I stopped using Gandi after their unprofessional response to their data loss incident in 2020: https://imgur.com/s3R1VVc
I’ve personally experienced GANDI becoming surreally rude and unhelpful when they’ve made a mistake.
I still use Gandi for domain names, but moved away from their web hosting service after the data loss incident a few years ago.
I love Gandi, and use them for all my domains, but their lack of transfer lock for .fr for individuals, while they support it for their enterprise customers, is a shame for a French company. (Even OVH, which I moved away from 5 years ago, supported transfer lock for .fr for all customers.)
I also use them but I’m a bit scared by the fact that their DNS API has no fine-grained access control. I can’t give an ACME client the ability to create the required TXT records for a specific domain, I have to give it the ability to modify any DNS records for any domain that I own.
Heh, I wrote https://bernsteinbear.com/blog/recursive-python-objects/ a while back. Similar thoughts
You might like my friend’s project: https://github.com/ethanpailes/remake
Them: “Leslie Lamport may not be a household name,[…]”
Me: “The hell he isn’t.” [rage close]
(I opened it back up and read it anyway. It was actually really interesting. But my rage close was real.)
There are whole households out there that don’t have a single graduate degree in them. Amazing, I know!
That said, I didn’t actually know LL was the one behind TLA+, so it was a useful read for me too. (Also, it turns out he actually does look somewhat like the fluffy lion on the cover of the LaTeX book!)
Yeah, I knew he did Lamport clocks but didn’t know he was also the guy who did LaTeX and TLA.
Inverse for me, I never made the connection between LaTeX Lamport and clock Lamport.
I saw that and ran a Twitter poll
Not as good as I’d like, but not as bad as the article makes it sound. Granted, this is a heavily skewed demographic. I ran the same poll at work and got 0% yes.
And even then, people tend to know him more for LaTeX than for his much more important work on distributed computing. Which I’ve heard bothered him.
I imagine that there are more people writing professional documents than working on distributed computers.
I can kind of see why that bothered him. Many people viewed TeX as the real, manly typesetting system, and LaTeX was for weaklings. Lamport came across as a hippie simplifying Knuth, more an enthusiastic college lecturer than a “real” computer scientist.
OTOH LaTeX made TeX usable, and facilitated the distribution of a lot of science. That has to count for something.
I know him for Lamport clocks and not much else :)