@andyc How crazy am I for thinking of using OSH on Cygwin? Would it be too early to try to do that? I could lend a hand to set up AppVeyor or some other CI system on Windows, probably with Cygwin, maybe also package OSH for Cygwin.
Hm I guess it depends on how difficult a Cygwin port is? Does Cygwin use the same Makefile and build scripts?
I think it would be cool to do, but I wouldn’t expect it to be a great experience right now. You’ll probably notice the slowness. That’s my top priority right now. The build will still be ./configure; make; install after that, but the underlying source will look totally different.
Right now there is a big chunk of CPython, which is why you will see some build warnings. I hope to get rid of that, and I’m not sure how that will affect a Cygwin port. That is, if you have to throw it out when I make things faster, it’s probably not worth it.
Someone did try it on the Windows Linux emulation and it apparently worked fine. But as I understand it, that’s much easier than Cygwin.
But if you want to chat more about it feel free to bring it up on oilshell.zulipchat.com. Thanks!
Cygwin should normally use the same Makefiles and everything. I’m not sure about the precise dynamics but I expect cygwin.dll to be involved somehow, providing the Unix syscalls behind the scenes. It’s more about paths and such regarding building and running OSH.
I’ve joined Zulip and posted in oil-discuss, not sure it’s the right place, I haven’t used Zulip before :)
I seem to always be out of luck when trying to compile OSH on a standard ArchLinux x86_64 box. The .pre15 build is no exception – it segfaults here:
strip -o _build/oil/ovm-opt.stripped -S _build/oil/ovm-opt
dsymutil _build/oil/ovm-opt.stripped -o _build/oil/_build/oil/ovm-opt.stripped.dSYM
0. Program arguments: dsymutil _build/oil/ovm-opt.stripped -o _build/oil/_build/oil/ovm-opt.stripped.dSYM
LLVMSymbolizer: error reading file: No such file or directory
#0 0x00007fdab2905b4b llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/bin/../lib/libLLVM-7.so+0x900b4b)
#1 0x00007fdab2903fa4 llvm::sys::RunSignalHandlers() (/usr/bin/../lib/libLLVM-7.so+0x8fefa4)
#2 0x00007fdab290412e (/usr/bin/../lib/libLLVM-7.so+0x8ff12e)
#3 0x00007fdab1ce9e00 __restore_rt (/usr/bin/../lib/libc.so.6+0x37e00)
#4 0x0000556bb3b423dd (dsymutil+0x113dd)
#5 0x00007fdab1cd6223 __libc_start_main (/usr/bin/../lib/libc.so.6+0x24223)
#6 0x0000556bb3b449ae (dsymutil+0x139ae)
make: *** [Makefile:142: _build/oil/ovm-opt.stripped] Segmentation fault (core dumped)
make: *** Deleting file '_build/oil/ovm-opt.stripped'
Thanks for the report! It looks like the dsymutil path meant for OS X is triggering. I filed a bug here:
It did build on Arch Linux like a year ago, but it looks like the OS X support somehow broke this.
Does Arch Linux use Clang by default?
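As a rough illustration of the dsymutil issue above, here is a minimal sketch of keeping the dsymutil step behind a platform check. It is written in Python only for illustration; the real build uses a Makefile that is not shown in this thread, and the function name strip_commands and the .dSYM output path are hypothetical.

```python
import platform
from typing import List


def strip_commands(binary: str, stripped: str, system: str = "") -> List[List[str]]:
    """Return the commands to run for the strip step (hypothetical helper)."""
    # Default to the OS we are running on ("Linux", "Darwin", ...).
    system = system or platform.system()

    # Same strip invocation as in the build log above.
    cmds = [["strip", "-o", stripped, "-S", binary]]

    if system == "Darwin":
        # dsymutil is the OS X debug-symbol tool, so only run it there.
        # The output path here is an assumption, not the project's real path.
        cmds.append(["dsymutil", stripped, "-o", stripped + ".dSYM"])
    return cmds


if __name__ == "__main__":
    for cmd in strip_commands("_build/oil/ovm-opt", "_build/oil/ovm-opt.stripped"):
        print(" ".join(cmd))
```

On a Linux box this would print only the strip command, which is the behavior the report suggests is wanted.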
I’ll update the OSH AUR once this is resolved. It’s been a while since I updated it, because I was waiting for a major version update. At this point, however, the latest minor release is a progression significant enough that I think the broader Arch community will want in on it.
BTW thanks for all your work on this! I started doing the minor releases because I realized there would be many releases that are significant from the perspective of OSH, but don’t move the needle much from the user perspective.
I don’t really expect anyone to use OSH as their full time shell now, but I do want it to be tested! If having an Arch package makes it easier for people to test it, that’s great.
I plan for 0.6.0 to be sort of “feature-complete”, but it’s still going to be slow. Hopefully 0.7 will be a faster version translated to C++, but it’s hard to predict that right now :)
Does Arch Linux use Clang by default?
ArchLinux uses gcc by default, but I have both gcc and clang installed. I am unsure which of them did the compiling.
Thanks for opening the issue on GitHub.
I see you mention some kind of a “MyPy / C++ translation”, presumably at some unspecified time in future. I’d like to ask, did you consider maybe using Nim as the target language instead? I’m exploring it recently, and found it seems to have some features which could potentially make it an interesting fit for your use case:
As a random example, here’s some code I’ve written recently, a parser for a particular rather simple data format I needed to process.
I did look seriously at Nim, since it’s Python-like and has types. But there’s no way to automatically convert Python to Nim, even though superficially they look similar. I discovered the hard way with a few experiments that apparently tiny differences really add up (e.g. between Python and Lua).
OSH is small, but rewriting it by hand is hard, since it has most of the logic of 124K lines of C code in bash. It’s a dense codebase!
I downloaded the Nim compiler and I liked their bootstrapping process. But I didn’t like their generated code. It was not human-readable and it was very long. I’m sure they have a good reason for that, perhaps efficiency. But either way the first point about rewriting makes it moot.
I also looked at Haxe and Vala, which are some other languages that compile to C or C++.
Some more here:
I do think there is some benefit to a custom translation. Because if you only need to translate one program, then you don’t have to be that general. It’s a much easier job to translate one program than to design an entire programming language meant to be translated!
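To make the "translate one program" point concrete, here is a toy sketch, assuming Python 3.8+ and its standard ast module. It handles only a single annotated function with one return statement and a few arithmetic operators, and raises on anything else. The names CPP_TYPES, expr, and translate are made up for this example; this is not how the OSH translation works or will work.

```python
import ast

# Hypothetical mapping from Python annotations to C++ types.
CPP_TYPES = {"int": "int", "float": "double", "bool": "bool"}

# The only binary operators this sketch understands.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr(node: ast.expr) -> str:
    """Translate the few expression forms this sketch supports."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return repr(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return "(%s %s %s)" % (expr(node.left), OPS[type(node.op)], expr(node.right))
    # A one-off translator can simply refuse what the program never uses.
    raise NotImplementedError(ast.dump(node))


def translate(source: str) -> str:
    """Translate one annotated, single-return function to a C++ function."""
    func = ast.parse(source).body[0]
    assert isinstance(func, ast.FunctionDef)
    params = ", ".join(
        "%s %s" % (CPP_TYPES[a.annotation.id], a.arg) for a in func.args.args
    )
    ret = CPP_TYPES[func.returns.id]
    stmt = func.body[0]
    assert isinstance(stmt, ast.Return)
    return "%s %s(%s) { return %s; }" % (ret, func.name, params, expr(stmt.value))


if __name__ == "__main__":
    print(translate("def add(x: int, y: int) -> int:\n    return x + y\n"))
```

The design choice is the one described above: instead of covering the whole language, the translator covers only the constructs one program needs and fails loudly on the rest.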
I’m basically trying to take the shortest path to make OSH faster now. I don’t care what it is! I was too concerned with bootstrapping before, and now I no longer care about that. That can come much later.
Speed is an open problem, but I’m very motivated to solve it! If it works I can say it’s the first “metaprogrammed” shell :)
Oh yeah and the reason I care about compiling to C or C++ is because build time dependencies matter for a shell. Shell is used on all sorts of weird embedded devices with weird build systems!
Also I wasn’t sure Nim worked that way. I think it generates C “under the hood”. There’s a subtle difference when people don’t actually distribute the C it translates to. Nim is more of a compiler than a translator as far as I can tell. It looks like the control flow graph is compiled to C, as opposed to the AST, as far as I remember. Like the generated code is all gotos rather than structured control flow.
Thanks a lot for taking care to reply so thoroughly! Sure, translating code between languages is certainly not easy, either by hand or automatically. But as you say, doing it for one particular app is significantly easier than doing it as a general tool. You’ve now reminded me that I actually did it once for one app this way, which I would now call a “computer aided translation”, i.e. starting off with a simple, barebones Python->Lua experiment that I extended as needed, sprinkled with some odd by-hand translations in a few places where needed. Actually, I was inspired by how the translation of the Go compiler from C to Go was handled. That said, I also did some by-hand translations, to tell the truth… uh, but, maybe never mind, probably. I certainly never checked CLOCs, but I’m pretty sure all of those were orders of magnitude smaller than yours, anyway. I see there seems to be some other person’s barebones Py->Nim experiment out there too, but I certainly don’t intend to try and push you in any direction any more! :) Especially given that I see you’ve researched it super thoroughly, awesome to learn that! Also, the link is very interesting, and may be useful to me, thanks for sharing! :)
Just one thing I’m scratching my head about: I think I’m somewhat confused about where the translation process stands in the end… I mean, I don’t really remember any concrete mention of this in your recent posts; if you wrote more about it, sorry that I missed it! And if you actually did not, then I’m just sorry for rehashing some things you totally know and researched, and better than I did; it’s just that I didn’t realize you’re already advanced into the process to some extent, so I apparently totally misjudged where you are with it! :)
If that’s not asking too much, would you mind sharing a super-short overview of your current thinking related to the translation, assuming I know totally nothing about your plans, or directing me to some post/comment where you already sketched those? As with all things OSH, I would find this really interesting! :D But no pressure, you probably already have enough work and writing to do, so don’t worry if that would be too much trouble for you. Cheers & Good Luck once again! :)
Small addendum: I was also inspired by the talk on converting the Go compiler to Go! 
I watched it a couple years ago when I started Oil, and I just watched it again a month or so ago. And I understood more of it this time around :) For example, the point about the failure of escape analysis and resulting slowdown was pretty interesting, and I didn’t know enough about compilers to understand it the first time.
Anyway, what’s interesting is that the translator has to do a lot of semantic analysis, and it makes control flow graphs from the C code as far as I remember.
I looked at some of this code – it’s impressive for a one-off!
I think their problem is simultaneously easier and harder. Harder because their goal is to generate EDITABLE code. The Go version becomes the new source, whereas with OSH, Python/MyPy will be the source for the foreseeable future. But it’s easier in the sense that C and Go are more semantically similar than Python and C++.
Yes things changed, thanks for paying attention :)
At the beginning of the blog, I expressed the wish to translate to C++ (e.g. even in the first post). But for expedience, I’ve had the CPython hack for a long time, since the first release in 7/2017. The goal was to define the language in a simple way without getting bogged down in detail – without groveling through backslashes and braces one-by-one, as I like to think of it.
At first, I was using a restricted dialect of Python, e.g. avoiding exceptions in my own code, thinking I would translate this dialect to C++. Somewhere along the line, I started using more and more features of Python simply because it made the code easier to write, and I realized the translation to C++ would be hard. (An analogy is that RPython from PyPy is a relatively unpleasant language to write in, from what I gather.)
I had this other OVM2 idea to replace CPython, which was attractive because of bootstrapping. I wrote about it in December. In the fall, I got a very basic VM (1000 lines of C++ code) running a Fibonacci program in Python. That is, I barely started it, but I think I scoped out the task and it seemed doable.
The thing that really changed my mind is getting the interactive shell working, hitting TAB, and seeing delays of hundreds of milliseconds. So OSH is over an order of magnitude too slow. Both parse time and runtime are too slow. (Although runtime might be due to using /usr/bin/printf rather than a printf builtin.)
So basically what I foresaw is that I could spend 6-12 months on OVM2, and OSH would STILL be too slow. Then I would have to do all this work to speed up OVM2, which is hard.
So now I’m back at translating to C++. I believe that is the shortest path to speed up OSH without rewriting all the code (although I certainly could be wrong). This was partly inspired by looking at the Shed Skin Python compiler and trying some examples, and somebody on HN telling me that they actually use it in production!
Shed Skin does type inference, but that doesn’t scale to big programs. So I decided to add types with MyPy first. So far I like MyPy and I didn’t have too many problems getting it going, and I was able to get some significant code passing under --strict (which means everything is explicitly typed). PyAnnotate also helped.
So I plan to keep going with MyPy. No matter what, I believe that types will help make OSH faster. In some sense, there’s simply not enough information in the current source code for OSH to be fast!
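To illustrate what "explicitly typed" means here, a small sketch of code that should pass mypy --strict. The Token class and the lookup and join_words functions are invented for this example and are not taken from the OSH source.

```python
from typing import Dict, List, Optional


class Token:
    """A hypothetical lexer token; not OSH's real Token type."""

    def __init__(self, kind: str, value: str) -> None:
        self.kind = kind
        self.value = value


def lookup(env: Dict[str, str], name: str) -> Optional[str]:
    """Return the variable's value, or None if it is unset."""
    return env.get(name)


def join_words(tokens: List[Token]) -> str:
    """Concatenate the values of word tokens with single spaces."""
    return " ".join(t.value for t in tokens if t.kind == "word")


if __name__ == "__main__":
    toks = [Token("word", "echo"), Token("op", ";"), Token("word", "hi")]
    print(join_words(toks))
    print(lookup({"HOME": "/root"}, "PATH"))
```

With every parameter and return value annotated, a translator knows, for example, that lookup may return "no value" and that join_words always returns a string, which is the kind of information untyped source does not carry.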
After OSH passes under mypy --strict (which doesn’t look like it will take too long based on my initial experience), I want to translate it to C++.
That is an open problem. I mean I think it’s possible with enough effort, but the question is whether it takes 3 months or 12 months, and how fast the result will be. At a very high level, I think it makes sense to leverage the power of OTHER people’s work – MyPy and C compilers. At one point, I thought I would write my own type checker and turn “OPy” into a typed language – but I realized that this is way too much work and a lot of yak shaving that didn’t have much to do with the shell. It was scope creep because as you know it is very enticing to design your own language :) Once you try to implement it, your ambitions can be trimmed a bit :)
So long story short, I think the C++ translation will lead to a faster result in a shorter time. OVM2 was basically because I was attracted to the idea of bootstrapping, and I now realize I was thinking about it too early. Go wasn’t bootstrapped from 2007 to 2015 or so, and I think that makes sense.
I discovered that Racket STILL isn’t bootstrapped, which is surprising for a Lisp! That is, Racket contains like 200K lines of C code, just like CPython and Ruby do.
So the OVM2 idea would have just dragged the project on for too long. Now that OSH does what I want it to do, I’m very motivated to make it faster!
Thanks for the question! I didn’t write about this on the blog for reasons of space, and because whether I will succeed in translating to C++ is an open question still. But yes feel free to ask more questions – it does help me to try to explain things informally!
The blog tends to be about actual results, rather than promising things. I go back and forth a lot about various things and don’t necessarily put it on the blog.
With OPy I had some limited success, such as compiling all of OSH to bytecode, Oheap 2, and a few other things. That work wasn’t useless because “hollowing out the Python interpreter” reduced CPython dependencies and made the translation more feasible. I probably only “wasted” 1 month of work, but I don’t consider it waste because I learned a lot!