Is OCaml using some kind of incremental compilation
Ocaml itself is not concerned with how you compile it. I use makefiles, which means I only recompile the subtree that depends on the changes. Most other Ocaml build tools do this as well, but this is a separate concern from the actual language.
About execution performance: I read the source code of each benchmark in Go and OCaml. The Go version is generally shorter, which is unexpected considering OCaml's emphasis on expressiveness. Do you know why?
Sorry, no, I haven’t read the code on the language shootout. I use the link more as an indication that one can achieve roughly equivalent performance even with a more expressive type system like Ocaml. I don’t actually think the language shootout is good for much else; in a large system a lot of these micro benchmarks are drowned out by real-world issues.
So in reality, Go is probably slightly faster than Ocaml in a lot of cases; however, when it matters, Ocaml lets you write fast code.
This is a weird argument. It’s like saying “Interpreted languages are slow but it’s because we’ve not given serious attention to optimization”.
I’m saying something different, though. Compilers like mlton demonstrate that you can compile an ML to be highly optimized and efficient. Nothing in the semantics of the language prohibits this. It simply has not been a priority for Ocaml since, for the most part, it’s fast enough. So the work has been done for Ocaml’s twin; it just hasn’t been worth it for Ocaml yet. That is different from saying one merely believes a problem is solvable if they work hard enough.
What about the strategy used by the compiler(s) to implement generics?
The way this works in Ocaml is, in the worst case, you use some extra space to represent your data with a header and the actual value. So, boxing. Again, as a low bar, one can see that the memory consumption of Go and Ocaml is roughly equivalent in the language shootout. In my experience the memory consumption of an Ocaml program is much smaller than equivalent Java, Python, or Ruby apps. C is of course smaller, and I cannot speak to C++. Where possible the compiler can optimize the boxes away; for example, if all of the computation happens inside a function, values will sometimes be put into registers. The Ocaml compiler does not do any monomorphizing optimizations, AFAIK. Yes, mlton monomorphises everything, which is one reason its compilation is so expensive.
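To make the uniform-representation point concrete, here is a small sketch using the compiler's internal Obj module (unstable and for illustration only, never for production code) to observe which values are immediate and which are boxed:

```ocaml
(* Illustration only: Obj is an internal, unstable module that exposes
   the runtime representation of values. *)
let () =
  (* Small integers are stored immediately (tagged), so no allocation. *)
  assert (Obj.is_int (Obj.repr 42));
  (* A standalone float is boxed: a header word plus the 64-bit payload. *)
  assert (Obj.is_block (Obj.repr 3.14));
  (* Tuples, records, and constructors with arguments are boxed too. *)
  assert (Obj.is_block (Obj.repr (1, 2)));
  print_endline "representation checks passed"
```

This uniform representation is what lets a polymorphic function be compiled once and work on any type, instead of being monomorphised per instantiation.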
BTW, do you know how Go implements interfaces? I would imagine it requires boxing as a place to put the data + vtable. Is this correct?
Seeing as it was written in 2007, I’m not sure whether the author’s ideas have changed, but a few things to consider:
Python and Erlang get immediate boosts for having been used in large commercial projects
Erlang is interesting here because it (mostly) literally is a $100m bet on a pet programming language. The legend is that it was developed in Ericsson’s CSLab and ended up beating out the C++ software. A physical product that governments buy and has to run for decades was built with it.
I’d become very open to writing key parts of an application in C, because that puts the final say on overall data sizes back in my control, instead of finding out much later that the language system designer made choices about tagging and alignment and garbage collection that are at odds with my end goals.
It’d be great to know whether the author still agrees with this and what “key parts” means. I believe it’s important to pick a language that interops with C easily, but writing in C is unsafe with marginal benefits at this point. It depends on the problem, but there are really very few problems that are solved better in C these days.
The problem is that “float” in OCaml means “double.” In C it would be a snap to switch from the 64-bit double type to single precision 32-bit floats, instantly saving hundreds of megabytes.
This scenario is presented as though it were free. C comes with a whole host of other problems that are just solved for you in something like Ocaml. On top of that, Ocaml has a pretty good FFI into C, so you can always make an opaque type to do this bit of work, protected in a safe Ocaml shell. So one doesn’t really have to change the Ocaml compiler; they can just use C for the portion that needs C and be safe otherwise. I honestly don’t remember what programming in 2007 was like (it all blurs together), so I don’t know where this attitude was at that point.
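In fact, for the specific 32-bit-float case, stock Ocaml can already do this without any C at all: Bigarray supports single-precision storage. A minimal sketch (the array size here is arbitrary):

```ocaml
(* Store each float in 32 bits via Bigarray, halving memory versus
   Ocaml's native 64-bit float arrays. Writes round to single
   precision; reads widen back to double. *)
let singles =
  Bigarray.Array1.create Bigarray.float32 Bigarray.c_layout 1_000_000

let () =
  Bigarray.Array1.fill singles 0.5;
  singles.{0} <- 1.25;  (* 1.25 is exactly representable in float32 *)
  assert (singles.{0} = 1.25);
  Printf.printf "elements: %d\n" (Bigarray.Array1.dim singles)
```

Bigarray data also lives outside the OCaml heap proper, which sidesteps the tagging and GC concerns the quoted passage worries about.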
To defend Ocaml, my language of choice: it has a great property where the translation between the Ocaml code and the machine code is fairly understandable, meaning you can understand what your program is doing at runtime. Haskell, for all its strengths, is much less easy to understand in that regard, IMO. This gives you a lot of the value people want out of C: understanding what happens at runtime.
Libraries are much more important than core language features.
This hasn’t completely matched my experience. I have worked on large projects in Erlang, Python, and Java, and my personal work is in Ocaml. Java, by far, has the most libraries and tooling around it, and it has been the most difficult, for me, to complete work in. Despite having to write a lot of libraries and tooling in Erlang, the features of the language make it incredibly productive to work in. It’s also a very safe language, both in terms of memory safety and because the process isolation at the language level makes it easy to isolate crappy libraries from the rest of the program. Most of the libraries are very light and non-intrusive. Ocaml, I have found, I am very productive in. It does not have the isolation that Erlang gives, but I have found that if a library doesn’t exist in Ocaml, writing it is really not that difficult. It depends; I mostly do backend things which don’t involve interacting with the OS in a sophisticated way (like a GUI toolkit would), so that helps. But accomplishing the same work in Java has been very difficult for me, often getting stuck working around the way a library does not work rather than solving the problem I actually have.
This was a great post, as usual. A few thoughts I had while reading it:
null is still something that can be expressed even when it makes no sense. I guess this is from the C# roots.

The Result and Exceptions part reminded me a lot of what you get in Haskell or Ocaml with Either or Result + some monadic combinators. I guess there is also a line to draw between usability and “correctness”, but it always bums me out to see languages only let one express a subset of monadic expressions well. Why special-case it? I don’t get it, but I’m rather happy with code that reads like foo () >>= fun x -> do_something.

But to talk up Ocaml:
Ocaml has, IME, the best error handling model of any language I have seriously used. What is so great about it is that it lets you express exception-like things in a type-safe way using return values. This comes from polymorphic variants; I’m not aware of another language that has them. Roughly how it comes out is that you can have code like x >>= y >>= z, and with polymorphic variants the type of that expression will be the union of every polymorphic variant that those values have. This makes it possible to compose functions that use return values but aren’t related to each other. It’s fantastic and incredibly powerful. I have heard some Ocamlers claim they ran into issues with this (understanding the error messages with polymorphic variants can be tough), but I have never experienced that particular issue; my experiences with it have been positive.
Once one has this setup going, adding a new error case that needs to be handled is as simple as adding it to the return type of a function; the type spreads through the code like a wildfire and you can fix all callsites (assuming nobody is throwing out any return values) to handle the error. This is why I’ve been using Ocaml for all of the distributed systems work I can. The concurrency story is fairly poor, but the error handling is so great I’m willing to make that sacrifice.
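As a rough sketch of the style described above (the function names here are invented for illustration), composing two functions with unrelated error variants yields a result whose error type is inferred as the union of both:

```ocaml
(* A bind over result that re-wraps the error, keeping the error row open. *)
let ( >>= ) r f = match r with Ok v -> f v | Error e -> Error e

let parse_port s =
  match int_of_string_opt s with
  | Some p -> Ok p
  | None -> Error (`Bad_port s)

let check_range p =
  if p > 0 && p < 65536 then Ok p else Error (`Out_of_range p)

(* Inferred type:
   string -> (int, [> `Bad_port of string | `Out_of_range of int ]) result
   i.e. the union of both functions' error variants. *)
let port_of_string s = parse_port s >>= check_range

let () =
  match port_of_string "8080" with
  | Ok p -> Printf.printf "port %d\n" p            (* prints "port 8080" *)
  | Error (`Bad_port s) -> Printf.printf "bad port: %s\n" s
  | Error (`Out_of_range p) -> Printf.printf "out of range: %d\n" p
```

The compiler forces the final match to cover every variant in the union, which is exactly the "fix all callsites" effect described above.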
I see the point of your post: all language implementations can benefit from more effort poured into them, and indeed it’s great that Pierre Chambart (OCamlPro), Mark Shinwell and Leo White (Jane Street) could pour this work into a new inlining pass.
I still find your three points a bit frustrating.
I don’t think anybody suggested that having a strict language negates the need for inlining (see the aggressive inlining work poured into C/C++ compilers). What is true is that a lazy language is slower without optimization than a strict one (because call-by-need necessarily implies more bookkeeping), and that GHC thus has to rely on optimizations to be competitive performance-wise with other compiled languages, while OCaml implementations can do without a refined optimizer: good data representation choices and a fast runtime suffice to get most idiomatic programs within an acceptable factor of C (or your other language of reference).
There is a trade-off between inlining and separate compilation, but it was already present before the flambda work – the native compiler has always done cross-module inlining and optimizations that would require more recompilation than in a purely separate compilation setup. In the last released version of OCaml (4.02, August 2014), I added an -opaque flag that can force the compiler to export no optimization information for a module, thus ensuring its compilation is completely separate from its dependencies. This helps for some workflows – typically short edit-compile-test cycles.
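For example, assuming hypothetical modules dep.ml and main.ml (names invented here), the workflow looks something like:

```shell
# dep.ml is compiled with -opaque, so no inlining information is
# exported in its .cmx; compilation of its dependents is then
# completely separate from dep's implementation.
ocamlopt -opaque -c dep.ml
ocamlopt -c main.ml
ocamlopt dep.cmx main.cmx -o main
```

Editing dep.ml's implementation then only requires relinking, not re-optimizing dependents, which is the short edit-compile-test win mentioned above.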
Indeed, aggressive inlining has plenty of downsides. It makes code performance harder to reason about (but people are working on that, providing annotations to make sure the compiler warns if inlining did not happen, etc.), it also makes the compiler noticeably slower, and it gives the compiler a more complex tuning interface that is harder to use. This blog post is about celebrating the landing in the upstream compiler codebase of a several-man-year effort to develop this inlining pass, so it is, quite understandably, not discussing the downsides much, but that does not mean they do not exist. Of course, I expect all of this to be improved in future iterations.
The general point about the OCaml compiler is that it has an excellent performance-to-effort ratio. It is relatively simple, and it uses the 80/20 principle: it implements the few main optimizations that really matter for most codebases, and in practice this works really well. Using the last released version (4.02), the compiler source code is less than 350 kilo-lines of code, and it bootstraps and builds in 1m14s on my machine (a few-years-old laptop). The compilation times for typical OCaml projects are excellent. GHC is a wonderful piece of work, I am amazed at how friendly and active its development community is, and the support it provided for evolving the Haskell language is humbling, but it is a much more complex compiler and fares noticeably worse on all those metrics (the same point would apply to many other programming languages).
OCaml is competitive in performance with other implementations of ML-like languages, such as SML/NJ, MLton, or GHC. (The language benchmark site has been revamped, but the OCaml page compares its performance results with GHC and it is more than competitive.) If the programs written in OCaml in practice are faster than the same programs ported and compiled under MLton, maybe that’s a sign that the strength of your optimizer or compiler backend is not all there is to language performance?
To my knowledge there is no paper describing flambda available. The sources are actually rather readable, so you may want to give them a try.
As described above, the ocaml-tls library forms the core… The interface between OCaml and C is defined by another library…
The scope grows quite fast here, doesn’t it? C interop plus a ‘pure ocaml’ TLS implementation that is not as widely tested and additionally calls into C code itself, as stated here:
For arbitrary precision integers needed in asymmetric cryptography, we rely on zarith, which wraps libgmp. As underlying byte array structure we use cstruct (which uses OCaml Bigarray as storage).
From the PDF:
Since OCaml programs do not manipulate addresses, collection and compaction are not generally visible to a program.
I disagree. Garbage collection takes time, an attacker can observe timings & patterns in the application based on the GC impact.
I am personally disappointed that the researchers didn’t try to evaluate possible timing attacks or attacks on OCaml’s runtime itself. Considering that just recently Pornhub bug bounty yielded 20k and it consisted of:
- We have found two use-after-free vulnerabilities in PHP’s garbage collection algorithm.
- Those vulnerabilities were remotely exploitable over PHP’s unserialize function.
OCaml might shield you from buffer overflows, manual memory management, etc. I don’t think, though, that you can simply ignore the amount of code involved in achieving the goal and the various possible side-channel attacks on it.
OCaml is a nice language. The problem with that anecdote is it compares a really good statically typed language (OCaml) with a really bad dynamic language (Python).
[The study I cited] “Basically measures speed without quality.”
True there are no clear citations of defects. But you are reading into that differently than I am. My assumption is if a language is cited as implementing a function point in N time units then that implies the function point is correct as implemented in that time. What would be the point of stating a function point was implemented in N time units but full of bugs? I would give the study the benefit of the doubt but admit the gap. These kinds of studies are always full of questions, which is why I’m interested in any you have available.
“ACL2 is a heavily constrained version of it because dynamic LISP couldn’t be verified so easily.”
Well, right, because one approach is to add some static tools above a dynamic language. If the argument was Lisp or Smalltalk could statically check exactly as OCaml does then you’d be right. But if the argument is there are large categories of applications where simple dynamic languages can be augmented with certain practices and tools, including some optional static checking to great effect, then you’d not be right.
“You supported what Smalltalk can do by using a 3rd party tool whose DSL (ROOM) uses static, strong types that drive the checking and code generation process.”
You are continuing to refer to a specific, second implementation of ROOM that has nothing to do with the original implementation. The original implementation does not use that DSL. The implementation of ObjecTime is almost entirely in Smalltalk. The executable models consist of those built by the tool structurally (graphically) augmented by code written in “any supported” programming language… almost always C but also C++ or Smalltalk.
“I disagree on the last part given the Github issues and CVE’s indicate the average app needs better built-in support for correct applications.”
I for one certainly do not care a zip about the average GitHub project. And I would remain skeptical that OCaml or Haskell or Idris or whatever may come along in the next 10 years would do much of anything for the average GitHub project. A type checked mess is still a mess.
“Logical next step is redoing the same thing with modern Smalltalk and Ocaml (or something similar).”
I agree. As I wrote above, I like OCaml, and would certainly not be unhappy if it turned out to be the best choice for some project I participated in. If you read everything I’ve written under this headline article, I’ve never claimed the superiority of a dynamic language over a statically typed one. I’ve suggested the evidence does not clearly favor one over the other. Good simple languages appeal to me. I’ve used Smalltalk and Lisp a lot and know they fit the bill for the kinds of systems I’ve worked on for 35 years. I’ve done enough OCaml and other MLs to suspect they largely would meet my needs. Haskell’s another story, personally. I used Haskell 98-era a good bit, but I’m just not interested in the last 10+ years of what’s been going on there.
I appreciate what you are saying about Ada, OCaml, etc. and the benefits of static typing. I am disagreeing with how you are positioning dynamic languages as requiring more effort to test, etc., due to lack of type checking. I’ve just not experienced that, but I understand a lot of people have not had the same experiences I’ve had with good languages, tools, and practices. But neither am I convinced those same programmers will end up with significantly better programming effectiveness in a good static language. You think they will, I think they won’t. I’m not sure we’re going to resolve those differences, or need to.
After using ASDL a lot, I definitely wondered if I should have written it in OCaml. In 2013, I did another parsing project in Python, and I actually resolved to write future projects like that in OCaml.
But there are a couple problems with OCaml. I would say it’s a meta-language for semantics and not syntax. ANTLR/yacc are meta-languages for syntax. In particular:
The second problem is that OCaml is not a good meta-language for itself. I talked about that in “Type Checking vs. Metaprogramming” [1].
So yes I’m daydreaming about a meta-language that’s a cross between ML/Lisp/Yacc/ANTLR – some details here [2]. But it is toward a somewhat practical end. I actually have to implement multiple languages (OSH, Oil, awk/make). If it can implement those languages, I’ll call it done.
Now, I’m not necessarily saying that OCaml would be worse than Python. I certainly worked around a lot of Python’s quirks, no doubt. But they both have flaws for this application, and I chose the one I was more familiar with and which had a bigger toolset.
“I’m always interested in reading good studies if you have 2-3 at hand.”
I just reinstalled my system due to a crash a few days ago. I’m still getting it organized. Maybe I’ll have the links in a future discussion. Meanwhile, I found quite a few of them Googling terms like defects, Ada, study, programming languages, comparison. Definitely include Ada, Ocaml, Haskell, ATS, or Rust, given they use the real power of strong, static typing. C, Java, etc. do not. It’s mostly just a burden on developers that assists the compiler in those languages. It’s why study results rarely find a benefit: they’re not that beneficial. ;)
Thomas Leonard’s study of trying to rewrite his tool in many languages is actually illustrative of some of the benefits of static typing. He was a Python programmer who ended up on Ocaml due to the type system’s benefits plus a performance gain. On the page below, under Type Checking, he gives many examples of where static typing caught problems he’d have had to think up a unit test for in Python:
http://roscidus.com/blog/blog/2014/02/13/ocaml-what-you-gain/
He doesn’t even have to think about these in the static language. He just defines his stuff in a typical way allowed by the language, and it knocks out all kinds of problems invisibly from there. That he was an amateur following basic tutorials without much due diligence, but still got the results, supports the benefit the static types had. The MLs are also close to dynamic languages in length due to type inference and syntactic sugar.
“Here’s at least one study in the following link that puts Smalltalk ahead of Ada, C++, C, PL/I, Pascal, etc. on a significant sized project.”
It actually doesn’t, since it leaves off data on defect introduction, time for removal, and residual defects. Basically it measures speed without quality. In those studies, including some in stijlist’s link, the LISPers have a larger productivity boost over C++ than Smalltalk does here. I ignore LISP in favor of strong, static types for the same reason: few comparisons on defect introduction & removal during development or maintenance. The studies I saw on Ada showed it way ahead of other 3GLs on that. Dynamic languages usually weren’t in the defect comparisons of the time, though.
In yours, the Smalltalk program gets developed the fastest (expected) with unknown quality. The Ada95 program… whose compiler catches all kinds of algorithmic and interface errors by default… cost 75.17 person-months over Smalltalk but 24.1 faster than C++. So it looks good in this study compared to the flagship imperative and dynamic languages of the time, given it’s a straitjacket language. Smalltalk’s design advantages mean Ada can’t ever catch up in productivity. The logical next step is redoing the same thing with modern Smalltalk and Ocaml (or something similar). As the link above shows, Ocaml has strong typing with benefits close to Ada but conciseness and productivity more like Python. That would be a nice test against Smalltalk. Actually, Smalltalk’s better tooling means Ocaml would be handicapped against it. So, maybe not as nice a test, but any comparable or winning result on Ocaml’s side would mean more if handicapped, eh?
“Many of those kinds of problems are covered by tests that would be needed in either case. ”
You don’t need the test if the type system eliminates the possibility of it ever happening. Java programmers aren’t testing memory safety constantly. Modula-3 programmers didn’t worry about array, stack, function type, or linker errors. Rust programs that compile with linear types don’t have dangling pointers. Eiffel and Rust are immune to data races. Yeah, go test those out faster than Eiffel programmers prevent them. ;) The annotation method in Design-by-Contract documents requirements in code, catches obvious errors at compile time quickly, supports static analyzers, and can be used to generate unit tests that provably test specs. As code piles up, the kinds of things you have to test go down, versus dynamic stuff where you have to hand-test about everything. Good annotations, which cover every case in a range, mean large programs of 100,000+ lines simply can’t compile with the interface errors. Imagine… although you’ve probably done it… trying to code up tests representing every way a behemoth’s software might interact incorrectly, plus the runtime those tests impose after each change.
So, part of the argument is reducing maintenance by balancing QA requirements across static types, annotations, static analysis, and tests. Ensure each property holds by using whichever of them is easiest to do, maintain, and apply to maximum number of execution traces or states.
“I have doubts that a good type system will make good programmers out of bad ones, just as OOP and dynamic typed FP has not. Bad programs can be type checked.”
I’m sure the average Rust programmer will tell you a different story about how many memory errors they experience vs coding in C. They don’t test for them: it’s simply impossible to compile code that has them in safe mode. Same with SafeD and SafeHaskell. Common cases in Ada and the Wirth languages also have checks inserted automatically. If it’s common and should fail-safe, you shouldn’t have to think about it. Pointers, arrays, strings, function arguments, handles for OS resources, proper linking… these things come to mind. Strong, static typing lets you just use them correctly without thinking past the initial definitions. No need to even write tests for that.
“That’s a second, later implementation of the same ROOM methodology. ObjecTime is decades older, implemented in Smalltalk-80.”
I know. My point didn’t change. You supported what Smalltalk can do by using a 3rd-party tool whose DSL (ROOM) uses static, strong types that drive the checking and code generation process. The tool supports my argument. Imagine if it let you assign interfaces, ports, states, and so on to various variables without checking or labeling them. I imagine correctness would involve more thought by developers than the constrained process I saw, which prevented classes of problems by design through variable types and enforced structuring. Did ObjecTime not have labels representing types, restrictions on what variables could contain, and similar structure to force sane expression of FSMs, etc.?
“The closest I can think of off the top of my head is ACL2”
I knew you’d go for that. I made the same mistake on Hacker News, where ACL2 users grilled me. They pointed out that, despite starting with LISP, ACL2 is a heavily constrained version of it because dynamic LISP couldn’t be verified so easily. Further, if you check the docs, you’ll see it has all kinds of static types that are used during development. Leave them off and it won’t do much for you at all. Hard to classify ACL2, but it’s more like a statically typed LISP than a dynamic one.
“but that they can be at least as good of a choice in most cases. Very few software systems need this level of verification.”
I agree that they can be used and used well. I disagree on the last part, given the GitHub issues and CVEs indicate the average app needs better built-in support for correct applications. The weak static languages and dynamic ones seem to catch fewer problems in small and large apps. This results in much time wasted figuring out why, or in residual defects because they don’t want to invest time in figuring it out. So even the average developer needs a good baseline in their language and tooling. Code away in the dynamic languages if you want, but strong, static ones catch more of the problems you’ll run into. If you’re disciplined and experienced, you might prevent the same ones with your strategies of testing and designing safer constructions. Evidence indicates most won’t go that far, though, due to less discipline or experience. Plus, different, average developers rolling their own way of doing basic checks instead of using built-in ones often leads to nightmares in debugging or integration.
OCaml has pretty awesome tooling for writing compilers and other language processors IMO. When you are creating a compiler, you want a good lexer generator and parser generator. OCaml comes with a lexer and parser generator included, called ocamllex and ocamlyacc, respectively. For this project I used menhir instead of ocamlyacc because it has better warnings and error messages. The syntax that menhir uses is compatible with ocamlyacc though, so you don’t have to change any code if you want to switch.
Another benefit of using OCaml for compilers is that you have algebraic data types, which are very good for creating intermediate representations such as an abstract syntax tree (or in the case of my assembler, a list of instructions). OCaml also has pattern matching, which lets you describe traversals of your intermediate representation very easily.
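As a small sketch of what that looks like (the names here are invented), an expression IR as an algebraic data type and a pattern-matching traversal fit in a few lines:

```ocaml
(* A tiny expression IR: each constructor is one node shape. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr

(* A traversal by pattern matching: the compiler warns if a case
   is missing, which is invaluable when the IR grows. *)
let rec eval = function
  | Int n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

let () =
  let ast = Mul (Int 3, Add (Int 1, Int 2)) in
  Printf.printf "%d\n" (eval ast)   (* prints 9 *)
```

Adding a new node kind later makes every non-exhaustive match a compile-time warning, so you find all the traversals that need updating.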
If you are interested in writing a compiler or interpreter, I highly recommend learning OCaml. However, most other languages have similar tools. For C, there are the original lex and yacc programs. For Python there is PLY (Python Lex/Yacc). I chose OCaml because I’m most familiar with writing compilers in OCaml, since that’s what I learned in school.
I’ll throw my hat in for Haskell since you’ve gotten mostly “OCaml then Haskell”, “OCaml”, and “Both” answers.
Learn Haskell. It can be easier and less noisy than OCaml too. The ability to do ad-hoc polymorphism with typeclasses, in a way that is closer to what people conventionally want out of polymorphism, without giving up type safety is pretty nice. If your goal is to learn new things, the overlap-but-different bits between Haskell and OCaml won’t do you a lot of good either. Haskell’s built on a simpler core semantically as well.
tl;dr I got tired of writing .mli files, I prefer my types to be next to my code, and I found modules tedious for everyday work. YMMV.
You won’t harm yourself by learning both, but you’ve got to make a choice of where to start. I think there’s a lot to be said for learning Haskell first, hitting some limitations of typeclasses, then playing with modules in OCaml or SML.
Disclosure: I am quite biased, but this is stuff I actually use in my day to day, not night-time kicking around of pet projects. Haskell is my 9-5 and I wouldn’t replace it with OCaml. That said, if Haskell disappears tomorrow the first thing I am doing is writing a Haskell compiler in OCaml and I can’t say I wouldn’t enjoy it :)
So the easiest one of your criteria to game is ‘compilation time’ because acceptable compilation time is subjective. The language that fits for me is Ocaml. I find Ocaml compilation times quite acceptable. They are longer than Go’s but still better than C++ (IME). Ocaml has a powerful type system and, IMO, the language specification is not too complicated. I learned most of my Ocaml from reading the specification rather than a tutorial.
In terms of executable performance, Ocaml is on par with Go based on the Language Shootout, despite supporting a richer set of data structure abstractions. Of course this benchmark comes with all of the well known disclaimers of benchmarking. And one should note: the Ocaml compiler has not been given serious attention when it comes to optimization, so its performance can be improved.
http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=go&lang2=ocaml&data=u64q
There are other ways to game the system as well. mlton is often cited as producing extremely performant code at the cost of compilation time. But what SML developers do is develop using a faster compiler, such as SML/NJ, which has quick compilation times, and then build final releases in mlton. Since releases are built only a few times, generally, the cost of mlton is minimal to the overall process. Of course that comes at the cost of a more complicated ecosystem (two compilers), but it’s also possible to do what clang or gcc do and adjust the level of optimization at compile time, depending on whether one wants faster compilation or faster executables.
I think these points are totally spot on. I learned Haskell first then learned OCaml for a university compilers course. OCaml was trivial to learn if you’ve already learned Haskell. That sentiment is expressed in the introduction to Haskell for OCaml programmers. The OCaml for Haskell programmers really only covers syntactic differences.
Haskell is not really much more difficult to learn than OCaml, I estimate, but I don’t really know because I learned Haskell first.
I have long been aware of C# as a strictly better Java (imo), but I only recently learned of F#, which appears to be OCaml running on .NET, if I understand properly.
I must say, I keep seeing very cool things for F# that really interest me.
Actually, I think this may just be OCaml that interests me as I have very little experience with OCaml but I keep seeing very cool projects crop up in various dialects of it.
Can someone with more experience with OCaml chime in and offer some opinions/articles on a few things for me?
I’m an Ocaml user and, except for a few rare conditions, I’ve found I much prefer a result type to exceptions. My response will be based on Ocaml which may not be the same as F# so if they don’t apply there then ignore it.
Some points I disagree with the author on:
AN ISSUE OF RUNTIME
I didn’t really understand the example here. How is the author accessing an optional value? In Ocaml we have to use an accessor that throws an exception if the value is not present, or pattern match the value out. This doesn’t seem to have anything to do with exceptions or results, just an invalid usage of an option.
AN AWKWARD RECONCILIATION
This is the case in Ocaml as well, which is why many libraries try to make exceptions never escape the API boundary. But writing combinators for this is really quite easy. A function like (unit -> 'a) -> ('a, exn) result is available in all the various standard libraries for Ocaml.
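A minimal version of such a combinator (a sketch, not any particular library's definition) is only a few lines:

```ocaml
(* Run a thunk and capture any exception it raises as an Error value,
   converting the exception world into the result world at the boundary. *)
let wrap (f : unit -> 'a) : ('a, exn) result =
  match f () with
  | v -> Ok v
  | exception e -> Error e

let () =
  assert (wrap (fun () -> 1 + 1) = Ok 2);
  match wrap (fun () -> int_of_string "not a number") with
  | Error (Failure _) -> print_endline "exception captured as a value"
  | _ -> assert false
```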
The author should be using the standard applicative or monadic infix combinators. Maybe F# doesn’t allow that. In Ocaml the example would look like:
let combine x y z =
pure (fun x y z -> (x, y, z)) <*> x <*> y <*> z
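For reference, here is a minimal result applicative that makes the example above typecheck; this is a sketch only, and real libraries differ in which operators they expose:

```ocaml
(* Minimal applicative combinators for ('a, 'e) result. *)
let pure x = Ok x

let ( <*> ) f x =
  match f, x with
  | Ok f, Ok x -> Ok (f x)
  | Error e, _ -> Error e
  | _, Error e -> Error e

let combine x y z =
  pure (fun x y z -> (x, y, z)) <*> x <*> y <*> z
```

With these definitions, `combine (Ok 1) (Ok 2) (Ok 3)` gives `Ok (1, 2, 3)`, while any `Error` argument short-circuits to that error.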
WHERE’S MY STACKTRACE?
This is the one I disagree with quite a bit. If I am using exceptions then yes, I want stacktraces, because an exception is a nearly unbounded GOTO. But the value that result types give me is that I know, from the types, what errors a function can produce, and I have to handle them. This makes stacktraces much less valuable, and I’d much rather have the win of knowing what errors are possible and being forced to handle them than have stacktraces.
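To illustrate with a made-up error type: the compiler’s exhaustiveness check is what replaces the stacktrace here, because forgetting an error case is a compile-time warning rather than a runtime surprise.

```ocaml
(* Invented error type for illustration; the point is that the match
   below must cover every case or the compiler complains. *)
type error = Not_found | Parse_failure of string

let describe (r : (int, error) result) : string =
  match r with
  | Ok n -> string_of_int n
  | Error Not_found -> "not found"
  | Error (Parse_failure msg) -> "parse failure: " ^ msg
```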
THE PROBLEM WITH IO
The problem here doesn’t have anything to do with exceptions; it’s that the return type should be a result whose Error case is a variant of the various ways it can fail. Ocaml makes this much, much easier because it has polymorphic variants.
Yeah, use a variant not a string.
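A sketch of what that looks like with polymorphic variants (the error names here are invented for illustration):

```ocaml
(* Polymorphic variants let each function declare its own error cases
   without a shared, centrally-defined error type; the inferred result
   type is open ([> ...]) so callers can combine errors from several
   functions into one union. *)
let read_port (s : string) : (int, [> `Not_an_int | `Out_of_range ]) result =
  match int_of_string_opt s with
  | None -> Error `Not_an_int
  | Some n when n < 0 || n > 65535 -> Error `Out_of_range
  | Some n -> Ok n
```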
INTEROP ISSUES
This can indeed be a problem. It’s also a problem with exceptions, though.
I’m reviewing/grading proposals submitted to the OCaml Workshop and the ML Workshop (ML is a family of programming languages including OCaml, ML, F# and, depending on the context, may even encompass aspects of Haskell, Rust and Scala), which are going to happen consecutively on September 7th-8th in Oxford, UK, colocated with the International Conference on Functional Programming. The OCaml workshop is more application-oriented (cool new libraries or tools for the OCaml ecosystem), while the ML workshop is more theory-oriented (it’s about research ideas that would apply to several different programming languages). Both have incredibly exciting submissions this year, and I’m already eager to actually see the talks.
Thanks, we’re moving toward a constructive discussion :)
About compilation time: Is OCaml using some kind of incremental compilation (recompiling only the parts/modules that have been changed) or does it recompile everything always? And what is the usual build time for the programs you work on?
About language specification: I agree that OCaml authors have found a good balance between language expressiveness and language simplicity. The spec seems quite readable and not too long.
About execution performance: I read the source code of each benchmark in Go and OCaml. The Go version is generally shorter, which is unexpected considering OCaml’s emphasis on expressiveness. Do you know why?
I also noticed some code duplications that I don’t understand in spectralnorm (eval_A_times_u and eval_At_times_u) and knucleotide (write_frequencies15 and write_frequencies16, write_count15 and write_count16). These are typically the kind of duplications that I don’t expect in a language which supports parametric polymorphism and algebraic data types. Is it for optimization purposes?
And one should note: the Ocaml compiler has not been given serious attention when it comes to optimization, so its performance can be improved.
This is a weird argument. It’s like saying “Interpreted languages are slow but it’s because we’ve not given serious attention to optimization”. The truth is that most interpreted languages are harder to optimize. The cost of developing a runtime like V8 is very, very high. It’s like saying “C++ template compilation is slow but it’s because we’ve not given serious attention to it”. The truth is that it is harder to compile quickly, for structural reasons. OCaml gives more expressiveness and safety to the programmer, and this is good, but it also complicates things on the compiler side, and there is a cost there that must be taken into account.
mlton is often cited as producing extremely performant code at the cost of compilation time.
I agree that having different compilers for development and building the production binary is an interesting strategy.
What about the strategy used by the compiler(s) to implement generics? Do you know about this? Does the compiler generate one polymorphic version of the code, or many specialized versions, one for each type (monomorphization)? I think that MLton uses the latter strategy.
Thanks for the link! It is an excellent explanation of OCaml memory representation.
You are right about the representation of a value in OCaml being similar to the representation of an interface value in Go (even with some significant semantic and operational differences). It follows they should have similar performance characteristics.
I think that using this approach for everything is ok in OCaml considering the goals and the use cases of the language. It simplifies the implementation a lot, because the compiler does not need to generate a specialized version of each function for each possible type. It just needs to generate an efficient polymorphic version of the code and it is done.
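As a rough illustration of why a single compiled copy suffices (the comments describe the usual OCaml value representation, as discussed in the linked explanation):

```ocaml
(* One compiled function serves every type: each OCaml value is a
   single machine word, either an unboxed immediate (e.g. an int) or
   a pointer to a boxed heap block (e.g. a string or a tuple). *)
let pair_with_self x = (x, x)

let () =
  let a, _ = pair_with_self 42 in       (* immediate int *)
  let b, _ = pair_with_self "hello" in  (* pointer to a boxed string *)
  Printf.printf "%d %s\n" a b
```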
But this approach would be difficult to reconcile with the goals of Go. Go is positioned as a systems programming language (to program application servers, not operating systems). Using boxing almost everywhere is not an option here. This is the reason why interfaces are not the default mechanism for everything. This is also the reason why generics are a little bit more difficult to implement than in OCaml.
The runtime cost is related to pointer indirections and having to allocate/free memory on the heap for each value. Here is an interesting presentation of Microsoft about this subject in C++: http://view.officeapps.live.com/op/view.aspx?src=http%3a%2f%2fvideo.ch9.ms%2fsessions%2fbuild%2f2014%2f2-661.pptx
Java is well-known for having a problem with this. For illustration, here is an excerpt of what the Disruptor project says about it [1]:
There is a proposal by John Rose to introduce “value types” to the Java language which would allow arrays of tuples, like other languages such as C, and so ensure that memory would be allocated contiguously and avoid the pointer indirection.
[1] See section 4.1 at http://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf
To conclude, I think that Go will have some form of generics one day. I think it will not be based on some kind of monomorphization, because that would increase compilation time a lot, and fast compilation is one of Go’s major goals. So we’re left with the compiler generating polymorphic code like OCaml. For generic containers, which generally don’t need to know anything about the element type’s behavior except its size and sometimes an equality operator, we don’t need interfaces and the performance can stay high (it looks like built-in arrays/slices/maps/channels work like this, according to a quick read of the runtime code). But for other purposes, generics will probably depend on interfaces and have the same performance characteristics as in OCaml. There is no silver bullet :)
Edited: removed a useless note about MLton.
it’s a small crossword editor that i’ve been using to explore various combinations of language and toolkit to see which one is nicest to develop and maintain a desktop application in. so far i’ve prototyped it in ruby/shoes (too slow), chicken/iup (not complete enough at the time, though there are now a great set of iup bindings), clojure/swing (got a good long way, but i got sick of both swing and the jvm) and common lisp/eql (eql is nice but inactive, and i’m not a huge fan of common lisp). i’d like to give some more static language a try, and i wanted to use qt if possible.
i’ve not really chosen ocaml insofar as i haven’t started (re)writing the app yet, but the frontrunners this time around were d, ocaml and f#, and i’ve already been using ocaml for a bunch of other stuff. what finally decided me against d was that qtd does not seem very well documented, also no one seems to be using it. qt/ocaml isn’t the most mature project either, but at least i know there are solid gtk bindings to fall back on, and i already know ocaml (though i’ve not done much gui code in it)
edit:
yes, i was clearly biased towards ocaml because i already knew the language, but
i really think a good gui development experience could be a killer application for d, especially if you have the option to end up with a single, static binary (something go did beautifully).
Coming from Erlang, getting used to either one will require a bit of mental rewiring. Haskell more than OCaml, since with OCaml you’re allowed to throw printfs (not literally) around. OCaml’s type system is not as expressive as Haskell’s, but it is certainly very advanced in its own right and it probably is easier to learn.
Fundamentally, both belong to the ML family, so learning either one first will help in learning the other. If I could choose I would learn Haskell first, because it is stricter in the type system sense. Moving to OCaml will feel like loosening a belt, so to speak. That said, OCaml isn’t a simpler language by any means: it has a powerful OOP system, module functors (think parametrized static classes) and a great record system. Recent versions added GADTs. Concurrent and parallel programming on OCaml is a sadder story, but work is being done on it.
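For a taste of the module functors mentioned above, here is a standard-library example (Set.Make is a functor shipped with OCaml, so nothing here is invented):

```ocaml
(* Set.Make is a functor: a module parametrized by another module that
   supplies the element type and its ordering, much like a
   parametrized static class. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

let () =
  let s = IntSet.of_list [3; 1; 2; 3] in
  (* Duplicates collapse, so the set has three elements. *)
  Printf.printf "%d\n" (IntSet.cardinal s)
```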
I’ve used both OCaml and F# in anger, in production, and I’m honestly mediocre at both, but I can at least partially answer your questions.
F# is an ML, but not really an OCaml variant in any meaningful sense. Yes, it’s true that early versions of F# aimed to be syntax-compatible with OCaml, but that was always a bit of a stretch. Even if you ignore the syntactical differences, F# and OCaml have meaningfully different semantics. F# has .NET-style classes, OCaml has its own object system (which AFAICT no one uses) and its own take on the whole concept with a really neat module system. F# has workflows, OCaml doesn’t have something quite equivalent. F# has units of measure, OCaml lacks an equivalent. OCaml has abstract data types, F# does not, but F# has operator overloading and reified generics, which OCaml does not. And so on. In nearly all respects, OCaml has vastly more in common with SML/NJ than it does with F#.
The runtimes are also very different. By virtue of running on the CLR, F# has real, full-blown multithreading with a rich async system. In contrast, OCaml has a stop-the-world GC that can feel quite a bit like the Python GIL in practice. On the other hand, OCaml compiles directly to native binaries and has a simple, very easy-to-understand runtime that I feel confident I could explain to you in about ten minutes, whereas .NET in most situations is still really a VM-like affair, and requires you to understand a vastly more complicated runtime situation. (Yes, .NET Core changes at least the native compilation aspect, but that cuts out a lot of libraries most .NET devs want to use today.) And again, because F# runs on the CLR, you’ve got access to the .NET ecosystem, whereas OCaml is limited to a mixture of the OCaml ecosystem (which is significantly smaller) and calling out to C libraries (which has gotten very pleasant, based on my experience, but is still more overhead than just consuming C# libs from F#).
I personally found my experience with F# more pleasant than OCaml due to a mixture of superficial issues (e.g. lighter syntax, better IDE support) and what I’d call “real” issues (the multithreading support and library situation), but, again, I don’t consider myself an expert in either language based on my limited experience. And I cannot possibly compare this to Haskell or Agda. But I hope this helps answer your question at least a bit.