I’ve used both in significant anger, and Haskell in production. Both are good next steps and both will take some serious getting used to coming from Erlang. Your personal goals will dictate the right next moves, so it’s worth noting that Haskell will introduce you much more forcefully to ideas of “purity”, which can be very useful. OCaml is made somewhat simpler to understand by avoiding this issue. OCaml will introduce you instead to the idea of “modularity”—which is not precisely what you might think it is based on prior experience—and I’d say you should take a detour through learning that concept regardless of which one you end up investing more time into.
If your goals are more project-oriented then either can work, but Haskell’s library ecosystem is vastly better developed today. Haskell in production is tricky due to laziness—you will have to learn new techniques to deal with it. OCaml I’ve not used in production, but the challenges there seem more likely to be around getting the right initial momentum as you may have to write some significant portion of library code and work around a bit of ecosystem incompatibility.
Speaking of ecosystem again, OCaml has a standard library problem in that its genuine standard library is often not what the doctor ordered and there’s a choice of “extended” standard libraries. This choice sometimes feels a little forced which is slightly ironic due to the aforementioned modularity goals of OCaml, but c'est la vie.
The Haskell community has issues or benefits in that you will undoubtedly run into someone sooner or later who will happily try to explain the solution to a problem as a mere application of recognizing your algorithm as being a catamorphism over an implicit data type or will suggest refactoring your effect stack according to mtl or a Free monad. It might even be me that does this. Jargon aside there’s some significant insight that Haskell can be a gateway drug to. On the other hand, this can be the absolute last thing you want to hear when trying to get your project working. Mileage varies here.
The OCaml community has issues in that it’s small and sometimes speaks a lot of French.
Both communities are great in that they’re full of passionate and damn smart folks, though.
Menhir is still an LR(1) generator as far as I know, and that algorithm doesn’t work with shell at all. I always see people recommending parser combinators, but I can think of literally zero production-quality languages that use them. And I’ve looked at perhaps 30 “real” parsers.
As far as I can tell, OCaml, SML, and Haskell all have the same thing going for them – algebraic data types and pattern matching. I don’t know of any non-toy or non-academic programs written in SML. OCaml seems to be what you use if you need to write something production quality. Between OCaml and Haskell, OCaml actually has a Unix tradition and has better tools as far as I can tell. Supposedly Haskell is not efficient with strings.
I claimed that C is better than OCaml for writing hand-written lexers. I explicitly didn’t say that Python is better – I said the opposite. OCaml would probably be the #2 choice. The first post describes how I ended up with Python:
http://www.oilshell.org/blog/2016/10/10.html
I started with 3000 lines of C++ and realized I would never finish at that rate. The prototype became production… but I think that will end up being a good thing.
Managing mutability explicitly provides a lot of value. Carmack is not exactly an academic with no real-world game programming experience. My sense is that even the mainstream OO world is pushing towards immutable-by-default.
The “how do we get to there from here” problem exists for any new language. Is the author claiming that OCaml has a better migration story, or a better C FFI, than Haskell? Because that’s not my impression. (OTOH, the fact that Scala has great Java interoperability is a big factor in why I’ve ended up using it so much more than Haskell).
Reasoning that you hear less about OCaml because it’s used for more useful projects is spurious. Haskell is more popular than OCaml in academia, sure. That doesn’t imply OCaml is more popular than Haskell in industry; that may be true, but I’d want to see more evidence. Particularly if you’re looking at finance (where many of the purer functional languages are more popular), there are systems providing millions or billions of dollars of real-world value that no-one outside that particular organization has heard about. Likewise arguing that more libraries = worse language is very dubious.
Parsec is a different model from flex/yacc-like parsing, but I found parsec style infinitely easier to work with than yacc style. Certainly it’s popular enough that people have found it worthwhile to write parsec-style parser libraries for other languages. Maybe you find yacc-style parsing easier to work with; in that case find a yacc-style parser library for Haskell (I’m sure there will be several). But there’s no general point here (or rather, if you’re arguing that yacc-style parsing is inherently superior and anyone who prefers parsec-style is an idiot, then I must vehemently disagree); parsec is not a low-quality library by any means, it’s just one with a particular model that you might be unused to.
I would bet that pandoc has more real-world users than MLDonkey does today, and probably more than it ever did. As someone who actually used MLDonkey, I can tell you it was a dog; the UX was horrible (the best frontends weren’t even written in ML), the plugins were extremely variable in quality, and it crashed whenever you did something unexpected. The whole project felt like a proof-of-concept for “OCaml can do stuff, honest!”, not a working system (in contrast to pandoc, which feels like an ordinary unix tool that you wouldn’t even notice was written in Haskell if you weren’t looking).
I refuse to believe that modules are so much better than typeclasses for programming in the large: if they are, why haven’t they crossed over into any more mainstream language? We’ve seen people making a fuss about pattern matching in Swift et al, or list comprehensions in Python. I have never seen any non-ML language adopting ML-style modules (maybe OSGi is the same thing? But OSGi is not exactly a success story).
There are certainly some bad attitudes involved in Haskell. But if we’re going to talk about tone and the like, this post is worse than most of them. Calling anyone who disagrees with you a shill or bully or delusional and trying to psychoanalyze your opponents is not convincing; you’re making OCaml sound worse than Haskell here.
This was a bit confusing to me. Readability is a very subjective experience. Correctness is much more objective. So it’s very easy to say if a piece of code is correct or not but it’s much harder to say if it’s readable.
Instead of readability, I think the better measure is how much information outside of the code I’m reading is needed to understand it. For example, in OCaml, Foo.bar x (which applies the function bar from the module Foo to the value x) is pretty easy to grok as long as you know what Foo is. This is how a lot of OCaml looks: the names of things are bound at compile time rather than run time. On the other hand, languages like Ruby, Python, and even Java do almost all binding at run time. The code above would more likely be written as x.bar() in those languages, and it’s near impossible to know what code is actually being executed at run time without knowing a bunch more context. This is why I find OCaml much more readable than other languages, even if the code people write in it can be quite terse. All I have to do is run that code through some automatic indenting software and it becomes easier to read for me. In languages with late binding, all the automatic indenting in the world won’t tell me what is executed.
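A minimal sketch of that point (the module and values here are made up for illustration):

```ocaml
(* A made-up module: everything reachable through it is named statically. *)
module Foo = struct
  let bar x = x * 2
end

let x = 21

(* Reading [Foo.bar x], you know at compile time exactly which [bar] runs:
   the one defined in [Foo]. There is no run-time dispatch to trace. *)
let () = assert (Foo.bar x = 42)
```

In Ruby or Python, the equivalent x.bar() resolves bar only at run time, so the reader needs to know the dynamic type of x before they can find the code that executes.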
Disclaimer on typeclasses:
Haskell is very similar to OCaml in this regard except for type classes. I don’t have enough experience at scale to know if type classes suffer from the same readability issue that I claim exists for OO languages. OCaml is likely getting something like type classes in the near future, called modular implicits, and I’m both looking forward to it and nervous that we will enter an age of difficult-to-grok OCaml code.
Dynamic languages let you write much more direct code that’s easier for a human to read.
No more than any other paradigm, though, right? I don’t find OCaml code any less readable than Python code. In most cases, the types aren’t even explicitly written in OCaml, so there is no line noise or anything getting in the way.
I’ve actually been writing a bunch of Python lately after spending a lot of time in OCaml, and I’ve noticed that I end up having to look at documentation fairly frequently, whereas in OCaml I would generally just look at the type of a value instead. But the documentation is really hit or miss, so I’ve even found myself reading the source of dependencies several times; I’ve only done that a few times in OCaml.
YMMV, but your blanket claim that dynamic languages let one write code easier to read than statically typed languages doesn’t match my experience.
On top of that, ease of reading isn’t the metric I personally go by; it’s ease of refactoring. I’ve found that dynamic languages take a lot of effort to make refactoring safe, but in a language like OCaml I have the compiler watching over my shoulder.
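As a small, invented illustration of what “the compiler watching over my shoulder” buys you during a refactor:

```ocaml
(* An invented status type. If a refactor later adds a constructor, say
   [Cancelled], the compiler reports every match below as non-exhaustive,
   pointing at exactly the code the refactor still needs to touch. *)
type status =
  | Pending
  | Active
  | Done

let describe = function
  | Pending -> "waiting"
  | Active -> "running"
  | Done -> "finished"

let () = assert (describe Active = "running")
```

A dynamic language would typically only surface the missed case at run time, and only if a test happens to exercise it.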
No one big, that I’m aware of, has chosen OCaml for doing HTTP/REST services, so the situation will seem pretty anemic if you’re coming from the Node world. I’m doing my best to address some of the gaps, as are others, but not being paid to write OCaml means it happens in my spare time. That being said, most of those components do exist: specifically Async or Lwt, Cohttp, and yojson, and you’d probably want to use ppx_deriving.yojson as well. But I doubt there is any documentation out there that ties all that together for you. The reason I use OCaml is the same reason I use Joyent and run FreeBSD: I believe in the way they solve problems, even if they’re behind OS X and Linux, and I’m willing to feel the pain a bit for the overall win I believe comes out of it. That being said, I think OCaml is at the point where the issues you ran into are the type of thing a company could solve pretty quickly if it chose to use the language. Mostly it’s picking your stack and writing some templates to bring everything together.
For your exception issue, I’m not entirely sure I grok what you mean. If an exception makes it all the way up to the program entry point, it will end the program. And whether stack frames are recorded depends on a runtime option; by default it is off. I actually avoid exceptions for the most part, choosing instead to use a result monad almost everywhere, combined with polymorphic variants. I find it significantly less surprising than exceptions.
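A rough sketch of that result-plus-polymorphic-variants style (all the names here are invented; int_of_string_opt needs OCaml ≥ 4.05):

```ocaml
(* Errors are polymorphic variants, so each function declares its own
   error cases and the compiler unions them at the call site. *)
let parse_port s =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> Ok p
  | _ -> Error (`Invalid_port s)

let lookup_host = function
  | "localhost" -> Ok "127.0.0.1"
  | h -> Error (`Unknown_host h)

(* A bind operator short-circuits on the first error; nothing is raised. *)
let ( >>= ) r f = match r with Ok v -> f v | Error e -> Error e

let connect host port =
  lookup_host host >>= fun addr ->
  parse_port port >>= fun p ->
  Ok (addr, p)

let () =
  assert (connect "localhost" "80" = Ok ("127.0.0.1", 80));
  assert (connect "localhost" "foo" = Error (`Invalid_port "foo"))
```

The inferred error type of connect is the union of both error sets, so every failure case shows up in the signature rather than as a surprise exception at run time.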
I know there is only a small chance of you picking up OCaml and applying it at work, but if you feel the urge to look at it again and run into issues, feel free to direct message me with any questions.
P.S. I’d love to see native dtrace probe support in OCaml.
I just played with the OCaml code a bit and halved the run time. The performance hit comes from using a regex to split each line rather than just splitting on a character; the Python version uses plain string splitting.
I generated a CSV with:
awk 'BEGIN { while (count++<30000000) print rand()","(rand()*10)","(rand()*100) }' > /tmp/data.csv
I moved the OCaml code over to Core in order to have access to String.split, which splits on a plain character; the regexp-based split it replaces is pretty expensive.
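To illustrate the change (using the stdlib’s String.split_on_char here rather than Core’s String.split, but the idea is the same):

```ocaml
(* The original code split each CSV line with Str.split (Str.regexp ","),
   paying for the regexp engine on every line. Splitting on a single
   character is just a plain scan over the string. *)
let line = "0.42,4.2,42.0"

let fields = String.split_on_char ',' line

let () = assert (fields = ["0.42"; "4.2"; "42.0"])
```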
The new code is here: https://gist.github.com/orbitz/05afcda28a33f784d2fe
Original:
$ ocamlfind ocamlopt -thread -package core,str -linkpkg -o sum-ocaml foo.ml
$ time ./sum-ocaml data.csv
15000332.2443,150006471.834,1500135734.85
real 1m0.173s
user 0m57.158s
sys 0m2.403s
Modified:
$ time ./sum-ocaml data.csv
15000332.2443,150006471.834,1500135734.85
real 0m26.482s
user 0m25.351s
sys 0m0.912s
And as comparison, the Python version on my system with my file:
$ time python3 foo.py data.csv
15000332.24433928,150006471.83433944,1500135734.846217
real 1m58.709s
user 1m57.410s
sys 0m0.796s
Hello @apy,
Thanks for answering.
I’m surprised the language they created does not make expressing solutions in terms of data structures easier.
I disagree. Go provides a lot of things that make defining and using data structures easy and efficient by default:
For example, Ocaml and Haskell developers find data structures important to solving problems.
I agree that having parametric polymorphism and sum types in OCaml and Haskell is great in some cases. But it can also make the language specification complex (for example, have a look at the infamous “monomorphism restriction” in the Haskell Report) and, as far as I know, you lose the benefits of the efficient memory representation Go provides by default.
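For readers who haven’t met them, the sum types and parametric polymorphism under discussion look like this in OCaml (a toy example):

```ocaml
(* A parametric sum type: one definition of ['a tree] works for any
   element type, with no casts and no per-type code duplication. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* One polymorphic function covers every element type, and the compiler
   checks that the match handles every constructor. *)
let rec size = function
  | Leaf -> 0
  | Node (l, _, r) -> 1 + size l + size r

let t = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
let () = assert (size t = 3)
```

The representation trade-off mentioned above is visible here: each Node is a separate heap block, where an equivalent Go struct slice would typically be one flat allocation.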
I think you are suffering from selection bias: I’m critical of just about everything, you just happen to read more Go posts or Go posts are more popular right now.
No, I’m not. It happens that I also read the threads related to functional programming, Haskell, OCaml, ML et al, and I don’t see you putting the same energy and repetitiveness into your criticism of those threads.
I’ll spend my time how I want even if it confuses you.
Ok, you’re right. How you spend your time is not my concern :) But my concern is you making me spend my time reading the same unconstructive comment again and again in every thread related to Go. Again, the official Go FAQ clearly states that the language does not have generics or sum types at this time, and explains why.
And to conclude, what would you think if I commented on every story about OCaml with “This language is just a mess! It’s 2014, concurrency is all the rage, and OCaml still has a global lock.” and on every story about Haskell with “This language is an immature mess, full of unresolved issues. Look for example at the monomorphism restriction in the Haskell Report”?
OCaml basically owns the ml tag at the moment. OCaml effectively has its own tag; the question is whether the ML-family languages that fall under that tag need their own tag separate from OCaml, not whether OCaml does.
Further, if they do, might it be better to fission them into ocaml and sml since that broadly characterizes the groupings?
Has a notion of tag unions been discussed at all? Perhaps with covariance and contravariance? :P
I don’t think anybody suggested that having a strict language negates the need for inlining
It’s a bit of a strawman when we’re talking about inlining, less so when you talk to OCaml users more generally about perf. I genuinely have had push-back on the importance of inlining for OCaml from users though.
I added an -opaque flag that can force the compiler to export no optimization information for a module, thus ensuring its compilation is completely separate from its dependencies.
We’ve got NOINLINE, type-check only builds, and -O0 for similar purposes.
so it is, quite understandably, not discussing the downsides much
I wasn’t really querying into downsides so much as hard limitations arising from certain patterns in polymorphic code that could apply to typeclasses and ML modules. More of a theoretical issue than anything else, and to see if OCaml hackers had figured anything out we could use, since it seemed to me that things like generative functors and row type polymorphism would tickle this problem more frequently than Haskell code does.
fares noticeably worse on all those metrics (the same point would apply to many other programming languages). In the last released version (4.02), the compiler takes around 350 Kloc
You sure about that? GHC hasn’t cracked 250kloc to my knowledge.
The sources are actually rather readable, so you may want to give them a try.
I doubt that is an efficient way to answer my question about inlining and how it interacts with modules, so if there’s no documentation or papers on it, I’ll assume the limitations are similar to how typeclasses and existential quantification interact.
I don’t think my affection for OCaml always comes through in the jokes and pokes. I’ve enjoyed kicking it around and doing so has gotten me interested in SML as well. My coauthor (haskell book) is probably tired of me talking about SML and OCaml by this point.
Being able to do optimizations not possible in AOT doesn’t necessarily translate into faster programs. To take one data point, Java does worse than C in all cases in the Computer Language Benchmarks Game:
https://benchmarksgame.alioth.debian.org/u64q/java.html
Comparing Java to OCaml, a language implementation with orders of magnitude less person-time put into it, Java does better than OCaml overall but worse in the longer-running benchmarks:
https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=java&lang2=ocaml
Now, many of these benchmarks are short-lived, so you can argue this isn’t the sweet spot for Java; that’s fine, but I don’t have any other data points to call on. Anecdotally, my own experience running long-lived services in Java is that the JIT doesn’t noticeably improve performance relative to comparable AOT-compiled code, while the JVM is significantly more complex to operate.
You can also say that, against C, Java is much more high-level and safe, so it’s still doing pretty well. Sure. But, for me, I can take OCaml, which is pretty comparable performance-wise and has a significantly simpler implementation and run-time, so I’d gladly take that. And, again, OCaml is comparable in performance with orders of magnitude less person-effort put into it.
So if you have some performance numbers that show a clear benefit of the JIT, happy to see them, but as far as I can see the JIT is complexity without much win.
In case you want to know more on this front…
One of the biggest stumbling blocks we (at Joyent) had when evaluating OCaml was the exception handling implementation. I managed to modify the runtime to abort() on an uncaught exception, but it seems like the only context where we know it’s not been caught is one where we’ve already unwound the stack (since exceptions seem to be indicated by returning particular special values from each function). The OCaml level backtrace is actually generated by putting a string into a global symbol every time an exception is thrown, just in case it ends up uncaught (which is why running with backtraces on incurs such a high performance penalty).
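For reference, the runtime option mentioned here can be toggled from code via Printexc (or with the environment variable OCAMLRUNPARAM=b); a minimal sketch:

```ocaml
(* Backtrace recording is off by default; turn it on explicitly. *)
let () = Printexc.record_backtrace true

let boom () = failwith "about to be caught"

let () =
  (try boom () with Failure _ ->
     (* With recording on, the runtime captures raise locations as they
        happen, which is where the performance penalty comes from. *)
     print_string (Printexc.get_backtrace ()));
  assert (Printexc.backtrace_status ())
```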
We consider it pretty important to be able to get a core file after an uncaught exception, in the context of the throw, and always run with the ability to get a detailed backtrace. The OCaml compiler makes this pretty hard at the moment.
Another thing we value pretty highly is being able to spelunk around in a core file to get information about a program’s state, and the OCaml compiler’s aggressive type elision with lack of debug info alongside to help understand what’s going on is pretty limiting there as well. I saw there was a project to add DWARF type info generation to OCaml but I couldn’t find much about it or what happened to it (I could have missed something there, sorry).
These might sound to some people like small details, but it’s a big deal to us, given how often we encounter bugs that simply can’t be reproduced from instructions in a different environment. You get one shot at figuring out these bugs, and that shot is in production, where you can’t just bring the service to a standstill to step it through something. Aggressive use of abort() and post-mortem debugging is really the only good way. So not having them is a deal-breaker for a new language for us.
In defense of OCaml over Haskell, I think OCaml is significantly simpler, both as a language and as an implementation. Reasoning about how OCaml code will perform on a system is fairly straightforward. Haskell certainly has merits, but even with a GIL I find OCaml’s simplicity a big selling point.
If I’m reading correctly, it doesn’t require multicore OCaml. It works either in regular OCaml with the delimcc library or in multicore OCaml, though more efficiently in the latter.
There’s little info on what’s coming in the next version of F#; yet we know quite a lot of language features coming in the new versions of VB and C#. It feels like C# and VB are charging forwards at a much faster rate than F#. Do we want to get off the fast train and board the slow train?
I haven’t used any .Net language, but could this be because F# is already so flexible that it doesn’t need much? I am a user of OCaml, and OCaml simply doesn’t need to change much. Most of the value in new versions of OCaml is behind the scenes. Sometimes there are nice little syntax additions, but for the most part they aren’t a big deal. Compared to a language like Python or Java, where every new interesting thing you want the language to do generally requires a new version of the language, OCaml allows users to do quite a bit of growing without modifying the language.
Thanks. This is very helpful.
Category Theory I am still trying to “grok”. I can see its usefulness, but most of what makes Haskell a great real-world language is only very loosely connected to CT. The concept of a Haskell Monad (or Functor, or Monoid) is incredibly useful but I almost find the word to be somewhat of an albatross. There’s a misconception that you have to know Category Theory to get Haskell, and that’s demonstrably not true. I know very little Category Theory.
I like OCaml’s named function arguments and don’t think OCaml “gets it all wrong”. I see OCaml’s niche as being in the very-high-performance single-threaded space and in system programming, where stateful effects are common and sometimes necessary. (OCaml doesn’t have, to my knowledge, the sort of ultra-powerful compiler that can make Haskell fast.) So I’m not bothered by its lack of IO annotations on functions. It’s still far better than nothing.
There’s a great book on Haskell performance called Parallel and Concurrent Haskell. It’s not a good first Haskell book (I’d look at Learn You a Haskell and Real World Haskell first) but once you know Haskell itself, it’s one of the best books (as far as Haskell books and multithreaded programming books go) out there.
(OCaml doesn’t have, to my knowledge, the sort of ultra-powerful compiler that can make Haskell fast.)
This is true, but the flip side has some quite nice properties: it’s very easy to look at OCaml code and predict what the assembly will be like. The memory model is very understandable, and so is the generated code. This is great for scaling in production. That is one thing that would make me think twice about Haskell: the distance between what I write and what runs on the machine is quite large. It’s certainly possible to understand, but with OCaml it’s quite simple. Real World OCaml is also great in that it explains these things at an operational level to the reader.
I worked on an S3 OCaml client for work. Was able to create a simple version of the S3 client as an example for the Cohttp Async library: https://github.com/mirage/ocaml-cohttp/blob/master/examples/async/s3_cp.ml Excited to be contributing back to the OCaml community!
I recently started a new job at Pegged Software. We’re building out our engineering team. If you’re interested in doing social good and working in OCaml please take a look: https://s3.amazonaws.com/public-bucket-pegged/PeggedEngineeringPost.pdf I’d love to hear from you!
I would be interested in something a bit more specific. I have a relatively good idea of what the OCaml runtime is like (in particular, I appreciate the fact that it is clean C code that is readable and hackable), but I have never looked deeply into the GHC runtime. I know it has a decent parallel GC, and that it recently got some work to reduce GC pauses (coincidentally, Damien Doligez worked on GC latency improvements for OCaml’s GC, to be included in the coming release).
I am thus easily convinced on the multicore aspect (getting a good multicore runtime requires a horrendous amount of work), but do you have some specifics on the ‘gc’ angle? I am also curious about which aspects of the optimizer you think would be important for the kind of OCaml programs you would be interested in writing. The kind of programs I work with (symbolic manipulations of ASTs mostly) are rather hard to optimize, as they tend to be memory-access-bound. I know OCaml could improve its unboxing on numerical code (and one thing I miss from Haskell is explicit support in the source language for unboxed values; but that has a negative impact in terms of language complexity as well), and there is ongoing work to specialize higher-order functions (the ‘flambda’ intermediate representation), but I wondered if you had other things in mind.