1. 26
  1.  

  2. 9

    Not to be “that guy”, but I’ve brought up inlining as part of the reason I use GHC Haskell for my work and I’ve gotten responses that were along the lines of:

    1. OCaml doesn’t need inlining! It’s fast already…cuz strict!

    2. Separate compilation uber alles!

    3. Inlining, even if it makes some code faster, makes it harder to know how fast your code will be! HT @apy

    I, for one, am glad OCaml’s compiler is producing faster code. I just wish people would stop making excuses when it’s as simple as labor inputs. There’s nothing wrong with OCaml, it just stands to benefit from some more love.

    Another issue for me is concurrency. Async and LWT are not convincing when I have a huge repertoire that includes STM.
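
    For the curious, a tiny sketch of the kind of STM code I have in mind (standard stm package; the account setup is just illustrative):

    import Control.Concurrent.STM

    -- Move money between two accounts atomically; the whole transaction
    -- retries if another thread interferes, and blocks until funds exist.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
      balance <- readTVar from
      check (balance >= amount)
      writeTVar from (balance - amount)
      modifyTVar' to (+ amount)

    main :: IO ()
    main = do
      a <- newTVarIO 100
      b <- newTVarIO 0
      atomically (transfer a b 40)
      readTVarIO b >>= print   -- prints 40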

    Are there any papers or documentation on FLambda? I’d like to compare how this works with the GHC inliner as I am very curious about the differences. An example of something interesting with inlining is how it interacts with typeclasses and modules.

    1. 16

      I see the point of your post: all language implementations can benefit from more effort poured into them, and indeed it’s great that Pierre Chambart (OCamlPro), Mark Shinwell and Leo White (Jane Street) could pour this work into a new inlining pass.

      I still find your three points a bit frustrating.

      1. I don’t think anybody suggested that having a strict language negates the need for inlining (see the aggressive inlining work poured into C/C++ compilers). What is true is that a lazy language is slower without optimization than a strict one (because call-by-need necessarily implies more bookkeeping), and that GHC thus has to rely on optimizations to be competitive performance-wise with other compiled languages, while OCaml implementations can do without a refined optimizer: good data representation choices and a fast runtime suffice to get most idiomatic programs within an acceptable factor of C (or your other language of reference). A small Haskell illustration of this bookkeeping point is sketched after this list.

      2. There is a trade-off between inlining and separate compilation, but it was already present before the flambda work – the native compiler has always done cross-module inlining and optimizations that would require more recompilation than in a purely separate compilation setup. In the last released version of OCaml (4.02, August 2014), I added an -opaque flag that can force the compiler to export no optimization information for a module, thus ensuring its compilation is completely separate from its dependencies. This helps for some workflows – typically short edit-compile-test cycles.

      3. Indeed, aggressive inlining has plenty of downsides. It makes the code performance harder to reason about (but people are working on that, providing annotations to make sure that the compiler would warn if inlining did not happen, etc.), it also makes the compiler noticeably slower, and gives it a more complex tuning interface that is harder to use. This blog post is about celebrating the landing, in the upstream compiler codebase, of a several-man-year effort to develop this inlining pass, so it is, quite understandably, not discussing the downsides much, but that does not mean they do not exist. Of course, I expect all of this to be improved in future iterations.
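
      To make the bookkeeping point in (1) concrete, here is a toy Haskell illustration (mine, not from the original discussion): a lazy left fold accumulates a chain of thunks unless the optimizer’s strictness analysis (or an explicit strict fold) removes them, which is work a strict language simply never has to do.

      import Data.List (foldl')

      -- Without optimization, the lazy fold builds the thunk
      -- ((0 + 1) + 2) + ... and only forces it at the very end.
      sumLazy :: [Int] -> Int
      sumLazy = foldl (+) 0

      -- foldl' forces the accumulator at each step: roughly what a strict
      -- language gives by default, and what GHC's strictness analysis
      -- recovers at -O.
      sumStrict :: [Int] -> Int
      sumStrict = foldl' (+) 0

      main :: IO ()
      main = print (sumStrict [1 .. 1000000])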

      The general point about the OCaml compiler is that it has an excellent performance-to-effort ratio. It is relatively simple, and it follows the 80/20 principle: it implements the few main optimizations that really matter for most codebases, and in practice this works really well. Using the last released version (4.02), the compiler source code is less than 350 kilo-lines of code, and it bootstraps and builds in 1m14s on my machine (a few-years-old laptop). The compilation times for typical OCaml projects are excellent. GHC is a wonderful piece of work, I am amazed at how friendly and active its development community is, and the support it provided for evolving the Haskell language is humbling, but it is a much more complex compiler and fares noticeably worse on all those metrics (the same point would apply to many other programming languages).

      OCaml is competitive in performance with other implementations of ML-like languages, such as SML/NJ, MLton, or GHC. (The language benchmark site has been revamped, but the OCaml page compares its performance results with GHC and it is more than competitive.) If the programs written in OCaml in practice are faster than the same programs ported and compiled under MLton, maybe that’s a sign that the strength of your optimizer or compiler backend is not all there is to language performance?

      To my knowledge there is no paper describing flambda available. The sources are actually rather readable, so you may want to give them a try.

      1. 3

        I don’t think anybody suggested that having a strict language negates the need for inlining

        It’s a bit of a strawman when we’re talking about inlining, less so when you talk to OCaml users more generally about perf. I genuinely have had push-back on the importance of inlining for OCaml from users though.

        I added an -opaque flag that can force the compiler to export no optimization information for a module, thus ensuring its compilation is completely separate from its dependencies.

        We’ve got NOINLINE, type-check only builds, and -O0 for similar purposes.
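
        For comparison, a minimal sketch of the GHC side (module and value names made up): NOINLINE keeps a binding’s body out of its callers, a bit like -opaque keeps an OCaml module’s optimization information away from its clients, while -fno-code gives the type-check-only builds and -O0 switches the optimizer off.

        -- Compile with -O0 (or just type-check with -fno-code) for quick
        -- edit-compile cycles; the pragma below applies regardless of flags.
        module Secret (token) where

        -- NOINLINE: callers only ever see a reference to 'token', never a
        -- copy of its body, so editing the body normally does not force
        -- dependent modules to be recompiled.
        {-# NOINLINE token #-}
        token :: Int
        token = 42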

        so it is, quite understandably, not discussing the downsides much

        I wasn’t really asking about downsides so much as about hard limitations arising from certain patterns in polymorphic code that could apply to typeclasses and ML modules. It’s more of a theoretical issue than anything else; I wanted to see if OCaml hackers had figured anything out we could use, since it seemed to me that things like generative functors and row-type polymorphism would tickle this problem more frequently than Haskell code does.

        fares noticeably worse on all those metrics (the same point would apply to many other programming languages). Using the last released version (4.02), the compiler source code is less than 350 kilo-lines of code

        You sure about that? GHC hasn’t cracked 250kloc to my knowledge.

        The sources are actually rather readable, so you may want to give them a try.

        I doubt that is an efficient way to answer my question about inlining and how it interacts with modules so if there’s no documentation or papers on it, I’ll assume the limitations are similar to how typeclasses and existential quantification interact.
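
        Roughly the kind of thing I mean, as a toy Haskell example (mine, for illustration): once a class constraint is hidden behind an existential, the dictionary is only known at run time, so the optimizer can no longer specialize or inline through it.

        {-# LANGUAGE ExistentialQuantification #-}

        -- The Show dictionary is packed inside the constructor, so 'show'
        -- below is a genuine dictionary call that cannot be inlined away.
        data Showable = forall a. Show a => MkShowable a

        render :: Showable -> String
        render (MkShowable x) = show x

        -- Contrast: in 'show (3 :: Int)' the instance is known statically,
        -- so GHC can specialize and inline it.
        main :: IO ()
        main = putStrLn (render (MkShowable (3 :: Int)))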

        I don’t think my affection for OCaml always comes through in the jokes and pokes. I’ve enjoyed kicking it around and doing so has gotten me interested in SML as well. My coauthor (haskell book) is probably tired of me talking about SML and OCaml by this point.

        1. 3

          I doubt that is an efficient way to answer my question about inlining and how it interacts with modules so if there’s no documentation or papers on it, I’ll assume the limitations are similar to how typeclasses and existential quantification interact.

          Just to clarify, I’m not particularly happy with the under-documented state of flambda either, or think that the current answer is satisfying. I’m not sure which limitations related to existential types you are referring to, so I can’t comment further on that.

          Re. line counting: I’m not sure what’s a good way to count lines of code for projects. If I clone ghc’s git repository and run git ls-files *\.* | grep -v testsuite | grep -v docs | xargs wc -l, I get 544k lines as a result. The same request on OCaml’s current trunk gives 330k lines. Before flambda, 276k. I would agree the results are actually relatively comparable (2x is within the margin of error of the measuring technique).

          1. 1

            I usually use cloc. The number is higher than I remembered, but not that far out either:

            $ cd ghc
            $ cloc .
               11590 text files.
               10788 unique files.                                          
                4921 files ignored.
            
            http://cloc.sourceforge.net v 1.60  T=11.80 s (560.5 files/s, 54595.0 lines/s)
            --------------------------------------------------------------------------------
            Language                      files          blank        comment           code
            --------------------------------------------------------------------------------
            Haskell                        5993          87080         121903         318525
            C                               201           9187          11932          48794
            C/C++ Header                    193           3237           5570           9640
            yacc                              4            854             10           4275
            
      2. 9

        In my defense, I don’t think I’ve ever explicitly said anything against inlining in regards to Ocaml vs Haskell. I have spoken about the runtime and compilation being easier to understand and reason about, but that is mostly a result of laziness, IMO. I don’t know how much flambda affects one’s ability to understand what Ocaml code turns into on the machine and predict its performance. I’m excited for flambda, and hope it doesn’t make programs significantly harder to understand. I am excited that functors will become less expensive; I’m a fan of functors and use them quite a bit.

        Yes, parallelism/concurrency will probably be in the next release of Ocaml, which is pretty exciting, assuming they can keep the memory model sane. Refs: what a bummer when it comes to trying to add parallelism to a language.

        EDIT: By “next release of Ocaml” I mean the release after the one described in this post.

        1. 1

          Inlining, even if it makes some code faster, makes it harder to know how fast your code will be! HT @apy

          For many applications worst-case performance is far, far more important than average-case performance, and any average-case optimization that impedes your ability to measure worst-case performance is a liability. Look at all the fuss around Vulkan. I’ve experienced this directly with Scala, which does opportunistic TCO, so a seemingly innocuous change can dramatically change performance or even lead to stack overflows; I’d far prefer if Scala would only apply TCO to @tailrec methods.

          I’m not sure how you’d do the same with inlining, but I hope this FLambda has very simple, consistent rules so that it’s clear from the source what will or won’t be inlined, and easy to control whether inlining happens.

          1. 1

            For many applications worst-case performance is far, far more important than average-case performance, and any average-case optimization that impedes your ability to measure worst-case performance is a liability.

            I’ve never seen inlining do anything to worsen pessimal case performance in GHC Haskell. The optimizer is quite conservative.

            Look at all the fuss around Vulkan.

            What does this have to do with inlining?

            I’ve experienced this directly with Scala, which does opportunistic TCO, so a seemingly innocuous change can dramatically change performance or even lead to stack overflows;

            This is not at all comparable with inlining in Haskell or OCaml. I’m sorry your language implementation disappoints you.

            I’d far prefer if Scala would only apply TCO to @tailrec methods.

            ok

            I’m not sure how you’d do the same with inlining

            In GHC Haskell, because the optimizer is very conservative, we have INLINE and INLINABLE.

            , but I hope this FLambda has very simple, consistent rules so that it’s clear from the source what will or won’t be inlined, and easy to control whether inlining happens.

            I don’t think a mature optimizer that doesn’t do dumb things with inlining will be anything other than cost-based. If you make it naive because you want simple rules, you’ll just get pissed that it’s inlining or not-inlining in stupid ways.

            1. 1

              What does this have to do with inlining?

              It’s an example of how much value people place on being able to reason about performance.

              This is not at all comparable with inlining in Haskell or OCaml. I’m sorry your language implementation disappoints you.

              How can it be otherwise? If inlining happens automatically then surely there are cases near the margins where a small, seemingly innocuous change makes the difference between a function being inlined and not. And inlining presumably has a dramatic performance impact (otherwise why would you care about it?)

              In GHC Haskell, because the optimizer is very conservative, we have INLINE and INLINABLE.

              NOINLINE is more encouraging. But I’d think you would want to control it at the call site rather than only on the function.

              If you make it naive because you want simple rules, you’ll just get pissed that it’s inlining or not-inlining in stupid ways.

              Then make it explicit. Make the pragmata concise enough that it’s natural to write code where every potential-inline is marked explicitly one way or the other.

              1. 1

                But I’d think you would want to control it at the call site rather than only on the function.

                While INLINE says “please inline me”, INLINABLE says “feel free to inline me; use your discretion”. In other words, the choice is left to GHC, which uses the same rules as for pragma-free functions. Unlike INLINE, that decision is made at the call site.
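
                (Not part of the quoted documentation; just a small sketch of how that plays out in source, using GHC.Exts.inline for call-site control:)

                import GHC.Exts (inline)

                -- INLINE: strongly urges GHC to substitute the body at every call site.
                {-# INLINE addOne #-}
                addOne :: Int -> Int
                addOne x = x + 1

                -- INLINABLE: only guarantees the unfolding is kept in the interface
                -- file; each call site is judged by the usual heuristics.
                {-# INLINABLE double #-}
                double :: Int -> Int
                double x = x * 2

                -- Call-site control: 'inline' requests inlining for this one
                -- occurrence, provided the unfolding is available (INLINABLE ensures it).
                useThem :: Int -> Int
                useThem x = inline double (addOne x)

                main :: IO ()
                main = print (useThem 20)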