1. 8

    Every time I cross my fingers that @bcantrill will say “We’re moving from node to OCaml”. So close this time, SO CLOSE. It’s a great little language with a node.js-compatible concurrency model, and it could use some Joyent-style love around debuggability and DTrace.

    1. 7

      In case you want to know more on this front…

      One of the biggest stumbling blocks we (at Joyent) had when evaluating OCaml was the exception handling implementation. I managed to modify the runtime to abort() on an uncaught exception, but it seems like the only context in which we know it hasn’t been caught is one where we’ve already unwound the stack (since exceptions seem to be indicated by returning particular special values from each function). The OCaml-level backtrace is actually generated by putting a string into a global symbol every time an exception is thrown, just in case it ends up uncaught (which is why running with backtraces on incurs such a high performance penalty).

      We consider it pretty important to be able to get a core file after an uncaught exception, in the context of the throw, and always run with the ability to get a detailed backtrace. The OCaml compiler makes this pretty hard at the moment.

      Another thing we value pretty highly is being able to spelunk around in a core file to get information about a program’s state, and the OCaml compiler’s aggressive type erasure, with no debug info alongside to help understand what’s going on, is pretty limiting there as well. I saw there was a project to add DWARF type info generation to OCaml, but I couldn’t find much about it or what happened to it (I could have missed something there, sorry).

      These might sound to some people like small details, but it’s a big deal to us given how often we encounter bugs that simply can’t be reproduced from instructions in a different environment. You get one shot at figuring out these bugs, and that shot is in production, where you can’t just bring the service to a standstill to step through it. Aggressive use of abort() and post-mortem debugging is really the only good way. So not having them is a deal-breaker for a new language for us.

      1. 2

        Thanks for the response! And I understand the desire for debuggability. Out of curiosity, did you get any response from the community on why these things are not in OCaml and the possibility of getting them added?

        Also, I don’t quite understand one aspect of the uncaught exception thing. The async libraries I’m aware of always have a catch-all for exceptions when executing callbacks, so that the entire system does not get stopped by one stray exception. Does Node.js not do this? Can a single uncaught exception stop the entire program in Node? Or do you do a lot of sequential programming in JS where you don’t have these callback handlers? Or do you rip out any catch-all exception handlers so a stray exception can kill the system?
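
        In case a sketch helps, the catch-all pattern I mean looks roughly like this (a minimal sketch with illustrative names, not code from any particular library):

        ```javascript
        // Minimal sketch of an async library's catch-all: user callbacks are
        // wrapped in try/catch so one stray exception can't take down the
        // whole process. All names here are illustrative.
        function safeInvoke(callback) {
          try {
            callback();
          } catch (err) {
            // Swallowed and logged rather than fatal -- exactly the behavior
            // an abort-on-uncaught-exception policy would remove.
            console.error('callback threw:', err.message);
          }
        }

        safeInvoke(() => { throw new Error('stray exception'); });
        console.log('still running'); // reached, because the error was swallowed
        ```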

        Similarly, while I can understand the desire for abort-on-uncaught-exception, IME this issue isn’t really present in OCaml programs. The convention is that exceptions should not cross API boundaries, so they should be elevated to something in the type system (such as option or result). I don’t think I’m nearly as thorough as Joyent, and I can’t remember the last time I had a stray exception. But Joyent is doing more complicated work than I do, so that may or may not make one feel better. Not that this is the only issue with OCaml for you. But hey, happy to offer consulting hours to make OCaml Joyent-approved :)

        Do you know how Golang fits into the Joyent model (@bcantrill mentions it in his talk as an option)? It seems to be its own strange beast from what I understand but maybe it’s closer to workable.

        Thanks again for the response. I get great joy out of Joyent’s products and I feel a pang knowing they are written in JavaScript…and possibly Golang :) I’m not all negativity, though; I’m hoping Joyent does some cool stuff with Rust and I look forward to seeing it.

        1. 3

          Yes, in JS-land we actively go around ripping out domains and uncaught exception handlers from libraries we want to use (and sometimes end up running our own forks of them permanently as a result). We always run with --abort-on-uncaught-exception so that a single uncaught exception takes down the process. All our services are managed by SMF, which will restart them immediately after a crash. The core files are hoovered up automatically by thoth, which puts them in our object store, indexes them in a database, then automatically runs diagnostics to compare them to known bugs and identify key information about them.
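
          To sketch what that ripping-out amounts to (illustrative only; real libraries register their handlers in less obvious places):

          ```javascript
          // Sketch of stripping a catch-all so uncaught exceptions are fatal
          // again. Run node with --abort-on-uncaught-exception and the process
          // will abort() at the throw site, leaving a core file behind.
          process.on('uncaughtException', (err) => {
            // Pretend some library registered this to keep the process alive.
            console.error('swallowed:', err.message);
          });

          // The fix: make sure nothing intercepts the crash.
          process.removeAllListeners('uncaughtException');

          console.log(process.listenerCount('uncaughtException')); // prints 0
          ```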

          We haven’t interacted much with the OCaml community as yet, and honestly our use of it has been pretty casual just to try it out and see if we can make it work. I know the type system and the normal ways modules are used means that a lot of things that would be runtime errors in other languages are caught by the compiler, but it’s still not perfect, and we really value the ability to have assertions and panics that are debuggable in this way in any language.

          Golang really is its own strange beast, and we’re having a lot of internal back-and-forth about it. There are some people in the company who are super keen on using it and are pushing hard on trying to find the best ways we can do that. Sadly the OCaml fan club is quite a bit smaller. But nothing is set in stone – and as Bryan is fond of saying, our future is probably a more polyglot one, rather than trying to find one new language for everything, at least as it looks today.

          1. 1

            Thanks for the detailed and thoughtful response, looking forward to hearing more out of those of you that work at Joyent.

      2. 6

        Joyent getting behind OCaml would indeed be a welcome thing. You’ve ruined my day now that I’m thinking about it not happening.

        1. 3

          I’m surprised they’re not using Erlang/OTP. Isn’t BEAM, like, the most observable and debuggable platform ever?

        1. 3

          Thanks for writing this up! It’s been disappointing to see some anti-mitigation (“just don’t write bugs”) folks pick this paper up as justification for their positions, given how ridiculous the assumptions made in it seem to be. But I guess it is possible (maybe?) that the authors do have something more interesting here and have just written it up poorly – no way to tell without more details and code. The creed of “PoC or GTFO”, while crude, seems appropriate.

          1. 7

            I’m glad they’re looking at this. There are some issues, though:

            1. It’s partial ASR, not ASLR. There are technical differences between ASR and ASLR.
            2. Performance will be degraded due to using ASR instead of ASLR.
            3. Without PIE base randomization, AS[L]R won’t do much good.
            4. They’re not randomizing the top of the stack, only inserting a gap.
            5. I don’t know whether illumos supports a vDSO, but I didn’t see vDSO randomization in the patch (perhaps the vDSO doesn’t exist at all?)

            Regardless, having any form of randomization is better than none. And I’m sure this is just the start of the work. I hope they plan to support PIEs.

            1. 3

              For what it’s worth, this is definitely just the start of the work. This patch has been floating around for quite a while, and most of what you’ve listed here was brought up right back at the beginning – but like you say, we decided that as a first step, something is better than nothing. illumos still has quite a bit of exploit mitigation work to catch up on at the moment when put against other server operating systems of its class (heck, we only got stack cookies turned on late last year in the kernel and we’re still on gcc 4.4.4 so they’re not even in half the functions I want them to be in).

              PIE is on the cards. Rich Lowe said back at the beginning that he already had a dirty hacked-up prototype of it, but it probably needs a lot of work before it can go in.

              Regarding vDSO – we don’t use that trick currently, though Joyent’s SmartOS distro includes some patches that add it only for Linux emulation (in the “LX brand”). Specific enhancements here will likely have to be done before LX makes it into upstream illumos.

              And as for terminology… well, consider it aspirational – at least for me, I’d expect the PROC_SEC_ASLR flag to eventually turn on full ASLR once we have it. For now it’s doing the best it can (with the code it has) to fulfill what the user asked for. You could argue that’s misleading, and I’m not going to really disagree, but eventually I hope it won’t be. The bug title / commit message could have been clearer, but I hate bike-shedding those.

              1. 2

                Completely agreed, and it’s good to see that this is just the start. We at HardenedBSD are still playing catch-up. There’s a whole heck of a lot of work to bring the grsecurity patchset to HardenedBSD. And we’re a smaller team than illumos.

                Rich Lowe asked for my thoughts and suggestions; I sent them off in an email to him this morning. I gave some helpful suggestions on how to make the implementation better, while still keeping along the lines of ASR rather than ASLR.

              2. 1

                I think that many people consider ASLR as a generic term, applying to various forms of address map randomization. The Wikipedia page on ASLR doesn’t make a clear distinction, and I have yet to find a good, detailed comparison of the implementation in various operating systems, or a reasonable benchmark comparing implementations.

                1. 2

                  I guess we’ll just have to eternally agree to disagree. Here’s why I’m so insistent on ASR vs ASLR nomenclature (in some of the points, I’m going to use HardenedBSD as an example, because I’m most familiar with its ASLR implementation):

                  1. The dude who coined the term ASLR, pipacs, aka PaXTeam, has a few more years’ experience than probably both of us at exploit mitigations. I like to learn from the experiences of others. He believes there’s enough technical difference between ASR and ASLR to merit the separate names.
                  2. Crypto is expensive. Pulling from the entropy pool is expensive. FreeBSD’s entropy pool doesn’t block, but pulling from the entropy pool is still expensive regardless.
                  3. HardenedBSD pulls from the entropy pool at most five times (PIE base, stack, mmap(!MAPFIXED) mappings, VDSO, and, if supported, for MAP32BIT). For systems that don’t support MAP32BIT, HardenedBSD’s ASLR will only pull from the entropy pool four times. This is done at execve time, not at other points in the application’s lifecycle. By contrast, ASR pulls from the entropy pool throughout the lifecycle of the application. Firefox on my system, for example, loads 122 shared objects, which means ASR will pull entropy at least 122 times at run time. We must conclude, therefore, that pulling from the entropy pool five times during execve is less expensive than continuously pulling from it during the lifecycle of the application. 5 < 122.

                  Now, I’m not a performance engineer, but it doesn’t take one to know that doing expensive crypto operations fewer times means better performance. Unless there’s some magic that says the opposite. I think by performance alone, there’s merit to show distinction between ASR and ASLR. Add to that not using deltas.
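
                  The counting argument can be made concrete with a toy sketch (the region names and the 122 figure are from the Firefox example above; this models nothing about real kernel behavior):

                  ```javascript
                  // Toy model of the entropy-pull counting argument. ASLR-style:
                  // one pull per randomized region, all at execve time. ASR-style:
                  // at least one pull per mapping over the process lifetime.
                  const aslrRegions = ['PIE base', 'stack', 'mmap base', 'VDSO', 'MAP_32BIT'];
                  const aslrPulls = aslrRegions.length;  // 5 pulls, all at execve

                  const sharedObjects = 122;             // shared objects Firefox loads
                  const asrPulls = sharedObjects;        // at least one pull per mapping

                  console.log(`${aslrPulls} < ${asrPulls}:`, aslrPulls < asrPulls); // "5 < 122: true"
                  ```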

                  Side-note: lobsters is misformatting my comments when using underscores in the right spots for MAPFIXED and MAP32BIT, so I’ve left them out. But I’m sure it’s clear where those underscores are supposed to be.

                  1. 1

                    I think you may be missing my point. I’m not really disagreeing with you and certainly there are technical differences between the two approaches.

                    I’m simply saying that:

                    1. Like it or not, ASLR is already in common use as a generic term to refer to a number of different approaches to address randomization.
                    2. Absent benchmarks we’re left with hand-wavy arguments about one approach being “more expensive,” without any indication of how much more, or if it’s enough to make a significant difference in application performance.

                    The final sentence in your first comment is the important part: getting this change into illumos is much more important than whether the commit calls it ASR or ASLR, and whether or not it addresses all of the related exploit mitigations.

                    1. 1

                      I understand. English isn’t a dead language and meanings of words are subject to change. But, being an engineer, I prefer a distinction made between the two to easily tell the user what’s going on under the hood. If a user educated in security sees that an operating system uses ASR instead of ASLR, said user instantly knows the ramifications. If the technical implementation of ASLR is actually ASR under the hood, yet is marketed as ASLR, there is a lot of ambiguity. The user may assume deltas are being used.

                      Consider a developer writing an application that needs to make careful use of the virtual memory subsystem (maybe qemu or wine could be good examples). If the operating system claims ASLR when really it’s ASR, the developer might be confused when certain tasks within the application misbehave.

                      There’s nothing wrong with being extra clear about an implementation. Correctly stating the implementation is ASR benefits everyone in the long term.