Some of his comparisons are really apples-to-oranges: obviously downloading a binary build of the JVM – or any runtime! – will be smaller and faster to get moving with than totalling up the source code, compiler, build dependencies, and compile times of another.
All of his comparisons dance around the real issue. Why do people complain that the JVM (particularly wrt clojure) feels heavy? To crib a line from James Carville: it’s the interactive performance, stupid.
irb starts imperceptibly fast. A bundle exec rails c with a spring preloader takes ~5 seconds on a moderately large Rails project (with several large engine dependencies). A simple lein repl outside a project directory, so it’s only loading the bare minimum, takes a slow count of 10 seconds.
The Clojure project has been really open about why just bringing up a simple REPL is as slow as it is (it boils down to ‘loading all of the separate .class files needed is slow’), but the impression it leaves people with is, unsurprisingly, that these tools are very heavy and sluggish.
That’s all true. But what’s a bit silly to me is that the JVM doesn’t have to be that way. I have a very small script I use, jn, that is basically nothing but a thin shim to java -Xms32m -Xmx32m -client -noverify that I use for little utility programs.
How well does it work?
benjamin@Reliant ~/s/jcli> time python -c 'print "Hello, world!"'
0.02 real 0.01 user 0.00 sys
benjamin@Reliant ~/s/jcli> time jn -cp build/classes/main Main
0.08 real 0.05 user 0.02 sys
So, slower than Python, but plenty fast enough for me to view it as “instant.” I know that Clojure’s a lot more complicated for tons of reasons, but there’s no reason you can’t have irb-esque startup times on the JVM.
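The Main class being timed above isn’t shown in the post; for parity with the Python one-liner, it presumably needn’t be anything more than this (an assumed stand-in, not the actual program):

```java
// Assumed stand-in for the Main class timed above; the post never shows
// it, so this is just the minimal Java equivalent of the Python one-liner.
public class Main {
    static String greeting() {
        return "Hello, world!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Compile once with `javac Main.java`, and the timing then measures nothing but JVM startup plus a single println.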
I totally get it, but at the same time this is the big problem I have with java. It’s crazy to me that you have to do this at all. And every time I deploy a java package, I know that some time in the near future I’ll have to dig in and learn precisely the correct GC / memory settings to even get the damn thing to run without OoM-ing every so often. Why is it like this, and why does it seem so accepted as okay?
Depends on your viewpoint, I guess. In Python, Go, and many other languages, you can’t tweak any of these things, so you risk having an OS-level OOM and/or fragmented memory if you have a leak. (Ruby actually now appears to have similar GC/VM flags.) Java instead defaults to crashing if you have a leak, but at least lets you choose not to with the Xmx and Xms flags. And the defaults are such that, while I understand you may’ve had a different experience, most normal Java programs can go with the default. (Which is a gigabyte, incidentally. If your program needs more than that, I think it’s reasonable to learn a bit about the GC. In C, I’d at least consider a custom malloc in a similar situation.)
In other words, for normal operation, the defaults are sane. For extreme situations, the knobs at least exist.
In this particular case: I am turning off bytecode verification (my code, my rules); turning off the JIT (it won’t run long enough to be worth it); and setting both the max and min heap to 32M, effectively avoiding any GC passes. Do you need to do that? No; on a recent JVM, my tweaks now appear to shave off a whopping .02 seconds. But at least those knobs are there if I want them.
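If you want to check what the heap flags actually gave you, the numbers are queryable from inside the program via java.lang.Runtime; this HeapInfo class is my own illustration, not something from the post:

```java
// With -Xms32m -Xmx32m, both numbers below come out near 32 MB (the JVM
// keeps a sliver of the heap for internal use, so don't expect exactly 32).
public class HeapInfo {
    static long mb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:   " + mb(rt.maxMemory()) + " MB");
        System.out.println("total heap: " + mb(rt.totalMemory()) + " MB");
    }
}
```

Run it as `java -Xms32m -Xmx32m HeapInfo` to see the effect of the flags, and with no flags to see your JVM’s defaults.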
(There’s actually a bunch of tuneable knobs for the Ruby VM & GC)
Oh neat! I had no idea that’d changed. I’ll fix my comment. Thanks for the correction.
setting Xmx == Xms means avoiding any GC passes
My gut reaction was to say “Be careful with that” - but you already are, aren’t you?
This is for command-line programs, not a server program. You don’t have heavy garbage cycles or weakref caches. Minecraft uses caches for loading data from disk, and setting Xmx == Xms is a recipe for disaster - as soon as you get near the max memory, the GC starts to thrash and it appears like the server is hanging (until all the clients time out, at which point it goes back to normal - because it doesn’t need as much memory anymore!).
So, yeah - that’s why they’re tuning flags.
Weird. Using jruby is exactly what convinced me to give up on the jvm. It was indeed faster than mri once it got going, but it was several long seconds of hot nothing before that happened.
Do people really care about disk space that much? That would be somewhere about 10th on my list of heaviness metrics. Among those metrics, there’s really only one which correlates to “feels heavy”: startup latency. Curiously, that’s the one metric this article chooses not to consider.
Do people really care about disk space that much?
No; it’s all about finding some statistic to back up your perceptions that have arisen from your gut.
My take is less about resources than how “heavy”/opinionated the JVM is about languages that use it. In Clojure this manifests as no tail recursion, an object system that looks a lot like Java’s, and a sparse standard library because you’re expected to use Java libraries. This sucks because Java libraries aren’t exactly known for good API design.
I prefer Common Lisp’s quirks to Clojure’s, largely because they feel less artificial, if that makes sense.
Does anyone using Go in production have any numbers on binary sizes for larger, team-sized projects? My personal projects have never been very large in code, even if the binaries are large by compiled standards, and the idea of having 500MB in dependencies seems like you’d have to be doing an awful lot, or have a crazy set of dependencies, especially when my usage of Go usually puts it around 10-20MB in binary size.
I have access to one of the oldest continuously developed Go code bases in existence outside of Google. The optimization server (from https://techcrunch.com/2013/07/24/walmart-labs-scoops-up-website-optimization-startup-torbit-to-help-it-keep-pace-with-amazon/) is a single Go binary clocking in at 17MB.
This app handles traffic for the 37th most popular site in the US.
Most of our other “microservices”-style binaries weigh in around 10MB. They generally do something like read from a queue, do some work, and put a result in object storage or another queue.
My code is at 14-15MB as well, pretty consistent across the binaries of two projects entirely created by me.
Looking over my bin folder again, I think that the dividing line is “does the program use net/http? if so, it’s >10MB, otherwise it’s <10MB.”
Clojure is, far and away, the most tolerable way of working with the JVM. It has an excellent community and it’s arguably one of the best dynamically-typed languages out there. I’d recommend it highly as a way of getting into Lisp, and it does a great job of being both a “prototype” and a “production” language – something that is not easy to achieve.
Scala, I would give mixed reviews at best; and Java is a C- language with a D- community…
My biggest issue with the JVM isn’t the ~2s startup time. Use C or Python if you’re writing command-line utilities where that matters. For a long-lived server, it’s not a problem. My problem with the JVM is that there’s a massive community of enterprise mediocrity (Spring/Hibernate/POJO). You might think that you’re immune to this if you use Clojure. Okay, you’ll have to look through a Java stack trace now-and-then, but that ain’t no thing. Unfortunately, most companies will see you then as a “Java developer” and have no qualms about staffing you on an all-Java, mostly maintenance, task. And now VibratorVisitorSingletonFactory patterns and “business requirements” are your problem, and you have accordingly less cognitive bandwidth for technical excellence.
I agree, clojure is the most tolerable way to jvm. It has some internal coherency, elegance, ease of use, and good tooling.
I’d agree with calling java a C language, but I strongly believe that the money Sun poured into the jvm makes the jvm an A runtime. Specifically the performance can get pretty good, the jit does some cool stuff, and everything is runtime hot swappable and instrumentable.
However, the java library and tooling ecosystem gets a D- from me. It’s literally the worst set of libraries and build platforms I’ve ever worked with. Looking at you, Ant and Maven. Java is also plagued by a legacy of bad, 10-layers-deep, inheritance-heavy designs and “patterns”.
I’d agree with calling java a C language
This gets my nomination for confusing phrase of the week. =)
However, I think Java-the-language’s grading is more complicated than that. The people designing it knew exactly what they were doing, and executed that goal incredibly well; however, their goal was to make a language even the least-talented, most poorly-trained schmuck could write functioning code in, and that goal is questionable, at the very least. (And we can see the downsides in the ecosystem, which I certainly wouldn’t give even a marginal passing grade. It’s an F- mess of nonsense and overcomplexity piled together in vast, incomprehensible edifices of thoroughly pointless code.) In terms of fitness for purpose, Java is definitely an A language, but taking that purpose into consideration drags it down significantly.
You must not follow American politics on Twitter. ; )
In all seriousness, I don’t think they did a very good job of making a language even the least talented, most poorly-trained schmuck could write in. I think they did a really good job of making something C++-ish but less confusing and without the memory leaks. I’ll give them that.
The Go developers did a far better job of creating a simple language anyone can write, although in fairness, they had Java and C# to learn from.
It was inspired by Oberon-2, which Pike used once upon a time, mixed with features of other languages. Anyone using Wirth-like languages will be unsurprised by Go except in its limitations: it’s possibly the first whose runtime won’t let you do things like write operating systems, either not easily or not at all.
My biggest issue with the JVM isn’t the ~2s startup time. Use C or Python if you’re writing command-line utilities where that matters. For a long-lived server, it’s not a problem.
It’s often really useful to be able to write command-line utilities which have access to the same libraries and code used in your long-lived server.
I write Clojure for a living, and I like it, and the startup time is an issue I can deal with, but it is an issue. And it’s also, in the real world, nowhere near two seconds; with leiningen, reasonable dependencies, etc., it’s an order of magnitude more than that. I just ran lein repl on my current small project, which is ~5kloc (half of which is tests) and it took 30 seconds to get to the REPL on a 2014 MBP with 16GB of RAM and all dependencies already cached locally. That adds up–it slows down deploys, it slows down CI runs, it slows down a ton of things.
Given that most consumers of the JVM run their code on servers they control, nearly all of them x86 Linux boxes (or at least they have control over what the application runs on), most of the functionality in the JVM does not add value but complexity. Sure, there are developer machines too, but the platforms they run on are very limited as well.
most of the functionality in the JVM does not add value
How do you figure that an advanced JIT compiler and the world’s fastest GC routines aren’t relevant just because the servers are x86 Linux boxes?
Because the JIT doesn’t have a huge performance impact when you already know what system you’re targeting. And you can get as good GC without the thick layer of bytecode.
Um… neither of these assertions is actually true?
JIT allows for optimizations that are impossible to make ahead of time because you don’t have enough data about which code paths are most common–off the top of my head https://en.wikipedia.org/wiki/Inline_caching#Megamorphic_inline_caching but there are loads more. LuaJIT was able to bring incredible performance improvements to Lua but was x86-only for a long time.
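To make the inline-caching point concrete, here’s a sketch of my own (not from the linked article) of the call-site shape involved: the virtual call in total() can’t be devirtualized ahead of time, because the full set of Shape implementations isn’t known until runtime, but a JIT that profiles the site and only ever sees Circle can inline area() and skip the dispatch entirely.

```java
// Illustrative only: the interface call below is what a profiling JIT
// specializes based on the receiver types it actually observes.
interface Shape {
    double area();
}

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class InlineCacheDemo {
    public static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape sh : shapes) {
            sum += sh.area(); // the virtual call site in question
        }
        return sum;
    }

    public static void main(String[] args) {
        // This site is monomorphic in practice: only Circle ever reaches
        // total() here, a fact the JIT can observe but an AOT compiler cannot.
        Shape[] circles = { new Circle(1), new Circle(2) };
        System.out.println(total(circles));
    }
}
```

An AOT compiler has to assume any Shape might show up; the JIT gets to bet on the common case and deoptimize if a Square ever appears.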
Of the top 5 most efficient GC implementations, 2 or 3 of them are on the JVM, where it’s not unusual to see hundreds of gigabytes collected in under 100ms. I’d be surprised if more than 1 of them is on a platform that doesn’t use bytecode, but whether a runtime uses bytecode or not is completely unrelated to its garbage collection performance.
Being able to do optimizations that aren’t possible ahead of time doesn’t necessarily translate into faster programs. To take one data point, Java does worse than C in all cases in the Computer Language Benchmarks Game.
Comparing Java to Ocaml, a language implementation with orders of magnitude less person-time put into it, Java does better than Ocaml but does worse in the longer running benchmarks.
Now, many of these benchmarks are short-lived, so you can argue this isn’t the sweet spot for Java; that’s fine, but I don’t have any other data points to call on. Anecdotally, my own experience running long-lived services in Java is that the JIT doesn’t improve performance relative to comparable AOT-compiled code, while the JVM is significantly more complex to operate.
You can also say that, compared to C, Java is much more high-level and safe, so it’s still doing pretty well. Sure. But, for me, I can take Ocaml, which is pretty comparable performance-wise and has a significantly simpler implementation and runtime, so I’d gladly take that. And, again, Ocaml is comparable in performance with orders of magnitude less person-effort put into it.
So if you have some performance numbers that show a clear benefit of the JIT, happy to see them, but as far as I can see the JIT is complexity without much win.