1. 32
  1. 17

    As I read this I imagined that JGit was going to be 10x slower than C Git, so I was rather surprised at the end when the author gave (presumably slow) examples where the difference was only 2x. Now I wonder which codebase is easier to understand / modify for an average developer (moderately experienced with the respective language).

    1. 9

      It’s been 12 years since that 2009 post - I wonder how much the gap has closed with improvements to the JVM?

      1. 3

        Since then the cost of indirect memory access has got only worse relative to CPU speed. Has Java landed Value Types yet? They’re the key piece missing.
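
        To make the indirection cost concrete, here is a minimal Java sketch of the two layouts. The class and method names are made up for illustration, and it shows only the shape of the memory accesses, not a benchmark: an array of objects is an array of references, while the common workaround today is to split the fields into flat primitive arrays by hand.

        ```java
        // Hypothetical illustration: the same data as an array of objects
        // versus two flat primitive arrays.
        final class PointLayouts {
            // Each Point is its own heap object; a Point[] only holds references.
            static final class Point {
                final double x, y;
                Point(double x, double y) { this.x = x; this.y = y; }
            }

            // Sums x + y by following one reference per element.
            static double sumObjects(Point[] points) {
                double total = 0;
                for (Point p : points) total += p.x + p.y;
                return total;
            }

            // Sums the same values from two contiguous primitive arrays.
            static double sumFlat(double[] xs, double[] ys) {
                double total = 0;
                for (int i = 0; i < xs.length; i++) total += xs[i] + ys[i];
                return total;
            }

            public static void main(String[] args) {
                int n = 1_000;
                Point[] points = new Point[n];
                double[] xs = new double[n];
                double[] ys = new double[n];
                for (int i = 0; i < n; i++) {
                    points[i] = new Point(i, 2 * i);
                    xs[i] = i;
                    ys[i] = 2 * i;
                }
                // Same arithmetic in the same order, so the results match.
                System.out.println(sumObjects(points) == sumFlat(xs, ys));
            }
        }
        ```

        Value types would, in principle, let the first form get the memory layout of the second without the manual split.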

        1. 2

          They haven’t landed yet, but once they do it’s going to close a lot of the gap between JVM JIT and native code for sure.

      2. 6

        Isn’t the biggest problem with JGit the JVM startup time? Booting up all that support infrastructure of course takes longer than in barebones C land.

        1. 2

          Doesn’t really matter if it’s being hosted in a persistent process (i.e. an IDE).

        2. 4

          A pretty good rundown of the various architectural mistakes the JVM makes. It just wasn’t necessarily clear in 1995 that these were mistakes, though the detractors will say it was obvious all along. By the time it became clear Java was already being hyped to the moon and nobody was going to make a breaking change.

          1. 6

            Remember Java was originally intended for embedded programming, but basically all these decisions made it unsuitable for that purpose.

            1. 15

              Remember the starting point. The JVM was based on ideas from the Smalltalk VM, with some of the most difficult bits to optimise refined:

              • Smalltalk had no intraprocedural flow control, everything was a message send (with some ‘primitive methods’ that had direct dispatch to help this a bit), Java added loops and branches.
              • Smalltalk had no primitive integers, everything was an object with boxing everywhere (and some optimisations to embed 31-bit integers in pointers). Java added unboxed primitives, without any of the dynamic checks.
              • Smalltalk had duck typing, Java uses explicit nominal types and interfaces that allow vtable-based dispatch.
              • Smalltalk had arbitrary reflection, Java doesn’t allow classes to be modified after construction, which means that you need a lot less deoptimisation with a JIT.

              .NET learned more from Java and added:

              • Value types, which avoid heap allocation for a bunch of things (though with some interesting concurrency issues).
              • Unsigned types. These were harder to do portably in the past (C doesn’t define signed integer overflow for the same reason).
              • Non-virtual dispatch by default, so ahead-of-time compilation can do inlining / direct dispatch.

              1. 15

                You’re correct in broad strokes, but I’m either misunderstanding what you mean, or you’re slightly off on a few things.

                At the time, Smalltalk bytecode had the same looping and conditional capabilities Java did, unless I’m missing something very fundamental. If you mean that Smalltalk had e.g. #ifTrue:ifFalse: with blocks, whereas Java had language-level if/else, that’s correct by language spec, but not by implementation: all Smalltalk VMs by the 90s that I’m aware of, including at least Gemstone, VisualAge, Squeak, and VisualWorks, special-cased key methods in the VM (e.g., #ifTrue:ifFalse:, #whileTrue:, #do:, etc.), effectively giving them the same performance as Java. The commercial VMs all had polymorphic inline caching (PIC) as well, which largely eliminated the duck-typing and dynamic dispatch hit. Combined, these made Smalltalks’ overall performance in the mid-90s significantly better than the JVMs available.
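
                For anyone who hasn’t seen a PIC, here is a toy sketch of the idea in Java. It is not how Gemstone, VisualWorks, or any real VM implements it, and every name in it is hypothetical: each call site remembers the targets for the few receiver classes it has actually seen, so the full method lookup only runs on a miss.

                ```java
                import java.util.ArrayList;
                import java.util.HashMap;
                import java.util.List;
                import java.util.Map;
                import java.util.function.Function;

                // Hypothetical sketch of a polymorphic inline cache.
                final class InlineCacheSketch {
                    // Stand-in for the slow path: per-class method tables keyed by selector.
                    static final Map<Class<?>, Map<String, Function<Object, Object>>> METHODS = new HashMap<>();
                    static int slowLookups = 0;

                    static void register(Class<?> cls, String selector, Function<Object, Object> impl) {
                        METHODS.computeIfAbsent(cls, k -> new HashMap<>()).put(selector, impl);
                    }

                    static {
                        register(String.class, "size", r -> ((String) r).length());
                        register(StringBuilder.class, "size", r -> ((StringBuilder) r).length());
                    }

                    static Function<Object, Object> slowLookup(Class<?> cls, String selector) {
                        slowLookups++;
                        Map<String, Function<Object, Object>> table = METHODS.get(cls);
                        Function<Object, Object> target = (table == null) ? null : table.get(selector);
                        if (target == null) throw new IllegalArgumentException("doesNotUnderstand: " + selector);
                        return target;
                    }

                    // One instance per call site; caches targets for receiver classes seen there.
                    static final class CallSite {
                        private static final int MAX_ENTRIES = 4;
                        private final String selector;
                        private final List<Class<?>> classes = new ArrayList<>();
                        private final List<Function<Object, Object>> targets = new ArrayList<>();

                        CallSite(String selector) { this.selector = selector; }

                        Object send(Object receiver) {
                            Class<?> cls = receiver.getClass();
                            for (int i = 0; i < classes.size(); i++) {
                                if (classes.get(i) == cls) return targets.get(i).apply(receiver); // hit
                            }
                            Function<Object, Object> target = slowLookup(cls, selector);          // miss
                            if (classes.size() < MAX_ENTRIES) {
                                classes.add(cls);
                                targets.add(target);
                            }
                            return target.apply(receiver);
                        }
                    }

                    public static void main(String[] args) {
                        CallSite site = new CallSite("size");
                        Object[] receivers = { "abc", "hello", new StringBuilder("xy"), "z" };
                        for (Object r : receivers) System.out.println(site.send(r));
                        System.out.println("slow lookups: " + slowLookups);
                    }
                }
                ```

                A real VM patches the check into the generated machine code rather than walking a list, but the hit/miss structure is the same.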

                Likewise, using tagged pointers for arithmetic was sufficiently commonplace that Smalltalk performed basically on par with Java for numeric calculations until you had numbers bigger than 2^31. Paradoxically, Java’s need to box primitives as soon as they were in any structure other than a native array often meant that my Smalltalk code at the time outperformed Java, since the Smalltalk code wouldn’t be constantly boxing and unboxing, but Java often would. (And it’s worth noting too, in an embedded context, that Smalltalk did carry around WordArray, ByteArray, etc. for those times when you truly needed to work with machine values.)
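
                A rough sketch of the tagging trick, written in Java only for illustration. The class name and the tag layout are assumptions (low bit 1 for a small integer, low bit 0 for an object pointer); real VMs differ in the details. Note that with a one-bit tag in a 32-bit word the signed range ends at 2^30 - 1, and the exact cutoff depends on the word size and tag scheme.

                ```java
                // Hypothetical sketch of 31-bit tagged integers in a 32-bit word.
                // Assumption: low bit 1 marks a small integer, low bit 0 would mark a pointer.
                final class TaggedInt {
                    static final int MIN_VALUE = -(1 << 30);     // smallest encodable value
                    static final int MAX_VALUE = (1 << 30) - 1;  // largest encodable value

                    static boolean fits(long value) {
                        return value >= MIN_VALUE && value <= MAX_VALUE;
                    }

                    // Shifts the value left one bit and sets the tag bit.
                    static int encode(int value) {
                        if (!fits(value)) throw new ArithmeticException("needs a boxed large integer");
                        return (value << 1) | 1;
                    }

                    static boolean isSmallInt(int word) {
                        return (word & 1) == 1;
                    }

                    // Arithmetic right shift drops the tag and keeps the sign.
                    static int decode(int word) {
                        return word >> 1;
                    }

                    // Adds two tagged words without allocating; throws where a VM would
                    // promote the result to a heap-allocated large integer.
                    static int add(int a, int b) {
                        long sum = (long) decode(a) + decode(b);
                        if (!fits(sum)) throw new ArithmeticException("overflow: promote to large integer");
                        return encode((int) sum);
                    }

                    public static void main(String[] args) {
                        int a = encode(40);
                        int b = encode(2);
                        System.out.println(decode(add(a, b)));
                    }
                }
                ```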

                Given that Smalltalk was designed in the late 70s, and Java in the early 90s, I think it’s fair to ding Java either for not learning enough from Smalltalk to target the embedded space, or for failing to deliver the features embedded work really needed. You are right that Java did make some design decisions that ultimately allowed better optimization than Smalltalk permitted (unless you went the StrongTalk route and kneecapped some language features for speed), but I think that ended up being more luck than intention.

                1. 4

                  Even Xerox’s original ST80 compiler inlined control-flow messages. That was definitely part of the original design.

                  1. 5

                    I honestly was 99% sure of that, but I only have the purple book in my office, not the blue book, and didn’t trust my memory. Thank you for confirming.

                2. 3

                  One more thing on the last list which I think is important: Real generics, which the JIT may monomorphize if it feels like it. This again reduces the amount of heap allocation and allows compilation to do more direct dispatch.

                  1. 2

                    Fair. Put like that there’s a whiff of business types deciding to yeet a product out before it was good. Which on one level worked splendidly but on the other was a missed opportunity (Sun Microsystems are no more), or perhaps a misplaced bet on monetizing web technology.

                    1. 9

                      Remember that .NET had over five years of large-scale Java deployment to learn from before shipping the first version of the CLR. Java was a pretty good design for its time (not including JNI until 1.1 was a big oversight and it took until 1.2 to iron out some of the inconsistent naming in the standard library).

                      1. 3

                        .NET was also willing to make generics not backwards compatible, so they could be part of the VM’s type system, which allowed for actual type specialization, removing the need for gratuitous boxing and type checks, and reduced memory usage.
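
                        For contrast, a small Java sketch of what erased generics look like from the inside (the class name is made up): the type argument is gone at runtime, and every element of a List<Integer> is a reference to a boxed Integer.

                        ```java
                        import java.util.ArrayList;
                        import java.util.List;

                        // Illustration of erased generics in Java.
                        final class ErasureDemo {
                            // Both lists share one runtime class because type arguments are erased.
                            static boolean sameRuntimeClass() {
                                List<Integer> ints = new ArrayList<>();
                                List<String> strings = new ArrayList<>();
                                return ints.getClass() == strings.getClass();
                            }

                            // The element is stored as a reference to an Integer, not as a raw int.
                            static boolean elementsAreBoxed() {
                                List<Integer> ints = new ArrayList<>();
                                ints.add(1_000_000);        // autoboxing: int -> Integer
                                Object first = ints.get(0);
                                return first instanceof Integer;
                            }

                            // Summing has to unbox every element again.
                            static long sum(List<Integer> values) {
                                long total = 0;
                                for (int v : values) total += v;
                                return total;
                            }

                            public static void main(String[] args) {
                                System.out.println(sameRuntimeClass());
                                System.out.println(elementsAreBoxed());
                            }
                        }
                        ```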

                        That said when I did the porting of the original rotor/gyro patch to PPC the codegen was .. not good :D

              2. 3

                There are many locations in JGit where we really need “unsigned int32_t” or “unsigned long” (largest machine word available)

                CHICKEN does the latter as well. This always seemed a bit roundabout to me, and AFAIK it’s not really specified that long must be the largest native machine word. Makes me wonder how much more efficient a language that did offer such things more explicitly would be.
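
              For what it’s worth, Java 8 (well after the 2009 post) added static helpers on Integer and Long that reinterpret the bits of a signed value as unsigned. A small sketch, with a made-up class name, and no claim that JGit uses any of this:

              ```java
              // Sketch: treating a Java int as an unsigned 32-bit value (Java 8+ helpers).
              final class UnsignedDemo {
                  // Widens to long using the standard library helper.
                  static long asUnsigned(int bits) {
                      return Integer.toUnsignedLong(bits);
                  }

                  // The pre-Java-8 idiom: mask after widening to long.
                  static long asUnsignedMask(int bits) {
                      return bits & 0xFFFFFFFFL;
                  }

                  // Compares two ints as if they were unsigned.
                  static boolean lessThanUnsigned(int a, int b) {
                      return Integer.compareUnsigned(a, b) < 0;
                  }

                  public static void main(String[] args) {
                      int big = 0xFFFFFFF0;                                  // -16 as a signed int
                      System.out.println(big);
                      System.out.println(asUnsigned(big));
                      System.out.println(Integer.toUnsignedString(big));
                      System.out.println(Integer.divideUnsigned(big, 16));
                      System.out.println(lessThanUnsigned(1, big));
                  }
              }
              ```

              The values still live in signed int fields, so it is on the programmer to remember which ones are meant to be unsigned.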