Today’s interview is with Lobsters user Moses Nakamura.

Introduce yourself, describe what you do for work and how long you’ve been at it.

Howdy, I’m Moses Nakamura.

I’m a software engineer at Twitter on the Core Systems Libraries team. We own the libraries that the real-time services powering Twitter are built on: util, finagle, scrooge, ostrich, twitter-server (all OSS). I’m taking this time to learn deeply about how the JVM works, how service-oriented architectures fit together, how to build and debug distributed systems, and API design. I’ve been at Twitter for almost nine months, and I still feel like I’m learning a ton.

Because we own the core libraries that a huge portion of Twitter relies on, we get a lot of support requests, so probably 20% of my time is devoted to bug hunting in other people’s services. That can be frustrating, but it’s also incredibly valuable user research.

Before I was at Twitter, I was at Tumblr, where we used Twitter’s stack as consumers. Back then, I’d occasionally hack on finagle on the weekends, and I jumped at the opportunity when I heard they were hiring. It means I don’t have as many weekend projects anymore, since most of them were finagle-related, but now I get to do the stuff I’m really excited about full time.

When I was in high school, I got my first programming gig in a neuroscience lab. I didn’t know anything about programming or neuroscience, but the consensus was that it would be easier to teach me how to program than neuroscience. I wrote a driver for a robotic arm for a rat that was hooked up to a sensor in its brain, which got me interested in brain-machine interfaces. I hacked a little more in high school, but it didn’t really go anywhere until I was in college (Columbia University, CC, 2009 -> 2012) and I took a proper look at what was going on in brain-machine interfaces. Everything was incredibly primitive, and I realized that I wanted to work in brain-machine interfaces in 50 years, not for the next 50 years, so I jumped ship to computer science. Software engineering has been intellectually satisfying, so I’m quite happy with how this all turned out.

What is your work/computing environment like?

I use a MacBook Pro with two monitors, the standard Apple keyboard, a trackpad, and a mouse. I spend half of my time in New York City and half of my time in San Francisco (three months at a time), so I’m not too picky about my gear. It’s entirely possible that I’ll have to get all new stuff when I show up in a city and my gear has been repurposed, so it doesn’t make sense to get too attached.

What software are you most often using?

Probably Chrome. For actually creating software, I spend pretty much all of my time in emacs. I use emacs buffers as an ersatz tmux, and run ansi-term + zsh in one window, with code in most others. I’ve recently started using org-mode a little, which is fun, but I haven’t gotten into it as obsessively as some org-moders do. We use an internal build tool called pants, which is based on Google’s Blaze, and I’ll often have several ansi-terms devoted just to that.

What’s an interesting project you’ve been working on recently?

Finagle is adding a feature we’ve been calling GC avoidance, although it might be more accurate to call it GC leasing. The premise is that we can predict with very high accuracy when the JVM is going to trigger a minor garbage collection, so we should exploit this to warn clients not to send requests to servers just before they GC. All of Twitter’s real-time services are built on top of the JVM, so our long-tail latency is often tied to garbage-collecting servers. Adding a leasing API will provide a knob between throughput and long-tail latency that we can tune as we like.
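
To make the leasing idea concrete, here’s a rough sketch in Scala of what issuing leases against a predicted GC might look like. This is my own illustration, not Finagle’s actual API: GcPredictor and LeaseIssuer are hypothetical names.

    import com.twitter.util.Duration

    // Hypothetical: something that can estimate how long until the JVM's
    // next minor collection.
    trait GcPredictor {
      def timeToNextGc(): Duration
    }

    // Grants leases that expire shortly before the predicted GC, leaving
    // `margin` of slack for in-flight requests to drain before the pause.
    class LeaseIssuer(predictor: GcPredictor, margin: Duration) {
      def currentLease(): Duration =
        (predictor.timeToNextGc() - margin).max(Duration.Zero)
    }

A client that sees its lease run out would stop sending new requests to that server until a fresh lease arrives, which is roughly the throughput/latency knob described above.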

All of this will be built on top of mux, a session-layer protocol that CSL came up with a little before I joined; the canonical implementation is thrift-mux. It supports multiplexing, active liveness detection, and a leasing API, among other features. We’ve already run several experiments that seem to show good results, so I’m getting GC avoidance into a state where it can be turned on on demand for a few services in production, so we can iterate faster. So far I’ve churned out primitives for the GC avoidance internals, but I’m still working on writing the server-side part of the leasing, and the meat of the GC estimation algorithm (which will mirror the way the JVM actually behaves).
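
The estimation side can be approximated even from outside the VM: a minor collection fires when the eden space fills, so watching eden occupancy against a measured allocation rate gives a projection of when the next one will hit. Here’s a rough sketch of that idea using the standard JVM management beans; again, this is my own illustration, not the actual Finagle internals.

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    object EdenEstimator {
      // Find the eden pool ("PS Eden Space", "Par Eden Space", etc.).
      private val eden = ManagementFactory.getMemoryPoolMXBeans.asScala
        .find(_.getName.contains("Eden"))
        .getOrElse(sys.error("no eden pool found"))

      // Project seconds until eden fills, given a measured allocation rate.
      def secondsToNextMinorGc(allocBytesPerSec: Double): Double = {
        val usage = eden.getUsage
        (usage.getMax - usage.getUsed).toDouble / allocBytesPerSec
      }
    }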

What is something new you’ve used or integrated into your work that has made a positive impact?

org-mode is pretty good. Other than that, my workflow has been mostly the same for a while.

How did you get started with finagle, and what got you interested in it?

We were building online services at Tumblr that looked an awful lot like the ones at Twitter, so finagle was great for solving the problems we had. For example, the first system I built at Tumblr was a zipkin deployment backed by redis. Zipkin is a finagle service, so I became familiar with the inner workings of finagle services, and was able to learn a bunch about finagle from the zipkin engineers, who were helping me build out the redis backend.

The engineer at Tumblr who turned us all on to finagle had written the finagle-redis library, and as we ran into bugs, he’d encourage us to fix them. I progressively took on more and more bugs, and started reading and contributing to other open source projects at Twitter. I kept coming back to finagle, though, because it focused on a problem that really excited me: “How do you make it trivial to write a robust online distributed system?”

One of the cool projects I got to work on at Tumblr was rewriting our bootstrapping script with the latest best practices for the tools we used, and revamping our standard library for building online services. Writing it forced me to think concretely about the right way to get a project up and running for us, and about which bells and whistles were the right ones. Around the same time, Twitter released twitter-server, which solves a similar problem for Twitter.

A little after the rewrite, I helped onboard a new engineer, and he was able to have a simple finagle service up and running within hours. It was an incredible thrill to help him achieve that, and I knew that I wanted to keep on doing more of it.
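
For a sense of how little code that takes, here’s roughly what a minimal finagle HTTP service looks like (API details vary by finagle version, so treat this as a sketch rather than a definitive example):

    import com.twitter.finagle.{Http, Service}
    import com.twitter.finagle.http.{Request, Response}
    import com.twitter.util.{Await, Future}

    object HelloServer extends App {
      // A Service is just an asynchronous function from Request to Response.
      val hello = new Service[Request, Response] {
        def apply(req: Request): Future[Response] = {
          val rep = Response()
          rep.contentString = "hello, world\n"
          Future.value(rep)
        }
      }

      // Bind the service to a port and block until shutdown.
      Await.ready(Http.serve(":8080", hello))
    }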

      Hi Moses! I have another question for you, if you don’t mind:

      You mentioned that you used to hack on finagle on the weekends. Since finagle is a building block of very-large-scale distributed systems, I’m curious what you were running it on for your weekend hacking. Were you spinning up cloud instances, or just working on things that could be tested in a desktop cluster?

        Hi Mike! Most of the things I wanted to work on were improvements to the library itself, so I didn’t end up spinning up entire topologies. Sometimes it was just things like adding commands to the redis library, which only required spinning up a local redis and checking correctness. Those things only required a fan powerful enough to cool down your CPU while it ran scalac.

      Random fact: .NET provides advance warning of GCs so that you can coordinate with a load balancer to ensure low latency.

        Neat! Twitter has a VM team, so we might add that to our JDK. It seems like a potentially hard sell to get it back into OpenJDK, though, so even if we could take advantage of it, it’s unlikely that open source consumers would be able to.