This is a good survey talk, a good explanation of how to do benchmarking, especially for the JVM. I think she avoided making it too runtime-specific, which is probably a good tack for this kind of talk.
On the JVM, I’ve found that JMH works quite well. There’s a little more support in general for Caliper, but JMH is catching up, and it’s quite good at forcing you to write good benchmarks. Aleksey Shipilev (admittedly biased) gives a good explanation of what JMH learned from the mistakes of its predecessors in the “JMH vs Caliper” thread on the mechanical-sympathy list. Caliper also doesn’t seem to be actively maintained in the open (the last OSS change was January), so even though Caliper used to be the industry standard, I think it generally makes more sense to use JMH now.
She gestured only generally at what the problem at 128K keys was: caching was probably no longer effective, since she wasn’t trying to take advantage of cache locality. I wonder if the problem was that Riak started swapping. Presumably that would have been ameliorated by M3 instances using local SSDs rather than EBS for persistent storage, but it could still be quite painful given the 50% random reads. It should still look broadly like a cache-hierarchy slowdown (persistent storage can be thought of as just another level of the cache hierarchy), but the constant-factor penalty for going to persistent storage is much higher than for RAM. It would be interesting to hear from one of the Riak folks what they think was going on. (I think a few of you are here–@cmeiklejohn?)
I wonder if the problem was that Riak started swapping.
The maximum number of keys, 1,024,000, gives us ((22 + 4) bytes of overhead + 4 bytes of key + 10,000 bytes of value) × 1,024,000 keys × (3 replicas / 5 machines) ≈ 5877 MB of memory used per machine, which is smaller than the 8 GB of RAM. Even if we take into account the expected number of machines touched per operation (to include temporary copies on the coordinators), that still only gives us about 6600 MB. So from this I think we can safely conclude that we were not swapping.
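Spelling that arithmetic out (a back-of-the-envelope sketch; the per-key overhead, value size, replication factor, and machine count are the numbers from above):

```python
# Back-of-the-envelope memory estimate for the benchmark cluster,
# using the figures from the comment above.
keys = 1_024_000
bytes_per_key = (22 + 4) + 4 + 10_000   # keydir overhead + key + value
replicas = 3
machines = 5

per_machine_mb = keys * bytes_per_key * replicas / machines / 2**20
print(round(per_machine_mb))  # -> 5877, well under the 8 GB of RAM
```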
Notice the scale on the graph: it is not normalized to zero, and I observed an increase of only 200 µs from 4K to 1,024K keys.
I used bitcask in this benchmark, so I’m inclined to think that this was probably due to the block cache churn.
What are you folks doing to actively combat this? Do any of you factor “cultural fit” into your hiring process? If you do, how do you quantify that and attempt to account for your social biases? Everyone has such biases. I only bring up cultural fit when discussing candidates if I think they may have attitude issues. Even that’s fraught with potential bias.
I think “cultural fit” is very commonly used as a socially acceptable euphemism for bigotry. What do you think?
I’ve found that bringing in at least one interviewer from the same group as the candidate makes the candidate feel more comfortable during the interview. The candidate is also more likely to seriously consider working for the company if they feel like their identity group is represented. This, of course, implies that the company already has enough diversity to make this possible, and someone from each of the underrepresented groups had to be first.
Then hire the best.
The problem is that our monkey brains are extremely good at identifying the person who looks most like ourselves as “the best candidate,” even when that’s not really true. That’s the nature of unconscious bias.
I heard about an interesting approach from a friend who went through an interview: the very first phone screen was set up so that the engineering team never heard his introduction (“Hi, I’m ‘foo’, I’ve worked on software for x years,” etc.) and only joined during the technical portion of the call to listen and discuss technical questions.
At most, all they knew was that he was a man; they didn’t know his age, race, etc. He talked to them about it during a later part of his interview, and while they admitted it wasn’t a perfect setup, they did find they were bringing more diverse candidates through to later rounds of the interview process.
I’m not sure we can ever get a fully double-blind interview setup, but I’d be really interested to see what candidates who go through that process would look like.
Yeah, we all have a lot of unintentional biases. Techniques like that, and hiding the candidate’s name on the resume, would help. You have to start somewhere, though, and not everyone has the clout to change the entire interview process at their company.
Interesting paper. I think the model is missing one more aspect that causes a net negative effect: the production of additional maintenance overhead. If a programmer writes a lot of code quickly but the quality suffers, this may make the team less productive overall due to the increased technical debt. It also makes other team members less productive if they have to catch and point out the issues over and over during code review.
I think it’s implied in the fourth paragraph:
This negative production does not merely apply to extreme cases. In a team of ten, expect as many as three people to have a defect rate high enough to make them NNPPs. … If you are unfortunate enough to work on a high-defect project (density of from thirty to sixty defects per thousand lines of executable code), then fully half of your team may be NNPPs.
I think a valuable takeaway for those interested in writing their own compiler is to use appropriate tools to get to the end goal as quickly as possible. Lexing, parsing, and code generation are all very interesting and challenging problems in their own right. However, when writing your first compiler, none of these stages should slow you down. That way you can focus on the design of the intermediate representation and on optimizations, which IMO is where you gain a better understanding of compilers.
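As a concrete illustration of leaning on existing tools for the front end (my own example, not from the article, using Python’s stdlib ast module as the borrowed lexer/parser), here’s a toy pass where all the effort goes into the transformation rather than into lexing or parsing:

```python
import ast

# A toy compiler pass: constant-fold binary arithmetic in an expression,
# reusing Python's own lexer and parser (the ast module) so we can get
# straight to working on the tree.
class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.copy_location(
                    ast.Constant(node.left.value + node.right.value), node)
            if isinstance(node.op, ast.Mult):
                return ast.copy_location(
                    ast.Constant(node.left.value * node.right.value), node)
        return node

tree = ast.parse("2 * 3 + x", mode="eval")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))  # -> 6 + x
```

The same idea applies to parser generators or existing language front ends: borrow the boring stages, spend your time on the interesting ones.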
It’s great to learn about the collaboration that takes place between industry and academia and ways we can foster and improve these relationships.
Is your company/team actively working with researchers (internal or external)?