1. 3

    I think this post sort of belabors the point: small changelists are better than large ones, and it’s easier to manage small PRs if you have better tooling for it.

    I disagree with the “one local commit is one diff” argument, and although my company uses Phabricator, we don’t use it that way. Sometimes a commit is only part of an idea, or doesn’t include tests, and it can still make sense as a local commit. With that said, it probably won’t make sense as a commit on master, and you should take advantage of the diff to collate the different pieces that will eventually become a single commit.

    1. 4

      Camille Fournier has a really excellent book on exactly this topic, called The Manager’s Path. I strongly recommend it.

      1. 3

        I love that book. Ordered it 2 weeks ago and currently in the middle of reading it. :)

      1. 19

        My one beef with these kinds of articles is that they phrase things so that it sounds like Google has a grand plan to destroy open standards, but it may actually be that there are many local decisions that ended up doing it. The CEO of Google probably didn’t reach down one day and say, “Let’s get rid of XMPP”. I think it’s more likely that the hangouts group decided to stop maintaining it so they could compete with other chat products that weren’t restricted by XMPP. This isn’t to say that a trend of this kind of behavior from Google isn’t something to talk about, but probably it’s either something fundamental about how to make money from open standards, or else something about Google’s incentive structure. If you asked Sundar Pichai to stop doing this, he would probably say, “I don’t know what you’re talking about, but we have never made ‘destroying open standards’ part of our long-term strategy.”

        1. 37

          Their intentions are irrelevant; only their actions and the consequences of them matter.

          1. 17

            If the sole intention of the article is to encourage other folks to avoid this pitfall, sure! If we want to also convince Google to stop doing it, then the practical mechanics of how these things actually happen are of vital importance. It’s probably the rank and file that want to do something innovative in order to hit quarterly goals, and are sacrificing open standards at the altar–getting those kinds of people to understand the role that they are personally playing is important in that context.

            1. 6

              You’re right that it’s important that the labor understand what the consequences of their efforts actually are; I think that’s what I’m saying, too.

              It’s also important that other people who are impacted by the actions of powerful actors like Google or Apple or Microsoft, but who don’t work there, understand who is responsible for these social negatives.

              It’s further important, for everyone to understand, that the directly responsible parties for that social cost are the corporations themselves, whose individual human members’ culpability for those costs is proportional to those members’ remuneration. Pressure should be applied as closely and directly to the top of that hierarchy as possible in order to convince them to stop it, in whatever way you can do best. The OP article is addressing the top of that hierarchy in Google’s (Alphabet’s?) particular instance, since they’re a very powerful actor in the space of the Internet and software in general.

              1. 6

                Additionally, though it may be helpful for third parties to critique actors like Google by having concrete suggestions or perfect empathy for the foot-soldiers caught up in the inhumane machine that Google in some ways is, it’s not the obligation of the victims to make things easy for the powerful. It’s the moral obligation of the powerful to be mindful and careful with how they act, so that they don’t inadvertently cause human suffering.

                Before any ancap libertarian douchetards weigh in with “corporations aren’t moral entities”, they absolutely act within the human sphere, which makes them moral agents. Choosing to be blind to their moral obligations makes them monsters, not blameless. Defending their privilege to privatize profit and socialize cost is unethical and traitorous to the human race.

            2. 7

              I think it’s more likely that the hangouts group decided to stop maintaining it so they could compete with other chat products that weren’t restricted by XMPP.

              That’s reasonable – by all accounts, XMPP is terrible – but the replacement could have been open sourced. This detail makes it clear that closing off the chat application was intentional. When GChat’s userbase was small, it made sense to piggyback off of the larger XMPP community. When GChat became the dominant chat client, it no longer needed the network effect that a federated protocol provided, and it moved to a proprietary protocol.

              1. 1

                By whose accounts?

                The vast majority of “commercial” chat networks are XMPP under the hood, with federation disabled.

                Being technically poor isn’t why they turned off federation, it’s because federated chat gives zero vendor lock-in.

                1. 2

                  I believe you and @orib are in agreement when s/he says:

                  This detail makes it clear that closing off the chat application was intentional

              2. 9

                On the other hand, the CEO of Google could decree that using open standards is important.

                I agree that this is closer to a natural disaster than a serial killer but an apathetic company doesn’t mean the outcome is better than an actively antagonistic company.

                1. 9

                  I thought this article was fairly agnostic about how conscious Google’s embrace, extend, extinguish pattern is. This seems like the right approach, as we don’t have any way of knowing.

                  We know from court proceedings that Microsoft executives used the term “embrace, extend, extinguish” (and no doubt they justified this to themselves as necessary and for the greater good). We don’t have the same window into Google executives’ communications, but it seems foolhardy to think that some of them wouldn’t recognize the similarities between Microsoft’s “embrace, extend, extinguish” and Google’s current approach. Sundar Pichai could be lying to himself, or he could just be lying to us. Either way, the particular psychology of Google executives doesn’t seem important when the effects are predictable.

                1. 2

                  Something the author doesn’t seem to understand is that Google is trying to improve security for all users of Chrome. So it might be that no one ever gets man-in-the-middled on any of the author’s domains, but that doesn’t mean that no one will do it for latimes.com, which is served over unencrypted HTTP as of today. Chrome can guarantee you’re protected against certain classes of attacks when a site is served over encrypted HTTP; it can’t make that guarantee for unencrypted sites.

                  My suspicion is that someone at Google has a metric they’re trying to optimize: the proportion of traffic Chrome delivers that it can prove was encrypted. Unlike many metrics, this actually does seem to be one that’s good for all users. It’s certainly an inconvenience for website owners.

                  With that said, for personal websites, I agree with commenters who don’t think it’s that big of a deal, especially since Cloudflare will do it for you for free.

                  1. 2

                    I’m a big fan of having a template for commit messages for your open source project. As an example, finagle and netty both have templates, and it makes it much easier to understand the purpose of a commit, and then also how it achieved the purpose, which is what we typically want out of the commit message.
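                    As a sketch of what such a template can look like (this paraphrases the Problem/Solution/Result shape Finagle uses, rather than quoting either project’s template verbatim):

```
Problem

What was wrong or missing, and why it matters.

Solution

How this change addresses the problem.

Result

The observable outcome: behavior changes, new APIs, performance numbers.
```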

                    1. 16

                      The classic version of this is “How to ask questions the smart way”.

                      1. 10

                        Aside from being poorly written, this article tries to skewer event loops with the argument that CPU work will block your thread and prevent you from achieving high throughput. This doesn’t have to do with blocking vs non-blocking. Your CPU resources will be consumed regardless of the approach you take. The actual difference in throughput here is how many cores you can consume. Presumably if you were actually trying to achieve high throughput in production with node, you would have several node processes per machine and load balance your work across the different processes.

                        I’m not a huge proponent of node, but this article is not good.

                        1. 7

                          I helped out with the Twitter decision to vote no on JSR 376, and here’s what we said. The short version is that we felt like the JSR doesn’t have very much bang for the buck as it stands, although it’s an opportunity to tackle a real problem.

                          1. 5

                            Something a bit less obvious – you can write an async recursive loop with futures if you’re clever, but your future implementation needs built-in support for it. In Scala (using Twitter futures, although Scala futures support this too as of 2.10.3):

                            def loop(): Future[Unit] =
                              Future.sleep(5.seconds).flatMap { _ =>
                                doWork() // stand-in for the loop body
                                loop()   // recurse asynchronously, without growing the stack
                              }

                            This is quite tricky – the originally returned future is never satisfied, and this keeps telescoping inward forever, so if you’re not careful you’ll end up with a big space leak. If you’re curious, the Scala promise implementation has a good explanation of how this works.
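                            The same hazard exists in other async runtimes; here is a hedged Python asyncio analogue, with the five-second sleep shortened so it runs instantly:

```python
import asyncio

async def tick_recursive(n: int) -> None:
    # Naive recursion: each await adds a coroutine frame that is only
    # released when the whole chain finishes -- the analogue of the
    # telescoping futures; a long-running loop grows memory (and in
    # Python eventually hits the recursion limit).
    if n == 0:
        return
    await asyncio.sleep(0)  # stand-in for Future.sleep(5.seconds)
    await tick_recursive(n - 1)

async def tick_flat(n: int) -> None:
    # The safe shape: a loop instead of recursion, so memory stays constant.
    while n > 0:
        await asyncio.sleep(0)
        n -= 1

asyncio.run(tick_recursive(100))
asyncio.run(tick_flat(100))
```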

                            1. 17

                              I don’t think journalistic ethics have caught up with the ethics around doxing yet. The problem is that journalism tries to answer some basic questions, like “who, what, where, why, when, how” and historically, “who” has been a meatspace “who” because that was the only “who”. Now folks have persistent online identities, so it would be reasonable to refer to this guy just as MalwareTech and it would be fine. “Who” in this case doesn’t have to just be, “an anonymous person online” because MalwareTech is himself an identifiable person online, separate from what he does in meatspace.

                              Clearly, some kinds of doxing aren’t OK in journalism, like publishing someone’s address, telephone number, or social security number, but violations of privacy have always been somewhat fuzzy. Cf paparazzi, or revealing who Elena Ferrante was.

                              For what it’s worth, the SPJ code covers this kind of thing, but I suspect it will still take more time for journalists to get a good sense of how this works in the internet era.

                              1. 3

                                And now I was sitting here, slightly confused about whether Simon Peyton-Jones gave an enlightening talk about online privacy that I missed, before I followed the link …

                              1. 3

                                So eventually, after we platformize our hack and are comfortable from having run parallel infrastructures for some time, we’ll be handing off our DNS infra to the folks that probably know how to do it better than us.

                                So far I’m 2 for 2 on “companies I’ve heard of running their own complicated DNS set up, despite it not being a core part of their business” vs “companies who would have been far better off outsourcing their DNS.”

                                1. 1

                                  What does this look like when you’re in your own datacenter?

                                  1. 1

                                    One of:

                                    • You put the entirety of your zones on the external DNS service and you put only caching (if any) nameservers inside the DC.
                                    • You put the public-facing part of your zones on the external DNS service and you do split-horizon DNS to have an internal.example.com subdomain visible only inside your DC, inside which all your internal-only records go. You put at least one nameserver inside your DC which responds to requests from inside your DC only, believes itself to be authoritative for internal.example.com, and delegates all other requests.

                                    In both cases you get an API and UI for managing your publicly visible DNS entries because every worthwhile DNS provider does that.
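                                    A toy sketch of the routing rule in the second option (all names and addresses here are hypothetical):

```python
# Split-horizon routing rule: queries for the internal-only subdomain go to
# the in-DC nameserver, everything else goes to the external DNS service.
INTERNAL_SUFFIX = ".internal.example.com"
INTERNAL_NS = "10.0.0.53"      # authoritative only inside the DC
EXTERNAL_NS = "198.51.100.53"  # the outsourced DNS provider

def pick_nameserver(qname: str, from_inside_dc: bool) -> str:
    if qname.endswith(INTERNAL_SUFFIX) or qname == INTERNAL_SUFFIX[1:]:
        if not from_inside_dc:
            # The internal zone is simply invisible outside the DC.
            raise PermissionError("internal zone is not visible externally")
        return INTERNAL_NS
    return EXTERNAL_NS
```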

                                    1. 1

                                      You can still outsource DNS in that scenario. Maybe it makes less sense, but it’s just as possible as when you’re entirely cloud-hosted.

                                    2. 1

                                      I think that an ideal state is “companies should only do the core part of their business”, but the reality is that “companies have to own whatever they need to own to ensure their customers can access their product”.

                                      If that means running your own code or your own DNS or your own fileserver then that’s what you gotta do. It’s obviously more expensive but some companies don’t have the luxury of saying (as I’ve heard many on hn say) “amazon is down lol that means the internet’s broken guess we can go to lunch until it’s working again”.

                                      1. 1

                                        This can’t possibly be true if “own” means “run themselves”. Every company that sells products using the internet needs, amongst many other things, DNS service. Proportionally very few of those companies are capable of running a DNS service with higher uptime than, say, Route 53.

                                    1. 2

                                      My main problem with commit messages in git is typically that I find them inscrutable after a few months, or if it’s on a piece of code I’m not familiar with. My team has adopted a strategy of ensuring that commit messages have a motivation for the commit and then explain how it fixes the problem, which I really like, and you can find here. It’s quite lightweight.

                                      1. 4

                                        I work on Finagle, which underlies the technology that Duolingo switched to (Finatra), so I have a horse in this race, but I wanted to talk a little more about what you were saying about averages being useless.

                                        The tricky thing is that latency is something we get when we make a request and get a response. This means that latency is subject to all kinds of things that actually happen inside of a request. Typically requests are pretty similar, but sometimes they’re a little different–like the thread that’s supposed to epoll and find your request is busy with another request when you come in, so you wait a few extra microseconds before being helped, or you contend on a lock with someone else and that adds ten microseconds, and then all of those things add up to being your actual latency.

                                        In practice, this ends up meaning that your latency is subject to the whims of what happens inside of your application, which is probably obvious to you already. What’s interesting here is what kinds of things might happen in your application. Typically in garbage collected languages the most interesting thing is a garbage collection, but other things, like having to wait for a timer thread, might also have an effect. If 10% of your requests need to wait for a timer thread that ticks every 10ms, then they’ll create a uniform distribution from the normal request latency + [0, 10ms).

                                        This ends up meaning that when people talk about normal metrics being mostly useless for latency, it’s usually because they mean the aggregate latency, which has samples that had to wait for the timer thread, samples that had to sit through a garbage collection, and so on. However, it isn’t that the distributions they construct are particularly odd, but rather that they are composed of many quite typical distributions. So there’s a normal distribution for the happy path, a normal distribution for the garbage collections, and a uniform distribution for when you were scheduled on the timer, and put together they make a naive average difficult to interpret.
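                                        A hedged simulation of that mixture (all distributions and probabilities here are made up for illustration) shows how the naive mean lands between modes and describes almost no individual request:

```python
import random

random.seed(0)

def sample_latency_ms() -> float:
    latency = random.gauss(10, 1)              # happy path, ~10ms
    if random.random() < 0.10:
        latency += random.uniform(0, 10)       # waited on the 10ms timer thread
    if random.random() < 0.01:
        latency += random.gauss(200, 30)       # rode through a GC pause
    return latency

samples = sorted(sample_latency_ms() for _ in range(10_000))
mean = sum(samples) / len(samples)
p50 = samples[len(samples) // 2]
p99 = samples[int(len(samples) * 0.99)]
# The mean sits above the median (dragged up by the GC tail) yet far below
# the tail percentiles -- it matches almost no actual request.
```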

                                        But we can make an informed guess here, which is that probably the happy path is around 10ms now, and was probably around 750ms before, which is a quite nice improvement. As far as the unhappy path, my suspicion is that JVM gc pauses are better than Python gc pauses, but it’s quite difficult to tell for sure. My guess would be that their gc pauses are on the order of 100s of milliseconds, and were previously also on the order of 100s of milliseconds, so that the p9999 is probably still better than the p50 they saw previously.

                                        Anyway, this is just to say that averages are useless, but also that just knowing the p50 or p99 is also sort of useless. Really what I want to be able to see are the actual histograms. As a side note, finatra exports real histograms, so if you get ahold of one of the duolingo people, would be pretty interested to see some of those graphs.

                                        1. 2

                                          Agreed, a histogram – or any more details around performance – would have been useful. It’s unclear what they measured and what was sped up, so it’s hard to evaluate anyway.

                                          And this is the problem with precision without accuracy: if you’re telling me 750ms and 10ms, that sends me a different signal than 750ms and 14ms. In fact, if I wasn’t going to dive deep into the perf aspects, I might have either dropped the numbers altogether (and stated “more than an order of magnitude improvement”), or said “50 times faster”, and then I would’ve gotten the gist of the speedup across (which seems awesome) without tripping over the concrete numbers (especially 14).

                                        1. 3

                                          I think this article is interesting, but unrelated to the core interest of lobsters, which is computer technology. I think folks can find these kinds of articles on other websites, and it would be best to keep this kind of stuff off lobsters to keep the signal-to-noise ratio high.

                                          1. 16

                                            I slightly disagree. It fits in well: UI design is a very important part of CS. I will caveat my comment by saying this probably should have the off-topic tag.

                                            Edit: I swear there used to be an off topic category.

                                            1. 19

                                              I think CS people tend to find these kinds of metro-layout type discussions interesting for the even more specific reason that they’re closely related to CS sub-areas like automated graph layout. In fact if automated graph layout worked ‘perfectly’ for some definition of perfectly, you would just use that to make metro maps.

                                              1. 4

                                                Indeed. The blog post from the Transit app was an excellent read on this topic:


                                            2. 8

                                              I disagree completely. Computer science is largely about representation and communication of data, and few sets of data affect more people than those related to transit systems. Even if one finds the visual / graphic aspects uninteresting, the implicit analysis of the data set can inform all manner of algorithmic thinking.

                                            1. 22

                                              Are there any Stripe folks on lobsters who know why Stripe chose Consul as the service-discovery tool, instead of straight-up DNS or Zookeeper? b0rk phrases it as “effective and practical”, but hashicorp’s first commit to Consul was almost exactly three years ago, so they’ve been using it more or less since it came out. In comparison, Kubernetes, which she contrasted as “a new technology coming out”, had its first commit two and a half years ago, but that commit already contained 250 files, so it was probably in development for at least half a year before that. I wonder if maybe the way Stripe talks about Consul has changed since they started using it – since they’ve used it for a couple of years, they think of it as battle-hardened, even though in the larger world of distributed systems it is not particularly broadly used. This might be true for Stripe, since they have already worn down the rough edges for their use case, but if I were choosing a service discovery system for a company, I don’t think I would consider Consul the anti-flashy choice.

                                              One thing that worried me about Consul is exactly what Stripe ran into when running it, that it’s built on top of a framework for guaranteeing consistency. In practice, strong consistency might not be what you want out of a service discovery framework. It might be appropriate if you really don’t want your servers to ever accidentally talk to the wrong server, and you reuse names (or IP addresses, which is often what service discovery nodes point to), since after you remove a node from a cluster, you can be pretty certain that all clients will see the removal, provided the service discovery cluster can talk to them. In the long run, a better solution to this problem than requiring strong consistency in your service discovery tool is requiring that clients and servers authenticate each other, so they agree that they’re talking to who they think they’re talking to. If I was picking a flashy new service discovery framework, I would probably look at eventually consistent tools like Netflix’s Eureka. If I was trying to do something battle-hardened, I would probably pick Zookeeper.

                                              Looking at Zookeeper naively, you might ask, “Why is this strongly consistent hierarchical file system the go-to default for service discovery?” One thing is that it was designed for generic configuration changes, so it receives updates via the “watches” API, and the “ephemeral znodes” API, which can be the fundamental building blocks of a service discovery tool. That’s the long and short of why people have used it for practically a decade.
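                                              As a toy illustration of how those two primitives compose into service discovery (this in-memory model is invented, and it glosses over real ZooKeeper details like one-shot watches and session timeouts):

```python
class ToyRegistry:
    """In-memory toy of ZooKeeper-style watches plus ephemeral nodes."""

    def __init__(self):
        self.nodes = {}    # path -> owning session id
        self.watches = {}  # path prefix -> list of callbacks

    def watch(self, prefix, callback):
        # Callback receives the current member list under the prefix.
        self.watches.setdefault(prefix, []).append(callback)

    def create_ephemeral(self, path, session):
        # An ephemeral node lives only as long as its owning session.
        self.nodes[path] = session
        self._notify(path)

    def close_session(self, session):
        # Session death deletes its ephemeral nodes -- this is how a crashed
        # server drops out of discovery without explicit deregistration.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]
            self._notify(path)

    def _notify(self, changed_path):
        for prefix, callbacks in self.watches.items():
            if changed_path.startswith(prefix):
                members = sorted(p for p in self.nodes if p.startswith(prefix))
                for cb in callbacks:
                    cb(members)
```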

                                              Other than that, zookeeper doesn’t have a lot that particularly commends it. It does great throughput (for a strongly consistent store), and many people have used it for service discovery for a long time, so you can be pretty confident in it. On the other hand, when it’s not confident it can make progress safely, it just doesn’t make progress – this can mean that new nodes can’t start up (because you can’t add them to the cluster) and that old nodes can’t be removed (because you can’t remove them from the cluster). Leader elections can also be pretty painful. Unfortunately, these are also problems that Consul faces, because it made the same choices about consistency that Zookeeper did.

                                              Now that they’re using DNS on top of Consul, they have two single points of failure. Although we treat DNS like air, and assume it’s an unlimited resource, DNS is still a name server, and it can still go down. With that said, DNS is really battle-hardened, so usually the problem comes when somebody poisons your DNS server somehow. This problem is mitigated by being in an environment where experts run your DNS servers, but it can still be bad.

                                              The other thing is that network partitions are real, and you don’t necessarily want to take down your own website because your service discovery cluster can’t talk to every remote host. Just because your service discovery cluster is partitioned from them, doesn’t mean that they’re partitioned from each other! The nastiest problem then isn’t when Consul is down, but when Consul is up and is sending you garbage that makes you think everyone is down. An end solution ends up being to only trust Consul as a source for adding new information–your load balancer assumes that Consul tells you about new nodes, and ignores information about dead nodes until it can validate for itself that they’re dead.
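                                              A minimal sketch of that end solution, with an invented Pool class: additions from service discovery are trusted immediately, while removals are only honored after our own health check agrees.

```python
class Pool:
    """Load-balancer member set that only trusts additions from discovery."""

    def __init__(self):
        self.members = set()

    def on_discovery_update(self, discovered, is_alive):
        # is_alive: our own local health check (e.g. a TCP ping),
        # not the discovery system's view of the world.
        self.members |= set(discovered)           # additions: trusted as-is
        suspect = self.members - set(discovered)  # removals: verify first
        self.members -= {m for m in suspect if not is_alive(m)}
```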

                                              As b0rk mentioned, DNS can be pretty slow to update, which is usually the reason why people don’t want to use just DNS. If you’re happy with DNS’s propagation speed, it might make sense to cut out the middleman and skip running your own service discovery tool. With that said, it can be a hassle to have to wait on the order of minutes for a service discovery update – in particular, it makes rolling restarts especially slow, since if you want to sequence rolling 5% of your cluster each time, you’ve added at least twenty minutes to your deploy. You can use blue/green deploys to make it easier to roll back, but as your cluster size grows, that becomes increasingly expensive.
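                                              The arithmetic behind that twenty-minute figure, assuming a one-minute propagation wait per batch (the wait time is an assumption, not from the comment):

```python
# Rolling 5% of the cluster per batch means 20 batches; if each batch must
# wait for DNS propagation before the next one starts, the waits alone add
# twenty minutes to the deploy.
batch_fraction = 0.05
dns_wait_seconds = 60  # assumed propagation delay per batch
batches = round(1 / batch_fraction)
added_minutes = batches * dns_wait_seconds / 60
```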

                                              With all that said, I think this is a really cool story of taking a technology, getting it to work (upstreaming bugfixes! be still my heart), improving reliability by relaxing consistency, and simplifying healthchecking. Despite my skepticism of the newfangled hashicorp stuff, service discovery is a well-known danger area, so having zero incidents in a year is pretty dang good. I hope companies continue to put out blog posts like this one–the history and explanations of why decisions were made are great. Stripe does have an advantage since they’re employing b0rk though ;)

                                              1. 1

                                                Are these guys duplicating their own work? http://www.scala-native.org/

                                                Sorry, bad wording: are the Scala guys duplicating the work done by the Java guys?

                                                1. 3

                                                  I think scala-native doesn’t target the JVM at all, whereas AOT compilation allows a mixed-mode style: code still runs on the JVM, but some libraries are precompiled to native code. Note that there’s even a mode of running that still allows JIT-ing of AOT-ed code.

                                                  For what it’s worth, the scala-native people aren’t coming up with something that has never been done before. There are existing tools to compile java to native code, most notably gcj. I think what’s exciting about scala-native are the proposed extensions to scala which allow you to control your memory in a much finer-grained way, and the improved FFI.

                                                  1. 1

                                                    Is it the same guys?

                                                    I would expect scala-native to support the native ABI (or at least lightweight bindings to it), whereas that doesn’t seem to be a goal for this project. Whether that’s actually an important use case (and/or worth the cost of working without support for anything written in Java) is an open question.

                                                    1. 1

                                                      More like Oracle trying to badly duplicate what already works in Scala.

                                                      Will be Scala.js vs. GWT all over again: Two implementations, one works, one doesn’t.

                                                      (I expect that Java-AOT like GWT will not even try to have any sensible kind of interop/integration into JS/native, making them foreign objects in the respective place. Java-AOT will likely be some shrink-wrapped JVM+app code thing.)

                                                    1. 1

                                                      This will be really useful when it comes out. From the perspective of someone who helps write (gently) latency-sensitive systems, we expect that we have to warm up all JVM-based services in order to help with many things. Off the top of my head: hydrating caches, resizing socket buffers, JIT-ing, getting GC heuristics going, connection establishment, and ensuring lazily evaluated code has already run. All of these have been tunable or fixable, except for JIT-ing, which absolutely must happen, and which can’t be done any way other than by exercising the code paths. This change will allow us to consider a brave new world where we don’t need to figure out how to coordinate warm-up requests for all of our applications. This will be especially useful for applications with a broad workload, where it’s a hassle to figure out how to warm up every workload, and for cases where it’s difficult or impossible to send synthetic traffic that doesn’t mutate a persistent store.

                                                      I’m pretty excited for JDK9. It has been a long time coming, but it looks like there will be some really exciting goodies in there.

                                                      1. 6

                                                        These articles are interesting, but they often gloss over what is the most interesting part to me, the decision to go from something off the shelf to something home-grown. Why doesn’t HAProxy scale horizontally? Why not something like nginx? Are there hardware load balancers that could have helped here? What’s the expected ROI of scaling load balancing horizontally instead of vertically? What are other folks doing in the industry?

                                                        1. 2

                                                          I’ve noticed an interesting desire amongst engineers to save money and use open source, ignoring the cost of their effort.

                                                          A site we run at $work also had performance issues with our HAProxy instances. We simply replaced them with commercial software LBs and went on to other things. We didn’t blog about it because the project took a month or so and wasn’t very glamorous.

                                                          1. 5

                                                            I don’t think it’s necessarily a desire to save money, it’s a desire to use software you can understand, modify and enhance as needed. I’m guessing the commercial load balancer you’re using is pretty much a black box - if you have problems you’re going to have to rely on vendor support ([insert horror story here]) instead of being able to fix it yourself. Troubleshooting is a helluva lot easier if you have source code…

                                                            Yes, going with a commercial product is better in a lot of cases, but there are always trade-offs.

                                                            1. 4

                                                              Agreed - there’s always the risk of bugs and black boxes. On that topic, the question is whether the velocity you gain is worth it. After all - many are comfortable to run on EC2 with ELBs, despite both of them being very opaque.

                                                              Bug wise, I can only talk about my experience; we’ve had no major implementation bugs and the experience has been very smooth. We have been running these devices for several years.

                                                              This of course could change, but as a reference point, I also have a cluster of several hundred Nginx proxies which work very well, but we’ve had some showstopper bugs over the years. At those times, having the ability to dive into the code has not been helpful, due to the complexity of the product and the fact that these bugs happen infrequently enough that we don’t have an nginx code internals expert on staff. Sure, we can read/fix the code, but the MTTR is still high.

                                                              In GH’s case, they now need to keep at least 1 or 2 members of staff full time on this codebase, otherwise their knowledge will begin to degrade. The bus-factor risk is high.

                                                              For future features, they can at best have a small team working on this problem without it becoming a distraction for a company their size. I do see they plan to open source this, which may reduce the size of that issue, assuming the project gets traction.

                                                              In my case, I pay a vendor for a team of hundreds of specialists working full time on this problem. We have gained many features for “free” over the past years.

                                                              In terms of debugging - the inspection capabilities on the software we chose have been unmatched by anything else I’ve used. We can do real-time deep inspection of requests. This is nowhere near the black-boxiness of the ELBs that most people are comfortable using.

                                                              For control, the cluster has a very complete REST API and to assist teams, somebody wrote a Terraform provider in their 20% time.

                                                              We run the devices in a service provider model, meaning we have a centrally managed hardware platform and then we do containerised virtual loadbalancer deploys so that teams who have unique use cases can get their own instances. The devices are integrated into our BGP routing mesh and so traffic is seamlessly directed to where it should be. This is all vendor supported.

                                                              In terms of traffic, we do tens of gigabits over millions of connections at peak. We use many protocols - not just HTTP - and a good portion of our requests are short-lived.

                                                              As you might infer, I’m very happy with my choice :)

                                                        1. 2

                                                          I haven’t ever really thought of the distinction, but looking over the list of usages, I think that tuples feel the most pythonic in situations where the arity is fixed. Overloading tuple as “read-only list” is of course possible but feels like a waste of tuple as vocabulary.

                                                          It’s kinda weird that this post starts off by pointing to the “read-only list” explanation as a bad one, but then basically recommends using tuples as read-only lists.
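                                                          To illustrate the fixed-arity point, here’s a small hypothetical sketch (the names are made up for illustration):

```python
# A tuple as a fixed-arity record: each position has a distinct meaning,
# and the length is part of the contract.
point = (3.0, 4.0)    # always exactly (x, y)
x, y = point          # unpacking relies on the fixed arity

# A list as a homogeneous, variable-length sequence.
readings = [3.0, 4.0, 5.5]
readings.append(6.1)  # growing makes sense for a list

# Using a tuple merely as a "read-only list" works, but it spends the
# fixed-arity vocabulary on something a list already expresses.
frozen_readings = tuple(readings)
```

                                                          The distinction is roughly record-vs-sequence, which is why overloading tuples as frozen lists feels like a waste of vocabulary.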

                                                          1. 1

                                                            I think that the author is pointing out tweets that he disagrees with, not saying that “read-only list” is a bad explanation.


                                                            Now to start off, I want to say that I respect the hell out of David Beazley. The guy literally wrote the book on Python, and he knows way more about Python than I ever will. He’s also one of the most entertaining Python people you can follow on Twitter. But hey, that doesn’t mean I can’t disagree sometimes.

                                                          1. 1

                                                            Could we make the science tag work the way the pdf tag works, where you also need another tag to submit it? That way we could keep the “computer-y” science and keep filtering / searching for it neatly, but stuff that’s just science can see itself out?