1. 3

    This is a description of Bing’s production search engine.

    1. 6

      I’m an author on this paper. 2 things:

      1. A copy-paste-with-some-cleanup version of the code that serves production traffic in Bing is available here: https://github.com/BitFunnel/BitFunnel This was mostly done by Dan and Mike.
      2. Bing is actually multiple search indexes. So, BitFunnel serves every production query, but it “only” maintains the “SuperFresh” index, which is to say, the index of documents that need to be updated really frequently.
    1. 7

      I find it very odd that he’s completely chucking out PowerShell. PowerShell, both as a scripting language and as an interactive shell, is actually one of the best environments I’ve ever used. It’s definitely not perfect, but it honestly gets a lot of things right: the trivial extensibility from .NET, the entire remoting/workflow system, things like Out-GridView, the Integrated Scripting Environment (ISE) for writing scripts… seriously, they got a hell of a lot of things right.

      I’m really excited to have bash on Windows because it means bash is now the lowest common denominator for a quick script (vs. writing separate PowerShell/batch and bash scripts), but if you’re just talking about day-to-day usability, I don’t actually think bash helps a ton.

      1. 9

        Author here.

        I’m not sure what you mean when you say I’m completely chucking out PowerShell. I called it “the cure for polio” and Jeffrey Snover “the Jonas Salk of the Windows ecosystem”. To be completely honest, if anything I feel like I was a little hard on Bash.

        1. 5

          I think I read the first one, at least, really differently; in context, I took it to mean “a great cure for something else” (i.e., fixing the wrong problem). I’ve heard a lot of devs say that (“it’s a better WSH, but we didn’t need a better WSH”, for example, or “it’s a better shell, but the console subsystem is still crap”, and so on), so maybe that’s where my head was at. The post being called “the Windows command line,” and not “cmd.exe”, seemed to cement that reading.

          At any rate, I wasn’t trying to mischaracterize your writing. People always glom onto random parts of my posts, extracting a meaning I not only didn’t intend but actively disagree with. Sorry I was the one doing it here.

        2. 1

          I don’t think he’s throwing it out per se; I just think he’s not really talking about it (and is using a slightly click-bait-y headline). I mean, at the end he says your choices are batch, bash (now), and PowerShell. Fairly clearly, only one of those is not the right choice anymore.

        1. 4

          OSv is a new kernel written in C++ with Linux compatibility. They claim 2x throughput for unmodified Redis.

          http://osv.io/benchmarks/

          1. 2

            Yes, but the point is that we want lower latency, not higher throughput.

            1. 3

              I think 2x throughput will also mean 0.5x latency in this case.
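One way to check that intuition, assuming a closed-loop benchmark that keeps a fixed number of requests N in flight (an assumption, not something the benchmark page states), is Little’s law, which relates concurrency N, throughput X, and mean latency R:

$$
N = X R \quad\Longrightarrow\quad R = \frac{N}{X}, \qquad X' = 2X \;\Longrightarrow\; R' = \frac{N}{2X} = \frac{R}{2}.
$$

If the load generator instead sends requests at a fixed rate well below capacity, doubling capacity need not halve latency, so the 0.5x figure depends on how the benchmark drives the server.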

            2. 1

              Oof. Looks like that OS doesn’t support users, which might be at least moderately reasonable. But it might not support processes, which might make certain servers difficult to handle. Maybe it silently translates all forks into thread spawns?

              Anyway, interesting idea, but there are some tradeoffs I’m not yet comfortable with.

            1. 4

              Very interesting, thanks for posting this.

              For someone who hasn’t had the chance to read through all the documentation (yet), what are the main ways Bond differs from Protocol Buffers?

              1. 9

                Hey, OP here.

                The current offerings (Thrift, ProtoBuffs, Avro, etc.) tend to have similar opinions about things like schema versioning, and very different opinions about things like wire format, protocol, performance tradeoffs, etc. Bond is essentially a serialization framework that keeps the schema logic the same, but makes tasks like wire format, protocol, etc., highly customizable and pluggable. The idea is that instead of deciding ProtoBuffs isn’t right for you, tearing it down, and starting over with Thrift, you just change the parts you don’t like and keep the underlying schema logic the same.

                In theory, this means one team can hand another team a Bond schema, and if the second team doesn’t like how it’s serialized, fine: they change the protocol, and the schema doesn’t need to change at all.
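To make “swap the protocol, keep the schema” concrete, here is a minimal Python sketch. It is not Bond’s API: `Record`, `JsonProtocol`, `BinaryProtocol`, and `roundtrip` are hypothetical names, and a Python dataclass stands in for a Bond schema. Both protocols read the same field list, so changing the wire format never touches the schema.

```python
import json
import struct
from dataclasses import dataclass, fields


# Hypothetical schema, declared once and shared by every protocol below.
@dataclass
class Record:
    id: int
    score: float
    name: str


class JsonProtocol:
    """Text encoding: one JSON object keyed by schema field name."""

    def encode(self, obj):
        return json.dumps({f.name: getattr(obj, f.name) for f in fields(obj)}).encode("utf-8")

    def decode(self, cls, data):
        return cls(**json.loads(data.decode("utf-8")))


class BinaryProtocol:
    """Compact encoding: fields in schema order, no names on the wire.
    int -> little-endian int64, float -> float64, str -> uint32 length + UTF-8 bytes."""

    def encode(self, obj):
        out = bytearray()
        for f in fields(obj):
            value = getattr(obj, f.name)
            if f.type is int:
                out += struct.pack("<q", value)
            elif f.type is float:
                out += struct.pack("<d", value)
            elif f.type is str:
                raw = value.encode("utf-8")
                out += struct.pack("<I", len(raw)) + raw
            else:
                raise TypeError(f"unsupported field type: {f.type!r}")
        return bytes(out)

    def decode(self, cls, data):
        values, offset = {}, 0
        for f in fields(cls):
            if f.type is int:
                (values[f.name],) = struct.unpack_from("<q", data, offset)
                offset += 8
            elif f.type is float:
                (values[f.name],) = struct.unpack_from("<d", data, offset)
                offset += 8
            elif f.type is str:
                (length,) = struct.unpack_from("<I", data, offset)
                offset += 4
                values[f.name] = data[offset:offset + length].decode("utf-8")
                offset += length
            else:
                raise TypeError(f"unsupported field type: {f.type!r}")
        return cls(**values)


def roundtrip(obj, protocol):
    """Serialize and deserialize with any protocol; the schema never changes."""
    return protocol.decode(type(obj), protocol.encode(obj))


record = Record(id=7, score=0.5, name="bing")
assert roundtrip(record, JsonProtocol()) == record
assert roundtrip(record, BinaryProtocol()) == record
```

The binary form of `record` is 24 bytes (8 + 8 + 4 + 4), while the JSON form carries field names on the wire; that size-versus-readability tradeoff is exactly the kind of choice the comment says should be pluggable rather than baked into the schema.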

                Roughly, this works as follows. For most serialization systems, the workflow is: (1) you declare a schema, and (2) the tool generates a bunch of source files that de/serialize data, which you add to a project and compile into any program that needs to serialize and deserialize that data.

                In Bond, you (1) declare a schema, and then (2) instead of generating source files, Bond generates a de/serializer using the metaprogramming facilities of your chosen language. So customizing your serializer is a matter of using the Bond metaprogramming APIs to change the de/serializer you’re generating.
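A rough Python analogue of “generate the serializer at runtime, then customize the generator” might look like this. It uses reflection over dataclass fields in place of Bond’s actual metaprogramming machinery, and `Point`, `DEFAULT_WRITERS`, `make_serializer`, and `write_varint` are all hypothetical names. Only the serializing half is shown, to keep it short.

```python
import struct
from dataclasses import dataclass, fields


@dataclass
class Point:  # hypothetical schema
    x: int
    y: int
    label: str


def write_str(value):
    """uint32 length prefix followed by UTF-8 bytes."""
    raw = value.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


# Hypothetical default "protocol": how each field type is written.
DEFAULT_WRITERS = {
    int: lambda v: struct.pack("<q", v),  # fixed 8-byte little-endian
    float: lambda v: struct.pack("<d", v),
    str: write_str,
}


def make_serializer(cls, writers=DEFAULT_WRITERS):
    """Build a serializer for `cls` at runtime by reflecting over its fields,
    instead of emitting source files to add to a project."""
    plan = [(f.name, writers[f.type]) for f in fields(cls)]  # resolved once, up front

    def serialize(obj):
        return b"".join(write(getattr(obj, name)) for name, write in plan)

    return serialize


def write_varint(value):
    """Unsigned LEB128 varint: 7 bits per byte, high bit set on all but the last."""
    if value < 0:
        raise ValueError("this varint sketch handles non-negative ints only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# Same schema, customized generator: swap only the int writer.
default_serialize = make_serializer(Point)
compact_serialize = make_serializer(Point, {**DEFAULT_WRITERS, int: write_varint})

p = Point(x=1, y=300, label="a")
```

Under these assumptions, `default_serialize(p)` is 21 bytes and `compact_serialize(p)` is 8. The schema (`Point`) is untouched; only the table handed to the generator changed.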

              1. 9

                The short, unhelpful answer is that it depends on what you’re doing.

                The somewhat longer, more useful answer is that there’s no good way to do this, so you should do it only a couple of times if you can. Concretely, in the OSS web infrastructure world (which is where I am from), you will typically pick a small set of very flexible infrastructure projects that you know really well, and deploy them everywhere, for as many things as you can. It’s easier to locate errors. It’s easier to deploy. It’s simpler to reason about the infrastructure.

                A concrete example is, if you have a really kickass Hadoop team, then it’s worth it to phrase your problems as MapReduce jobs if you can, even if it’s a slight abuse of Hadoop, because then you can just farm it out to your Hadoop cluster, and your problem is solved incidentally by your Hadoop team. Same goes for Redis, Riak, RabbitMQ, whatever.

                Another thing to consider is that, in most cases, your team’s competence will limit you much sooner than your stack will. This is another reason to make big infrastructure choices as rarely as possible: it lets you deal primarily with one issue (your team’s competence) rather than two issues (competence AND a crazy stack that you don’t understand).

                1. 4

                  Reminds me that a coworker at Fog Creek once fixed a bug using reflection, which is perhaps a shade simpler than rebuilding the whole DLL. We reported the bug to MS, and they basically sent us here and said the referenced reflection fix was as good as we would get for our version of .NET, since they only backport the super serious fixes.

                  1. 6

                    Author here.

                    One of the advantages of working at MS (btw, I work at MS) is that I can just bother my friends until they fix it, and then grab the newest version on the release branch!

                    Or that’s how it would work if I were not an incredibly impatient man.

                  1. 4

                    The abstracts are here[1] – it’s sort of hard to tell which talks you’ll like a priori without them.

                    [1] http://bangbangcon.com/speakers.html

                    1. 1

                      Continuation-passing style is a powerful and mind-warping technique that lets code play with its own control-flow (its “future”, so to speak). For example, it lets you elegantly express backtracking search algorithms such as regular expression matching. This curious technique also has deep connections to topics as diverse as compiler optimization, programming language design, and classical versus constructive logic.

                      I’ve been interested in this for a while but didn’t know the name for it. Thanks.
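For anyone else meeting the term for the first time, here is a minimal Python sketch of the backtracking regex matching the abstract mentions, written in continuation-passing style. Each matcher takes a continuation `k` meaning “the rest of the match”; returning `False` back to an earlier choice point is the backtracking. The AST classes and function names are illustrative, not taken from the talk, and only literals, concatenation, alternation, and Kleene star are supported.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    inner: "Regex"


Regex = Union[Char, Seq, Alt, Star]


def match_cps(r: Regex, s: str, i: int, k: Callable[[int], bool]) -> bool:
    """Match r against s starting at index i; on success, hand the next index to k."""
    if isinstance(r, Char):
        return i < len(s) and s[i] == r.c and k(i + 1)
    if isinstance(r, Seq):
        # Match left, and make "match right, then continue with k" its continuation.
        return match_cps(r.left, s, i, lambda j: match_cps(r.right, s, j, k))
    if isinstance(r, Alt):
        # Try the left branch; if the rest of the match fails, backtrack into the right.
        return match_cps(r.left, s, i, k) or match_cps(r.right, s, i, k)
    if isinstance(r, Star):
        # Either stop repeating here, or match one more non-empty iteration and repeat.
        # The j > i guard stops infinite loops when the inner regex can match "".
        return k(i) or match_cps(r.inner, s, i, lambda j: j > i and match_cps(r, s, j, k))
    raise TypeError(f"not a regex node: {r!r}")


def full_match(r: Regex, s: str) -> bool:
    """The top-level continuation succeeds only if the whole string was consumed."""
    return match_cps(r, s, 0, lambda j: j == len(s))


# (a|b)*abb
ends_in_abb = Seq(Star(Alt(Char("a"), Char("b"))), Seq(Char("a"), Seq(Char("b"), Char("b"))))
```

Here `full_match(ends_in_abb, "aababb")` is `True` and `full_match(ends_in_abb, "abab")` is `False`; in the first case the star has to give back input it greedily could have kept, and the continuations are what let it retry.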