1. 11

  2. 4

    A well-behaved HTTP server would see that it received a request specifying a Content-Length but an incomplete or missing body, and reject the request. etcd is written in the Go programming language and uses the Go standard library's net/http request handlers for its v2 keys API. To parse the form body sent by clients, it calls the net/http Request.ParseForm() method. This method does not check whether the length of the request body matches the length specified in the Content-Length header.

    Ouch, that is a nasty surprise from a well-regarded standard library.
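    For illustration only, here is a minimal Go sketch of the stricter behavior the quoted report expects: read the whole body, compare its length with the declared Content-Length, and reject the request on a mismatch before parsing it as a form. parseFormStrict and errLengthMismatch are hypothetical names, not etcd or net/http APIs, and the mismatch is simulated by overriding Request.ContentLength on an httptest request rather than by sending a truncated body over a real connection.

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    	"io"
    	"net/http"
    	"net/http/httptest"
    	"net/url"
    	"strings"
    )

    // errLengthMismatch is a hypothetical sentinel error used only in this sketch.
    var errLengthMismatch = errors.New("body length does not match Content-Length")

    // parseFormStrict (hypothetical helper) reads the entire body, compares the
    // byte count with the declared Content-Length, and only then parses the
    // body as an application/x-www-form-urlencoded form.
    func parseFormStrict(r *http.Request) (url.Values, error) {
    	body, err := io.ReadAll(r.Body)
    	if err != nil {
    		return nil, err // pass read errors through unchanged
    	}
    	// ContentLength is -1 when the length is unknown (e.g. chunked), so
    	// only enforce the check when a length was actually declared.
    	if r.ContentLength >= 0 && int64(len(body)) != r.ContentLength {
    		return nil, fmt.Errorf("%w: read %d bytes, header declared %d",
    			errLengthMismatch, len(body), r.ContentLength)
    	}
    	return url.ParseQuery(string(body))
    }

    func main() {
    	// Declare a 20-byte body but supply only 9 bytes.
    	req := httptest.NewRequest(http.MethodPut, "/v2/keys/foo", strings.NewReader("value=bar"))
    	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
    	req.ContentLength = 20

    	if _, err := parseFormStrict(req); err != nil {
    		fmt.Println("rejected:", err)
    	}
    }
    ```

    The check deliberately runs before any form parsing, so a short body never reaches the code that interprets key values.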

    1. 2

      Is this interpretation correct, though? According to the spec: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

      When a Content-Length is given in a message where a message-body is allowed, its field value MUST exactly match the number of OCTETs in the message-body. HTTP/1.1 user agents MUST notify the user when an invalid length is received and detected.

      It sounds like this Content-Length check is meant to be done only by HTTP clients, not by servers. As per https://www.w3.org/Protocols/rfc2616/rfc2616-sec1.html#sec1.3

      user agent: The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.

      1. 4

        That’s two separate sentences. All messages (request or response) with a body, when not using chunked encoding, must have a correct content length. The second sentence says, in addition, that a user agent must report the error if it happens in a message from the server.
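        On the user-agent side of that rule, a rough Go sketch, assuming Go's net/http client: a hypothetical test server hijacks the connection, declares Content-Length: 100, sends only 10 bytes, and closes. Reading the response body then fails with io.ErrUnexpectedEOF instead of returning the short body as if it were complete. newShortBodyServer and fetchBody are illustrative names.

        ```go
        package main

        import (
        	"errors"
        	"fmt"
        	"io"
        	"net/http"
        	"net/http/httptest"
        )

        // newShortBodyServer starts a test server that declares a 100-byte body
        // but sends only 10 bytes before closing the connection. It hijacks the
        // connection so the malformed response is written verbatim.
        func newShortBodyServer() *httptest.Server {
        	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        		hj, ok := w.(http.Hijacker)
        		if !ok {
        			http.Error(w, "hijacking not supported", http.StatusInternalServerError)
        			return
        		}
        		conn, buf, err := hj.Hijack()
        		if err != nil {
        			return
        		}
        		defer conn.Close()
        		buf.WriteString("HTTP/1.1 200 OK\r\nContent-Length: 100\r\n\r\n0123456789")
        		buf.Flush()
        	}))
        }

        // fetchBody reads the full response body; a body shorter than the
        // declared Content-Length surfaces as an error, not as truncated data.
        func fetchBody(url string) ([]byte, error) {
        	resp, err := http.Get(url)
        	if err != nil {
        		return nil, err
        	}
        	defer resp.Body.Close()
        	return io.ReadAll(resp.Body)
        }

        func main() {
        	ts := newShortBodyServer()
        	defer ts.Close()

        	body, err := fetchBody(ts.URL)
        	fmt.Printf("read %d bytes, unexpected EOF: %v\n", len(body), errors.Is(err, io.ErrUnexpectedEOF))
        }
        ```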

        1. 1

          Got it, thank you. Looking into it more, apparently the server just needs to close the connection (optionally with an error response) if the content-length doesn’t match the body length: https://tools.ietf.org/html/rfc7230#section-3.4

          A server that receives an incomplete request message, usually due to a canceled request or a triggered timeout exception, MAY send an error response prior to closing the connection.

          It seems that Node.js has the same issue inherently: https://github.com/nodejs/node/issues/17978

          Looks like the only practical way to handle it is to timeout and close the connection while waiting to read chunks of the body from the request.
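          A sketch of that approach in Go, assuming net/http's http.Server timeouts are an acceptable mechanism: ReadTimeout bounds how long the whole request, body included, may take to arrive, so a client that promises 100 bytes and goes silent after 9 makes the handler's body read fail instead of blocking indefinitely. The 200 ms timeout, the request path, and the helper names newTimeoutServer and sendTruncatedRequest are arbitrary choices for this demo.

          ```go
          package main

          import (
          	"fmt"
          	"io"
          	"net"
          	"net/http"
          	"net/http/httptest"
          	"time"
          )

          // newTimeoutServer starts a test server whose ReadTimeout caps how long
          // the whole request (headers and body) may take to arrive. Each handler
          // invocation reports its body-read error on bodyErrs.
          func newTimeoutServer(readTimeout time.Duration, bodyErrs chan<- error) *httptest.Server {
          	ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          		_, err := io.ReadAll(r.Body)
          		bodyErrs <- err
          		if err != nil {
          			http.Error(w, "incomplete request body", http.StatusBadRequest)
          			return
          		}
          		w.WriteHeader(http.StatusNoContent)
          	}))
          	ts.Config.ReadHeaderTimeout = readTimeout
          	ts.Config.ReadTimeout = readTimeout
          	ts.Start()
          	return ts
          }

          // sendTruncatedRequest promises a 100-byte body, sends only 9 bytes,
          // and then leaves the connection open without sending anything else.
          func sendTruncatedRequest(addr string) (net.Conn, error) {
          	conn, err := net.Dial("tcp", addr)
          	if err != nil {
          		return nil, err
          	}
          	_, err = io.WriteString(conn, "PUT /v2/keys/foo HTTP/1.1\r\n"+
          		"Host: example\r\n"+
          		"Content-Type: application/x-www-form-urlencoded\r\n"+
          		"Content-Length: 100\r\n\r\n"+
          		"value=bar")
          	if err != nil {
          		conn.Close()
          		return nil, err
          	}
          	return conn, nil
          }

          func main() {
          	bodyErrs := make(chan error, 1)
          	ts := newTimeoutServer(200*time.Millisecond, bodyErrs)
          	defer ts.Close()

          	conn, err := sendTruncatedRequest(ts.Listener.Addr().String())
          	if err != nil {
          		panic(err)
          	}
          	defer conn.Close()

          	select {
          	case err := <-bodyErrs:
          		fmt.Println("handler gave up reading the body:", err)
          	case <-time.After(2 * time.Second):
          		fmt.Println("handler still waiting for the body")
          	}
          }
          ```

          Setting both ReadHeaderTimeout and ReadTimeout keeps slow headers and slow bodies bounded; a real deployment would pick values based on expected payload sizes and client bandwidth.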

    2. 3

      I appreciate them publishing the report, which I think more companies should do. There’s plenty to learn from it, and I personally feel less like “oh well, it went down again” when I actually get to know what happened.

      1. 1

        Not only is there quite a bit to learn from this, but it also puts the engineering team in a good light for reacting quickly, correctly and with a detailed analysis of the incident.

      2. 3

        Beyond Go’s server spec violation, one of the main factors in the long outage seems to be that clients didn’t fail gracefully when deserializing JSON, which is a major oversight in client-server testing. I’m not sure it’s fair to blame the Go standard library when clients blindly trust any input or response to be valid.
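        To make the client-side point concrete, a hedged Go sketch of defensive decoding: malformed or truncated JSON becomes an error the caller can handle, and a decoded value that lacks a required field is rejected too. keyResponse loosely resembles a v2 keys reply, but its fields and the decodeKeyResponse helper are assumptions for illustration, not etcd's client code.

        ```go
        package main

        import (
        	"encoding/json"
        	"errors"
        	"fmt"
        )

        // keyResponse is a hypothetical reply shape; the field names are
        // illustrative and not taken from any real client library.
        type keyResponse struct {
        	Action string `json:"action"`
        	Node   struct {
        		Key   string `json:"key"`
        		Value string `json:"value"`
        	} `json:"node"`
        }

        // decodeKeyResponse validates instead of trusting: malformed input and
        // missing required fields become errors the caller can retry or report.
        func decodeKeyResponse(data []byte) (*keyResponse, error) {
        	var resp keyResponse
        	if err := json.Unmarshal(data, &resp); err != nil {
        		return nil, fmt.Errorf("decoding key response: %w", err)
        	}
        	if resp.Node.Key == "" {
        		return nil, errors.New("decoding key response: missing node.key")
        	}
        	return &resp, nil
        }

        func main() {
        	truncated := []byte(`{"action":"get","node":{"key":"/foo","val`)
        	if _, err := decodeKeyResponse(truncated); err != nil {
        		fmt.Println("rejected:", err)
        	}

        	ok := []byte(`{"action":"get","node":{"key":"/foo","value":"bar"}}`)
        	if resp, err := decodeKeyResponse(ok); err == nil {
        		fmt.Println("value:", resp.Node.Value)
        	}
        }
        ```

        Wrapping with %w keeps the underligying *json.SyntaxError inspectable via errors.As, so callers can distinguish corrupt payloads from semantically incomplete ones.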

        1. 2

          Indeed, I was thinking that from a certain angle this could be seen as a type error.

          1. 1

            Assuming sufficiently powerful types, everything is a type error.