A well-behaved HTTP server would see that it received a request specifying a Content-Length but carrying an incomplete or non-existent body, and would reject the request. etcd is written in the Go programming language and uses the Go standard library's net/http request handlers for its v2 keys API. To parse the form body sent from clients, it uses the net/http Request.ParseForm() method. This method does not check whether the request body’s length matches the length specified in the Content-Length header.
Ouch, that is a nasty surprise out of a well regarded standard library.
RFC 2616, section 4.4: “When a Content-Length is given in a message where a message-body is allowed, its field value MUST exactly match the number of OCTETs in the message-body. HTTP/1.1 user agents MUST notify the user when an invalid length is received and detected.”
That’s two separate sentences. All messages (request or response) with a body, when not using chunked encoding, must have a correct content length. The second sentence says, in addition, that a user agent must report the error if it happens in a message from the server.
Got it, thank you. Looking into it more, apparently the server just needs to close the connection (optionally with an error response) if the content-length doesn’t match the body length: https://tools.ietf.org/html/rfc7230#section-3.4
RFC 7230, section 3.4: “A server that receives an incomplete request message, usually due to a canceled request or a triggered timeout exception, MAY send an error response prior to closing the connection.”
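For what it's worth, here is a minimal Go sketch of the rule being discussed: read the body and refuse it unless its size equals the declared Content-Length. The helper names readDeclaredBody and checkRequest and the path /v2/keys/example are made up for illustration; this is not how etcd or net/http handle it. On a live connection a short body would more likely surface as a read error or a stalled read, so treat this purely as an illustration of the length rule.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// readDeclaredBody is a hypothetical helper: it reads the request body and
// returns an error unless the number of bytes read equals Content-Length.
func readDeclaredBody(r *http.Request) ([]byte, error) {
	if r.ContentLength < 0 {
		return nil, fmt.Errorf("content length is unknown")
	}
	// Read at most one byte past the declared length so that an
	// overlong body is detected without buffering all of it.
	data, err := io.ReadAll(io.LimitReader(r.Body, r.ContentLength+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) != r.ContentLength {
		return nil, fmt.Errorf("read %d bytes, Content-Length declared %d",
			len(data), r.ContentLength)
	}
	return data, nil
}

// checkRequest builds an in-memory request whose declared length may
// disagree with the payload, then runs the check on it.
func checkRequest(declared int64, payload string) error {
	r := httptest.NewRequest(http.MethodPut, "/v2/keys/example",
		strings.NewReader(payload))
	r.ContentLength = declared // override the length httptest filled in
	_, err := readDeclaredBody(r)
	return err
}

func main() {
	fmt.Println(checkRequest(5, "hello")) // lengths agree
	fmt.Println(checkRequest(9, "hello")) // body shorter than declared
}
```

A handler built on this would answer 400 and close the connection when the helper returns an error, which matches the "MAY send an error response prior to closing the connection" wording.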
Is this interpretation true though? According to the spec: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
It sounds like this Content-Length check is meant to be done only by HTTP clients, not by servers. As per https://www.w3.org/Protocols/rfc2616/rfc2616-sec1.html#sec1.3
It seems that Node.js inherently has the same issue: https://github.com/nodejs/node/issues/17978
Looks like the only practical way to handle it is to time out and close the connection while waiting to read chunks of the body from the request.
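In Go the timeout approach could look roughly like the sketch below. ReadTimeout and ReadHeaderTimeout are real fields of http.Server; the constructor newTimedServer, the handler drainBody, the address and the durations are assumptions picked for illustration. ReadTimeout bounds the time allowed for reading the whole request, body included, so a client that declares more bytes than it ever sends makes the body read fail instead of waiting forever.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// drainBody reads the whole body and answers 400 if the read fails,
// for example because the read deadline expired before all of the
// declared bytes arrived.
func drainBody(w http.ResponseWriter, r *http.Request) {
	if _, err := io.ReadAll(r.Body); err != nil {
		http.Error(w, "incomplete request body", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

// newTimedServer is a hypothetical constructor; the timeout values
// are illustrative, not recommendations.
func newTimedServer(addr string, readSeconds int) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           http.HandlerFunc(drainBody),
		ReadHeaderTimeout: 2 * time.Second,
		ReadTimeout:       time.Duration(readSeconds) * time.Second,
	}
}

func main() {
	srv := newTimedServer("127.0.0.1:8080", 5)
	fmt.Println("read timeout:", srv.ReadTimeout)
	// Call srv.ListenAndServe() to actually serve requests.
}
```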
I appreciate them publishing the report, which I think more companies should do. There’s plenty to learn from them, and at least I personally feel less like “oh well, it went down again” when I actually get to know what happened.
Not only is there quite a bit to learn from this, but it also puts the engineering team in a good light for reacting quickly, correctly and with a detailed analysis of the incident.
Beyond Go’s server spec violation, one of the main factors in the long outage seems to be that clients didn’t fail gracefully when deserializing JSON, which is a major oversight in client-server testing. Not sure it’s fair to blame the Go standard library when clients blindly trust any input/response to be valid.
Indeed, I was thinking that from a certain angle this could be seen as a type error.
Assuming sufficiently powerful types, everything is a type error.
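To make the "type error" angle concrete, here is a small Go sketch of a client that decodes a response into a typed struct and treats anything that does not fit as an error instead of trusting it. The response shape (action, node, key, value) and the names keyResponse, node and decodeValue are assumptions for illustration, not taken from the incident report or from any real client.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// node and keyResponse describe a hypothetical response shape.
type node struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

type keyResponse struct {
	Action string `json:"action"`
	Node   *node  `json:"node"`
}

// decodeValue returns the node value, or an error if the payload is not
// valid JSON, has the wrong shape, or lacks the node object.
func decodeValue(raw string) (string, error) {
	var resp keyResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return "", fmt.Errorf("malformed response: %w", err)
	}
	if resp.Node == nil {
		return "", errors.New("response has no node")
	}
	return resp.Node.Value, nil
}

func main() {
	fmt.Println(decodeValue(`{"action":"get","node":{"key":"/a","value":"1"}}`))
	fmt.Println(decodeValue(`{"action":`)) // truncated payload
	fmt.Println(decodeValue(`{}`))         // valid JSON, missing node
}
```

The point is only that the caller gets an error it can handle, such as retrying or keeping its last known good state, instead of crashing on an unexpected payload.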