Author spends a lot of time talking about wire protocols but maybe doesn’t get why certain ones are there. My memory might fail me here, so people, jump in if you see an inaccuracy. But…
“our favourite, stateless resource transfer protocol HTTP. It’s got a lot going for it: it’s simple, stable, reliable, extensible.”
HTTP wasn’t everyone’s favorite. People early on were building WAN protocols that tried to do all the things we’ve since bolted onto, under, and around HTTP to make up for it being too simple. HTTP took off thanks to the Worse is Better effect: it was a simple-to-code solution that worked well enough for the original use case. It’s now a legacy technology we’re locked into and have been bandaging ever since. It has been replaced where possible, in client-server apps and on internal networks, to achieve goals HTTP itself won’t deliver on.
“In fact, it’s so versatile that it’s quite often used as the basis for completely different protocols. “
This again had nothing to do with superior design. HTTP’s market dominance meant there were HTTP clients and servers everywhere. More important, I was told, was the fact that firewalls already had ports configured for it, with easier policies than, say, email. It was simply easier to set up an HTTP-based protocol than a custom one. So all kinds of kludgy, inefficient protocols got built to latch onto the HTTP ecosystem’s benefits. So, once again, someone might want to change the wire protocol to be more effective in a modern environment. And as I said, people have been doing that.
Back when I was looking into this stuff, Active Messages, UDT, and Tsunami were my favorite replacement protocols for various things. Now we have Protocol Buffers, ZeroMQ, and Cap’n Proto, which make it easy to tailor the transport to the application. Whether FastCGI’s wire protocol is good is another question. I don’t think the author really understands the context in which those solutions were made, and he didn’t weigh better alternatives like the ones I just mentioned. This post would’ve been better if it had arrived in the late ’90s.
When writing kfcgi (the process manager for the FastCGI implementation in kcgi), I also came up against the bottleneck of managing worker saturation. Short story: FastCGI is built to let a bunch of worker processes accept on a socket, nice and neat. That’s great for a fixed pool of processes. But what if we have more load? At some point the socket backlog fills and connections start to fail. And what if we have too many processes? They just sit idle.
One solution is to make the web server handle load management. But then we’ve just moved the problem into the web server. So it’s delegated to the process manager. But that means the process manager will need to do the accepts, which basically involves yet another brokering of I/O between the manager and the workers.
I know php-fpm has a way of handling “ondemand” pools, but the docs don’t say how. I ended up putting together my own FastCGI Extension for Management Control to deal with this by passing file descriptors instead of proxying I/O streams. I still prefer plain CGI, though.
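To illustrate the descriptor-passing idea in general terms (this is a minimal sketch using Python’s `socket.send_fds`/`recv_fds`, not kfcgi’s actual management protocol): the manager hands the accepted connection’s raw file descriptor to a worker over a Unix socket, so it never has to broker the request bytes itself.

```python
# Sketch: a "manager" passes an accepted connection's file descriptor to a
# "worker" over a Unix-domain channel, instead of proxying the I/O stream.
# Requires Python 3.9+ on a Unix platform. All names are illustrative.
import socket

def manager_send(chan: socket.socket, conn: socket.socket) -> None:
    # Send one dummy byte plus the descriptor via SCM_RIGHTS ancillary data.
    socket.send_fds(chan, [b"x"], [conn.fileno()])

def worker_recv(chan: socket.socket) -> socket.socket:
    _msg, fds, _flags, _addr = socket.recv_fds(chan, 1, 1)
    # Wrap the received descriptor in a fresh socket object.
    return socket.socket(fileno=fds[0])

# Demo: socketpairs stand in for the manager/worker link and for a
# connection accepted from a client.
mgr, wrk = socket.socketpair()
a, b = socket.socketpair()          # "a" plays the accepted connection
manager_send(mgr, a)
conn = worker_recv(wrk)
a.close()                           # worker's dup keeps the endpoint alive
conn.sendall(b"hello from worker")
data = b.recv(32)
print(data)
```

The key point is that once the descriptor crosses over, the worker talks to the client directly; the manager is out of the data path entirely.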
Why can’t you socket, bind, listen, dup2 the listening socket to fd 3, fork and exec your workers, and accept on fd 3?
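You can, more or less. Here’s a minimal sketch of exactly those steps in Python (fork without exec for brevity; an exec’d worker would inherit the fd the same way since it’s marked inheritable). Note the FastCGI spec itself puts the listener on fd 0; fd 3 here just follows the comment above.

```python
# socket, bind, listen, dup2 the listener to fd 3, fork a worker, and
# accept on fd 3 in the worker. Unix-only sketch; clobbers fd 3.
import os
import socket

LISTEN_FD = 3

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(16)
port = srv.getsockname()[1]

if srv.fileno() != LISTEN_FD:
    os.dup2(srv.fileno(), LISTEN_FD)   # dup2'd fds are inheritable by default

pid = os.fork()
if pid == 0:
    # Worker: rebuild a socket object around the inherited fd and accept.
    lsock = socket.socket(fileno=LISTEN_FD)
    conn, _ = lsock.accept()
    conn.sendall(b"handled by worker %d" % os.getpid())
    conn.close()
    os._exit(0)

# Parent plays the client, to show a worker picked the connection up.
cli = socket.create_connection(("127.0.0.1", port))
data = cli.recv(64)
print(data)
os.waitpid(pid, 0)
```

With several forked workers all blocking in accept on the same fd, the kernel spreads incoming connections across them, which is the "nice and neat" fixed-pool case described above; the saturation problem is what happens beyond that pool.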
I agree with the general idea that FastCGI is just HTTP in different clothes. But my shared web host (DreamHost) allows running FastCGI processes but not HTTP processes. I assumed that’s because FastCGI has a protocol for process lifecycle management, like pooling/starting/killing processes.
If not, it should. But you can do the same thing with HTTP, of course. I think Heroku has some good guidelines on HTTP servers that can be run robustly. Basically, they have to be truly stateless so they can be killed/restarted/replicated at any time. The persistent process is purely an optimization, and the semantics are as if a new process were created every time, like CGI.
So, is FastCGI just a means of ensuring that someone doesn’t expose an incomplete/brittle HTTP implementation needlessly?
FastCGI was from an era when putting a scalable HTTP server in your process was super expensive. This is still (largely) the case for some languages, e.g. Ruby and Python, which is why things like Rack and WSGI exist. Those solutions are just specialized, language-specific implementations of the FastCGI ideals, really.
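The parallel is easy to see in WSGI itself: the server hands the app the request as CGI-style variables plus a body stream (the same kind of params a FastCGI server would send over the wire), and takes back a status, headers, and a body. A minimal example:

```python
# A minimal WSGI app, driven directly -- the "gateway" is just a calling
# convention, no HTTP server or socket required.
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    # environ carries CGI-style variables, just like FastCGI params:
    # REQUEST_METHOD, PATH_INFO, the HTTP_* headers, and so on.
    body = ("hello from " + environ["PATH_INFO"]).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

environ = {}
setup_testing_defaults(environ)     # fills in a plausible CGI environment
environ["PATH_INFO"] = "/demo"
captured = {}
def start_response(status, headers):
    captured["status"] = status
result = b"".join(app(environ, start_response))
print(captured["status"], result)   # 200 OK b'hello from /demo'
```

The difference from FastCGI is only where the boundary sits: in-process function call versus a socket protocol to a separate worker pool.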
So, the history here, briefly:
(I’m on my phone, so some liberties taken)
PHP over FastCGI rises (for some reason?)
I’m now using Apache + PHP-FPM.
Yeah, I just lost track of PHP after 2008. I expected pressure from other servers. But I also suspect the language’s evolution from “I’m a templating language” to “I’m a scripting language with classes and closures” has a lot to do with it, too.
FastCGI and other similar protocols also have the advantage that they don’t muddle or confuse the HTTP headers added by the forwarding with the HTTP headers from the original request, because the forwarding is handled through a completely different protocol. Given FastCGI or SCGI, it’s very easy for an HTTP app to be given the complete original headers and to act as if it’s directly in the context of the original HTTP server. This is not so true for HTTP forwarding, where you’re going to be stitching various things back up at some level (either in the app or in your reverse proxy server).
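A toy sketch of that difference (the function names and header choices here are illustrative, not any particular server’s behavior): a FastCGI-style gateway encodes the original request verbatim as params, while an HTTP reverse proxy has to overwrite `Host` and bolt on `X-Forwarded-*` headers the app must then un-stitch.

```python
# FastCGI-style: original headers pass through untouched as HTTP_* params.
def to_fastcgi_params(method, path, headers, client_ip):
    params = {"REQUEST_METHOD": method, "PATH_INFO": path,
              "REMOTE_ADDR": client_ip}
    for name, value in headers.items():
        params["HTTP_" + name.upper().replace("-", "_")] = value
    return params

# HTTP-forwarding style: the proxy must rewrite and add headers, mixing
# forwarding metadata into the same namespace as the original request.
def to_proxied_http(headers, client_ip, backend_host):
    fwd = dict(headers)
    fwd["X-Forwarded-For"] = client_ip          # added; app must trust it
    fwd["X-Forwarded-Host"] = headers["Host"]   # original Host stashed aside
    fwd["Host"] = backend_host                  # original value overwritten
    return fwd

orig = {"Host": "example.com", "Cookie": "k=v"}
params = to_fastcgi_params("GET", "/x", orig, "203.0.113.9")
fwd = to_proxied_http(orig, "203.0.113.9", "app.internal:8080")
print(params["HTTP_HOST"])   # example.com -- original, unstitched
print(fwd["Host"])           # app.internal:8080 -- rewritten by the proxy
```

In the first case the app sees exactly what the client sent; in the second it has to know which headers to trust and reverse.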
Yeah, this was my favorite feature when I used it — that and the fact that you had a stream to send messages back to the webserver’s error log.