A frequent problem with goroutines in long-running applications is handling panics. A goroutine spawned without a panic handler will crash the whole process on panic. This is usually undesirable.
I would argue that in a long-running server, by default, you do want a panic to crash the server in order to discover bugs that need to be fixed.
This is a good take, but it does depend on a few things. Most production codebases I’ve worked on have (or I’ve set up) some kind of log aggregator that picks up panics as red alerts; the difference between panics and logs marked as “error” is a minor detail of the log system’s configuration. A lot of the time, code is built with the assumption that things will crash (or at least, it should be), but it would be nice to avoid crashing if at all possible. If the only reason for crashing is to trigger alerts, you can do that with standard logging practices and do just as well without a full process restart.
Sounds like a reasonable position. I would add one caveat: you don’t want to give an attacker an easy and performant way to probe your system/environment until they find a successful way to break in, e.g. by learning about certain addresses in memory or paths in the filesystem. Configuring a delay on the order of seconds before service restart might strike the right balance between not being compromised and being available.
Some context on why it is sometimes useful to recover panics in spawned goroutines, which the conc.WaitGroup and conc.PanicCatcher types assist with. This is particularly relevant when writing servers with package net/http.
Note that the documentation for the http.Handler type says:
If ServeHTTP panics, the server (the caller of ServeHTTP) assumes that the
effect of the panic was isolated to the active request. It recovers the
panic, logs a stack trace to the server error log, and either closes the
network connection or sends an HTTP/2 RST_STREAM […]
As one might expect from the documentation, when a handler panics on its own goroutine (the one running ServeHTTP), the net/http stack recovers the panic and the whole process won’t crash.
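A minimal sketch of that first case, using only the standard library. The route names and the serverSurvivesHandlerPanic helper are made up for illustration; the point is just that a second request still succeeds after a handler has panicked on the request goroutine.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// serverSurvivesHandlerPanic (hypothetical helper) starts a test server,
// hits a handler that panics on the request goroutine, and then reports
// whether the same server still answers a normal request.
func serverSurvivesHandlerPanic() bool {
	mux := http.NewServeMux()
	mux.HandleFunc("/panic", func(w http.ResponseWriter, r *http.Request) {
		// Panics on the goroutine that runs ServeHTTP, so net/http recovers it
		// and logs a stack trace to the server error log.
		panic("boom")
	})
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})

	srv := httptest.NewServer(mux)
	defer srv.Close()

	// The client normally sees an error here because the connection is
	// dropped; either way, only the follow-up request matters.
	if resp, err := http.Get(srv.URL + "/panic"); err == nil {
		resp.Body.Close()
	}

	resp, err := http.Get(srv.URL + "/ok")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	fmt.Println("server survived:", serverSurvivesHandlerPanic())
}
```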
On the other hand, and perhaps a bit unexpectedly, in a handler like below, the whole process will crash. This is because there is no way for the net/http stack to recover the panic of a different goroutine (i.e. the one spawned from inside the handler).
func handler(w http.ResponseWriter, r *http.Request) {
	go boom()
}
In this situation, if instead of:
go boom()
one were to do:
var p conc.PanicCatcher
go p.Try(boom)
then the crash is avoided, as p.Try recovers any panic raised by the function supplied to it.
Instead of using p.Try, you could also recover the panic yourself, like so:
go func() {
	defer func() {
		if r := recover(); r != nil {
			...
		}
	}()
	boom()
}()
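To make the hand-rolled version concrete, here is a rough stdlib-only sketch. The try helper and the recovered type are hypothetical names, not conc's API; they only show the general idea of running a function, recovering its panic, and handing the value (plus a stack trace) back to whoever wants to log or alert on it.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// recovered (hypothetical type) carries what was recovered from a panic.
type recovered struct {
	value any    // the value passed to panic
	stack []byte // stack trace captured inside the deferred function
}

// try (hypothetical helper) runs f and returns a non-nil *recovered
// if f panicked, or nil if f returned normally.
func try(f func()) (rec *recovered) {
	defer func() {
		if v := recover(); v != nil {
			rec = &recovered{value: v, stack: debug.Stack()}
		}
	}()
	f()
	return nil
}

func boom() { panic("boom") }

func main() {
	done := make(chan *recovered, 1)

	// The recover happens inside the spawned goroutine, which is the only
	// place it can happen; the result is passed back over a channel.
	go func() { done <- try(boom) }()

	if rec := <-done; rec != nil {
		// In a server this is where an error-level log or alert would go.
		fmt.Printf("recovered panic: %v\n", rec.value)
	}
}
```

Sending the recovered value over a channel is just one option; logging it directly inside the deferred function works as well if nothing else needs to react to it.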
cool project!