I’m not sure I buy the argument against error logging. In a service, where one bad client should not stop the whole thing but errors might happen that halt that user’s session, why is it inappropriate to log something at an error level? I cannot pass it up and I cannot handle it. Something terrible has happened. Let’s say I only have an “info” log level. I feel confident that most developers will add some sort of special text to the log message, for example “ERROR”, in order to make it grep-friendly when reviewing logs.
I feel confident that most developers will add some sort of special text to the log message, for example “ERROR”, in order to make it grep-friendly when reviewing logs.
Indeed. I get the annoyance from the OP, but frankly, those extra levels tend to emerge over time because of maintenance requirements. When developing, it’s easy to see only two levels: debug stuff for me (the dev), and other stuff for me that someone else might also find useful (let’s call it “info”).
Over time, though, in operations, some information is simply more important. Someone made a connection? Great, useful info for metrics. Someone lost a connection? Well, maybe I need to pay attention to that. Wait, how do I check for that again? Oh yeah, look for this regexp. Huh, I want to be notified of these sorts of things when they happen. But now that means some giant regexp to catch all the cases. Gee, a specific prefix on the log messages would help here…
Someone made a connection? Great, useful info for metrics. Someone lost a connection? Well, maybe I need to pay attention to that.
You should be instrumenting your code for normal events like this; they should not bubble their way into logs. Then you build dashboards that aggregate, e.g., connection and connection-failure rates, broken down by error code or reason.
Another way of saying Dave’s point, I think, is that logs should be used for actionable information. Actionable either by the user/developer tailing the logs, or by the machine that will parse them. Nothing else belongs there.
What is instrumentation in this case but logging? Where the GP talked about text logs, logging things into other data stores follows the same principle. With a data store keeping all the error codes, you basically have an n-level log.
I don’t really understand your differentiation between logs and instrumentation. How are logs not instrumentation? I wish logs were performant enough that all logs/metrics/etc went through them so I could just consume them in something like Splunk and put it all in one place.
I liked the post even though I have no experience with Go (other than the occasional reading of a post like this), so take the following comment lightly.
I find a lot of statements that could sound “wrong” to a sysadmin, but as a developer I can see the logic behind them.
For instance, statements like these:
Nobody reads warnings, because by definition nothing went wrong. Maybe something might go wrong in the future, but that sounds like someone else’s problem.
As a developer, to me a “warning” usually means that my app will probably keep working. As a sysadmin, however, I’m the one who has to deal with these warnings and make sure they don’t happen.
I can’t say with confidence that I grasped what the “Let’s talk about fatal & error” sections were getting at, as they seem to be specific to the internals of Go.
However, I’d add one more item to the list of things you should log:
3. Things that system administrators and the guys who actually RUN your software on their servers care about.
I think one casualty of the DevOps era is that many teams don’t appreciate what has to go into a service to make it operable. For many teams it probably is only a big pain point once or twice a year, so maybe it’s fine, but I have seen a tendency for teams to think “we’re the developers, we’re the operators, we know what’s going on in the service, so we don’t need lots of metrics or logging like operators would need”. Unfortunately, many people overestimate their ability to debug a complex system without help. I’ve seen several devopsy talks about how to do it, and they often bring up “lots of metrics and logging” as if it’s a new idea. Now Docker is all the rage, and everything is even more black-boxed.
Many people are doing it well and figuring these things out. But I think there is a loud (hopefully) minority who might be good developers but have no idea about operations, and who are giving other people advice.
Regressing Go libraries towards 1990 until they suffer from the same impedance mismatch with large, complex use cases that the core language does is perhaps not the solution I would have gone for, but you have to admire the consistency of the ideology. <halfjoke>
I think we can all agree that regardless of your approach to logging, serializing objects to the log as a part of normal operation is literally the worst thing.
At a previous gig, the devs were fond of doing this. On the operations team, we had to “matrix” the logs: just sort of defocus your eyes and let the logs stream by. It was distressing how well it worked. When the developers wanted to centralize logs and lots of syslog messages were getting dropped, someone did some analysis and found that lines over 14MB long were unheard of, and lines over 5MB weren’t uncommon.
Every time you serialize an object to the log, Yahweh kills a kitten.