Great post, sums up most of the arguments about structured logging and best practices, although mostly for central log collection.
Don’t let the first paragraph scare you away from this post (systemd and the journal are mentioned); it’s well worth reading.
Well, it was a bit of a weird post too. It seemed the author was arguing for structured logging (which is great), but also oddly arguing for logging in a binary format. Yet the conclusion seemed to be that you should store your logs in a log-processing/introspection engine, which may itself use a binary format (like logstash, splunk, whatever). Eh? Well sure.
I think when most people talk about “you should store logs in text format”, they are talking about archival logging. I don’t consider introspection engines as really being archival. Just have them ingest the log.
I am ALL FOR structured logging, and/or logging in some parseable text format (json, netstrings). I am “ok” with logging in well understood/popular binary formats (like protobufs, thrift, capnproto), as long as the schema is stored close by and clearly identified. That is less optimal, but it may be worth it if you ingest the logs often and really want to save time on future ingestion. However, I am generally not in favor of bespoke binary formats for archival logging.
Text has a nice property of being relatively readable and easily discernible for future developers, perhaps even if the original system that used it is long gone. It might not be pretty, and the format may change over time, but still the barrier is relatively low.
Custom binary formats, on the other hand, may have to be reverse engineered (which can be a fun process in its own right, I suppose) before you can even determine whether the data is relevant or useful, and there may also be many different versions or revisions to deal with.
I have seen big systems crash and burn hard, and being able to replay easily understood archival data saved the day. If your data isn’t that important, or is only transient, then it doesn’t really matter much either way I guess.
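To make the "parseable text format" point concrete, here is a minimal sketch of one-JSON-object-per-line archival logging and replaying it later. The field names (`ts`, `level`, `msg`, `host`) and the helpers `write_record` and `replay` are made up for illustration, and the `StringIO` just stands in for a real archive file.

```python
import io
import json


def write_record(stream, **fields):
    # One JSON object per line; sort_keys keeps the output stable across runs.
    stream.write(json.dumps(fields, sort_keys=True) + "\n")


def replay(stream):
    # Yield each archived record back as a dict, skipping blank lines.
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


archive = io.StringIO()  # hypothetical stand-in for an archival log file
write_record(archive, ts="2015-05-07T12:00:00Z", level="error",
             msg="disk full", host="db1")
write_record(archive, ts="2015-05-07T12:00:05Z", level="info",
             msg="retrying", host="db1")

archive.seek(0)
records = list(replay(archive))
errors = [r for r in records if r["level"] == "error"]
```

The archive stays readable with nothing more than a text viewer, and anything that speaks JSON can ingest it again.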
I (author here) would recommend neither a proprietary logging format nor one that isn’t well understood and documented. Custom formats can be incredibly useful if used well. Nevertheless, having tools for the format and documentation of it is essential, whatever kind of storage format it is.
Thanks for clarifying! I guess in the end we don’t really disagree after all then. :)
Another common problem with text logs is that the usefulness of grepping/awking/whatever them goes way down if they don’t have a clean one-line-per-record format. For example, Postfix’s log of delivery attempts is spread across multiple lines (possibly interleaved), which you have to reconstruct if you want to get all the information about the delivery.
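As a rough sketch of that reconstruction step: group the lines by queue ID, so each delivery's records end up together even when they were interleaved. The sample lines are invented, only modeled on the usual Postfix syslog layout, and the regex is an assumption that may need adjusting for a given setup (long queue IDs, for instance, would not match it).

```python
import re
from collections import defaultdict

# Assumed layout: "... postfix/<daemon>[<pid>]: <QUEUEID>: <rest>"
LINE_RE = re.compile(
    r"postfix/(?P<daemon>[\w-]+)\[\d+\]: (?P<qid>[0-9A-F]{6,}): (?P<rest>.*)"
)


def group_by_queue_id(lines):
    # Collect (daemon, rest) pairs per queue ID, preserving log order.
    deliveries = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            deliveries[m.group("qid")].append((m.group("daemon"), m.group("rest")))
    return dict(deliveries)


# Made-up sample with two interleaved deliveries.
SAMPLE = [
    "May  7 12:00:01 mail postfix/qmgr[999]: 3F2A41C0D1: from=<a@example.org>, size=1234, nrcpt=1 (queue active)",
    "May  7 12:00:01 mail postfix/qmgr[999]: 7B9C02E4F2: from=<c@example.org>, size=900, nrcpt=1 (queue active)",
    "May  7 12:00:02 mail postfix/smtp[1240]: 3F2A41C0D1: to=<b@example.net>, relay=mx.example.net[192.0.2.10]:25, status=sent (250 OK)",
    "May  7 12:00:03 mail postfix/smtp[1241]: 7B9C02E4F2: to=<d@example.net>, relay=none, status=deferred (connection timed out)",
    "May  7 12:00:03 mail postfix/qmgr[999]: 3F2A41C0D1: removed",
]

deliveries = group_by_queue_id(SAMPLE)
```

This is exactly the kind of bookkeeping a plain grep can't do for you; with one record per delivery it wouldn't be needed at all.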
Since I don’t want to submit another, closely related, story, I’ll mention here that, based on some of the feedback I received here and elsewhere, I wrote a followup to clarify a thing or two: http://asylum.madhouse-project.org/blog/2015/05/07/grepping-logs-is-still-terrible/
It has some regexp love and a simple example (that still satisfies my small system requirements).