logfmt is great! Splunk has amazing searching for logfmt values too. You can get the request ID from a customer’s issues and then instantly go around and search for it, seeing the entire distribution of everything across the entire cluster. It’s great to see more people discover it.
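(For anyone who hasn’t seen it: logfmt is just whitespace-separated key=value pairs on one line. A made-up example, with an invented request ID:)

```
level=info msg="charged card" request_id=req_8f3a user_id=42 duration_ms=187
```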
@apg I made a simple logging package in Go called ln for doing this with full context-awareness. You can stuff key->value pairs in contexts as well as make any other data type feed out key->value pairs. The heart of this library is my favorite Go function I’ve ever written:
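(The snippet itself didn’t survive here; going by the discussion below it is presumably something close to this identity method on ln’s `F` type — a sketch, exact names may differ:)

```go
// F is a set of key->value pairs to log.
type F map[string]interface{}

// Fer is anything that can produce key->value pairs for logging.
type Fer interface {
	F() F
}

// F already is a set of key->value pairs, so it satisfies Fer
// by simply returning itself.
func (f F) F() F {
	return f
}
```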
Can you explain that function? I’m not sure how to read it.
It’s an identity method, kind of like `F F() { return this; }` in Java (if that’s even valid Java), unless I’m missing something.
But it takes no parameters somehow? Just type parameters? Or am I still misreading the Go?
A method in Go “implicitly”/“forcefully” takes a parameter (the `(f F)`), which is like the `this` in Java.

Here’s an analogy. Let’s say you have an interface for temperature. The interface specifies a `getCelsius()` method. Let’s say you have a temperature in celsius. How do you implement the interface? By just returning yourself.
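(As a concrete sketch of that analogy, with made-up names — not part of ln itself:)

```go
// Temperature is anything that can report its value in Celsius.
type Temperature interface {
	GetCelsius() Celsius
}

// Celsius is a temperature that is already in Celsius.
type Celsius float64

// GetCelsius satisfies Temperature by just returning itself,
// the same trick as F's identity method above.
func (c Celsius) GetCelsius() Celsius {
	return c
}
```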
Of course, the whole reason the interface was called `F` is because logrus used `Fields`, which was overly verbose. I wanted something easy to type. It had an interesting property, though, that you’re often logging because of an F’in failure, so it was perfect.

Isn’t that just an identity function for a constructor, like `__new__(cls, item): if isinstance(item, cls): return item`?

After trying logfmt at my current job, we actually ended up giving up and moving back to json logging. Writing logfmt logs is supported fairly well, but parsing it is not. When we tried a year ago, SumoLogic didn’t have a good way of parsing logs formatted like this, and it’s particularly challenging to approximate something using a regex because fields can be quoted or not (and quotes can be escaped).
On the other hand, json works out of the box and is fairly standard. We lose some of the human readability, but as we’re using python it’s fairly trivial to simply use a different log formatter.
That’s a bit sad, it seems to me it’d be quite easy to write a parser for this format by hand. Not with a single regex, but really, the delimiters are just `=`, the space, and `"`, right? Tokenizing and unquoting strings doesn’t seem infeasible. Of course if all you have is a custom regex in a tool, yes, that’s a pity.

It definitely would be fairly easy to write a parser by hand. Unfortunately (and with good reason) you can’t do this with most hosted logging platforms. So we went with the least common denominator so it should work everywhere. I generally prefer interoperability anyway.
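(For tooling you control, that hand-written parser really is only a few dozen lines. A rough Go sketch — simplified, and not taken from any of the libraries or platforms mentioned here:)

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLogfmt splits one logfmt line into key/value pairs.
// A key runs up to '='; a value is either bare (up to the next space)
// or double-quoted with backslash escapes. Simplified: it does not
// handle every escaping corner case.
func parseLogfmt(line string) (map[string]string, error) {
	fields := map[string]string{}
	i := 0
	for i < len(line) {
		for i < len(line) && line[i] == ' ' { // skip spaces between pairs
			i++
		}
		if i == len(line) {
			break
		}
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' { // key
			i++
		}
		key := line[start:i]
		value := ""
		if i < len(line) && line[i] == '=' {
			i++
			if i < len(line) && line[i] == '"' { // quoted value
				j := i + 1
				for j < len(line) && !(line[j] == '"' && line[j-1] != '\\') {
					j++
				}
				if j == len(line) {
					return nil, fmt.Errorf("unterminated quote at byte %d", i)
				}
				unquoted, err := strconv.Unquote(line[i : j+1])
				if err != nil {
					return nil, err
				}
				value = unquoted
				i = j + 1
			} else { // bare value
				start = i
				for i < len(line) && line[i] != ' ' {
					i++
				}
				value = line[start:i]
			}
		}
		fields[key] = value
	}
	return fields, nil
}

func main() {
	m, err := parseLogfmt(`level=info msg="hello \"world\"" request_id=abc123`)
	fmt.Println(m, err)
}
```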
We decided to embrace our inner neckbeard and claim that syslog has worked for the last 40 years, so it’s probably ok for another 10. In a nutshell we add JSON structure to our logging by just adding it at the end of the log text. Here’s an example of how that might look:
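(The original example isn’t preserved here; the shape, with invented app name and IDs, is roughly a normal syslog message with a JSON object appended:)

```
Oct 12 14:03:07 host1 recorder[4711]: recording started {"recordingId": "rec_123", "userId": "user_456", "sessionId": "sess_789", "data": {"codec": "opus", "region": "eu-west-1"}}
```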
The first level of keys in the JSON object is something we internally call “well known” fields: `recordingId`, `userId` and `sessionId` are all indexed in our log db. `data` is a free-form field, also somewhat searchable, not indexed, but it can have any shape that is useful to the event.

The advantage of this approach is that any “standard” off-the-shelf software that talks syslog can be slotted into our logging infra without translation.
We do support `apiKey` and some other “tags”, but this is not really security, but rather a mechanism we could use in case of DDoS attacks or similar.

Our log collection paths are roughly:
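(The original diagram isn’t preserved; reconstructed from the description below, so the exact hops are guesses:)

```
app -> syslog -> AWS Kinesis -> log ingester -> logdb -> lq / web UI
```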
AWS Kinesis means we can absorb spikes in input without dropping anything. The log ingester parses and prepares stuff for indexing in logdb.
Then we have bespoke UI and CLI tools to query this in realtime. I can just write `lq -B10m -a myapp -f` to get the last 10m of input to `myapp` and “follow”. Seriously. After a long spell of using external log collection providers, having a bespoke logging funnel and tools tailored to your needs is amazing.

I think Avro with a pubsub system like Kafka has much better coherence.
That sounds very heavy-handed. Why Avro and not protobuf, for a start? And can you use angle-grinder to parse the logs afterwards?
Logs can also be a thing that exists on one machine, with syslog or journald.