1. 14
    1. 4

      logfmt is great! Splunk has amazing searching for logfmt values too. You can take the request ID from a customer’s issue and instantly search for it, seeing the distribution of everything related to it across the entire cluster. It’s great to see more people discover it.

      @apg I made a simple logging package in Go called ln for doing this with full context-awareness. You can stuff key->value pairs in contexts as well as make any other data type feed out key->value pairs. The heart of this library is my favorite Go function I’ve ever written:

      // F makes F an Fer
      func (f F) F() F {
        return f
      }
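
      Roughly, the context part works like this. This is only an illustrative sketch of the idea (stuffing key->value pairs into a context.Context and merging them at log time), not ln’s actual API:

      package main

      import (
          "context"
          "fmt"
      )

      // F is a bag of log fields, as in the snippet above.
      type F map[string]interface{}

      type ctxKey struct{}

      // WithF returns a context carrying extra log fields.
      func WithF(ctx context.Context, f F) context.Context {
          return context.WithValue(ctx, ctxKey{}, f)
      }

      // Log merges the fields stored in the context with the fields passed
      // at the call site and prints them as key=value pairs.
      func Log(ctx context.Context, extra F) {
          merged := F{}
          if f, ok := ctx.Value(ctxKey{}).(F); ok {
              for k, v := range f {
                  merged[k] = v
              }
          }
          for k, v := range extra {
              merged[k] = v
          }
          for k, v := range merged {
              fmt.Printf("%s=%v ", k, v)
          }
          fmt.Println()
      }

      func main() {
          ctx := WithF(context.Background(), F{"request_id": "abc123"})
          Log(ctx, F{"msg": "hello"})
      }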
      
      1. 2

        Can you explain that function? I’m not sure how to read it.

        1. 2

          It’s an identity method, kind of like

          public class Class {
              public Class Class() {
                  return this;
              }
          }
          

          in Java (if that’s even valid Java), unless I’m missing something.

          1. 1

            But it takes no parameters somehow? Just type parameters? Or am I still misreading the Go?

            1. 3

              A method in Go implicitly takes an extra parameter, the receiver (the (f F) part), which plays the role of this in Java.
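
              A tiny sketch (the F type here is just illustrative) might make that concrete: a method expression lets you call the very same method as an ordinary function whose first argument is the receiver.

              package main

              import "fmt"

              // F is a bag of fields; the method below just returns its receiver.
              type F map[string]interface{}

              func (f F) F() F { return f }

              func main() {
                  fields := F{"user": "martin"}

                  // Method call: the receiver is passed implicitly, like this in Java.
                  fmt.Println(fields.F())

                  // Method expression: the same method as a plain function taking
                  // the receiver as its first (and only) parameter.
                  fmt.Println(F.F(fields))
              }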

        2. 2

          Here’s an analogy. Let’s say you have an interface for temperature, and the interface specifies a getCelsius() method. Now say you already have a temperature in Celsius. How do you implement the interface? By just returning yourself.

          Celsius getCelsius() {
              return this;
          }
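
          In Go the same analogy could look something like this (a sketch with made-up type names): the type that already is a Celsius value satisfies the interface by returning itself.

          package main

          import "fmt"

          // Temperature is anything that can report itself in Celsius.
          type Temperature interface {
              GetCelsius() Celsius
          }

          // Celsius is already a temperature in Celsius, so it satisfies
          // the interface by simply returning itself.
          type Celsius float64

          func (c Celsius) GetCelsius() Celsius { return c }

          func main() {
              var t Temperature = Celsius(21.5)
              fmt.Println(t.GetCelsius()) // 21.5
          }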
          
      2. 2

        Of course, the whole reason the interface was called F is that logrus used Fields, which was overly verbose. I wanted something easy to type. It had an interesting property, though: you’re often logging because of an F’in failure, so it was perfect.

      3. 1

        Isn’t that just an identity function for a constructor, like __new__(cls, item): if isinstance(item, cls): return item?

    2. 3

      After trying logfmt at my current job, we actually ended up giving up and moving back to JSON logging. Writing logfmt logs is supported fairly well, but parsing them is not. When we tried a year ago, SumoLogic didn’t have a good way of parsing logs formatted like this, and it’s particularly challenging to approximate a parser with a regex because fields can be quoted or not (and quotes can be escaped).

      On the other hand, JSON works out of the box and is fairly standard. We lose some of the human readability, but since we’re using Python it’s fairly trivial to swap in a different log formatter.
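
      For comparison, here’s a sketch in Go (rather than Python, but the idea is the same): the standard JSON encoder handles quoting and escaping for you, while a hand-rolled key=value line is easier on the eyes but leaves those rules to you and to whoever parses it later.

      package main

      import (
          "encoding/json"
          "fmt"
      )

      func main() {
          fields := map[string]interface{}{
              "level":      "info",
              "msg":        `disk "full"`,
              "request_id": "abc123",
          }

          // JSON: machine-friendly, escaping handled by the encoder.
          b, _ := json.Marshal(fields)
          fmt.Println(string(b))

          // logfmt-ish: easier for humans to scan, but the quoting and
          // escaping rules are on you (and on whoever parses it later).
          for k, v := range fields {
              fmt.Printf("%s=%q ", k, v)
          }
          fmt.Println()
      }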

      1. 1

        That’s a bit sad; it seems to me it’d be quite easy to write a parser for this format by hand. Not with a single regex, but really, the only delimiters are =, the space, and ", right? Tokenizing and unquoting strings doesn’t seem infeasible. Of course, if all you have is a custom regex in a tool, then yes, that’s a pity.
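
        Something like this sketch seems workable (assuming keys are bare words and values are either unquoted runs or double-quoted with backslash escapes, so strconv.Unquote can do the unescaping); it’s not production code, just to show the tokenizing isn’t much work:

        package main

        import (
            "fmt"
            "strconv"
            "strings"
        )

        // parseLogfmt turns a line like `level=info msg="a \"quoted\" value"`
        // into a map of key -> value.
        func parseLogfmt(line string) (map[string]string, error) {
            out := map[string]string{}
            for i := 0; i < len(line); {
                for i < len(line) && line[i] == ' ' { // skip spaces between pairs
                    i++
                }
                if i == len(line) {
                    break
                }
                eq := strings.IndexByte(line[i:], '=')
                if eq < 0 {
                    return nil, fmt.Errorf("missing '=' in %q", line[i:])
                }
                key := line[i : i+eq]
                i += eq + 1
                if i < len(line) && line[i] == '"' { // quoted value
                    j := i + 1
                    for j < len(line) && line[j] != '"' {
                        if line[j] == '\\' {
                            j++ // skip the escaped character
                        }
                        j++
                    }
                    if j >= len(line) {
                        return nil, fmt.Errorf("unterminated quote for %q", key)
                    }
                    val, err := strconv.Unquote(line[i : j+1])
                    if err != nil {
                        return nil, err
                    }
                    out[key] = val
                    i = j + 1
                } else { // bare value, runs to the next space
                    j := i
                    for j < len(line) && line[j] != ' ' {
                        j++
                    }
                    out[key] = line[i:j]
                    i = j
                }
            }
            return out, nil
        }

        func main() {
            m, err := parseLogfmt(`level=info msg="a \"quoted\" value" request_id=abc123`)
            fmt.Println(m, err)
        }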

        1. 2

          It definitely would be fairly easy to write a parser by hand. Unfortunately (and with good reason) you can’t do this with most hosted logging platforms. So we went with the least common denominator so it should work everywhere. I generally prefer interoperability anyway.

    3. 2

      We decided to embrace our inner neckbeard and claim that syslog has worked for the last 40 years, so it’s probably OK for another 10. In a nutshell, we add JSON structure to our logging by simply appending it to the end of the log text. Here’s an example of how that might look:

      <142>1 2019-03-18T13:12:27.000+00:00 my-host fumar \
                   1.2.3 - [fumar@53595 apiKey="secret stuffz" env="development"] \
                   Hello world! {"recordingId":"abc123","userId":"martin",\
                   "sessionId":"my session","data":{"stuff":42}}
      

      The first level of keys in the JSON object is something we internally call “well known” fields: recordingId, userId, and sessionId are all indexed in our log db. data is a free-form field, also somewhat searchable (not indexed), and it can have any shape that is useful to the event.

      The advantage of this approach is that any “standard” off-the-shelf software that talks syslog can be slotted into our logging infra without translation.

      We do support apiKey and some other “tags”, but these aren’t really for security; they’re rather a mechanism we could use in case of DDoS attacks or similar.
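
      To make the “JSON at the end of the message” idea concrete, here’s a rough sketch of the ingester side (the field names come from the example above; everything else is illustrative, and the splitting is naive, so it would need more care if the free text itself could contain a brace):

      package main

      import (
          "encoding/json"
          "fmt"
          "strings"
      )

      // wellKnown mirrors the indexed fields from the example message.
      type wellKnown struct {
          RecordingID string          `json:"recordingId"`
          UserID      string          `json:"userId"`
          SessionID   string          `json:"sessionId"`
          Data        json.RawMessage `json:"data"` // free-form, kept as-is
      }

      // splitMsg separates the human-readable text from the trailing JSON
      // object by splitting at the first '{' and unmarshalling the rest.
      func splitMsg(msg string) (text string, fields wellKnown, err error) {
          i := strings.Index(msg, "{")
          if i < 0 {
              return msg, fields, nil // no structured part
          }
          text = strings.TrimSpace(msg[:i])
          err = json.Unmarshal([]byte(msg[i:]), &fields)
          return text, fields, err
      }

      func main() {
          msg := `Hello world! {"recordingId":"abc123","userId":"martin","sessionId":"my session","data":{"stuff":42}}`
          text, f, err := splitMsg(msg)
          fmt.Println(text, f.RecordingID, f.UserID, string(f.Data), err)
      }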

      Our log collection paths are roughly:

      • => http-to-syslog-service => AWS kinesis => log-ingester => logdb
      • => AWS kinesis => log-ingester => logdb

      AWS Kinesis means we can absorb spikes in input without dropping anything. The log ingester parses and prepares everything for indexing in logdb.

      Then we have bespoke UI and CLI tools to query this in real time. I can just write lq -B10m -a myapp -f to get the last 10 minutes of input to myapp and “follow”. Seriously. After a long spell of using external log collection providers, having a bespoke logging funnel and tools tailored to your needs is amazing.

    4. 2

      I think Avro with a pubsub system like Kafka has much better coherence.

      1. 1

        That sounds very heavy-handed. Why avro and not protobuf, for a start? And can you use angle-grinder to parse the logs afterwards?

        Logs can also be a thing that exists on one machine, with syslog or journald.