1. 7

How do you guys handle centralized logging? I was thinking of using the ELK stack. If anyone has used it, how has your experience been?

To elaborate:
- I’ll push nginx logs & application logs
- Nginx logs are at 3 gigs a day
- Application logs are at 30 gigs a day

  1.  

  2. 1

    I use a SEK (syslog-ng, Elasticsearch, Kibana) stack. For various reasons, I’m not a big fan of Logstash, but ES & Kibana are terrific tools. So I use syslog-ng to collect pretty much all logs, and I push them to a central syslog server after some pre-processing on the client side: all logs are pre-parsed and formatted as JSON for easy indexing. The clients keep the last day’s worth of logs on local disk, and also send the same data to the central collector. The central collector pushes the data to ES, and depending on the configuration it may do further processing too (correlation, for example). For instance, related application logs are indexed both individually and combined.

    I’m pretty happy with this setup so far.
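
To make the "pre-parsed and formatted as JSON" step in this comment concrete, here is a minimal Python sketch. It is a stand-in for illustration only: the setup described does this inside syslog-ng, and the line format, the `SYSLOG_LINE` pattern, and the `to_json` helper are all assumptions rather than the commenter's actual configuration.

```python
import json
import re

# Hypothetical pattern for a classic BSD-syslog style line, e.g.
# "Jan 12 10:15:02 laptop msmtp[4321]: mail sent"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def to_json(line: str) -> str:
    """Turn one raw log line into a JSON document ready for indexing."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        # Keep unparsable lines instead of silently dropping them.
        return json.dumps({"message": line, "parse_error": True})
    fields = match.groupdict()
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])  # index the PID as a number
    else:
        del fields["pid"]  # the program did not log a PID
    return json.dumps(fields)


if __name__ == "__main__":
    print(to_json("Jan 12 10:15:02 laptop msmtp[4321]: mail sent"))
```

Parsing on the client means the central collector and ES receive structured fields (`host`, `program`, `pid`, `message`) and do not have to re-parse free text.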

    1. 1

      How much log data (in size) do you retain on ES?

      Could you explain a bit more on “related application logs are indexed both individually, and combined too”?

      1. 1

        On my personal system (2 servers + a desktop + 2 laptops), which is fairly small, I have about 1 GB of data so far.

        As for indexing related app logs: when I send an email from my laptop, the local msmtp generates logs. Then the mail reaches my server, and postfix generates some more logs. The msmtp and postfix logs are each indexed separately. However, I also do correlation: I pull the msmtp + postfix logs together into one event, and that gets indexed too.
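
The msmtp + postfix correlation described here can be sketched roughly as follows. This is only an illustration in Python, not the commenter's syslog-ng configuration. It assumes both programs' parsed events carry a shared `message_id` field, which is a hypothetical field name, and `correlate` is a made-up helper.

```python
from collections import defaultdict


def correlate(events, key="message_id"):
    """Group events that share `key` and emit one combined event per group."""
    groups = defaultdict(list)
    for event in events:
        if key in event:
            groups[event[key]].append(event)

    combined = []
    for value, members in groups.items():
        if len(members) < 2:
            continue  # a lone event has nothing to be correlated with
        combined.append({
            key: value,
            "programs": sorted({m.get("program", "unknown") for m in members}),
            "events": members,
        })
    return combined


if __name__ == "__main__":
    events = [
        {"program": "msmtp", "message_id": "abc", "message": "sent"},
        {"program": "postfix", "message_id": "abc", "message": "received"},
    ]
    # Index each original event on its own, plus every combined event.
    for doc in events + correlate(events):
        print(doc)
```

The individual events stay searchable on their own, while the combined event gives a single document covering the mail's whole path.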

    2. 1

      A few more questions:
      - If your requests span multiple services, do you pass a tracking ID across them that gets printed in the logs?
      - If using ES, are the logs in ES your source of truth, or do you also keep a copy on disk?
      - ELK solves monitoring for patterns and aggregations. Do you also use it for “live tailing” during deployments to monitor errors? Or do you use some other distributed tailing/multitail for that?
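
For readers unfamiliar with the tracking-ID idea in the first question, here is one way it might look in Python. Everything in it is an assumption made for illustration: the `X-Request-Id` header name is a common convention and not something from this thread, and `make_logger` and `handle_request` are hypothetical helpers.

```python
import contextvars
import io
import logging
import uuid

# Holds the tracking ID of the request currently being handled.
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current tracking ID to every log record."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def make_logger(stream):
    logger = logging.getLogger("tracking-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(request_id)s %(name)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.handlers = [handler]
    return logger


def handle_request(logger, headers):
    """Reuse the caller's tracking ID, or mint one at the edge."""
    rid = headers.get("X-Request-Id") or uuid.uuid4().hex
    token = request_id.set(rid)
    try:
        logger.info("handling request")
        # Headers to forward on calls to downstream services.
        return {"X-Request-Id": rid}
    finally:
        request_id.reset(token)


if __name__ == "__main__":
    buf = io.StringIO()
    handle_request(make_logger(buf), {"X-Request-Id": "abc123"})
    print(buf.getvalue(), end="")
```

If every service logs the same ID, a single query on that field in ES pulls up one request's log lines across all services.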