1. 9

  2. 2

    That looks very, very nice. I haven’t tried the code out yet, but the docs, at least, look excellent.

    About half a year back, pushcx posted this article: Provenance and Causality in Distributed Systems, by Jessica Kerr. Kerr’s idea is not quite like your Eliot, but similar enough that I thought I’d post it here. Here’s its first paragraph:

    Can you take a piece of data in your system and say what version of code put it in there, based on what messages from other systems? and what information a human viewed before triggering an action?

    The last paragraph:

    I think this is something we could do, it’s within our ability today. I haven’t seen a system that does it, yet. Is it because we don’t care enough – that we’re willing to say “yeah, I don’t know why it did that, can’t reproduce, won’t fix”? Is it because we’ve never had it before – if we once worked in a system with this kind of traceability, would we refuse to ever go back?

    In Kerr’s sketch, actions are motivated by information at a point in time-and-space, and the causal logging system generates a DAG graph (yes, yes, PIN number) where each action is a descendant of the (relevant subset of) information available on that device at that time. This lets one explain actions (and their results) after the fact, even when the system changes state in the meantime.
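To make the DAG idea concrete, here is a minimal sketch in plain Python. It is not taken from Kerr's article: the `Event` class, the `explain` helper, and every event name are hypothetical, chosen only to show how walking the ancestors of a piece of data answers "what caused this?".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One node in a causal DAG: a piece of information or an action."""
    name: str
    causes: tuple = ()  # the events this one was derived from


def explain(event):
    """Return the names of all ancestors of `event`, nearest causes first."""
    seen = []
    queue = list(event.causes)
    while queue:
        cause = queue.pop(0)
        if cause.name not in seen:  # a shared ancestor is reported once
            seen.append(cause.name)
            queue.extend(cause.causes)
    return seen


# Hypothetical history of one row in a database.
code_version = Event("code v1.4.2")
price_message = Event("price message from billing service")
screen = Event("order summary shown to user", causes=(price_message, code_version))
click = Event("user clicked confirm", causes=(screen,))
order_row = Event("order row written", causes=(click, code_version))

ancestors = explain(order_row)
for name in ancestors:
    print(name)
```

Note that `code v1.4.2` is an ancestor along two paths, which is what makes this a DAG rather than a tree.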

    Eliot’s model, unless I am mistaken, is that actions are motivated by higher-level actions, all the way up to a user-initiated task: each action generates a tree of actions. No less nifty. I look forward to getting a chance to use it.
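As a rough illustration of the tree-of-actions model (and with the same "unless I am mistaken" caveat), the sketch below imitates the idea with the standard library only. It deliberately does not use Eliot's API: the `Task` class, the `path` field, and the action names are hypothetical stand-ins.

```python
import contextlib
import uuid


class Task:
    """A user-initiated task; every nested action logs under one task id."""

    def __init__(self):
        self.task_uuid = str(uuid.uuid4())
        self.messages = []
        self._stack = []  # action types currently in progress, outermost first

    def _log(self, status):
        self.messages.append({
            "task_uuid": self.task_uuid,
            "path": "/".join(self._stack),  # where this action sits in the tree
            "status": status,
        })

    @contextlib.contextmanager
    def action(self, action_type):
        self._stack.append(action_type)
        self._log("started")
        status = "succeeded"
        try:
            yield
        except Exception:
            status = "failed"
            raise
        finally:
            self._log(status)
            self._stack.pop()


task = Task()
with task.action("http_request"):
    with task.action("load_user"):
        pass
    with task.action("render_page"):
        pass

paths = [(m["path"], m["status"]) for m in task.messages]
for path, status in paths:
    print(path, status)
```

Each child action is recorded beneath the action that started it, so the log can be read back as a tree rooted at the original request.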

    1. 2

      That sounds rather more ambitious than Eliot, and pretty fascinating: I’ll take a look. Thanks for the link!

    2. 1

      This is kind of a minor thing, but when I think about deploying Eliot in production I want to know that I can collect logs from disparate hosts and display the cool eliot-tree output for analysis. The documentation includes a recipe to collect logs from one host (via the systemd journal) and I can easily imagine pulling out an eliot-tree compatible stream from there. The documentation also includes a recipe to collect logs from multiple hosts (via LogStash/ElasticSearch) which would be great, but I’m not sure how to produce eliot-tree output from that.

      I could probably figure it out myself with a week of messing about, but I’d rather go to my boss with “here’s a cool thing I know I can do” than “here’s a cool thing that I might be able to do if we invest some time and effort into research”. And a solution where we configure an ELK stack in AWS ourselves without having to convince the Infrastructure team to set up a centralised logging system for us would be even better.

      1. 2

        eliot-tree just takes a dump of JSON messages, and ElasticSearch has the ability to easily dump messages with curl. So it’s basically curl http://elasticsearch/<something> | eliot-tree.

        Alternatively Kibana can show you a pseudo-tree view if you just sort by the task_uuid + task location.
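The sorting trick can be sketched in a few lines of standard-library Python. This assumes each message is a JSON object with `task_uuid` and `task_level` fields (taking "task location" to mean `task_level`, a list of integers); the `pseudo_tree` helper and the sample messages are made up for illustration and are not a substitute for eliot-tree.

```python
import json


def pseudo_tree(lines):
    """Sort JSON log lines by (task_uuid, task_level) and indent by depth."""
    messages = [json.loads(line) for line in lines if line.strip()]
    messages.sort(key=lambda m: (m["task_uuid"], m["task_level"]))
    out = []
    current_task = None
    for m in messages:
        if m["task_uuid"] != current_task:
            current_task = m["task_uuid"]
            out.append(current_task)  # one heading per task
        indent = "  " * len(m["task_level"])
        label = m.get("action_type") or m.get("message_type", "?")
        status = m.get("action_status", "")
        out.append(f"{indent}{label} {status}".rstrip())
    return out


# Made-up messages, deliberately out of order, as a log dump might be.
sample = [
    '{"task_uuid": "task-a", "task_level": [3], "action_type": "request", "action_status": "succeeded"}',
    '{"task_uuid": "task-a", "task_level": [2, 1], "action_type": "db_query", "action_status": "started"}',
    '{"task_uuid": "task-a", "task_level": [1], "action_type": "request", "action_status": "started"}',
    '{"task_uuid": "task-a", "task_level": [2, 2], "action_type": "db_query", "action_status": "succeeded"}',
]

rendered = pseudo_tree(sample)
print("\n".join(rendered))
```

Lists compare element by element in Python, so sorting on `task_level` puts each child message between its parent's start and end messages.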

        More broadly, there are two issues which I think are best thought of separately:

        1. Log collection and aggregation. This is the ops team’s purview, typically.
        2. Generating logs. This is the developers’ purview, typically.

        Eliot doesn’t address the first problem much; its main focus is generating logs in a way that makes them amenable to interpretation. You can use e.g. Python standard library logging and still be in the same boat, needing to set up collection and aggregation.

        For aggregation, AWS has hosted ElasticSearch, but that doesn’t solve the log collection issue. You’d have to set up Logstash or Fluentd on each host. If you’re using something like Kubernetes then log collection is happening for each container and I assume (dangerous word, I know) there’s some way to easily aggregate it.

        1. 2

          I filed a couple of issues to cover this in the documentation.