1. 1

    Thanks, this was really useful. I’ve reached for git blame a bunch of times, and I don’t think I’ve ever actually found what I needed from it.

    1. 1

      Wow, this (simple) data pipeline architecture is almost exactly what I was looking for 5 years ago for something at work. I ended up writing something very similar to this so we could wire up disparate APIs to talk to each other and synchronize data across different providers, platforms & environments.

      I’d love to see some in-depth, real-world (read: gnarly) setups that use this; I’m curious how it handles integrating all of these features into a functioning system.

      1. 2

        It’s a little old now, but there’s a blog post from Meltwater loosely outlining how they use it for enrichments: https://underthehood.meltwater.com/blog/2019/08/26/enriching-450m-docs-daily-with-a-boring-stream-processor/. The inputs and outputs are fairly basic, as it’s consuming from Kafka and writing back to Kafka, but in between it has to perform a network of HTTP enrichments, each with a unique API and format for batching messages (JSONL, JSON arrays, individual parallel requests, etc.).

        The more modern way of doing it is outlined in a simpler example here: https://www.benthos.dev/cookbooks/enrichments/.
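        For anyone curious what that pattern looks like in practice, here’s a minimal sketch of a Benthos config following the enrichment approach from that cookbook. The topic names, broker address, and enrichment endpoint are all hypothetical placeholders, not from either article:

        ```yaml
        # Consume raw documents from one Kafka topic (names are illustrative).
        input:
          kafka:
            addresses: [ "localhost:9092" ]
            topics: [ "documents" ]
            consumer_group: "enrichers"

        pipeline:
          processors:
            # branch sends a mapped copy of each message to an enrichment
            # service, then merges the response back into the original payload.
            - branch:
                request_map: 'root.text = this.text'
                processors:
                  - http:
                      url: http://enricher.internal/api/v1/enrich  # hypothetical endpoint
                      verb: POST
                result_map: 'root.entities = this.entities'

        # Write the enriched documents back out to another Kafka topic.
        output:
          kafka:
            addresses: [ "localhost:9092" ]
            topic: "documents-enriched"
        ```

        The nice part is that each enrichment is just another `branch` processor, so a “network” of them is a list of branches (or a `workflow`) rather than bespoke glue code per API.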