1. 11

  2. 1

    Wow, this (simple) data pipeline architecture is almost exactly what I was looking for 5 years ago for something at work. I ended up writing something very similar to this so we could wire up disparate APIs to talk to each other and synchronize data across different providers, platforms & environments.

    I’d love to see some in-depth, real-world (read: gnarly) setups that use this; I’m curious how it handles integrating all of these features into a functioning system.

    1. 2

      It’s a little old now, but there’s a blog post from Meltwater loosely outlining how they use it for enrichments: https://underthehood.meltwater.com/blog/2019/08/26/enriching-450m-docs-daily-with-a-boring-stream-processor/. The inputs and outputs are fairly basic, as it’s consuming from Kafka and producing to Kafka, but in between it has to perform a network of HTTP enrichments, each with a unique API and format for batching messages (JSONL, JSON array, individual parallel requests, etc.).

      The more modern way of doing it is outlined in a simpler example here: https://www.benthos.dev/cookbooks/enrichments/.
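
To make the batching formats mentioned above concrete, here is a minimal Python sketch of shaping one batch of messages three ways: as JSONL, as a single JSON array, and as individual request bodies. It is not Benthos configuration and not Meltwater's code; the helper names and the sample messages are hypothetical and only illustrate why each enrichment API needs its own batching step.

```python
import json

# Hypothetical helpers (not part of Benthos): each one shapes the same batch
# of messages into the request body or bodies a differently designed HTTP
# enrichment API might expect.

def as_jsonl(batch):
    """One request body with one JSON document per line."""
    return "\n".join(json.dumps(msg) for msg in batch)

def as_json_array(batch):
    """One request body containing a single JSON array."""
    return json.dumps(batch)

def as_individual(batch):
    """One request body per message, to be sent as separate parallel requests."""
    return [json.dumps(msg) for msg in batch]

# Made-up sample messages, purely for illustration.
batch = [{"id": 1, "text": "foo"}, {"id": 2, "text": "bar"}]

print(as_jsonl(batch))
print(as_json_array(batch))
print(as_individual(batch))
```

A real pipeline would still have to send these bodies and merge each response back into the matching message, which is the part the linked posts cover.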