1. 15

  2. 10

    Given the title, I was hoping a little more for lessons learned, and some reflection of benefits/costs versus alternatives after one year, rather than largely being a description of what ES/CQRS is.

    1. 6

      Same here. The few folks I’ve talked to first-hand that tried ES/CQRS systems (a very small set size for sure!), ended up running screaming for the hills after a while (and/or doing a rewrite). Maybe they did it wrong, or maybe doing it right is hard? Unsure.

      So, I’d be sure be interested in hearing more anecdotes/stories/tales about how ES/CQRS went right, or didn’t.

      1. 11

        The few folks I’ve talked to first-hand that tried ES/CQRS systems (a very small set size for sure!), ended up running screaming for the hills after a while (and/or doing a rewrite). Maybe they did it wrong, or maybe doing it right is hard?

        ES is a quagmire like ORM (and really, OOP in general): never-ending domain-modeling churn, with the hope that a useful system is “just around the corner”.

        This stuff is catnip for…

        The context of the project was related to the Air Traffic Management (ATM) domain

        …government/defense contractors. The drones are obsessed with building and rebuilding the One True Hierarchy of interconnected objects. But taxonomy is the lowest form of work.

        According to Martin Fowler Event Sourcing:

        Ensures that all changes to application state are stored as a sequence of events

        Indeed, that sounds awesome. And in order to do that you need actual computer science (e.g. datomic), not endless domain-modeling.

        Domain Driven Design (DDD) is an approach to tackle …

        Like clockwork :)

        1. 2

          Some great links in there, and some good reading. Thanks!

        2. 10

          I have been working on an ES/CQRS system for about 4 years and enjoy it, but it’s a smaller system than the one the article describes. It’s a payment-processing service.

          Because it’s a much smaller service, I haven’t really gotten into a lot of the DDD side of things. The system has exactly one bounded context which eliminates a lot of the complexities and pain points.

          There was definitely a learning curve; this was my first exposure to that architecture. I made some decisions early on in my ignorance that ended up being somewhat costly to change later. However, I’ve been in this game a pretty long time, and I could say the exact same thing about plenty of non-ES/CQRS systems too. I’m not sure this architecture makes it any more or less painful to revisit early decisions.

          What have the costs been?

          • The message-based model is kind of viral. If you have a component that you could almost implement as a straight system service that’s just regular code called in the regular way, but there’s one case where an event would interact with what it does (example: a customer cancels a payment that hasn’t completed yet) the path of least resistance is to make the whole component message-driven. This sometimes ends up making the code more complicated.
          • Ramping up new engineers takes longer, because many of them also have never seen this kind of system before.
          • In some ways, debugging gets harder because you can no longer look at a simple stack trace to see what chain of logic led you to a certain point.
          • We’ve spent an appreciable amount of time mucking around with the ES/CQRS framework to make it suit our needs. Probably still less time than we would have spent to implement the equivalent feature set from scratch, but I’ve had to explain why I’m spending time hacking on the framework rather than working on the business logic.
          • If you make a significant change to the semantics of the system, you may need to deal with old events that no longer have any useful meaning. In many cases you can just ignore them, but sometimes you have to figure out how to translate them to whatever new conceptual model you’re moving to.

          What have the benefits been?

          • The fact that the inputs and outputs are constrained makes it phenomenally easier to write meaningful, non-brittle black-box unit tests. Like, night-and-day difference. Tests nearly all become, “Given this initial set of events, when this command/event happens, expect these commands/events.”
          • Having the ability to replay the event log makes it easy to construct new views for efficient querying. On multiple occasions we’ve added a new database table for reporting or analysis that would have been difficult or flat-out impossible to construct using the data in existing tables. With ES, the code to keep the view model up to date is the same as the code to backfill it with existing data. For a couple of our engineers, this was the specific thing that lit the light bulb for them: “Wait, you mean I’m done already? I don’t have to write a nasty migration?”
          • In some ways, debugging gets easier because you have an audit trail of events and you can often suck the relevant events into a development environment and replay them without having to sift through system logs trying to manually reconstruct what must have happened.
          • The “dealing with old events” thing I listed under costs is also a benefit in some ways because it forces you to address the business-level, product-design question up front: how should we represent this aspect of history in our new way of thinking about the world? That is extra work compared to just sweeping it under the rug, but it means you’re never in a situation where you have to scramble when some customer or regulator asks for history that spans a change in product design.
          • Almost nothing had to change in the application code when we went from a single-node-with-hot-standby configuration to a multiple-active-nodes configuration. It was already an asynchronous message-passing architecture, but now the messages sometimes get delivered remotely.
          • And finally the main reason we went with ES/CQRS in the first place: The audit trail is the source of truth, not a tacked-on extra thing that might be wrong or incomplete. For a financial application that has to answer to regulators, this is a significant win, and we have had meaningful benefit from being able to prove that there’s no way for information to show up in a customer-facing report without an audit trail to back it up.

          The main conclusion I’ve reached after working on the system is that ES/CQRS is a tool like any other. It isn’t a panacea and like any engineering decision, it involves tradeoffs. But I’m happy to have it in my toolbox to pull out in cases where its tradeoffs are the right ones for a project.

          1. 1

            Thanks for the comprehensive answer! <3

          2. 8

            Like with all Design Patterns, ES/CQRS is a means masquerading as an end, and good design will be found to have naturally cherry-picked parts of it without needing to name it as such.

            Anecdotally, I’m dealing with a system that is ⅔ ES/CQRS and ⅓ bandaging together expanding requirements, new relationships between entities, increasing latency due to scale – basically everything that wasn’t accounted for at the start. It works, but I wouldn’t choose it over a regular database and a transaction log.

            1. 6


              As with outcomes, sadly so elusive.

              It occurs to me that our industry would be well served with a “Glassdoor” for IT projects. One where those involved could anonymously report on progress, issues and lessons learned. One which could be correlated with supplier[1], architecture type, technologies, project size etc.

              [1] supplier, e.g. internal or specified outsourced supplier i.e. Accenture, Wipro, IBM etc.

          3. 1

            A feat of architectural planning. I love it, real world example of well known patterns.