I am not experienced in this area. Question for those who are: is event sourcing essential to every microservice project? Or is logging sufficient to diagnose/troubleshoot problems?
Coming from my background with single tenant deployments of monolithic server APIs, I can get a stack trace when the server runs into a problem. I can generally figure out what happened by reading said stack trace and maybe inspecting the database logs or server logs. But we may not have a single stack trace in the distributed microservice context. So, is event sourcing a substitute?
Event sourcing is more about application data representation - rather than store the effects of user actions (“there is a post with content x”) you store the action itself (“user y posted content x”).
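To make the distinction concrete, here is a minimal sketch in Python. The event types (`PostCreated`, `PostEdited`) and the helpers `apply_event` and `replay` are hypothetical names made up for illustration, not taken from any particular library. The only thing stored is the list of actions; the "there is a post with content x" view is derived by replaying them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostCreated:
    user: str
    post_id: int
    content: str


@dataclass(frozen=True)
class PostEdited:
    user: str
    post_id: int
    content: str


def apply_event(state: dict, event) -> dict:
    """Fold one event into the derived state (post_id -> content)."""
    new_state = dict(state)
    if isinstance(event, (PostCreated, PostEdited)):
        new_state[event.post_id] = event.content
    return new_state


def replay(events) -> dict:
    """Rebuild the current state from scratch by replaying the log in order."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state


# The log of actions is the source of truth.
log = [
    PostCreated(user="y", post_id=1, content="x"),
    PostEdited(user="y", post_id=1, content="x, revised"),
]

current = replay(log)          # state "now"
as_of_first = replay(log[:1])  # state after only the first action
```

Because the log is kept, replaying a prefix of it gives the state at an earlier point, which a plain "current row" table cannot do without extra audit machinery.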
Tracing a request across services is indeed a problem in microservice systems. The easiest fix is to build monoliths instead. But if you must, the fix is a distributed tracing system like OpenTelemetry.
Often I wish there were “Team Monolith” shirts and other paraphernalia.
Event sourcing, or technical messaging (in-memory or persisted event queues), is almost always necessary for microservices in business apps.
The reason is simple: the microservices need to communicate with each other.
That communication must include ‘commands’ and ‘data’.
Some teams use ‘messaging’ or ‘events’ or gRPC calls to send ‘commands’ only.
Then, they require all the microservices to get their data from a ‘central database’.
That’s a big problem: the database essentially becomes a critical integration point (because the ‘data’ needed to execute the commands is there, in that central database).
That kind of approach eventually becomes a bottleneck for microservices (unless the database becomes an in-memory data cluster…).
So the alternative is to send ‘commands’ plus the external data that’s required to execute the command.
Sort of like how, within a language, we call a ‘function’ (which is a command) with arguments (which are data).
But the data can be complex, and there can be lots of it, and you need a mechanism that makes sure a ‘command’ is delivered ‘just once’ (unless we are dealing with idempotent commands).
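A rough sketch of that idea in Python, under some loud assumptions: the `Command` shape, the `command_id` field, and the `Consumer` class are all hypothetical, and the set of seen IDs lives in memory only (a real service would have to persist it alongside its own state). The command carries the data it needs as its payload, and the consumer drops redeliveries by ID, which gives a ‘just once’ effect even when the transport delivers more than once.

```python
from dataclasses import dataclass


@dataclass
class Command:
    command_id: str  # unique per logical command; reused on redelivery
    name: str        # the 'command'
    payload: dict    # the external 'data' needed to execute it


class Consumer:
    """Applies each command at most once by remembering processed IDs."""

    def __init__(self):
        self.seen = set()  # assumption: in-memory only, lost on restart
        self.balance = 0

    def handle(self, cmd: Command) -> bool:
        """Return True if the command was applied, False if it was a duplicate."""
        if cmd.command_id in self.seen:
            return False
        self.seen.add(cmd.command_id)
        if cmd.name == "deposit":
            self.balance += cmd.payload["amount"]
        return True


consumer = Consumer()
cmd = Command(command_id="c-1", name="deposit", payload={"amount": 10})

first = consumer.handle(cmd)   # applied
second = consumer.handle(cmd)  # same ID redelivered, ignored
```

Without the ID check, the redelivered deposit would be applied twice; with it, the non-idempotent command behaves as if it were idempotent.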
When you want the invocation to be asynchronous, you use a message bus, or thalo, or Kafka, or ZeroMQ-type systems, or UDP-style message passing.
When you need the invocations to be synchronous, you use RPC/REST/etc. (or you can use TCP-style message passing).
In that model, where the necessary external ‘data’ is sent together with the commands, the microservices can still have their own databases, of course (to manage their own state), but they no longer rely on a centralized database for data exchange. The other benefit is that enterprises avoid the ‘schema change’ bottleneck of a centralized database (message schemas are much easier to change than database schemas).
A message bus also solves, in a limited sense, the ‘service registration/service naming’ question (consumers are registered and unregistered as needed).
But in the more general case, when microservices need to scale up and shrink elastically across VMs (depending on demand), you will also end up using a software-defined network + naming service + container manager.
Those things are handled by Kubernetes or by Nomad + Envoy + Consul.
Event sourcing can help with examining application behavior in the wild. If you have a full log of the semantic mutations made to your system state, you should be able to figure out when (and whence) unexpected changes happened.
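As a small illustration of the "when (and whence)" part, here is a sketch in Python. The event shape (`source`, `key`, `value`) and the helper `first_violation` are assumptions made up for this example. The idea is to replay the log in order and stop at the first event after which some invariant on the state no longer holds.

```python
def first_violation(events, invariant):
    """Replay events in order; return (sequence_number, event) for the first
    event after which the invariant is false, or None if it always holds."""
    state = {}
    for seq, event in enumerate(events):
        state[event["key"]] = event["value"]
        if not invariant(state):
            return seq, event
    return None


# Hypothetical log: each entry records which service made the change.
events = [
    {"source": "billing", "key": "balance", "value": 100},
    {"source": "refunds", "key": "balance", "value": -5},
    {"source": "billing", "key": "balance", "value": 20},
]


def balance_non_negative(state):
    return state.get("balance", 0) >= 0


hit = first_violation(events, balance_non_negative)
```

Here `hit` points at the second entry (sequence number 1, from the `refunds` service), which is the sort of answer a log of current-state snapshots alone would not give you.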
You’ll still need tracing if you want to determine why an update failed to apply. If your backing queue for the events is backed up, for example, you probably want a fallback log to see what proximal faults are/were happening. As the old guidance goes, figuring out why something is “slow” is often the hardest problem in a microservice architecture, esp. in the presence of metastable faults.
IMHO event sourcing is largely a better option than generic trigger-based DB audit logs. The latter tends to be noisy and often does a poor job of indicating the initial cause of a change; putting things into an event log can provide some context and structure that makes debugging and actually reviewing the audit logs tractable.
I have to tell you that I had been porting Ruby’s Madeleine library to Rust as a side project and this just blows it away. How lovely!