Sagas — Part 2: Sagas in Distributed Systems
Sagas — Part 1: An Introduction
Sagas — Part 2: Sagas in Distributed Systems
Sagas — Part 2b: Sagas in Distributed Systems Continued
Sagas — Part 3: Choreography Instead?
Sagas — Part 4: Design Considerations
In the previous post in this series, I briefly touched on why distributed transactions are not suitable in distributed systems:
- participating services could take an indeterminate amount of time, leading to higher chances of deadlocks,
- all participating services must be available at the same time for the duration of the transaction (and of the rollback, if one occurs),
- the service coordinating the transaction should itself be available for that time too, or at least whenever it's needed during the process.
An alternative is to use Sagas. Sagas maintain data consistency across services in distributed systems without the need for long-lived, distributed transactions. They do this by breaking a transaction into smaller, local microtransactions, one per participating service.
Sagas instruct services to carry out the steps of a business process sequentially. Each microtransaction is a step in the Saga, and each step has a compensating action to use in case of failure. If a step cannot be completed, the Saga fails and must undo all previously completed steps by invoking their compensating actions in reverse order.
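A minimal, in-process sketch of the centralised form of this mechanism follows. It assumes each step is a pair of plain Python callables (an action and its compensating action). `run_saga` and `SagaFailed` are hypothetical names, and the sketch deliberately ignores what happens if a compensating action itself fails:

```python
from typing import Callable, List, Tuple

# A step pairs an action with the compensating action that reverses it.
Step = Tuple[Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    """Raised after the completed steps of a failed saga have been compensated."""


def run_saga(steps: List[Step]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # The failed step did not complete, so only its predecessors are
            # undone, most recent first.
            for undo in reversed(completed):
                undo()
            raise SagaFailed(str(exc)) from exc
        completed.append(compensate)


# Usage: the third step fails, so steps two and one are compensated.
log: List[str] = []


def failing_step() -> None:
    raise RuntimeError("step 3 failed")


try:
    run_saga([
        (lambda: log.append("do 1"), lambda: log.append("undo 1")),
        (lambda: log.append("do 2"), lambda: log.append("undo 2")),
        (failing_step, lambda: log.append("undo 3")),
    ])
except SagaFailed as err:
    print("saga failed:", err)
print(log)  # ['do 1', 'do 2', 'undo 2', 'undo 1']
```

The failed step's own compensation is never called. Only the steps that actually completed are undone.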
Interactions with participating services in a Saga can be a mix of synchronous and asynchronous, depending on the required guarantees. This is not a purely technical decision; it should be analysed together with the business.
Sagas: an Example
Take food ordering apps, for example: Deliveroo in my case, as it's the one I use most often. They use a saga to fulfil orders (these are my observations; I have no knowledge of the actual implementation). I often place orders that get rejected at various stages, and I receive notifications (e-mails) when previously completed steps are reversed. The process I have observed is the following:
Every participating service in the Saga provides a way to achieve its part of the transaction as well as a way to reverse it.
The Saga process in the above diagram goes from left to right: take payment, prepare items and assign a driver to deliver the order. If all three steps complete successfully, the Saga is completed.
If any service fails to fulfil its part, you cannot get your order, so the Saga is cancelled. When a Saga is cancelled, it undoes all already completed steps by calling their compensating actions in reverse order. Hence, if assigning a driver fails, the Saga calls the restaurant service to cancel the preparation, then the payment service to refund the payment.
Though Sagas may look simple, several design considerations are crucial to how effective a Saga is. These range from how you order the steps to managing the anomalies caused by running multiple Sagas concurrently. This will be the topic of the next post in this series.
There are undoubtedly many more nuances to the Deliveroo Order Saga, but the above represents an oversimplified high-level process from the outside looking in. It serves the purpose of demonstrating Saga concepts.
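The order Saga described above can be sketched as follows. `PaymentService`, `RestaurantService` and `DriverService` are hypothetical stand-ins that only record what they do; they are not Deliveroo's actual services or implementation. Each participant offers an `execute` and a `compensate` operation, and driver assignment is forced to fail:

```python
from dataclasses import dataclass
from typing import List, Protocol


class SagaStep(Protocol):
    # Every participant offers a way to do its part and a way to reverse it.
    def execute(self, order_id: str) -> None: ...
    def compensate(self, order_id: str) -> None: ...


@dataclass
class PaymentService:
    events: List[str]

    def execute(self, order_id: str) -> None:
        self.events.append(f"payment taken for {order_id}")

    def compensate(self, order_id: str) -> None:
        self.events.append(f"payment refunded for {order_id}")


@dataclass
class RestaurantService:
    events: List[str]

    def execute(self, order_id: str) -> None:
        self.events.append(f"preparation started for {order_id}")

    def compensate(self, order_id: str) -> None:
        self.events.append(f"preparation cancelled for {order_id}")


@dataclass
class DriverService:
    events: List[str]
    drivers_available: bool = True

    def execute(self, order_id: str) -> None:
        if not self.drivers_available:
            raise RuntimeError(f"no driver available for {order_id}")
        self.events.append(f"driver assigned to {order_id}")

    def compensate(self, order_id: str) -> None:
        self.events.append(f"driver unassigned from {order_id}")


def place_order(order_id: str, steps: List[SagaStep]) -> bool:
    """Return True if the saga completes, False if it was cancelled."""
    done: List[SagaStep] = []
    for step in steps:
        try:
            step.execute(order_id)
        except RuntimeError:
            # Saga cancelled: compensate completed steps in reverse order.
            for completed in reversed(done):
                completed.compensate(order_id)
            return False
        done.append(step)
    return True


events: List[str] = []
steps: List[SagaStep] = [
    PaymentService(events),
    RestaurantService(events),
    DriverService(events, drivers_available=False),
]
print(place_order("order-42", steps))
for event in events:
    print(event)
```

The driver step never completed, so only the restaurant and payment steps are compensated: the preparation is cancelled first, then the payment is refunded.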
What Sagas are
Sagas are lightweight and can be executed either centrally, in a procedural manner, or in a decentralised way via a Routing Slip.
They do not maintain state; they make decisions using only the information in incoming messages. They are used to model linear business processes where the steps needed to complete the process are known beforehand.
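The Routing Slip variant can be sketched as follows. It assumes a hypothetical `RoutingSlip` message that carries both the remaining itinerary and the list of completed steps, and it uses an in-memory queue as a stand-in for a real message broker. No participant keeps saga state. Each handler decides where to send the slip next purely from the slip it receives:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class RoutingSlip:
    order_id: str
    itinerary: List[str]  # steps still to perform, in order
    completed: List[str] = field(default_factory=list)  # steps done so far
    compensating: bool = False


# (address, slip) pairs: an in-memory queue standing in for a message broker.
Bus = Deque[Tuple[str, RoutingSlip]]


def make_service(name: str, log: List[str], succeed: bool = True) -> Callable[[RoutingSlip, Bus], None]:
    """Build a hypothetical stateless participant; all it needs is on the slip."""
    def handle(slip: RoutingSlip, bus: Bus) -> None:
        if slip.compensating:
            log.append(f"undo {name}")
            slip.completed.pop()  # this step has now been reversed
            if slip.completed:
                bus.append((slip.completed[-1], slip))  # route back to previous step
            return
        if succeed:
            log.append(f"do {name}")
            slip.completed.append(name)
            slip.itinerary.pop(0)
            if slip.itinerary:
                bus.append((slip.itinerary[0], slip))  # forward to next step
        else:
            log.append(f"fail {name}")
            slip.compensating = True
            if slip.completed:
                bus.append((slip.completed[-1], slip))  # start undoing
    return handle


log: List[str] = []
services = {
    "payment": make_service("payment", log),
    "restaurant": make_service("restaurant", log),
    "driver": make_service("driver", log, succeed=False),
}
slip = RoutingSlip("order-42", ["payment", "restaurant", "driver"])
bus: Bus = deque([(slip.itinerary[0], slip)])
while bus:
    address, message = bus.popleft()
    services[address](message, bus)
print(log)
```

The dispatch loop only delivers messages. The route forward, and back through the completed steps when a step fails, is carried entirely by the slip.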
Sagas can be thought of as failure management systems. They are built to deal with business transactions and, by extension, business errors. They do not deal with technical errors well. I will elaborate on this more in the next post in this series.
Sagas vs Process Managers
I often come across blog posts or take part in discussions where the terms Saga and Process Manager are used interchangeably. And that's OK. At the end of the day, as software practitioners, we are paid to solve problems. Clients don't care whether we use a Saga, a Process Manager or something else, as long as it's the best solution to the problem. However academic this discussion may be, it's worth clearing up the confusion.
The main difference is that Sagas make decisions using only the information in the messages they consume, and they have no persistent state. Process Managers, on the other hand, maintain the state of the process and use that state to drive the flow.
In other words, Process Managers implement business processes as first-class citizens and are used for more complex, non-linear business processes. They contain workflow-specific logic that determines how the process should proceed.
Steps in a Process Manager also do not have to provide compensating actions, but can if necessary.
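For contrast, a Process Manager can be sketched as follows. It assumes a hypothetical `OrderProcessManager` whose per-order state is persisted between messages; a dictionary stands in for a database. The event and command names are illustrative only. A Saga decides from the incoming message alone. Here, by contrast, the reaction to "no driver available" depends on stored state (how many attempts have been made), and compensation is just one branch the manager may choose:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OrderProcessState:
    # Persisted between messages (kept in a dict below instead of a database).
    status: str = "awaiting_payment"
    driver_attempts: int = 0


class OrderProcessManager:
    """Hypothetical process manager: reacts to events using its stored state."""

    MAX_DRIVER_ATTEMPTS = 3

    def __init__(self) -> None:
        self.store: Dict[str, OrderProcessState] = {}

    def handle(self, order_id: str, event: str) -> List[str]:
        """Update the stored state for an event and return the commands to send."""
        state = self.store.setdefault(order_id, OrderProcessState())
        if event == "PaymentTaken":
            state.status = "preparing"
            return ["PrepareItems"]
        if event == "ItemsPrepared":
            state.status = "finding_driver"
            state.driver_attempts += 1
            return ["AssignDriver"]
        if event == "NoDriverAvailable":
            # Non-linear flow: branch on stored state, not on the message alone.
            if state.driver_attempts < self.MAX_DRIVER_ATTEMPTS:
                state.driver_attempts += 1
                return ["AssignDriver"]
            state.status = "cancelled"
            return ["CancelPreparation", "RefundPayment"]
        if event == "DriverAssigned":
            state.status = "out_for_delivery"
            return []
        raise ValueError(f"unexpected event {event!r}")


pm = OrderProcessManager()
for event in ["PaymentTaken", "ItemsPrepared", "NoDriverAvailable",
              "NoDriverAvailable", "NoDriverAvailable"]:
    print(event, "->", pm.handle("order-42", event))
```

The same `NoDriverAvailable` message leads to a retry twice and to cancellation the third time. Only the persisted attempt count distinguishes those cases.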