Read Models Spanning Microservices’ Boundaries
Or glorified data caches
Read models are materialized views of denormalised data. They mostly serve user interfaces, but they can also provide data to other microservices. They are usually built from a single microservice’s data and live within that microservice’s boundary. However, they can span microservices, and they often do, though sometimes under a different guise.
I have blogged about re-designing an aggregation (reporting) microservice where data comes from numerous services, but that post was about solving integration issues for a valid use case. Reporting services are meant for analytical data, but how do we tackle presenting transactional data sourced from multiple services back to the users? Can we use the same technique?
To recap, the technique was to consume the data that the various microservices expose via the AtomPub protocol, and to take advantage of the immutability of the events by caching them in the infrastructure so they can be re-used to accommodate changes when needed. If you’re interested in learning more, check out the 3-part series, starting with part 1 in the link below.
Re-designing an Aggregation Microservice — Part 1
I was involved in a recent discussion regarding how to improve a reporting microservice that aggregates data from…
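As a rough illustration of why immutability makes the events cheap to re-use, here is a minimal Python sketch. It is an in-process stand-in for the infrastructure-level caching described above, not the setup from the series. The class name, the `fetch_page` callable and the page URLs are all hypothetical, and it assumes a completed feed page never changes once published.

```python
from typing import Callable, Dict, List

Event = dict  # hypothetical: one parsed feed entry


class ImmutablePageCache:
    """Caches completed feed pages forever, on the assumption that they never change."""

    def __init__(self, fetch_page: Callable[[str], List[Event]]) -> None:
        # fetch_page is a hypothetical function that downloads and parses one page.
        self._fetch_page = fetch_page
        self._pages: Dict[str, List[Event]] = {}

    def get(self, url: str) -> List[Event]:
        # Only the first request for a page reaches the producer.
        if url not in self._pages:
            self._pages[url] = self._fetch_page(url)
        return self._pages[url]
```

Because a cached page is never invalidated, replaying the events a second time costs the producers nothing. The page that is still being written to would need different handling, which this sketch leaves out.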
AtomPub to the rescue again?
Similar to the aggregation microservice discussed in that post, we can create a global read model that processes the data from those services and presents it to the user. Since we have access to the data, we can rebuild the read model whenever we want to present the data differently. So this will work, right?
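To make the idea concrete, here is a minimal Python sketch of a global read model rebuilt by replaying events from more than one service. The event shape, the service names and both projection functions are assumptions made up for illustration; they do not come from the original design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Event:
    source: str     # hypothetical name of the producing service
    kind: str       # hypothetical event type
    entity_id: str  # identifier shared across services
    data: dict


View = Dict[str, dict]
Projection = Callable[[View, Event], None]


def rebuild(events: Iterable[Event], projection: Projection) -> View:
    """Build the read model from scratch by replaying every event in order."""
    view: View = {}
    for event in events:
        projection(view, event)
    return view


def merge_fields(view: View, event: Event) -> None:
    # One denormalised row per entity, merging the fields from every service.
    view.setdefault(event.entity_id, {}).update(event.data)


def summary_line(view: View, event: Event) -> None:
    # A different presentation of the same events: a per-entity history.
    row = view.setdefault(event.entity_id, {"summary": []})
    row["summary"].append(f"{event.source}:{event.kind}")


EVENTS = [
    Event("orders", "OrderPlaced", "o-1", {"total": 40}),
    Event("shipping", "OrderShipped", "o-1", {"status": "shipped"}),
]

merged = rebuild(EVENTS, merge_fields)
history = rebuild(EVENTS, summary_line)
```

Presenting the data differently means swapping the projection and replaying all the events again, which is cheap to write but touches every event from every service.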
Well, not entirely. The difference between the reporting service and the read models is two-fold. Firstly, the reporting microservice is an analytical data service, not a transactional one: none of the microservices access it when making decisions; rather, it is used to display decisions already made. This is a somewhat subtle but important distinction.
Secondly, ownership: how the data is presented is no longer controlled or owned by the producers. Changes to the reporting needs are owned by the reporting service, team, data scientists, etc., and the owner/producer is now a client that issues requests for alterations.
A glorified cache?
Conversely, the read model is still owned by the producers; they decide how they want to present their data and how/when to change it — if you think about it, it is a glorified cache.
Caching raises a few issues, but having the events that populate this cached read model available without going back to the producers addresses some of them. However, it still suffers from a couple of major shortcomings. The first is that if one of the several services chooses to alter the way it represents its data to the users, the entire read model must be purged and rebuilt. This is an expensive and time-consuming operation, even when using cached events from the AtomFeed.
The other major issue is that when it comes to transactional data, not everything is cacheable. Many systems I have worked on require the information presented back to the users to be tailored on the fly based on a number of criteria, for example the users’ region/country, permissions, etc. How would you cache that? Cache all the possible permutations across all services in the read model? That’s just not feasible.
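A quick back-of-the-envelope calculation shows how fast the permutations grow. The numbers below are entirely hypothetical (30 regions, 4 independent permission flags, one million records); only the multiplication matters.

```python
from math import prod
from typing import Mapping


def variants_per_record(criteria: Mapping[str, int]) -> int:
    """Number of distinct tailored views of one record: the product of the criteria sizes."""
    return prod(criteria.values())


def total_cached_rows(records: int, criteria: Mapping[str, int]) -> int:
    """Rows the read model would hold if every permutation were pre-computed."""
    return records * variants_per_record(criteria)


# Hypothetical figures: 30 regions and 4 on/off permission flags (2**4 combinations).
criteria = {"region": 30, "permission_set": 2 ** 4}

per_record = variants_per_record(criteria)        # 480
total = total_cached_rows(1_000_000, criteria)    # 480_000_000
```

Each additional criterion multiplies the total rather than adding to it, and every one of those rows would have to be rebuilt whenever a producer changes its representation.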
What’s the alternative, then?
I will discuss one alternative that has worked for me in the past in the next post.