I am an author, speaker, and loud-mouth on the design of enterprise software. I work for ThoughtWorks, a software delivery and consulting company. Martin is a DZone MVB, is not an employee of DZone, and has posted 82 posts at DZone. You can read more from him at his website.

Aggregate Oriented Database

One of the first topics to spring to mind as we worked on NoSQL Distilled was that NoSQL databases use different data models than the relational model. Most sources I've looked at mention at least four groups of data models: key-value, document, column-family, and graph. Looking at this list, there's a big similarity between the first three - all have a fundamental unit of storage which is a rich structure of closely related data: for key-value stores it's the value, for document stores it's the document, and for column-family stores it's the column family. In DDD (Domain-Driven Design) terms, this group of data is an aggregate.

The rise of NoSQL databases has been driven primarily by the desire to store data effectively on large clusters - such as the setups used by Google and Amazon. Relational databases were not designed with clusters in mind, which is why people have cast around for an alternative. Storing aggregates as fundamental units makes a lot of sense for running on a cluster. Aggregates make natural units for distribution strategies such as sharding, since you have a large clump of data that you expect to be accessed together.
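
To make the sharding point concrete, here is a minimal sketch of placing whole aggregates on cluster nodes by hashing their keys. The node names, the `node_for`, `put`, and `get` helpers, and the simple hash-modulo placement are all assumptions made for illustration; real stores typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical cluster members; any real deployment would discover these.
NODES = ["node-a", "node-b", "node-c"]


def node_for(aggregate_key: str, nodes=NODES) -> str:
    """Pick a node for an aggregate from a stable hash of its key."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Each node is modelled as an in-memory dict of key -> aggregate.
cluster = {name: {} for name in NODES}


def put(key: str, aggregate: dict) -> None:
    """Store the whole aggregate on the single node that owns its key."""
    cluster[node_for(key)][key] = aggregate


def get(key: str):
    """Fetch the whole aggregate with one lookup on one node."""
    return cluster[node_for(key)].get(key)


put("order:1001", {"customer": "Ann", "line_items": [{"product": "widget"}]})
order = get("order:1001")
```

Because the aggregate is the unit of placement, reading it back touches exactly one node, which is the property that makes it a natural fit for a cluster.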

An aggregate also makes a lot of sense to an application programmer. If you're capturing a screenful of information and storing it in a relational database, you have to decompose that information into rows before storing it away.

An aggregate makes for a much simpler mapping - which is why many early adopters of NoSQL databases report that it's an easier programming model.
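
The difference in mapping can be sketched with a hypothetical order. The field names, table names, and the two helper functions below are invented for illustration and do not correspond to any particular database's API.

```python
import json

# Hypothetical order aggregate: one nested structure holding everything
# the order-entry screen shows.
order = {
    "id": 1001,
    "customer": "Ann",
    "shipping_address": {"city": "Chicago"},
    "line_items": [
        {"product": "widget", "quantity": 2, "price": 5.0},
        {"product": "gadget", "quantity": 1, "price": 12.5},
    ],
}


def to_aggregate_record(order):
    """Aggregate-oriented store: one key, one value, no decomposition."""
    return str(order["id"]), json.dumps(order)


def to_relational_rows(order):
    """Relational store: the same data split into rows for several tables."""
    return {
        "orders": [(order["id"], order["customer"])],
        "addresses": [(order["id"], order["shipping_address"]["city"])],
        "line_items": [
            (order["id"], item["product"], item["quantity"], item["price"])
            for item in order["line_items"]
        ],
    }


key, value = to_aggregate_record(order)
rows = to_relational_rows(order)
```

The aggregate version writes one record that mirrors the in-memory structure, while the relational version has to produce rows for three tables and reassemble them with joins on the way back.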

This synergy between the programming model and the distribution model is very valuable. It allows the database to use its knowledge of how the application programmer clusters the data to help performance across the cluster.

There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data in different ways for different audiences.

This is why aggregate-oriented stores talk so much about map-reduce - which is a programming pattern that's well suited to running on clusters. Map-reduce jobs can reorganize the data into different groups for different readers - what many people refer to as materialized views. But it's more work to do this than using the relational model.
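
As a rough, single-process sketch of the pattern, the following regroups hypothetical order aggregates by product to get sales totals. The sample data and the `map_order`, `reduce_sales`, and `map_reduce` names are assumptions for illustration; a real cluster would run the map step on each node and shuffle the intermediate pairs between nodes.

```python
from collections import defaultdict

# Hypothetical order aggregates, as an order-entry system might store them.
orders = [
    {"id": 1001, "line_items": [
        {"product": "widget", "quantity": 2, "price": 5.0},
        {"product": "gadget", "quantity": 1, "price": 12.5},
    ]},
    {"id": 1002, "line_items": [
        {"product": "widget", "quantity": 3, "price": 5.0},
    ]},
]


def map_order(order):
    """Map: emit one (product, revenue) pair per line item in an order."""
    for item in order["line_items"]:
        yield item["product"], item["quantity"] * item["price"]


def reduce_sales(product, revenues):
    """Reduce: combine all revenues emitted for one product."""
    return product, sum(revenues)


def map_reduce(aggregates, mapper, reducer):
    """Run the mapper over every aggregate, group by key, then reduce."""
    grouped = defaultdict(list)
    for aggregate in aggregates:
        for key, value in mapper(aggregate):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


sales_by_product = map_reduce(orders, map_order, reduce_sales)
```

The result is keyed by product rather than by order, which is the kind of reorganized view that a relational query could produce with a simple GROUP BY.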

This is part of the argument for polyglot persistence - use aggregate-oriented databases when you are manipulating clear aggregates (especially if you are running on a cluster) and use relational databases (or a graph database) when you want to manipulate that data in different ways.


Published at DZone with permission of Martin Fowler, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Eric Samson replied on Fri, 2012/01/20 - 2:02am

I didn't know the term aggregated databases. It looks a little bit like hierarchical databases from the 70's to me. Best regards, Erix.
