Polyglot Persistence
In late 2006 Neal Ford wrote about Polyglot Programming and predicted the wave of language choice we are now seeing in the industry to use the right language for the specific job at hand. Instead of assuming a "default" language like Java or C# and then warring over the many different available frameworks, polyglot programming is all about using the right language for the job rather than just the right framework(s). For a while now I've thought about the fact that, paralleling Neal's description of polyglot programming, a relational database seems to be the accepted and default choice for persistence. Sometimes this is due to the fact that organizations have standardized on RDBMS systems and there isn't even any other choice. Other times it is simply what we're used to doing, and possibly we don't even consider alternatives. But now, with things like Amazon SimpleDB, Google Bigtable, Microsoft SQL Server Data Services (SSDS), CouchDB, and lots more, it seems like we're now seeing the beginning of Polyglot Persistence in addition to polyglot programming.
Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand. For example, some co-workers of mine on one project are effectively using Lucene as their primary datastore, since the application they've built is mainly to do complex full-text searches very fast against huge datasets. Most people probably don't think of Lucene as a data store and just consider it as their full-text search engine. But for this particular application, which aggregates multiple disparate datasets, glues them together, and performs full-text search against the consolidated view of the data, it makes a good deal of sense. It also helped that in a bake-off against a very popular traditional RDBMS system's full-text add-on product, the Lucene search solution blew the doors off the traditional RDBMS in terms of performance, and that was even after a team of consultants from the vendor came in and tried to optimize the search performance. So, in this case a non-relational data store made more sense in terms of the problem context, which was data aggregation and fast full-text search.
Within the past few years we've started to see and hear about how companies like Amazon and Google are using non-traditional data stores such as SimpleDB and Bigtable for their own applications. Google App Engine in fact provides access to Bigtable, described as a "sparse, distributed multi-dimensional sorted map," as the sole persistent store for Google App Engine applications. Other organizations like the Apache Software Foundation have gotten into the non-relational data store market as well with things like CouchDB which is described as "a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API." One of the common threads among all these non-relational stores is that they are distributed, designed for fault tolerance, embrace asynchronicity, and are based on BASE (Basically Available, Soft State, Eventually Consistent) and CAP (Consistency, Availability, Partition Tolerance) principles as opposed to traditional ACID (Atomicity, Consistency, Isolation, Durability) properties found in traditional RDBMS systems. In addition, they are almost all either "schemaless" or provide a flexible architecture that promotes ease of schema changes over time, again as opposed to the rigid and inflexible schemas of traditional relational databases.
I don't think it's a coincidence that the companies creating and now offering these alternative data stores - free, commercial, or hybrid models like Google App Engine which is free up to a certain point - are all giants in distributed computing and deal with data on a massive scale. My guess is that perhaps they initially deployed some things on traditional RBDMS systems and outgrew them or maybe they simply thought they could do it better for their own specific problems. But as a result, I think over time that organizations are going to start thinking more and more about the type of persistence they need for different problems, and that ultimately the RDBMS will be but one of the available persistence choices.
From Scott Leberknight's Weblog
- Login or register to post comments
- 7480 reads
- Printer-friendly version
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)




Comments
Jochen Bedersdorfer replied on Thu, 2008/10/23 - 5:21pm
Thanks for the nice overview of emerging database systems.
I wished the predominance of RDBMS would go away and IT departments at customer sites would be more willing to accept vendor chosen database systems. No, they have their oracle cluster farm and want to use it... :)
The first version of our companies main application used versant OODBMS. Never again was persisting objects easier. And the damn thing was FAST!
Slava Imeshev replied on Mon, 2008/10/27 - 2:31pm
in response to: beders
[quote]My guess is that perhaps they initially deployed some things on traditional RBDMS systems and outgrew them or maybe they simply thought they could do it better for their own specific problems. [/quote]
My guess is that people there just got bored with the existing just-works solutions :-)
Regards,
Slava Imeshev
jame jack replied on Wed, 2009/06/17 - 7:38pm