NoSQL Zone is brought to you in partnership with:

Doug has been engrossed in programming since his parents first bought him an Apple IIe computer in 4th grade. Throughout his early career, Doug proved his flexibility and ingenuity in crafting solutions in a variety of environments. Doug’s most recent work has been in the telecom industry developing tools to analyze large amounts of network traffic using C++ and Python. Doug loves learning and synthesizing this knowledge into code and blog articles. Doug is a DZone MVB and is not an employee of DZone and has posted 31 posts at DZone. You can read more from them at their website. View Full User Profile

Codd’s Relational Vision: Has NoSQL Come Full Circle?

12.13.2013
| 12311 views |
  • submit to reddit

Recently, I spoke at NoSQL Matters in Barcelona about database history. As somebody with a history background, I was pretty excited to dig into the past, beyond the hype and marketing fluff, and look specifically at what technical problems each generation of database solved and where they in-turn fell short.

However, I got stuck at one moment in time I found utterly fascinating: the original development of relational databases. So much of the NoSQL movement feels like a rebellion against the “old timey” feeling relational databases. So I thought it would be fascinating to be a contrarian, to dig into what value relational databases have added to the world. Something everyone thinks is obvious but nobody really understands.

It’s very easy and popular to criticize relational databases. What folks don’t seem to do is go back and appreciate how revolutionary relational databases were when they came out. We forget what problems they solved. We forget how earlier databases fell short, and how relational databases solved the problems of the first generation of databases. In short, relational databases were the noSomething, and I aimed to find out what that something was.

And from that apply those lessons to today’s NoSQL databases. Are today’s databases repeating mistakes of the past? Or are they filling an important niche (or both?).

The First Databases

To appreciate how revolutionary the relational model is, we need to appreciate what the earliest, pre-relational databases looked like. These databases largely reflect what we would build if we were tasked by our boss to “build a database”. Maybe, if we’re tracking movie rentals, in our ignorance we might start out with a line-by-line csv listing of customers and what movies they’ve rented. Maybe something silly that looks like this:

vectors are fun

Poor Man’s Database

We might also add an “index” to the back of the file to help us find specific records, in the same way we’d use an index in a book. Here the index is telling us that the “doug” record is 512 characters into the file while the “rick” record is 9212 characters into the file.

Now the earliest designers of these data storage systems would have had to face the problem, “How do I store movies as their own record?”. For example, we’re going to need to start storing how much a movie costs to rent and how many we have in stock. Should movies like “Top Gun” be treated as top-level records? Or should they be as parts-of (something owned by) the user records we already have? With movies as their own records, we don’t have to duplicate all the inventory and price data everywhere, but we’ll need to create another construct (an additional record type?) to specify the relationship.

vectors are fun

Poor Man’s Database

The databases that aggregate videos into users are considered to be following the “hierarchical” model. The databases that break out movies into their own records with the ability to link records fall into the “navigational” model.

You can imagine, if we were interacting with this database, our first reaction would be to create a rather primitive interface – create something low-level that talked only in movies and users. Luckily, others saw past this and realized that patterns like the ones in our video rental and other data stores could be generalized–that we could abstract the notion of “record” and “ownership” into something much more powerful. Eventually this is exactly what Codasyl/DBTG (Database Task Group) did, creating a standard language for creating and interacting with navigational and hierarchical databases.

So reflecting on our data model, you could create a record in one of these databases by using the RECORD command:

Record Name is USER
   Location Mode is Calc Using username
   Duplicates are not allowed
   username Type character 25
   address Type character 50
   phonenumber Type character 10

And when we want to declare a relationship between two records, we can define a set that maps them in an owner-ownee relationship.

Set Name is USER-VIDEOS
   Order is Next
   Retention is Mandatory
   Owner is USER
   Member is VIDEO

This model reflects first building many databases and then attempting to generalize important abstractions. We can think of the early history of databases like our video rental example. Since the advent of storage media, folks needed to store all kinds of data. Eventually somebody did something laudable, reflected on all the patterns being used in data storage and management and created a generalized database that was codified in the DBTG language.

Enter Codd

Whereas the first database’s abstractions grew out of patterns learned building from the bottom-up, the relational model did just the opposite. The relational model creates an amazingly powerful abstraction rooted in predicate calculus, and then expects the implementation details to follow (which as we all know they did).

When Codd wrote his paper, he criticized the DBTG databases of the day around the area of how the application interacted with the databases abstractions. Low-level abstractions leaked into user applications. Application logic became dependent on aspects of the databases: Specifically, he cites three criticisms:

  1. Access Dependencies: We often need to navigate from users -> videos to get at the videos. Application logic depends on how records are linked or aggregated. I must use one record type to get another record type.
  2. Order Dependencies: Applications depend on how data is physically stored in the database. Notice the “ORDER is NEXT” line in the Set declaration above. This specifies storage based on insert order, so retrieval will in turn happen on index order.
  3. Index Dependencies: When accessing indices, these databases required database indices be referred to explicitly by name.

Codd proposed to get around these limitations by focusing on a specific abstraction: relations. A relation is simply a tuple of elements. The ith element of each tuple is a member of some set, known as that element’s domain. Perhaps a given element’s domain is the set of users, user ids, possible movies to rent, etc. So for our videos, a tuple might look like:

 (user=u, address=a, list of movies rented=l)

Or in other words

 (doug, 1234 Bagby St, [Top Gun 3.99, Terminator 12.99])

One can interpret this relation to “mean” any number of things. We might equate this to a statement that “Doug rents Top Gun at 3.99 and Terminator at 12.99” or “Doug lives at 1234 Bagby St”.

Published at DZone with permission of Doug Turnbull, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Lukas Eder replied on Sat, 2013/12/14 - 1:26pm

NoSQL really isn't a concept per se . It's a shame that recent non-relational databases (some of which are quite impressive, e.g. Neo4j, MongoDB) are just categorised with all the other stuff that simply doesn't do SQL - or better - doesn't implement the relational model.

But I like your point about recent NoSQL "movements" being a step towards pre-Codd times. In other words, NoSQL and PreSQL could be seen as being the same.

Lloyd Mangnall replied on Sat, 2013/12/14 - 9:58pm

I think the biggest message to take is that we have many new specialized tools for complex use cases that aren't best handled by rational models.  As is always the case with architecture, look at the problem domain from all perspectives, weigh all options carefly, and make pragmatic well informed decisions in your solution.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.