
Trisha Gee has been a Java developer for over 12 years, and due to a low boredom threshold has worked in loads of different industries for many types of companies. Trisha is a developer at 10gen (the MongoDB people). She has expertise in high performance Java systems and is a leader in the London Java Community. She is also involved in the Graduate Development Community. Trisha believes we shouldn't all have to make the same mistakes again and again. Trisha is a DZone MVB and is not an employee of DZone and has posted 63 posts at DZone.

NoSQL is a Stupid Name

10.24.2012
So, I've finished my first full week in the new job and I've learnt lots of new stuff. Which is great, because that's usually why you change jobs.

I'm learning a lot about these new-fangled NoSQL database thingies. The LMAX architecture was based on keeping everything in memory and reducing the waits for IO - messages were journalled to disk, and reads and writes to the MySQL database were off the critical path. Therefore doing anything radical to the storage side of the architecture was just not high on the list of priorities.

Everything I knew about NoSQL I learnt from the various conferences I've been going to in the last year, and even then that's limited - without a business reason to pursue knowledge I know it'll just leak out of my brain, so I avoid sessions with no immediate applicability to me.

Let's summarise what I knew about NoSQL databases before last week:
  • They don't use SQL. Who knew? 
  • There are different flavours.  There's a graphy one and key-value things and... others...
  • They're "scalable" (yes, yes, it's web scale). 
  • Some/many/all(?) embrace the idea of eventual consistency 

I was suspicious of the hype surrounding NoSQL, partly because it's associated with the meaningless marketing term "Big Data" and partly because I'm a cynic that sneers at things that get too popular. Here's what I think when I hear the following terms:
  • Cloud - Fire your systems people and ditch your comms room!
  • Big Data - Parse Twitter in order to learn how to read your customers' minds!
  • NoSQL - Stop paying Oracle!
  • Functional - We couldn't get good enough at mainstream programming languages so we switched to something more difficult!

I don't know if it's healthy to be this cynical, but I'm too old to jump on every bandwagon that comes along.

Anyway. Back to the people who now pay my bills.

It's unfortunate that the lack of SQL is the thing that captured the imagination, rather than the lack of tables and a relational structure. SQL was never (in my mind) a particularly evil thing, it's a pretty good language for saying "I want this stuff from this place that fits these criteria", and that's something we're going to have to do at some point whatever the technology.

It's rather more important that it's the structure of the data that's different in NoSQL databases.
In a traditional relational database you have tables, and relationships between those tables are achieved with foreign keys. I'm starting to think of these as something kind of grid-shaped with links between them:

Series of database tables and their relationships.  Honest.
(Yes, I'm experimenting again. This time with my shiny new iPad, a stylus and Penultimate. It's good for ad-hoc drawings, but lacks the precision of the graphics tablet and flexibility of GIMP).
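
To make the tables-and-foreign-keys picture concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (person, address, person_id) are invented for illustration and are not taken from any real schema; the query is just the "I want this stuff from this place that fits these criteria" idea written down.

```python
import sqlite3

# In-memory database; all table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE address ("
    " id INTEGER PRIMARY KEY,"
    " person_id INTEGER REFERENCES person(id),"  # the foreign key 'link'
    " town TEXT)"
)
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Trisha')")
conn.execute("INSERT INTO address (person_id, town) VALUES (1, 'London')")


def names_in_town(town):
    """Follow the foreign key from address back to person for one town."""
    rows = conn.execute(
        "SELECT person.name FROM person"
        " JOIN address ON address.person_id = person.id"
        " WHERE address.town = ?",
        (town,),
    )
    return [name for (name,) in rows]


print(names_in_town("London"))
```

The join is where the "links between grids" show up: the data for one person lives in two tables and is stitched together at query time.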

At the very high level, it seems like there are four (ish) types of NoSQL databases:
  1. Column Family 
  2. Key/Value 
  3. Graph 
  4. Document 
Column Family
Column family databases feel to me, as a newbie to the field, similar to key/value, which I'll come on to. I've mostly heard Cassandra used as an example of this type of NoSQL database. I guess the way I think of this, and of course I could be wrong/over-simplifying, is a unique key linked to a set of key/values:
ID: 63537
   Name: Trisha
   Twitter: @trisha_gee
   Location: London


Which I'm translating into groups of key/value pairs, with the ID as a sort of header:
Key/value pairs grouped by ID
You need the key in order to look up all the details about me. The way I hear it, it's great for writing data, but it's less flexible for ad-hoc queries.
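
As a rough mental model only (this is plain Python, not how Cassandra or any real column family store keeps data on disk), the "unique key linked to a set of key/values" idea can be sketched as a dictionary of dictionaries. The second row and the helper names get_row and find_by_column are invented for the example.

```python
# Row key -> set of column name/value pairs (hypothetical in-memory stand-in).
rows = {
    63537: {"Name": "Trisha", "Twitter": "@trisha_gee", "Location": "London"},
    63538: {"Name": "Someone Else", "Location": "Paris"},  # invented second row
}


def get_row(row_key):
    """With the key, finding all the details is a single lookup."""
    return rows.get(row_key)


def find_by_column(column, value):
    """Without the key, an ad-hoc query has to look at every row."""
    return [key for key, columns in rows.items() if columns.get(column) == value]


print(get_row(63537)["Name"])
print(find_by_column("Location", "London"))
```

The difference between the two functions is the trade-off in miniature: lookups by key are direct, while anything else means scanning.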

Key/Value
These types of NoSQL database (e.g. Riak) are pretty much as schema-less as you get - just dump key-value pairs into them. To be honest, the best description I found was on dba.stackexchange.com, so I'm not going to re-write that with my (at this point) limited understanding.
Never ending lists of key/values
From what I've heard so far, both Key/Value and Column Family databases embrace eventual consistency. I don't know how much of that is a function of their data model and how much is decided by the individual products. For some people eventual consistency is a deal-breaker, but in many cases it seems to me that it's just a matter of getting your head around this and designing your application appropriately.
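
A toy illustration of what eventual consistency means for an application, assuming a made-up store with two replicas where writes reach the second replica only when replicate() is called. The class and method names are hypothetical, and real products handle replication very differently; the point is only that a read can briefly return old data.

```python
class Replica:
    """One copy of the key/value data."""

    def __init__(self):
        self.data = {}


class TinyStore:
    """Two replicas; writes land on one and are copied across later."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet copied to the secondary

    def put(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def get_from_secondary(self, key):
        return self.secondary.data.get(key)

    def replicate(self):
        """Copy outstanding writes to the secondary, in order."""
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()


store = TinyStore()
store.put("user:63537", "Trisha")
stale = store.get_from_secondary("user:63537")  # None: the write has not arrived yet
store.replicate()
fresh = store.get_from_secondary("user:63537")  # "Trisha": replicas now agree
```

Designing for this mostly means deciding which reads can tolerate the stale answer and which cannot.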

Graph
I came across graph databases when I stumbled across Neo4j, chatting to some of the very smart guys there. A graph database lets you model your data as a series of nodes and relationships. And if I think about it, this is not a massive step from either relational models or object models. It doesn't just apply well to the social networking domain (where it's very easy to think in terms of users and their relationships), in actual fact lots of things we design could be modelled this way. Not having used it, I'm not sure just how much of a mental leap you need to take to start thinking that way, but it seems like it might be a good fit for many problems.
Graph of nodes with annotated relationships
I'd be interested in what the architectural trade-offs in using this model are.
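
For a feel of the nodes-and-relationships model, here is a small sketch in plain Python. It is not Neo4j's API or query language; the relationship names and the extra people (Alice, Bob) are invented, and a real graph database would store and traverse this far more cleverly.

```python
from collections import defaultdict

# (start node, relationship type, end node); the data is invented.
relationships = [
    ("Trisha", "WORKS_AT", "10gen"),
    ("Trisha", "LIVES_IN", "London"),
    ("Trisha", "KNOWS", "Alice"),
    ("Alice", "KNOWS", "Bob"),
]

# Adjacency list: node -> list of (relationship type, end node).
outgoing = defaultdict(list)
for start, kind, end in relationships:
    outgoing[start].append((kind, end))


def neighbours(node, kind):
    """Nodes reachable from `node` over one relationship of the given type."""
    return [end for k, end in outgoing[node] if k == kind]


def friends_of_friends(node):
    """People two KNOWS hops away who are not already direct friends."""
    direct = neighbours(node, "KNOWS")
    result = []
    for friend in direct:
        for fof in neighbours(friend, "KNOWS"):
            if fof != node and fof not in direct and fof not in result:
                result.append(fof)
    return result


print(neighbours("Trisha", "KNOWS"))
print(friends_of_friends("Trisha"))
```

Queries become walks along the relationships rather than joins between tables, which is why the social networking example comes up so often.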

Document
Now MongoDB falls into category four, the document database. And as a NoSQL n00b, this is now the product and area I know most about, and am clearly going to be more excited about since 10gen are indoctrinating me in the MongoDB way.


Documents are a familiar structure for developers, especially if they've been working with JSON. So, a document might be:
{
  name: "Trisha",
  twitter: "@trisha_gee",
  address: [
    { line1: "not telling",
      line2: "no really",
      town: "London" }
  ]
}


To me, this looks like it maps onto my domain-shaped Object Model more easily than a relational database, which always needs some sort of O-R mapping (whether you do this with Hibernate or use Spring to do it yourself, you're still mapping tables into objects and vice versa). What I like about the document format is the nested sub-documents for data that belongs together. In relational databases you often end up denormalising for performance anyway, so why not just accept that up front and have it as part of the thing you're storing?
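
A small sketch of why the document shape lines up with an object model, using Python dataclasses and a plain dictionary standing in for the stored document. The class names and the person_from_document helper are made up for the example; no database driver is involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    line1: str
    line2: str
    town: str


@dataclass
class Person:
    name: str
    twitter: str
    address: List[Address] = field(default_factory=list)


def person_from_document(doc):
    """The nesting in the document is already the nesting in the objects."""
    return Person(
        name=doc["name"],
        twitter=doc["twitter"],
        address=[Address(**a) for a in doc.get("address", [])],
    )


document = {
    "name": "Trisha",
    "twitter": "@trisha_gee",
    "address": [
        {"line1": "not telling", "line2": "no really", "town": "London"},
    ],
}

person = person_from_document(document)
print(person.address[0].town)
```

There is still a mapping step, but it follows the structure of the data one-to-one instead of reassembling an object from rows in several tables.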


This does have a cost, of course - nothing is without trade-offs. By default, every time you request this document, you get the whole lot: the person comes with the address unless you explicitly ask for only some of the fields. So, you do need to understand the relationships (still) and whether you're usually going to want to get all that data at the same time or whether you might want to make two separate calls.

Which brings me on to another thing which is familiar from relational days - foreign keys. A field in your document can be the ID of another document, so you can follow the links through and retrieve other documents associated with the starting one. Again, there are trade-offs here - each link you follow is a different request to the database. These database requests can be very quick, but if you wanted this data every time, you'd probably want it embedded in your first document to save the additional call. I guess it's a latency vs throughput question really - a single query which returns a chunky document, or multiple queries that return smaller ones.
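
The embed-or-link trade-off can be sketched with a fake collection that simply counts lookups. FakeCollection, find_by_id and the address_id field are all hypothetical names for this example, not a real driver API; the only thing being shown is the number of round trips each design needs.

```python
class FakeCollection:
    """In-memory stand-in that counts lookups as 'database requests'."""

    def __init__(self, documents):
        self.documents = {d["_id"]: d for d in documents}
        self.requests = 0

    def find_by_id(self, doc_id):
        self.requests += 1
        return self.documents.get(doc_id)


# Option 1: the address is embedded, so one request returns everything.
embedded = FakeCollection([
    {"_id": 1, "name": "Trisha", "address": {"town": "London"}},
])
person = embedded.find_by_id(1)
town_embedded = person["address"]["town"]

# Option 2: the person holds the ID of a separate address document,
# so following the link is a second request.
people = FakeCollection([{"_id": 1, "name": "Trisha", "address_id": 10}])
addresses = FakeCollection([{"_id": 10, "town": "London"}])
person = people.find_by_id(1)
address = addresses.find_by_id(person["address_id"])
town_referenced = address["town"]

requests_referenced = people.requests + addresses.requests
print(embedded.requests, requests_referenced)
```

Both designs reach the same town; the embedded one pays with a bigger document on every read, the linked one pays with an extra call whenever the address is needed.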


So schema design is still important in document databases even if you don't have a relational schema. No new technology is an excuse to stop thinking about the problem you're trying to solve and understanding the tradeoffs in design.

One of the advantages, it seems, of something like MongoDB over some of the key/value databases is the ability to write ad-hoc queries and to tune for those queries. The data is structured (it's in a document) and it doesn't have to be in the same structure every time - not every document relating to a person needs all the fields that another person might have. But you can still query for people who have blue cars or people who live in London, or people whose surnames begin with G. If you find yourself doing the same query a number of times, you can add indexes to MongoDB the same way you would a relational database.
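
To illustrate querying documents that don't all have the same fields, here is a plain-Python sketch. The find helper, the sample people and the dictionary used as an "index" are assumptions made for the example; this is not MongoDB's query syntax or how its indexes are implemented.

```python
from collections import defaultdict

# Documents with differing shapes: not everyone has a car.
people = [
    {"name": "Trisha", "surname": "Gee", "town": "London"},
    {"name": "Alice", "surname": "Green", "town": "Paris", "car": {"colour": "blue"}},
    {"name": "Bob", "surname": "Smith", "town": "London", "car": {"colour": "red"}},
]


def find(documents, predicate):
    """Ad-hoc query: scan every document and keep the ones that match."""
    return [d for d in documents if predicate(d)]


blue_cars = find(people, lambda d: d.get("car", {}).get("colour") == "blue")
londoners = find(people, lambda d: d.get("town") == "London")
surname_g = find(people, lambda d: d.get("surname", "").startswith("G"))

# A very rough stand-in for an index on "town": precompute value -> documents
# so a repeated query no longer needs to scan everything.
town_index = defaultdict(list)
for doc in people:
    if "town" in doc:
        town_index[doc["town"]].append(doc)

print([d["name"] for d in town_index["London"]])
```

Missing fields simply fail to match rather than causing an error, which is what lets documents of different shapes share one collection.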

Seems like I'm getting into more of the nitty-gritty MongoDB details, so I'll stop there and leave that for another time.

In Summary
Classing a whole swathe of products as "NoSQL" is misleading and confusing.  The only thing they all share in common is that they are not traditional relational databases.  Other than that, some of them are as different from each other as they are from relational databases.  I haven't even mentioned caching technologies - these products have functionality which overlaps with NoSQL databases as well.  But even then, the purposes are somewhat different, and not even mutually exclusive.

As with anything, it's really important to understand the strengths and weaknesses of a technology, and the demands of your domain.  These different ways of organising data, and different products, are going to perform really well in certain circumstances, and pretty poorly when used in others.  Getting an understanding of what those strengths and weaknesses are is going to be important in making the correct product/architecture/design decisions.

None of this information is new; there's a lot of material on the web about the different types of NoSQL databases. I'm writing it more for my own benefit than anything else, as my memory is notoriously shocking.  For more in-depth (and probably more accurate) reading there's:
  • Martin Fowler's NoSQL Distilled
  • ...and his introduction to the subject
  • Tim Berglund (@tlberglund) did a great overview of three types at JAX London last week.  There's a video of the same content (different conference) here.
  • http://nosql-database.org/ appears to list all the products that fall under the massive umbrella, but isn't the most usable of sites.
  • And yes, I used Wikipedia.  Which is probably where I went wrong...
   
Published at DZone with permission of Trisha Gee, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Ronald Muller replied on Wed, 2012/10/24 - 4:44am

Tip:  Read "Seven Databases in Seven Weeks". I just finished it and you get a reasonable good "hands on" insight into all kind of database products

Trisha Gee replied on Wed, 2012/10/24 - 6:31am in response to: Ronald Muller

Thank you, that sounds very useful!

Mark Unknown replied on Wed, 2012/10/24 - 1:04pm

Even "stupider" - You can use SQL with many "NoSQL" databases.

 

 

Trisha Gee replied on Wed, 2012/10/24 - 2:18pm

Yeah I know! I didn't want to confuse things even more by mentioning that though, things are bad enough as it is. 

Yeshwant Nandi replied on Wed, 2012/10/24 - 9:36pm

Also, have you heard of "HANA" the new and improved "in-memory" database being pushed heavily by SAP.   It is a "columnar datastore" with compression and it does not have any data aggregation.  All data aggregation is on the fly.  SAP is trying to move all its enterprise apps from Oracle to HANA. 

God knows what all this means ..... but looks like the traditional RDBMS/SQL based on mathematical set-theory is under heavy attack from many sides ...  Dr Codd must be rolling in his grave ...  

Caspar MacRae replied on Thu, 2012/10/25 - 6:37am

 

I'm pretty sure the next version of JPA will have an alternate criteria API from Java 8's Lambdas*, and that standardised support for JPA on NoSQL dBs will also be introduced**.

 So the one part of relational dB's that will persist (sic) is a Standard Query Language - so yes, agreed the name is stupid.

 

* It won't smell as much as Linq, Brian Goetz has done a nice job

** EclipseLink supports several NoSQL dBs and several NoSQL dBs have partial support for JPA baked.  http://wiki.eclipse.org/EclipseLink/Examples/JPA/NoSQL

hans horwath replied on Fri, 2012/10/26 - 12:00pm

And what about intersystems-Caché? I wonder why this one is nearly never mentioned - but IMHO it's great.

Ed Draper replied on Wed, 2012/10/31 - 9:59am

Back in the 80s there was a class of products called "object databases."   I wish we'd re-adopt the term. Why should I care what underlying data structure is being used (i.e. Graph, Document, Key/Value)?  This may likely change over time, but the nature of the product(s) would be the same.

I agree that NoSQL makes about as much sense as calling monochrome "no color."



Ted Young replied on Wed, 2012/10/31 - 6:45pm

That's why I prefer the term NoJoin databases, since most of these types of database sacrifice the ability to do joins in order to have performance and scalability. Of course, even that's not entirely correct, since some of them also sacrifice indexing flexibility, etc. Non-Relational is possibly an even better term.

