
Matt is a software engineer and web developer. He is the author of several books on programming and technology, and is a frequent open source contributor. Matt holds a Ph.D. in philosophy, and has taught both Philosophy and Computer Science at Loyola University Chicago. Matt is the lead cloud engineer at Revolv. Matt is a DZone MVB and is not an employee of DZone and has posted 12 posts at DZone. You can read more from them at their website.

NoSQL No More

04.15.2014

I work at Revolv, and right now we're just about finished with a major migration off of a NoSQL database. We're moving onto a traditional RDBMS (PostgreSQL, to be precise). Many people seem surprised that we, a tech-savvy startup, would be moving to an "old" technology. But in fact our reasoning is pretty simple: A relational system better meets our needs and is more cost-effective.

There are three main reasons why I say this.

1. Our data is relational

NoSQL databases are notorious for not handling relations gracefully. NoSQL design eschews relational modeling by suggesting that objects ought to be stored in such a way that relations are not needed as often. For example, instead of storing a "chair" with a relation to four "legs", each leg should simply be stored on the chair's record. There is no separate "legs" table. At times, there is definite value in this approach.
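To make the two shapes concrete, here is a minimal sketch that stores the same chair both ways: as an embedded document, the way NoSQL design suggests, and as two relational tables linked by a foreign key. SQLite (via Python's standard `sqlite3` module) stands in for a full RDBMS so the example runs anywhere; the table and field names are made up for illustration.

```python
import sqlite3

# Document-style (NoSQL) shape: the legs live inside the chair record,
# so one read returns the whole object and there is no "legs" table.
chair_doc = {
    "id": 1,
    "name": "kitchen chair",
    "legs": [{"position": p, "material": "oak"} for p in ("FL", "FR", "BL", "BR")],
}

# Relational shape: each leg is a row in its own table that points back
# to its chair through a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chair (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE leg (
        id INTEGER PRIMARY KEY,
        chair_id INTEGER NOT NULL REFERENCES chair(id),
        position TEXT NOT NULL,
        material TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO chair (id, name) VALUES (?, ?)", (chair_doc["id"], chair_doc["name"]))
conn.executemany(
    "INSERT INTO leg (chair_id, position, material) VALUES (?, ?, ?)",
    [(chair_doc["id"], leg["position"], leg["material"]) for leg in chair_doc["legs"]],
)

# Reassembling the chair takes a join, which the embedded document avoids.
rows = conn.execute(
    "SELECT c.name, l.position FROM chair c JOIN leg l ON l.chair_id = c.id "
    "WHERE c.id = ? ORDER BY l.id",
    (1,),
).fetchall()
print(len(chair_doc["legs"]), len(rows))
```

For a self-contained object like a chair, the embedded form is simpler; the join only pays off when legs need to be queried or shared independently.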

But what about data that really is relational? When it comes to authors and articles, for example, copying author data onto every single article object results in severe data de-normalization. Copies of author bios everywhere! Conversely, storing articles as attributes of an author adds unnecessary complexity when, say, retrieving a list of the most recent articles for all authors. And the more complex and numerous the relationships, the harder it becomes to model and efficiently query in a NoSQL database. Our data is far more relational than this example. It is filled with many-to-many relationships where no particular record type takes precedence as a "first class" type under which others may be subservient.
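A minimal sketch of the authors-and-articles case, again with SQLite standing in for a full RDBMS; the authors, titles, and dates are invented sample data. It first shows the denormalized copies of an author bio, then the normalized alternative: each author stored once, a join table (the hypothetical `article_author`) for the many-to-many relationship, and a single query for the most recent articles across all authors.

```python
import sqlite3

# Denormalized, document-style: every article carries its own copy of the
# author's bio, so changing one bio means rewriting every article embedding it.
articles_docs = [
    {"title": "Intro to Go", "author": {"name": "Ann", "bio": "Writes about Go"}},
    {"title": "Go Channels", "author": {"name": "Ann", "bio": "Writes about Go"}},
]
bio_copies = sum(1 for a in articles_docs if a["author"]["name"] == "Ann")

# Normalized, relational: each author is stored once, and a join table models
# the many-to-many relationship (an article may have several authors).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT NOT NULL, bio TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL, published TEXT NOT NULL);
    CREATE TABLE article_author (
        article_id INTEGER NOT NULL REFERENCES article(id),
        author_id  INTEGER NOT NULL REFERENCES author(id),
        PRIMARY KEY (article_id, author_id)
    );
    INSERT INTO author  VALUES (1, 'Ann', 'Writes about Go'), (2, 'Bob', 'Writes about Rust');
    INSERT INTO article VALUES (1, 'Intro to Go', '2014-01-10'),
                               (2, 'Go Channels', '2014-03-02'),
                               (3, 'Ownership 101', '2014-02-20');
    INSERT INTO article_author VALUES (1, 1), (2, 1), (3, 2), (3, 1);
""")

# Updating the bio touches exactly one row, however many articles Ann wrote.
conn.execute("UPDATE author SET bio = 'Writes about Go and Rust' WHERE id = 1")

# "Most recent articles for all authors" is one declarative query.
recent = conn.execute("""
    SELECT a.title, au.name
    FROM article a
    JOIN article_author aa ON aa.article_id = a.id
    JOIN author au ON au.id = aa.author_id
    ORDER BY a.published DESC, au.name
    LIMIT 3
""").fetchall()
print(bio_copies, recent)
```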

Some data fits the NoSQL model well. That's great. But some data simply does not fit the assumptions of NoSQL design. Shoehorning data into a database (making concessions on structure, query-ability, and performance) is a good indication that you're using the wrong tool for the job.

2. We need better querying

Our NoSQL-based application was getting pretty heavy on a specific kind of logic: building up and executing queries. With no intermediary query language, we ended up writing special code for every single query. And each language's API was different enough that the same database operation required different techniques from one tool to the next.

And then there were security considerations, type checking, query performance, and so on, all of which we had to handle in code for each language we used.

This whole scenario got more problematic when we needed to run ad hoc queries. When asked "How many X's do we have that are running Y?", we could only respond with, "Give me a few days to code up a query, test it, deploy it, and then I'll let you know." And sometimes we'd come back with the response that writing the query was too time-consuming to justify the expense.
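For contrast, here is roughly what answering such a question looks like once the data lives in a SQL database. The `device` table, its columns, and its values are hypothetical stand-ins for "X" and "Y", not Revolv's actual schema. SQLite keeps the sketch self-contained; in PostgreSQL the same statement could be typed straight into the `psql` command-line client.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, model TEXT, firmware TEXT)")
conn.executemany(
    "INSERT INTO device (model, firmware) VALUES (?, ?)",
    [("hub", "1.2"), ("hub", "1.3"), ("hub", "1.2"), ("sensor", "1.2")],
)

# "How many hubs do we have that are running firmware 1.2?"
# With SQL this is one ad hoc statement, not a coded, tested, deployed program.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM device WHERE model = ? AND firmware = ?",
    ("hub", "1.2"),
).fetchone()

# The inevitable follow-up question is just another query.
breakdown = conn.execute(
    "SELECT firmware, COUNT(*) FROM device WHERE model = 'hub' "
    "GROUP BY firmware ORDER BY firmware"
).fetchall()
print(count, breakdown)  # 2 [('1.2', 2), ('1.3', 1)]
```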

At one point, things got so bad that it was faster to dump the entire database into CSV files and use Excel! Maybe this really is the right solution for ad hoc queries, but it just seems dirty and inelegant.

Now before anyone gets too bent out of shape, it is true that part of this is an attribute of the particular NoSQL database that we selected. The query API for our old NoSQL database is primitive and there is no higher-level query language. The same does not hold true for all NoSQL databases. But few NoSQL databases have complete query languages built-in. (Surprisingly few have command-line monitors that support ad hoc queries.) Switching to a SQL-based database completely alleviates this problem.

3. We have access to better resources

Frankly, it's frustrating to have to build and rebuild even trivial tools for NoSQL databases. And it's equally frustrating to find so many knowledge gaps and documentation shortages. So much time is spent on figuring out and doing the mundane. The SQL database world just looks much better on this front.

There are SQL workbenches. There are data visualizers. Backup tools. Import tools. Fantastic libraries in all major languages. There are 60 years of theory, hundreds of books, and thousands of articles. There is a degree of consistency (though not perfect) across most SQL-based RDBM systems. There are tried and true methodologies and patterns to follow. There are techniques for scaling, for segmenting, and for replicating: techniques that work.

And yet with all of this, SQL databases don't come with the kind of enterprise cruft or cognitive overhead that most technologies accrue over such a long lifetime. Most RDBM systems remain carefully focused on providing ACID reliability with high performance storage, not on adding crazy new fringe feature sets.

Even though there are several good NoSQL databases, none really boasts the kind of supporting ecosystem of resources that RDBM systems have. The field is still so fragmented that I don't see this situation improving in the near future.

The bottom line: choose the right tool

I am at pains to emphasize that I am not bashing NoSQL in theory or practice. I am not claiming that NoSQL databases are valueless, or bad, or necessarily inferior. But right now the landscape is fragmented, inconsistent, and still immature. Great things can be built atop NoSQL databases, and NoSQL solutions are indeed the best for some classes of problems.

But there is a danger in assuming that just because NoSQL is newer, it's better. Likewise, there is a danger in reasoning that because NoSQL solves some problems that RDBM systems do not, NoSQL is always better than SQL-based solutions. I was, and still am, a big fan of some of the NoSQL databases out there (MongoDB maybe most of all). But I've found myself more inclined to choose the tool that naturally matches a data model. (This means, of course, that I design the data model before selecting the database.)

Finally, I've realized the value of an established ecosystem. It was fun to write a cutting-edge document database using a shiny beta NoSQL storage engine. It was decidedly less fun to write tools to export data, automate tasks, run complex queries, and manage security. And it was downright painful to write one-off programs every time sales, marketing, or somebody else needed to "see some numbers". SQL may not be sexy, but it sure is useful.

Published at DZone with permission of Matt Butcher, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Zsolt Kúti replied on Wed, 2014/04/16 - 9:45am

In the light of the above conclusions, I would like to know why you chose NoSQL in the first place.

Patrick Mcbride replied on Wed, 2014/04/16 - 6:14pm

I'm not convinced that Item 1 is accurate. Versant's DB4O, for example, will not save an object a second time if it already exists in the database. To use your example, I have an Author object and save it to the database as a property of a Book object. I then create another Book object that references the previous Author object, and only one Author is saved to the database. Depending on how the objects are designed and coded, you can easily query for Books by Author. This includes the ability to load an existing Author from the database and reference it from new books, as one would with a foreign key reference.

In the .NET world, LINQ works nicely for querying your objects. Once a developer becomes familiar with the constructs of LINQ, it becomes very helpful for routine list-related tasks. One can see the potential of base queries against the database that could be extended at will to meet the needs of the project, as stored procedures and views would. Does Java 8 approach the same functionality?
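The behavior described for DB4O can be illustrated with a toy Python object store that keys objects by identity, so a shared `Author` is stored once no matter how many `Book` objects reference it; a list comprehension then plays the role of a LINQ-style filter. `ToyObjectStore` and its methods are hypothetical and are not DB4O's or LINQ's API.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based equality, like plain objects
class Author:
    name: str


@dataclass(eq=False)
class Book:
    title: str
    author: Author


class ToyObjectStore:
    """Hypothetical in-memory store for illustration; NOT DB4O's API."""

    def __init__(self):
        self._objects = {}  # id(obj) -> obj; each object is kept exactly once

    def store(self, obj):
        if id(obj) in self._objects:
            return  # already stored: don't save a second copy
        self._objects[id(obj)] = obj
        # Follow references so referenced objects (e.g. a Book's Author)
        # are stored too, once, however many objects point at them.
        for value in vars(obj).values():
            if isinstance(value, (Author, Book)):
                self.store(value)

    def query(self, cls, predicate=lambda obj: True):
        # Comprehension-based filter, playing the role of a LINQ "where".
        return [o for o in self._objects.values() if isinstance(o, cls) and predicate(o)]


store = ToyObjectStore()
ann = Author("Ann")
store.store(Book("First Book", ann))
store.store(Book("Second Book", ann))  # references the same Author object

authors = store.query(Author)
titles_by_ann = [b.title for b in store.query(Book, lambda b: b.author is ann)]
print(len(authors), titles_by_ann)  # 1 ['First Book', 'Second Book']
```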


Shashikant Mourya replied on Wed, 2014/04/16 - 11:25pm

I agree with you at the points mentioned, but one thing I would like to ask - "Didn't you do the data modelling task before you decided to go for NoSql?"

I mean, studying your requirements and deciding which technologies you're going to use is the first task you do, before diving into the coding or selecting (or writing) a database.

Jonathan Fisher replied on Sat, 2014/04/19 - 6:36pm in response to: Shashikant Mourya

What's humorous is that the OP calls relational databases 'old technology', when in fact relational is 'new' compared to NoSQL. See, NoSQL came first, then relational, because people realized that analytics on unstructured data is really hard. Most of the first databases were, in fact, just transactional containers (see: http://en.wikipedia.org/wiki/Adabas). The NoSQL crowd defenestrated transactions (a good thing for anyone doing anything complicated) and high structure (good or bad depending on your problem).


Good read: http://www.brentozar.com/archive/2013/03/you-dont-have-a-big-data-problem/

Zsolt Kúti replied on Sun, 2014/04/20 - 3:45am in response to: Jonathan Fisher

The link you gave is awesome :-)

Jonathan Fisher replied on Mon, 2014/04/21 - 10:17pm in response to: Jonathan Fisher

I should clarify: after reading my comment, it sounds like I crap on transactions... They're actually awesome. I use them everywhere, and they're one of the most powerful tools.

Unfortunately, almost everywhere they're implemented poorly, with poor lock management and lots of shared resources. NoSQL threw transactions out the window, when in fact they're very much needed (transactions are _not_ a business concern). What we needed was global transactions with much better lock (or lock-free) data management.
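As an illustration of what a transaction buys you, here is a minimal sketch using Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back on an exception. The `account` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    # As a context manager, the connection commits if the block succeeds and
    # rolls back if it raises, so both updates happen or neither does.
    with conn:
        # Credit first, so a failed debit would leave money "created"
        # if there were no rollback.
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


transfer(conn, 1, 2, 30)  # succeeds
try:
    transfer(conn, 1, 2, 500)  # debit would go negative and violate the CHECK
except sqlite3.IntegrityError:
    pass  # the credit to account 2 was rolled back as well

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 30)]
```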

Michele Mauro replied on Wed, 2014/04/23 - 6:14am

If you have "discovered" after a while that your data is, indeed, relational, then moving towards a relational database is just the obvious thing to do. It's not NoSql's fault; indeed, a NoSql db probably served you very well, enabling you to ship while you were still figuring out what your data would look like.

If you knew that your data was relational, but you chose NoSql because it was cool, then it was all your fault; you are just correcting it.

Either way, NoSql is not the tool for your job, so you'd be foolish to use it. Where is the problem?

Robert Weissmann replied on Wed, 2014/04/23 - 6:40am

In my opinion there is no this OR that. Just get Martin Fowler's book on NoSQL or watch this: https://www.youtube.com/watch?v=qI_g07C_Q5I

In the future, I am sure there will be hybrids: you will have to choose which DB to use for which domain, and they will work together. Use both, but make sure you clearly separate your domains.

Kaveh Shahbazian replied on Wed, 2014/04/23 - 9:39am

The bottom line (my version): no school has conquered all the others yet; in the end, we will pledge ourselves to the Lisp school. Not that we will write Lisp, but we will adopt its homogeneous ecosystem of "code as data" plus its reverse (data as code).

That being said, it does not mean what has been done so far, was wrong.

But impedance mismatch is real. IMHO that's why the NoSql school got some momentum: using SQL, we are "analyzing" data, but what we need is "to shape data". Relational data already has its shape, so we have to think in that manner and (on not-so-rare occasions) adapt our data to fit our tools.

But I am sure that, just as so many problems have been solved so far, this one will be too.

Michele Mauro replied on Wed, 2014/04/23 - 10:53am in response to: Kaveh Shahbazian

There is no problem to solve: sometimes your data is fully analyzed, and you need a relational database because your requirements call for that. Sometimes your data is not well or completely structured, and a NoSql solution may be the right way to go.

It's not a "problem". It just depends on what you need.

Tomm Carr replied on Wed, 2014/04/23 - 5:55pm

 I would add data integrity to the list. Data integrity is such a universal benefit of a (properly designed) relational schema that it is often taken for granted. When our data finally comes to rest (that is, when we insert it or update it in our database), we're not just saying, "Here. Hold this," but also "Check this out. Let me know if there are any anomalies in the data."

This is such a critically important feature that the "old" relational dinosaur will be around for quite a while yet.
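A minimal sketch of that "let me know if there are any anomalies" behavior, using SQLite through Python's `sqlite3` module: `NOT NULL`, `UNIQUE`, `CHECK`, and foreign key constraints make the database itself reject bad rows. The schema is hypothetical; note that SQLite, unlike PostgreSQL, only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    );
""")
conn.execute("INSERT INTO customer (id, email) VALUES (1, 'a@example.com')")

# Each statement below is an anomaly the schema catches on write.
bad_statements = [
    "INSERT INTO customer (id, email) VALUES (2, 'a@example.com')",  # duplicate email
    "INSERT INTO orders (customer_id, quantity) VALUES (99, 1)",     # no such customer
    "INSERT INTO orders (customer_id, quantity) VALUES (1, 0)",      # non-positive quantity
]
rejected = 0
for sql in bad_statements:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected += 1
print(rejected)  # 3
```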

Scott Warren replied on Wed, 2014/04/23 - 6:51pm

I have had an eye on the whole NoSQL movement for a while and I cannot bring myself to use it. We used eXistDB for a while and it served us well, but the lesson learnt was that the document thing was NOT relational. We ended up rewriting this on the MySQL database.

The reason I cannot see how NoSQL fits into our toolset is reporting. There are great tools for generating reports, and none of these seem to play well with NoSQL.

If we need to have "dynamic" structures we now use the JSON Data Type in Postgres too. At least then most of the data is accessible to a reporting tool (and the JSON is too if required). 
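A sketch of that hybrid layout, assuming a hypothetical `event` table: fixed, reportable fields are ordinary columns and the free-form part is a JSON document. SQLite (storing the JSON as text) stands in for PostgreSQL so the example needs only Python's standard library; in PostgreSQL the column would use the `json` type, and fields could be extracted in SQL with operators such as `->>`.

```python
import json
import sqlite3

# Hypothetical table: reportable fields are ordinary columns; the
# free-form part is stored as a JSON document in a single column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        created TEXT NOT NULL,
        attributes TEXT NOT NULL  -- JSON text; a json column in PostgreSQL
    )
""")
events = [
    ("login", "2014-04-01", {"ip": "10.0.0.1"}),
    ("purchase", "2014-04-02", {"sku": "A-1", "qty": 2}),
    ("purchase", "2014-04-03", {"sku": "B-7", "qty": 1, "coupon": "SPRING"}),
]
conn.executemany(
    "INSERT INTO event (kind, created, attributes) VALUES (?, ?, ?)",
    [(kind, created, json.dumps(attrs)) for kind, created, attrs in events],
)

# A reporting tool can aggregate the relational columns with plain SQL...
per_kind = conn.execute(
    "SELECT kind, COUNT(*) FROM event GROUP BY kind ORDER BY kind"
).fetchall()

# ...while the dynamic JSON part is still there when a report needs it.
total_qty = sum(
    json.loads(attrs)["qty"]
    for (attrs,) in conn.execute("SELECT attributes FROM event WHERE kind = 'purchase'")
)
print(per_kind, total_qty)  # [('login', 1), ('purchase', 2)] 3
```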

Jeroen Wenting replied on Thu, 2014/04/24 - 3:37am

Hence, no doubt, the sarcastic/cynical quotes around "old".

NoSQL, like so many hypes, is a throwback to "the good old days".
It's old, just given a new fancy name and suddenly it's "hip" and anyone not getting on the bandwagon is considered "old fashioned".

Same with RIA. It's just client/server rebranded.

"The Cloud" (mind the capitals!) is just another name for distributed storage and computing.

"Thin client" (yes, it's old now that we have RIA, few years ago it was all the hype) is just dumb terminals talking to an all knowing server.

The list goes on and on, in cycles.

Jeroen Wenting replied on Thu, 2014/04/24 - 3:42am in response to: Michele Mauro

"Sometimes your data is not well or completely structured, and a NoSql solution may be the right way to go. It's not a "problem". It just depends on what you need."

What, pragmatism? There's no place for that in software engineering. We have to jump on the latest bandwagon and proclaim it the greatest thing ever, whatever it is...

And yes, that was sarcasm, you're quite correct that you should select the solution based on the problem rather than let a mindless choice of technology determine what the problem should be tortured into being.
