Ayende Rahien is working for Hibernating Rhinos LTD, an Israel-based company producing developer productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler (l2sprof.com), Entity Framework Profiler (efprof.com) and more. Ayende is a DZone MVB, is not an employee of DZone, and has posted 462 posts at DZone. You can read more from them at their website.

Why NoSQL is Not Just For Google and Amazon

01.04.2012

For 30 years or so, the Database Wars have been settled: the relational databases won the fight, and the only decision left was which relational database to use. Everyone “knows” that NoSQL is something Google invented to handle the amount of data and users they have, and that it is somehow related to scaling to Google or Facebook levels.

And since most of us aren’t working on applications that have millions of users, we get to keep using the tried and true methods, with no need to bother with something that is only relevant at extremely high scale. The relational database has served us well, and can continue serving us in the future. Learning SQL was a very smart investment, after all.

Let us go back a bit to those Database Wars that I mentioned. In the 70s there were actually quite a lot of different types of databases competing with each other, anything from ISAM to dBase to relational databases. And as you can see, the relational database won such a decisive victory that it has ruled undisputed for over a generation.

But one thing that we have to remember is that those Database Wars were fought on a drastically different ground than the one we have today.

1980 vs. 2012

In 1980, a 10 MB hard disk (that is megabytes: about 2 songs or 3 pictures) cost around US$4,000. Adjusted for inflation, that comes to about US$11,000 in today’s dollars. For comparison, today 10 MB of disk space would cost you about half a dollar. And those aren’t the only changes, of course. Processors, memory and networks are all many orders of magnitude faster, bigger and cheaper than they were in the 80s.
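Taking the paragraph's own figures at face value, the arithmetic behind the comparison is short. This Python sketch only restates the numbers above (US$4,000 nominal, about US$11,000 inflation-adjusted, about US$0.50 in 2012, all for 10 MB); the constant and function names are illustrative, and the 2012 price is the article's estimate rather than a measured one.

```python
# Figures quoted in the paragraph above (US dollars, all for 10 MB of disk).
COST_1980_NOMINAL = 4_000
COST_1980_IN_2012_DOLLARS = 11_000
COST_2012 = 0.50
CAPACITY_MB = 10

def cost_per_mb(total_cost: float, capacity_mb: float = CAPACITY_MB) -> float:
    """Price of one megabyte given the price of the whole disk."""
    return total_cost / capacity_mb

def price_ratio() -> float:
    # Both prices are for the same 10 MB, so the capacity cancels out.
    return COST_1980_IN_2012_DOLLARS / COST_2012

print(cost_per_mb(COST_1980_NOMINAL))          # 400.0
print(cost_per_mb(COST_1980_IN_2012_DOLLARS))  # 1100.0
print(price_ratio())                           # 22000.0
```

By these numbers, the same 10 MB became roughly 22,000 times cheaper in real terms, a little over four orders of magnitude.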

Even more interesting are the differences in the type of applications being built. In the 80s, a typical application had exactly one user. Multi-user applications had to support… 3 users. All of them at the same time! The UI paradigms were drastically different as well. At the time, the master-detail form was the top of the hill, the uncontested king of good UI design. But today… there are often as many items and active elements on a single web page as there were in entire applications then.

Why the history lesson, you ask? Why, to give you some perspective on the design choices that led to the victory of the relational databases. Space was at a premium, and the interaction between the user and the application closely modeled the physical layout of the data in the database. That made sense, because there really were no other alternatives given the environment that existed at the time.

That environment is no longer here, and the tradeoffs made when 30 MB would cost as much as an annual salary are no longer relevant. In particular, one immensely annoying aspect of relational databases is very much a problem today: relational databases trade off read speed in favor of write speed. That tradeoff made perfect sense when disk space was so costly and making users wait an extra second was no big deal.

In separate studies, both Google and Amazon found that even an additional 100 ms of page latency severely impacted their bottom line. And yet relational databases trade off read speed (having to do joins and extra loads) for write speed (having to write only a small amount of data).
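To make the read/write tradeoff concrete, here is a toy sketch in Python. It uses the standard-library sqlite3 module for a normalized schema and a plain dict of JSON strings as a stand-in for a document store; the schema, data and function names are hypothetical, and this is not RavenDB's API. It does not measure latency, it only shows where the work lands: the normalized model needs a three-table join to display an order but updates a customer's name in one row, while the denormalized documents are read with a single key lookup but must each be rewritten when the duplicated name changes.

```python
import json
import sqlite3

# --- Normalized (relational) layout: each fact stored once, spread over three tables ---

def build_normalized(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL REFERENCES customers(id));
        CREATE TABLE order_lines (order_id INTEGER NOT NULL REFERENCES orders(id),
                                  product TEXT NOT NULL, quantity INTEGER NOT NULL);
        INSERT INTO customers VALUES (1, 'Alice');
        INSERT INTO orders VALUES (10, 1), (11, 1);
        INSERT INTO order_lines VALUES (10, 'Disk', 2), (10, 'Cable', 1), (11, 'Disk', 1);
    """)

def read_order_normalized(conn: sqlite3.Connection, order_id: int) -> dict:
    # Displaying one order means joining all three tables back together.
    rows = conn.execute("""
        SELECT c.name, l.product, l.quantity
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN order_lines l ON l.order_id = o.id
        WHERE o.id = ?
        ORDER BY l.rowid
    """, (order_id,)).fetchall()
    return {"customer": rows[0][0], "lines": [(product, qty) for _, product, qty in rows]}

def rename_customer_normalized(conn: sqlite3.Connection, customer_id: int, new_name: str) -> int:
    # The name is stored exactly once, so the write is a single-row update.
    cur = conn.execute("UPDATE customers SET name = ? WHERE id = ?", (new_name, customer_id))
    return cur.rowcount

# --- Denormalized (document) layout: each order carries everything it displays ---

def build_documents() -> dict[int, str]:
    return {
        10: json.dumps({"customer": "Alice", "lines": [["Disk", 2], ["Cable", 1]]}),
        11: json.dumps({"customer": "Alice", "lines": [["Disk", 1]]}),
    }

def read_order_document(store: dict[int, str], order_id: int) -> dict:
    # One key lookup returns the whole order, with no joins.
    doc = json.loads(store[order_id])
    return {"customer": doc["customer"], "lines": [tuple(line) for line in doc["lines"]]}

def rename_customer_documents(store: dict[int, str], old_name: str, new_name: str) -> int:
    # The name was copied into every order, so every copy has to be rewritten.
    rewritten = 0
    for key, raw in list(store.items()):
        doc = json.loads(raw)
        if doc["customer"] == old_name:
            doc["customer"] = new_name
            store[key] = json.dumps(doc)
            rewritten += 1
    return rewritten

conn = sqlite3.connect(":memory:")
build_normalized(conn)
store = build_documents()
print(read_order_normalized(conn, 10) == read_order_document(store, 10))  # True
print(rename_customer_normalized(conn, 1, "Alicia"))        # 1 (one row touched)
print(rename_customer_documents(store, "Alice", "Alicia"))  # 2 (two documents touched)
```

Which layout comes out ahead depends on the ratio of reads to writes in a given application.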

Another problem that pops up frequently with relational databases is their inability to handle complex data types, such as collections or nested objects. Oh, you can certainly map those to a relational database, but that requires additional tables, and each additional table makes it that much more complex to query the data, work with it and display it to the user.
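As a sketch of the extra mapping work, assume a hypothetical Order with one nested object (a shipping address) and one collection (a list of tags). In the relational version the address is flattened into columns and the tags need a table of their own, so saving and loading touch two tables; in the document version the whole object is serialized as a single JSON value. Python's dataclasses, json and sqlite3 modules are used purely for illustration; none of the names come from RavenDB or any particular mapper.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Order:
    id: int
    ship_to: Address                               # nested object
    tags: list[str] = field(default_factory=list)  # collection

# --- Relational mapping: flatten the nested object, give the collection its own table ---

def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, ship_street TEXT, ship_city TEXT);
        CREATE TABLE order_tags (order_id INTEGER NOT NULL REFERENCES orders(id),
                                 tag TEXT NOT NULL);
    """)

def save_relational(conn: sqlite3.Connection, order: Order) -> None:
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                 (order.id, order.ship_to.street, order.ship_to.city))
    conn.executemany("INSERT INTO order_tags VALUES (?, ?)",
                     [(order.id, tag) for tag in order.tags])

def load_relational(conn: sqlite3.Connection, order_id: int) -> Order:
    # Two queries (or a join) are needed to reassemble a single object.
    street, city = conn.execute(
        "SELECT ship_street, ship_city FROM orders WHERE id = ?", (order_id,)).fetchone()
    tags = [tag for (tag,) in conn.execute(
        "SELECT tag FROM order_tags WHERE order_id = ? ORDER BY rowid", (order_id,))]
    return Order(order_id, Address(street, city), tags)

# --- Document mapping: the whole object graph is one JSON value ---

def save_document(store: dict[int, str], order: Order) -> None:
    store[order.id] = json.dumps(asdict(order))

def load_document(store: dict[int, str], order_id: int) -> Order:
    doc = json.loads(store[order_id])
    return Order(doc["id"], Address(**doc["ship_to"]), doc["tags"])

order = Order(1, Address("1 Main St", "Springfield"), ["gift", "express"])
conn = sqlite3.connect(":memory:")
create_schema(conn)
save_relational(conn, order)
store: dict[int, str] = {}
save_document(store, order)
print(load_relational(conn, 1) == order, load_document(store, 1) == order)  # True True
```

Each further collection adds another table and another query (or join) to the relational path, while the document path stays one write and one read.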

For many years, I have been working with customers on optimizing their applications that use relational databases, and I’ve seen the same problems occur over and over again. That led me to the belief that NoSQL databases aren’t suitable just for extremely scalable scenarios. NoSQL databases make sense for a wide range of scenarios, and this realization led me to RavenDB.

In my company, we are using RavenDB as the backend database for everything from a blog to our ordering and purchasing systems, the daily build server and much more.

The major advantage we found wasn’t the ability to scale (although that exists); it was the freedom it gives us in modeling our data and changing our minds.
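One way to picture that freedom, assuming documents stored as JSON: when a new field (here a hypothetical tags list) is introduced, new documents simply include it and the reading code supplies a default for older ones, with no migration step. In a relational schema the same change would typically need an ALTER TABLE, or for a multi-valued field like this a new table, before any code could use it. This is a generic Python sketch, not a description of how RavenDB itself handles model changes.

```python
import json

# Hypothetical blog-post documents. The first was written before a "tags" field
# existed; the second was written after the model changed.
posts = [
    json.dumps({"title": "Why NoSQL", "body": "Text of the first post."}),
    json.dumps({"title": "Modeling orders", "body": "Text of the second post.",
                "tags": ["ravendb"]}),
]

def load_post(raw: str) -> dict:
    doc = json.loads(raw)
    # Older documents lack the field entirely; the reader fills in a default
    # instead of requiring every stored document to be migrated first.
    doc.setdefault("tags", [])
    return doc

for raw in posts:
    post = load_post(raw)
    print(post["title"], post["tags"])
```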



Source:  http://ayende.com/blog


Published at DZone with permission of Ayende Rahien, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Jilles Van Gurp replied on Wed, 2012/01/04 - 2:47pm

There are a lot of good, valid reasons to rely on a relational database despite many as-yet unaddressed or poorly addressed issues. For example, transactionality is a nice thing to have, support options are critical for complicated deployments, and tooling around specific products really can make life easy. However, things like sharding, replication (multi data center), schema evolution, etc. all tend to be kind of sucky. MySQL replication sort of works if you have a MySQL DBA handy to set it up and manage it for you (if so, lucky you). Sharding kind of sucks (at least with MySQL) and in any case will make querying a lot harder. Schema evolution is tedious at best. As far as I know, this isn't necessarily much better with commercial options like Oracle.

So, a relational database is a safe but lazy option with plenty of downsides. You won't get the best possible performance. You will have to deal with arcane languages like SQL, and plenty of programmers I've seen in action show that they don't understand even its very basics. You will have to deal with certain inflexibilities (your SQL schema is hard to evolve), and with a lot of cruft like Hibernate to make the fact that you are using a database tolerable for your developers. Sometimes it is worth the tradeoff, but blindly making that tradeoff is in my view dangerous and no longer necessarily valid, given that there are now several well-supported, mature NoSQL products out there.

The dirty little secret about most relational database deployments is that they are over-engineered data stores that were basically optimized to fit the in-memory object model rather than any specific query, storage, or transactional requirements. If people really needed proper relational databases, MySQL wouldn't be so popular, because it is kind of limited in a lot of ways. The reason MySQL is so popular is that relational needs tend to be quite modest.

I've been building on various NoSQL technologies as well as MySQL over the past year, and I appreciate both worlds. The way I see it, relational databases are all about deciding in advance what you need indexed in your data structures and then developing a schema + DAO layer around that. That's waterfall-style decision making that poorly matches my projects, where I have to deal with a lot of uncertainty around such requirements and can't really afford to have those set in stone. NoSQL + Solr keeps life simple for me.

With some NoSQL solutions you get limited indexing capabilities. With others you need something external like Solr. Either way, it is going to be very different from what you would get with a relational database (though not necessarily worse).

While Solr sucks for things like joins, it excels at indexing and searching tens of millions of records in all sorts of ways that are hard to retrofit in, e.g., MySQL or Oracle. It's not impossible, but it is unlikely that you can make it perform as well. So NoSQL with a Solr-based indexing solution can give you a lot of value and flexibility that would be hard to match, and even harder to evolve, using a traditional SQL-based solution. I've replaced a shitload of complex joins with a simple boolean Solr query on several occasions. It's kind of cool when you see it produce results in milliseconds where you previously had MySQL pondering results for seconds (or plainly timing out).

The Google scale is billions of records. Few of us need that. But scaling SQL at tens of millions of records is already a challenge, and that is well within range of many enterprise needs. Once you hit that kind of scale, you need senior DBAs and expert SQL developers (rather than your average Hibernate code monkeys) to optimize your database designs and configuration. I've seen pretty naive designs scale very well on a simple key-value store + Solr. I've also seen naive SQL solutions crash and burn when they met real data. It's kind of hard to do it properly, and IMNSHO most Java devs I've met suck at building SQL-based systems properly. Hibernate will only get you so far in pretending that you don't need to understand databases.

So, keeping an open mind is actually pretty sane advice. There are a lot of good NoSQL products out there that solve real problems that you may or may not have, or that you may be entirely unaware of. Sticking with relational has a couple of risks in terms of scalability, maintainability, agility, etc. that you need to take into account before making a choice. It's easy to be conservative and a lot harder to take risks and be competitive. To be competitive you have to offer something extra, be it performance, scaling out, development speed, or other things. If you don't take the risk, your competitor might, and end up being more successful than you. If you get it wrong, you lose. But then SQL as a default choice carries that risk as well, and only a fool would ignore that.

Mitch Pronschinske replied on Wed, 2012/01/04 - 3:45pm in response to: Jilles Van Gurp

Great comments, Jilles!  If you wanted to revise this comment into an article, I'd be glad to help you post it on Javalobby.

Nicolas Bousquet replied on Wed, 2012/01/04 - 6:07pm

Some points:

Google didn't invent NoSQL. The term is new, but most legacy systems didn't use SQL or RDBMS concepts and used simpler key-value stores. Specialized databases also already existed for specific needs such as geographical or medical data. NoSQL concepts are not new. NoSQL is a marketing term like Web 2.0. Nobody really knows what it means, but everybody overuses it to benefit from the hype.

Half a dollar doesn't get you 10 MB now, but more like 5 GB. Check the latest prices.

In the 80s (and before), mainframe applications did exist: big computers used by hundreds of concurrent users via dumb terminals. You may be thinking of the personal computers that appeared at that time with only one user, but those weren't really the type of computer that came with a database anyway.

The point is not about being traditional or not, or about choosing to have a costly DBA or not (if you think your NoSQL datastore will not need any maintenance, you are really naive).

The question is more what problem you are trying to solve. That should dictate your storage choices.

Yaron Levy replied on Sun, 2012/06/10 - 10:43am

I checked the source control, and the history for this project goes back only to the beginning of February 2012. I assume the project is older than that, because the codebase is pretty large.

But to summarize, what we actually have is a highly abstracted project, with a lot of abstraction; a lot of really bad code even if you ignore the abstractions; and a seemingly modern codebase that still uses direct ADO.NET for pretty much everything, with a WCF service put in the middle of the application just for the fun of it.

Lukas Eder replied on Sat, 2013/12/14 - 2:19pm

But... storage and hardware issues weren't the only reasons there was a need for a relational model. In fact, that was the least important reason for inventing relational theory and SQL. A very important key to the success of SQL is the fact that it is a standard, which is implemented pretty well by most RDBMS today. Before, you had millions of ways to query databases, and every vendor had to reinvent the wheel in every aspect. SQL solved this issue as well.

Today, with the NoSQL vs. SQL debate, we're back in pre-Codd times again, where hundreds of vendors compete with vendor-specific models, each one solving only one simple little problem, as can be seen in this blog post:

http://www.opensourceconnections.com/2013/12/11/codds-relational-vision-has-nosql-come-full-circle/

While they might solve that simple problem better than most SQL vendors, SQL also solves 100s of other problems pretty well. And SQL has already integrated OLAP, hierarchical queries and some NewSQL features (column stores). Why shouldn't SQL eventually integrate "NoSQL" features (mostly, unstructured querying and graph databases)?
