NoSQL Zone is brought to you in partnership with:

I have been in the Software business for close to 30 years now, and I have 25+ years with SQL-based relational databases. Although I have used Unix even longer than that, I am pretty much an operating system agnostic. Over the years, I have work in many positions, from support engineer to sales engineer and consultant. Anders is a DZone MVB and is not an employee of DZone and has posted 50 posts at DZone. You can read more from them at their website. View Full User Profile

NoSQL for us RDBMS folks - MongoDB 101

12.01.2011
| 6221 views |
  • submit to reddit
As you probably know, I have been doing RDBMS work for many years, some 25+ years by now. At Recorded Future I am the database architect, and although an RDBMS is used extensively, MySQL in this case, we are looking at options, and are currently doing more and more work using a NoSQL Solution, probably te most popular one by now, namely MongoDB.

And before you complan: NoSQL is not a good term, but someone with a longer NoSQL background should then find something better, not yours truly. And for all intents and purposes, you know what I am talking about, right?

I plan to post a few MongoDB posts here, how it looks like from an RDBMS dudes (like myself) POV. This is the first installment, but there will be more. I should also point out that I am no expert in NoSQL technologies in general, nor specifically in MongoDB, but I am trying, and our MongoDB setup is large and complex enough to serve as a decent example: We have a couple of Tb of data in Mongo, we use Replication and we use Sharding / Clustering whatever you want to call it. In addition, we run this puppy on Amazon EC2.

So, why did we move to MongoDB, and what are the results? I think the reason is two-fold:

  1. Better performance. Not better performance per se, but by using sharding and large in-memory stores, we can achive this.
  2. Better and easier to set up scalability. Again, this doesn't come by itself, nor is it without problems. But the built-in sharding in MongoDB actually does work reasonably well, and is largely transparent to the application.

This sounds great, doesn't it? Everyone wants better perforamnce and better scalability, right? And why can't we have MySQL to scale in the same way as MongoDB? Well, there are drawbacks to all this, but in our case, I think we can live with it.

An important thing to note here, I think, is this: Scalability is transparent and easy to use and implement because the basic idea of a NoSQL system (a key-value store in this case) is so simple: One unique key points to one value. That's it. This is easy to shard. Dead easy. A multi-table relational schema is much less so. But you could simplify the schema in MySQL to make it just as easy to shrd, but then what is the point of an RDBMS at all? And MongoDB Scalability is an incredibly important part of MongoDB performance (in addition to it's in-memory nature).

In our case, using Amazon EC2, scaleability is really important. A single Amazon instance is limited in terms of memory and CPU power. The limits are high (some 70G RAM for example), but there is still a limit. If you have, say, 500 Gb or a Tb or so of data, 70Gb might not be that much, so you have to shard or cluster or something. And this is where MongoDB rocks, it is easy to shard, the simple data structures in a NoSQL database mena this this can be done largely transparent, and all means that MongoDB Rocks for us.

Do we have other options? You bet! MySQL Cluster, NimbusDB (aka NuoDB), Oracle, Cassandra, etc etc, but we ended up with MongoDB for now.

So from a DBA POV, what is mongo then? To begin with, there is no SQL interface and no "query language" per se, rather to the user, MongoDB looks a lot like a Java Script environment with a lot of space for variables, sort of. MongoDB has databases, and in databases there are collections, which are sort of like tables in an RDBMS. A collection is a key-value store, you can define a key for the collection, or MongoDB will generate one for you. MongoDB is schemaless, which means that a collection doesn't not have any predefined columns or anything. Instead, if an object that you store has an attribute foo then you assign a value to it in your object and insert the object. That's it. And you can have an index on foo if you want to. And index in this case is a traditional B-Tree index, nothing more exciting than that.

An Object as above in Mongo is a JSON structure, which internally in Mongos is represented as BSON. To access the collections you use Java Script as I said before, like this to insert an object into the collection bar with the attribute foo:

db.bar.insert({foo: "foobar"});

In SQL this means: INSERT INTO bar(foo) VALUES('foobar')
Used this way, Mongo will assign the object a unique id, called _id, which is similar to the PRIMARY KEY in an SQL RDBMS. To retrieve the row we just inserted, use the command:

db.bar.find()

Which in SQL means: SELECT * FROM bar
And you get something like this back:

{ "_id" : ObjectId("4e8c9b2e2fde67676c2381ae"), "foo" : "anka" }

MongoDB has a big bunch of built-in Java Script methods, what I showed above was just a very basic sample. To use Mongo, knowing Java Script and JSON in general is more or less a requirement, and if you don't know these, and start using MongoDB, you will know then soon.

I will write some more blog-posts on Mongo eventually, but before I close this post, let me tell you one more thing. I never created the collections (bar) I used above. The reason is that yiu do not have to, it is created for you as you access it. The same goes for databases. This is the reason that you often end up with a bunch of misspelled databases and collections in mongo, which you can of cource drop after you have made a mistake, but as we are all lazy, you mostly have a bunch of non-used databases and collections.

/Karlsson

Source: http://karlssonondatabases.blogspot.com/2011/10/nosql-for-us-rdbms-folks-mongodb-101.html


Published at DZone with permission of Anders Karlsson, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Aniceto Garcia replied on Thu, 2012/01/26 - 4:38am

Despite the many manageability improvements claimed by RDBMS vendors over the years, high-end RDBMS systems can be maintained only with the assistance of expensive, highly trained DBAs. DBAs are intimately involved in the design, installation, and ongoing tuning of high-end RDBMS systems. NoSQL databases are generally designed from the ground up to require less management: automatic repair, data distribution, and simpler data models lead to lower administration and tuning requirements — in theory. In practice, it’s likely that rumors of the DBA’s death have been slightly exaggerated. Someone will always be accountable for the performance and availability of any mission-critical data store.

Java Collection

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.