
(Distributed) Object Database Tutorial

04.25.2010

Unlike the developers of the pre-internet era, today's developers are writing inherently object-oriented code. The object-oriented approach is required, as software complexity grows under the pressure to deliver higher business value while coping with larger amounts of data and increasingly concurrent systems.

So, we're writing our applications using object modeling, but just as we did in the pre-internet era, we're still mostly storing our data relationally. This causes big problems, because the relational database is a runtime relationship engine that constantly needs to re-calculate relations. Yes, that's right: it's called a relational database, but it does not store relationships. It stores discrete data values and then, at runtime, uses set-based logic to parse that data and "relate" it as it is requested. It's not hard to imagine how that becomes problematic when you start dealing with millions of instances (oh, excuse me, records) that are related in your object model, not to mention recursive relationships, networks of objects, collections of collections, etc. This is why we are seeing non-relational databases emerge as a dominant force in the data warehousing space, why we are seeing NoSQL solutions emerge in the web space, and why even those still holding on to their RDBs are commonly storing only unrelated data and building layers to manage identity (relations) within the application space.

Since you are an object-oriented developer, why not store your objects in an object database and bypass the runtime re-calculation effort? You can get much better performance on relationship-heavy access and, at the same time, eliminate all of the effort required to do mapping and to create and synchronize a separate data model. Object databases store relations as first-class citizens inside the database and let your application model represent your database schema (yep, no separate DDL).
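To make the difference concrete, here is a minimal Java sketch of the two styles. It is plain in-memory code, not the API of any database, and the Customer, Order and OrderRow classes are hypothetical names made up for the illustration. A real RDBMS would use an index or a join algorithm rather than a loop, but the matching still happens when the relation is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical classes for illustration; not the API of any real database.
class StoredRelationSketch {

    // Relational style: the row only carries a key value.
    static class OrderRow {
        final int id;
        final int customerId;
        OrderRow(int id, int customerId) {
            this.id = id;
            this.customerId = customerId;
        }
    }

    // The relation is re-derived by matching key values on every request.
    static List<OrderRow> ordersFor(int customerId, List<OrderRow> allOrders) {
        List<OrderRow> matches = new ArrayList<>();
        for (OrderRow row : allOrders) {
            if (row.customerId == customerId) {
                matches.add(row);
            }
        }
        return matches;
    }

    // Object style: the relation itself is part of the stored object.
    static class Customer {
        final String name;
        final List<Order> orders = new ArrayList<>();
        Customer(String name) {
            this.name = name;
        }
    }

    static class Order {
        final int id;
        final Customer customer;
        Order(int id, Customer customer) {
            this.id = id;
            this.customer = customer;
            customer.orders.add(this); // relation recorded once, at creation
        }
    }

    public static void main(String[] args) {
        List<OrderRow> rows = Arrays.asList(
                new OrderRow(1, 7), new OrderRow(2, 8), new OrderRow(3, 7));
        // Relation computed by matching keys at request time.
        System.out.println(ordersFor(7, rows).size());

        Customer ann = new Customer("Ann");
        new Order(1, ann);
        new Order(3, ann);
        // Relation followed directly through the stored reference.
        System.out.println(ann.orders.size());
    }
}
```

In the second half there is no key column and no matching step: the reference from Customer to its orders is the relation.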

For example, when your application needs to find the shortest path between two people in a social network, the relationships between each successive level of people are immediately available, without complex self-referencing JOINs having to be calculated across, say, 100M people. Following a relation is as quick as a b-tree lookup, no matter how many people are in your database. It's as easy as taking a person reference and calling its getRelations() method, which returns the collection object holding the related people, then creating an iterator and using it. Sounds like an ORM, right? That's because the ORM APIs originate in the object database space. Under the covers, though, an ORM still has beaucoup mapping, and your database will still be re-calculating relations at runtime. With the object database you don't have the mapping, and your relations are already stored as part of the database.
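As a rough sketch of what that traversal could look like in application code, here is a breadth-first search that only ever calls getRelations() and iterates the result. The Person class, its relateTo() helper and the ShortestPath class are assumptions made up for this example. It runs on plain in-memory objects and does not show any particular object database's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical domain class; getRelations() is the method named in the text.
class Person {
    private final String name;
    private final List<Person> relations = new ArrayList<>();

    Person(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    List<Person> getRelations() {
        return relations;
    }

    // Illustrative helper: records a symmetric relationship.
    void relateTo(Person other) {
        relations.add(other);
        other.relations.add(this);
    }
}

class ShortestPath {
    // Breadth-first search over stored references.
    // Returns the path from start to goal, or an empty list if none exists.
    static List<Person> find(Person start, Person goal) {
        Map<Person, Person> cameFrom = new HashMap<>();
        cameFrom.put(start, start);
        Queue<Person> frontier = new ArrayDeque<>();
        frontier.add(start);

        while (!frontier.isEmpty()) {
            Person current = frontier.remove();
            if (current == goal) {
                // Walk the predecessor links back to the start.
                List<Person> path = new ArrayList<>();
                for (Person p = goal; p != start; p = cameFrom.get(p)) {
                    path.add(p);
                }
                path.add(start);
                Collections.reverse(path);
                return path;
            }
            for (Person next : current.getRelations()) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    frontier.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Each level of the search is just a walk over references that already exist, which is the point of the paragraph above: no join has to be computed to discover who is related to whom.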

Now here's a really cool point. Even though you're using and storing objects directly in the database and don't need to calculate relations at runtime, you can still query! It's not a big deal: you just use classes and attributes instead of tables and columns. Even better, because an object database understands things like inheritance and polymorphism, you can do neat things like query on a superclass, return all instances of its subclasses that satisfy a predicate, sort those results, and get them back as a collection of objects. There are all the usual query operators: logical, character matching, arithmetic, containment, aggregations, etc.
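The semantics of such a superclass query can be sketched in a few lines of Java. The ObjectStore class below is a hypothetical in-memory stand-in, not db4o's or VOD's query API, and the Vehicle, Car and Truck classes are invented for the example. It only illustrates the idea of "query by class, filter by predicate, sort, get objects back".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PolymorphicQuerySketch {

    // Hypothetical class hierarchy used as the "schema".
    static abstract class Vehicle {
        private final String model;
        private final int topSpeed;

        Vehicle(String model, int topSpeed) {
            this.model = model;
            this.topSpeed = topSpeed;
        }

        String getModel() {
            return model;
        }

        int getTopSpeed() {
            return topSpeed;
        }
    }

    static class Car extends Vehicle {
        Car(String model, int topSpeed) {
            super(model, topSpeed);
        }
    }

    static class Truck extends Vehicle {
        Truck(String model, int topSpeed) {
            super(model, topSpeed);
        }
    }

    // Hypothetical in-memory stand-in for an object database.
    static class ObjectStore {
        private final List<Object> objects = new ArrayList<>();

        void store(Object object) {
            objects.add(object);
        }

        // Returns every stored instance of 'type' (including subclasses)
        // that satisfies the predicate, sorted by the given order.
        <T> List<T> query(Class<T> type,
                          Predicate<? super T> predicate,
                          Comparator<? super T> order) {
            return objects.stream()
                    .filter(type::isInstance)
                    .map(type::cast)
                    .filter(predicate)
                    .sorted(order)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        ObjectStore store = new ObjectStore();
        store.store(new Car("Roadster", 200));
        store.store(new Truck("Hauler", 120));
        store.store(new Car("City", 140));

        // Query on the superclass; Cars and Trucks are both candidates.
        List<Vehicle> fast = store.query(
                Vehicle.class,
                (Vehicle v) -> v.getTopSpeed() > 130,
                Comparator.comparingInt(Vehicle::getTopSpeed));
        for (Vehicle v : fast) {
            System.out.println(v.getModel());
        }
    }
}
```

The query names Vehicle, yet the results come back as the Car and Truck objects that were stored, with no table-per-class mapping in between.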

OK, let's go one step further. The object database manages identity from the client-coupled cache, so it can deal with data distribution across many physical machines and make it look like one big database. Imagine you run a query, but it executes in parallel across N physical machines or some application-defined logical subgrouping. Then, when you use the results, related objects can reside on any physical node in the system; it just looks like one huge database. You're in a transaction and you create things, delete a few things, change some things, and commit. You don't have to do anything to deal with the data distribution; the object database knows how to push all of your changes out to the nodes active in the transaction. Of course, it's not magic. You do need to think about where you want things to go when you create them, just as you do in any system of scale that requires partitioning, but the burden ends there, at implementing the partition strategy.
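Here is a toy Java sketch of that idea, with in-memory lists standing in for physical nodes. Everything in it (PartitionedStore, StoreNode, the hash-based nodeFor() strategy) is a hypothetical illustration rather than how VOD or any other product implements distribution. It shows a partition decision made at creation time and a query that fans out to all nodes in parallel and merges the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PartitionedStoreSketch {

    // Stand-in for one physical database node.
    static class StoreNode {
        final List<String> objects = new ArrayList<>();
    }

    static class PartitionedStore {
        private final List<StoreNode> nodes = new ArrayList<>();

        PartitionedStore(int nodeCount) {
            for (int i = 0; i < nodeCount; i++) {
                nodes.add(new StoreNode());
            }
        }

        // Assumed partition strategy: hash of the value picks the node.
        int nodeFor(String value) {
            return Math.floorMod(value.hashCode(), nodes.size());
        }

        // The only place the application thinks about placement.
        void store(String value) {
            nodes.get(nodeFor(value)).objects.add(value);
        }

        int countOn(int nodeIndex) {
            return nodes.get(nodeIndex).objects.size();
        }

        private static List<String> queryNode(StoreNode node, Predicate<String> predicate) {
            return node.objects.stream().filter(predicate).collect(Collectors.toList());
        }

        // Scatter the query to every node in parallel, then gather and merge.
        List<String> query(Predicate<String> predicate) {
            List<CompletableFuture<List<String>>> pending = nodes.stream()
                    .map(node -> CompletableFuture.supplyAsync(() -> queryNode(node, predicate)))
                    .collect(Collectors.toList());
            return pending.stream()
                    .flatMap(future -> future.join().stream())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(3);
        for (String name : new String[] {"alice", "bob", "carol", "dave", "erin"}) {
            store.store(name);
        }
        // The caller sees one logical database, whatever node holds each object.
        System.out.println(store.query(name -> name.length() == 4));
    }
}
```

The caller of query() never names a node. The placement decision lives in one method, which is roughly where the text says the burden should end.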

Object databases come in lots of varieties, so you should look carefully at the underlying architecture and implementation to make sure it suits your needs. These things are wicked powerful, and once you give them a try, it's very difficult to go back to the pre-internet approaches of data modeling under the application domain. Given my background, I can tell you about a couple of them from Versant. One is db4o (database for objects), a popular open source embedded object database, and the other is the Versant Object Database (VOD), a commercial implementation that provides enterprise-grade features like fault tolerance, SNMP monitoring, distribution, and more.

db4o was introduced to Java developers in "db4o: Simple POJO Persistence", an article that can give you a heads-up on db4o's flexibility and ease of use (we could call db4o the "Swiss army knife" of persistence for the modern OO developer). You can find tons of resources to get started with db4o at http://developer.db4o.com and very nice intro videos at http://db4o.blip.tv (for both .NET and Java).

With regard to db4o's big brother, VOD (http://www.versant.com), we would like to share a presentation by Robert Greene (Versant's VP Open Source Operations). This hands-on presentation takes a working in-memory Java application and shows every step necessary to add an object database as its form of persistence. It then makes the application fault tolerant and distributed, and shows off some parallel queries. Finally, tools are used to analyze performance and show cache load optimization. All done, from soup to nuts, in ~45 minutes.

 

Video Index:

00:00:00 Intro
00:04:00 Summary of hands on exercise
00:05:48 How to make domain classes persistence capable
00:11:52 How to create sessions and define transaction boundaries
00:16:50 Queries
00:18:20 Database creation and use of monitoring tools
00:22:03 How to make the app fault tolerant
00:26:25 Transparent distribution/migration mechanism
00:33:22 Parallel distributed queries
00:37:13 How to do performance optimization and monitoring

Published at DZone with permission of its author, German Viscuso.
