Nitin has posted 391 posts at DZone. View Full User Profile

A Data Stores Comparison

04.19.2010
| 10742 views |
  • submit to reddit

Joe Ottinger, Senior Developer at Gigaspaces, has written an interesting article that provides an evaluation of a wide range of data storage solutions for architects.  Joe compares the following data stores in his article:

  • Relational Data Stores (JDBC)
  • Object-Oriented or Hierarchical Data Stores (i.e. Hibernate, other ORMs)
  • Object databases (i.e. DB4O)
  • Document-oriented data stores (Javaspaces, Blitz, Gigaspaces)
  • Java Content Repository
  • Memcached
  • mongoDB

....he compares these data stores through a number of different lenses:

  • Popularity
  • Ease of use
  • A rough analysis of strengths and weaknesses
  • Speed of data access
  • Flexibility of generating large reports (which involve spanning lots of data.)

 

Throughout the article, Joe also compares the performance of the various data stores using a basic benchmark he has created.  He uses the benchmark DAO to perform a number of simple, concurrent operations, using a simple data model.

The following is an excerpt from the article, examining the relational data store model:

 

JDBC

The first data store for Java we’ll look at is the basis for most data stores: JDBC, the Java Database Connectivity API. This is basically a mechanism by which Java applications can issue SQL through which a relational database can be accessed.

Relational databases are the heart of the “SOD” approach, and when people talk about persistence, they usually mean that the data is being stored in a relational database somewhere. It’s probably a conservative estimate that 99% of the world’s data is stored in relational datastores – chances are it’s closer to the magical five nines (meaning “99.999%”) than it is to a mere 99%.

Most interactions with JDBC are streamed, which means that from the application’s perspective, a set of records being returned from a SQL query are held in memory one row at a time. This is actually very good, because it means your application can start processing data as soon as it comes from the database, as opposed to waiting until the entire dataset is ready.

Relational databases tend to be pretty fast for most query types, provided your query is simple enough. SQL is also good enough that you can usually make any query work well, especially if you can hand-tune it (and you really, really know what you’re doing.)

The good news is that JDBC is pretty simple (if slightly verbose), and relational databases are pretty ubiquitous. The bad news is that JDBC is fairly verbose, and relational databases tend to be fantastic at data retrieval and not so great at update transactions.

JDBC also has a, well, scarcity of data types. That doesn’t really sound so bad, except it means that you – the coder – tend to have to write conversions or validations yourself, even if the underlying database supports custom data types.

To understand more about this, think about a latitude/longitude pair. Latitude and longitude can be represented as a pair of floating point variables, so it makes sense that you could store a pair as, well, two floats. However, that puts a burden on the database: both variables must be present if either of them are, any ranges have to be coded either by the programmer or by the DBA (depending on the database’s capabilities), and it’s possible to get partial data (“give me only the latitude, not the longitude.”)

Likewise, an address is typically split out into its components (two lines of street address, possibly an apartment number, city, state, and zip code – if you’re only considering addresses in the United States!) For querying on city, that’s fine, but it still puts a lot of burden on the developer.

As a contrast, an object database can store an Address data type, and can still query by city without having to treat it as an entirely separate column in the data store. What’s more, an object database is not unique in this regard.

The data types issue is also what led Java into the object-relational zone, what Ted Neward rather colorfully called the “Viet Nam of Computer Science.” More energy has been spent trying to bolt data types onto JDBC than almost any other problem space in Java (with web applications frameworks running a close second, and perhaps griping about how haaaaard JavaEE is being a close third.)

Object-relational mapping is horribly useful, and it’s also a great way to suck the life right out of a promising application. Virtually everyone uses ORMs now instead of JDBC; they’re the other part of the “SOD it all” mentality.

Actually using JDBC directly from code is fairly quick in terms of code runtime; the worst aspect of using JDBC from a performance perspective is in actually acquiring a database connection. For this reason, most JDBC users rely on a database connection pool (JNDI for Java EE applications, or Commons-DBCP or something similar for standalone applications or developers who are unreasonably afraid of JNDI.)

From a coding standpoint, JDBC is verbose but simple. The basic structure of a JDBC query looks like this:

Connection connection=null;
PreparedStatement preparedStatement=null;
ResultSet resultSet=null;
String query=“select foo, bar from baz”;
try {
connection=acquireConnection();
connection.setAutoCommit(false);
preparedStatement=connection.prepareStatement(query);
resultSet=preparedStatement.executeQuery();
while(resultSet.next()) {
processRow(resultSet);
}
connection.commit();
connection.setAutoCommit(true);
close(resultSet);
close(preparedStatement);
close(connection);
} catch(SQLException sqle) {
sqle.printStackTrace();
connection.rollback();
connection.setAutoCommit(true);
} finally {
close(resultSet);
close(preparedStatement);
close(connection);
}

In this code, acquireConnection() gets a Connection from JNDI or some other mechanism.

processRow(), which is really poorly named, handles a row from the ResultSet (which would probably normally accept specific values from the ResultSet rather than getting the ResultSet itself); since we’re using transactions, assume it does something strange, like update the database somehow.

The various close() methods would call the close() method on the closeable object assuming it’s not null or in an error state.

This code will generally run properly and fairly safely and quickly under most normal conditions, assuming the SQL query isn’t complex, and the update process isn’t slow, and the transaction doesn’t build up to thousands of queued updates… you get the idea.

The benchmark used Spring to manage JDBC operations, which simplified them dramatically. The DAO’s read operation looked like this:

@Transactional
@Override
public ModelContainer read(Object key) {
try {
ModelContainer container = jdbcTemplate.queryForObject(
"select id, name, description from model where id=?",
new Object[]{key},
new RowMapper() {
@Override
public ModelContainer mapRow(ResultSet resultSet, int i)
throws SQLException {
ModelContainer mc = new ModelContainer();
mc.setId(Long.valueOf(resultSet.getLong("id")));
mc.setName(resultSet.getString("name"));
mc.setDescription(resultSet.getString("description"));
return mc;
}
});
return container;
} catch (EmptyResultDataAccessException erdae) {
return null;
}
}

Complex graphs (or relationships) between tables are easy to manage; transactions would factor in there, too, but they’re fairly straightforward. You’d simply update both tables in a given transaction, and commit when the data is right; any errors that the database can detect (through foreign key relationships) would throw an error and you’d be able to manage the problems from an exception handler.

Again, JDBC is verbose, but simple enough. In many ways, that’s good – it means that few coders will look at JDBC code and not be able to understand what’s going on, assuming your SQL isn’t incredibly complex. (It may be. Some SQL queries are maddening.)

As you can see, dependency injection frameworks like Spring can make much of the boilerplate code around JDBC disappear; the transaction support and template capabilities can make all of the error handling and connection acquisition code disappear.

JDBC Summary

JDBC is the lowest common denominator for relational database usage in Java, serving as the underlying data storage mechanism for almost every ORM (Hibernate, et al), and other data storage mechanisms too (for example, most Java Content Repository implementations in production use a database in the end unless the user’s insane. Don’t be insane, users.)

CRUD in JDBC is very straightforward, but suffers from repetitive and verbose patterns (see the code example, which could serve as a boilerplate for all JDBC operations.) The same goes for querying a SQL database. SQL can be hand-tuned by a competent DBA, and accessing the specific database’s capabilities is very easy (stored procedures, anyone?) Also, failures tend to be dramatic and simple, as the type mapping isn’t very flexible; do it wrong, and JDBC will let you know pretty quickly.

Transaction support is horribly common, being supported by the basic API.
Generating reports with JDBC is also pretty efficient, because the schema typically reflects what reporting engines need: rows and columns of related data that can easily be summarized.

Finding support for JDBC is pretty easy; the Java part isn’t that complex in the first place, and SQL is, shall we say, very well-known.

JDBC can be a performance bottleneck; as JDBC does no caching in and of itself, repeated queries for the same data will hit the database server over and over again. (Some databases will cache responses of common queries, but the data still has to be transmitted to the application.)

The benchmark data for JDBC was quite good:

Average write time 7.30ms
Average read time, keys in serial order 1.13ms
Average read time, keys in random order 1.09ms
Average delete time, keys in serial order 3.93ms
Average delete time, keys in random order 4.09ms
Average query time, value in serial order 1.16ms
Average query time, value in random order 1.12ms

Most Java developers no longer use JDBC directly (or don’t admit to it); most use an ORM instead.

You can read the complete article here.

 

Published at DZone with permission of its author, Nitin Bharti.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Andy Leung replied on Tue, 2010/04/20 - 9:25am

I thought Data Store is the storage? JDBC and ORM are basically API frameworks, ain't they???

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.