Evolving JDBC and Persistence for the Enterprise - A Tech Chat with Jesse Davis
DZone recently sat down with Jesse Davis, Senior Engineering Manager at DataDirect Technologies. In this interview, Jesse talks about how the JDBC (Java Database Connectivity) API is evolving to meet the needs of the modern enterprise, from both a performance and a security standpoint. Leading up to the JDBC 4.1 release, which is expected early next year, he discusses some of the work currently being done to better integrate JDBC and JPA (Java Persistence API), including support for bulk load, data source annotations and improvements in the date/time and timestamp API (via JSR 310). He also delves into the object-relational impedance mismatch and how he sees the data persistence landscape evolving over the next several years.
The complete transcript of the interview has been provided below.
DZone: We're sitting today with Jesse Davis, Senior Engineering Manager at DataDirect Technologies, responsible for the JDBC product development initiatives. Jesse, welcome. Can you tell us a little bit about yourself?
Jesse: Sure. I've been at DataDirect since 1997. I joined while I was still at NC State University here in Raleigh, and I worked initially on our ODBC development; I moved over to the Java side right before the crash of 2000.
I've been working in JDBC for over 10 years now. I'm a member of the JDBC expert group, and what I do at DataDirect is a lot of research into new functionality, things that are going on in the Java space, and then figure out ways to put that into the product to make it better.
DZone: What fundamental problem does JDBC solve?
Jesse: Well, the easy answer is that JDBC allows you to access data from Java. The better answer, though, is that it gives you standards-based access from Java. For example, when ODBC was originally written, the biggest problem we had was that every individual vendor had its own way to access data in the database.
So you couldn't easily take an application that was built on Oracle and move it over to a SQL Server database. You would incur the cost of changing your code and re-qualifying it. It was a real pain in the rear end.
What JDBC does is give you a single API into a driver, whether it's Oracle, SQL Server or DB2, and let you write your code in such a way that you don't have to concern yourself with the underlying data source. Providing that standard is a really good way of letting you move those applications without incurring those costs.
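A minimal sketch of that idea, assuming a hypothetical movies table: the method below touches only java.sql interfaces, and the JDBC URL passed in, typically read from configuration, decides which vendor's driver handles the connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PortableQuery {
    // Only java.sql interfaces appear here, so nothing ties this code to one vendor.
    // The "movies" table is hypothetical; the URL and credentials come from the caller.
    static int countMovies(String jdbcUrl, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM movies");
             ResultSet rs = ps.executeQuery()) {
            // COUNT(*) always returns exactly one row.
            rs.next();
            return rs.getInt(1);
        }
    }
}
```

Moving the application from Oracle to SQL Server then means swapping a URL such as jdbc:oracle:thin:@dbhost:1521:ORCL for one such as jdbc:sqlserver://dbhost:1433;databaseName=films and putting the other driver on the classpath; the method itself does not change. (Host and database names are placeholders.)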
DZone: When would I consider using ORM (Object Relational Mapping) versus pure JDBC? Are they mutually exclusive?
Jesse: They're not mutually exclusive. I think they're more symbiotic, and I'll explain. ORM technologies are based on writing your code in an object-oriented way. We're humans; we think in terms of objects: a camera, a lightbox, a chair, our glasses. We tend to think in terms of objects because that's how our brains process those things.
Relational-based data access, through SQL or a JDBC driver, gives you a lot more control: you control the SQL statements that you write, and you have more control over the data that comes back.
With ORM, the programming style is just different, so it's mostly a matter of taste. Some would argue that it's a matter of power, but I think it's mostly taste. And when it comes down to it, the ORM model is an excellent model for designing very complex systems.
But in order to use an ORM, you're going to have to live with some sacrifices. You're going to sacrifice some performance because, again, an ORM is useless without data connectivity, so the ORM is built on top of the driver. Then there's the query language; take JPQL, for instance. It's an object-based query language, so you can query your objects, but when that object model gets down to the wire level, it still has to be translated into a SQL statement before it goes to the database. So you have to give up a little of your control over your SQL statements in order to use the ease-of-use features of an ORM model.
DZone: How has JDBC evolved over the last several years? How has it evolved to address the needs of the modern enterprise?
Jesse: We've come a long way. The initial version of JDBC, years ago, was very, how would I say it, "feature-lacking", pardon my English. But what we've done as the JDBC expert group is try to listen to the community. As in any Java Community Process effort, the JDBC JSR has really tried to listen to customers' needs and put those into the drivers and into the standard in a way that makes everybody happy.
To give you some examples, we started with ODBC, and there were some holes in the ODBC API that we sought to plug with JDBC: richer metadata, auto-generated key support. The whole idea that you would have an object-based data access API on top of something relational was a big deal back in those days. But over time, what we've had to add are more enterprise features: security, single sign-on, Kerberos and NTLM authentication, all things that we needed as we moved through the early 2000s so that we could integrate our drivers with more complex application servers and enterprise features.
Another feature that's been really big in JDBC, for us at least, is working with other technologies. In the Java space there's ORM, as you mentioned before; but aside from ORM, JDBC is embedded in application servers. You also use JDBC in other kinds of dependency injection models; I'm thinking of OSGi.
So things continue to evolve, and we've tried to make sure that the API itself meets the needs of our enterprise features and what our customers truly want.
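Auto-generated key support is a good example of those additions: since JDBC 3.0, a driver can hand back the key a database assigns during an insert. A short sketch, assuming a hypothetical movies table whose id column is generated by the database:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class GeneratedKeys {
    // Inserts into a hypothetical "movies" table and returns the key the database generated.
    static long insertMovie(Connection conn, String title) throws SQLException {
        String sql = "INSERT INTO movies (title) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, title);
            ps.executeUpdate();
            // The driver exposes generated values as a small ResultSet.
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("No generated key returned");
                }
                return keys.getLong(1);
            }
        }
    }
}
```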
DZone: Earlier this year at JavaOne you met with the JDBC and JPA expert groups. Can you tell us a little about some of the discussions that came out of that?
Jesse: Sure. With JDBC 4.1, we're working on getting it pushed through into Java 7, which is coming out next year. A lot of features are going into it, but it is a maintenance release.
We just had 4.0 released late last year, and a lot of what we put into 4.0 were big-ticket features: tons of metadata enhancements, enhancements to the BLOB and CLOB APIs, lots of usability work. We didn't get annotations in. We really wanted to, so you're going to see us push some of those back through the expert group and into the JDBC 4.1 specification.
There are a couple of interesting things that we had on the list at JavaOne. If you take a look at the slides from our JavaOne presentation, you'll notice how we broke it up: first we went over the features that are planned, the things that are solid. Then we went over what we're researching in the expert group, the things we wanted to get into the spec and wanted feedback on. One of the things we asked for feedback on was, and there's no other way to say it, SQL interception.
You would pass a SQL statement to the driver and have an event listener or some other handle at a lower level, so you can intercept the statement before it's sent to the database and modify it. This has been a common request. Say, for instance, you're using our driver under Hibernate, and you write a query like select m from Movies m. You get back what you want, but you want to tweak the SQL statement a little bit. How would you do that? Well, you can't, because your JDBC driver is under the ORM. What this would do is give you a handle so that you can not only see the SQL that the ORM generates underneath, but also change it, for example to add better filtering through a WHERE clause.
Another thing we're looking at is a performance feature. We call it bulk loading, and we're coming up with a bulk interface. It's on the road map, and people got really excited about it at JavaOne, but we're not sure yet whether we'll be able to define the interface in time to make it into Java SE 7.
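JDBC does not define such an interception hook, so the following is only a sketch of the idea, assuming you control which Connection objects the ORM receives (for example, through the DataSource you configure). It wraps a connection in a java.lang.reflect.Proxy and passes the SQL text given to prepareStatement or prepareCall through a rewriting function before the real driver sees it. The class name and the rewrite rule are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.function.UnaryOperator;

class InterceptingConnection {
    // Hypothetical helper: returns a Connection that rewrites SQL before delegating to target.
    static Connection wrap(Connection target, UnaryOperator<String> rewriter) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            boolean takesSql = name.equals("prepareStatement") || name.equals("prepareCall");
            if (takesSql && args != null && args.length > 0 && args[0] instanceof String) {
                // Every prepareStatement/prepareCall overload takes the SQL text first.
                args[0] = rewriter.apply((String) args[0]);
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the driver's own exception (for example SQLException) unchanged.
                throw e.getCause();
            }
        };
        return (Connection) Proxy.newProxyInstance(
                InterceptingConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```

This only covers statements prepared through the connection; SQL passed directly to Statement.execute(String) and its relatives would need a similar wrapper around Statement. Blindly appending text to generated SQL is also fragile, which is part of why a standard hook was attractive.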
DZone: Bulk Load sounds like a game-changing concept. Can you tell us a little bit more about it?
Jesse: I'm glad you said that; it gets me excited when I talk about it. We've come up with a way to bring a technology that has been locked away in C code into the Java world. This is code that has been locked up in C-land for decades.
I'll take a second and explain what it is. Every company that I've ever interacted with has to move huge amounts of data, just enormous amounts. I'm not talking about two or three megabytes of result sets. We've come across companies that are using FTP as the transport for 300 to 400 GB of data per night.
So think about where we've gone in the data space, right? We're now using data warehouses. It's very rare to find a company that has a single database in its entire organization, though they usually want to standardize on one.
I'll give you an example. Say you standardized on DB2; that's your data warehouse. But you have some data over here in SQL Server, some over here in MySQL and some over here in Sybase. What you run all of your reports on, your master copy, is in your data warehouse (master data management, whatever term we want to wrap around that), so you have to move all of that data. Then all of the reports, your dashboards for your executives, those types of things, are run against this warehouse.
There's a lot of data moving going on, and lots of products out there that do data moving. Traditionally, though, in order to move that data, you had to use something called a bulk-moving or bulk-loading tool. Examples would be Oracle's SQL*Loader, BCP for Sybase and SQL Server, or the DB2 LOAD command. These are tools that have been around for decades, that people use and build into their back-end infrastructure.
After a while, it gets messy. You start off with a single project that has to move your data, and then you have an acquisition, or you grow, and you end up with two or three more. Then you have more and more data to move. Your executives expect their dashboards at the end of the day, and all of a sudden you're moving data for six hours and your window for getting that dashboard done shrinks and shrinks.
So using those bulk tools is necessary; the question is, could they be used in a standard way? It's the same argument as why ODBC was invented. You've got SQL*Loader and BCP and LOAD, and they're all different.
What we needed was a standard way to do that. The only standard way to do it up until now, in any API, has been through a batch mechanism like JDBC batches, and that's traditionally been too slow.
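For reference, the standard batch mechanism looks roughly like this: parameter sets are queued with addBatch and sent in groups with executeBatch. The orders table, its columns and the batch size are placeholders; vendor bulk-load protocols like the ones described here are not part of this API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchLoader {
    // Inserts rows into a hypothetical "orders" table using standard JDBC batching.
    // Returns how many times executeBatch was called.
    static int load(Connection conn, List<Object[]> rows, int batchSize) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batchesSent = 0;
        int pending = 0;
        try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)")) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
                batchesSent++;
            }
        }
        return batchesSent;
    }
}
```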
What we needed to do was unlock that bulk wire protocol and bring that into the standard space, so that people could design these back-end bulk moving systems in a way that can grow with their enterprise and be faster.
These bulk tools are still developed, but they haven't been kept up as much; they're back-end tools. We're bringing that out into the light, into the Java space, so not only do you not have to worry about which platform you're running on, but you can now move VSAM data to your Oracle data warehouse in five lines of code. And it's faster than the old bulk tools.
We're using the same protocol, but we're not using the tool, which is a lot fatter than a driver and doesn't have the same wire-level socket performance enhancements that we have. People just love it. It's something we're really looking forward to getting exactly right and getting into the spec.
This is something we do at DataDirect in general: when we come up with an idea that we want to push into the spec, we put it in our drivers first, so it's available now in our drivers. Then we work with the vendor companies to pull it into a standard space.
We're seen more as a Switzerland of data access; we don't have the same pressures that a vendor like Oracle or Microsoft would have, so we tend to do something and then we roll it out to a community and push it into the spec, so it becomes a standard, and everybody's happier.
DZone: Being the Switzerland of data access, I presume it's going to be fairly easy to tie the bonds more tightly between JPA and JDBC in the coming years?
Jesse: Yeah, that's a good question. The JSR team and the JDBC expert group work pretty tightly together, just as we do with the database vendors and the ORM tools; there are different camps that we're working with, and JPA is obviously the standard. It's the standards-based way of doing it. Hibernate is wonderful, and the JPA guys took that feel of Hibernate and brought it into JPA. We aren't looking to do tighter integration in the expert group. Bulk load is a performance feature.
Usability-wise, with ORM, we've got to make sure we stay tight. We get specific requests from the JPA group about things we can do within the JDBC standard. I'll give you an example: dates, times and timestamps. Everybody complains about those. Guys, we hear you, and we know it's a problem. In JDBC, for instance, we don't have timestamps with time zones, but Oracle and SQL Server do.
How do you deal with the mismatch of having your data stored somewhere with one time zone and retrieving it in another? That's a huge pain for people, and we're working through it. People can follow JSR 310, the new Date and Time API; we're going to be working through that with the JPA expert group.
Annotations are another one. Like I said, we are going to have at least a data source annotation. That's not just for JPA; it's more for the J2EE team. But we are looking for tighter integration among more of these standards to give users a better experience.
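A small sketch of that mismatch, using the java.time API that eventually came out of JSR 310 (it shipped in Java 8, after this interview). A value with an explicit offset, as a TIMESTAMP WITH TIME ZONE column might store it, keeps only its instant once it becomes a java.sql.Timestamp; with java.time the display zone is chosen explicitly. The sample value is made up.

```java
import java.sql.Timestamp;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class TimeZoneMismatch {
    // A made-up value as a TIMESTAMP WITH TIME ZONE column might hold it: local time plus offset.
    static OffsetDateTime stored() {
        return OffsetDateTime.parse("2009-10-05T09:30:00-04:00");
    }

    // java.sql.Timestamp keeps the instant only; the -04:00 offset is lost.
    static Timestamp asLegacyTimestamp(OffsetDateTime value) {
        return Timestamp.from(value.toInstant());
    }

    // With java.time, the same instant can be shown in any zone the caller names.
    static ZonedDateTime inZone(OffsetDateTime value, String zone) {
        return value.atZoneSameInstant(ZoneId.of(zone));
    }
}
```

JDBC 4.2, which came with Java 8, later allowed drivers to map such columns to java.time types through setObject and getObject, though the level of support varies by driver.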
DZone: How are organizations improving performance on the back-end?
Jesse: Performance is a big topic; it was a big topic at JavaOne this year, too. Intel has been working closely with Sun Microsystems on things like smoothing out the garbage collector. Jon the Collector is one of my favorite blogs; he's fantastic, a garbage collection whiz.
Another thing that's really moving the industry is multicore. You can't even buy a laptop with a single-core processor nowadays, and that's going to continue. You're going to see servers with more and more cores; Intel and IBM have 80-plus-core demonstration machines out.
Being able to split individual tasks at the code level across multiple processors is really big in the industry. There are two schools of thought on it. You've got the guys who say go through your code (and this applies to drivers too), find places where you can take a section, spawn off another thread and run it.
The problem with that is that it scales, but it doesn't scale linearly with the hardware. If I find 20 places in my code to spawn off threads and I'm running on a 100-core machine, I'm still not using as much of it as I can.
The other school of thought is to build a concurrency framework. There's a concurrency framework for C++ that's fairly stable. I can't remember which professor came up with it, but the way it works is that you program to the framework and build everything you do on top of it, and for everything that goes through the framework, the framework takes care of moving it onto different processors underneath.
We're doing a lot of research into which of the two we would want to pursue at the driver level. There are several things we could do in either camp, but where we'll end up, we don't know yet. It's certainly a hot research topic.
The next one is 64-bit. It is very, very difficult to find a 32-bit processor anymore, and I would say that in the next year or so it is going to be darn near impossible to find a 32-bit operating system. It's all going to be 64-bit. When was the last time we saw a 16-bit machine, right? It's going to keep growing over time.
Fifty years from now our grandkids are going to be saying, I can't believe granddaddy programmed on a 64-bit machine. Right? As machines continue to use that extra memory and those extra address spaces, it's something we need to take into consideration from a performance perspective. For example, there are some places in your code where you might be using integer-based arithmetic to build something up, where you could use longs instead and take more advantage of that 64-bit address space.
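One standard way to quantify why speedup falls short of the core count, not something cited in the interview, is Amdahl's law: if a fraction p of the work can run in parallel on n cores, the speedup is at most 1 / ((1 - p) + p / n). A small sketch with made-up fractions:

```java
class Amdahl {
    // Upper bound on speedup when a fraction p (0..1) of the work is spread across n cores.
    static double speedup(double p, int n) {
        if (p < 0 || p > 1 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }

    // Limit as the core count grows without bound: only the serial fraction remains.
    // For p = 1 this is positive infinity.
    static double ceiling(double p) {
        return 1.0 / (1.0 - p);
    }
}
```

With p = 0.95, a 100-core machine gives a bound of about 16.8, and no number of cores can push it past 20, because the 5% serial part still runs on one core.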
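In Java, the closest analogue is the fork/join framework from JSR 166, which shipped in Java 7: work is written as tasks that split themselves, and a work-stealing pool spreads them across the available cores. A minimal sketch summing an array; the split threshold is an arbitrary choice, and commonPool() needs Java 8 or later.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum serially instead of splitting
    private final long[] values;
    private final int from;
    private final int to; // exclusive

    ParallelSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(values, from, mid);
        ParallelSum right = new ParallelSum(values, mid, to);
        left.fork();                     // queue the left half; an idle worker may steal it
        long rightSum = right.compute(); // work on the right half in the current thread
        return left.join() + rightSum;
    }

    static long sum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(values, 0, values.length));
    }
}
```

The code never says how many threads to use; the pool decides, which is the property described above: the framework, not the programmer, maps the work onto the processors.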
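One way to read the "use longs instead" point, as an illustration rather than a recipe: when scanning a byte array, reading it eight bytes at a time into a long takes half as many loop iterations as reading four bytes at a time into an int. Counting set bits is used here only as a stand-in workload; whether it is actually faster depends on the JIT and the hardware, so it is something to measure.

```java
import java.nio.ByteBuffer;

class WordWise {
    // Counts set bits, reading 4 bytes per step into an int.
    static int bitCountInts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = 0;
        while (buf.remaining() >= Integer.BYTES) {
            count += Integer.bitCount(buf.getInt());
        }
        while (buf.hasRemaining()) {
            count += Integer.bitCount(buf.get() & 0xFF); // leftover tail bytes
        }
        return count;
    }

    // Same result, reading 8 bytes per step into a long: half as many iterations.
    static int bitCountLongs(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = 0;
        while (buf.remaining() >= Long.BYTES) {
            count += Long.bitCount(buf.getLong());
        }
        while (buf.hasRemaining()) {
            count += Integer.bitCount(buf.get() & 0xFF); // leftover tail bytes
        }
        return count;
    }
}
```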
Those are two of the really big areas that are hot in performance. The SPEC J guys, Intel, Sun, everybody is focusing on it, and we are focusing on it in the JDBC and the driver space as well.
DZone: How is JDBC evolving to support parallel programming?
Jesse: Right. There's JSR 166, there's the fork/join framework, and there's the concurrency framework. We are studying those to see whether it makes sense to offer more concurrent operations at the API level.
Right now, to give you an example of one of them, there's cancel. You can send over a statement, and if that statement is not responding, you can send a cancel request for it from a different thread.
So that's one thing you can do in parallel with a driver. We are going to be hindered a little by the wire protocols, the implementations of the databases themselves. One good example is bulk load; I'm going to keep going back there because I love it. With bulk load operations, some databases allow you to send multiple chunks. So instead of using a single socket, can we use multiple sockets? When I connect to Oracle, should I take up one or should I take up three? Are there things we can do in parallel at the API level to allow users to code their applications that way? That's what we're looking at right now.
DZone: Has our industry reached the point where it's finally ready to overcome the object-relational impedance mismatch?
Jesse: That's a good question. People are really, really passionate about one side or the other. You could do a Google search for that very question, or just for "object-relational impedance mismatch", and you will get hits. Nine times out of ten, at the bottom of the post there's a bunch of guys arguing about what the author said, whether he was good or not, but I have framed myself here. We are getting close.
With JDBC, the API itself is object-based, but it's still not the type of objects we alluded to before, the kind we think of every day. So when you do a fetch, you get back a result set object instead of a movie object, or a person or employee object.
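A sketch of that pattern, assuming a ScheduledExecutorService the application already manages: a watchdog thread calls Statement.cancel() if the statement is still running after a deadline. A driver that cannot cancel may throw SQLFeatureNotSupportedException, and for simple cases Statement.setQueryTimeout covers the same need.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class CancelWatchdog {
    // Schedules stmt.cancel() on a scheduler thread after timeoutMillis.
    // Callers should cancel the returned future once the statement finishes normally.
    static ScheduledFuture<?> cancelAfter(Statement stmt, long timeoutMillis,
                                          ScheduledExecutorService scheduler) {
        return scheduler.schedule(() -> {
            try {
                stmt.cancel();
            } catch (SQLException e) {
                // The driver may not support cancel, or the statement may already be closed.
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

In use, the worker thread would start the watchdog, run stmt.execute(sql) inside a try block, and call cancel(false) on the returned future in the finally block so a finished statement is never cancelled.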
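The multiple-sockets question can be pictured at the application level: split the rows into chunks and hand each chunk to its own worker, which in a real loader would hold its own connection. ChunkLoader is a hypothetical interface standing in for whatever per-connection load call a driver offers, and whether a database accepts parallel loads into one table is vendor-specific.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelChunks {
    // Hypothetical per-connection loader: loads one chunk and returns the number of rows written.
    interface ChunkLoader<T> {
        int load(List<T> chunk) throws Exception;
    }

    static <T> int loadInParallel(List<T> rows, int chunkSize, int workers, ChunkLoader<T> loader)
            throws InterruptedException, ExecutionException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunkSize) {
                // subList is a view, so rows must not be modified while loading runs
                List<T> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
                results.add(pool.submit(() -> loader.load(chunk)));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // rethrows a loader failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```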
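Bridging that gap by hand looks something like this: each row of the ResultSet is copied into a plain Movie object. The class and the column names are hypothetical; this mapping step is the work an ORM automates.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// A hypothetical domain object: what the application wants instead of rows and columns.
final class Movie {
    final long id;
    final String title;
    final int year;

    Movie(long id, String title, int year) {
        this.id = id;
        this.title = title;
        this.year = year;
    }
}

class MovieMapper {
    // Copies each row into a Movie, assuming columns named id, title and release_year.
    static List<Movie> toMovies(ResultSet rs) throws SQLException {
        List<Movie> movies = new ArrayList<>();
        while (rs.next()) {
            movies.add(new Movie(rs.getLong("id"), rs.getString("title"), rs.getInt("release_year")));
        }
        return movies;
    }
}
```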
I do think that we can satisfy both schools of thought with some advances in what we have today. I don't think we will ever satisfy the relational guys, because if you have been a SQL programmer for 20 years, you are going to program in SQL. If you were on my team, I would want you to still do that, because that is what you are great at. If you are designing for some more usability stuff or you need a lot more flexibility, like hierarchical relationships, then ORM is definitely the way that you have to go.
So as for overcoming that mismatch, I still think the two are always going to be slightly orthogonal; maybe not completely, but slightly. I say that because we've had the whole back-and-forth on object databases, right? Object databases, to be honest, just didn't last very long. We had object databases for a while, so why didn't people like them? They love ORM. So why don't they like to store their data as objects? Because it's not efficient. I'm not going to get into the performance characteristics because I'll get flamed, but even though we like to program in ORM, we like to store in relational. I think the ORM model as it exists today is a good way, because it satisfies everybody.
Even though it sounds like we are all arguing, we are all pushing for the thing we like the best, and we like how we code. So that to me is a success because I can sit in a room with someone who is comfortable programming pure relational, and then go over and talk to someone who is comfortable pure object, and they are both happy with how they do it.
I think going forward, what we should do is continue the integration between JDBC and JPA. As JPA, especially JPA 2, gets more mature and more accepted, you will see much tighter integration between the two, and it will lead to a better experience for the ORM guys. I'm hoping to get to the point where the relational guys can't say things like, you don't have as much control over your performance. And in the JDBC realm, we are open to the possibility of adding more hooks.
Another point I would make is that as ORM goes about overcoming that mismatch, we have to stay focused on clean JDBC implementations. We said we are the Switzerland of data access; one of the ways we prove it is by not accepting vendor-specific hooks into the API. Our job is to make sure that your data access is as good as it can be, and for our JDBC drivers we want what we call a pure, or clean, implementation of the spec.
When you throw an ORM on top, if you have to use a vendor-specific statement method, you can't do that casting without modifying the ORM's code. So unless you want to go modifying a JPA implementation or Hibernate, you need to make sure that those JDBC drivers are clean. By clean I mean they adhere to the standard while still doing things in a flexible way.
I'll give you an example. Say you wanted to tune your memory usage under Hibernate, specifically the amount of memory used for each parameter binding. With certain vendor drivers, you would have to cast your statement to the vendor's class and then call a specific method, and we just don't like that because it's not clean. Under Hibernate, doing that with such a driver is impossible, but you can do it with our driver.
What we advocate is adding more connection-level options and switches. Because you have control over your connection string, that gives users of our drivers a way to change the driver's behavior, its performance characteristics and its memory footprint without having to worry about which ORM they use.
To give you another example: say you did modify Hibernate to use a vendor-specific hook. It's not going to work with another vendor, and it's not going to work when you move to JPA. So making sure we have a clean JDBC implementation is a big key to overcoming that mismatch, because it will allow those technologies to merge much, much more tightly going forward.
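The difference can be sketched as follows. The standard DriverManager.getConnection(String, Properties) call accepts driver options alongside the "user" and "password" entries, so tuning can live in configuration that any ORM passes through untouched. The option name in the example is hypothetical; each driver documents its own.

```java
import java.util.Map;
import java.util.Properties;

class DriverOptions {
    // Builds the Properties handed to DriverManager.getConnection(url, props).
    // Tuning switches travel with the credentials, so no statement code casts to a vendor class.
    static Properties build(String user, String password, Map<String, String> tuning) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        for (Map.Entry<String, String> option : tuning.entrySet()) {
            props.setProperty(option.getKey(), option.getValue());
        }
        return props;
    }
}
```

JDBC 4.0's Wrapper.unwrap is the standard route to vendor extensions, but code that calls it still has to name a vendor class, which is exactly what a generic ORM cannot do; a connection-level option avoids that.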
DZone: How do you see the persistence landscape evolving over the next couple of years?
Jesse: I touched on that a little earlier when I said that people like to store things in relational form and then get them back as objects.
People do like to program in objects. I do, and I'm a JDBC guy, which is odd; but the next step, as I see it, is data integration and data services. They are getting a lot of traction. Basically, it works like this: let's say you have some data stored in your Oracle database, your order records. All the guys who come to you and place orders, that order data is stored in Oracle. But all of your customer data is stored in SQL Server. Now you have to make two different connections, and one's a Web service and one is relational, right? So you have to make two completely separate connections to get that data and merge it together for your front end. That stinks; nobody wants to do that.
So what I think is going to eventually happen is that people are going to want to do more of this ORM in a virtualized manner - so data virtualization.
I'll explain a little more. Take that Oracle database and that SQL Server database. If you're making decisions, you really don't care where the data came from; you just want the data in a form you can present and base decisions on.
IBM, at their Information On Demand conference this year, had some really good quotes up on the screen. Some of the quotes were like, "In the next five years, we're going to have a number of exabytes and petabytes of information. How are you going to be able to mine that data?"
You need to control how your applications access it. And the best way to do that, in my opinion, is a logical model. So you're an application architect; you want to design an application for your company and what you say is, "I've got four departments. I'm going to write an application for each department and each department is going to have to get to these four data sources."
If I had a whiteboard and I drew this out, I would have four things on the top, four guys here in the middle and then all this stuff and then spaghetti in between. It looks really bad. What you want to do is write a single layer for all your data access and then expose it in different ways.
So you can imagine a data access layer that would deliver your data services - and by data services, I mean they could be Web service based. You could access a vH8mC or whatever. And within this layer you would contain a model of how you want to use this data in your application. That would be the model that all your clients on the top would access.
So instead of selecting this table from Oracle and this table from SQL Server and connecting it together with a Yahoo! Stock Quote Web service, it would be a Select Customer from Customers. You see? It's much simpler.
DZone: The way it was intended to be.
Jesse: The way it was intended to be. And so what we envision happening is you have all these disparate sources down here at the bottom but what you have at the company level, you have this layer that sits on top. Like a data services layer or a data services platform or something like that.
What this will do is, it will allow data architects to go in and design data object models, hierarchies - a company can have departments which can have employees and addresses.
All of this model that sits up here will be mapped to the different data sources, right? So when you select a company, or your customer for instance, or a stock example, you select your Customer back, what you get back in your application is not a JDBC result set, it's a Customer object. You can say customer.getAddress.getPortfolioValue, right?
That portfolio value is just the stocks they own, from their Oracle database, put together with the stock data from Yahoo's Web service; you aggregate that data together.
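A toy version of that layer, with every name hypothetical: one source supplies the shares a customer holds (standing in for the Oracle database), another supplies current prices (standing in for the stock quote Web service), and the layer returns a Customer whose portfolio value is computed from both. The caller never sees where either piece came from.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical sources; real implementations would use JDBC and a Web service client.
interface HoldingsSource {
    Map<String, Integer> sharesByTicker(String customerId); // e.g. backed by the Oracle database
}

interface QuoteSource {
    BigDecimal price(String ticker); // e.g. backed by a stock quote Web service
}

final class Customer {
    private final String id;
    private final BigDecimal portfolioValue;

    Customer(String id, BigDecimal portfolioValue) {
        this.id = id;
        this.portfolioValue = portfolioValue;
    }

    String getId() { return id; }
    BigDecimal getPortfolioValue() { return portfolioValue; }
}

final class CustomerService {
    private final HoldingsSource holdings;
    private final QuoteSource quotes;

    CustomerService(HoldingsSource holdings, QuoteSource quotes) {
        this.holdings = holdings;
        this.quotes = quotes;
    }

    // Plays the role of "Select Customer from Customers": one call, one object back.
    Customer findCustomer(String id) {
        BigDecimal total = BigDecimal.ZERO;
        for (Map.Entry<String, Integer> h : holdings.sharesByTicker(id).entrySet()) {
            total = total.add(quotes.price(h.getKey()).multiply(BigDecimal.valueOf(h.getValue())));
        }
        return new Customer(id, total);
    }
}
```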
But to the person writing the application, the guys who use our APIs, what they'll see is a simple select statement and you'll get back an object, which is the dream--which is the solving of the impedance mismatch.
DZone: It sounds like a really powerful paradigm. I guess the question of performance always comes up when you add layers of indirection. While there's usually a trade-off, it sounds like performance isn't really compromised with this approach.
Jesse: It would and I'll tell you another thing. So you imagine this layer, data services; it would be a server in and of itself that ties everything together and when people connect, they connect to that and issue their queries against it. You said performance and that's a good point, because you're going to be issuing this and you're going to have to decompose the statement and filter them out and then bring them back. We're not just talking about queries; we're talking about updates as well. So how do you push updates across those things in a distributed transaction?
But back to your performance thing, you put in a cache. So then you could have a cache at that level and you could really sky-rocket the performance.
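A minimal sketch of such a cache, assuming reads can tolerate slightly stale data: results are kept per query key, and the underlying sources are consulted only on a miss. The class is hypothetical; a real layer would also need eviction, and invalidation whenever the updates mentioned above go through.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class QueryCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. fans out to the Oracle and SQL Server sources

    QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached result, calling the loader only when the key is absent.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops a cached entry, for example after an update touches the underlying data.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the loader at most once per absent key, but other updates to the same part of the map can block while it runs, so a slow loader can cause contention.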
Again, as you said, it's not going to be that big of a difference, because in any heterogeneous environment you've got all those multiple connections and have to merge the data together yourself anyway. It's much less worrying, and it improves efficiency.
You're talking about auditing and thinking of Sarbox. You could do all of your auditing for all your data services in a single place. That gets people excited.
DZone: You recently published an interesting video on the history of Java and JDBC, which we actually published on JavaLobby. Can you talk a bit about the video and what prompted you to put it together?
Jesse: Yeah, so I'm a Java guy and I love the Mac. I'm a Mac guy. I have a MacBook Pro that I've learned to use over the years for some video editing. And we've seen some customers who have started to express more interest in the Java side. They have used our ODBC product, they have some new projects starting in Java, and so they wanted to look at some of the stuff.
So I had to do some training. I said, "Alright Jesse, you're the Java guy. Go do some training." So I thought to myself, I've got to level-set with these guys. A lot of them aren't familiar with Java in general; they're not as familiar with it as they are with some of our other products. How do I get them all on the same page before I start my training?
I did some Googling and ran across a website that Sun has with a timeline of the history of Java, and it's fantastic. I said, "This is what I want to show, so I'll make a slide." Don't ever make a slide with that on it; it's just ugly.
I said, "How can I do it?" You want it to be fun; it's a sales thing. You're training; how are you going to get people interested? And by the way, I was the technical guy after lunch presenting to the sales team. It's not a good spot. You've got to make sure you've got something exciting. So I said, "I'll make a movie."
So I sat down, got some pictures together, went through that timeline and added a couple of other things, like when I was hired, to give them an idea of the history, and just put it all together.
And the sales guys just loved it so I showed it to some more sales teams. A friend of mine, Jonathan Bruce, he's really big into Java too. He sits next to me. He came over one day and he saw it. He said, "That's just fantastic. You have to post it to YouTube." I said I would never.
DZone: It's been a really popular video.
Jesse: I would never have thought of that. I said, "Fine", and iMovie has that export to YouTube, so I exported it to YouTube and posted it up there in HD. He said, "Post it on my blog", and I did. People started picking it up. It's just that nobody had ever done it. I'm glad everybody likes it. It was a lot of fun for me to make, and it makes people happy; it makes me happy.
DZone: So when can we expect to see your next film in theatres?
Jesse: [laughs] That's a good question. I have gotten a lot of flak.
DZone: Any final words of advice for our members?
Jesse: Yeah, go see my movie. [laughs] And continue to stay tuned to the JDBC expert group with 4.1 and how we're going to work on some of those things. At DataDirect we're always looking for that standards base, so that's what we're passionate about. If you have any questions, go visit my blog or send me an email and I'm happy to assist you in any way I can.
DZone: Thank you for your time today, Jesse.
Jesse: It was great to be here, thanks.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)