
Mitchell is a DZone employee and has posted 1655 posts at DZone.

How to Use JPA and JDO in HBase

March 17, 2010 AT 11:11 AM
Thanks to Google App Engine's work with DataNucleus, GAE users have enjoyed JPA and JDO support. For its storage system, GAE uses Google's (NoSQL) BigTable implementation. HBase, part of the Apache Hadoop project, is a distributed, column-oriented storage system modeled after BigTable. There are some usage restrictions, but it is generally easy to store data in BigTable using JPA and JDO. What you may not know is that HBase can support these same standard APIs in a homegrown system.

For developers who don't want to host their applications or store their data at Google, HBase provides a viable (and Apache community supported) option for building your own open source system. As on GAE, JDO and JPA persistence for HBase goes through DataNucleus. To install HBase, read the documentation, which covers the possible pitfalls; setup is not very difficult. Next, Matthias Wessendorf explains how to use JPA with HBase. You start with a regular persistence XML file that lists your classes and holds the actual configuration:
<persistence...>
  <persistence-unit...>

    <class>net.wessendorf...</class>
    ...

    <properties>
      <property name="datanucleus.ConnectionURL" value="hbase"/>
      <property name="datanucleus.ConnectionUserName" value=""/>
      <property name="datanucleus.ConnectionPassword" value=""/>
      <property name="datanucleus.autoCreateSchema" value="true"/>
      <property name="datanucleus.validateTables" value="false"/>
      <property name="datanucleus.Optimistic" value="false"/>
      <property name="datanucleus.validateConstraints" value="false"/>
    </properties>

  </persistence-unit>
</persistence>
In most cases you'll add @Entity to your class and work around any limitations. Once your data model is complete, you can, for example, use the EntityManager directly:
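import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.EntityTransaction;
import javax.persistence.Persistence;

// Open an EntityManager for the persistence unit configured above
EntityManagerFactory emf = Persistence.createEntityManagerFactory(...);
EntityManager entityManager = emf.createEntityManager();

// Persist the entity inside a transaction
EntityTransaction entityTransaction = entityManager.getTransaction();
entityTransaction.begin();

entityManager.persist(myJPAentity);

entityTransaction.commit();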
More often, though, you'll want to move the JPA-handling code into a DataAccessObject (DAO).
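
As a rough illustration of that idea, here is a minimal DAO sketch. It is not taken from the original post. To keep it compilable without a JPA provider on the classpath, it uses a hypothetical `PersistenceSession` interface as a stand-in for the EntityManager/EntityTransaction pair; `EntityDao` and `InMemorySession` are likewise made-up names for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the EntityManager/EntityTransaction pair, so the
// sketch compiles without a JPA provider on the classpath.
interface PersistenceSession {
    void begin();
    void persist(Object entity);
    void commit();
    void rollback();
}

// DAO that hides the begin/persist/commit sequence from its callers.
class EntityDao {
    private final PersistenceSession session;

    EntityDao(PersistenceSession session) {
        this.session = session;
    }

    // Stores one entity in its own transaction and rolls back if anything fails.
    void save(Object entity) {
        session.begin();
        try {
            session.persist(entity);
            session.commit();
        } catch (RuntimeException e) {
            session.rollback();
            throw e;
        }
    }
}

// In-memory session used only to exercise the DAO in this sketch.
class InMemorySession implements PersistenceSession {
    final List<Object> committed = new ArrayList<>();
    private final List<Object> pending = new ArrayList<>();

    public void begin() {
        pending.clear();
    }

    public void persist(Object entity) {
        if (entity == null) {
            throw new IllegalArgumentException("entity must not be null");
        }
        pending.add(entity);
    }

    public void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    public void rollback() {
        pending.clear();
    }
}
```

In a real application, the session would presumably delegate to `entityManager.getTransaction()` and `entityManager.persist(...)`, so callers only ever see `dao.save(myJPAentity)`.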

During the Maven build you'll have to enhance the bytecode of your entity classes. Luckily, DataNucleus offers a Maven plugin for this:
<plugin>
  <groupId>org.datanucleus</groupId>
  <artifactId>maven-datanucleus-plugin</artifactId>
  <version>2.0.0-release</version>
  <configuration>
    <log4jConfiguration>${basedir}/log4j.properties</log4jConfiguration>
    <verbose>true</verbose>
    <api>JPA</api>
    <persistenceUnitName>nameOfyourPU</persistenceUnitName>
  </configuration>
  <executions>
    <execution>
      <phase>compile</phase>
      <goals>
        <goal>enhance</goal>
      </goals>
    </execution>
  </executions>
</plugin>

This approach has significant benefits when hosting a normal Java EE application on HBase. Because Java EE applications use JPA for most of their storage, integrating them is a lot easier. You can also use the 'native' HBase API to read and store data in a JPA/JDO-managed HBase table, but the code is not as simple.
Article Type: 
How-to

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Matthias Wessendorf replied on Tue, 2010/03/30 - 7:42am

Hello,

 I'd appreciate if you could link to the original version of this blog, located here:

http://matthiaswessendorf.wordpress.com/2010/03/17/apache-hadoop-hbase-plays-nice-with-jpa/

 It would be nice if you'd use my URL (http://matthiaswessendorf.wordpress.com) instead of the nofluffjuststuff thing.

Thanks!

Matthias Wessendorf
