Apache Hadoop HBase plays nice with JPA

The Google AppEngine uses Google's Bigtable implementation as its storage system. Instead of only offering a native (and not so common) API to persist data, they worked with the DataNucleus guys to get support for JPA and JDO. There are some restrictions on usage, but generally folks who know these APIs (JPA especially has a pretty good adoption rate) can store their data in Bigtable easily. Thanks to these (common) APIs…

Now, not everybody wants to host their application (and store the data) at Google. A decent alternative is using a “home-grown” system. The Bigtable implementation of Apache Hadoop, “HBase”, can be used the same way! It is possible to easily use JDO/JPA (via DataNucleus) to persist objects in the HBase Bigtable implementation. This is really good news!

The installation of HBase is not too complicated. All the gotchas are covered in its documentation. To start using HBase and JPA, just use a regular persistence.xml file, which lists your classes and the actual configuration, such as:



 <property name="datanucleus.ConnectionURL" value="hbase"/>
 <property name="datanucleus.ConnectionUserName" value=""/>
 <property name="datanucleus.ConnectionPassword" value=""/>
 <property name="datanucleus.autoCreateSchema" value="true"/>
 <property name="datanucleus.validateTables" value="false"/>
 <property name="datanucleus.Optimistic" value="false"/>
 <property name="datanucleus.validateConstraints" value="false"/>


Your entities are almost “normal”, but there are some restrictions as well (like managing the @Id)… Generally, just annotate your class with @Entity and deal with the limitations. Once the data model is done, you could (naively) start using the EntityManager like:

EntityManagerFactory emf = Persistence.createEntityManagerFactory(...);
EntityManager entityManager = emf.createEntityManager();
EntityTransaction entityTransaction = entityManager.getTransaction();



But the best way (as generally with JPA) is to move that JPA-dealing code into a DataAccessObject… This is nothing new and is well known…
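To make the DataAccessObject idea concrete, here is a minimal sketch. It deliberately uses an in-memory map instead of the EntityManager so that it runs without any JPA provider on the classpath. The names `Person`, `PersonDao` and `InMemoryPersonDao` are hypothetical and not taken from my application; a JPA-backed implementation would presumably delegate to `entityManager.persist(...)` and `entityManager.find(...)` inside a transaction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity; in the real application this class would carry @Entity and @Id.
class Person {
    final String id;
    final String name;

    Person(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// The contract the rest of the application codes against.
interface PersonDao {
    void save(Person person);

    Optional<Person> findById(String id);

    List<Person> findAll();
}

// In-memory stand-in (an assumption for this sketch, not the JPA version).
// A JPA-backed DAO would implement the same interface on top of an EntityManager.
class InMemoryPersonDao implements PersonDao {
    private final Map<String, Person> rows = new LinkedHashMap<>();

    @Override
    public void save(Person person) {
        // Insert or overwrite the row stored under the person's id.
        rows.put(person.id, person);
    }

    @Override
    public Optional<Person> findById(String id) {
        return Optional.ofNullable(rows.get(id));
    }

    @Override
    public List<Person> findAll() {
        // Return a copy so callers cannot modify the internal map.
        return new ArrayList<>(rows.values());
    }
}
```

The point of the interface is that the JSF (or any other) layer only ever sees `PersonDao`, so swapping the storage behind it does not touch the callers.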

During your (Maven) build you have to do byte-code enhancing of the actual classes. The DataNucleus guys offer a decent Maven plugin for that.


Now you should be good to go. I wrote a JSF/MyFaces application that uses the DataNucleus JPA API to store and read objects from Apache HBase. I will make the code available soon… But the snippets above give you an idea of how to configure the stuff, if you are interested in using JPA (or JDO) with the Apache Hadoop HBase project.

Generally the combination of these two is pretty interesting, especially when hosting “regular” JavaEE applications, which nowadays mostly use JPA as their storage API. So basically the integration of this and a “normal” JavaEE application is not too complicated. There is also the Spring Framework. Currently there is no explicit support for DataNucleus JPA/JDO, but I saw some blog posts talking about Spring and the AppEngine. So integration there is possible too… Perhaps the mentioned “issue” gets fixed soon as well 😉

Note: It is possible to use the “native” HBase API to read/store data in a JPA/JDO managed HBase “table”, but you need some code that is not so straightforward at first glance, as the DataNucleus plugin/JPA implementation uses some class-based metadata to manage the table, the column family and its qualifiers. I have some sample code for that as well. With some love of _reflection_ you can get it done in a generic way (I will post an example soon).
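Until that example is posted, here is a rough sketch of the reflection part only. It assumes, purely for illustration, that the table and the column family are named after the class's simple name and that each non-static, non-transient field becomes a qualifier of the same name. The real DataNucleus mapping may differ, so check what your tables actually look like. `Customer` and `HBaseNames` are hypothetical names, and no HBase API is called here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entity used only for this sketch.
class Customer {
    private static final long serialVersionUID = 1L; // static, so it is skipped
    private String id;
    private String name;
    private int age;
}

// Derives table, column-family and qualifier names from a class via reflection.
// ASSUMPTION: table = family = simple class name, qualifier = field name.
final class HBaseNames {
    private HBaseNames() {
    }

    static String tableName(Class<?> type) {
        return type.getSimpleName();
    }

    static String columnFamily(Class<?> type) {
        return type.getSimpleName();
    }

    static List<String> qualifiers(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip fields that would not be persisted.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue;
            }
            names.add(field.getName());
        }
        // getDeclaredFields() gives no ordering guarantee, so sort for a stable result.
        Collections.sort(names);
        return names;
    }
}
```

With names derived like this, generic code could build the reads and writes for the native HBase client for any entity class instead of hard-coding strings per class.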



Posted in Bigtable, ejb, Hadoop, HBase, spring
7 comments on “Apache Hadoop HBase plays nice with JPA”
  1. David Greco says:

    Hi Matthias,
    I published a complete example using Camel + JPA and HBase working together.
    Here the details http://www.davidgreco.it/MySite/Blog/Entries/2010/4/19_HPC_%3D_High_Performance_Camel.html
    It really works nicely.
    Thanks for your post.

  2. Interesting work. Do you know when that code example will be available? Thanks

  3. Venkatesh says:

    What's the reason behind the byte-code enhancing of the actual classes?

  4. matthiaswessendorf says:

    That's for the JDO/JPA stuff. OpenJPA does that too…

  5. […] on accessing Apache HBase with JPA 31 08 2010 A few month ago I started to play with Apache HBase, Apache’s Bigtable implementation. Since the Bigtable impl. from the Google AppEngine […]

  6. […] by Matthias Wessendorf’s blog entry “Apache Hadoop HBase plays nice with JPA” I started playing around with integrating HBase into Spring […]

  7. Just to inform people that DataNucleus HBase support has been significantly improved since Matthias’ good introduction (above) to HBase +JDO/JPA. It now supports SchemaTool, increment value generation, relationships and reachability (cascading), datastore identity, versioning and optimistic checking, embedded persistable fields, and limited amount of query evaluation in datastore.
