
I'm a pseudo-superhero with my underwear on my head part-time. Most of the time I'm a software developer working at Signpost in New York City. I have a background in consulting for defense and finance and am now moving into web and distributed systems. I have worked extensively with C++, C#, and Java as well as a variety of more specialized languages and technologies. I hope my experiences and work can be helpful to others. Contact me if you need any help. Matt is a DZone MVB, is not an employee of DZone, and has posted 3 posts at DZone. You can read more from them at their website.

GuiceyData Generator Makes Quick Work of Data in MongoDB from Java

02.15.2012

One of the best things about MongoDB is the lack of an enforced schema for collections. This flexibility gives developers a lot of power in how they work with their data. Embedding records and arrays inside other records allows both a complexity and simplicity of data organization that RDBMSs can only dream of! That said, when you work with these records in a language like Java, on large, diverse teams of people who don’t want to open the database and inspect the records to see what values and sub-records are available, you will always spend time wrapping these records in strongly typed classes. Wrapping up loose data into classes that can both access and create that data sounds just like another project I’ve used recently. If you haven’t heard of Google’s Protocol Buffers, you might want to acquaint yourself.

Since I’ve enjoyed working with Protocol Buffers so much, I thought I could mimic their functionality and ease of use with MongoDB. This would also integrate beautifully with the GuiceyMongo project that I released a month or two ago.

What is the GuiceyData Generator?

The GuiceyData Generator is a quick and easy way to specify strongly typed data structures to be stored in a MongoDB database and mapped to wrappers and builders in Java. The resulting classes are strictly data and have (by design) very limited functionality. Their purpose is to make reading and storing data in MongoDB a breeze from Java.

How does it work?

It’s pretty straightforward. You create a data definition file and then run the generator. This will create wrappers and builders for all of the types you define. Here’s a very simple example:

simple.data

data Person {
	string name;
	set<string> alias;
	blob picture;
}

Then you can run the generator:

$ java -cp guiceymongogenerator-0.1.0.jar \
    com.lowereast.guiceymongo.data.generator.GuiceyDataGenerator -p test.data -s src

This will create the directory structure and files below:

+ src
  + test
    + data
      - Person.java

Wrappers

Once you have generated the code, you can do something like this:

Person p = Person.wrap(personCollection.findOne());
if (p.hasName())
    log.trace(p.getName());
if (p.getAliasCount() > 0)
    log.trace(StringUtil.join(p.getAliasSet(), ", "));
if (p.hasPicture()) {
    Image picture = ImageIO.read(p.getPictureInputStream());
    // do something with the picture
}


Please note that in the example above, blobs will only work if you are using GuiceyMongo to configure your databases, collections, and buckets. Using GuiceyMongo, we could also just do this:

@Inject
void loadPerson(@GuiceyMongoCollection("person")
    GuiceyCollection<Person> personCollection) {
    Person p = personCollection.findOne();
    // ...
}
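
To make the wrapper idea more concrete, here is a minimal hand-written sketch of what such a class could look like. It is not the generator's actual output: the class name PersonWrapperSketch is hypothetical, a plain java.util.Map stands in for the driver's DBObject, and blobs are left out. The point is only that each accessor reads straight from the underlying record when it is called.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a generated wrapper; NOT GuiceyData's actual output.
class PersonWrapperSketch {
    // A plain Map stands in for the MongoDB driver's DBObject (assumption).
    private final Map<String, Object> raw;

    private PersonWrapperSketch(Map<String, Object> raw) {
        this.raw = raw;
    }

    static PersonWrapperSketch wrap(Map<String, Object> raw) {
        return new PersonWrapperSketch(raw);
    }

    // The field is only inspected when asked for; nothing is converted up front.
    boolean hasName() {
        return raw.get("name") instanceof String;
    }

    String getName() {
        return (String) raw.get("name");
    }

    // Exposes the stored array as a set, keeping the stored order.
    Set<String> getAliasSet() {
        Object value = raw.get("alias");
        if (!(value instanceof List)) {
            return Collections.emptySet();
        }
        Set<String> aliases = new LinkedHashSet<>();
        for (Object item : (List<?>) value) {
            aliases.add(String.valueOf(item));
        }
        return aliases;
    }

    int getAliasCount() {
        return getAliasSet().size();
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("name", "Matt Insler");
        record.put("alias", Arrays.asList("Matt", "Guice & Mongo Guru"));

        PersonWrapperSketch p = PersonWrapperSketch.wrap(record);
        if (p.hasName()) {
            System.out.println(p.getName());
        }
        System.out.println(String.join(", ", p.getAliasSet()));
    }
}
```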

 

Builders

What would be the use of being able to read data if we couldn’t create it as well? For this, there are builders:

Person.Builder p = Person.newBuilder()
                .setName("Matt Insler")
                .addAlias("Matt")
                .addAlias("Guice & Mongo Guru")
                .setPictureBucket("pictures");
ImageIO.write(picture, format, p.getPictureOutputStream());
personCollection.save(p.build());


And once again with GuiceyMongo:

@Inject
void savePerson(@GuiceyMongoCollection("person")
    GuiceyCollection<Person> personCollection) {
    Person p = Person.newBuilder() /* ... */;
    personCollection.save(p);
}


It’s really just that easy!
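
For readers curious what a builder of this kind boils down to, the following is a small sketch under stated assumptions. PersonBuilderSketch is a hypothetical name, the result of build() is a plain Map rather than a driver object, and the picture bucket handling is omitted. It is an illustration of the fluent style, not the code the generator emits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fluent builder; NOT GuiceyData's generated code.
class PersonBuilderSketch {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private final List<String> aliases = new ArrayList<>();

    static PersonBuilderSketch newBuilder() {
        return new PersonBuilderSketch();
    }

    // Each setter returns the builder so calls can be chained.
    PersonBuilderSketch setName(String name) {
        fields.put("name", name);
        return this;
    }

    // The definition file declares alias as a set, so duplicates are ignored.
    PersonBuilderSketch addAlias(String alias) {
        if (!aliases.contains(alias)) {
            aliases.add(alias);
        }
        return this;
    }

    // Produces the record that would be handed to the collection (a Map here, by assumption).
    Map<String, Object> build() {
        Map<String, Object> result = new LinkedHashMap<>(fields);
        if (!aliases.isEmpty()) {
            result.put("alias", new ArrayList<>(aliases));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> record = PersonBuilderSketch.newBuilder()
                .setName("Matt Insler")
                .addAlias("Matt")
                .addAlias("Guice & Mongo Guru")
                .build();
        System.out.println(record);
    }
}
```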

Builder Prototypes

You can create builders based on another object. This can be a wrapper or a builder itself. This is useful when you would like to use a prototype object or just copy an object that you’ve just read.

Person p = Person.wrap(collection.findOne());
Person newPerson = Person.newBuilder(p);
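
The prototype idea can be sketched roughly as follows. This is an assumption-laden illustration, not the library's implementation: PrototypeSketch and its nested Builder are hypothetical, records are plain Maps, and the copy is shallow (nested lists or sub-records would still be shared with the original).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of building from a prototype record.
class PrototypeSketch {
    static final class Builder {
        private final Map<String, Object> fields;

        // Start from a shallow copy so changes do not leak back into the prototype.
        private Builder(Map<String, Object> prototype) {
            this.fields = new HashMap<>(prototype);
        }

        Builder set(String key, Object value) {
            fields.put(key, value);
            return this;
        }

        Map<String, Object> build() {
            return new HashMap<>(fields);
        }
    }

    static Builder newBuilder(Map<String, Object> prototype) {
        return new Builder(prototype);
    }

    public static void main(String[] args) {
        Map<String, Object> original = new HashMap<>();
        original.put("name", "Matt Insler");

        Map<String, Object> copy = newBuilder(original).set("name", "Someone Else").build();
        System.out.println(original.get("name"));
        System.out.println(copy.get("name"));
    }
}
```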

 

More Complex

Here’s an example of a more complex data definition file that shows you more of what the generator can handle:

Contact.data

data Contact {
	data Address {
		string street_1;
		string street_2;
		string city;
		string state;
		int zip_code;
	}

	[identity]
	string identity;

	string first_name;
	string last_name;
	map<string, Address> address;
	map<string, string> phone_number;
	map<string, string> email_address;
	map<string, InstantMessenger> instant_messenger;
	set<string> tag;
	blob picture;
}

data InstantMessenger {
	enum Application {
		AIM,
		ICQ,
		Jabber,
		MSN,
		Yahoo
	}

	string screen_name;
	string alias;
	Application application;
}


Integration with GuiceyMongo Stored Procedure Proxies

Now that you have generated code to access and build your data, what if you want to return that data from a stored procedure? This is pretty easy!

public interface ContactQuery {
    Contact findContactByName(String name);
    List<Contact> findContactsByIMAlias(String alias);
}
 
@Inject
void exercise(ContactQuery query) {
    query.findContactByName("Matt Insler");
}


Not too bad, right?
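
One common way to back an interface like this in Java is a dynamic proxy from java.lang.reflect. Whether GuiceyMongo works exactly this way is an assumption; the sketch below only shows the general technique. The handler records each call instead of talking to a database, and the interface is simplified to return a String so the example has no dependencies.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an interface-backed proxy; not GuiceyMongo's implementation.
class ProcedureProxySketch {
    // Simplified stand-in for the ContactQuery interface above.
    interface ContactQuery {
        String findContactByName(String name);
    }

    // Creates an implementation of the interface whose methods are routed to one handler.
    static <T> T proxyFor(Class<T> type, List<String> callLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            // A real implementation would run the stored procedure named by the method
            // and wrap the result; here the call is only recorded.
            callLog.add(method.getName() + Arrays.toString(args));
            return null;
        };
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler);
        return type.cast(instance);
    }

    public static void main(String[] args) {
        List<String> callLog = new ArrayList<>();
        ContactQuery query = proxyFor(ContactQuery.class, callLog);
        query.findContactByName("Matt Insler");
        System.out.println(callLog);
    }
}
```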



User Data Types

data [name] {
	[type] [property name];
	...
	OtherData.Embedded [property name];
	data [embedded type name] {
		[type] [property name];
		...
	}
}


User Enum Types

enum [name] {
	[value],
	...
	[value]
}


A note on enum types:
User-defined enum types will be stored in MongoDB as [enum value].name(). Conversions will be made when reading and storing these values. Be very careful when renaming enum values: any values already stored in the database will need to be changed as well.
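
The warning about changing enum values follows from how name-based storage behaves in plain Java. The sketch below uses only the standard Enum methods name() and valueOf(); the class EnumStorageSketch and the helper names toStored and fromStored are hypothetical and not part of the generated code.

```java
// Hypothetical helpers showing name-based enum storage with standard Java calls.
class EnumStorageSketch {
    enum Application { AIM, ICQ, Jabber, MSN, Yahoo }

    // What gets written to the database: the constant's name.
    static String toStored(Application application) {
        return application.name();
    }

    // What happens on read: the stored name is looked up again.
    static Application fromStored(String stored) {
        return Application.valueOf(stored);
    }

    public static void main(String[] args) {
        String stored = toStored(Application.Jabber);
        System.out.println(stored);
        System.out.println(fromStored(stored));

        // A stored value whose constant was renamed or removed no longer resolves.
        try {
            fromStored("XMPP");
        } catch (IllegalArgumentException e) {
            System.out.println("stored value no longer matches any constant");
        }
    }
}
```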

Options

Currently there is only one available option: the identity option. Options are specified directly above the property they apply to and are enclosed in square brackets.

data Foo {
	[identity]
	string identity;
}
data Bar {
	[identity]
	object_id identity;
}


The identity option specifies that this property should be read and stored as an ObjectId and used as the _id value. The generated code will perform automatic conversions between ObjectId and String if the property’s type is string.
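
As a rough picture of the string conversion, an ObjectId is 12 bytes and is conventionally written as 24 hexadecimal characters. The sketch below converts between the two forms with a small stand-in class so that it runs without the MongoDB driver. IdentityConversionSketch and its method names are hypothetical, and the generated code presumably relies on the driver's own ObjectId type instead.

```java
// Hypothetical stand-in for String <-> ObjectId conversion; avoids the driver dependency.
class IdentityConversionSketch {
    // Parses a 24-character hex string into the 12 bytes of an id.
    static byte[] toIdBytes(String hex) {
        if (hex == null || !hex.matches("[0-9a-fA-F]{24}")) {
            throw new IllegalArgumentException("expected 24 hex characters: " + hex);
        }
        byte[] bytes = new byte[12];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Formats the bytes back into lower-case hex.
    static String toIdString(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String id = "4f3c2a1b0e9d8c7b6a5f4e3d"; // made-up example value
        byte[] bytes = toIdBytes(id);
        System.out.println(bytes.length);
        System.out.println(toIdString(bytes));
    }
}
```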

Why not do what others are doing?

In two words, personal preference. While I think that other available libraries have their place, I have certain preferences that have led me to develop GuiceyData the way that I did.

GuiceyData vs. Morphia

Morphia is a JPA-style library that allows you to annotate POJOs and then convert them to and from DBObjects. It’s rather nice and supports a bunch of features like strongly typed queries, DAO/Datastore generation, embedded and referenced data objects, and more. It’s definitely worth checking out and could be the right library for you.

The main positive about Morphia is the ability to persist POJOs that you’ve already created and been using. Just annotate the fields and go! What I don’t like is dealing with full object conversions. The code isn’t as fast or efficient as it could be because it has to use reflection and must process all fields of an object regardless of which fields are actually being used in your code. Plus, with a schema-less database like MongoDB, you can’t be sure that all of the values in the POJO will be filled in when you convert the object, so you can’t use primitive types since they are not nullable. Another problem I have is that I really don’t like creating my own custom data stores, or the lack of outside control over which database and collections are being used.

With GuiceyMongo I can generate object wrappers that lazily access the data with code written specifically for the object I’m dealing with. No reflection, plus lazy loading and processing == really fast! GuiceyMongo also separates the data spec from Java code. This is very useful when you want to share objects across an organization that uses different languages to access the database. Code for every language can be generated from the same specification. In my opinion (realize that I am the creator of GuiceyMongo), GuiceyMongo achieves more than Morphia in a cleaner way and is faster.
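
The difference between converting a whole record up front and reading fields lazily can be illustrated with a small counting experiment. This is only a sketch: it says nothing about real-world timings or about Morphia's internals, and every name in it (LazyAccessSketch, CountingMap, EagerContact, LazyContact) is hypothetical. It simply counts how many fields each approach touches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of eager conversion and lazy wrapping; counts field reads only.
class LazyAccessSketch {
    // A map that counts reads so the two approaches can be compared.
    static final class CountingMap extends HashMap<String, Object> {
        private static final long serialVersionUID = 1L;
        int reads = 0;

        @Override
        public Object get(Object key) {
            reads++;
            return super.get(key);
        }
    }

    // Eager: every field is copied when the object is created.
    static final class EagerContact {
        final String identity;
        final String firstName;
        final String lastName;

        EagerContact(Map<String, Object> raw) {
            identity = (String) raw.get("identity");
            firstName = (String) raw.get("first_name");
            lastName = (String) raw.get("last_name");
        }
    }

    // Lazy: a field is read only when its accessor is called.
    static final class LazyContact {
        private final Map<String, Object> raw;

        LazyContact(Map<String, Object> raw) {
            this.raw = raw;
        }

        String getFirstName() {
            return (String) raw.get("first_name");
        }
    }

    static CountingMap sampleRecord() {
        CountingMap record = new CountingMap();
        record.put("identity", "example-id");
        record.put("first_name", "Matt");
        record.put("last_name", "Insler");
        return record;
    }

    public static void main(String[] args) {
        CountingMap eagerRecord = sampleRecord();
        new EagerContact(eagerRecord);
        System.out.println("eager reads: " + eagerRecord.reads);

        CountingMap lazyRecord = sampleRecord();
        new LazyContact(lazyRecord).getFirstName();
        System.out.println("lazy reads: " + lazyRecord.reads);
    }
}
```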

GuiceyData vs. Sculptor

Sculptor is a code generator that has been adapted to create MongoDB conversion code. It follows the idea of Rails by scaffolding lots of methods and classes to help you deal with your data classes. It’s a bit daunting to understand exactly what’s going on, but you can figure it all out once you play with it a bit. Just like Morphia, Sculptor will convert your data objects to and from DBObjects. Unlike Morphia, Sculptor generates this conversion code, which makes it faster since there’s no reflection. What I really like about Sculptor is the query language that it provides. This is a much easier way to query your generated objects than constructing DBObjects by hand or even using QueryBuilder in the com.mongodb package. This functionality has caused me to start creating my own query DSL that I will hopefully release in the next month. Check it out!

Now for the criticisms. My main and largest criticism of Sculptor is that it’s so flexible that it becomes complex. I’m a large proponent of your data objects being just that… Data objects. I don’t see a very compelling use case for scaffolding classes that inherit from your data classes and stubbing out extra functionality. If you want that functionality, write it, but leave it out of your data definition. The same goes for repositories. They really aren’t buying you that much. Why do you need to generate the common methods for each and every type of repository? Isn’t this what generics are for? All of the base repository classes will be identical, except for the methods that you add to them. Scaffolding is nice, but not necessary.

I also think that generating the conversions is a bit short-sighted. If you’re generating code, why not generate lazy loading in your data objects? I understand that people like POJOs, but they’re very slow. We’re dealing with databases here. Your bottleneck should be the queries and data processing, not constructing the objects around the data that you’ve received or are storing.

My last gripe is something that I chose not to support in GuiceyMongo because MongoDB is schema-less. Sculptor supports object inheritance. This is a nice feature, but I’m not sure that it’s absolutely necessary. An object in MongoDB can be many different types all rolled into one. I can create an email object and then overlay a task object on top of it. I don’t want an inheritance chain for this. I want a way to convert one object type to another. Wrapping raw DBObjects with accessors is a very clean and fast way to do this. In GuiceyData all you need to do is Email e = Email.wrap(…); Task t = Task.convertFrom(e); The data types can then be stored in the database in the same object or in different ones, or even different collections.
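
The overlay idea can be sketched with two wrappers sharing one underlying record. This is a hypothetical illustration rather than GuiceyData's generated code: the records are plain Maps, and the field names subject and done are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of overlaying two typed views on one record.
class OverlaySketch {
    static final class Email {
        private final Map<String, Object> raw;

        private Email(Map<String, Object> raw) {
            this.raw = raw;
        }

        static Email wrap(Map<String, Object> raw) {
            return new Email(raw);
        }

        String getSubject() {
            return (String) raw.get("subject");
        }
    }

    static final class Task {
        private final Map<String, Object> raw;

        private Task(Map<String, Object> raw) {
            this.raw = raw;
        }

        // No copy and no inheritance: the task is another view of the same record.
        static Task convertFrom(Email email) {
            return new Task(email.raw);
        }

        boolean isDone() {
            return Boolean.TRUE.equals(raw.get("done"));
        }

        void setDone(boolean done) {
            raw.put("done", done);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("subject", "Ship the 0.1.0 jars");

        Email e = Email.wrap(record);
        Task t = Task.convertFrom(e);
        t.setDone(true);

        // Both views read from the same record.
        System.out.println(e.getSubject());
        System.out.println(t.isDone());
    }
}
```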

Enterprise Usage

Both GuiceyMongo and GuiceyData were created with enterprise usage in mind. GuiceyMongo allows you to create multiple configuration environments, like Test, QA, Prod, etc., where you can map databases, collections, and buckets (GridFS) to any server, database, or collection that you want. Just by changing the configuration, your data can be loaded from or saved to somewhere else. If this configuration is abstracted into a config file or configuration library, it can be shared across the enterprise and overridden locally if necessary. As for the data generation, the definition files only talk about data. That’s all they ever will talk about. This means that code can be generated in many different languages with ease. If you are writing a data access library that will be distributed to other teams, you can use Google Guice to provide an easy way to configure your library, and package the generated data classes. There is no need to make your users generate the code; they can just use your jar, .so, or DLL.

The integration of GuiceyMongo and GuiceyData will make it dead easy to get started with MongoDB from Java. In the future I’ll write a quickstart post to show everyone how quick and easy it can really be!

Try it!

Just head over to the GitHub download page and grab the 0.1.0 jars. To generate your classes, run the generator as shown above. To use GuiceyMongo, add a reference to it and its dependencies and look at my other posts on the subject.

Dependencies

guiceydatagenerator-[version].jar: none
guiceymongo-[version].jar: aopalliance.jar (included with Google Guice), guice-2.0.jar, mongo-1.4.jar (or later)

As always, if you encounter any issues, please contact me!


Source: http://www.mattinsler.com/mongodb-java-dao-generator-guiceydata/
Published at DZone with permission of Matt Insler, author and DZone MVB.


Comments

Goel Yatendra replied on Thu, 2012/03/15 - 4:00pm

There is currently a limitation in the MongoDB Java driver that prevents it from being used on Android (Dalvik) systems. It seems the Dalvik VM does not include the code from the java.lang.management package, which the Java driver references in a limited number of places.
