How Do You Handle XML in Your Java Applications?
There's a lot of choice out there for handling XML data in Java applications. Although the technologies have been around for ages, the main options for most developers seem to be DOM or SAX. But even these well-established approaches have their limits. DOM carries a lot of overhead if you want to read a large document, because it keeps the whole document in memory; SAX is more efficient, but can involve a lot of messy code if you want to keep any of the data that you read. A good halfway point for me is JAXB: it seems fast and keeps a meaningful object structure in memory.
But what do you use most commonly? I know there are pros and cons for each approach, as well as some reasons that rule out an entire approach. Please leave comments to let us know which approach you typically take in your development. I think this could be an interesting roundup of technologies.
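To make the DOM/SAX trade-off described above concrete, here is a minimal sketch that counts elements both ways using only the JDK's built-in parsers. The class name DomVersusSax and the element names are made up for illustration; the point is only that the DOM version holds the whole tree in memory while the SAX version keeps nothing but a counter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
final class DomVersusSax {

    /** DOM: builds the complete tree in memory, then queries it. */
    static int countWithDom(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML with DOM", e);
        }
    }

    /** SAX: receives one event at a time and keeps only a counter. */
    static int countWithSax(String xml, String elementName) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so compare qName.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML with SAX", e);
        }
        return count[0];
    }
}
```

Both methods return the same number; the difference only shows up in memory use once the input gets large, and in how awkward the SAX handler becomes once you need more state than a counter.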


Comments
James Sugrue replied on Wed, 2010/04/28 - 3:04am
To get things going here: when I need to serialize to and read from XML, as I'm running on the Eclipse platform, I use EMF (Eclipse Modeling Framework). It seems to do a really good job of serializing XML. Previously, I had used JAXB and Castor.
It's been a long time since I've needed to use SAX or DOM. I prefer having the object structure match the XML structure rather than dealing with abstractness. But that's just me.
James
Greg Hall replied on Wed, 2010/04/28 - 3:24am
Robert Csala replied on Wed, 2010/04/28 - 3:27am
Robert Csala replied on Wed, 2010/04/28 - 3:27am
in response to:
Robert Csala
oded peer replied on Wed, 2010/04/28 - 3:32am
James Sugrue replied on Wed, 2010/04/28 - 3:38am
in response to:
oded peer
Barry Fitzgerald replied on Wed, 2010/04/28 - 3:56am
One of the nicest ways to do this now is to use Groovy!
It can quite happily coexist with your java project and the xml handling is much better than any of the other options.
Peter ___ replied on Wed, 2010/04/28 - 4:09am
I am using SAX and DOM, but also Joost, which is a fast and memory-efficient 'xslt' transformer. See also a list of XML processors/serializers, where Simple looks promising. Another nice piece of software is Jaxen.
One approach I implemented myself (similar to this) does not hold the entire XML in memory, but each part can still be handled via DOM. E.g. you have the XML file
Then you can parse it product by product via:
ContentHandler productHandler = new GenericXDOMHandler("/products/product") {
    @Override
    public void writeDocument(String localName, Element element) throws Exception {
        // handle a single <product> element here
    }
};
(If someone wants the production-ready source, I could blog about this sometime.)
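For readers who want to try the idea before that blog post appears, the following is a rough sketch of one way such a handler could work. It is not the GenericXDOMHandler mentioned above: the class name FragmentDomHandler, the callback style, and the product/name element layout in the usage are all assumptions. It listens to SAX events, builds a small DOM subtree only for elements matching the target path, hands each finished subtree to a callback, and then drops it.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: SAX on the outside, small DOM fragments on the inside.
final class FragmentDomHandler extends DefaultHandler {
    private final String targetPath;           // e.g. "/products/product"
    private final Consumer<Element> callback;  // receives each finished fragment
    private final Document factoryDoc;         // only used to create nodes
    private final StringBuilder path = new StringBuilder();
    private Node current;                      // null while outside a fragment

    FragmentDomHandler(String targetPath, Consumer<Element> callback)
            throws ParserConfigurationException {
        this.targetPath = targetPath;
        this.callback = callback;
        this.factoryDoc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.append('/').append(qName);
        if (current == null && !path.toString().equals(targetPath)) {
            return; // not inside a fragment and not at its root: ignore
        }
        Element element = factoryDoc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(element);
        }
        current = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(factoryDoc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            Node parent = current.getParentNode();
            if (parent == null) {
                callback.accept((Element) current); // fragment root is complete
            }
            current = parent;
        }
        path.setLength(path.length() - qName.length() - 1);
    }

    /** Convenience entry point for string input. */
    static void parse(String xml, String targetPath, Consumer<Element> callback) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new FragmentDomHandler(targetPath, callback));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

A call such as FragmentDomHandler.parse(xml, "/products/product", element -> ...) then sees one product element at a time. The sketch ignores namespaces, comments, and processing instructions, so treat it as a starting point rather than production code.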
Jocelyn LECOMTE replied on Wed, 2010/04/28 - 4:09am
A few months ago, I used JAXB to transform XML to POJOs in a Java SE environment. Everything went well until, a few weeks later, a Java update arrived. JAXB was patched and the parsing of my XML failed... The solution was to pin a specific version of JAXB instead of using the one bundled with Java.
So, I made a quick tour of existing libraries to see if it was possible to find one matching my simple needs. I found SimpleXML (http://simple.sourceforge.net/) very ... simple, lightweight and easy to add to my project.
Erik Post replied on Wed, 2010/04/28 - 4:50am
Flofighter Flof... replied on Wed, 2010/04/28 - 4:54am
How about Xmlbeans (http://xmlbeans.apache.org/) ?
It's another Java XML binding library. It's widely used in my company to create and edit large XML files (up to 100 MB). It's about as fast and efficient as JAXB and allows you to access and modify the full XML model (including comments).
ff aaa replied on Wed, 2010/04/28 - 5:03am
Jason Vedder replied on Wed, 2010/04/28 - 5:31am
James Sugrue replied on Wed, 2010/04/28 - 6:05am
in response to:
Peter ___
I've found myself using that type of approach too, Peter... It would be great to see a blog post with more detail.
James
James Sugrue replied on Wed, 2010/04/28 - 6:06am
Lots of great suggestions for handling XML - keep them coming!
James
Zqudlyba Navis replied on Wed, 2010/04/28 - 6:28am
JAXB if I need XML Schema validation.
XStream if I want something quick, lightweight and don't require XML Schema validation.
Chad Hahn replied on Wed, 2010/04/28 - 7:05am
Gil Collins replied on Wed, 2010/04/28 - 7:43am
Zviki Cohen replied on Wed, 2010/04/28 - 8:05am
Greg Brown replied on Wed, 2010/04/28 - 8:35am
jacklty lam replied on Wed, 2010/04/28 - 9:33am
Ben Courliss replied on Wed, 2010/04/28 - 9:59am
Jim Watson replied on Wed, 2010/04/28 - 10:02am
Kris Scorup replied on Wed, 2010/04/28 - 10:29am
I second XMLBeans. I would rather work with Objects than XPath expressions, so XMLBeans fits my comfort level.
Rhett Whaley replied on Wed, 2010/04/28 - 10:29am
Jilles Van Gurp replied on Wed, 2010/04/28 - 12:48pm
Mix of xpath, jaxb and string manipulation mostly
The choice is really divided into producing and consuming here. For producing XML, I prefer a template-based solution, although sadly I am stuck with JAXB currently. There's a reason why there is no server-side DOM for HTML pages in most web servers: it's a waste of time to write the boilerplate POJO code, and a waste of memory. The same holds true for most non-trivial XML documents in my view. Your DB model classes tend to be different from your JAXB classes. So you end up with a lot of stupid conversion: request XML -> JAXB DTOs -> DB model -> SQL -> result set -> DB model -> JAXB DTOs -> response XML. I have a lot of code and tests that are about babysitting data through this chain instead of actual business logic.
For consuming xml, I believe in duck typing and jaxb tends to be too anal about little details for my taste. Xpath gives me all I need here and allows me to be much more relaxed about the content I can accept (which in most real life scenarios is a good thing).
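As a small illustration of that relaxed, XPath-based style of consuming XML, here is a sketch using the javax.xml.xpath API that ships with the JDK. The class name LenientXPathReader and the order/customer document in the usage note are invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: pull out only the values you care about.
final class LenientXPathReader {

    /**
     * Evaluates the expression against the document and returns the string
     * value of the first match, or an empty string when nothing matches.
     * Unknown or extra elements elsewhere in the document are simply ignored.
     */
    static String read(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or unparsable XML", e);
        }
    }
}
```

For a document like <order><extra>ignored</extra><customer><name>Ann</name></customer></order>, reading /order/customer/name yields "Ann", and a path that does not exist yields an empty string instead of a binding error. Note that this variant parses the whole input for every call, so for more than a handful of values you would parse once and evaluate against the resulting document.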
DOM and SAX are too low-level to be of use. If you want that level of detail, you probably want to be using a pull parser like StAX. SAX and DOM are there for when you run out of other options, which in Java means you don't need to work with them directly.
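For anyone who has not used a pull parser, a minimal StAX sketch looks roughly like this. It uses the javax.xml.stream API from the JDK; the class name StaxPullExample and the element names in the usage note are made up. The calling code asks for the next event when it is ready, rather than being called back as with SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class showing the pull style of parsing.
final class StaxPullExample {

    /** Collects the text of every element with the given name. */
    static List<String> textOf(String xml, String elementName) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // we pull; the parser does not push
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    // Valid only for text-only elements; leaves the reader
                    // positioned on the matching end tag.
                    values.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return values;
    }
}
```

Calling textOf on a products document with name elements returns the names in document order, and only the current event is held by the parser at any time.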
When the files get big, you might want to consider hybrid approaches. In one case where I had files that did not fit in memory (except with a multi-GB heap), I used a simple String.indexOf() to examine each line in order to consume chunks of XML that I could feed to JAXB (convenient because the object model was already there). I actually multi-threaded the whole thing to queue chunks and process them with 30 threads. BTW, if you ever find yourself writing this kind of code, take a good look at Spring Batch before you continue with your home-grown batch processing system.
Alternatively, use a more batch-friendly language like Python. It does XML too.
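A rough sketch of that line-scanning idea is shown below. It is not the original code: the class name LineChunker is invented, and it assumes that each record's start tag and end tag sit on their own lines, which is what makes the plain indexOf() check workable. Each complete chunk is handed to a consumer, which could unmarshal it with JAXB or put it on a queue for worker threads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical sketch: cut a huge XML file into per-record strings by line.
final class LineChunker {

    static void forEachChunk(Reader source, String tagName, Consumer<String> chunkConsumer) {
        String open = "<" + tagName;
        String close = "</" + tagName + ">";
        StringBuilder chunk = null; // null while between records
        try (BufferedReader lines = new BufferedReader(source)) {
            String line;
            while ((line = lines.readLine()) != null) {
                if (chunk == null && startsTag(line, open)) {
                    chunk = new StringBuilder();
                }
                if (chunk != null) {
                    chunk.append(line).append('\n');
                    if (line.indexOf(close) >= 0) {
                        chunkConsumer.accept(chunk.toString());
                        chunk = null;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the line contains the start tag itself, not a longer name such as "products". */
    private static boolean startsTag(String line, String open) {
        int index = line.indexOf(open);
        if (index < 0) {
            return false;
        }
        int end = index + open.length();
        return end == line.length()
                || line.charAt(end) == '>'
                || Character.isWhitespace(line.charAt(end));
    }
}
```

This only holds up for machine-generated files with a predictable layout; nested elements of the same name, CDATA sections, or several records on one line would break it.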
Neil Shannon replied on Wed, 2010/04/28 - 2:22pm
Peter ___ replied on Wed, 2010/04/28 - 5:04pm
in response to:
James Sugrue
juan ponce replied on Wed, 2010/04/28 - 5:52pm
- Apache Digester
- XStream
Peter ___ replied on Thu, 2010/04/29 - 5:36am
in response to:
James Sugrue
I have now blogged about Memory Efficient XML Processing.
Comments are appreciated!