I've been a zone leader with DZone since 2008, and I'm crazy about community. Every day I get to work with the best that JavaScript, HTML5, Android and iOS have to offer, creating apps that truly make a difference, as principal front-end architect at Avego. James is a DZone Zone Leader and has posted 639 posts at DZone.

How Do You Handle XML in Your Java Applications?

There's a lot of choice out there for handling XML data in Java applications. Although the technologies have been around for ages, the main options for most developers seem to be DOM or SAX. But even these well-established approaches have their limits. DOM can be a lot of overhead if you want to read a large document, as it keeps the whole document in memory; SAX is more efficient, but can involve a lot of messy code if you want to keep any of the data that you read. A good halfway point for me is JAXB: seemingly fast, and it keeps a meaningful object structure in memory.
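
To make the DOM/SAX trade-off concrete, here is a minimal sketch that counts elements both ways using only the JDK's built-in parsers. The class name `DomVsSax`, the method names and the sample XML are made up for illustration; they are not from any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: the same question answered with DOM and with SAX.
class DomVsSax {

    // DOM: the whole document is built as a tree in memory before we query it.
    public static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: the parser pushes events at us and we keep only the state we choose to keep.
    public static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (tag.equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><product id=\"1\">A</product><product id=\"2\">B</product></products>";
        System.out.println(countWithDom(xml, "product"));
        System.out.println(countWithSax(xml, "product"));
    }
}
```

The SAX version never holds more than a counter, which is where the memory saving comes from; the cost is that any state you do want to keep has to be tracked by hand in the handler.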

But what do you use most commonly? I know there are pros and cons for each approach, as well as some reasons that rule out an entire approach. Please leave comments to let us know which approach you typically take in your development. I think this could be an interesting roundup of technologies.


James Sugrue replied on Wed, 2010/04/28 - 3:04am

To get things going here: when I need to serialize to and read from XML, as I'm running on the Eclipse platform, I use EMF (Eclipse Modeling Framework). It seems to do a really good job of serializing XML. Previously, I had used JAXB and Castor.

It's been a long time since I've needed to use SAX or DOM. I prefer having the object structure match the XML structure rather than dealing with abstractness. But that's just me.



Greg Hall replied on Wed, 2010/04/28 - 3:24am

dom4j all the way - simple!

Robert Csala replied on Wed, 2010/04/28 - 3:27am

Depends. If I need the whole content, I create an appropriate object structure and read it with XStream or something similar. If I need just a few pieces of the XML, I use DOM.

Robert Csala replied on Wed, 2010/04/28 - 3:27am in response to: Robert Csala

.. and XPath
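
For readers who have not combined the two, a minimal sketch of pulling "just a few pieces" out of a document with DOM plus the JDK's `javax.xml.xpath` API might look like this. The class `XPathLookup`, its method names and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse once into a DOM tree, then query it with XPath.
class XPathLookup {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the string value of the first node matched by the expression.
    public static String firstValue(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, parse(xml));
    }

    // Returns how many nodes the expression selects.
    public static int count(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, parse(xml), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><customer>Ann</customer><item sku=\"a1\"/><item sku=\"b2\"/></order>";
        System.out.println(firstValue(xml, "/order/customer"));
        System.out.println(firstValue(xml, "/order/item[2]/@sku"));
        System.out.println(count(xml, "/order/item"));
    }
}
```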

oded peer replied on Wed, 2010/04/28 - 3:32am

I want to try out VTD-XML (http://vtd-xml.sourceforge.net/) but haven't found the time (or project) yet

James Sugrue replied on Wed, 2010/04/28 - 3:38am in response to: oded peer

VTD-XML definitely sounds good, doesn't it? It promises better memory usage. If anyone has given it a go, please share your experience with us.

Barry Fitzgerald replied on Wed, 2010/04/28 - 3:56am

One of the nicest ways to do this now is to use Groovy!

It can quite happily coexist with your Java project, and the XML handling is much better than any of the other options.

Peter Karussell replied on Wed, 2010/04/28 - 4:09am

I am using SAX and DOM, but also Joost, which is a fast and memory-efficient 'XSLT'-style transformer. See also a list of XML processors/serializers, where Simple looks promising. Another nice piece of software is Jaxen.

One approach I implemented myself (similar to this) does not hold the entire XML in memory, but each piece can still be processed via DOM. E.g. you have the XML file:

   <product id="1">  CONTENT1 </product>
   <product id="2">  CONTENT2 </product>
   <product id="3">  CONTENT3 </product>

Then you can parse it product by product via:

ArrayList list = ...
ContentHandler productHandler = new GenericXDOMHandler("/products/product") {

    public void writeDocument(String localName, Element element) throws Exception {
        // use stupid simple DOM here
    }
};

(If someone wants the production-ready source, I could blog about this sometime.)
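
Until then, here is a rough sketch of how such a handler could work. It is not the `GenericXDOMHandler` mentioned above, whose source is not shown here: `ChunkedDomHandler` is a hypothetical class that matches chunks by element name rather than by path, ignores namespaces, and assumes the products sit inside a `<products>` root.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that builds one small DOM tree per chunk
// element and hands it to a callback, so only one chunk is in memory at a time.
class ChunkedDomHandler extends DefaultHandler {
    private final String chunkTag;
    private final Consumer<Element> callback;
    private final DocumentBuilder builder;
    private Document doc;   // non-null only while we are inside a chunk
    private Node current;   // node that receives the next child

    ChunkedDomHandler(String chunkTag, Consumer<Element> callback) throws Exception {
        this.chunkTag = chunkTag;
        this.callback = callback;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (doc == null) {
            if (!chunkTag.equals(qName)) {
                return;                     // outside any chunk: skip
            }
            doc = builder.newDocument();    // start a fresh tree for this chunk
            current = doc;
        }
        Element element = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(element);
        current = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (doc != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (doc == null) {
            return;
        }
        Node parent = current.getParentNode();
        if (parent == doc) {                // the chunk root just closed
            callback.accept((Element) current);
            doc = null;                     // drop the tree so it can be collected
            current = null;
        } else {
            current = parent;
        }
    }

    // Convenience entry point: stream the XML and call back once per chunk.
    public static void parse(String xml, String chunkTag, Consumer<Element> callback) throws Exception {
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new ChunkedDomHandler(chunkTag, callback));
    }
}
```

A call such as `ChunkedDomHandler.parse(xml, "product", element -> System.out.println(element.getAttribute("id")))` then sees each product as an ordinary DOM `Element` while the rest of the file is never materialized.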


Jocelyn LECOMTE replied on Wed, 2010/04/28 - 4:09am

A few months ago, I used JAXB to transform XML to POJOs in a Java SE environment. Everything went well until a few weeks later, after a Java update: JAXB was patched and the parsing of my XML failed. The solution was to pin a specific version of JAXB instead of taking the one bundled with Java.

So, I made a quick tour of existing libraries to see if it was possible to find one matching my simple needs. I found SimpleXML (http://simple.sourceforge.net/) very ... simple, lightweight and easy to add to my project.

Erik Post replied on Wed, 2010/04/28 - 4:50am

In my Java projects? Using Scala. It's totally awesome, in case you hadn't heard.

Flofighter Flof... replied on Wed, 2010/04/28 - 4:54am

How about Xmlbeans (http://xmlbeans.apache.org/) ?

It's another Java XML binding library. It's widely used in my company to create and edit large XML files (up to 100 MB). It's about as fast and efficient as JAXB and allows you to access and modify the full XML model (including comments).



ff aaa replied on Wed, 2010/04/28 - 5:03am

In the past I used many libraries (dom4j, JiBX, XStream, JAXB, XMLBeans). Recently I started using "Simple". I think that it is the nicest and easiest solution for many XML-related problems (but not all). http://simple.sourceforge.net/home.php

Jason Vedder replied on Wed, 2010/04/28 - 5:31am


James Sugrue replied on Wed, 2010/04/28 - 6:05am in response to: Peter Karussell

I've found myself using that type of approach too Peter...It would be great to see a blog/more detail.


James Sugrue replied on Wed, 2010/04/28 - 6:06am

Lots of great suggestions for handling XML  - keep them coming!


Zqudlyba Navis replied on Wed, 2010/04/28 - 6:28am

JAXB if I need XML Schema validation.

XStream if I want something quick, lightweight and don't require XML Schema validation.





Chad Hahn replied on Wed, 2010/04/28 - 7:05am

I don't use it. JSON, Thrift, Protobufs, Kryo, or something else.  As long as it keeps me away from xmlarrhea.

Gil Collins replied on Wed, 2010/04/28 - 7:43am

I use JDOM when it's a small XML file; for larger ones I'll use SAX if I need speed and resources are limited.

Zviki Cohen replied on Wed, 2010/04/28 - 8:05am

I agree that there is no one tool to rule them all: each case requires a different tool. Personally, for fast processing with a minimal memory footprint, I like the good old SAX approach. I'm using this method of defining Java enums as SAX elements. It makes the code more readable. More details here.
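
One way to read the enum idea is sketched below. This is a guess at the pattern, not the code behind the link: the `Tag` enum, the `EnumSaxDemo` class and the sample catalog are all made up for illustration, and element names are assumed to map onto enum constants by upper-casing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: map element names to an enum so the SAX handler can
// switch on constants instead of comparing strings everywhere.
class EnumSaxDemo {

    enum Tag {
        CATALOG, BOOK, TITLE, UNKNOWN;

        // Assumption: element names equal the constant names, ignoring case.
        static Tag of(String qName) {
            try {
                return valueOf(qName.toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                return UNKNOWN;
            }
        }
    }

    // Collects the text of every <title> element in document order.
    public static List<String> titles(String xml) throws Exception {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Tag current = Tag.UNKNOWN;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                current = Tag.of(qName);
                if (current == Tag.TITLE) {
                    text.setLength(0);      // start collecting a new title
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current == Tag.TITLE) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                switch (Tag.of(qName)) {
                    case TITLE:
                        titles.add(text.toString().trim());
                        break;
                    default:
                        break;              // other elements carry nothing we keep
                }
                current = Tag.UNKNOWN;
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book><title>Dune</title></book><book><title>Emma</title></book></catalog>";
        System.out.println(titles(xml));
    }
}
```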

Greg Brown replied on Wed, 2010/04/28 - 8:35am

I generally use StAX (javax.xml.stream.*), though I also use DOM when appropriate.
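
For anyone who has not tried StAX, a minimal pull-parsing sketch with `javax.xml.stream` could look like the following. The `StaxPull` class, the `id` attribute and the sample XML are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: with StAX the application pulls events from the parser,
// so the control flow stays in an ordinary loop instead of in callbacks.
class StaxPull {

    // Returns the "id" attribute of every element with the given name.
    public static List<String> ids(String xml, String tag) throws Exception {
        List<String> ids = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && tag.equals(reader.getLocalName())) {
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><product id=\"1\">A</product><product id=\"2\">B</product></products>";
        System.out.println(ids(xml, "product"));
    }
}
```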

jacklty lam replied on Wed, 2010/04/28 - 9:33am

I used Castor to marshal and unmarshal objects from xml

Ben Courliss replied on Wed, 2010/04/28 - 9:59am

Used to use stax but now I just delegate to Groovy for anything I need to do in XML.

Jim Watson replied on Wed, 2010/04/28 - 10:02am

I use Simple (http://simple.sourceforge.net); it's by far the easiest way to perform object-to-XML serialization, and it's fully bidirectional. All you need to do is annotate your existing POJOs. Also, it has some benchmarks against XStream, which show it to be more than twice as fast.

Kris Scorup replied on Wed, 2010/04/28 - 10:29am

I second XMLBeans. I would rather work with Objects than XPath expressions, so XMLBeans fits my comfort level.

Rhett Whaley replied on Wed, 2010/04/28 - 10:29am

Java and XML? People still do that? It's Groovy or nothing baby!

Jilles Van Gurp replied on Wed, 2010/04/28 - 12:48pm

Mix of xpath, jaxb and string manipulation mostly

The choice is really divided in producing and consuming here. For producing xml, I prefer a template based solution although sadly I am stuck with jaxb currently. There's a reason why there is no serverside dom for html pages in most web servers: it's a waste of time to write the boilerplate pojo code and memory space. The same holds true for most non trivial xml documents in my view. Your db model classes tend to be different than your jaxb classes. So you end up with a lot of stupid conversion: request xml -> jaxb dtos ->db model->sql -> result set -> db model -> jaxb dtos -> response xml. I have  a lot of code and tests that are about babysitting data through this chain instead of actual business logic. 

For consuming xml, I believe in duck typing and jaxb tends to be too anal about little details for my taste. Xpath gives me all I need here and allows me to be much more relaxed about the content I can accept (which in most real life scenarios is a good thing).

DOM and SAX are too low level to be of use. If you want that level of detail, you probably want to be using a pull parser like StAX. SAX and DOM are there for when you run out of other options, which in Java means you don't need to work with them directly.

When the files get big, you might want to consider hybrid approaches. One case where I had files that did not fit in memory (except with multi GB heap), I used a simple String.indexOf() to examine each line in order to consume chunks of xml that I could feed to Jaxb (convenient because the object model was already there). I actually multi threaded the whole thing to queue chunks and process them with 30 threads. BTW. If you ever find yourself writing this kind of code, take a good look at Spring Batch before you continue with your home grown batch processing system.
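
A stripped-down sketch of that line-scanning idea follows. It is not the original code: `LineChunker` is a hypothetical class, it is single-threaded, it hands each chunk to a callback as a plain string (where the original fed JAXB), and it assumes records are not nested, are not self-closing, and that at most one record ends on any given line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical sketch: cut a huge XML file into per-record strings using
// String.indexOf on each line, without ever parsing the file as a whole.
class LineChunker {

    // Finds "<tag" followed by '>', whitespace or end of line, so that
    // "<item" does not accidentally match "<items>".
    private static int findOpen(String line, String open) {
        int start = line.indexOf(open);
        while (start >= 0) {
            int next = start + open.length();
            if (next == line.length() || line.charAt(next) == '>'
                    || Character.isWhitespace(line.charAt(next))) {
                return start;
            }
            start = line.indexOf(open, start + 1);
        }
        return -1;
    }

    public static void chunks(Reader in, String tag, Consumer<String> callback) throws IOException {
        String open = "<" + tag;
        String close = "</" + tag + ">";
        StringBuilder chunk = null;         // non-null while inside a record
        BufferedReader reader = new BufferedReader(in);
        for (String line; (line = reader.readLine()) != null; ) {
            if (chunk == null) {
                int start = findOpen(line, open);
                if (start < 0) {
                    continue;               // not inside a record yet
                }
                chunk = new StringBuilder();
                line = line.substring(start);
            }
            int end = line.indexOf(close);
            if (end < 0) {
                chunk.append(line).append('\n');
            } else {
                chunk.append(line, 0, end + close.length());
                callback.accept(chunk.toString());   // hand off one complete record
                chunk = null;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<items>\n<item id=\"1\">\n<name>a</name>\n</item>\n"
                + "<item id=\"2\"><name>b</name></item>\n</items>";
        chunks(new StringReader(xml), "item", System.out::println);
    }
}
```

Each string passed to the callback is a small, well-formed document on its own, so it can be queued for worker threads or given to whatever binding or parsing library is already in use.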

Alternatively, use a more batch friendly language like python. It does xml too.

Neil Shannon replied on Wed, 2010/04/28 - 2:22pm

dom4j and XMLBeans presently.  Although I've used Castor before.

Peter Karussell replied on Wed, 2010/04/28 - 5:04pm in response to: James Sugrue

Nice to hear this. I will blog about this in the next few days and post the link here.

juan ponce replied on Wed, 2010/04/28 - 5:52pm

- Apache Digester

- XStream

Peter Karussell replied on Thu, 2010/04/29 - 5:36am in response to: James Sugrue

I have now blogged about Memory Efficient XML Processing.

Comments are appreciated!
