Since I am writing this bio myself, I won't write it in the third person. My motivation for posting here is to get feedback on my posts, which are mostly my own summaries about software development. Another reason is to share my experience with the community. Taken together, I would name it "The Philosophy of Programming" or, in other words, "Bullshit Bingo for Coders". Kosta has posted 11 posts at DZone.

Java and XML - Part 2 (JDOM2)

09.24.2013

In my first article, I mentioned some terms and posted links about Java and XML in general. In this article, I want to write something about the DOM parsing method and the differences between the JDOM2 and DOM parser from the Java Standard Edition.

The DOM parser of Java SE, often simply called DOM, is the reference implementation of the W3C DOM specification for Java and was the first implementation of this kind of processing, but it's not the only one, and it's not the best one. That's why other parsers were built around the same parsing method.

The most popular and widely used parsers that represent the DOM as objects, and that are also mentioned on the Oracle Tutorial site but are not part of Java SE, are JDOM and DOM4J.


Why (JDOM) JDOM2 Over DOM4J

When I was searching for other parsers, I found some information about them in different forums. That was several years ago, and I cannot remember which forums they were, otherwise I would have posted the links here. The best information at that time told me that DOM4J had more functionality than JDOM, which is not always necessary.

At that time, I was also collaborating with another development shop that was using the JDOM parser, so maybe that influenced my decision as well. The deciding criterion was that JDOM was accepted by the Java Community Process (JCP) as a Java Specification Request (JSR-102). Those were my reasons for choosing JDOM over DOM4J. That doesn't mean that DOM4J is a worse parser, but JDOM was my choice.


JDOM2 vs. DOM

At the beginning of one of my projects, I was using the DOM parser of the Java core API. After I got more information about the different kinds of parsers, I decided to work with JDOM, and all the old code where the DOM was used needed refactoring. Doing this, I noticed differences in the syntax of the parsers, which was one of the reasons why I rewrote the code. The differences between the parsers are described below.

Note: One of the differences between JDOM and JDOM2 is that JDOM2 uses generics whereas JDOM does not.

Note: One term that I want to mention here, which can be found in many tutorials, is the client class. The Java class in which the XML parser is used is called the client class. The term is also used with other kinds of processing libraries, such as JSON parsers, and with creational patterns such as factory or builder.

To create the source code I used Eclipse 4.3 Kepler/Luna as the IDE, JDK 1.7u40 and JDOM2. JUnit 4.8 was used to create the test classes.


XML Example

The following XML file serves as the sample for the examples below:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:my="http://www.stojanok.name/2013/javaxml">
	<tag>first</tag>
	<tag type="special">second</tag>
	<other>third</other>
	<other type="">fourth</other>
	<!-- XPath -->
	<deep>
		<tag>
			<other>fifth</other>
		</tag>
	</deep>
	<!-- namespace -->
	<my:other>sixth</my:other>
	<my:other my:type="different">seventh</my:other>
	<empty></empty>
</root>

Creating the Document

First of all, we need the document object, which contains all the XML nodes and attributes as objects.

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// namespace !!!
factory.setNamespaceAware(true);
// begin of try - catch block 		
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document document = docBuilder.parse(is);
Element root = document.getDocumentElement();

JDOM2

SAXBuilder saxBuilder = new SAXBuilder();
// begin of try - catch block 		
Document document = saxBuilder.build(is);
Element root = document.getRootElement();

As you can see, the DOM needs one more coding step for this operation. In this example I have used an InputStream, but InputSource objects, Files etc. can be used as well.

The following exceptions have to be caught:

DOM: ParserConfigurationException, SAXException, IOException, XPathExpressionException, TransformerConfigurationException, TransformerException

JDOM: JDOMException, IOException, TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException

Most of the exceptions on the JDOM2 side come from the Transformer, which is used for the XSLT transformation.
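
As a minimal sketch of how the parsing-related exceptions of the DOM variant can be handled in one place, the following parses an XML string through an InputSource instead of an InputStream. The class name DomParseSketch and the method parse are hypothetical names chosen for illustration and are not part of the test project; only JDK classes are used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration, not taken from the article's project.
final class DomParseSketch {

    // Parses the given XML text into a namespace-aware DOM document.
    static Document parse(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // The three checked exceptions of the parsing step, wrapped for brevity.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<root><tag>first</tag></root>");
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

Wrapping the checked exceptions in an unchecked one is only one possible design choice; a client class may just as well declare them in its method signature.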

Searching for XML Nodes and Attributes

Another elegant feature of JDOM compared to the DOM parser is getting children by name. The following example shows the differences in the syntax of the parsers for this case.

DOM

// situation 1 - iterating through child nodes
// searching for the tag-Node
NodeList nodeList = root.getChildNodes();
	for (int i = 0; i < nodeList.getLength(); i++) {
		if (nodeList.item(i).getNodeName().equals("tag")) {
			LOGGER.trace(nodeList.item(i).getNodeName());
		}
	}	
// searching for the other-Node with attribute type - version 1			
	for (int i = 0; i < nodeList.getLength(); i++) {
		if (nodeList.item(i).getNodeName().equals("other")) {
			NamedNodeMap attributes = nodeList.item(i).getAttributes();
			for (int j= 0; j < attributes.getLength(); j++) {
				if ("type".equals(attributes.item(j).getNodeName())) {
					LOGGER.trace("version 1: " + attributes.item(j).getTextContent());
				}
			}
		}
	}
// searching for the other-Node with attribute type - version 2			
	for (int i = 0; i < nodeList.getLength(); i++) {
		if (nodeList.item(i).getNodeName().equals("other")) {
			if (nodeList.item(i).getAttributes().getNamedItem("type") != null) {
				LOGGER.trace("version 2: " + nodeList.item(i).getAttributes().getNamedItem("type").getTextContent());
			}
		}
	}			
			
// any depth level
	NodeList nodeListSomeNode = root.getElementsByTagName("tag");
	for (int i = 0; i < nodeListSomeNode.getLength(); i++) {
		LOGGER.trace("some node: " + nodeListSomeNode.item(i).getNodeName());
	}

JDOM2

// situation 1
// searching for the tag-Node
	for (Element node : root.getChildren()) {
		if (node.getName().equals("tag")) {
			LOGGER.trace(node.getName());
		}
	}
// iterating for the tag-Node			
	for (Element node : root.getChildren("tag")) {
		LOGGER.trace(node.getName());
	}
// searching for the other-Node with attribute type		
	for (Element node : root.getChildren("other")) {
		if (node.getAttribute("type") != null) {
			LOGGER.trace(node.getAttribute("type").getValue());
		}
	}			
// iterating over all nodes			
	for (Content content : root.getDescendants()) {
		LOGGER.trace(content);
	}
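
The DOM API has no direct counterpart to JDOM's getChildren(name), but a small helper can close the gap. The following is a sketch under the assumption that only direct child elements are wanted; DomChildrenSketch, childrenByName and parseRoot are hypothetical names, not part of either library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration, not part of the DOM or JDOM2 API.
final class DomChildrenSketch {

    // Returns the direct child elements of parent whose node name equals name.
    static List<Element> childrenByName(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<root><tag>first</tag><tag>second</tag>"
                + "<deep><tag><other>fifth</other></tag></deep></root>");
        // Direct children only: the tag inside deep is not counted.
        System.out.println(childrenByName(root, "tag").size());
        // Any depth: getElementsByTagName also finds the nested tag.
        System.out.println(root.getElementsByTagName("tag").getLength());
    }
}
```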

Namespace handling

Sometimes XML files use an explicitly defined namespace or several namespaces at once. In that case you need to use the methods that take the namespace as a parameter. If you don't, you will get empty data without any warning, and confusion on the programmer's side is inevitable.
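
The silent empty result can be reproduced with the JDK DOM parser alone. The sketch below counts the my:other elements of a shortened version of the sample document, once with a namespace-aware factory and once without; the class and method names are hypothetical and chosen for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the article's project.
final class DomNamespaceSketch {

    static final String NS = "http://www.stojanok.name/2013/javaxml";
    static final String XML = "<root xmlns:my=\"" + NS + "\">"
            + "<other>third</other>"
            + "<my:other>sixth</my:other>"
            + "<my:other>seventh</my:other>"
            + "</root>";

    // Counts the elements named other in the namespace NS at any depth.
    static int countNamespacedOther(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return document.getDocumentElement()
                    .getElementsByTagNameNS(NS, "other").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // With namespace awareness both my:other elements are found.
        System.out.println(countNamespacedOther(true));
        // Without it the lookup comes back empty and no exception is thrown.
        System.out.println(countNamespacedOther(false));
    }
}
```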

DOM

Note: To use namespaces in DOM you need to set the boolean flag of the DocumentBuilderFactory to true via the setNamespaceAware method.

factory.setNamespaceAware(true);
...
// situation 2
// namespaces
	NodeList nodeListNS = root.getChildNodes();
	// node with namespace - any depth level
	nodeListNS = root.getElementsByTagNameNS(namespaceURI, "other");
	for (int i = 0; i < nodeListNS.getLength(); i++) {
		LOGGER.trace("ElemByTName: " + nodeListNS.item(i).getNodeName());
	}
	// attribute with namespace - any depth level
	nodeListNS = root.getElementsByTagNameNS(namespaceURI, "other");
	for (int i = 0; i < nodeListNS.getLength(); i++) {
		Node node = nodeListNS.item(i).getAttributes().getNamedItemNS(namespaceURI, "type");
		if (node != null) {
			LOGGER.trace(node.getNodeValue());
		}
		LOGGER.trace("ElemByTName Attribute: " + nodeListNS.item(i).getAttributes().getNamedItemNS(namespaceURI, "type"));
	}
	for (int i = 0; i < nodeList.getLength(); i++) {
		if (nodeList.item(i).getNamespaceURI() != null
				&& nodeList
						.item(i)
						.getNamespaceURI()
						.equals(namespaceURI)) {
			String prefix = nodeList.item(i).lookupPrefix(namespaceURI);					
			if (nodeList.item(i).getNodeName().equals(prefix.concat(":").concat("other"))) {
				NamedNodeMap attributes = nodeList.item(i).getAttributes();
				for (int j= 0; j < attributes.getLength(); j++) {
					if (prefix.concat(":").concat("type").equals(attributes.item(j).getNodeName())) {
						LOGGER.trace("end: " + attributes.item(j).getTextContent());
					}
				}
			}
		}
	}

JDOM2

// situation 2
	// namespaces
	Namespace nsMy = root.getNamespace("my"); 
	LOGGER.trace("nsMy: " + nsMy.getPrefix() + " | " + nsMy.getURI());
	LOGGER.trace("other: " + root.getChild("other", nsMy).getTextTrim());
	for (Element node : root.getChildren("other", nsMy)) {
		if (node.getAttribute("type", nsMy) != null) {
			LOGGER.trace(node.getAttribute("type", nsMy).getValue());
		}
	}	

Using XPath

If the XML file has a very complex and deep structure, the best way to get a list of the deepest descendant nodes without creating a chain of getChildren() calls is to use XPath. With XPath you select the nodes by using the XPath syntax. The return value is a list of XML nodes.

DOM

// Xpath
	//Evaluate XPath against Document itself
	XPath xPath = XPathFactory.newInstance().newXPath();
	NodeList nodes = (NodeList) xPath.evaluate("/root/deep/tag/other/text()",
			document.getDocumentElement(), XPathConstants.NODESET);
	for (int i = 0; i < nodes.getLength(); ++i) {
		if (nodes.item(i) instanceof Text) {
			System.out.println(((Text)nodes.item(i)).getTextContent());					
		} else if (nodes.item(i) instanceof Element) {
			Element e = (Element) nodes.item(i);
			// getNodeValue() is always null for elements, so read the text content
			LOGGER.trace(e.getTextContent());
		}
	}
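
The DOM snippet above selects nodes without a prefix. For prefixed nodes such as my:other, the JDK XPath API has to be told which URI a prefix stands for through a NamespaceContext. The following is a sketch with a hand-written context that knows only the my prefix of the sample file; DomXPathNamespaceSketch and select are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the article's project.
final class DomXPathNamespaceSketch {

    static final String NS = "http://www.stojanok.name/2013/javaxml";

    // Evaluates the expression and returns the text content of every hit.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Minimal context: maps only the prefix "my" to the sample namespace.
            xPath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "my".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "my" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("my").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xPath.evaluate(expression, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:my=\"" + NS + "\"><other>third</other>"
                + "<my:other>sixth</my:other><my:other>seventh</my:other></root>";
        System.out.println(select(xml, "/root/my:other/text()"));
    }
}
```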

JDOM2

XPathFactory xpathFactory = XPathFactory.instance();
String titelTextPath = "root/deep/tag/other/text()";
XPathExpression<Object> expr = xpathFactory.compile(titelTextPath);
List<Object> xPathSearchedNodes = expr.evaluate(document);
for (int i = 0; i < xPathSearchedNodes.size(); i++) {
	Content content = (Content) xPathSearchedNodes.get(i);
	LOGGER.trace(content.getValue());
}

Note: If you want to use XPath with JDOM2 you need to use the jaxen library as well.

XSLT

Sometimes we need to restructure or filter given XML files. Besides using the DOM/JDOM2 API to delete, create and modify nodes, another technique is XSLT. XSLT is a different kind of processing that I will not discuss here, but I want to show how XSL files can be applied via the API to the document objects created by the parsers. The XSLT file is provided as a second InputStream.

DOM

Transformer transformer2 = TransformerFactory.newInstance()
					.newTransformer(new StreamSource(inputStream2));
	Source source2 = new DOMSource(document);
	Result result2 = new StreamResult(new FileOutputStream(new File(("d:\\test_dom.xml"))));
			transformer2.transform(source2, result2);

JDOM2

Transformer transformer = TransformerFactory.newInstance()
					.newTransformer(new StreamSource(inputStream2));
JDOMSource jdomSource = new JDOMSource(document);
JDOMResult jdomResult = new JDOMResult();
transformer.transform(jdomSource, jdomResult);
XMLOutputter xmlOutput = new XMLOutputter();
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(jdomResult.getResult(), new FileOutputStream(new File("d:\\test_jdom.xml")));
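
Both snippets above assume that a stylesheet stream already exists. To make the DOM variant reproducible without files, here is a self-contained sketch that uses only the JDK. The stylesheet is an assumption made up for this illustration (it prints the text of every /root/tag element followed by a semicolon), and DomXsltSketch and transform are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the article's project.
final class DomXsltSketch {

    // Illustrative stylesheet with text output, invented for this sketch.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"root/tag\">"
            + "<xsl:value-of select=\".\"/><xsl:text>;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Parses xml into a DOM document, applies xslt and returns the result as text.
    static String transform(String xml, String xslt) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><tag>first</tag><tag type=\"special\">second</tag>"
                + "<other>third</other></root>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```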

Output the document

At the end of the processing, we need to pass our new document on for further processing or persist it outside the client class. The following code shows how this can be done:

DOM

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
//transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
DOMSource source = new DOMSource(document);
Result result = new StreamResult(new FileOutputStream(new File(("d:\\test_dom.xml"))));
// output to the System.out
// Result result =  new StreamResult(System.out);
transformer.transform(source, result);

JDOM2

XMLOutputter xmlOutput = new XMLOutputter();
// output to the System.out
//xmlOutput.output(document, System.out);
xmlOutput.output(document, new FileOutputStream(new File("d:\\test_jdom.xml")));
// other possibility
String output = xmlOutput.outputString(document); 
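
JDOM2's outputString has no one-line equivalent in the DOM API, but the same Transformer can write into a StringWriter. The following is a sketch of that idea with hypothetical names (DomToStringSketch, toXmlString); it omits the XML declaration so that only the serialized nodes are returned.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration, not taken from the article's project.
final class DomToStringSketch {

    // Serializes the document into a String without the XML declaration.
    static String toXmlString(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Also covers TransformerConfigurationException, which is a subclass.
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Parses XML text, only needed here to obtain a document for the demo.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXmlString(parse("<root><tag>first</tag></root>")));
    }
}
```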

Maven dependencies

If you use Maven, here are the dependencies that I used for the test project:

	<dependencies>
		<dependency>
			<groupId>org.jdom</groupId>
			<artifactId>jdom2</artifactId>
			<version>2.0.5</version>
		</dependency>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.11</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-api</artifactId>
			<version>1.7.5</version>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>1.7.5</version>
		</dependency>
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>1.2.17</version>
		</dependency>
		<dependency>
			<groupId>jaxen</groupId>
			<artifactId>jaxen</artifactId>
			<version>1.1.4</version>
		</dependency>
	</dependencies>

Résumé

While writing this article, creating the code for it and running the tests, I also learned new things about the parsers:

  • JDOM2 has a shorter syntax.
  • DOM and JDOM use the same parsing method.
  • JDOM2 can build a document from a StAX-based XMLStreamReader.
  • Iterating through the nodes can also be implemented recursively with both libraries.
  • The two libraries handle namespaces with different techniques, as shown above.
  • To use namespaces in DOM you need to set the setNamespaceAware flag to true.
  • To use XPath in JDOM2 you need the jaxen library.
  • Both parsers use the same libraries for XSLT.
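
The recursive iteration mentioned in the list can be sketched for the DOM side as follows. DomRecursionSketch and collectElementNames are hypothetical names, and the traversal collects element names in document order purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the article's project.
final class DomRecursionSketch {

    // Visits node and all of its descendants depth-first, recording element names.
    static void collectElementNames(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectElementNames(child, names);
        }
    }

    // Parses XML text and returns the element names in document order.
    static List<String> elementNames(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collectElementNames(document, names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
                "<root><tag>first</tag><deep><tag><other>fifth</other></tag></deep></root>"));
    }
}
```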

I also searched for performance comparisons but did not find a good one. My guess is that both parsers are about equally fast.

As you can see, the JDOM2 syntax is shorter for the same functionality. That's one reason to prefer it over the DOM in new projects. If you have projects that already use DOM, there is no need to refactor them just because of the syntax. Last but not least, you should always keep the slogan in mind: never change a running system. Or maybe wiser: never run a changing system.

GitHub

The sources for this example can be checked out from: https://github.com/kstojanovski/java-and-xml. Use the java-and-xml folder and start the test method of the TestJavaXml.java file from the test folder.

Published at DZone with permission of its author, Kosta Stojanovski.
