Programmer, solution architect, user group and conference organizer, conference speaker and traveling fun code evangelist. Johannes tries to apply Agile principles to large software projects, but what he's really passionate about is sharing the experience of more fun programming with other coders around the world. Johannes is a DZone MVB and is not an employee of DZone and has posted 31 posts at DZone.

Announcing EAXY: Making XML Easier in Java

11.08.2013

XML libraries in Java are a minefield. The amount of code required to manipulate and read XML is staggering, the risk of getting class path problems with different libraries is substantial and the handling of namespaces invites a lot of confusion and errors. The worst thing is that the situation doesn’t seem to improve.

A colleague made me aware of the JOOX library some time back. It’s a very good attempt to patch these problems. I found a few shortcomings with JOOX that made me want to explore alternatives and naturally I ended up writing my own library (as you do). I want the library to allow for Easy manipulation of XML, and in an episode of insufficient judgement, I named the library EAXY. It’s a really bad name, so I appreciate suggestions for improvement.

Here is what I set out to solve:

  • It should be easy to create fairly complex XML trees with Java code
  • It should be straightforward and fool-proof to use namespaces. (This is where JOOX failed me)
  • It should be easy to read values out of the XML structure.
  • It should be easy to work with existing XML documents on the file system or classpath
  • The library should prefer throwing an exception over silently failing.
  • As a bonus, I wanted to make it even easier to deal with (X)HTML, by adding convenience functions for this.

1. creating an xml document

An XML document is just a tree. How about aligning that tree with the Java syntax tree? For example, let’s say you wanted to programmatically construct some feedback on this article:

Element email = Xml.el("message",
    Xml.el("recipients",
        Xml.el("recipent",
            Xml.attr("type", "email"),
            Xml.attr("role", "to"),
            Xml.text("mailto:johannes@brodwall.com")),
        Xml.el("recipent",
            Xml.attr("type", "email"),
            Xml.attr("role", "cc"),
            Xml.text("mailto:contact@brodwall.com"))),
    Xml.el("subject", "EAXY feedback"),
    Xml.el("content", "I think this is an interesting library"));

Each element (Xml.el) has a tag name and can nest other elements, attributes (Xml.attr) or text (Xml.text). If the element contains only text, we don’t even need to call Xml.text. The syntax is designed so that with a static import of Xml.* you can write code like this:

Element email = el("message",
    el("recipients",
        el("recipent",
            attr("type", "email"),
            attr("role", "to"),
            text("mailto:johannes@brodwall.com")),
        el("recipent",
            attr("type", "email"),
            attr("role", "cc"),
            text("mailto:contact@brodwall.com"))),
    el("subject", "EAXY feedback"),
    el("content", "I think this is an interesting library"));
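For comparison, here is a rough sketch of the same message built with the DOM API that ships with the JDK (javax.xml.parsers and org.w3c.dom) rather than with EAXY. The class and method names (DomMessageExample, buildMessage, recipient) are made up for this illustration, and the element names are copied verbatim from the example above. It is only meant to show how much ceremony the nested el(...) calls replace.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical comparison class; not part of EAXY.
class DomMessageExample {

    /** Builds the message tree using only JDK DOM calls. */
    static Document buildMessage() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element message = doc.createElement("message");
        doc.appendChild(message);

        Element recipients = doc.createElement("recipients");
        message.appendChild(recipients);
        recipients.appendChild(recipient(doc, "to", "mailto:johannes@brodwall.com"));
        recipients.appendChild(recipient(doc, "cc", "mailto:contact@brodwall.com"));

        Element subject = doc.createElement("subject");
        subject.setTextContent("EAXY feedback");
        message.appendChild(subject);

        Element content = doc.createElement("content");
        content.setTextContent("I think this is an interesting library");
        message.appendChild(content);
        return doc;
    }

    /** Creates one recipient element (name spelled as in the example above). */
    static Element recipient(Document doc, String role, String address) {
        Element el = doc.createElement("recipent");
        el.setAttribute("type", "email");
        el.setAttribute("role", role);
        el.setTextContent(address);
        return el;
    }

    public static void main(String[] args) {
        Document doc = buildMessage();
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Every element needs a create call, a separate append call and a reference to the owning document, which is the overhead the nested builder syntax avoids.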

2. reading xml

Reading XML with Java code can be a challenge. The DOM API makes it extremely wordy to do anything at all. You can use XPath, but it can be a bit too much on the compact side, and when you do something wrong, the result is simply that you get an empty collection or a null value back. I think we can improve on this.
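To make the silent-failure problem concrete, here is a small sketch using the JDK's javax.xml.xpath API. The sample document and the class name SilentXPathExample are invented for this illustration. A misspelled step in the path does not raise an error; the query just matches nothing.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK, not EAXY.
class SilentXPathExample {

    // Made-up sample document for the illustration.
    static final String XML =
        "<library><book>Effective Java</book><book>XML in a Nutshell</book></library>";

    /** Returns how many nodes the XPath expression matches in the sample document. */
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)),
                          XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches("/library/book"));   // correct path
        System.out.println(countMatches("/library/books"));  // misspelled: no error, no matches
    }
}
```

The first call finds both book elements; the second returns an empty node list without any hint that the path was wrong.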

Consider the following:

System.out.println(email.find("recipients", "recipient").texts());

I step down the XML tree structure and get all the recipient email addresses of the previous message. But wait – running this code prints an empty list. EAXY allows us to avoid scratching our heads over this:

System.out.println(email.find("recipients", "recipient").check().texts());

Now I get the following exception:

org.eaxy.NonMatchingPathException: Can't find {recipient} below [message, recipients].
Actual elements: [Element{recipent}, Element{recipent}]

As you can see, we misspelled “recipent” in the message. Let’s get back to this problem later, but for now, let’s work around it to create something meaningful:

for (Element recipient : email.find("recipients", "recipent")) {
    if ("to".equals(recipient.attr("role"))) {
        System.out.println(recipient.text());
    }
}

Again, I think this is about as fluent as Java’s syntax allows.
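The fail-fast idea behind check() can be approximated on top of the plain JDK DOM API. The following is a minimal sketch under that assumption, not EAXY's actual implementation: the class CheckedFindExample and the helper findOrFail are hypothetical names, and the error message only imitates the one shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of a path lookup that refuses to fail silently.
class CheckedFindExample {

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the direct child elements of parent, skipping text and comments. */
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) result.add((Element) n);
        }
        return result;
    }

    /** Walks the path one level at a time; throws as soon as a step matches nothing. */
    static List<Element> findOrFail(Element root, String... path) {
        List<Element> current = Collections.singletonList(root);
        List<String> walked = new ArrayList<>();
        walked.add(root.getTagName());
        for (String name : path) {
            List<Element> next = new ArrayList<>();
            List<String> actual = new ArrayList<>();
            for (Element parent : current) {
                for (Element child : childElements(parent)) {
                    actual.add(child.getTagName());
                    if (child.getTagName().equals(name)) next.add(child);
                }
            }
            if (next.isEmpty()) {
                throw new IllegalArgumentException("Can't find {" + name + "} below "
                    + walked + ". Actual elements: " + actual);
            }
            walked.add(name);
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Element message = parse("<message><recipients>"
            + "<recipent role='to'>a</recipent><recipent role='cc'>b</recipent>"
            + "</recipients></message>");
        System.out.println(findOrFail(message, "recipients", "recipent").size());
        try {
            findOrFail(message, "recipients", "recipient");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Listing the element names that were actually present is what turns a puzzling empty result into an immediately visible spelling mistake.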

3. validation and namespaces

So, we had a message where one of the element names was misspelled. If you have an XSD document for the XML you’re using, you can validate the document against it. However, as you may have come to expect from Java XML libraries, the act of performing this validation is well hidden behind complex APIs. So I’ve provided a little help:

Xml.validatorFromResource("mailmessage.xsd").validate(email);

This reads the mailmessage.xsd from the classpath, which is the most common use case for me.
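For reference, this is roughly what schema validation looks like with the JDK's own javax.xml.validation API. It is a sketch under simplifying assumptions: the schema is a tiny made-up one inlined as a string (instead of mailmessage.xsd on the classpath), and the names XsdValidationExample, loadSchema and validationError are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; uses only the JDK, not EAXY.
class XsdValidationExample {

    // Made-up schema: a <message> containing exactly one <subject>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='message'><xs:complexType><xs:sequence>"
        + "<xs:element name='subject' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is invalid", e);
        }
    }

    /** Returns null when the document is valid, otherwise the validator's error message. */
    static String validationError(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validationError("<message><subject>EAXY feedback</subject></message>"));
        System.out.println(validationError("<message><subjekt>EAXY feedback</subjekt></message>"));
    }
}
```

With a schema in place, a misspelled element name is reported by the validator instead of surfacing later as an empty query result.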

Of course, most schemas don’t refer to elements in the empty namespace. When using validation, it’s common that we have to construct elements in a specific namespace. In most Java libraries for dealing with XML, this is hard to do and easy to get wrong, especially when namespaces are mixed. I’ve made namespaces into a primary feature of the Eaxy library:

Namespace MSG_NS = new Namespace("http://eaxy.org/test/mailmessage", "msg");
Element email = MSG_NS.el("message",
    MSG_NS.el("recipients",
        MSG_NS.el("recipient",
            MSG_NS.attr("type", "email"),
            attr("role", "cc"),
            text("mailto:contact@brodwall.com"))));

Notice that the “type” attribute belongs to the msg namespace while the “role” attribute has no namespace, a scenario that is especially hard to handle with other libraries.
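As a point of comparison, here is a sketch of how the same mixed-namespace recipient element could be built with the JDK's DOM API, where the namespace URI and the prefixed name must be repeated on every call. The class name NamespaceDomExample and its methods are hypothetical; the namespace URI is the one from the example above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical comparison class; uses only the JDK, not EAXY.
class NamespaceDomExample {

    static final String MSG_NS = "http://eaxy.org/test/mailmessage";

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a recipient element whose two attributes live in different namespaces. */
    static Element buildRecipient() {
        Document doc = newDocument();
        // Element and "type" attribute are in the msg namespace.
        Element recipient = doc.createElementNS(MSG_NS, "msg:recipient");
        recipient.setAttributeNS(MSG_NS, "msg:type", "email");
        // "role" is deliberately left without a namespace.
        recipient.setAttributeNS(null, "role", "cc");
        recipient.setTextContent("mailto:contact@brodwall.com");
        doc.appendChild(recipient);
        return recipient;
    }

    public static void main(String[] args) {
        Element recipient = buildRecipient();
        System.out.println(recipient.getAttributeNS(MSG_NS, "type"));
        System.out.println(recipient.getAttributeNS(null, "role"));
    }
}
```

Forgetting the prefix in one call, or calling setAttribute where setAttributeNS was intended, produces a document that looks right when printed but fails namespace-aware lookups.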

4. templating

Reading the XSD from the classpath inspired another usage: what if we have an XML document as a template in the classpath and then use Java code to manipulate this document? This would be especially handy for XHTML:

Document doc = Xml.readResource("testdocument.html");
Element peopleElement = doc.select("#peopleForm");
 
peopleElement.add(el("input",
    attr("type", "text"),
    attr("name", "firstName"),
    attr("value", "Johannes")));
peopleElement.add(el("input",
    attr("type", "text"),
    attr("name", "lastName"),
    attr("value", "Brodwall")));

This code reads the file testdocument.html from the classpath, selects the element with id “peopleForm” and adds two input elements to it.
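The same templating idea can be sketched with the JDK alone, which also shows what the select("#peopleForm") call saves. In this illustration the template is a made-up XHTML string rather than a classpath resource, the id lookup is done with an XPath expression, and the names DomTemplateExample, fillTemplate and input are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical comparison class; uses only the JDK, not EAXY.
class DomTemplateExample {

    // Made-up template standing in for testdocument.html.
    static final String TEMPLATE = "<html><body><form id='peopleForm'/></body></html>";

    /** Parses the template, finds the form by id and appends two text inputs. */
    static Element fillTemplate() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(TEMPLATE)));
            Element form = (Element) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@id='peopleForm']", doc, XPathConstants.NODE);
            if (form == null) {
                throw new IllegalStateException("No element with id peopleForm");
            }
            form.appendChild(input(doc, "firstName", "Johannes"));
            form.appendChild(input(doc, "lastName", "Brodwall"));
            return form;
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element input(Document doc, String name, String value) {
        Element el = doc.createElement("input");
        el.setAttribute("type", "text");
        el.setAttribute("name", name);
        el.setAttribute("value", value);
        return el;
    }

    public static void main(String[] args) {
        System.out.println(fillTemplate().getElementsByTagName("input").getLength());
    }
}
```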

5. html convenience

In the code above, we set the type, name and value attributes of HTML input elements. These are among the most frequently used attributes in HTML manipulation. To make this easier, I’ve added some convenience methods to Eaxy:

peopleElement.add(el("input")
    .type("text").name("firstName").val("Johannes"));
peopleElement.add(el("input")
    .type("text").name("lastName").val("Brodwall"));

A final case I wanted to optimize for is dealing with forms in HTML. Here’s some code that manipulates a form before it is sent to the user.

HtmlForm form = new HtmlForm(peopleElement);
form.set("firstName", "Johannes");
form.set("lastName", "Brodwall");
 
doc.writeTo(req.getWriter());

Here, I set the form contents directly. The code will throw an exception if a parameter name is misspelled, so it’s easy to ensure that you use it correctly.
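The "throw on a misspelled parameter" behaviour is easy to sketch with the JDK's DOM API. What follows is an illustration of the idea only, not how HtmlForm is implemented: the class FormFieldExample, the helper setField and the sample form markup are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of strict form filling; uses only the JDK, not EAXY.
class FormFieldExample {

    // Made-up form markup for the illustration.
    static final String FORM_HTML =
        "<form id='peopleForm'>"
        + "<input type='text' name='firstName'/>"
        + "<input type='text' name='lastName'/>"
        + "</form>";

    static Element parseForm() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(FORM_HTML))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets the value of the named input, or throws if the form has no such input. */
    static void setField(Element form, String name, String value) {
        NodeList inputs = form.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            if (name.equals(input.getAttribute("name"))) {
                input.setAttribute("value", value);
                return;
            }
        }
        throw new IllegalArgumentException("No input named '" + name + "' in form");
    }

    public static void main(String[] args) {
        Element form = parseForm();
        setField(form, "firstName", "Johannes");
        try {
            setField(form, "fristName", "Johannes"); // misspelled on purpose
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly here means a renamed or mistyped field shows up in the first test run instead of as a blank input in the rendered page.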

conclusion

I have shown five examples of how Eaxy makes easy what is hard to do with most XML libraries for Java: creating a document tree with pure Java code, reading and manipulating individual parts of the XML tree, using namespaces and validation, templating, and manipulating (X)HTML documents and forms.

The library is not yet stable, but an unstable XML library may not be a very risky proposition, as most errors will be easy to detect long before production.

I hope that you may find it useful to try and use this library in your code to deal with XML and (X)HTML manipulation. I’m hoping for some users who can help me iron out the bugs and make Eaxy even easier to use.

Oh, and do let me know if you come up with a better name.

Published at DZone with permission of Johannes Brodwall, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Johan Sjöberg replied on Fri, 2013/11/08 - 7:17am

I thought XML in Java had been dead since '04, leaving us scarred with nothing but a horrible experience. I'm very excited to see a fresh take on a problem which on other platforms has long been solved. So +1 to your initiative, which looks promising by the way.


My personal concerns with an XML library are twofold: it needs to be performant, and it should be usable even with big chunks of XML. Partial DOM parsers address the second issue, should that route be taken. I wouldn't mind if simplicity were focused on reading/parsing XML - there are plenty of object mappers to produce XML already.

Johannes Brodwall replied on Sat, 2013/11/09 - 5:47am in response to: Johan Sjöberg

Thank you for the positive feedback, Johan.

When it comes to performance, I've considered implementing an event-based subtree parser, but I haven't found a syntax I'm happy with yet. The code base includes tests with up to one million elements and a document of more than 1 MB with no problems.

I'd like to take up the challenge of building a partial parser: Do you have a suggestion for what the syntax could look like?

Mappers like JAXB do indeed produce XML, but they force you to generate a separate set of classes and write code to map your internal objects to these generated objects. In other words: It's a trap!

I would challenge you to take a look at generating XML with a library like my own or JOOX just for comparison.

Johan Sjöberg replied on Sat, 2013/11/09 - 9:25am in response to: Johannes Brodwall

I'm fine with the separate set of classes to produce XML - it makes life a bit easier when creating, e.g., Spring REST routines where you are in control of the output. The alternative would be along the lines of decorating each domain object with a toXML() routine.

Consuming on the other hand is a whole separate issue. 

I suppose partial DOM trees can be built by something like an xpath separator. 

for(PartialTree t : Parser.on("/product").parsePartial(input)) {
    ...
}
When it comes to working with an API, I think lxml (http://lxml.de/index.html#documentation) is quite decent to use.

Lastly, when it comes to event-based parsing, it becomes trickier. SAX and StAX are both virtually unusable for complex XML - and that level of performance is usually not needed. I have no idea what a manageable syntax would look like for such a parser, perhaps something dreamy like

on(event).collectChildElements("foo", "bar", "baz").skipNulls().withMapper(m);

Still, I think a partial DOM parser would suffice in most cases, in preference to a more cumbersome event-based approach. If event-based parsing is considered, I think the trick is to keep a wider state than just the current element (e.g., tracking depth, parent, etc.)

Amit Mujawar replied on Mon, 2013/12/02 - 12:31am

 Has anyone tried Groovy? Please take a shot at http://groovy.codehaus.org/Processing+XML

Do we really need anything else???

Johannes Brodwall replied on Mon, 2013/12/02 - 5:08pm in response to: Amit Mujawar

I'll leave language wars to zealots. :-)

Johannes Brodwall replied on Mon, 2013/12/02 - 5:17pm in response to: Johan Sjöberg

The partial tree syntax looks like something I could do. It's always a bit challenging to build an Iterable that doesn't keep the whole state in memory, but I think it's doable.

I've found that sacrificing the context (parent, depth) makes the usage and implementation so much faster and less bug-prone that I'd like to see how far that experiment can take me.

Let me get back to you if I get around to implementing the partial parser.
