StAXON - JSON via StAX

02.08.2012

XML is for dinosaurs, right? Everybody uses JSON these days. So you do, don’t you? But what about things like XSD, XSLT, JAXB, XPath, etc – is it all evil?

In this article, I’d like to introduce the StAXON project (APL2), which tries to give you the best of both worlds: JSON outside, but XML inside. One benefit of this is that you can integrate JSON with powerful XML-related technologies for free.

StAXON lets you read and write JSON using the Java Streaming API for XML (javax.xml.stream), also known as StAX. More specifically, StAXON provides implementations of the

  • StAX Cursor API (XMLStreamReader and XMLStreamWriter)
  • StAX Event API (XMLEventReader and XMLEventWriter)
  • StAX Factory API (XMLInputFactory and XMLOutputFactory)

for JSON.

You may know the Jettison project, which also has XMLStreamReader and XMLStreamWriter implementations. However, StAXON aims to provide a more comprehensive and consistent solution and tries to avoid some of the issues users are having with Jettison.

Anyway, let’s get started and see what this “anti-aging substance” for XML can do.

Setup

Add the following dependency to your Maven POM file:

<dependency>
    <groupId>de.odysseus.staxon</groupId>
    <artifactId>staxon</artifactId>
    <version>1.0</version>
</dependency>

or get the latest StAXON JAR from the Downloads page and add it to your classpath.

Mapping Convention

The purpose of StAXON’s mapping convention is to generate more compact JSON. It borrows the "$" syntax for text elements from the Badgerfish convention but attempts to avoid needless text-only JSON objects:

  • Element names become object properties:
    <alice/> <–> {"alice":null}
  • Attributes go in properties whose names begin with "@":
    <alice charlie="david"/> <–> {"alice":{"@charlie":"david"}}
  • Text-only elements go to a simple key/value property:
    <alice>bob</alice> <–> {"alice":"bob"}
  • Otherwise, text content is mapped to the "$" property:
    <alice charlie="david">bob</alice> <–> {"alice":{"@charlie":"david","$":"bob"}}
  • Nested elements go to nested properties:
    <alice><bob>charlie</bob></alice> <–> {"alice":{"bob":"charlie"}}
  • A default namespace declaration goes in the element’s "@xmlns" property:
    <alice xmlns="http://foo.com"/> <–> {"alice":{"@xmlns":"http://foo.com"}}
  • A prefixed namespace declaration goes in the element’s "@xmlns:<prefix>" property:
    <p:alice xmlns:p="http://foo.com"/> <–> {"p:alice":{"@xmlns:p":"http://foo.com"}}

Note that the “ugly” '@'-properties only appear for XML elements that carry attributes (including namespace declarations), and the '$'-property only appears when such an element also has text.
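To make the rules above concrete, here is a minimal sketch that applies them to a DOM tree using JDK classes only. It is not StAXON's implementation: the class MappingConventionSketch and its methods are hypothetical names, arrays and mixed content are ignored, and the parser is assumed to be non-namespace-aware (the JDK default), so namespace declarations show up as ordinary attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of the mapping convention; not part of StAXON.
final class MappingConventionSketch {

    /** Parses an XML string and renders it as JSON following the rules listed above. */
    static String toJson(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return "{" + quote(root.getNodeName()) + ":" + value(root) + "}";
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML: " + xml, e);
        }
    }

    private static String value(Element element) {
        StringBuilder members = new StringBuilder();
        StringBuilder text = new StringBuilder();

        // Attributes (and namespace declarations) become "@name" properties.
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            append(members, quote("@" + attribute.getNodeName()) + ":" + quote(attribute.getNodeValue()));
        }

        // Child elements become nested properties; text is collected separately.
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElements = true;
                append(members, quote(child.getNodeName()) + ":" + value((Element) child));
            } else if (child.getNodeType() == Node.TEXT_NODE || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }

        if (members.length() == 0) {
            // No attributes, no children: null for empty elements, a plain string for text-only ones.
            return text.length() == 0 ? "null" : quote(text.toString());
        }
        if (!hasChildElements && text.length() > 0) {
            // Text next to attributes goes to the "$" property.
            append(members, quote("$") + ":" + quote(text.toString()));
        }
        return "{" + members + "}";
    }

    private static void append(StringBuilder members, String member) {
        if (members.length() > 0) {
            members.append(',');
        }
        members.append(member);
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Calling MappingConventionSketch.toJson with the left-hand sides of the examples above should reproduce the right-hand sides, which is a handy way to check your reading of the convention.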

Core API

As StAXON is merely a StAX implementation, there’s just a thin API layer to deal with configuration that you have to care about. Everything else is pure StAX.

  • JsonXMLInputFactory extends XMLInputFactory and is used to create JSON stream/event readers
  • JsonXMLOutputFactory extends XMLOutputFactory and is used to create JSON stream/event writers
  • JsonXMLConfig provides a shared configuration interface for JsonXMLInputFactory and JsonXMLOutputFactory
  • JsonXMLConfigBuilder provides a fluent API to build JsonXMLConfig configuration instances

If you know StAX, you’ll notice that there’s little new: just obtain a reader or writer from StAXON and you’re ready to go.

Writing JSON

Create a JSON-based writer:

XMLOutputFactory factory = new JsonXMLOutputFactory();
XMLStreamWriter writer = factory.createXMLStreamWriter(System.out);

Write your document:

writer.writeStartDocument();
writer.writeStartElement("customer");
writer.writeStartElement("name");
writer.writeCharacters("John Doe");
writer.writeEndElement();
writer.writeStartElement("phone");
writer.writeCharacters("555-1111");
writer.writeEndElement();
writer.writeEndElement();
writer.writeEndDocument();
writer.close();

With an XML-based writer, this would have produced something like

<customer><name>John Doe</name><phone>555-1111</phone></customer>

However, with our JSON-based writer, the output is

{"customer":{"name":"John Doe","phone":"555-1111"}}
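Since the writing code only depends on the StAX interfaces, it can be kept independent of the output format. The following sketch wraps the calls from above in a method that takes any XMLOutputFactory; the class and method names (CustomerWriterSketch, writeCustomer) are made up for illustration. It compiles against the JDK alone, so passing the JDK's default factory yields XML, while passing a JsonXMLOutputFactory should yield the JSON shown above.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: the same StAX calls, independent of the concrete factory.
final class CustomerWriterSketch {

    /** Writes the sample customer with whatever writer the given factory creates. */
    static String writeCustomer(XMLOutputFactory factory) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("customer");
            writer.writeStartElement("name");
            writer.writeCharacters("John Doe");
            writer.writeEndElement();
            writer.writeStartElement("phone");
            writer.writeCharacters("555-1111");
            writer.writeEndElement();
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write customer", e);
        }
        return out.toString();
    }
}
```

With XMLOutputFactory.newInstance() the result is the XML document (preceded by an XML declaration); swapping the factory is the only change needed to switch formats.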

Reading JSON

Create a JSON-based reader:

String json = "{\"customer\":{\"name\":\"John Doe\",\"phone\":\"555-1111\"}}";

XMLInputFactory factory = new JsonXMLInputFactory();
XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(json));

Read your document:

assert reader.getEventType() == XMLStreamConstants.START_DOCUMENT;
reader.nextTag(); 
assert reader.isStartElement() && "customer".equals(reader.getLocalName());
reader.next();
assert reader.isStartElement() && "name".equals(reader.getLocalName());
reader.next();
assert reader.hasText() && "John Doe".equals(reader.getText());
reader.nextTag();
assert reader.isEndElement();
reader.next();
assert reader.isStartElement() && "phone".equals(reader.getLocalName());
reader.next();
assert reader.hasText() && "555-1111".equals(reader.getText());
reader.nextTag();
assert reader.isEndElement();
reader.next();
assert reader.isEndElement();
reader.next();
assert reader.getEventType() == XMLStreamConstants.END_DOCUMENT;
reader.close();
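In real code you would rather loop over the events than step through them one by one. Here is a small sketch of such a loop; EventDumpSketch and its methods are hypothetical names, and the describeXml helper uses the JDK's XML reader merely as a stand-in so the example runs without StAXON. The describe method accepts any XMLStreamReader, so a reader obtained from JsonXMLInputFactory could be passed in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that records the element and text events of any StAX reader.
final class EventDumpSketch {

    static List<String> describe(XMLStreamReader reader) {
        List<String> events = new ArrayList<>();
        try {
            while (true) {
                int type = reader.getEventType();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    events.add("start:" + reader.getLocalName());
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    events.add("end:" + reader.getLocalName());
                } else if (type == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    events.add("text:" + reader.getText());
                }
                if (!reader.hasNext()) {
                    break; // END_DOCUMENT reached
                }
                reader.next();
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read events", e);
        }
        return events;
    }

    /** Stand-in using the JDK's XML reader; with StAXON, pass a JSON-backed reader to describe(). */
    static List<String> describeXml(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Coalescing keeps adjacent text in a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            return describe(factory.createXMLStreamReader(new StringReader(xml)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not create reader", e);
        }
    }
}
```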

Factory Configuration

The JsonXMLInputFactory and JsonXMLOutputFactory classes can be configured via the standard setProperty(String, Object) API. The factory classes define several constants for properties they support.

However, the JsonXMLConfig interface provides a convenient way to hold the configuration of both the input and output factories:

JsonXMLConfig config = new JsonXMLConfigBuilder().
    virtualRoot("customer").
    prettyPrint(true).
    build();
XMLInputFactory inputFactory = new JsonXMLInputFactory(config);
...
XMLOutputFactory outputFactory = new JsonXMLOutputFactory(config);
...

Virtual Roots

Set the virtualRoot configuration property to strip the root element from the JSON representation, e.g.

{
  "name" : "John Doe",
  "phone" : "555-1111"
}

As XML requires a single root element, but JSON documents often don’t have one, this is an important feature required to read and write existing JSON formats.

Mastering Arrays

What about JSON arrays? Unfortunately, there’s nothing like this in XML. And to be honest, this causes most of the trouble when writing JSON via an XML API like StAX. Simply omitting the array boundaries would lead to non-unique JSON properties, which is usually not desired.

StAXON provides several ways to deal with JSON arrays. At the core is the idea to leverage XML processing instructions to tell the writer to start an array: the <?xml-multiple?> processing instruction maps a sequence of XML elements with the same name to a JSON array.

The processing instruction optionally takes the array element tag name (with prefix) as data. There’s no end array hint as StAXON detects the end of an array sequence and closes it automatically.

Consider the following JSON document:

{
  "alice" : {
    "bob" : [ "edgar", "charlie" ],
    "peter" : null
  }
}

In order to get a "bob" array instead of two separate "bob" properties, we need to provide XML events corresponding to

<alice>
  <?xml-multiple?>
  <bob>edgar</bob>
  <bob>charlie</bob>
  <peter/>
</alice>

I.e., with the cursor API, you would just insert

writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET); // <?xml-multiple?>

to start an array.
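Put together, the writer calls for the alice document could look like the sketch below. It uses only JDK classes, so the processing instruction target is spelled out as the literal "xml-multiple"; that value is an assumption inferred from the <?xml-multiple?> notation above, and with StAXON on the classpath you would use JsonXMLStreamConstants.MULTIPLE_PI_TARGET instead. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes the alice document with an array hint before the bob elements.
final class MultiplePiSketch {

    // Assumed to match JsonXMLStreamConstants.MULTIPLE_PI_TARGET.
    static final String MULTIPLE_PI_TARGET = "xml-multiple";

    static String writeAlice(XMLOutputFactory factory) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);
            writer.writeStartElement("alice");
            writer.writeProcessingInstruction(MULTIPLE_PI_TARGET); // the following bob elements form an array
            for (String name : new String[] {"edgar", "charlie"}) {
                writer.writeStartElement("bob");
                writer.writeCharacters(name);
                writer.writeEndElement();
            }
            writer.writeEmptyElement("peter");
            writer.writeEndElement();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write document", e);
        }
        return out.toString();
    }
}
```

An XML writer simply emits the processing instruction, so the same code can serve both formats.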

Initiating Arrays with Element Paths

Sometimes it is not desired or even impossible to generate <?xml-multiple?> processing instructions to control arrays. This may be the case if the actual writing isn’t done by your code, but by some other framework like JAXB or similar, and you only provide a stream writer.

To address such a scenario, wouldn’t it be nice to be able to tell the writer beforehand which elements should trigger a JSON array? This is where the XMLMultipleStreamWriter and XMLMultipleEventWriter wrappers step in.

E.g., to specify a sequence of bob elements below root element alice as a multiple path:

writer = new XMLMultipleStreamWriter(writer, true, "/alice/bob");

The boolean parameter specifies whether our paths include the root node (alice). That is, we could also use

writer = new XMLMultipleStreamWriter(writer, false, "/bob");

To wrap all bob fields into arrays (not just alice children), we can use a relative path, without a leading slash:

writer = new XMLMultipleStreamWriter(writer, false, "bob");

Now we (or some legacy code, framework, …) may write our document, and the writer will take care to trigger the bob array for us.
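The path semantics described above can be modelled in a few lines. The following is only a sketch of how such matching might work, not StAXON's actual code; MultiplePathSketch, matches and the pathsIncludeRoot parameter are hypothetical names. An absolute path (leading slash) has to match the whole element stack, a relative path only its tail.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of multiple-path matching; not StAXON's implementation.
final class MultiplePathSketch {

    /**
     * @param elementStack     names of the open elements, root first
     * @param path             e.g. "/alice/bob" (absolute) or "bob" (relative)
     * @param pathsIncludeRoot whether paths are written with the root element included
     */
    static boolean matches(List<String> elementStack, String path, boolean pathsIncludeRoot) {
        List<String> segments = Arrays.asList(path.replaceFirst("^/", "").split("/"));
        // If paths omit the root, ignore the root element when comparing.
        List<String> stack = pathsIncludeRoot || elementStack.isEmpty()
                ? elementStack
                : elementStack.subList(1, elementStack.size());
        if (path.startsWith("/")) {
            return stack.equals(segments); // absolute: the whole stack must match
        }
        int offset = stack.size() - segments.size();
        return offset >= 0 && stack.subList(offset, stack.size()).equals(segments); // relative: tail match
    }
}
```

For the stack [alice, bob], both "/alice/bob" (root included) and "/bob" (root excluded) match, while the relative path "bob" also matches a bob nested deeper in the document.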

Triggering Arrays Automatically

Finally, if nothing else works for you, you may also let StAXON fully automatically determine array boundaries. Use this only if you cannot provide <?xml-multiple?> processing instructions and cannot provide the paths of the elements that should be wrapped into JSON arrays.

However, using this method has several drawbacks:

  • The writer basically needs to cache the entire document in memory, eating both space and time.
  • The writer will not be able to produce empty arrays or arrays with a single element.

To enable this feature, set the JsonXMLOutputFactory.PROP_AUTO_ARRAY property to true.
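To see where these drawbacks come from, consider a naive way to detect arrays: collect all children of an element and turn every name that occurs more than once into an array. The sketch below does just that for a flat list of siblings. It is an illustration of the idea only, not StAXON's algorithm, and AutoArraySketch is a hypothetical name; values are passed as already rendered JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of automatic array detection among sibling elements.
final class AutoArraySketch {

    /**
     * @param namesAndValues alternating element names and pre-rendered JSON values,
     *                       e.g. "bob", "\"edgar\"", "peter", "null"
     */
    static String toJsonObject(String... namesAndValues) {
        // All siblings must be known before anything can be written: this is the buffering cost.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            grouped.computeIfAbsent(namesAndValues[i], key -> new ArrayList<>()).add(namesAndValues[i + 1]);
        }
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(entry.getKey()).append("\":");
            List<String> values = entry.getValue();
            // A name seen only once cannot be told apart from a plain property.
            json.append(values.size() > 1 ? "[" + String.join(",", values) + "]" : values.get(0));
        }
        return json.append('}').toString();
    }
}
```

Two bob siblings end up in an array, but a single bob stays a plain property, and an element that never occurs can obviously not produce an empty array.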

Triggering Document Arrays

StAXON’s writer implementation allows you to wrap a sequence of documents into a JSON array. To do this, write the <?xml-multiple?> PI before writing anything else:

writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET);
writer.writeStartDocument(); // first array component
...
writer.writeEndDocument();
writer.writeStartDocument(); // second array component
...
writer.writeEndDocument();
...
writer.close();

The writer.close() call is crucial here, as it will close the JSON array.

Using JAXB

Consider a JAXB-annotated Customer class:

@JsonXML(virtualRoot = true, prettyPrint = true, multiplePaths = "phone")
@XmlRootElement
public class Customer {
    public String name;
    public List<String> phone;
}

The @JsonXML annotation is used to configure the mapping details. In the above example, the customer root element is stripped from the JSON representation, phone elements are wrapped into an array and JSON output is nicely formatted, e.g.

{
  "name" : "John Doe",
  "phone" : [ "555-1111" ]
}

Now, the JsonXMLMapper class enables dead-simple mapping to and from JSON:

/*
 * Create mapper instance.
 */
JsonXMLMapper<Customer> mapper = new JsonXMLMapper<Customer>(Customer.class);

/*
 * Read customer.
 */
InputStream input = getClass().getResourceAsStream("input.json");
Customer customer = mapper.readObject(input);
input.close();

/*
 * Write back to console
 */
mapper.writeObject(System.out, customer);

Using JAX-RS

StAXON provides the staxon-jaxrs module, which enables your RESTful services to serialize/deserialize JAXB-annotated classes to/from JSON. It includes the following JAX-RS @Provider classes:

  • de.odysseus.staxon.json.jaxrs.jaxb.JsonXMLObjectProvider is used to read and write JSON objects
  • de.odysseus.staxon.json.jaxrs.jaxb.JsonXMLArrayProvider is used to read and write JSON arrays

In order to select the StAXON message body readers/writers for your resource, a @JsonXML annotation is required.

When used with JAX-RS, the @JsonXML annotation can be placed on

  • a model type (@XmlRootElement or @XmlType) to configure its serialization and deserialization
  • a JAX-RS resource method to configure serialization of the result type
  • a parameter of a JAX-RS resource method to configure deserialization of the parameter type

If a @JsonXML annotation is present on both a model type and a resource method or parameter, the latter will override the model type annotation. If neither is present, StAXON will not handle the resource.

You can find a sample project using Jersey with StAXON here.

Using XPath

XPath is another standard that can be easily adopted for use with JSON.

The Java XPath API (javax.xml.xpath) doesn’t let us provide an XMLStreamReader or similar as a source, but requires a Document Object Model (DOM). Therefore, we need to read our JSON into a DOM first to apply expressions against that DOM. This could be done by performing an XSLT identity transformation to a DOMResult. However, StAXON provides the DOMEventConsumer class to translate XML events to DOM nodes, which should be faster and simpler than leveraging XSLT.

Once we have a DOM, there’s nothing special with applying XPath expressions.

StringReader json = new StringReader("{\"edgar\":\"david\",\"bob\":\"charlie\"}");

/*
 * Our sample JSON has no root element, so specify "alice" as virtual root
 */
JsonXMLConfig config = new JsonXMLConfigBuilder().virtualRoot("alice").build();

/*
 * create event reader
 */
XMLEventReader reader = new JsonXMLInputFactory(config).createXMLEventReader(json);

/*
 * parse JSON into Document Object Model (DOM)
 */
Document document = DOMEventConsumer.consume(reader);

/*
 * evaluate an XPath expression
 */
XPath xpath = XPathFactory.newInstance().newXPath();
System.out.println(xpath.evaluate("//alice/bob", document));

Running the above sample will print charlie to the console.
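For comparison, the XSLT identity transformation mentioned earlier can be sketched with JDK classes only. The example below feeds a stream reader into a DOMResult via StAXSource and then evaluates the same XPath expression. To keep it runnable without StAXON, it reads the equivalent XML with the JDK's own reader; with StAXON you would hand a JSON-backed XMLStreamReader to toDom instead. StaxToDomSketch and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical helper: StAX reader -> DOM via an identity transformation, then XPath.
final class StaxToDomSketch {

    /** Copies everything the reader delivers into a DOM tree. */
    static Node toDom(XMLStreamReader reader) {
        try {
            DOMResult result = new DOMResult();
            // newTransformer() without a stylesheet performs an identity transformation.
            TransformerFactory.newInstance().newTransformer().transform(new StAXSource(reader), result);
            return result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("could not build DOM", e);
        }
    }

    /** Stand-in using the JDK's XML reader as the event source. */
    static String evaluate(String xml, String expression) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            return XPathFactory.newInstance().newXPath().evaluate(expression, toDom(reader));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }
}
```

This route needs no StAXON-specific class besides the reader itself, at the cost of going through the transformer machinery.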

What else?

In the end, using an XML API to read and write JSON may still look like a compromise, but it may turn out to be a good choice. The availability of a StAX implementation for JSON acts as a door opener to powerful XML-related technologies and easily enables dual-format (XML and JSON) services.

There’s more we can do with StAXON: XSD, XSLT, XQuery, XML-JSON/JSON-XML conversions, to name a few. Please check the Wiki for some of those.

Published at DZone with permission of its author, Christoph Beck.