Daniel has developed support for a proprietary technology that processes huge volumes of XML data. He is now working on the Register of Traffic Accidents (a J2EE application built on WebSphere) for the Ministry of Interior of the Slovak Republic. Daniel has posted 9 posts at DZone.

XSD Schema is Not the Only Way

06.02.2010
XSD is criticized more than any other W3C Recommendation. There are many reasons for the criticism, but let's have a look at the most painful one: readability.

I want to read XML description

XSD (XML Schema) is a widely used technology for describing the structure of XML data. You write an XSD for your XML data and then send it to the other parties in the planned data exchange so that everyone keeps to the same XML structure. When the person on the other side receives the schema, he needs to understand the data structure and prepare an API. He actually needs to read something like this:
<xs:element name="Books">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="Book" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:attribute name="title" type="xs:string" use="required"/>
                    <xs:attribute name="author" type="xs:string" use="required"/>
                    <xs:attribute name="pagecount" type="xs:integer" use="optional"/>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
A computer has no problem reading XSD, but it is a little exhausting for a human being. So at first you will just use a "sketch": example data annotated with the data types.
<Books>
    <Book title="required string" author="required string" pagecount="optional int"/>
</Books>
This is convenient enough for explaining what the XML data should look like, but a parser needs a real schema language for validation, so we have to maintain several descriptions and keep them synchronized. This happens in this and many other use cases. But there are several other schema languages available. Let's have a look at XDefinition, which is the closest one to your first "sketch".
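The validation step that needs a real schema can be sketched in Java with the standard javax.xml.validation API. This is only an illustration under assumptions: the XSD fragment shown earlier is wrapped in an xs:schema root element so that it compiles, and the class name BooksXsdCheck and the sample book data are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
final class BooksXsdCheck {

    // The article's XSD fragment, wrapped in an xs:schema root (an assumption:
    // the article shows only the element declaration).
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"Books\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Book\" maxOccurs=\"unbounded\"><xs:complexType>"
        + "<xs:attribute name=\"title\" type=\"xs:string\" use=\"required\"/>"
        + "<xs:attribute name=\"author\" type=\"xs:string\" use=\"required\"/>"
        + "<xs:attribute name=\"pagecount\" type=\"xs:integer\" use=\"optional\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true when the XML text conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Without a custom ErrorHandler, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or invalid document
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling BooksXsdCheck.isValid with a Books document whose Book elements carry title and author attributes should return true, while a Book without an author, or with a non-numeric pagecount, is rejected. Note that an empty Books element is rejected as well, because minOccurs defaults to 1 in XSD.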

XDefinition

XDefinition is a schema language designed from the beginning for natural readability: a definition keeps the form of the XML data it describes. It is meant to be understandable not only to programmers but also to analysts, system architects and all the other parties involved in a project. This approach leaves little room for misinterpreting the data description as it is passed along, from the architects to the database specialists. So take a look at the complete XDefinition of our XML example.
<xd:def xmlns:xd = "http://www.syntea.cz/xdef/2.0"
        xd:name  = "Books"
        xd:root  = "Books">
    <Books>
        <Book title="required string" author="required string" pagecount="optional int"
              xd:script="occurs *"/>
    </Books>
</xd:def>
Implementations of the XDefinition processor are available for Java 1.3-1.6 and .NET. Your first thought is probably that it can't fully substitute for the XSD functionality ... but it can, and it offers much more than a substitute.

Not just validation

The capabilities of XDefinition go beyond validation. A document can be transformed based on the values of the input document or on other inputs, for example a database accessed by external methods. External methods are methods of classes that are accessible through the context classloader available to the XDefinition processor. That means that you can run your own code from the XDefinition and use the result when processing the XML document.
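To make the idea of an external method concrete, here is a minimal Java sketch of how a processor could locate user code through the thread's context classloader and call it by name. It does not use or reproduce the XDefinition API: the class ExternalMethodSketch, the method isPlausiblePageCount and the helper callExternal are all hypothetical names invented for this illustration.

```java
import java.lang.reflect.Method;

// Hypothetical illustration only; this is not the XDefinition processor's API.
final class ExternalMethodSketch {

    // Example of user code that a schema could delegate a check to.
    public static boolean isPlausiblePageCount(String value) {
        try {
            return Integer.parseInt(value.trim()) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Looks the class up through the context classloader first and falls back
    // to the loader that defined this sketch if the class is not visible there.
    static Class<?> resolve(String className) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException ignored) {
                // fall through to the defining loader below
            }
        }
        return Class.forName(className, true, ExternalMethodSketch.class.getClassLoader());
    }

    // Invokes a public static method that takes one String argument.
    static Object callExternal(String className, String methodName, String arg) {
        try {
            Method method = resolve(className).getMethod(methodName, String.class);
            return method.invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + className + "." + methodName, e);
        }
    }
}
```

A caller would pass the fully qualified class name, the method name and the attribute value, for example callExternal(ExternalMethodSketch.class.getName(), "isPlausiblePageCount", "412"), and use the returned Boolean to accept or reject the value.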

Support

A plugin for the NetBeans IDE provides broad support for XDefinition, and there is a plugin for validating XML in the oXygen XML editor. Plugins for the Eclipse and IntelliJ IDEA IDEs are in an early stage of development. The NetBeans plugin comes with context help, an example project, its own XDefinition data type, and the ability to check XDefinitions, track bugs and validate XML documents.

Don't be afraid to try something else

There is a lot more of interest to discover about XDefinition; look for it on the website of Syntea software group a.s. or on the XDefinition page.
Resources:
  • Tutorial with examples
  • XDefinition guidepost

  • Published at DZone with permission of its author, Daniel Kec.

    (Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

    Comments

    Alessandro Santini replied on Wed, 2010/06/02 - 4:40am

    Ever used an IDE to model/read an XML schema? I can assure you it is very easy.

    What IMHO would be nice to see is some more sophisticated way of declaring constraints. Readability is a non-issue (and I can assure that I am deeply involved with XML/XSD and the rest of the family).

    Nicolas Frankel replied on Wed, 2010/06/02 - 4:43am in response to: Alessandro Santini

    +1.

    Some technologies are not human-friendly. It should be the goal of these technologies to get support from popular IDEs.

    Another good example is Maven POM which can get pretty complex but is easily readable with m2eclipse Eclipse plugin or NetBeans.

    Daniel Kec replied on Wed, 2010/06/02 - 5:53am in response to: Alessandro Santini

    XML is keeping the lead as a data structure because of its readability, and schema languages should do the same. WYSIWYG is a nice thing, but history has already taught us the positives and negatives of this approach in the development domain.

    Alessandro Santini replied on Wed, 2010/06/02 - 6:22am in response to: Daniel Kec

    I first of all think that XML is not leading the way anywhere because of its many drawbacks (payload size in primis). In addition to this, XML may prove very handy where a simple and effective way of interchanging structured data is required, but not in all cases.

    Readability is a concern mostly to the XML-configuration freaks. The abuse of this technology has been long debated so I won't trigger that thread again.

    The WYSIWYG thing is as good and as bad as introducing new technologies every ten minutes, particularly for those who do not survive.

    Daniel Kec replied on Wed, 2010/06/02 - 6:44am in response to: Alessandro Santini

    XDefinition is 7 years old; that is about two years younger than XSD and the same age as RELAX NG, which appeared at the same time in reaction to the same cause, XSD. I guess that you generally disagree with James Clark's opinions about XML, but there are a lot of us who don't.

    Alessandro Santini replied on Wed, 2010/06/02 - 7:44am in response to: Daniel Kec

    As complexity increases, readability decreases. Having said that, I bet anyone to say that they can comfortably read a complex model without the help of a tool.

    As to James Clark, yes and no - I certainly agree that XSD has issues (and that is why I said that it would be interesting to have a sophisticated way of declaring constraints, see my first comment) but I totally disagree when he states that the readability must be the priority-1 objective of any XML-related initiative: IMHO payload (i.e. XML) should be both user-and-machine readable, not everything.

    Jethro Borsje replied on Wed, 2010/06/02 - 8:52am

    I think we shouldn't try to make everything human readable at all cost. Sometimes something is just not very human readable but still quite useful because of its machine readability, for example: RDF & RDFS.

    Daniel Kec replied on Wed, 2010/06/02 - 9:13am in response to: Jethro Borsje

    But we are talking about description of the structure not the data itself, that is something people have to read maybe more often than the machine.

    Alessandro Santini replied on Wed, 2010/06/02 - 9:23am in response to: Daniel Kec

    Again, using an IDE - that is perfectly fine to me and I presume to most developers. In this context, besides its oddities, XML schema is sufficient in most cases and has a superior tooling support. It is not by coincidence if it is the most widespread, don't you agree?

    Daniel Kec replied on Wed, 2010/06/02 - 9:34am in response to: Alessandro Santini

    Not everyone involved with the data structure has to be using a specialized IDE; there aren't only programmers. That is a great misunderstanding among coders. Do you really believe that widespread=best?

    Alessandro Santini replied on Wed, 2010/06/02 - 9:47am in response to: Daniel Kec

    I would revert your statement - business users are the ones who need a visual tool! Developers can peek at the source code, business users want a clear picture of the meaning and not stare at the wonderful geekiness of an XML file.

    Jethro Borsje replied on Wed, 2010/06/02 - 10:21am

    I really do not see the problem; as Alessandro Santini says, there is more than enough tooling support. So for those users who feel an XSD is too complicated to read, there are GUI tools to make the XSD visual.

    Daniel Kec replied on Wed, 2010/06/02 - 11:34am in response to: Jethro Borsje

    Desperate times call for desperate measures, that's the WYSIWYG syndrome all over again

    Alessandro Santini replied on Wed, 2010/06/02 - 11:42am in response to: Daniel Kec

    Rest assured that - if I have a syndrome - it is the one you also should have: the productivity syndrome - in other words, "get the job done" without wasting time. You represent the category of those who prefer to privilege the form over the substance. I simply don't.

    Andy Leung replied on Wed, 2010/06/02 - 12:05pm

    Thanks for the article, nice read.

    However, I am on Alessandro's boat. I think as a developer, if XSD is really causing a fortune to read and understand it, one should develop a tool to help oneself but others as well. I strongly believe that XSD is good enough at its own level and seriously, I don't see the difference between XDefinition. If a COBOL developer doesn't understand XSD, she won't understand XDefinition either. It's about her mindset. However, when she understands XSD, it is better for her to use some tools or build her own tool to make her life easier than extending on the XSD its own.

    Nowadays developers seldom have enough time to develop and write documentation within project timeline, therefore using a tool to help is the only way to complete the job with quality. If we still rely on the structural change of the content to help our eyeballs, that's so 1980s.

    Well just my 2 cents.

    Daniel Kec replied on Wed, 2010/06/02 - 12:45pm in response to: Andy Leung

    Let's write nasty code from now on, nobody needs to read it any more :-) , no, that's a joke. I do understand your point of view, but every coin has two sides, and I say that a person who has never seen either before is going to understand XDefinition or RELAX much, much faster than XSD.
