The Problem Is Not JSON Or XML, It Is About Data Context
There is an interesting discussion under way about data transfer in web applications. It has centered on the differences between JSON and XML in JavaScript-heavy sites. It started with Norm Walsh commenting on Twitter and Foursquare removing support for XML from their APIs. The basic idea of his post was that if you are using JavaScript, and you are only passing around atomic values or lists and hashes of atomic values, then JSON makes complete sense. He then talks about the difficulty of using JSON when you need more context or you have mixed content. Overall, it was a very sensible post. The discussion gained steam because of Norm’s “Meh” reaction, and because talking about “which is the better technology” tends to get people all riled up.
A few days after Norm’s post, two other posts appeared refuting his stance, even though many commenters consider it a non-debate. First, Manu Sporny talked about the move to JSON being more of a paradigm shift to simpler markup. The complaints about SOAP and XML Schemas are obvious, but he complicates the argument by introducing JSON-LD into the conversation. JSON-LD adds syntax to JSON for denoting Linked Data, including notation very similar to XML Namespaces, to which Norm replies “Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?” Lastly, James Clark throws his opinion into the mix. His commentary is more about the fact that XML is losing web developers, which could be a bad thing.
Now that you have the background story, I want to point out that people are missing Norm Walsh’s original point. This problem is about context, and it is not really being treated that way. People are using JSON for web development because there is almost zero learning curve. It is used because of the increasing trend toward JavaScript-heavy sites that drive interactivity and some of the mashup creativity. Because Twitter’s API is readily available, people have created widgets to display their tweets on a web page. For many web developers that means grabbing a JSON representation of some tweets and converting it to HTML. This process barely takes longer than trying to find the correct API documentation.
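To make the "grab JSON, convert it to HTML" step concrete, here is a minimal sketch. It is written in Python rather than browser JavaScript purely for illustration, and the payload shape (the `user` and `text` fields) is an assumption, not the actual Twitter API schema.

```python
import json
from html import escape

# Hypothetical payload: the field names "user" and "text" are assumptions
# made for illustration, not the real Twitter API schema.
SAMPLE_JSON = '''
[
  {"user": "alice", "text": "JSON & XML both have their place"},
  {"user": "bob", "text": "Context matters <most>"}
]
'''


def tweets_to_html(payload: str) -> str:
    """Parse a JSON array of tweets and render it as an HTML list."""
    tweets = json.loads(payload)  # JSON maps directly onto lists and dicts
    items = [
        # Escape every value so tweet text cannot inject markup
        f'<li><span class="user">{escape(t["user"])}</span>: {escape(t["text"])}</li>'
        for t in tweets
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(tweets_to_html(SAMPLE_JSON))
```

The parsed result is already a native list of dictionaries, which is why this path needs so little code.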
In this same context, if the APIs are XML-based, you then need something to parse the XML into an appropriate JavaScript object. You can already tell that this is getting more complicated than simple JSON. To make matters worse, older browsers handled XML differently from one another and sometimes very poorly. Because web developers were depending on the XML support in the browser, the problems of cross-browser support arose again. Obviously, developers do not want to go down that path, and JSON is easier anyway.
However, what if you are using PHP or Java on the server? PHP has plenty of XML handling libraries, with SimplePie being a hugely popular RSS feed processing library. If you can make the Twitter API call from your Java server code, there are plenty of libraries for handling XML there as well. So, in that context XML may be a better option.
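As a rough illustration of how little friction XML adds on the server, the sketch below parses a small XML response with a standard library parser. It uses Python instead of PHP or Java only to keep the example short, and the element names (`statuses`, `status`, `user`, `text`) are assumptions, not the actual Twitter API schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response: the element names are illustrative assumptions,
# not the real Twitter API schema.
SAMPLE_XML = """
<statuses>
  <status>
    <user>alice</user>
    <text>JSON &amp; XML both have their place</text>
  </status>
  <status>
    <user>bob</user>
    <text>Context matters</text>
  </status>
</statuses>
"""


def parse_statuses(payload: str) -> list:
    """Turn each <status> element into a plain dict of its child text."""
    root = ET.fromstring(payload.strip())
    return [
        {
            "user": status.findtext("user", default=""),
            "text": status.findtext("text", default=""),
        }
        for status in root.findall("status")
    ]


if __name__ == "__main__":
    for status in parse_statuses(SAMPLE_XML):
        print(status["user"], "->", status["text"])
```

With a mature parser available, the server-side cost of XML is a few lines of mapping code rather than a cross-browser problem.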
The type of problem can also change the choice of data format. In the Twitter API example, if you have a simple widget that just displays tweets, then a solution based on JavaScript and JSON makes a lot of sense. What if that widget needed to be more dynamic? For example, let’s say that someone registered on your site will see those tweets displayed differently, with additional links in each tweet for replying or retweeting. You could write some JavaScript code to check for registered users and generate different HTML, but this gets unwieldy if the number of different displays grows beyond two or three. If the data is in XML, then you can write a different XSLT script for each display, and each script remains separate from the main widget code. You just need to select the appropriate XSLT based on the user interacting with the site. At this point people are likely going to complain about the use of SOAP for web services and its complexity. Let’s ignore that option: REST has won, and SOAP is overly complex. Can we agree on that and move on?
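The per-viewer stylesheet selection described above can be sketched as a simple lookup. The viewer types and stylesheet file names here are hypothetical, and the sketch stops at choosing the file: actually applying it requires an XSLT processor, which Python's standard library does not include.

```python
# Hypothetical viewer types and stylesheet names, for illustration only.
STYLESHEETS = {
    "anonymous": "tweets-readonly.xsl",
    "registered": "tweets-with-actions.xsl",
}
DEFAULT_VIEWER = "anonymous"


def stylesheet_for(viewer_type: str) -> str:
    """Return the XSLT file for this viewer, falling back to the read-only view."""
    return STYLESHEETS.get(viewer_type, STYLESHEETS[DEFAULT_VIEWER])


if __name__ == "__main__":
    print(stylesheet_for("registered"))
    print(stylesheet_for("visitor-from-a-search-engine"))
```

Under this arrangement, adding a display means writing one new stylesheet and adding one mapping entry, while the widget code itself stays unchanged.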
As with any programming problem, different requirements and different contexts may call for different technologies. If you get stuck on saying that JSON is better than XML (or the other way around), you lose another tool in your toolbox.
The other context that people are missing is why Twitter and Foursquare chose to support JSON only. This is likely a question of application complexity and analytics. Like any good API provider, Twitter is probably tracking all calls to the API and this includes the data format requested. It is very possible that the demand for XML was fairly low and it did not warrant separate support. In addition to this, there are plenty of JSON processing libraries available for mainstream languages like Java, so there was little risk in dropping support for XML. If there is no support for XML, then their API becomes simpler to support. That means less code to maintain, simpler maintenance of code because there are not multiple representations of one set of data, and fewer questions about the different formats.
So, quit whining about whose data format is better. Each one is better in a different context; otherwise it is highly unlikely that both would have become so popular. The important thing is to learn both formats, and other popular ones as they appear, so that you can make an educated decision on which format to use in your situation.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)






Comments
Greg Brown replied on Fri, 2010/12/03 - 9:22am
Robert Csala replied on Fri, 2010/12/03 - 9:41am
Greg Brown replied on Fri, 2010/12/03 - 9:52am
in response to:
Robert Csala
Robert Csala replied on Fri, 2010/12/03 - 10:02am
in response to:
Greg Brown
Stan Dyck replied on Fri, 2010/12/03 - 11:45am
Here's the thing you're missing though. HTML *is* XML. If Twitter provides me with nice, semantically marked up, well formed html, I don't need to convert or parse anything. I can just jam the output directly into a page as is and style it however I want with a css file. That's the easiest option of all.
You say the html isn't formed the way you want? Well, first I'd challenge you to make sure you really mean that, but if you aren't convinced, then there are already tons of great javascript libraries (I'm partial to jQuery) that know how to manipulate plain old html in whatever fashion you want. No xml parsers or xslt knowledge is required.
Robert Diana replied on Fri, 2010/12/03 - 11:53am
in response to:
Greg Brown
Greg Brown replied on Fri, 2010/12/03 - 11:59am
in response to:
Stan Dyck
Greg Brown replied on Fri, 2010/12/03 - 12:01pm
in response to:
Robert Diana
Robert Diana replied on Fri, 2010/12/03 - 12:04pm
in response to:
Stan Dyck
Robert Diana replied on Fri, 2010/12/03 - 12:08pm
in response to:
Greg Brown
Robert Diana replied on Fri, 2010/12/03 - 12:10pm
in response to:
Greg Brown
Greg Brown replied on Fri, 2010/12/03 - 12:13pm
in response to:
Robert Diana
Stan Dyck replied on Fri, 2010/12/03 - 1:05pm
in response to:
Greg Brown
It is true that the rules for html are more lax, but if I'm an api provider I am free to produce html that is well formed xml and to guarantee that it is so. Call it xhtml if you wish. Nothing prevents me from presenting the same html differently from the way you do, either via css or more sophisticated javascript means.
How is using jquery (say) to manipulate html more difficult? Remember, my requirement is semantically marked up, well formed html. If a company like twitter is interested in having people use their api, they will make it as easy to use as possible. What I'm saying is that this is the easiest way.
My main point is that (svg and pdf formatting aside) the stuff I get from twitter or whomever is probably going to end up as html. Why not send it to me as html instead of making me convert it from some xml or json format *into* html?
p.s. Your statement that html isn't designed for data transfer is pretty funny considering that (I would guess) the majority of the world's data gets transferred as html every minute of every day.
Stan Dyck replied on Fri, 2010/12/03 - 1:12pm
in response to:
Robert Diana
Nothing prevents you (meaning the API provider) from:
- serving up well formed (in the xml spec sense) html
- marking it up semantically (without caring about the presentation aspects)
My point is that if you do this, it is easier for me to use. The "one purpose" of this html should be to mark up the data. Most people get caught up in the idea that html is for presentation purposes only. That is not the case.
Greg Brown replied on Fri, 2010/12/03 - 1:31pm
in response to:
Stan Dyck
Attila Király replied on Fri, 2010/12/03 - 1:32pm
in response to:
Greg Brown
This is not accurate. The HTML5 standard supports two serialization formats: HTML and XHTML. Both syntaxes are part of the spec. So you can use xml syntax for HTML5. And I think xhtml will rise in the future because, after long years, IE with version 9 is finally supporting the application/xhtml+xml content type too.
HTML syntax
XHTML syntax
Greg Brown replied on Fri, 2010/12/03 - 1:40pm
in response to:
Stan Dyck
Greg Brown replied on Fri, 2010/12/03 - 1:42pm
in response to:
Attila Király
Stan Dyck replied on Fri, 2010/12/03 - 2:02pm
in response to:
Greg Brown
You already have to worry about my (when I say "my" here I mean an API provider) formatting and deal with my changes in the JSON or proprietary XML cases. I can change those just as easily as I can change the html that I produce. The difference is that if I always produce html, and you don't modify it, you can at least be sure that *something* will render. If I change my xml or json format under your nose, your use of my api will probably break.
If you don't like my html you don't have to send it to your clients' browsers. You can manipulate it how you want. But then you have to do that with XML or JSON too. In fact you are required to since you can't display them as XML or JSON.
I can't say I'm not crazy, but I'm not really joking, nor do I think I am alone. My guess is that you may not understand what I mean by semantic html. Wikipedia can explain that better than I.
Greg Brown replied on Fri, 2010/12/03 - 2:10pm
in response to:
Stan Dyck
Greg Brown replied on Fri, 2010/12/03 - 2:27pm
in response to:
Stan Dyck
Stan Dyck replied on Fri, 2010/12/03 - 3:01pm
in response to:
Greg Brown
No, you were describing the design of html. The fact is data *is* transferred as html. When you refresh this page, the data stored in some db somewhere will be converted and transferred to your browser as html. Multiply that by a few billion clicks across the Internet. See what I'm saying?
Perhaps an example (bear with me...or not...either way). Say I want to produce an application/website/whatever that displays articles from this site but one that removes silly commentary by people named "Stan". Can't imagine why, but hey, the client is always right.
I don't have this site's database or their nice object model with its awesome separation of presentation. (I don't even have their permission but we'll let that slide). All I've got to work with is the html that is spit out when I go to urls on their site. Under those circumstances, the best I can hope for is
Perhaps each comment is contained in a div with a "comment" class. Each comment has a span tag around the comment's author with "authorname" class attribute (bye bye, Stan!). The article itself might be inside a div tag with an "article" class and a unique id in an id attribute so I can strip out all the ads and non-article clutter around it. Stuff like that.
With that kind of thing in place, I can do wondrous things with the html because IT HAS BEEN RENDERED AS DATA!!! The nice thing too is that it's not really much of a burden for the people producing the html to make these changes (look at the source of this page. They are already most of the way there). They probably should be doing it anyway, especially if they want to do things like publish an API but don't want to take the time to translate their object model to some arbitrary textual format. HTML is a text format. It's expressive. Lots of things know how to deal with it. We should use it!
If you read this far, you are awesome. I hope you had fun. Thanks.
Stan Dyck replied on Fri, 2010/12/03 - 3:19pm
in response to:
Greg Brown
Stan Dyck replied on Fri, 2010/12/03 - 3:28pm
in response to:
Greg Brown
We're on two sides of the same coin now. Not the best way for all types, but much more than none is what I say.
p.s. I think html has a table element. Most people misuse it, but it is a perfectly functional semantic construct. [ducking to avoid thrown objects].
Greg Brown replied on Fri, 2010/12/03 - 3:38pm
in response to:
Stan Dyck
Greg Brown replied on Fri, 2010/12/03 - 3:38pm
in response to:
Stan Dyck
Stan Dyck replied on Fri, 2010/12/03 - 4:04pm
in response to:
Greg Brown
Fair enough, application specific then, but you obviously took my meaning. You say, "I would still need to write logic...". I say in the semantic html case "I might not need to write logic..." and in the application-specific xml or json case "I am required to write logic...". The difference between "might not need to" and "am required to" can be huge and is certainly reason to use semantic html as opposed to xml assuming I want to maximize the usefulness of my API.
As for your example, the answer is easy. I would use the hCard microformat. Take a look at how they model your exact example in html. One advantage that accrues magically is that other people know about hCard and can do things with it without any prior knowledge of your API. For example Firefox extensions like Operator. It costs a site developer very little to mark up contact data this way.
Greg Brown replied on Sat, 2010/12/04 - 9:48am
in response to:
Stan Dyck
Nicolas Frankel replied on Sun, 2010/12/05 - 2:27pm
Simple and to the point. You read my mind (or my article): it's not only true for XML vs JSON but you can generalize for TechX vs TechY.
When you hold a hammer, everything looks like a nail!
Tim O'farrell replied on Mon, 2010/12/06 - 7:17am
in response to:
Greg Brown