Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE. Founder and creator of http://computoser.com and http://welshare.com . Worked on Ericsson projects, Bulgarian e-government projects and large scale recruitment platforms. Member of the jury of the International Olympiad in Linguistics and the Program committee of the North American Computational Linguistics Olympiad. Bozhidar is a DZone MVB and is not an employee of DZone and has posted 82 posts at DZone.

Don’t Use JSON And XML As Internal Transfer Formats

11.09.2012

 You have a system that has multiple components and they have to communicate. They do that either via internal web services or using a message queue. Normally, you would want to send (data transfer) objects from one component to another. Three typical examples:

  • a user has registered and you send a message to a message queue; whenever the message is consumed, an email is sent to the user. The message needs to carry at least the email and names of the user.
  • if your layers communicate via web services for some reason (rather than live within one JVM), on registration the web layer needs to invoke a back-end service and pass a User object.
  • you store objects in some (distributed) in-memory cache in order to reduce redundant calls to the database (assuming you map your database results to objects in some way, either an ORM or some mapper, but this is done in the majority of cases). So when a request arrives asking for a user profile, you check if it’s present in the cache, and if it is – you get it from there, rather than hitting the database.

In order to achieve these things you need to serialize the objects to some format that will then be deserialized on the other end. Many frameworks include XML and JSON serializers and they are used in many examples online. Therefore people are inclined to use JSON or XML for these purposes. And that’s not a good idea. Using these formats internally has no benefit – you don’t actually need the serialized objects to be human-readable, and if you need to read the message contents, then you have the facilities to deserialize it and print it to a log file.

But there are major drawbacks: speed and size. Both formats are text-based (so that they can be human-readable), which means they are unnecessarily verbose. Yes, JSON is less verbose than XML, but it’s still a text format that you don’t need. Instead, in most cases you are better off using binary serialization. Almost any binary serialization is better. I have evaluated a couple, and the ease of use plus the speed and size benefits made me choose MessagePack. But you can also use protobuf, BSON, Avro or whatever fits your project.
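To make the size argument concrete, here is a minimal sketch that encodes the same object once as JSON text and once in a positional binary layout. It does not use MessagePack or any of the libraries named above; it relies only on the JDK's `java.io.DataOutputStream` as a stand-in for "some binary format". The `UserRegistered` class, its fields and the sample values are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SizeComparison {

    // Hypothetical message payload; the fields are assumptions made for this example.
    static final class UserRegistered {
        final long id;
        final String email;
        final String firstName;
        final String lastName;
        final boolean active;

        UserRegistered(long id, String email, String firstName, String lastName, boolean active) {
            this.id = id;
            this.email = email;
            this.firstName = firstName;
            this.lastName = lastName;
            this.active = active;
        }
    }

    // Text encoding: hand-built JSON. No escaping is done, which is only
    // acceptable here because the sample values contain no special characters.
    static byte[] toJson(UserRegistered u) {
        String json = "{\"id\":" + u.id
                + ",\"email\":\"" + u.email + "\""
                + ",\"firstName\":\"" + u.firstName + "\""
                + ",\"lastName\":\"" + u.lastName + "\""
                + ",\"active\":" + u.active + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: fields are written in a fixed order, so no field
    // names, quotes or separators end up on the wire.
    static byte[] toBinary(UserRegistered u) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(u.id);          // 8 bytes
            out.writeUTF(u.email);        // 2-byte length prefix + characters
            out.writeUTF(u.firstName);
            out.writeUTF(u.lastName);
            out.writeBoolean(u.active);   // 1 byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        UserRegistered user = new UserRegistered(
                1234567890123L, "jane.doe@example.com", "Jane", "Doe", true);
        System.out.println("JSON bytes:   " + toJson(user).length);
        System.out.println("Binary bytes: " + toBinary(user).length);
    }
}
```

For this sample the JSON form takes 101 bytes and the binary form 42, and most of the difference is field names and punctuation. A real format such as MessagePack also encodes numbers compactly, so treat these figures as an illustration rather than a benchmark.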

Yes, I know, I also said “this is probably a micro-optimization”. And then I ran some benchmarks on our messages to see the time and size saved. I don’t remember the exact figures, but MessagePack was a lot faster and produced much smaller messages, and seeing the results made me go straight into coding a MessagePackConverter to replace the JSONConverter. It is a pretty small change for the huge impact it has on the whole system. And given the high volume of messages that our system needs to serialize and deserialize, spending one day on integrating MessagePack is totally worth it. After all, that would allow you to process or store (say) twice as many messages with the same hardware (compared to JSON).
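The reason such a swap stays small is that the rest of the system only talks to a converter abstraction. The sketch below shows that shape under stated assumptions: the `MessageConverter` interface, the `EmailRequest` class and the `BinaryEmailRequestConverter` are hypothetical names, and the binary encoding again uses the JDK's `DataOutputStream` and `DataInputStream` rather than MessagePack itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ConverterSketch {

    // Hypothetical message for the "send a welcome email" example.
    static final class EmailRequest {
        final String email;
        final String names;

        EmailRequest(String email, String names) {
            this.email = email;
            this.names = names;
        }
    }

    // The seam the messaging code depends on (hypothetical interface).
    // A JSON-based and a binary implementation would both fit behind it.
    interface MessageConverter<T> {
        byte[] toBytes(T message);
        T fromBytes(byte[] payload);
    }

    // Binary implementation; a MessagePack-based one would have the same shape.
    static final class BinaryEmailRequestConverter implements MessageConverter<EmailRequest> {

        @Override
        public byte[] toBytes(EmailRequest message) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(message.email);
                out.writeUTF(message.names);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        }

        @Override
        public EmailRequest fromBytes(byte[] payload) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
                // Fields are read back in the same order they were written.
                String email = in.readUTF();
                String names = in.readUTF();
                return new EmailRequest(email, names);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Producers and consumers only see the interface type.
        MessageConverter<EmailRequest> converter = new BinaryEmailRequestConverter();
        byte[] payload = converter.toBytes(new EmailRequest("jane.doe@example.com", "Jane Doe"));
        EmailRequest copy = converter.fromBytes(payload);
        System.out.println(copy.email + " / " + copy.names + " in " + payload.length + " bytes");
    }
}
```

With this arrangement, changing the wire format means providing another implementation of the interface and changing the one place where the converter is wired in.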

There are, of course, some things to consider, like versioning of the objects (if you add a field, does the deserialization of old messages break? In MessagePack it does if the field is primitive, so you need a custom template to handle that) or, if you are in a multi-language environment, whether the serialization library is supported by all of your languages. Also, you usually have to let the serializer know the structure of your objects in advance, so there’s some additional code/annotations to populate the serializer context. But all of these are included in the “one day” mentioned above that I spent integrating MessagePack.
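The versioning problem is easy to reproduce with any positional binary layout. The sketch below is a hand-rolled analogue, not MessagePack's template mechanism: an old writer emits only an email, a newer one appends a primitive `loginCount`, a strict reader fails on old payloads, and a tolerant reader falls back to a default. All class, method and field names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VersioningSketch {

    // Current shape of the object; loginCount was added after messages
    // in the old layout had already been written.
    static final class User {
        final String email;
        final int loginCount;

        User(String email, int loginCount) {
            this.email = email;
            this.loginCount = loginCount;
        }
    }

    // Old layout: email only.
    static byte[] writeOld(String email) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(email);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // New layout: email followed by the primitive field.
    static byte[] writeNew(User user) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(user.email);
            out.writeInt(user.loginCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Strict reader: assumes the new field is always present, so an old
    // payload ends in an EOFException (wrapped as UncheckedIOException).
    static User readStrict(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            String email = in.readUTF();
            int loginCount = in.readInt();
            return new User(email, loginCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tolerant reader: uses a default of 0 when the payload ends early.
    static User readTolerant(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            String email = in.readUTF();
            int loginCount = in.available() >= Integer.BYTES ? in.readInt() : 0;
            return new User(email, loginCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] oldPayload = writeOld("jane.doe@example.com");
        try {
            readStrict(oldPayload);
        } catch (UncheckedIOException e) {
            System.out.println("strict reader failed on old payload: " + e.getCause());
        }
        User user = readTolerant(oldPayload);
        System.out.println("tolerant reader: " + user.email + ", loginCount=" + user.loginCount);
    }
}
```

Whatever format you pick, it is worth writing a test like the `main` method above that feeds an old payload to the new reader before deploying a changed object.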

And it is probably a good idea to mention that if you are exposing an API to 3rd parties, you can’t rely on these serializers – your API should be JSON/XML, because there it needs to be human-readable and it needs to be supported in every language.

But unless you totally don’t care about your resources (probably because it’s a system with little usage), seriously consider a binary serialization mechanism for your internal messaging, APIs, caching, etc.

Published at DZone with permission of Bozhidar Bozhanov, author and DZone MVB. (source)
