
Ayende Rahien works for Hibernating Rhinos LTD, an Israeli-based company producing developer productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler (l2sprof.com), Entity Framework Profiler (efprof.com) and more. Ayende is a DZone MVB, is not an employee of DZone, and has posted 451 posts at DZone.

Just Drop to Binary if You're Going to Compress Your JSON

09.18.2013

This post was written at 5:30AM.  I had this thought while doing research for another post, and I couldn’t really let it go.

XML, as a text-based format, is really wasteful in space. But that wasn’t what really made it lose its shine. That happened when it became so complex that it stopped being human readable. For example, I give you:

 <?xml version="1.0" encoding="UTF-8" ?>
 <SOAP-ENV:Envelope
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Body>
        <ns1:getEmployeeDetailsResponse
         xmlns:ns1="urn:MySoapServices"
         SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
            <return xsi:type="ns1:EmployeeContactDetail">
                <employeeName xsi:type="xsd:string">Bill Posters</employeeName>
                <phoneNumber xsi:type="xsd:string">+1-212-7370194</phoneNumber>
                <tempPhoneNumber
                 xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
                 xsi:type="ns2:Array"
                 ns2:arrayType="ns1:TemporaryPhoneNumber[3]">
                    <item xsi:type="ns1:TemporaryPhoneNumber">
                        <startDate xsi:type="xsd:int">37060</startDate>
                        <endDate xsi:type="xsd:int">37064</endDate>
                        <phoneNumber xsi:type="xsd:string">+1-515-2887505</phoneNumber>
                    </item>
                    <item xsi:type="ns1:TemporaryPhoneNumber">
                        <startDate xsi:type="xsd:int">37074</startDate>
                        <endDate xsi:type="xsd:int">37078</endDate>
                        <phoneNumber xsi:type="xsd:string">+1-516-2890033</phoneNumber>
                    </item>
                    <item xsi:type="ns1:TemporaryPhoneNumber">
                        <startDate xsi:type="xsd:int">37088</startDate>
                        <endDate xsi:type="xsd:int">37092</endDate>
                        <phoneNumber xsi:type="xsd:string">+1-212-7376609</phoneNumber>
                    </item>
                </tempPhoneNumber>
            </return>
        </ns1:getEmployeeDetailsResponse>
    </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

After XML was banished from the company of respectable folks, we had JSON show up and entertain us. It is smaller and more concise than XML, and so far it has resisted the efforts to make it into some sort of an uber-complex enterprise-y tool.

But today, I ran into quite a few efforts in the community that try to do strange things to JSON. I am talking about things like JSON DB (a compressed JSON format, not an actual JSON database), JSONH, json.hpack and other things. All of these projects are attempts to reduce the size of JSON documents.
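To make the idea concrete, here is a minimal Python sketch of the kind of packing JSONH does for a list of objects that all share the same keys: the key count comes first, then the keys once, then every value in order. This is an illustration of the layout only, not the JSONH library's code; the `pack` and `unpack` names are made up for this example.

```python
import json


def pack(rows):
    """Flatten a list of same-shaped dicts into [key_count, keys..., values...]."""
    if not rows:
        return [0]
    keys = list(rows[0].keys())
    flat = [len(keys), *keys]
    for row in rows:
        # Values are written in key order, so the keys need not be repeated.
        flat.extend(row[key] for key in keys)
    return flat


def unpack(flat):
    """Rebuild the list of dicts from the flattened layout produced by pack()."""
    key_count = flat[0]
    if key_count == 0:
        return []
    keys = flat[1:1 + key_count]
    values = flat[1 + key_count:]
    return [
        dict(zip(keys, values[i:i + key_count]))
        for i in range(0, len(values), key_count)
    ]


rows = [
    {"FullName": "Ayende Rahien", "Email": "ayende@ayende.com"},
    {"FullName": "David Walker", "Email": "david@davidwalker.org"},
]
packed = pack(rows)
print(json.dumps(packed))
print(len(json.dumps(rows)), "->", len(json.dumps(packed)), "characters")
```

The saving comes entirely from not repeating the property names, which is also what makes the packed form hard to read: a value only means something once you count back to its key.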

Let's take an example. The following code is a JSON document representing one of RavenDB's builds:

 {
   "BuildName": "RavenDB Unstable v2.5",
   "IsUnstable": true,
   "Version": "2509-Unstable",
   "PublishedAt": "2013-02-26T12:06:12.0000000",
   "DownloadsIds": [],
   "Changes": [
     {
       "Commiter": {
         "Email": "david@davidwalker.org",
         "Name": "David Walker"
       },
       "Version": "17c661cb158d5e3c528fe2c02a3346305f0234a3",
       "Href": "/app/rest/changes/id:21039",
       "TeamCityId": 21039,
       "Username": "david walker",
       "Comment": "Do not save Has-Api-Key header to metadata\n",
       "Date": "2013-02-20T23:22:43.0000000",
       "Files": [
         "Raven.Abstractions/Extensions/MetadataExtensions.cs"
       ]
     },
     {
       "Commiter": {
         "Email": "david@davidwalker.org",
         "Name": "David Walker"
       },
       "Version": "5ffb4d61ad9102696948f6678bbecac88e1dc039",
       "Href": "/app/rest/changes/id:21040",
       "TeamCityId": 21040,
       "Username": "david walker",
       "Comment": "Do not save IIS Application Request Routing headers to metadata\n",
       "Date": "2013-02-20T23:23:59.0000000",
       "Files": [
         "Raven.Abstractions/Extensions/MetadataExtensions.cs"
       ]
     },
     {
       "Commiter": {
         "Email": "ayende@ayende.com",
         "Name": "Ayende Rahien"
       },
       "Version": "5919521286735f50f963824a12bf121cd1df4367",
       "Href": "/app/rest/changes/id:21035",
       "TeamCityId": 21035,
       "Username": "ayende rahien",
       "Comment": "Better disposal\n",
       "Date": "2013-02-26T10:16:45.0000000",
       "Files": [
         "Raven.Client.WinRT/MissingFromWinRT/ThreadSleep.cs"
       ]
     },
     {
       "Commiter": {
         "Email": "ayende@ayende.com",
         "Name": "Ayende Rahien"
       },
       "Version": "c93264e2a94e2aa326e7308ab3909aa4077bc3bb",
       "Href": "/app/rest/changes/id:21036",
       "TeamCityId": 21036,
       "Username": "ayende rahien",
       "Comment": "Will ensure that the value is always positive or zero (never negative).\nWhen using numeric calc, will div by 1,024 to get more concentration into buckets.\n",
       "Date": "2013-02-26T10:17:23.0000000",
       "Files": [
         "Raven.Database/Indexing/IndexingUtil.cs"
       ]
     },
     {
       "Commiter": {
         "Email": "ayende@ayende.com",
         "Name": "Ayende Rahien"
       },
       "Version": "7bf51345d39c3993fed5a82eacad6e74b9201601",
       "Href": "/app/rest/changes/id:21037",
       "TeamCityId": 21037,
       "Username": "ayende rahien",
       "Comment": "Fixing a bug where we wouldn't decrement reduce stats for an index when multiple values from the same bucket are removed\n",
       "Date": "2013-02-26T10:53:01.0000000",
       "Files": [
         "Raven.Database/Indexing/MapReduceIndex.cs",
         "Raven.Database/Storage/Esent/StorageActions/MappedResults.cs",
         "Raven.Database/Storage/IMappedResultsStorageAction.cs",
         "Raven.Database/Storage/Managed/MappedResultsStorageAction.cs",
         "Raven.Tests/Issues/RavenDB_784.cs",
         "Raven.Tests/Storage/MappedResults.cs",
         "Raven.Tests/Views/ViewStorage.cs"
       ]
     },
     {
       "Commiter": {
         "Email": "ayende@ayende.com",
         "Name": "Ayende Rahien"
       },
       "Version": "ff2c5b43eba2a8a2206152658b5e76706e12945c",
       "Href": "/app/rest/changes/id:21038",
       "TeamCityId": 21038,
       "Username": "ayende rahien",
       "Comment": "No need for so many repeats\n",
       "Date": "2013-02-26T11:27:49.0000000",
       "Files": [
         "Raven.Tests/Bugs/MultiOutputReduce.cs"
       ]
     },
     {
       "Commiter": {
         "Email": "ayende@ayende.com",
         "Name": "Ayende Rahien"
       },
       "Version": "0620c74e51839972554fab3fa9898d7633cfea6e",
       "Href": "/app/rest/changes/id:21041",
       "TeamCityId": 21041,
       "Username": "ayende rahien",
       "Comment": "Merge branch 'master' of https://github.com/cloudbirdnet/ravendb into 2.1\n",
       "Date": "2013-02-26T11:41:39.0000000",
       "Files": [
         "Raven.Abstractions/Extensions/MetadataExtensions.cs"
       ]
     }
   ],
   "ResolvedIssues": [],
   "Contributors": [
     {
       "FullName": "Ayende Rahien",
       "Email": "ayende@ayende.com",
       "EmailHash": "730a9f9186e14b8da5a4e453aca2adfe"
     },
     {
       "FullName": "David Walker",
       "Email": "david@davidwalker.org",
       "EmailHash": "4e5293ab04bc1a4fdd62bd06e2f32871"
     }
   ],
   "BuildTypeId": "bt8",
   "Href": "/app/rest/builds/id:588",
   "ProjectName": "RavenDB",
   "TeamCityId": 588,
   "ProjectId": "project3",
   "Number": 2509
 }

This document is 4.52KB in size. Running this through JSONH gives us the following:

 [
     14,
     "BuildName",
     "IsUnstable",
     "Version",
     "PublishedAt",
     "DownloadsIds",
     "Changes",
     "ResolvedIssues",
     "Contributors",
     "BuildTypeId",
     "Href",
     "ProjectName",
     "TeamCityId",
     "ProjectId",
     "Number",
     "RavenDB Unstable v2.5",
     true,
     "2509-Unstable",
     "2013-02-26T12:06:12.0000000",
     [
     ],
     [
         {
             "Commiter": {
                 "Email": "david@davidwalker.org",
                 "Name": "David Walker"
             },
             "Version": "17c661cb158d5e3c528fe2c02a3346305f0234a3",
             "Href": "/app/rest/changes/id:21039",
             "TeamCityId": 21039,
             "Username": "david walker",
             "Comment": "Do not save Has-Api-Key header to metadata\n",
             "Date": "2013-02-20T23:22:43.0000000",
             "Files": [
                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"
             ]
         },
         {
             "Commiter": {
                 "Email": "david@davidwalker.org",
                 "Name": "David Walker"
             },
             "Version": "5ffb4d61ad9102696948f6678bbecac88e1dc039",
             "Href": "/app/rest/changes/id:21040",
             "TeamCityId": 21040,
             "Username": "david walker",
             "Comment": "Do not save IIS Application Request Routing headers to metadata\n",
             "Date": "2013-02-20T23:23:59.0000000",
             "Files": [
                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"
             ]
         },
         {
             "Commiter": {
                 "Email": "ayende@ayende.com",
                 "Name": "Ayende Rahien"
             },
             "Version": "5919521286735f50f963824a12bf121cd1df4367",
             "Href": "/app/rest/changes/id:21035",
             "TeamCityId": 21035,
             "Username": "ayende rahien",
             "Comment": "Better disposal\n",
             "Date": "2013-02-26T10:16:45.0000000",
             "Files": [
                 "Raven.Client.WinRT/MissingFromWinRT/ThreadSleep.cs"
             ]
         },
         {
             "Commiter": {
                 "Email": "ayende@ayende.com",
                 "Name": "Ayende Rahien"
             },
             "Version": "c93264e2a94e2aa326e7308ab3909aa4077bc3bb",
             "Href": "/app/rest/changes/id:21036",
             "TeamCityId": "...bug where we wouldn't decrement reduce stats for an index when multiple values from the same bucket are removed\n",
             "Date": "2013-02-26T10:53:01.0000000",
             "Files": [
                 "Raven.Database/Indexing/MapReduceIndex.cs",
                 "Raven.Database/Storage/Esent/StorageActions/MappedResults.cs",
                 "Raven.Database/Storage/IMappedResultsStorageAction.cs",
                 "Raven.Database/Storage/Managed/MappedResultsStorageAction.cs",
                 "Raven.Tests/Issues/RavenDB_784.cs",
                 "Raven.Tests/Storage/MappedResults.cs",
                 "Raven.Tests/Views/ViewStorage.cs"
             ]
         },
         {
             "Commiter": {
                 "Email": "ayende@ayende.com",
                 "Name": "Ayende Rahien"
             },
             "Version": "ff2c5b43eba2a8a2206152658b5e76706e12945c",
             "Href": "/app/rest/changes/id:21038",
             "TeamCityId": 21038,
             "Username": "ayende rahien",
             "Comment": "No need for so many repeats\n",
             "Date": "2013-02-26T11:27:49.0000000",
             "Files": [
                 "Raven.Tests/Bugs/MultiOutputReduce.cs"
             ]
         },
         {
             "Commiter": {
                 "Email": "ayende@ayende.com",
                 "Name": "Ayende Rahien"
             },
             "Version": "0620c74e51839972554fab3fa9898d7633cfea6e",
             "Href": "/app/rest/changes/id:21041",
             "TeamCityId": 21041,
             "Username": "ayende rahien",
             "Comment": "Merge branch 'master' of https://github.com/cloudbirdnet/ravendb into 2.1\n",
             "Date": "2013-02-26T11:41:39.0000000",
             "Files": [
                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"
             ]
         }
     ],
     [
     ],
     [
         {
             "FullName": "Ayende Rahien",
             "Email": "ayende@ayende.com",
             "EmailHash": "730a9f9186e14b8da5a4e453aca2adfe"
         },
         {
             "FullName": "David Walker",
             "Email": "david@davidwalker.org",
             "EmailHash": "4e5293ab04bc1a4fdd62bd06e2f32871"
         }
     ],
     "bt8",
     "/app/rest/builds/id:588",
     "RavenDB",
     588,
     "project3",
     2509
 ]

It reduced the document size to 2.93KB! Awesome! That's roughly a third smaller than the original. Except – this is actually generating an utterly unreadable mess. Can you look at this and figure out what the hell is going on?

I thought not. At this point, we might as well use a binary format. I happen to have a zip tool at my disposal, so I checked what would happen if I threw this JSON through that. The end result was a 1.42KB file, and I had no more loss of readability than I did with the JSONH code.
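For anyone who wants to try the same comparison, here is a rough Python sketch. It uses `gzip` from the standard library as a stand-in for the zip tool, and the `document` below is an abbreviated, made-up stand-in for the build document, so the byte counts it prints will not match the 4.52KB and 1.42KB figures above.

```python
import gzip
import json

# Hypothetical stand-in for the build document: a few top-level fields plus a
# list of changes that repeat the same keys, as in the RavenDB example.
document = {
    "BuildName": "RavenDB Unstable v2.5",
    "IsUnstable": True,
    "Version": "2509-Unstable",
    "Changes": [
        {
            "Commiter": {"Email": "ayende@ayende.com", "Name": "Ayende Rahien"},
            "Href": f"/app/rest/changes/id:{change_id}",
            "TeamCityId": change_id,
            "Username": "ayende rahien",
            "Files": ["Raven.Abstractions/Extensions/MetadataExtensions.cs"],
        }
        for change_id in range(21035, 21042)
    ],
}

# Serialize as ordinary, indented, readable JSON, then compress the bytes.
raw = json.dumps(document, indent=2).encode("utf-8")
compressed = gzip.compress(raw)

# Decompressing gives back exactly the readable JSON we started with.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(f"plain JSON: {len(raw)} bytes")
print(f"gzipped:    {len(compressed)} bytes")
```

The repeated property names are exactly the kind of redundancy a general-purpose compressor removes, and the document that comes back out is still plain JSON.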

To be frank, I just don’t get efforts like this. JSON is a text-based, human-readable format. If you lose the human-readable portion of the format, you might as well drop directly to binary. It is likely to be more efficient, and you don’t lose anything by doing it.
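As a rough illustration of what dropping to binary could look like, here is a small Python sketch that encodes a few of the build's fields with the standard `struct` module. The layout (two unsigned 32-bit integers, a boolean, then a length-prefixed UTF-8 string) and the `encode_build` / `decode_build` names are assumptions made up for this example, not a format that RavenDB or anyone else actually uses.

```python
import json
import struct

# Assumed layout: TeamCityId (uint32), Number (uint32), IsUnstable (bool),
# length of Version in bytes (uint16), little-endian with no padding.
HEADER = "<II?H"


def encode_build(build):
    """Encode a few build fields into a compact binary record."""
    version = build["Version"].encode("utf-8")
    header = struct.pack(
        HEADER,
        build["TeamCityId"],
        build["Number"],
        build["IsUnstable"],
        len(version),
    )
    return header + version


def decode_build(data):
    """Decode a record produced by encode_build() back into a dict."""
    team_city_id, number, is_unstable, length = struct.unpack_from(HEADER, data, 0)
    start = struct.calcsize(HEADER)
    version = data[start:start + length].decode("utf-8")
    return {
        "TeamCityId": team_city_id,
        "Number": number,
        "IsUnstable": is_unstable,
        "Version": version,
    }


build = {
    "TeamCityId": 588,
    "Number": 2509,
    "IsUnstable": True,
    "Version": "2509-Unstable",
}
encoded = encode_build(build)
print(len(json.dumps(build)), "characters of JSON vs", len(encoded), "bytes of binary")
```

Neither the packed JSON nor this record can be read without knowing the layout, but the binary one is at least honest about it.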

If you want to compress your data, it is probably better to use something like a compression tool. HTTP Compression, for example, is practically free, since all servers and clients should be able to consume it now. And any tool that you use should be able to see through it. Plus, it's likely to generate much better results from your JSON documents than if you try a clever format like the one generated by JSONH.
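Here is a simplified Python sketch of how HTTP compression fits around a JSON payload: the server compresses only when the client's `Accept-Encoding` header offers gzip, and marks the body with `Content-Encoding`. The `build_response` and `read_response` helpers are hypothetical, there is no real network involved, and quality values such as `gzip;q=0` are deliberately ignored to keep it short.

```python
import gzip
import json


def build_response(payload, accept_encoding):
    """Return (headers, body) for a JSON payload, gzipped if the client allows it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    # Simplification: look only at the encoding names, not their q-values.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


def read_response(headers, body):
    """Undo the content encoding (if any) and parse the JSON body."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


payload = {"ProjectName": "RavenDB", "TeamCityId": 588, "Number": 2509}

gzip_headers, gzip_body = build_response(payload, "gzip, deflate")
plain_headers, plain_body = build_response(payload, "")

print(gzip_headers)
print(read_response(gzip_headers, gzip_body))
```

In practice the web server and the HTTP client library usually do both halves of this for you, which is the point: the application keeps producing and consuming ordinary JSON.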




Published at DZone with permission of Ayende Rahien, author and DZone MVB. (source)


Comments

Michail Argyriou replied on Wed, 2013/09/18 - 7:53am

Good point about JSON compression!

About large XML files and their usage: XML is not intended to be read only by humans, but mainly to be a portable data transfer protocol consumed by code. No human will ever read it! So I agree that large XML files can be confusing to humans, but it doesn't matter. As an addendum, think of WSDL files: they are large and (almost) nobody can read or write them by hand, but we have tools that generate and consume them!



Simon Bird replied on Wed, 2013/09/25 - 7:26am in response to: Michail Argyriou

I've worked with XML for over 10 years and I've yet to see any instance of an XML document that contains the element types (e.g. xsi:type="string") that the author has used in his example.

So if the intent was to show how poor XML can be then it's worked but no one in their right mind would build an XML document such as the example given.  It does also contain unused and duplicated namespace definitions.  

We can all make up examples to make our point but this is a lazy way of making the point.

I suspect XML has got its poor reputation not because of itself but because few people design the schema that describes the document in a way that is both optimal and useful. The same can happen with JSON: without design you will still get documents that are poorly structured and difficult to understand.

Compression has its place but there's no excuse for not putting in the effort to design the data model that either XML or JSON is there to support in the first place.

Ed Griebel replied on Wed, 2013/09/25 - 9:41am

XML definitely has issues, but using a SOAP request as the example of "bad XML" is unfair as the SOAP protocol is almost universally derided as the epitome of the bad things that happen when you have a committee designing a specification.  

The continued popularity of XML is precisely that it is human- and machine-readable. Without any other explanation one could guess what this chunk of XML represents:

<trade>
  <block>
    <name>IBM</name>
    <price>188</price>
    <shares>3000</shares>
  </block>
  <block>
    <name>AMZN</name>
    <price>315</price>
    <shares>1000</shares>
  </block>
  <time>2013-09-30T14:15:00Z</time>
  <transid>8675309</transid>
  <transtype>BUY</transtype>
</trade>

Try the same thing with an object expressed in a Windows WCF binary RPC or CORBA IDL instance. 

Somenath Mukhop... replied on Fri, 2013/09/27 - 5:45am

How about converting XML directly into protocol buffer format?

Robert StJohn replied on Fri, 2013/09/27 - 9:32am

Consider that your interpretation and expectations of the term "readable" might be the cause of your consternation.  I think what "human readable" should really mean to you is that you are able to easily generate and interact with the content in a text editor for purposes of testing and debugging.  A binary format will not allow you to do the same without special tools.  I agree that sending any text-based data over the wire should include compression, but that's less a binary format than it is an extra layer wrapping the content, which is easily unwrapped for human readability with standard tools.  Any large, rich data entity will be difficult to view, but I'd much rather have the ability to consume it directly with vi, less, sed, etc. instead of using some proprietary binary format that defines its grammar in terms of binary symbols and looks like hieroglyphs in an editor.

Robert StJohn replied on Fri, 2013/09/27 - 9:41am in response to: Simon Bird

I whole-heartedly agree, Simon.  XML's biggest problem is lazy users.  Instead of taking the time to actually understand the format and design useful grammars, like you said, what we have is a world full of tragic tools like JAX-RPC and its successors that do a poor job at best of auto-generating grammars and documents and taking advantage of the full richness XML can offer.

Evgeniy Karyakin replied on Fri, 2013/10/04 - 3:04am

Is the W3C's EXI direction effectively dead?
