
Francesca is the Community Marketing Manager for MongoDB at 10gen. In a former life, she hacked on C++ and Java. She currently plays with Ruby on Rails and is learning to love Node.js. She blogs and tweets with varying consistency. Francesca has posted 2 posts at DZone.

BSON and Data Interchange

12.29.2011

There are a lot of good things about JSON: it’s a standards-based, language-independent representation of object-like data. It’s also easy to read, for users and programmers alike. Each document is only about data, not complex object graphs and links, so it’s easy to inspect without knowing all of an application’s code.

This article was originally authored by Spencer Brody, a 10gen Software Engineer, and Dwight Merriman, a MongoDB Core Contributor and the founder and CEO of 10gen.


Further, JSON is “schemaless”: we do not have to predefine our (protocol) schema. This can be quite helpful. Imagine RPC’ing data from client A to server B with a fixed schema for the messages: on a schema change, both need to be ‘updated’ with the new schema, and if there are many components in the system, it’s even more complicated. There is some analogy here to XML, which can (optionally) be schemaless.

It would be nice to have a binary representation of JSON. That is what BSON is all about.

So what are the goals of BSON? They are:

  1. Fast scan-ability. For very large JSON documents, scanning can be slow. To skip a nested document or array, we have to scan through the intervening field completely, and as we go we must count nestings of braces, brackets, and quotation marks. In BSON, the size of these elements is stored at the beginning of the field’s value, which makes skipping an element easy.
  2. Easy manipulation. We want to be able to modify information within a document efficiently. For example, incrementing a field’s value from 9 to 10 in a JSON text document makes the value one character longer, which might require shifting all the bytes for the remainder of the document; if the document is large, that could be quite costly. (This benefit is not comprehensive, though: adding an element to an array mid-document would still result in shifting.) It’s also helpful not to need to translate the string “9” to a numeric type, increment it, and then translate it back.
  3. Additional data types. JSON is potentially a great interchange format, but it would be nice to have a few more data types. Most important is the addition of a “byte array” data type, which avoids the awkward need to base64-encode binary data.
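To make the three goals concrete, here is a minimal sketch in Python, written against the BSON layout described on bsonspec.org and handling only strings, int32 values, generic binary data, and embedded documents. The helper names (`encode_document`, `find_value`, `increment_int32_in_place`) are hypothetical, for illustration only, and are not any driver's API. The sketch shows a field lookup that jumps over an embedded document by reading its size prefix instead of scanning its contents, an in-place increment that leaves the document size unchanged because the int32 is fixed-width, and raw bytes stored without a base64 step.

```python
import struct

# Minimal BSON sketch following the layout on bsonspec.org (little-endian).
# Handles only str, int (int32 range), bytes, and nested dict values.

def encode_document(doc: dict) -> bytes:
    """Encode a dict as a BSON document: int32 total size, elements, NUL."""
    body = b"".join(_encode_element(key, value) for key, value in doc.items())
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def _encode_element(name: str, value) -> bytes:
    key = name.encode("utf-8") + b"\x00"  # field names are NUL-terminated cstrings
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        # 0x10 = int32; struct.error is raised if value is outside int32 range
        return b"\x10" + key + struct.pack("<i", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # 0x02 = string: int32 byte length (including the NUL), then the bytes
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, bytes):
        # 0x05 = binary: int32 length, subtype (0x00 = generic), raw bytes
        return b"\x05" + key + struct.pack("<i", len(value)) + b"\x00" + value
    if isinstance(value, dict):
        # 0x03 = embedded document, which carries its own size prefix
        return b"\x03" + key + encode_document(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _value_size(buf, type_byte: int, pos: int) -> int:
    """Byte length of the value at pos, from fixed widths or length prefixes."""
    if type_byte == 0x10:
        return 4
    if type_byte == 0x02:
        return 4 + struct.unpack_from("<i", buf, pos)[0]
    if type_byte == 0x05:
        return 4 + 1 + struct.unpack_from("<i", buf, pos)[0]
    if type_byte == 0x03:
        return struct.unpack_from("<i", buf, pos)[0]
    raise ValueError(f"unsupported element type 0x{type_byte:02x}")


def find_value(buf, name: str):
    """Return (type byte, value offset) of a top-level field.

    Non-matching fields are skipped by size alone, so a large embedded
    document is jumped over without looking at its contents.
    """
    pos = 4  # skip the document's own size prefix
    end = struct.unpack_from("<i", buf, 0)[0] - 1  # offset of the trailing NUL
    while pos < end:
        type_byte = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        value_pos = name_end + 1
        if buf[pos + 1:name_end].decode("utf-8") == name:
            return type_byte, value_pos
        pos = value_pos + _value_size(buf, type_byte, value_pos)
    raise KeyError(name)


def increment_int32_in_place(buf: bytearray, name: str) -> None:
    """Add 1 to an int32 field by overwriting its 4 bytes; nothing shifts."""
    type_byte, offset = find_value(buf, name)
    if type_byte != 0x10:
        raise TypeError(f"{name!r} is not an int32")
    (current,) = struct.unpack_from("<i", buf, offset)
    struct.pack_into("<i", buf, offset, current + 1)


doc = bytearray(encode_document({
    "big": {"a": "x" * 1000, "b": "y" * 1000},  # skipped via its size prefix
    "blob": b"\x00\xff\x10",                     # raw bytes, no base64 step
    "count": 9,
}))
size_before = len(doc)
increment_int32_in_place(doc, "count")
_, offset = find_value(doc, "count")
print(struct.unpack_from("<i", doc, offset)[0], len(doc) == size_before)  # 10 True
```

In practice you would use an existing BSON library rather than hand-rolling this; the sketch omits validation, most element types, and arrays (which BSON stores as embedded documents keyed "0", "1", and so on).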


One thing that is not a goal of BSON: compactness. The JSON document {"field":7} represents the number seven as a single byte. That’s pretty good.
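A quick byte count makes the trade-off concrete. The Python sketch below encodes the example document both ways; the BSON bytes are assembled by hand following the layout on bsonspec.org, under the assumption that the value is stored as a 32-bit integer (a library could instead choose a 64-bit integer or a double, which would be larger still). The extra bytes come from BSON's 4-byte document length prefix, the element type tag, the NUL-terminated field name, the fixed-width 4-byte integer, and the closing NUL.

```python
import json
import struct

# JSON text for the example document, with no whitespace.
json_bytes = json.dumps({"field": 7}, separators=(",", ":")).encode("utf-8")

# The same document as BSON, assembled by hand per bsonspec.org, assuming 7 is
# stored as an int32 (element type 0x10).
element = b"\x10" + b"field\x00" + struct.pack("<i", 7)  # type tag + cstring name + 4-byte value
bson_bytes = struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"  # size prefix + element + NUL

print(len(json_bytes), json_bytes)  # 11 b'{"field":7}'
print(len(bson_bytes))              # 16
```

Those same length prefixes and fixed-width values are what make skipping and in-place updates cheap, so the extra bytes are a deliberate trade rather than an oversight.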

Perhaps the best example to date of BSON usage is MongoDB, which uses it heavily: for sending documents over the network, for persisting them to disk, and for internal data manipulation. In fact, this is where BSON originated, although today it is a separate spec that should not be considered coupled to any one particular project.

There is BSON serialization and deserialization code for most languages; implementations are open source and mostly available under the Apache 2 license. Quite a few implementations originated in MongoDB drivers; work is underway in most drivers to fully decouple them, although independent use works fine today.

If you or someone you know is using BSON in a project, please let us know by posting on the BSON mailing list. Check out bsonspec.org for more information.

Source:  http://blog.mongodb.org/post/9333386434/bson-and-data-interchange

Published at DZone with permission of its author, Francesca Krihely.


Comments

Kathy John replied on Thu, 2012/02/23 - 10:03am

BSON is very slow compared to MessagePack and produces significantly larger output. It does have some advantages; for example, it can natively encode dates, MD5s, UUIDs, and a few other structures. There may also be cases where BSON is better suited for database purposes, such as rewriting parts of a document without unpacking and repacking the whole structure, but I'm not qualified to judge that.

I've benchmarked BSON under JRuby to be around ten times as slow as MessagePack, and around five times as slow as JSON (all implementations are running more or less exclusively in Java code).
