
Kristina Chodorow is a core contributor to MongoDB. She has written several O'Reilly books (MongoDB: The Definitive Guide, Scaling MongoDB, and 50 Tips and Tricks for MongoDB Developers) and has given talks at conferences around the world, including OSCON, FOSDEM, Latinoware, TEK·X, and YAPC. Her Twitter handle is @kchodorow. Kristina is a DZone MVB, is not an employee of DZone, and has posted 52 posts at DZone.

The Rise of Big Data

10.11.2013

I was once helping a MongoDB user with sharding. His chunks weren’t splitting and I was trying to diagnose the issue. His shard key looked reasonable, he didn’t have any errors in his log, and manually splitting the chunks worked. Finally, I looked at how much data he was storing: only a few MB per chunk. “Oh, I see the problem,” I told him. “It looks like your chunks are too small to split. You just need more data.”

“No, my data is huge, enormous,” he said.

“Um, okay. If you keep inserting data, it should split.”

“This is a bug. My data is big.”

We argued back and forth a bit, but I managed to back off from having called his data small and convince him it wasn’t a bug. That day, I learned that people take their data size very personally.


Published at DZone with permission of Kristina Chodorow, author and DZone MVB. (source)
