Big Data/Analytics Zone is brought to you in partnership with:

Ayende Rahien is working for Hibernating Rhinos LTD, a Israeli based company producing developer productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler(l2sprof.com), Entity Framework Profiler (efprof.com) and more. Ayende is a DZone MVB and is not an employee of DZone and has posted 461 posts at DZone. You can read more from them at their website. View Full User Profile

Reviewing Lightning Memory-Mapped Database Library: Partial

07.12.2013
| 3439 views |
  • submit to reddit

Continuing in my quest to learn more, I decided to go over the LMDB codebase.

LMDB is:

LMDB is an ultra-fast, ultra-compact key-value data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases.

It has some interesting feature set, and a really small codebase. I am anxious to see how they managed to do so much.

Interestingly, the data model for LMDB is quite different from the usual append-only / transaction log. Instead, it allows only a single concurrent writer, and modify the data in place. There also appears to be a lot of dire warnings regarding usage of long transactions, since they would result in increased file size, presumably because the db couldn’t find the pages the scavenge because they are locked by ongoing transactions.

One thing that I should note already, the code (I am currently about 1/3 of the way of lmdb.h file) is very well commented, and it explains a lot about what is going on. If the rest of the code is like that, this is going to be really nice to read. Okay, I take it back. The main files seems to be lmdb.h and mdb.c. The first one is 1,300 lines and the second is over 7,500 lines. Admittedly, this is impressive in the sense that pretty much everything is done there, and there are a lot of docs. But damn, I wish this was better organized. Right now my head feels like I need to pop up and take a breath.

I read ~2,000 lines of code so far, and I haven’t found anything that does something. It is all headers or comments or macros up until now. I skipped over to the code, and it is really hard to understand what is going on.

Take this code snippet:

image

It shows a lot of the things that make it hard to work with. effectively random naming convention (under_score, Pascal_naming, SHOUT_NAMING, etc), the goto trick is used WAY too much, lovely variable names such as n2. This is part of a method that goes on for something like 150 – 200 lines. And it include the following code:

image

Quick, can you tell me how many variables are declared here? And note that they are all local variables. I counted it twice, once getting to 17 and once getting 18.

That is just too much, I am not going to go any deeper. The leveldb codebase was easy to follow, it had structure. This codebase is just a code dump. It might be a really good codebase, for what it needs to do, but I literally can’t follow it.

Published at DZone with permission of Ayende Rahien, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)