NoSQL Zone is brought to you in partnership with:

Antoine works as a Technical Manager at MongoDB Inc, helping companies architect systems with MongoDB. Previously he worked as a kernel engineer on MongoDB core and the Java driver. He also has spent many years in the CDN industry, at Panther CDN then CDNetworks, designing and developing one of the largest and fastest Content Delivery and Application Acceleration Network. Prior work include the development of a new network protocol in Linux kernel to multiplex several interfaces, used in wireless devices. Antoine received a MS degree in Computer Science from Stevens Institute (Hoboken), and a BS in Computer Science from Epita (Paris France). Antoine is a DZone MVB and is not an employee of DZone and has posted 8 posts at DZone. You can read more from them at their website. View Full User Profile

How to Use MongoDB as a Pure In-memory DB (Redis Style)

10.28.2013
| 16239 views |
  • submit to reddit

The Idea

There has been a growing interest in using MongoDB as an in-memory database, meaning that the data is not stored on disk at all. This can be super useful for applications like:

  • a write-heavy cache in front of a slower RDBMS system
  • embedded systems
  • PCI compliant systems where no data should be persisted
  • unit testing where the database should be light and easily cleaned

That would be really neat indeed if it was possible: one could leverage the advanced querying / indexing capabilities of MongoDB without hitting the disk. As you probably know the disk IO (especially random) is the system bottleneck in 99% of cases, and if you are writing data you cannot avoid hitting the disk.

One sweet design choice of MongoDB is that it uses memory-mapped files to handle access to data files on disk. This means that MongoDB does not know the difference between RAM and disk, it just accesses bytes at offsets in giant arrays representing files and the OS takes care of the rest! It is this design decision that allows MongoDB to run in RAM with no modification.

How it is done

This is all achieved by using a special type of filesystem called tmpfs. Linux will make it appear as a regular FS but it is entirely located in RAM (unless it is larger than RAM in which case it can swap, which can be useful!). I have 32GB RAM on this server, let’s create a 16GB tmpfs:

# mkdir /ramdata
# mount -t tmpfs -o size=16000M tmpfs /ramdata/
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvde1             5905712   4973924    871792  86% /
none                  15344936         0  15344936   0% /dev/shm
tmpfs                 16384000         0  16384000   0% /ramdata

Now let’s start MongoDB with the appropriate settings. smallfiles and noprealloc should be used to reduce the amount of RAM wasted, and will not affect performance since it’s all RAM based. nojournal should be used since it does not make sense to have a journal in this context!

dbpath=/ramdata
nojournal = true
smallFiles = true
noprealloc = true

After starting MongoDB, you will find that it works just fine and the files are as expected in the FS:

# mongo
MongoDB shell version: 2.3.2
connecting to: test
> db.test.insert({a:1})
> db.test.find()
{ "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }

# ls -l /ramdata/
total 65684
-rw-------. 1 root root 16777216 Apr 30 15:52 local.0
-rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root        5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root 16777216 Apr 30 15:52 test.0
-rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root       40 Apr 30 15:52 _tmp

Now let’s add some data and make sure it behaves properly. We will create a 1KB document and add 4 million of them:

> str = ""

> aaa = "aaaaaaaaaa"
aaaaaaaaaa
> for (var i = 0; i < 100; ++i) { str += aaa; }

> for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str});}
> db.foo.stats()
{
        "ns" : "test.foo",
        "count" : 4000000,
        "size" : 4544000160,
        "avgObjSize" : 1136.00004,
        "storageSize" : 5030768544,
        "numExtents" : 26,
        "nindexes" : 1,
        "lastExtentSize" : 536600560,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 129794000,
        "indexSizes" : {
                "_id_" : 129794000
        },
        "ok" : 1
}

The document average size is 1136 bytes and it takes up about 5GB of storage. The index on _id takes about 130MB. Now we need to verify something very important: is the data duplicated in RAM, existing both within MongoDB and the filesystem? Remember that MongoDB does not buffer any data within its own process, instead data is cached in the FS cache. Let’s drop the FS cache and see what is in RAM:

# echo 3 > /proc/sys/vm/drop_caches 
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6292780   24397096          0       1044    5817368
-/+ buffers/cache:     474368   30215508
Swap:            0          0          0

As you can see there is 6.3GB of used RAM of which 5.8GB is in FS cache (buffers). Why is there still 5.8GB of FS cache even after all caches were dropped?? The reason is that Linux is smart and it does not duplicate the pages between tmpfs and its cache… Bingo! That means your data exists with a single copy in RAM. Let’s access all documents and verify RAM usage is unchanged:

> db.foo.find().itcount()
4000000

# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6327988   24361888          0       1324    5818012
-/+ buffers/cache:     508652   30181224
Swap:            0          0          0
# ls -l /ramdata/
total 5808780
-rw-------. 1 root root  16777216 Apr 30 15:52 local.0
-rw-------. 1 root root  16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root         5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root  16777216 Apr 30 16:00 test.0
-rw-------. 1 root root  33554432 Apr 30 16:00 test.1
-rw-------. 1 root root 536608768 Apr 30 16:02 test.10
-rw-------. 1 root root 536608768 Apr 30 16:03 test.11
-rw-------. 1 root root 536608768 Apr 30 16:03 test.12
-rw-------. 1 root root 536608768 Apr 30 16:04 test.13
-rw-------. 1 root root 536608768 Apr 30 16:04 test.14
-rw-------. 1 root root  67108864 Apr 30 16:00 test.2
-rw-------. 1 root root 134217728 Apr 30 16:00 test.3
-rw-------. 1 root root 268435456 Apr 30 16:00 test.4
-rw-------. 1 root root 536608768 Apr 30 16:01 test.5
-rw-------. 1 root root 536608768 Apr 30 16:01 test.6
-rw-------. 1 root root 536608768 Apr 30 16:04 test.7
-rw-------. 1 root root 536608768 Apr 30 16:03 test.8
-rw-------. 1 root root 536608768 Apr 30 16:02 test.9
-rw-------. 1 root root  16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root        40 Apr 30 16:04 _tmp
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvde1             5905712   4973960    871756  86% /
none                  15344936         0  15344936   0% /dev/shm
tmpfs                 16384000   5808780  10575220  36% /ramdata

And that verifies it! :)

What about replication?

You probably want to use replication since a server loses its RAM data upon reboot! Using a standard replica set you will get automatic failover and more read capacity. If a server is rebooted MongoDB will automatically rebuild its data by pulling it from another server in the same replica set (resync). This should be fast enough even in cases with a lot of data and indices since all operations are RAM only :)

It is important to remember that write operations get written to a special collection called oplog which resides in the local database and takes 5% of the volume by default. In my case the oplog would take 5% of 16GB which is 800MB. In doubt, it is safer to choose a fixed oplog size using the oplogSize option. If a secondary server is down for a longer time than the oplog contains, it will have to be resynced. To set it to 1GB, use:

oplogSize = 1000

What about sharding?

Now that you have all the querying capabilities of MongoDB, what if you want to implement a large service with it? Well you can use sharding freely to implement a large scalable in-memory store. Still the config servers (that contain the chunk distribution) should be disk based since their activity is small and rebuilding a cluster from scratch is not fun.

What to watch for

RAM is a scarce resource, and in this case you definitely want the entire data set to fit in RAM. Even though tmpfs can resort to swapping the performance would drop dramatically. To make best use of the RAM you should consider:

  • usePowerOf2Sizes option to normalize the storage buckets
  • run a compact command or resync the node periodically.
  • use a schema design that is fairly normalized (avoid large document growth)

Conclusion

Sweet, you can now use MongoDB and all its features as an in-memory RAM-only store! Its performance should be pretty impressive: during the test with a single thread / core I was achieving 20k writes per second, and it should scale linearly over the number of cores.










Published at DZone with permission of Antoine Girbal, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)