
I am a software architect passionate about software integration, high scalability and concurrency challenges. Vlad is a DZone MVB and is not an employee of DZone and has posted 64 posts at DZone. You can read more from them at their website.

A Beginner’s Guide to MongoDB Performance Turbocharging

01.20.2014

Introduction

This is the second part of our MongoDB time series tutorial, and this post is dedicated to performance tuning. In my previous post, I introduced you to our virtual project requirements.

In short we have 50M time events, spanning from the 1st of January 2012 to the 1st of January 2013, with the following structure:

{
    "_id" : ObjectId("52cb898bed4bd6c24ae06a9e"),
    "created_on" : ISODate("2012-11-02T01:23:54.010Z"),
    "value" : 0.19186609564349055
}

We’d like to aggregate the minimum, the maximum, and the average value as well as the entries count for the following discrete time samples:

  1. all seconds in a minute
  2. all minutes in an hour
  3. all hours in a day

This is what our base test script looks like:

var testFromDates = [
    new Date(Date.UTC(2012, 5, 10, 11, 25, 59)),
    new Date(Date.UTC(2012, 7, 23, 2, 15, 7)),
    new Date(Date.UTC(2012, 9, 25, 7, 18, 46)),
    new Date(Date.UTC(2012, 1, 27, 18, 45, 23)),
    new Date(Date.UTC(2012, 11, 12, 14, 59, 13))
];
 
function testFromDatesAggregation(matchDeltaMillis, groupDeltaMillis, type, enablePrintResult) {
    var aggregationTotalDuration = 0;
    var aggregationAndFetchTotalDuration = 0;
    testFromDates.forEach(function(testFromDate) { 
        var timeInterval = calibrateTimeInterval(testFromDate, matchDeltaMillis);
        var fromDate = timeInterval.fromDate;
        var toDate = timeInterval.toDate;
        var duration = aggregateData(fromDate, toDate, groupDeltaMillis, enablePrintResult);
        aggregationTotalDuration += duration.aggregationDuration;
        aggregationAndFetchTotalDuration += duration.aggregationAndFetchDuration;      
    });
    print(type + " aggregation took:" + aggregationTotalDuration/testFromDates.length + "s");
    if(enablePrintResult) {
        print(type + " aggregation and fetch took:" + aggregationAndFetchTotalDuration/testFromDates.length + "s");
    }
}

And this is how we are going to test our three use cases:

testFromDatesAggregation(ONE_MINUTE_MILLIS, ONE_SECOND_MILLIS, 'One minute seconds');
testFromDatesAggregation(ONE_HOUR_MILLIS, ONE_MINUTE_MILLIS, 'One hour minutes');
testFromDatesAggregation(ONE_DAY_MILLIS, ONE_HOUR_MILLIS, 'One day hours');

We use five start timestamps. For each one, the time interval under test is derived by aligning the timestamp to the requested time granularity.

The first timestamp (i.e. T1) is Sun Jun 10 2012 14:25:59 GMT+0300 (GTB Daylight Time) and the associated time intervals under test are:

  1. all seconds in a minute:
    [ Sun Jun 10 2012 14:25:00 GMT+0300 (GTB Daylight Time)
    , Sun Jun 10 2012 14:26:00 GMT+0300 (GTB Daylight Time) )
  2. all minutes in an hour:
    [ Sun Jun 10 2012 14:00:00 GMT+0300 (GTB Daylight Time)
    , Sun Jun 10 2012 15:00:00 GMT+0300 (GTB Daylight Time) )
  3. all hours in a day:
    [ Sun Jun 10 2012 03:00:00 GMT+0300 (GTB Daylight Time)
    , Mon Jun 11 2012 03:00:00 GMT+0300 (GTB Daylight Time) )
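The calibrateTimeInterval function and the ONE_*_MILLIS constants are used by the test script but are not listed in this post. The sketch below is one plausible reconstruction, assuming the function simply floors the timestamp to a multiple of the interval length (counted from the Unix epoch, so in UTC) and returns a half-open interval. This assumption reproduces the three T1 intervals listed above.

```javascript
// Assumed definitions: the original constants are not shown in this post.
var ONE_SECOND_MILLIS = 1000;
var ONE_MINUTE_MILLIS = 60 * ONE_SECOND_MILLIS;
var ONE_HOUR_MILLIS = 60 * ONE_MINUTE_MILLIS;
var ONE_DAY_MILLIS = 24 * ONE_HOUR_MILLIS;

// Assumed reconstruction: floor the timestamp to a multiple of deltaMillis
// and return the half-open interval [fromDate, toDate) that contains it.
// Written for timestamps after the Unix epoch, as in this tutorial.
function calibrateTimeInterval(timestamp, deltaMillis) {
    var millis = timestamp.getTime();
    var fromMillis = millis - (millis % deltaMillis);
    return {
        fromDate: new Date(fromMillis),
        toDate: new Date(fromMillis + deltaMillis)
    };
}

// T1 from the test script: 2012-06-10 11:25:59 UTC (14:25:59 GMT+0300).
var t1 = new Date(Date.UTC(2012, 5, 10, 11, 25, 59));
var minuteInterval = calibrateTimeInterval(t1, ONE_MINUTE_MILLIS);
var hourInterval = calibrateTimeInterval(t1, ONE_HOUR_MILLIS);
var dayInterval = calibrateTimeInterval(t1, ONE_DAY_MILLIS);

console.log(minuteInterval.fromDate.toISOString()); // 2012-06-10T11:25:00.000Z
console.log(dayInterval.fromDate.toISOString());    // 2012-06-10T00:00:00.000Z
```

Because the flooring is done on epoch milliseconds, the day interval starts at midnight UTC, which is why it shows up as 03:00:00 in the GMT+0300 listing.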

Cold database testing

The first tests are run on a freshly started MongoDB instance. We restart the database between tests, so that no index or data is already loaded in memory.

Type    | seconds in a minute | minutes in an hour | hours in a day
T1      | 0.02s               | 0.097s             | 1.771s
T2      | 0.01s               | 0.089s             | 1.366s
T3      | 0.02s               | 0.089s             | 1.216s
T4      | 0.01s               | 0.084s             | 1.135s
T5      | 0.02s               | 0.082s             | 1.078s
Average | 0.016s              | 0.088s             | 1.3132s

We are going to use these results as a baseline for the optimization techniques presented next.

Warm database testing

Warming up indexes and data is a common technique, used for both SQL and NoSQL database management systems. MongoDB offers the touch command for this purpose. But it is no magic wand: you can't use it blindly and hope to leave all your performance problems behind. Misuse it and your database performance will drop drastically, so be sure you understand your data and how it is used.

The touch command lets us specify what we want to preload:

  • data
  • indexes
  • both data and indexes
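The three options above map to the data and index flags of the touch command document. The sketch below only builds that document; the helper name buildTouchCommand and the collection name "randomData" are placeholders chosen for illustration, and whether the command is available depends on the MongoDB version and storage engine in use.

```javascript
// Hypothetical helper: builds the command document for MongoDB's touch command.
// preloadData    -> load the collection's documents into memory
// preloadIndexes -> load the collection's indexes into memory
function buildTouchCommand(collectionName, preloadData, preloadIndexes) {
    return {
        touch: collectionName,
        data: preloadData,
        index: preloadIndexes
    };
}

// "randomData" is a placeholder collection name.
var touchDataOnly = buildTouchCommand("randomData", true, false);
var touchIndexesOnly = buildTouchCommand("randomData", false, true);
var touchBoth = buildTouchCommand("randomData", true, true);

// In the mongo shell the document would be sent with runCommand, e.g.
// db.runCommand(touchBoth);
```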

To get the most out of data preloading, we need to analyze our data size and how we are going to query it.

Published at DZone with permission of Vlad Mihalcea, author and DZone MVB. (source)
