Mike is co-founder of Fiesta - https://fiesta.cc. Before Fiesta, he was a developer working on the MongoDB project. Mike is a DZone MVB and is not an employee of DZone and has posted 14 posts at DZone.

MongoDB Performance Tuning and Scalability

12.27.2011
This post was a live blog from the recent MongoSV conference.  Here’s a link to the entire series of posts.

Kenny is getting started, talking about performance tuning based on experience at Shutterfly. They have 8 MongoDB clusters in production with ~30 servers. Not cloud based: all own hardware and datacenters.

MongoDB performance tuning is similar to traditional RDBMS tuning: looking at queries, indexes, etc. If performance isn’t good on a single server, then don’t look to sharding, reading from replicas, etc. Single server performance is critical.

Modeling is key. Schema design can be really important for performance (recommends talks later on by Eliot & Kyle).

Know when to stop tuning: prioritize what is important/adequate for the business/application. What needs to be fast? Build tuning into dev. lifecycle, don’t wait until there’s an issue. Tuning is “personal”: need to know your problem/domain.

MongoDB is really fast when read-only; writes start to impact performance. Important consideration during design phase.

The profiler. Writes to the db.system.profile collection. Recommendation is to turn it on and leave it on: low overhead. Look for full scans (nreturned vs nscanned) and updates: ideally you want fastmod (in-place updates), so look for moved & key updates.

Should graph response times over time (from the system.profile collection). Shows performance over time of db. To look at the profiling data just do `show profile` from the shell.
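To make the "graph response times" idea concrete, here is a minimal sketch that buckets profiler entries by minute and averages their `millis`. It assumes the entries have already been read out of system.profile into plain objects with `ts` and `millis` fields; the function name and the sample documents are made up for illustration, not taken from the talk.

```javascript
// Bucket profiler entries by minute and average their `millis`.
// Assumes each entry has `ts` (a Date) and `millis` (a number).
function averageMillisPerMinute(profileDocs) {
  const buckets = new Map();
  for (const doc of profileDocs) {
    // Truncate the timestamp to the start of its minute.
    const minute = Math.floor(doc.ts.getTime() / 60000) * 60000;
    const bucket = buckets.get(minute) || { total: 0, count: 0 };
    bucket.total += doc.millis;
    bucket.count += 1;
    buckets.set(minute, bucket);
  }
  // Return one point per minute, oldest first, ready to feed to a graphing tool.
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([minute, b]) => ({
      minute: new Date(minute).toISOString(),
      avgMillis: b.total / b.count,
      ops: b.count,
    }));
}

// Hand-written sample entries (hypothetical values).
const sampleProfile = [
  { ts: new Date("2011-12-09T10:00:05Z"), millis: 12 },
  { ts: new Date("2011-12-09T10:00:40Z"), millis: 48 },
  { ts: new Date("2011-12-09T10:01:10Z"), millis: 5 },
];

const series = averageMillisPerMinute(sampleProfile);
console.log(series);
```

Averages hide outliers, so a real graph would probably want a max or a percentile per bucket as well.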

Showing examples of data from the profiler: here’s an example where nscanned is 10000 and nreturned is 1: we need an index! Another example where the document had to be moved due to an update (keyword “moved” in the profile doc.). Now showing an example using $inc - you’ll see “fastmod” in the profile document - that’s good!
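The three cases above could be checked mechanically. This is a rough sketch, assuming profile documents carry `nscanned`, `nreturned`, `moved` and `fastmod` as described in the talk; the function name, the sample documents and the 100:1 scan ratio cutoff are arbitrary choices for illustration.

```javascript
// Flag the profiler patterns mentioned in the talk.
// scanRatioLimit is an arbitrary cutoff, not a MongoDB setting.
function classifyProfileEntry(doc, scanRatioLimit = 100) {
  const findings = [];
  // Avoid dividing by zero when nothing was returned.
  const returned = Math.max(doc.nreturned || 0, 1);
  if ((doc.nscanned || 0) / returned >= scanRatioLimit) {
    findings.push("likely missing index");
  }
  if (doc.moved) findings.push("document moved on update");
  if (doc.fastmod) findings.push("in-place update (fastmod)");
  return findings;
}

// Hand-written sample entries (hypothetical, shaped after the examples in the talk).
const fullScan = { op: "query", nscanned: 10000, nreturned: 1 };
const movedUpdate = { op: "update", moved: true };
const incUpdate = { op: "update", fastmod: true };

console.log(classifyProfileEntry(fullScan));
console.log(classifyProfileEntry(movedUpdate));
console.log(classifyProfileEntry(incUpdate));
```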

Now talking about explain(). Use during development, don’t wait. This actually runs the query when you call it. When you find a bad op using the profiler, run explain on it to get more info: shows index usage, yields, covered indexes, nscanned vs nreturned. Another recommendation: run explain() twice to see difference when data is in memory. Showing the difference between a query w/ and w/o an index in terms of explain.
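A small sketch of comparing two explain() results, one without an index and one with. It assumes the explain output of that era, with `cursor`, `nscanned`, `n` (documents returned) and `millis` fields; newer MongoDB versions report plans in a different shape. The two plan objects are hand-written stand-ins with made-up numbers, not real output, and the function name is hypothetical.

```javascript
// Reduce an explain() result to the numbers worth comparing.
// Assumes legacy-style fields: cursor, nscanned, n, millis.
function summarizeExplain(plan) {
  return {
    // Legacy plans name the cursor "BtreeCursor <index>" when an index is used.
    usesIndex: plan.cursor.startsWith("BtreeCursor"),
    // How many index entries/documents were scanned per document returned.
    scannedPerReturned: plan.nscanned / Math.max(plan.n, 1),
    millis: plan.millis,
  };
}

// Hand-written stand-ins for explain() output (hypothetical values).
const withoutIndex = { cursor: "BasicCursor", nscanned: 10000, n: 1, millis: 15 };
const withIndex = { cursor: "BtreeCursor userid_1", nscanned: 1, n: 1, millis: 0 };

console.log(summarizeExplain(withoutIndex));
console.log(summarizeExplain(withIndex));
```

Running the same comparison on the first and second explain() of one query would show the in-memory effect mentioned above through `millis` alone, since the scan counts stay the same.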

Now talking about covered indexes: need to do a projection that says we don’t need _id: `db.test.find({userid: 10}, {_id: 0, userid: 1})`. When you don’t need _id it’s possible to respond to the query using the index only.
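The rule can be written down as a quick check. This is a simplified sketch, not MongoDB's actual planner logic: it only handles inclusion-style projections on top-level fields and ignores things like multikey indexes. The function and parameter names are made up for illustration.

```javascript
// Could this query be answered from a single index, without fetching documents?
// indexKeys:   field names in the index, e.g. ["userid"]
// queryFields: field names used in the query document
// projection:  inclusion-style projection, e.g. { _id: 0, userid: 1 }
function canBeCovered(indexKeys, queryFields, projection) {
  const indexed = new Set(indexKeys);
  // _id comes back by default, so it must be excluded unless the index holds it.
  const idExcluded = projection._id === 0 || projection._id === false;
  if (!idExcluded && !indexed.has("_id")) return false;
  const projected = Object.keys(projection).filter(
    (field) => field !== "_id" && projection[field]
  );
  // With no included fields, every other field is returned: not coverable.
  if (projected.length === 0) return false;
  // Every field the query touches or returns must live in the index.
  return [...queryFields, ...projected].every((field) => indexed.has(field));
}

// The query from the talk, against an index on userid.
console.log(canBeCovered(["userid"], ["userid"], { _id: 0, userid: 1 }));
// Same query without excluding _id.
console.log(canBeCovered(["userid"], ["userid"], { userid: 1 }));
```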

Architecture tips: split on functional areas first to different replica set clusters, then worry about sharding those (possibly). Do reads off of slaves when you can, but be sure your app can handle inconsistent reads first. Also, use slaves for maintenance (index compaction, etc.). Move reports & backups to slaves, too. One mongod instance per machine: keeps things simple for introspection.

Emphasizing the importance of minimizing writes.

Now we’re talking about data locality. When you’re doing a query it’s best if the results are as dense as possible (as few blocks on disk). How do you maintain this? Here’s an example of how to see this: need to include `$diskLoc` in your query document, and finish with a `.showDiskLoc()` (analogous to `.explain()`).
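One rough way to put a number on "dense" is to count how many distinct disk blocks a result set touches. The sketch below assumes each returned document carries a `$diskLoc` value of the form `{ file, offset }`, and uses a hypothetical 4096-byte block size; the function name and sample values are made up, and a document that straddles two blocks is counted only once.

```javascript
// Count distinct on-disk blocks touched by a result set.
// Assumes $diskLoc looks like { file: <number>, offset: <bytes> }.
function countBlocks(results, blockSize = 4096) {
  const blocks = new Set();
  for (const doc of results) {
    const loc = doc.$diskLoc;
    // A block is identified by its data file and its block index in that file.
    blocks.add(`${loc.file}:${Math.floor(loc.offset / blockSize)}`);
  }
  return blocks.size;
}

// Hand-written sample results (hypothetical locations).
const dense = [
  { $diskLoc: { file: 0, offset: 8192 } },
  { $diskLoc: { file: 0, offset: 8400 } },
  { $diskLoc: { file: 0, offset: 9000 } },
];
const scattered = [
  { $diskLoc: { file: 0, offset: 8192 } },
  { $diskLoc: { file: 0, offset: 500000 } },
  { $diskLoc: { file: 1, offset: 8192 } },
];

console.log(countBlocks(dense), countBlocks(scattered));
```

The fewer blocks per query, the less physical I/O it needs when the data is not already in memory.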

Total performance is a function of write performance. Keep an eye on lock % and queue size: how much the DB is waiting for writes. A trick (for pre 2.0 when data > RAM) is to do read before write: spend more time in read lock rather than write lock. Tune for fastmods: reduce moves (maybe by pre-padding documents). Evaluate indexes for key changes, minimize # of indexes if unused. Look for places to do inserts instead of updates.
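The read-before-write trick is easy to sketch. The idea, as presented, is that the read pulls the document into memory under a read lock, so the update that follows has less to wait for while holding the write lock. Below, `coll` is any object with `findOne` and `update` methods in the style of a shell collection; the helper name and the stub used to exercise it are hypothetical.

```javascript
// Read the target document first, then apply the update.
function readThenUpdate(coll, query, change) {
  // Touch the document under a read lock so it is likely in memory...
  coll.findOne(query);
  // ...before the update takes the write lock.
  return coll.update(query, change);
}

// Stub collection that records call order (stands in for a real collection).
const calls = [];
const stubCollection = {
  findOne(query) {
    calls.push("findOne");
    return { _id: 1, views: 41 };
  },
  update(query, change) {
    calls.push("update");
  },
};

// $inc keeps the document the same size, so it also qualifies as a fastmod.
readThenUpdate(stubCollection, { _id: 1 }, { $inc: { views: 1 } });
console.log(calls);
```

The extra read is itself a cost, so this only makes sense in the situation the talk names: pre 2.0 with data larger than RAM.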

What about scaling reads? They scale easily if writes are tuned. Identify reads that can be performed on slaves. Make sure you have enough RAM for indexes - can check the mongostat “faults” column for cache misses. Minimize I/O per query (back to data locality).

Tools: mongostat (look for faults & lock % / queue length). currentOp() to see what’s waiting. mtop to get a picture of current session level information. iostat to see how much physical I/O is going on. Do load testing before going live. Use MMS (or some other monitoring system).
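For the currentOp() part, a minimal sketch of picking out what's waiting. It assumes the result has an `inprog` array whose entries carry `opid`, `op`, `ns`, `secs_running` and `waitingForLock`; the function name and the sample data are hand-written for illustration.

```javascript
// List operations that are waiting on a lock, longest-running first.
function waitingOps(currentOp, minSeconds = 0) {
  return currentOp.inprog
    .filter((op) => op.waitingForLock && (op.secs_running || 0) >= minSeconds)
    .map((op) => ({
      opid: op.opid,
      op: op.op,
      ns: op.ns,
      secs: op.secs_running || 0,
    }))
    .sort((a, b) => b.secs - a.secs);
}

// Hand-written stand-in for a currentOp() result (hypothetical values).
const sampleCurrentOp = {
  inprog: [
    { opid: 101, op: "update", ns: "app.users", secs_running: 4, waitingForLock: true },
    { opid: 102, op: "query", ns: "app.users", secs_running: 0, waitingForLock: false },
    { opid: 103, op: "insert", ns: "app.events", secs_running: 9, waitingForLock: true },
  ],
};

console.log(waitingOps(sampleCurrentOp));
```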

What if you still need more performance after doing all of this tuning? One option is to use SSDs. Shutterfly uses Facebook’s flashcache: kernel module to cache data on SSD. Designed for MySQL/InnoDB. SSD in front of a disk, but exposed as a single mount point. This only makes sense when you have lots of physical I/O. Shutterfly saw a speedup of 500% w/ flashcache. A benefit is that you can delay sharding: less complexity.

Source:  http://blog.fiesta.cc/post/13976616772/mongosv-live-blog-performance-tuning-and-scalability

Published at DZone with permission of Mike Dirolf, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Robert Craft replied on Tue, 2011/12/27 - 2:13pm

Reading this I am quite positively amazed. Sometimes showing other competing projects down using their drawbacks is a low hanging and dirty fruit often used by several people and projects. This comparison is maybe diplomatic for critics but to me it beautifully describes what each platform is capable of so we can make an informed decision (provided you look for updated documentation on several points). Instead of one sided "we are better than the rest and we know better" approach many have taken today, this is a refreshing welcome.

Mitch Pronschinske replied on Thu, 2011/12/29 - 10:36am in response to: Robert Craft

I agree, Robert.  There's not enough attention given to those who write level-headed posts and too much given to the more inflammatory posts these days.  I was interested in talking about some topics with you so I sent an email to your account's email.  You can also send me an email at mpron[at]dzone[dot]com if you don't get the one I sent.

-Mitch Pronschinske

DZone Senior Curator
