How MySQL Can Finally Kick Some MongoDB ****
If you have followed, or have tried to follow, my different attempts at
getting Key-Value Store, in this case represented by MongoDB, from MySQL
on a single machine with all data in RAM. I have not been very
successful so far, but many smart people, way smarter than yours truly,
has been giving me suggestions for things to try to get MySQL closer to
the performance of MongoDB, where MongoDB did some 110 k row reads per
second, whereas MySQL was at best reading some 43 k rows per second
(using the HEAP / MEMORY storage engine) and 46 k row reads per second
(using NDB and without CLIENT_COMPRESS). Note that not all combinations
has been tested, so it would be reasonably safe to assume that using the
HEAP / MEMORY storage engine and excluding the memory storage engine,
would be even faster than the 43 k rows when using CLIENT_COMPRESS.
As I could see that the CPU load on mysqld was very high, and as everything is in memory and hence there is no disk I/O, then we were probably CPU bound here. So bypassing mysqld when using MySQL Cluster, directly accessing NDBAPI did get me much better performance, some 90 k row reads, but that is still behind MongoDB.
Among the suggestions of what the bottleneck in MySQL was, was Parsing and optimization. Also, the question of how big the client / server overhead was. I have already figured out that using the compress client protocol actually limited performance quite a bit here, which might not be that much of a surprise, but that it affected performance by as much as 25 - 30 % as I found was surprising.
Anyway, the easiest way to see how much the client / server overhead was, was to run with libmysqld, the MySQL Embedded Library. Using libmysqld required some changes in my benchmark program, as I wanted to make sure that data was in memory before starting the test. Having done that I then commenced to test with using InnoDB in libmysqld. And hey, now we see some very different results! 115 k row reads per second! Yes, faster than MongoDB!
The documentation for libmysqld claims that the embedded InnoDB engine is not reentrant, but I run multiple connections here (in multiple threads), and it seems to work just fine. But this might be a mistake.
If this is truly so, I'm not 100% sure about these results yet, I'd really like to see someone else testing something similar with "normal" mysqld vs. libmysqld, then we know a few things:
Another conclusion is that there would be a lot to potentially gain from improving the client / server protocol here. And some of you out there might say that my testcase is a bit extreme, having all data in RAM and all that, but I am not so sure that is the case, if you are running in a cloud, getting a lot of RAM is not that hard, but getting real fast disk-I/O is, at least right now and at reasonable price, i.e. adding RAM is less expensive than adding more powerful I/O.
The next cool thing to try then: Using libmysqld with NDB, an option has has been around for a while now, but I have not seen it documented much, which is strange as libmysqld really should server NDB much better than many other storage engines (as the mysqld / libmysqld component, when using MySQL Cluster, is largely stateless).
/Karlsson
Once again playing with libmysqld, and still thinking this is a "forgotten" technology with some interesting aspects to it.
Published at DZone with permission of Anders Karlsson, author and DZone MVB. (source)As I could see that the CPU load on mysqld was very high, and as everything is in memory and hence there is no disk I/O, then we were probably CPU bound here. So bypassing mysqld when using MySQL Cluster, directly accessing NDBAPI did get me much better performance, some 90 k row reads, but that is still behind MongoDB.
Among the suggestions of what the bottleneck in MySQL was, was Parsing and optimization. Also, the question of how big the client / server overhead was. I have already figured out that using the compress client protocol actually limited performance quite a bit here, which might not be that much of a surprise, but that it affected performance by as much as 25 - 30 % as I found was surprising.
Anyway, the easiest way to see how much the client / server overhead was, was to run with libmysqld, the MySQL Embedded Library. Using libmysqld required some changes in my benchmark program, as I wanted to make sure that data was in memory before starting the test. Having done that I then commenced to test with using InnoDB in libmysqld. And hey, now we see some very different results! 115 k row reads per second! Yes, faster than MongoDB!
The documentation for libmysqld claims that the embedded InnoDB engine is not reentrant, but I run multiple connections here (in multiple threads), and it seems to work just fine. But this might be a mistake.
If this is truly so, I'm not 100% sure about these results yet, I'd really like to see someone else testing something similar with "normal" mysqld vs. libmysqld, then we know a few things:
- The overhead of MySQL parsing and optimization is not even close to as high as much as one might think (yes, I was also thinking that all this CPU overhead might be due to SQL parsing etc).
- There is something in the MySQL Client / Server protocol that one should look at (note that I was testing with Unix Domain sockets, and also with the normal TCP/IP network, and sockets was faster but not much. So this is not due to some network issues on my machine, and also the same network was using for testing with MongoDB).
Another conclusion is that there would be a lot to potentially gain from improving the client / server protocol here. And some of you out there might say that my testcase is a bit extreme, having all data in RAM and all that, but I am not so sure that is the case, if you are running in a cloud, getting a lot of RAM is not that hard, but getting real fast disk-I/O is, at least right now and at reasonable price, i.e. adding RAM is less expensive than adding more powerful I/O.
The next cool thing to try then: Using libmysqld with NDB, an option has has been around for a while now, but I have not seen it documented much, which is strange as libmysqld really should server NDB much better than many other storage engines (as the mysqld / libmysqld component, when using MySQL Cluster, is largely stateless).
/Karlsson
Once again playing with libmysqld, and still thinking this is a "forgotten" technology with some interesting aspects to it.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





