The Doom of Multiple Storage Engines
One of the big “Selling Points” of MySQL is support for Multiple
Storage engines, and from the glance view it is indeed great to provide
users with same top level SQL interface allowing them to store their
data many different way. As nice as it sounds the in theory this
benefit comes at very significant cost in performance, operational and
development complexity. Let me touch just on the few complications it
brings in
Conversions – Each storage engine has its own data
storage format so MySQL has to do a lot of copying and conversions while
accessing the data, which significantly limits performance compared to
basically “zero copy” design one may have streaming data from memory
when it fits there.
Optimizer and Execution Storage engines are not
created equal especially if you look at In Memory storage engines vs
Distributed ones. Trying to handle all different cases in Optimizer –
such as in memory nature of MEMORY storage engine, clustering by Primary
key in Innodb and distributed nature of NDB makes it more complicated
when it should be and still because it has to serve so many use cases it
is not tuned to exploit full performance potential of any given storage
engine.
Synchronization The top level on MySQL side, such
as .frm files and binary log files has to be synchronized with storage
engine transactions which require a lot of complications and performance
overhead. MySQL has to do several fsync() calls per transaction commit
to be on the safe side and still number of operations are not fully
safe – you can for example get .frm out of sync with internal data
dictionary if you crash at wrong time.
Transaction Support We have to deal with mix of
locks on high level as well as on storage engine internally. We have to
deal with different locking strategies storage engines may have and if
we venture into using cross storage engine transactions we’re up for a
lot of surprises.
Backup Cross storage engine backup is a pain
because they are so different. Some of them are distributed and some in
memory so even OS level approaches such as LVM backup may not work in
all cases. We also get into synchronization aspects making backup which
prevents from having hot backup in many cases.
Replication Support for different storage engines
is a pain in replication. To start we have to write second binary log
file when all transaction engines have their own !
Quality Assurance Multiple storage engines are
much more complicated to test. And I believe multi-storage engine use is
really not tested very deeply. I’ve seen some tests being run for
different storage engines, but really this is not enough as MySQL allows
you using multiple storage engines in the query you need to test all
kinds of combination such as running multi-table update joining PBXT and
Innodb tables.
Operations Things like backup and monitoring
becomes a lot more complicated plus you have to deal with balancing –
allocating resources to different storage engines and performance tuning
becomes more complicated.
Now what is interesting for probably 95% of applications single storage engine would be good enough. In fact people already do not love to mix multiple storage engines very actively because of potential complications involved.
Now lets think what we could have if we have a version of MySQL Server which drops everything but Innodb Storage engine (it could be NDB, PBXT etc, it does not matter).
We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.
This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations. One could also lock them in memory to ensure predictable in memory performance.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





Comments
Nicolas Jouve replied on Mon, 2010/05/10 - 11:20am
I do agree completely with this.
And infact, I even think that "multiple storage engines" is not a selling point but has been instead a big marketing error , as the image people have of MySQL is very confused due to the fact that people do not necessarily talk about the "same MySQL".
Michael Eric replied on Wed, 2012/01/25 - 11:46am
I think this is exactly the reasoning why PostgreSQL only has one storage engine to rule them all.
It seems like the Drizzle project is trying to address a lot of these types of performance issues, but they are keeping support for pluggable storage engines (MyISAM notably absent) and even extending it. I wonder what their take on those issues is, or if they are able to get around some of them (perhaps by rigidly defining how a storage engine must behave/store data to avoid the copying issues?). If anyone has some insight I’d love to hear it.