

Basic Mistakes in Database Testing


There is a huge number of books about TDD. Those books usually concentrate on using tests for units of work. A unit of work is understood in many different ways; usually it means a class. So the books say: write a lot of tests, and write code in such a way that the tests pass. All external resources should be mocked so you can test only this unit.

This article is authored by Szymon Guz

That’s cool, but unfortunately all the testing usually stops at this point. There are queries (written by hand or generated by an ORM), but they usually go untested. Some programmers test them using integration tests – tests which connect to a real database and perform real queries. This usually means testing only the happy path: I’ve got my ORM, it takes care of everything, I don’t need to think.

The database is usually the most valuable asset a company owns. Applications are rewritten from time to time: old applications are thrown away, new applications are installed. Nobody throws away a database full of data when changing applications; the database is carefully migrated. There are systems where many different applications use the same database at the same time. That’s why it is important to have a good database model full of constraints, and why the database should be treated with care. You really don’t want to destroy data consistency, as that could cost your company too much.

This article is about that often forgotten kind of database testing: integration tests using real databases. It really doesn’t matter which database engine you are using. You can use PostgreSQL, MySQL, Oracle or even those funny NoSQL databases like MongoDB. The rules below can be applied to almost all kinds of databases and all kinds of applications (almost, because NoSQL databases cannot enforce data integrity the same way).

Your application usually consists of many different parts. There is some <put here any language you like> code, some configuration files, some SQL queries, some external systems. Testing an application means testing each part separately (because that makes finding bugs easier) and testing how all the parts cooperate. The database is one of those parts, and you should test it thoroughly.

Not Testing the Database

The first and most terrible mistake: not testing the database at all. You write some code using the database. You even write unit tests for those classes, using a mocked database connection.

What about integration tests? Tests should exercise the application in an environment like production. The whole idea behind tests is to be sure that the application works after deploying to production. If you don’t test it against the production database engine, you don’t really know whether it works. Your mocked connection accepts any query you want and doesn’t check it. It just returns the data you told it to.

Not creating integration tests means you don’t really test your application.
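To make that concrete, here is a minimal integration-test sketch. It uses Python’s built-in sqlite3 module purely so the example is self-contained; a real suite should talk to the engine you actually deploy on. The table and column names are hypothetical.

```python
import sqlite3
import unittest

class UserRepositoryTest(unittest.TestCase):
    def setUp(self):
        # A real connection to a real engine (SQLite stands in here),
        # prepared fresh for every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE)")

    def test_insert_and_read_back(self):
        # The query really runs against the engine, so typos, bad column
        # names and broken constraints are caught; a mock would accept them.
        self.conn.execute("INSERT INTO users (login) VALUES ('alice')")
        row = self.conn.execute("SELECT login FROM users").fetchone()
        self.assertEqual(row[0], "alice")

suite = unittest.TestLoader().loadTestsFromTestCase(UserRepositoryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point is not the assertion itself but the fact that a real engine parses and executes the SQL.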

Not Testing the Database Schema

Most teams I’ve observed have some kind of integration tests. Usually happy-path tests: there is an ORM, we persist an object, the ORM does the work, that’s cool, I don’t need to think.

I’ve never seen a team that tests the database schema. Imagine you need to create an index because some query is slow on production. You want to have this index, and you want to be sure that the next time you deploy the application to a new client’s environment, the index is there. Why not write a simple test to check that the index exists?

There are many more things to test besides indexes:

  • primary keys
  • foreign keys
  • checks – to be sure that the column ‘price’ doesn’t hold negative values
  • uniqueness of some columns – you really don’t want to have two users with the same login
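As a sketch of what such a schema test can look like, the following uses Python’s sqlite3 module and queries the engine’s schema catalog (sqlite_master here; on other engines you would query pg_indexes, information_schema, or the like). The table and index names are hypothetical.

```python
import sqlite3

# SQLite stands in for the production engine; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        login TEXT NOT NULL UNIQUE
    );
    CREATE INDEX idx_users_login ON users (login);
""")

def index_exists(conn, name):
    """Ask the schema catalog whether an index with this name exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None

# A schema test is just an assertion against the catalog.
assert index_exists(conn, "idx_users_login")
assert not index_exists(conn, "idx_that_was_never_created")
```

Primary keys, foreign keys, checks and unique constraints can be verified the same way, by querying the catalog tables of your engine.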

Tests Don’t Use Production Engine

When you develop an application, you can choose from a wide range of databases. Usually you choose what fits best, what the team knows best, or whatever management picks (sometimes using strange reasoning). Sometimes an application uses several database engines at the same time. Sometimes applications are prepared to work with different engines, so the customer who buys the application can choose the database he wants.

Whichever database engine is chosen, it really doesn’t matter for writing good tests.

Programmers are lazy, so they want their tests to run fast. They don’t want to wait too long for the results. That’s why many teams use a faster in-memory database like HSQLDB. Such databases are much faster because they live entirely in RAM and don’t touch any drives while operating.

There is one more rule often forgotten:

Tests should use the same database engine that is used on production.

Many programmers use some other engine. The common explanation is simple: “Our database is too slow, we should use an in-memory database engine.” This is not a good idea. This way you test the other engine, not your production one, so in fact your application isn’t tested.

I had this problem once. We had to optimize queries by setting a session variable after connecting to the database. That application used Oracle, but only on production. After setting this variable, the tests failed. All the tests. It turned out that I couldn’t set this variable in HSQLDB, because it simply didn’t exist there. So I had to write terrible code: check the database engine after connecting, and set the variable or not.

Even if you have no problems mixing engines, remember: when you test on a different database engine than the one you run in production, your application simply isn’t tested.

Tests Don’t Prepare Database

Testing has a common, well-defined flow. It is very simple:

  1. Prepare environment
  2. Run a test
  3. Check test result
  4. Go back to the first point

Try doing it the other way around, and it can hurt you.

Can you see that there is no tidying after the test?

This is very important: tests need to prepare the environment before the test, not clean it up after. You cannot be sure that a test will be able to clean everything. The application could exit with an error, the network connection could break, or the application could blow up (e.g. with an out-of-memory error). It shouldn’t matter how a test ends; what matters is that every test runs in the same environment, prepared freshly for each test.

I’ve seen this mistake once: there were plenty of integration tests which cleaned up all their changes after each test. Many programmers ran those tests under a debugger and stopped them in the middle after spotting a bug. Any test run after that had unpredictable, random results: it was running in an environment changed by the previous test and wasn’t clearing the whole environment for itself.
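The prepare-before-not-after flow can be sketched as a test fixture that rebuilds its data in setUp, so even a test that dies halfway (debugger stop, crash, lost connection) cannot poison the next run. SQLite stands in for the production engine here; all names are hypothetical.

```python
import sqlite3
import unittest

class OrderTests(unittest.TestCase):
    # One shared connection; an in-memory SQLite database stands in
    # for the shared test database.
    conn = sqlite3.connect(":memory:")

    def setUp(self):
        # Prepare: wipe and reload fixtures BEFORE the test, never after.
        self.conn.executescript("""
            DROP TABLE IF EXISTS orders;
            CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
            INSERT INTO orders (total) VALUES (10.0);
        """)

    def test_leftover_rows_do_not_leak(self):
        # Simulate a test that dirties the database and never cleans up...
        self.conn.execute("INSERT INTO orders (total) VALUES (99.0)")
        count = self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        self.assertEqual(count, 2)

    def test_sees_only_fixture_rows(self):
        # ...yet this test still starts from a fresh fixture, thanks to setUp.
        count = self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        self.assertEqual(count, 1)

suite = unittest.TestLoader().loadTestsFromTestCase(OrderTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because all cleanup logic lives in the preparation step, the tests pass in any order and after any kind of interrupted run.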

Tests Prepare Database, And Don’t Check That

In the previous part I wrote a lot about preparing a database. I want to add one more thing: preparing the database is not enough. When you prepare the database by clearing some tables, loading fixtures and so on, there is one thing left.

Check the database state after preparing.

You really need to be sure that you’ve prepared everything correctly. This can save you time later, when a bug is caused by some data that was left behind and wasn’t cleared.

This should be a part of the pre-test database preparation.
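A sketch of such a post-preparation sanity check, again with SQLite standing in for the production engine and hypothetical names:

```python
import sqlite3

def prepare_database(conn):
    """Load the test fixtures, then verify the preparation itself."""
    conn.executescript("""
        DROP TABLE IF EXISTS users;
        CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
        INSERT INTO users (login) VALUES ('alice'), ('bob');
    """)
    # Check the database state after preparing: exactly the fixture rows,
    # nothing left over from a previous (possibly interrupted) run.
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    if count != 2:
        raise RuntimeError(
            "fixture load failed: expected 2 users, found %d" % count)

conn = sqlite3.connect(":memory:")
prepare_database(conn)
```

If the check fails, the suite stops before a single test runs, instead of producing mysterious failures later.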

Not Testing Creation Scripts

Each application needs to have some kind of installation procedure. There is always a first time you deploy the database.

Programmers tend to change the database by hand using ad hoc DDL queries. They don’t write them down afterwards, or they forget. They don’t update the installation scripts. Most teams don’t use versioned migration scripts (e.g. migrations in Ruby on Rails, or Liquibase in the Java world).

The best way to test the creation scripts is to recreate the database before running the test suite. You don’t have to do that before each test; just run it once, before all the tests.

Yes, better safe than sorry.
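A sketch of running the creation script once before the whole suite. SQLite stands in for the production engine; in a real project the DDL would come from the checked-in installation script (e.g. a schema.sql file) rather than an inline string.

```python
import sqlite3

# Hypothetical stand-in for the project's installation DDL.
CREATION_SCRIPT = """
    CREATE TABLE users  (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users (id));
"""

def create_fresh_database():
    """Run once before the suite: broken installation DDL fails fast here."""
    conn = sqlite3.connect(":memory:")
    # Any error in the script raises immediately, so the creation scripts
    # themselves stay tested on every suite run.
    conn.executescript(CREATION_SCRIPT)
    return conn

conn = create_fresh_database()
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```

If someone changes the schema by hand and forgets to update the script, the very next suite run exposes the drift.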

Not Testing Foreign Keys

Foreign keys are one of the basic ways of providing database consistency. A good relational database schema should have the keys. If it doesn’t, well, that could be a sign of a really big problem. It depends on the data model, but usually the lack of foreign keys is a smell of really bad design.

Testing foreign keys is simple. Just add some rows to a table without first adding the appropriate rows to the referenced table. You should get an error. Then delete rows from the referenced table; you may get an error, or not (that depends on the key definition). Whatever the expected behaviour is, you should check it.
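That test can be sketched like this. SQLite is used as a stand-in and needs PRAGMA foreign_keys = ON to enforce keys at all; most other engines enforce them by default. Table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; off by default
conn.executescript("""
    CREATE TABLE persons  (id INTEGER PRIMARY KEY);
    CREATE TABLE patients (id INTEGER PRIMARY KEY,
                           person_id INTEGER NOT NULL REFERENCES persons (id));
""")

def orphan_insert_rejected(conn):
    """True if the engine rejects a child row pointing at a missing parent."""
    try:
        conn.execute("INSERT INTO patients (person_id) VALUES (12345)")
        return False  # the key is missing or not enforced
    except sqlite3.IntegrityError:
        return True

assert orphan_insert_rejected(conn)
```

The delete direction (RESTRICT, CASCADE, SET NULL) can be tested the same way, asserting whichever behaviour the key definition promises.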

Not Testing Default Values

In a good database design you should define sensible default values. Often the sensible default is simply NULL. Those defaults should be tested as well. You cannot assume that only your application will ever change the data in this database.
Two questions:

  • What if somebody wants to make a quick fix and update some data using ad hoc SQL queries?
  • What if some day someone starts another application which can change the data and the new application won’t be using your ORM mappings or your DAO classes?

If you have silly or wrong default values, you could destroy the data, and that could be even worse than a simple application bug.
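A default-value test can be as simple as inserting a row that omits the column and checking what the engine filled in. SQLite stands in for the production engine; the ‘status’ column and its default are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        login  TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'active'
    )
""")

# Insert via plain SQL, bypassing any ORM, exactly as an ad hoc
# fix or a second application would.
conn.execute("INSERT INTO users (login) VALUES ('alice')")  # status omitted
status = conn.execute(
    "SELECT status FROM users WHERE login = 'alice'").fetchone()[0]
assert status == 'active'  # the engine, not the application, filled this in
```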

Not Testing Constraints

There are more constraints in a database than just primary and foreign keys. You can have columns which are unique or not null. You can constrain a column to a small set of values. You might want to be sure that a price will never be lower than 0.

A good database schema should have many constraints, and you should test them too. If you want your price column to hold only positive values, what happens when you try to insert $-1 there? Why not test that?

You cannot assume that only your well-tested application will be using the data, and those checks are your last line of defense against such bugs. Why not test that they work correctly?
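A sketch of such a constraint test: insert a negative price and expect the engine itself, not the application, to reject it. SQLite stands in for the production engine; names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        price REAL NOT NULL CHECK (price >= 0)
    )
""")

def negative_price_rejected(conn):
    """True if the CHECK constraint blocks a negative price."""
    try:
        conn.execute("INSERT INTO products (price) VALUES (-1)")
        return False  # constraint missing: the bad row went in
    except sqlite3.IntegrityError:
        return True

assert negative_price_rejected(conn)
# The rejected row must not have landed in the table.
assert conn.execute("SELECT COUNT(*) FROM products").fetchone()[0] == 0
```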

Multiple Tests Can Change the Same Database

Database tests usually change the database. You can run multiple tests simultaneously, but you have to be sure that none of the tests influences the others. You must be sure that if one test writes something to the database, another test won’t read it.

It is usually too easy to mess that up, so my small advice: avoid running multiple tests at the same time. This also means that you shouldn’t run the same test suites from multiple machines.

When you have many developers who want to run tests, each of them should have a separate database where their tests can write. If you have some kind of read-only database, that’s OK; multiple machines can use such a database at the same time. But if you allow a situation where all programmers use the same database for testing, you can get really unpredictable test results.

What does a programmer do when he spots a failing test? Well, a good programmer tries to fix it. If the test fails only because another programmer was running his tests on the same database, fixing it is just a waste of the programmer’s time.

There is No Big Red Button

Good programmers are lazy. Good programmers get more and more frustrated if you order them to repeat the same task all the time. Good programmers automate repeatable things.

In each project you have to deploy something to a testing environment. How much time does it take? Do you really want to waste your programmers’ time redeploying the application and loading databases all the time?

That’s why each project should have the Big Red Button: a button a programmer can push, then make some coffee, get back to work, and after a couple of minutes learn that the Big Red Button has finished the job and everything is ready.

The Big Red Button is a real time saver. You may claim that automating all of that is too slow. Well, in fact it is quite the opposite. That is like claiming that TDD is slow: it is slower at the very beginning, but the more complicated the project becomes, the more time you save thanks to the existing tests, or Buttons. You can use all sorts of Big Red Buttons to deploy applications, run tests, and do other stuff like that.

Sometimes Jenkins (formerly Hudson) is used for that. Yes, it is great software for such Big Red Button stuff. The only thing is that each programmer should have his own set of jobs to deploy all the stuff to his own environment, where he (or she, of course) can play without influencing others, and without being influenced as well.


There are many tools for database testing. You can write simple integration tests covering the whole schema. For PostgreSQL there is pgTAP, which can be integrated with Jenkins using the TAP plugin, so Jenkins can have a job which tests whether a database (including the production database) is OK.



Ant Kutschera replied on Mon, 2011/11/28 - 1:33am

Definitely run integration tests against the same version as your prod db. JPA QL might work with a stub/mock, but that's no guarantee the sql generated will work. And MySQL is definitely different to Oracle (sequences, for eg). Good article. Good tip with Jenkins too.

Andrew Spencer replied on Mon, 2011/11/28 - 4:09am

Oh, my goodness. I (and colleagues) have been burnt by so many of the things on your list.  I think I shall print out a list of the headings and pin it up by my desk.

Truly excellent article.  The material would also make an excellent presentation.

There's one thing I'm not convinced of, which is testing whether foreign keys / default values / constraints are working.  Assuming you've put them in the schema in the first place, I'm inclined to trust the database to apply them properly.  What I don't trust is my own ability to remember to put them into the schema in the first place. 

Consequently, I agree that the schema definition should be checked in some way. But how do you assert the correctness of DDL?  I don't mean the syntactic correctness (which can be checked by simply executing it), but the semantic correctness.  Should column BLABLA.XYZ be constrained to the values 'X', 'Y' and 'Z'?  Should PATIENT_ID be a foreign key to the PERSON table? Etc.  These are things that no automatic test can know -- unless you tell it, in which case you're simply repeating the information that's already in the DDL.  Such a test tells you whether the test and the DDL agree with each other, but it doesn't tell you whether the both of them agree with you.

The approach I favour is: have a single reference schema, which is inspected for correctness by a human. Then, as part of the automatic build, there should be a check that will fail the build if there are any differences between the target schema and the expected schema.  This is the direction we're taking on my current team, though we've still got some way to go.

 I'd be interested to hear others' opinions on this approach.

Marek Dec replied on Mon, 2011/11/28 - 9:48am

Thanks for this great article, Mitchell.

I was just thinking of a small modification to the following guideline:


Even if you don’t have any problems with mixing engines, remember that when you test on other database engine than you have on production, you just don’t have the application tested at all.


Say you have no problems with mixing the DB engines (e.g. you use plain JPA). Do you think it's a good idea to have your build tool configured in such a way that it performs a quick version of the DB tests on an in-memory database on the developer's machine, and the longer version of the DB tests on the production database within some CI build?

Actually I was always wondering if it is a good idea to have two different versions of the automated test-case sets: a developer version and the CI version.
What is your opinion on that?


Mitch Pronschinske replied on Mon, 2011/11/28 - 3:46pm in response to: Marek Dec

Glad you liked it Marek!  I posted it with the permission of Szymon Guz, who writes at

Szymon Guz replied on Mon, 2011/11/28 - 4:37pm in response to: Marek Dec

Hi Marek,
thanks for the question.

My main point for the whole article is that you should fully test the application before deploying it on production. Those tests should be done on environment exactly the same as on production. Otherwise you cannot be sure that, after deploying the application, it will work correctly (correctly means here that all tests pass).

Sometimes you can have no problems with mixing the database engines you use. JPA or Hibernate is a good example of what you can achieve. However there is one small problem: sometimes there can be bugs and differences in the drivers, so in fact switching from one database engine to another could make some tests fail, and it can hurt.

If you want to use different engine for fast testing and production one for slower testing, then you could have another problem. Even Hibernate documentation states that there is a small number of queries which have to be done using SQL. If you have two engines - you can have two different sets of SQL queries to take care of.

When you have two different engines, and you want to use some powerful features from the production one, how do you want to simulate those using that simple fast engine? Do you want to create two different test suites for two different engines? Do you want to test one set of application features for fast testing and other set for production one? While refactoring, would you tell your programmers to maintain both sets?

I think the best solution is to have one database engine and use all its unique powerful features. Just use the production engine.

If your tests are too slow, just make them faster. Making them faster doesn't mean that you should use different database. If your application is too slow for testing, will you tell your programmers to write parts of that in another language just to test them? Do you want to write tests which don't use features used on production?

I'd rather make two different sets of tests: Unit Tests and Integration Tests. Unit tests should be fast, really fast and use some mocked database connection. Integration Tests will be slower, so they should be run only after the Unit Tests pass. The Integration Tests should use the production database engine. This is a nice idea for another story.

Andrew Thompson replied on Mon, 2011/11/28 - 5:58pm

Good article. Just one comment. I'm not sure I agree that creating tests to check for the presence of indexes/keys etc is worth the effort. I tend to use tools like liquibase to integrate database changes into the build - and then disallow anybody from making any changes that aren't in that build. That way I'm comfortable not testing for things like the presence of an index / keys etc in the same way that I don't need to write a test to verify that class X exists.

Nirmal Upadhyay replied on Mon, 2011/12/12 - 2:31am

Hello Mitchell,

Great article. First of all I would like to congratulate you for writing such a wonderful article. Everything is so simple and nicely explained. I will definitely take a printout of this one and pass it to the testers and developers of my company.

It can help to build a strong DB design for applications that heavily rely on database operations and those having huge and real-time data. It's also great material from which a nice presentation can be built.

Thanks once again for this article. I would love to hear how you inspired yourself to write such a good article.

Sincerely Thanking you



Aniceto Garcia replied on Thu, 2012/01/26 - 4:38am

Good stuff. There are many mistakes we make while building or using a database in our application. I think the focus should be on normalizing the database, whether it is in 3rd, 4th or whatever normal form.

All the relations must be defined properly, and of course the schema must be tested.

Douglas Maxwell replied on Wed, 2012/10/31 - 1:40pm


So I have a query that runs a little slow. There are several ways I might optimize it. One might be adding an index, another might be partitioning the table, or whatever. I go down the index route and test for the presence of the index. Later it becomes apparent that there is a better optimization and the index is removed. Cue broken tests.

If performance is the issue, the tests should test performance. Once they start testing implementation details, the tests become fragile and limit your ability to change the system. It is not good practice to test down to the smallest detail of the implementation.
