Basic Mistakes in Database Testing
There is a huge number of books about TDD. Those books usually
concentrate on using test for units of work. Units of work are
understood in many different ways, usually it means a class. So the
books say: write a lot of tests, make code in such a way the the
tests pass. All external resources should be mocked so you can test only
This article is authored by Szymon Guz
That’s cool, but unfortunately all the testing stops at this moment. There are some queries (written by hand or generated by some ORMs) but that is not tested, usually. Some programmers test those using integration tests – tests which connect to a real database and perform real queries. This usually means testing the happy path – I’ve got my ORM so it takes care of everything, I don’t need to think.
Database is usually the most valuable asset a company own. Applications are rewritten from time to time. Old applications are thrown away, new applications are installed. Nobody throws away the database full of data when changing the applications. The database is carefully migrated. There are systems where many different applications use the same database at the same time. That’s why it is important to have good database model full of constraints and why database should be treated with care. You really don’t want to destroy the data consistency as this could cost your company too much.
This article is about often forgotten database testing. Integration tests using real databases. It really doesn’t matter what kind of database engine you are using. You can use PostgreSQL, MySQL, Oracle or even those funny noSQL databases like MongoDB. Below rules can be applied to all kinds of databases and all kinds of applications. Maybe not all of them, as noSQL databases cannot enforce data integrity.
Your application usually consists of many different parts. There is some <put here any language you like> code, some configuration files, some SQL queries, some external systems. Testing an application means testing each part separately (only because that makes finding bugs easier) and testing how all the parts cooperate. Database is one of suchs parts, and you should test it thoroughly.
Not Testing Database
The first and most terrible mistake. Not testing database at all. You write some code using database. You even make unit tests for those classes using some mocked database connection.
What about integration tests? Tests should test the application using production environment. The only idea behind tests is to be sure that application works after deploying to production. If you don’t test it on production database, you don’t really know if it works. Your mocked connection allows you to send any query you won’t and doesn’t check it. It just returns the data you want.
Not creating integration tests mean you don’t really test your application.
Not Testing Database Schema
Most teams I’ve observed used to have some kind of integrations tests. Usually happy path tests: there is an ORM, we persist object, ORM does the work, that’s cool, I don’t need to think.
I’ve never seen a team which tests the database schema. Imagine you need to create an index on the database because on production some query is slow. You want to have this index and be sure that next time you deploy that application on new client’s environment, the index is there. Why not write a simple test to check the index existence?
There are much more things to test beside indexes:
- primary keys
- foreign keys
- checks – to be sure that column ‘price’ don’t have negative values
- uniqueness of some columns – you really don’t want to have two users with the same login
Tests Don’t Use Production Engine
When you develop an application, you can choose from a wide collection of databases. Usually you choose among what is best, what the team knows best, or it is chosen by management (sometimes using some strange reasoning). Sometimes the applications use different database engines at the same time. Sometimes applications are prepared to use different engines, so customer, who buys the application, can choose database he wants.
The chosen database engine really doesn’t matter for making goods tests.
Programmers are lazy, so they want their tests to run very fast. They don’t want to wait for the results for too long. That’s why many teams use some faster in memory databases like HSQLDB. They are much faster because they are stored only in RAM memory and don’t touch any drives while operating.
There is one more rule often forgotten:
Tests should use the same database engine that is used on production.
Many programmers use some other engine. Common explanaition is simple: “Our database is too slow, we should use some in memory database engine.”. This is not a good idea. This way you test the other engine, not your production one, so in fact you don’t have your application tested.
I had this problem once. We had to optimize queries by settings a session variable after connecting to the database. That application used Oracle only on production. After setting this variable the tests failed. All the tests. It turned out that I cannot set this variable in HSQLDB, as it simply didn’t exist. So I had to write a terrible code: check the database engine after connection and set the variable or not.
Even if you don’t have any problems with mixing engines, remember that when you test on other database engine than you have on production, you just don’t have the application tested at all.
Tests Don’t Prepare Database
Testing has a common, well defined flow. It is very simple:
- Prepare environment
- Run a test
- Check test result
- Go back to the first point
Try do something opposite and it can hurt you.
Can you see that there is no tidying after the test?
This is very important: tests need to prepare environment before, not after the test. You cannot be sure that the test will be able to clean everything. Application could exit with an error, network connection could broke, or application could blow (e.g. with Out Of Memory Exception). It shouldn’t matter how the test ends, it really matters that the test is running with the same environment which is prepared for each test.
I’ve this mistake once: there were plenty of integration tests which were cleaning all the changes after each test. Many programmers were running those tests using debugger and stop it in the middle after spotting a bug. Any other test run after that had unpredictable and random result. It was running on environment changed by the previous test and wasn’t clearing the whole environment for itself.
Tests Prepare Database, And Don’t Check That
In the previous part I wrote a lot about preparing a database. I want to add one more thing. Preparing the database is not enough. When you prepare the database by clearing some tables, loading fixtures and so on… there is one thing left.
Check the database state after preparing.
You really need to be sure that you’ve prepared all that correctly. This could save your time later, when there is a bug caused by some data that was left and wasn’t cleared.
This should be a part of the pre-test database preparation.
Not Testing Creation Scripts
Each application needs to have some kind of installation procedure. There is always the first time you deploy the database.
Programmers tend to change database by hand usind some adhoc DDL queries. They don’t write them down later, or forget about that. They don’t update the installation scripts. Most teams don’t use versioning scripts (e.g. migrations in Ruby on Rails, or Liquibase in the Java world).
The best way to test that is to recreate the database before running test suite. You don’t have to do that before each test starts. Just run it once, before running all tests.
Yes, better be safe than sorry.
Not Testing Foreign Keys
Foreign keys are one of the basic way of providing database consistency. In a good relational database schema you should have the keys. If you don’t have, well, it could be a sign that you have a really big problem. However this depends on the data model, but usually the lack of foreighn keys is a smell of really bad design.
Testing foreign keys is simple. Just add some rows to a table without adding appropriate rows in referenced table first. You should get an error. Then you should delete rows from the referenced table, you could get error, or not (that depends on the key definition). You should check the expected behaviour whatever it is.
Not Testing Default Values
In a good database design you should define some sensible default
values. Usually those default values could be nulls. Those should be
tested as well. You cannot assume that only your application will be
changing data in this database.
- What if somebody wants to create a quick fix and update some data using adhoc SQL queries?
- What if some day someone starts another application which can change the data and the new application won’t be using your ORM mappings or your DAO classes?
If you have stupid default values, or wrong default values, you could destroy the data and that could be even worse than a simple application bug.
Not Testing Constraints
There are more constraints in a database than just primary and foreign keys. You could have columns which are unique or not null. You can constraint column to only small set of values. You could want to be sure that the price will never be lower than 0.
Good database schema should have many constraints. You should test them too. If you want your price column to have only positive values, what happens when you try to insert there $-1? Why not test that?
You cannot assume that only your well tested application will be using the data, and those checks are your last line of defense against those bugs. Why not test if it works correctly?
Multiple Tests Can Change the Same Database
Usually databases tests change the database. You can run multiple test symultaneously, but you have to be sure that none of the tests influence others. You must be sure that if one test writes something to the database, another test won’t read that.
Usually it is too easy to mess that up, so my small advice: avoid running multiple tests at the same time. This also means that you shouldn’t run the same test suites from multiple machines.
When you have many developers, who want to run tests, each of them should have separate database, where tests can write. If you have some kind of read only database, that’s OK, multiple machines can use such a database at the same time. But if you allow for a situation where all programmers are using the same database for testing, you can have really unpredictable test results.
What a programmer do when he spots an error in a test? Well, good programmer tries to fix that. If the test fails only because of another programmer who was running his tests on the same database, this is just a waste of programmer’s time to fix them.
There is No Big Red Button
Good programmers are lazy. Good programmers are more and more frustrated if you order them to repeat the same task all the time. Good programmers automate repeatable things.
In each project you have to deploy something on testing environment. How much time does it take? Do you really want to waste your programmer time for redeploying application and loading databases all the time?
That’s why each project should have the Big Red Button. The button a programmer can push, make some coffee, get back to work and after a couple of minutes get to know that the Big Red Button finished the job and everything is ready.
The Big Red Button is a really time saver. You may claim that it is too slow to automate all of that. Well, in fact it is not. It is quite opposite. It is like claiming that TDD is slow. It is slower at the very beginning but the more complicated the project becomes the more time you save because of existing tests, or Buttons. All sort of Big Red Buttons – you could use them to deploy applications, run tests and all other staff like that.
Sometimes Jenkins (aka Hudson) is used for that. Yes, this is a great software for such Big Red Button stuff. The only thing is that each programmer should has his own set of jobs to deploy all the stuff on his own environment where he (or she, of course) can play with that not influencing others, and not being influenced as well.
There are many tools for database testing. You could write simple
integration tests for testing the whole schema. For PostgreSQL there is
pgTAP, which can be integrated with Jenkins using the TAP plugin, so
Jenkins can have a job which tests if a database (including the
production database) is OK.