Andrew trained as a biologist before transmuting into a software engineer. He's been coding in Java since 1997, and has done a bit of scientific software, a bit of mobile gaming and lots of enterprise software. He's particularly interested in humane and sustainable development. Professionally, that means code maintainability and agile methods. Outside the professional sphere: permaculture, gardening, cooking, green transport and raising children (not by order of importance). He also loves ensemble singing, but that’s on hold for now. Andrew is a DZone MVB.

Solve Foreign-key Problems in DBUnit Test Data


If you create small per-test datasets, as DBUnit advises, you’ll get intermittent build failures due to foreign-key violations. This post explains (1) why this happens, (2) why small per-test datasets are still a good idea, and (3) one simple way to get around the problem.

NB When I searched for solutions to this problem, I discovered that other kinds of foreign-key problem come up with DBUnit. Some people have circular dependencies in their relational database schemas, which prevent DBUnit from loading the test data. If that's your case, I'm sorry to say that this post won't help you with it, and your best option is probably to just take yourself outside and shoot yourself now. (Although some people seem to have chosen instead to disable foreign-key checking during test runs.)

What causes the foreign-key violations

The cause of the problem is simple, and a trivial example illustrates it. Suppose you have two entity classes, HitchHiker and SpaceShip. The HitchHiker table has a foreign key that references SpaceShip. The test data for HitchHikerDaoTest contains rows from both tables, whereas the test data for SpaceShipDaoTest contains only rows from SpaceShip.

DBUnit’s default setup operation, CLEAN_INSERT, wipes data from every table occurring in the test dataset and then inserts the rows listed in that dataset. When SpaceShipDaoTest runs, DBUnit will start by deleting everything in the SpaceShip table. If any HitchHikers are currently riding in the SpaceShips that are about to be deleted, the database will object to their untimely eviction (I’m not sure whether the error message will read like Vogon poetry, though).
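To make this concrete, here is roughly what the two flat XML datasets might look like (the table and column names are invented for illustration):

```xml
<!-- HitchHikerDaoTest.xml: rows for both tables, parent table first -->
<dataset>
    <SpaceShip id="1" name="Heart of Gold"/>
    <HitchHiker id="1" name="Ford Prefect" spaceship_id="1"/>
</dataset>

<!-- SpaceShipDaoTest.xml: only SpaceShip rows. CLEAN_INSERT will try to
     delete every SpaceShip, which fails if rows inserted by a previous
     test's HitchHiker table still reference them. -->
<dataset>
    <SpaceShip id="2" name="Golgafrincham Ark B"/>
</dataset>
```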

If you start from an empty database and execute SpaceShipDaoTest and then HitchHikerDaoTest, you’ll be fine; but if you do it in the other order, your build will fail. It’s the second-worst kind of bug, the unpredictable kind, since you don’t (usually) specify the order in which tests run. After all, they’re supposed to be independent! So you may well find that you have no problems for months on end, until one day you get an error running individual tests in a particular sequence, or Maven changes the order in which it runs your tests on the CI server, and BOOM!

Why you should still use small independent datasets

It’s tempting to circumvent the problem by using a single monolithic dataset for all your integration tests. I’ve tried this, and I advise against it. A big data file is hard to work with: you waste a lot of time scrolling around looking for the line you need, and it’s very hard to follow and understand foreign-key relations. Worse still: by modifying the data to make one test pass, you can easily accidentally break another one. The larger the dataset and the test suite become, the more fragile they get, and the more painstaking it becomes to modify them.

How to avoid the foreign-key problem with small independent datasets

One working but unsatisfactory solution would be to pad out every XML dataset with the list of all tables touched in the test suite. It’s unsatisfactory because the only way to add a table to a FlatXmlDataSet is to list a row of that table — a FlatXmlDataSet can’t contain empty tables — and there’s no justification for polluting the test data with rows from tables that play no part in the test.

The solution I found was to use a DTD to clean tables before tests. Every XML file has different contents, but they all reference a single DTD which lists all the tables involved in the test suite. The DTD is easy to generate from the database schema, and useful for auto-complete and catching typos in column names, so you should probably already be using one. The code to exploit its contents is very simple:

private IDataSet loadTestDataWithDtdTableList(String dtdFilename, String xmlFilename)
        throws IOException, DataSetException, SQLException {

    // An empty dataset containing every table listed in the DTD
    Reader dtdReader = new FileReader(new ClassPathResource(dtdFilename).getFile());
    IDataSet dtdDataset = new FlatDtdDataSet(dtdReader);

    // The test data proper, with table metadata taken from the live database
    FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder();
    builder.setMetaDataSet(new DatabaseDataSet(dbUnitConnection, false));
    IDataSet xmlDataset = builder.build(new ClassPathResource(xmlFilename).getFile());

    // Tables from the DTD get cleaned; rows from the XML get inserted
    return new CompositeDataSet(dtdDataset, xmlDataset);
}

How it works: DBUnit provides a facility to load a dataset from a DTD. This dataset contains all the tables listed in the DTD, each of them empty, of course. The DTD dataset is then combined with a FlatXmlDataSet representing your test data. The graphic below illustrates the composite dataset that would be produced for the SpaceShip example.
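For the SpaceShip example, the shared DTD might look something like this (element and attribute names invented for illustration); loading it with FlatDtdDataSet yields a dataset containing both tables, with no rows:

```dtd
<!ELEMENT dataset (SpaceShip*, HitchHiker*)>

<!ELEMENT SpaceShip EMPTY>
<!ATTLIST SpaceShip
    id CDATA #REQUIRED
    name CDATA #IMPLIED
>

<!ELEMENT HitchHiker EMPTY>
<!ATTLIST HitchHiker
    id CDATA #REQUIRED
    name CDATA #IMPLIED
    spaceship_id CDATA #IMPLIED
>
```

Because every XML dataset references this one DTD, every test's composite dataset covers the full table list, so CLEAN_INSERT deletes from all tables even when the test's own data touches only one of them.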

[Figure: Schema of the composite dataset built from the XML and the DTD]

If you have dictionary tables whose contents never change, you can and should leave them out of the DTD as well as out of the XML datasets, to improve test performance a little.

One further detail: you should close the FileReader after test setup. I couldn’t find a hook into the end of the test setup operation (short of writing my own DatabaseOperation), so I saved the reference as a member variable and hooked the close() call into the tear-down phase of the test.
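In outline, that lifecycle looks something like this (the class and method names are my own, and the DBUnit calls are elided so that the sketch stands alone):

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Sketch: hold the DTD Reader opened during setup as a field,
// and close it in the test's tear-down phase.
class DtdReaderLifecycle {
    private Reader dtdReader;

    public void setUp(File dtdFile) throws IOException {
        dtdReader = new FileReader(dtdFile); // passed to FlatDtdDataSet in the real test
        // ... build the composite dataset and run the setup operation here ...
    }

    public void tearDown() throws IOException {
        if (dtdReader != null) {
            dtdReader.close(); // release the file handle once the data is loaded
            dtdReader = null;
        }
    }

    public boolean isOpen() {
        return dtdReader != null;
    }
}
```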

NB For a more complete code example, see this Gist snippet of a base class for TestNG+Spring+DBUnit tests that adds the above-described DBUnit setup operation to Spring’s TestNG helper class.

Happy database testing!


Published at DZone with permission of Andrew Spencer, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)



Mladen Girazovski replied on Wed, 2011/02/16 - 4:00am

I prefer this pattern when using DBUnit:

(Example: Back Door Fixture Setup )

By not using files for the test data, your tests will be less obfuscated and you can avoid redundancy (test data in files needs to match the expected values in the test case).

Kim Eeckhout replied on Wed, 2011/02/16 - 5:10am

I've run into the same issue before, but I solved it in another way. As I was using HSQLDB for my DBUnit tests, I extended the CLEAN_INSERT operation so that it would disable all referential constraints before doing the clean, and then enable them again before doing the insert.

I think the statement to do that was something along the lines of SET REFERENTIAL_INTEGRITY FALSE (and TRUE to switch the constraints back on).

Andrew Spencer replied on Wed, 2011/02/16 - 6:13am

@Mladen Girazovski
Thanks for the link. I like this approach, the only downside is that XML embedded in Java isn't very pretty - the ideal would be a Java-based format (DSL?). As I understand, this technique is complementary to the DTD-based clean operation.

@Kim Eeckhout
I saw that some people did that. It's more flexible, because then you don't have to worry about the order of tables in your test data either. The disadvantage is that it's database-specific. We use Oracle, and I don't think you can even do that in Oracle (though I could be wrong; I am not a DBA).

Roger Lindsjö replied on Wed, 2011/02/16 - 11:53am in response to: Andrew Spencer

You can disable and enable constraints in Oracle with ALTER TABLE table DISABLE CONSTRAINT constraint. You can enable the constraint again with similar syntax. I don't know how to enable / disable all constraints in one go, maybe someone else does.

Chad Retz replied on Wed, 2011/02/16 - 6:17pm

I just added utilities to disable/enable constraints (SQL Server, from here) and to allow identity inserts on tables. Makes things really simple.

Mladen Girazovski replied on Thu, 2011/02/17 - 4:17am

> @Mladen Girazovski
> Thanks for the link. I like this approach, the only downside is that XML embedded in Java isn't very pretty - the ideal would be a Java-based format (DSL?). As I understand, this technique is complementary to the DTD-based clean operation.

 Yes Andrew, my suggestion had nothing to do with the clean operation afterwards, it was just a recommendation.

While XML in Java might look strange at first, it can be encapsulated in a factory, and personally I don't find it as ugly as it might seem. The big advantage is that you only need to define your expected values in one place: there's no need to synchronize the test data in an XML file with the Java code, which helps with refactoring and makes the test easier to understand, IME.

Fuzail Sarang replied on Thu, 2011/02/17 - 4:29am

I have to totally disagree with this article, for these reasons:
The whole point of a unit test is that it's a single unit that does not interfere with other tests. This is solved much more easily by writing proper tests. What is the definition of a proper test? One that sets up whatever it needs, runs whatever it needs to run, and then tears down whatever it set up, leaving the system in a clean, expected state for the next test to run.
Quite simply, the problem is that the test was written badly, and the solution provided is a hack to bypass badly written tests. I teach all my developers that when they write a test they must:
  1. Assume the system is in a known state
  2. Set up the system according to their test's needs
  3. Verify all the data etc. on test completion
  4. Reset the system to the known state

If you don't know the known state and can't reset to it, that means you're not testing properly in the first place, since you don't know what your test has changed. If you did, you could reset; and if you don't know what has changed, you can't verify accurately.

Andrew Spencer replied on Thu, 2011/02/17 - 10:07am in response to: Fuzail Sarang

@Fuzail Sarang

Ah... I see what you mean, but I'm not convinced that it is useful or practical to leave the database in a clean state after each test.  It is much more reliable to have a setup fixture that can get the database into the required state, no matter what state it starts off in.  Keeping the post-test state is also useful for diagnosing failed tests (cf. DBUnit's "Best practices" page).

To put it another way, I think it unwise to rely on step 1, and prefer to perform step 4 at the start of each test.

Andrew Spencer replied on Thu, 2011/02/17 - 10:01am in response to: Roger Lindsjö

@Roger Lindsjö, Chad Retz

Thanks for the tips for disabling constraints in Oracle and SQL Server.  I'm thinking this might be a cleaner option than my suggestion, and I'll give it a try on my next project.  As long as the constraints are activated before the start of the test, and all constraints checked on reactivation, it'd be safe to do that.

Cédric Conversin replied on Fri, 2011/11/18 - 7:38am

For Oracle, it seems that the FEATURE_SKIP_ORACLE_RECYCLEBIN_TABLES property works.


Rafael Ponte replied on Tue, 2012/11/13 - 11:36am

Every test should be isolated and independent from the others, and so should its dataset. It's the test that defines what the dataset should contain.

It may seem harder to keep all the data needed to satisfy a single test within its own dataset, but someone who follows this rule probably won't run into these kinds of problems.

It's also important to know how the CLEAN_INSERT operation works: it's a composite operation that performs a DELETE_ALL followed by an INSERT. The DELETE_ALL runs through the tables bottom-up (in reverse dataset order), and the INSERT then runs top-down (in dataset order).
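In practice, that ordering means listing parent tables before child tables in the dataset (names invented for illustration): inserts run top-down, so parents exist before their children, and deletes run bottom-up, so children go before their parents:

```xml
<dataset>
    <!-- Parent first: inserted first (top-down), deleted last (bottom-up) -->
    <SpaceShip id="1" name="Heart of Gold"/>
    <!-- Child second: inserted after its parent, deleted before it -->
    <HitchHiker id="1" name="Ford Prefect" spaceship_id="1"/>
</dataset>
```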
