Jerry is a DZone MVB and is not an employee of DZone and has posted 6 posts at DZone. You can read more from them at their website. View Full User Profile

Database/Code Impedence Mismatch

02.18.2010
| 5146 views |
  • submit to reddit

I love natural keys in database design. You have to pay attention, though: the natural impedance mismatch between a programming language representation and the database representation of the key can bite you.

Consider an object whose primary key might contain a date--say, a change log record. Oracle and DB2 both store a DATE as a time containing year, month, day, hours, minutes, and seconds. No timezone. The natural mapping for a Java tool like Hibernate is to map to a java.util.Date, which stores the Date as a time in milliseconds since the epoch GMT, and then maps it to whatever timezone is set on the machine where the code is running for display and conversion.

Now consider what might happen (especially if our change log record is attached to some parent object);

  1. We create and save the object; it is persisted. The local cached copy contains a non-zero value for milliseconds, but the database has truncated the milliseconds value and saved it.
  2. Later on in the code somewhere, we have reason to save the object again, perhaps as part of some collection operation.
  3. Hibernate looks in its cache, compares it with the database, and notes that the values of the Date don't match--so it tries to save the value again.
  4. The database dutifully tosses out the spare milliseconds, and bam! we have an attempt to re-insert an existing record, so it throws an exception.
This is all terribly confusing to the programmer, who, inspecting the objects in question, sees no difference between what's in the database and what's in her code, especially since the default display characteristics of her database browser and her debugger don't show the milliseconds.

The easy fix in this case is to declare a class which matches the database representation--in this case, a good choice would be to declare a new class which truncates the milliseconds. A modest example is shown below:

/**
* Public Domain; use or extend at will.
*/
import java.util.Date;

public class DbDate extends Date {
/** increment if you change the state model */
private static final long serialVersionUID = 1L;

/** @see java.util.Date#Date() */
public DbDate() {
long t = getTime();
setTime(t - t%1000);
}

/** @see java.util.Date#Date(long) */
public DbDate(long t) {
super(t - t%1000);
}

/** @see java.util.Date#setTime(long) */
@Override
public void setTime(long time) {
super.setTime(time - time%1000);
}
}

Also note that if you declared the database column as a TIMESTAMP, the Java and database representations more-or-less match--avoiding, in this case, this kind of problem. Note that Oracle doesn't support TIMESTAMP_WITH_TIMEZONE in a primary key, and DB2 doesn't implement TIMESTAMP_WITH_TIMEZONE at all--as of the last time I had access to DB2.

Dealing with timezones is another topic entirely--one which I'll take up in a future post.

From http://execdesign.blogspot.com

Published at DZone with permission of Jerry Andrews, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Karl Peterbauer replied on Thu, 2010/02/18 - 4:03am

I don't think that date-based primary keys are a good idea, even for log records.

Chad Hahn replied on Thu, 2010/02/18 - 1:27pm

UGH!  Why would I want to go through all the pain of the above code and explanation, you just wrote, instead of using a simple surrogate key?  Please don't encourage more people to misuse natural keys, especially when using JPA/Hibernate.

Steve Ck replied on Fri, 2010/02/19 - 11:14pm

Karl & chahn, I don't think the issue addressed here is one of database primary keys.  Under different circumstances you may have the same logical quantity (e.g. a date) represented multiple way in difference context (e.g. Java Date, db TIMESTAMP) and you need to be careful when moving from one domain to another, especially when you have any type of equivalence test, as the data may change underneath your abstraction.  The solution, as pointed out, is a canonical representation.

Nicolas Bousquet replied on Sat, 2010/02/20 - 7:16am

As other say, never use data as a primary key.

 

Hey in fact, a date is not a natural key most of the time in the first place. Simply because as your logging exemple, you might want several log to occurs at the same time.

 

Some have done this on our software, and it's a really, really bad. We have several event that should occurs at exactly the same date. But it is not possible... Then we have to loop until we find a second not used. Sometime it's up to 100 seconds later...

A date shall NEVER be a primary key, and shall never be a natural key. Even if it take into account milliseconds. There is always a chance that this current date is already used. (For logs it's even very likely).

The date is just a property, and if you need to find things by date, just add a non unique index on it.

 

And oh, it's best pratice to allways a single column for the primary key. Usualy an auto incremented number is perfect.

If you model change, and you initial natural key is not  one anymore then you don't have to change other tables. Join are more consise to express in SQL and performance is better. Your natural key (maybe a compositif of several fields) shall have a unique constraint and or indexes if needed

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.