Cagdas Basaraner is a software engineer who graduated from the Hacettepe University Computer Engineering department and Information Systems master's program (Turkey). He has 5 years of professional experience and works on information systems with JEE web technologies. He is also a former developer of information systems with Microsoft .NET technologies and of Command & Control (C4I) systems with Java technologies. Cagdas is a DZone MVB, is not an employee of DZone, and has posted 23 posts at DZone.

20 Database Design Best Practices

11.04.2012

  1. Use well-defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID ...).
  2. Use singular table names (e.g. use StudentCourse instead of StudentCourses). A table already represents a collection of entities, so there is no need for plural names.
  3. Don’t use spaces in table names. Otherwise you will have to quote them with characters such as ‘{’, ‘[’ or ‘"’ (e.g. to access the table Student Course you'll have to write “Student Course”; StudentCourse is much better).
  4. Don’t use unnecessary prefixes or suffixes for table names (e.g. use School instead of TblSchool, SchoolTable etc.).
  5. Keep passwords encrypted for security. Decrypt them in the application when required.
  6. Use integer id fields for all tables. Even if an id is not required for the time being, it may be required in the future (for association tables, indexing ...).
  7. Choose columns with the integer data type (or its variants) for indexing. varchar column indexing will cause performance problems.
  8. Use bit fields for boolean values. Using integer or varchar consumes storage unnecessarily. Also start those column names with “Is”.
  9. Provide authentication for database access. Don’t give the admin role to every user.
  10. Avoid “select *” queries unless they are really needed. Use "select [required_columns_list]" for better performance.
  11. Use an ORM (object-relational mapping) framework (e.g. Hibernate, iBatis ...) if the application code is big enough. Performance issues of ORM frameworks can be handled with detailed configuration parameters.
  12. Partition big and unused or rarely used tables (or parts of tables) onto different physical storage for better query performance.
  13. For big, sensitive and mission-critical database systems, use disaster recovery and security services like failover clustering, automatic backups, replication etc.
  14. Use constraints (foreign key, check, not null ...) for data integrity. Don’t leave full control to the application code.
  15. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also write comment lines for your triggers, stored procedures and other scripts.
  16. Use indexes for frequently used queries on big tables. Analyser tools can be used to determine where indexes should be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better.
  17. The database server and the web server must be placed on different machines. This provides more security (attackers can’t access the data directly), and each server's CPU and memory performance will be better because it handles fewer requests and processes.
  18. Image and blob data columns must not be defined in frequently queried tables because of performance issues. This data must be placed in separate tables, and a pointer (reference) to it can be stored in the queried tables.
  19. Normalization must be used as required, to optimize performance. Under-normalization will cause excessive repetition of data; over-normalization will cause excessive joins across too many tables. Both degrade performance.
  20. Spend as much time on database modeling and design as required. Otherwise the “saved” design time will cost 10, 100 or 1000 times as much in maintenance and re-design time.
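
Several of the points above (singular table names, integer ids, “Is”-prefixed boolean columns, constraints, an index on an integer column, and an explicit column list instead of “select *”) can be sketched together. This is only a minimal illustration using Python's built-in sqlite3 module; the Student table, its columns and the helper function names are assumptions made for the example, and since SQLite has no bit type, a checked 0/1 integer stands in for it.

```python
import sqlite3

# Hypothetical schema for illustration only; the names follow the style of
# the list above but are not taken from a real system.
SCHEMA = """
CREATE TABLE School (
    SchoolID INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    SchoolID  INTEGER NOT NULL REFERENCES School(SchoolID),
    FullName  TEXT NOT NULL,
    -- SQLite has no bit type, so a checked 0/1 integer stands in for it.
    IsActive  INTEGER NOT NULL DEFAULT 1 CHECK (IsActive IN (0, 1))
);
-- Index on the integer foreign key that queries filter by.
CREATE INDEX IX_Student_SchoolID ON Student(SchoolID);
"""


def create_database():
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def try_insert_student(conn, school_id, full_name, is_active=1):
    """Insert a student; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Student (SchoolID, FullName, IsActive) "
                "VALUES (?, ?, ?)",
                (school_id, full_name, is_active),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def active_student_names(conn, school_id):
    """Select only the required column rather than using 'select *'."""
    rows = conn.execute(
        "SELECT FullName FROM Student "
        "WHERE SchoolID = ? AND IsActive = 1 ORDER BY StudentID",
        (school_id,),
    )
    return [name for (name,) in rows]
```

With this sketch, an insert that refers to a missing school or uses an IsActive value other than 0 or 1 is rejected by the database itself, regardless of what the application code checks.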
Published at DZone with permission of Cagdas Basaraner, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Sam Lewis replied on Mon, 2012/11/05 - 3:21am

5. is not a good practice; you should one-way hash passwords, not encrypt them.

8. is a bit Hungarian... why should I prefix bit columns and no other columns? 
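
The hashing suggestion in the first remark above could look roughly like the following. This is a minimal sketch using PBKDF2 from Python's standard hashlib module; the function names, the 16-byte salt and the iteration count are assumptions chosen for the example, not recommendations from the article or the comment.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Only the salt and the digest would be stored in the database; the original password cannot be recovered from them, which is the difference from the encrypt-and-decrypt approach.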

Dean Pehrsson-c... replied on Mon, 2012/11/05 - 3:59am

"varchar column indexing will cause performance problems"

On what RDBMS?

Fredrik Bertilsson replied on Tue, 2012/11/06 - 3:05am

6 and 7. Why only integer? Varchar is not the only alternative to integer. Fixed-length char is also a valid option. Do you have anything to support the statement that anything but integers would cause performance problems?

19. Why would excessive joins give you worse performance?

Florin Manea replied on Wed, 2012/11/07 - 6:47am in response to: Fredrik Bertilsson

6 and 7: The primary key value is used in indexes to identify the parent row in the actual table, so for varchar PKs the index storage requirements (and thus the storage read effort to retrieve data) will grow significantly.

19. Joins need additional data to be fetched from related tables; sometimes (if no appropriate filtering values are provided or can be identified) all data must be read in order to find the matching rows (as with HASH or MERGE joins in SQL Server).

Horse Badorties replied on Wed, 2012/11/07 - 8:29pm

14: Triggers allow you to write arbitrary asserts for your database consistency rules. If the application tries to make a mess of the database, the right trigger will do a rollback on the transaction. This requires an 'after' trigger, which is run at the end of a transaction. If any statement in the transaction touches table X, then all triggers which involve table X will be executed. Usually you can just put a simple query as the assert.

For example, if two fields cannot both be populated, you cannot enforce this with built-in database rules. You need to add a trigger which uses this query for the assert: "if A is not null and B is not null".
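
A trigger of the kind described above might be sketched as follows, using SQLite through Python's sqlite3 module. The Contact table, the trigger names and the helper functions are hypothetical, and the trigger syntax is SQLite's, so other database systems will differ. For comparison, the sketch also includes a second table where the same rule is written as a table-level CHECK constraint, which SQLite (at least) accepts.

```python
import sqlite3

# Hypothetical tables for illustration; A and B must not both be populated.
SCHEMA = """
CREATE TABLE Contact (
    ContactID INTEGER PRIMARY KEY,
    A TEXT,
    B TEXT
);
-- 'After' triggers that abort the statement when the assert fails.
CREATE TRIGGER TR_Contact_AB_Insert
AFTER INSERT ON Contact
WHEN NEW.A IS NOT NULL AND NEW.B IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'A and B cannot both be populated');
END;
CREATE TRIGGER TR_Contact_AB_Update
AFTER UPDATE ON Contact
WHEN NEW.A IS NOT NULL AND NEW.B IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'A and B cannot both be populated');
END;
-- The same rule expressed as a table-level CHECK constraint.
CREATE TABLE ContactChecked (
    ContactID INTEGER PRIMARY KEY,
    A TEXT,
    B TEXT,
    CHECK (A IS NULL OR B IS NULL)
);
"""

TABLES = ("Contact", "ContactChecked")


def create_contact_database():
    """Create an in-memory database with both variants of the rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, table, a, b):
    """Insert a row; return False if the trigger or CHECK rejects it."""
    if table not in TABLES:  # table names cannot be bound as parameters
        raise ValueError(f"unknown table: {table}")
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f"INSERT INTO {table} (A, B) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False
```

In both variants the offending insert fails inside the database, so the application cannot leave a row with both fields populated.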



Jason Mulligan replied on Mon, 2012/11/12 - 9:27am

"1. Use singular for table names (i.e. use StudentCourse instead of StudentCourses). Table represents a collection of entities, there is no need for plural names."

Huh? That doesn't make any sense when you consider that collections contain 1 or more entities, and semantically that is represented with pluralization.

If your schema can't map to an API, your naming convention is flawed.

"StudentCourse" should be "Courses", since you're a student if you're taking the course.

Neron Liu replied on Mon, 2012/11/19 - 12:43am

Hi,

Are there any little stories behind these practices? I would very much like to see them, since stories always leave an impression on people.
