Performance Zone is brought to you in partnership with:

Chris Travers is the most active developer of LedgerSMB and have been working with PostgreSQL since 1999. For the last five years he has spent extensive time with the more advanced features of this database management system. At the same time, PostgreSQL has continued to grow in these areas. Chris is a DZone MVB and is not an employee of DZone and has posted 30 posts at DZone. You can read more from them at their website. View Full User Profile

In Defense of Hand-coded SQL

08.21.2013
| 7424 views |
  • submit to reddit
One common comment I get when I point out that I hand-write all of my SQL queries, rather than relying on an ORM or the like, is that it is drudge work, obsoleted by modern tools, and when I mention that these are usually wrapped in stored procedures, the reactions go from disdainful to horrified.  This piece is the other side: why I do this and why I find that it works.  I am not saying that these approaches are free of costs, but software engineering is about tradeoffs.  These tradeoffs are real.  My approach is not a magic bullet, but it forms a vital piece of how I build software on the database. 

The first thing to note is that I use a lot of SELECT * FROM table queries when querying tables that match the output structure.  We all know that we run into tables that cannot be reasonably further normalized where the application structure can feed directly into the application.  In a stored procedure wrapper, SELECT * reduces maintenance points of such tables if new fields need to be added (in which case the query still matches the specified return type with no modifications).  This has costs in that it discourages refactoring of tables down the road, but this just needs to be checked.  One can still have central management by using views if needed.  Central management of type definitions is generally a good thing.  Views can take the place of an ORM ...

The second point is that CRUD queries of this sort don't really take significant time to write, even on a well-normalized database, and having these encapsulated behind a reasonably well-designed procedural interface is not a bad thing provided that some of the classical difficulties of stored procedures are addressed.

I find that my overall development time is not slowed down by hand-writing SQL.  This remains true even as the software matures.  The time-savings of automatic query tools is traded for the fact that one doesn't get to spend time thinking about how to best utilize queries in the application.  The fact is that, as application developers, we tend to do a lot in application code that could be done better as part of a query.  Sitting down and thinking about how the queries fit into the application is one of the single most productive exercises one can do.

The reason is that a lot of data can be processed and filtered in the queries themselves.  This allows one to request that the database send back data in the way the application can make best use of it.  This can eliminate a lot of application-level code and lead to a shrinking codebase.  This in turn allows application-level code to make better use of data returned from queries, which leads to better productivity all around.
Published at DZone with permission of Chris Travers, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Clemens Eisserer replied on Wed, 2013/08/21 - 5:16am

Being a developer who has done both - I really prefer the ORM way.
Even with an ORM you can go down and write your queries by hand when required - but all the boilerplate CRUD stuff is taken almost completely off your shoulders.

Chris Travers replied on Wed, 2013/08/21 - 6:44am in response to: Clemens Eisserer

Hi Clemens.

I think the key is to write SQL queries so that they are self-contained functions.  This takes a lot of practice, actually, and isn't something that comes overnight.

For example, suppose we are doing fixed asset depreciation.  One way of doing things might be to pull current state out of the database, run through it in your 3GL language, and present the desired state to the user.  I have found this to be difficult and brittle compared to using SQL to pull the *desired* state out of the database and just skip the 3GL logic in that way.  The second way leads to a much smaller codebase, more robust and verifiable logic, and more robust code.

Mason Mann replied on Wed, 2013/08/21 - 10:27am

People who handwrite SQL are invariably morons. 

Here's what you miss out when using a good ORM with generated mappings:

- Automatic first and second level caching 

- Guaranteed consistency between code and database structure. Change the database? Regenerate pojo's -> compile errors until code adheres to database structure.

- True vendor independence. Yes, I'm switching between six different db's in our products with zero issues.

- I work with objects, not relation sets. That kinda makes sense in an oop language.

- Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

- Any decent ORM understands and injects vendor specific query hints better than you.

Also, get a clue.

Hafizan Aziz replied on Thu, 2013/08/22 - 9:26pm in response to: Mason Mann

 I have mention with someone before.

1. Never ever load 1 million or more into one classes orm just to achieve OOP standard.Wonder why ram so so needed a lot. Destroy and Create class just take  mere memory.

2.Some speed optimization couldn't be done like subquery.Need 2 call twice/thrice for database ?

3.ORM only good on data structure.It's to prevent bad code.

4.True vendor independence.->A analyst realy wanted to debug what going inside.Like  Mssql server paging.Dam change a lot from 2001,2005,2008,2012 version.

Lukas Eder replied on Thu, 2013/09/12 - 7:00am in response to: Mason Mann

OK, now this was amusing :-)
- Automatic first and second level caching 
This, obviously, is utterly impossible outside the world of ORMs.
- Guaranteed consistency between code and database structure. Change the database? Regenerate pojo's -> compile errors until code adheres to database structure.
True. No one has ever written a code generator before it was added to Hibernate.
- I work with objects, not relation sets. That kinda makes sense in an oop language
... which your DBA will probably always agree with. Remember to remind your manager why he bought that 1M$ Oracle license, when you run N+1 selects for fetching your OOP objects.

- Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

Of course, there is always a black / white answer to "productivity" questions. Like, how productively you can express a SQL:2003 MERGE statement with HQL. Or, how productively you can calculate a running total involving window functions, or maybe, recursive SQL with HQL.

- Any decent ORM understands and injects vendor specific query hints better than you.

That is indeed an amazing theory, which I was utterly unaware of. I'm sure you will be more than happy to point us to a documentation example, where this applies?

Mladen Adamovic replied on Thu, 2013/09/12 - 11:25am

I've seen many projects which have bad performance and did use Hibernate or JDO. I hate those technologies (along with JPA).

After writing thousands of line of JDBC code I've realized how it could be done properly.

Finally I made fast java ORM framework and opensourced it at: https://code.google.com/p/fjorm/ 

Hope someone might like it.


Chris Travers replied on Thu, 2013/09/12 - 8:26pm in response to: Mason Mann

Mason Mann's comments are a bit amusing.  I am not 100% anti-orm.  There are some good ORM's out there, which are even capable of recursive SQL (see SQLAlchemy or DBIx::Class for example) but they tend to be SQL-oriented and written for people who can write SQL by hand.  The larger issue though is that *thinking* about sets in terms of what you can do with them is usually a performance win *even when* considering the CRUD queries one tends to write.

On to the specific points:

- Automatic first and second level caching 

Automatic caching only makes sense when you have only one application accessing your database.  Otherwise you have concurrency issues that are beyond what the automatic caching can be aware of.  In any realistically complex environment you don't want automatic caching.  You want to be deliberate about what you cache and when you invalidate things so you know what to do about it.    This requires knowledge of the environment beyond what the application can know about itself.  But if you aren't going to have multiple applications accessing your database, why not go NoSQL?  After all you are missing out on the power of the relational db.


- Guaranteed consistency between code and database structure. Change the database? Regenerate pojo's -> compile errors until code adheres to database structure.
This is an interesting problem.  However, if you have your queries in the RDBMS via UDF's and have a discoverable API, then you can either write your object model from the db (including all boilerplate code) or you can dynamically discover the API at run-time (like one might with SOAP for example), or any number of other approaches.  In short I kind of agree with this one but I prefer to go the other direction and assume some encapsulation in the db that the application can rely on.  The ORM IMO is a relatively faulty abstraction layer there.


- True vendor independence. Yes, I'm switching between six different db's in our products with zero issues.
The fundamental question, as I would put it, is, "do you want a db for many apps or an app for many rdbms's?"  I am not sure you can have both.  That's a business decision.  I think the most interesting db's are dbs for many apps however and to make that work I think you have to give up on the dream of vendor independence.  This is sub-optimal when you want to force everyone to go through your app for licensing revenue, but for an open source application it is much better.


- I work with objects, not relation sets. That kinda makes sense in an oop language.
I work with sets of objects.  It makes life easier because I get access to both sides.  You can do a *lot* of things faster by leveraging set operations, and do so in a more maintainable manner.


- Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

This might be true if you are not really familiar with SQL.  It is also true that it is hard to come up with something less productive than writing SQL as text strings rather than functions.  However using a 3GL to generate language in a 4GL is rarely a win.  A better approach is to write SQL that is capable of functioning regardless of query criteria and acts like a function.  This is true whether you are wrapping it in a UDF or not.  On the positive side, this also makes the SQL understandable, debuggable, and tunable for performance reasons.

- Any decent ORM understands and injects vendor specific query hints better than you.
I am not convinced of this.  What I am convinced is that ORM-generated SQL is usually beyond debugging in this manner.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.