Performance Zone is brought to you in partnership with:

I am the founder and CEO of Data Geekery GmbH, located in Zurich, Switzerland. With our company, we have been selling database products and services around Java and SQL since 2013. Ever since my Master's studies at EPFL in 2006, I have been fascinated by the interaction of Java and SQL. Most of this experience I have obtained in the Swiss E-Banking field through various variants (JDBC, Hibernate, mostly with Oracle). I am happy to share this knowledge at various conferences, JUGs, in-house presentations and on our blog. Lukas is a DZone MVB and is not an employee of DZone and has posted 240 posts at DZone. You can read more from them at their website. View Full User Profile

The Myth of Slow SQL JOIN Operations

08.22.2013
| 12629 views |
  • submit to reddit

In my recent SQL work for a large Swiss bank, I have maintained nested database view monsters whose unnested SQL code amounted up to 5,000 lines of code, joining the same table over and over again in separate subselects combined via UNION operations. This monster performed in way under 50ms, no matter how we queried it (see “10 more common mistakes” about the speed of queries). Of course, this performance was only achieved after lots of fine-tuning, load-testing and benchmarking. But it worked. Our Oracle database never let us down on these things.

Nonetheless, many SQL users think that JOIN operations are slow. Why? Perhaps because they are or used to be in MySQL? I’m currently reading this interesting book by Markus Winand. The book is called SQL Performance Explained. He’s also the author of Use-The-Index-Luke.com where you can get free insight into his book. I still recommend reading the whole book, though. Even SQL old-timers and SQL nerds like me will find one or two novel, very interesting approaches, some of which will be incorporated into jOOQ very soon!

In particular, consider this page which explains very well how Hash JOIN operations work:
http://use-the-index-luke.com/sql/join/hash-join-partial-objects

Published at DZone with permission of Lukas Eder, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Cody A_ replied on Thu, 2013/08/22 - 10:33am

 I think one aspect of using JOINs that is often overlooked is the relative size of the result set. If the result set is large - it often makes more sense (in terms of performance) to execute using the JOIN. However, if the result set is small, you can often times achieve equivalent or better performance using a subselect or subquery. This is what I have found to be a common challenge when writing custom SQL queries.

Lukas Eder replied on Fri, 2013/08/23 - 4:28am in response to: Cody A_

However, if the result set is small, you can often times achieve equivalent or better performance using a subselect or subquery.

What would be the rationale behind this? By subselect / subquery, you mean a semi-join (IN predicate or EXISTS predicate)? Clearly, semi-joins tend to be a bit faster than actual joins, as they can stop as soon as one record fulfills the condition. But I'd be curious to hear about a "general rule" with benchmarks.

Oleksandr Alesinskyy replied on Wed, 2013/08/28 - 4:44am in response to: Lukas Eder

 There is (can cannot be) any "general rules", at best you may hope for database-specific rules.

Surendra Kuppuraj replied on Fri, 2013/08/30 - 2:54am in response to: Cody A_

But if your result set is large dont use table alias.  Use just table name directly since table alias makes a copy of each single execution of result set. By constrained to this approach, you could enhance the performance of JOIN.

NOTE: There are different sort of joins available. But most malicious one is Inner and Outer Join. They bring duplicated rows if your joins do not match.(i personally hate join)

Lukas Eder replied on Fri, 2013/08/30 - 3:03am in response to: Surendra Kuppuraj

But if your result set is large dont use table alias.  Use just table name directly since table alias makes a copy of each single execution of result set. By constrained to this approach, you could enhance the performance of JOIN.

That sounds very improbable to me. You're either using an outdated or really simple database (or you've missed on some not-so-recent evolutions). Can you cite some authoritative sources that prove that aliasing tables has a relevant performance impact on queries?

Oleksandr Alesinskyy replied on Fri, 2013/08/30 - 4:26am in response to: Lukas Eder

 It is not just "improbable", it is a pure crap. Aliasing is just a syntactic sugar, nothing more.

Surendra Kuppuraj replied on Sun, 2013/09/22 - 11:55pm in response to: Lukas Eder

Yes. You could please refer to the book Oracle database 10g complete References by Loney (The Oracle Press). 

Oleksandr Alesinskyy replied on Mon, 2013/09/23 - 5:02am in response to: Surendra Kuppuraj


Even of it is published in a book, it does not necessarily makes it true - you would be very surprised to find how many books, sometimes by very reputable authors, contain outright mistakes.

But in this very case I am pretty sure you misinterpreted what you have read in the book - may you provide an respective excerpt (or at least more specific reference with either a page number or a chapter ).

Anyhow you so not need to rely on books in this matter - just use EXPLAIN PLAN (or one of the DBMS_XPLAN functions) on both versions of query and see for yourself ;)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.