SQL Zone is brought to you in partnership with:

I currently manage the BI platform for the second largest health insurance company in South Africa. In my previous life I was a BI lecturer and consultant. I have implemented data warehouses and BI solutions using both Open Source and proprietary software. I live by the saying, "keep it simple stupid." Mpumelelo is a DZone MVB and is not an employee of DZone and has posted 7 posts at DZone. You can read more from them at their website. View Full User Profile

6 Simple Performance Tips for SQL SELECT Statements

03.31.2014
| 14648 views |
  • submit to reddit

Performance tuning SELECT statements can be a time consuming task which in my opinion follows Pareto principle’s. 20% effort is likely give you an 80% performance improvement. To get another 20% performance improvement you probably need to spend 80% of the time. Unless you work on the planet Venus where each day on Venus is equal to 243 Earth days, delivery deadlines are likely to mean you will not have enough time to put into tuning your SQL queries.

After years writing and running SQL statements I began to develop a mental check-list of things I looked at when trying to improve query performance. These are the things I check before moving on to query plans and reading the sometimes complicated documentation of the database I am working on. My check-list is by no means comprehensive or scientific, more like a back of the envelope calculation but can I can say that most of the time I do get performance improvements following these simple steps. The check-list follows.

Check indexes

There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement. Take the 3-Minute SQL performance test. Regardless of your score be sure to read through the answers as they are informative.

Limit size of your working data set

Examine the tables used in the SELECT statement to see if you can apply filters in the WHERE clause of your statement. A classic example is when a query initially worked well when there were only a few thousand rows in the table. As the application grew the query slowed down. The solution may be as simple as restricting the query to looking at the current month’s data. When you have queries that have sub-selects, look to apply filtering to the inner statement of the sub-selects as opposed to the outer statements.

Only select fields you need

Extra fields often increase the grain of the data returned and thus result in more (detailed) data being returned to the SQL client. Additionally:

  • When using reporting and analytical applications, sometimes the slow report performance is because the reporting tool has to do the aggregation as data is received in detailed form.
  • Occasionally the query may run quickly enough but your problem could be a network related issue as large amounts of detailed data are sent to the reporting server across the network.
  • When using a column-oriented DBMS only the columns you have selected will be read from disk, the less columns you include in your query the less IO overhead.

Remove unnecessary tables

The reasons for removing unnecessary tables are the same as the reasons for removing fields not needed in the select statement. Writing SQL statements is a process that usually takes a number of iterations as you write and test your SQL statements. During development it is possible that you add tables to the query that may not have any impact on the data returned by the SQL code. Once the SQL is correct I find many people do not review their script and remove tables that do not have any impact or use in the final data returned. By removing the JOINS to these unnecessary tables you reduce the amount of processing the database has to do. Sometimes, much like removing columns you may find your reduce the data bring brought back by the database.

Remove OUTER JOINS

This can easier said than done and depends on how much influence you have in changing table content. One solution is to remove OUTER JOINS by placing placeholder rows in both tables. Say you have the following tables with an OUTER JOIN defined to ensure all data is returned:

customer_id customer_name
1 John Doe
2 Mary Jane
3 Peter Pan
4 Joe Soap
customer_id sales_person
NULL Newbee Smith
2 Oldie Jones
1 Another Oldie
NULL Greenhorn

The solution is to add a placeholder row in the customer table and update all NULL values in the sales table to the placeholder key.

customer_id customer_name
0 NO CUSTOMER
1 John Doe
2 Mary Jane
3 Peter Pan
4 Joe Soap
customer_id sales_person
0 Newbee Smith
2 Oldie Jones
1 Another Oldie
0 Greenhorn

Not only have you removed the need for an OUTER JOIN you have also standardised how sales people with no customers are represented. Other developers will not have to write statements such as ISNULL(customer_id, “No customer yet”).

Remove calculated fields in JOIN and WHERE clauses

This is another one of those that may at times be easier said than done depending on your permissions to make changes to the schema. This can be done by creating a field with the calculated values used in the join on the table. Given the following SQL statement:
FROM sales a 
JOIN budget b ON    ((year(a.sale_date)* 100) + month(a.sale_date)) = b.budget_year_month 

Performance can be improved by adding a column with the year and month in the sales table. The updated SQL statement would be as follows:

SELECT * FROM PRODUCTSFROM sales a 
JOIN budget b ON    a.sale_year_month = b.budget_year_month     

Conclusion

The recommendations boil down to a few short pointers

  • check for indexes
  • work with the smallest data set required
  • remove unnecessary fields and tables and
  • remove calculations in your JOIN and WHERE clauses.

If all these recommendations fail to improve your SQL query performance my last suggestion is you move to Venus. All you will need is a single day to tune your SQL.

Published at DZone with permission of Mpumelelo Msimanga, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Bryan Copeland replied on Wed, 2014/04/02 - 10:55am

  •  "remove calculations in your JOIN and WHERE clauses."

Or better yet do the date transformations in the programming language pulling the data...

Mpumelelo Msimanga replied on Thu, 2014/04/03 - 8:17am in response to: Bryan Copeland

You quite right, it boils down to not doing the calculation for each row.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.