How to get faster SQL queries
Seven tips to write more efficient database code
By Sean McCown | InfoWorld | Published: 14:30, 01 April 2010
SQL developers on every platform are struggling, seemingly stuck in a DO/WHILE loop that makes them repeat the same mistakes again and again. That's because the database field is still relatively immature. Sure, vendors are making some strides, but they continue to grapple with the bigger issues. Concurrency, resource management, space management, and speed still plague SQL developers whether they're coding on SQL Server, Oracle, DB2, Sybase, MySQL, or any other relational platform.
Part of the problem is that there is no magic bullet, and for almost every best practice, I can show you at least one exception. Typically, a developer finds his or her own favorite methods, though usually they don't include any constructs for performance or concurrency, and doesn't bother exploring other options. Maybe that's a symptom of lack of education, or the developers are just too close to the process to recognise when they're doing something wrong. Maybe the query runs well on a local set of test data but fails miserably on the production system.
I don't expect SQL developers to become administrators, but they must take production issues into account when writing their code. If they don't do it during initial development, the DBAs will just make them go back and do it later, and the users suffer in the interim.
There's a reason why we say tuning a database is both an art and a science. It's because very few hard-and-fast rules exist that apply across the board. The problems you've solved on one system aren't issues on another, and vice versa. There's no right answer when it comes to tuning queries, but that doesn't mean you should give up.
There are some good principles you can follow that should yield results in one combination or another. I've encapsulated them in a list of SQL dos and don'ts that often get overlooked or are hard to spot. These techniques should give you a little more insight into the minds of your DBAs, as well as the ability to start thinking of processes in a production-oriented way.
1. Don't use UPDATE instead of CASE
This issue is very common, and though it's not hard to spot, many developers often overlook it because using UPDATE has a natural flow that seems logical.
Take this scenario, for instance: You're inserting data into a temp table and need it to display a certain value if another value exists. Maybe you're pulling from the Customer table and you want anyone with more than $100,000 in orders to be labeled as "Preferred." Thus, you insert the data into the table and run an UPDATE statement to set the CustomerRank column to "Preferred" for anyone who has more than $100,000 in orders. The problem is that the UPDATE statement is logged, which means it has to write twice for every single write to the table. The way around this, of course, is to use an inline CASE statement in the SQL query itself. This tests every row for the order amount condition and sets the "Preferred" label before it's written to the table. The performance increase can be staggering.
2. Don't blindly reuse code
This issue is also very common. It's very easy to copy someone else's code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the WHERE clause. You can get huge performance gains if you trim reused code to your exact needs.
3. Do pull only the number of columns you need
This issue is similar to issue No. 2, but it's specific to columns. It's all too easy to code all your queries with SELECT * instead of listing the columns individually. The problem again is that it pulls more data than you need. I've seen this error dozens and dozens of times. A developer does a SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you're processing so much more data than you need it's a wonder the query returns at all. You're not only processing more data than you need, but you're also taking resources away from other processes.