Don't use functions around columns in your WHEREs: The worst T-SQL mistake24 Jan 2022 #tutorial #sql
There’s one thing we could do to write faster queries in SQL Server: don’t use functions around columns in WHERE clauses. I learned it the hard way. Let me share this lesson with you.
Don’t use user-defined or built-in functions around columns in the WHERE clause of queries. It prevents SQL Server from estimating the right amount of rows out of the function. Write queries with operators and comparisons to make SQL Server better use the indexes it has.
With functions around columns in WHEREs
To prove this point, let’s query StackOverflow database. Yes, the StackOverflow we all know and use.
StackOverflow has a Users table that contains, well…, all registered users and their profiles. Among other things, every user has a display name, location, and reputation.
Let’s find the first 50 users by reputation in Colombia.
To make things faster, let’s create an index on the Location field. It’s an
CREATE INDEX Location ON dbo.Users(Location);
This is the query we often write,
DECLARE @Location NVARCHAR(20) = N'Colombia'; SELECT TOP 50 DisplayName, Location, CreationDate, Reputation FROM dbo.Users /* Often, we put LOWER on both sides of the comparison */ WHERE LOWER(Location) = LOWER(@Location) ORDER BY Reputation DESC; GO
Did you notice the
LOWER function on both sides of the equal sign?
We all have written queries like that one. I declared myself guilty too. Often, we use
UPPER or wrap the column around
But, let’s see what happened in the Execution Plan.
We read execution plans from right to left and top to bottom.
Here, SQL Server chose to scan the index
Location first. And notice the width of the arrow coming out of the first operator. When we place the cursor on it, it shows “Number of Rows Read.”
In this copy of the StackOverflow database, there are 2,465,713 users, only 463 living in Colombia. SQL Server has to read the whole content of the index to execute our query.
It means that to find all users in Colombia, SQL Server had to go through all users in the index. We could use that index in a better way.
Write queries with comparisons, functions, and operators around parameters. This way SQL Server could properly use indexes and have better estimates of the contents of tables. But, don’t write functions around columns in the WHERE clauses.
The same is true when joining tables. Don’t put functions around the foreign keys in your JOINs either.
Rewrite your queries to avoid functions around columns in WHERE
Let’s go back and rewrite our query without any functions wrapping columns. This way,
DECLARE @Location NVARCHAR(20) = N'Colombia'; SELECT TOP 50 DisplayName, Location, CreationDate, Reputation FROM dbo.Users /* We remove LOWER on both sides */ WHERE Location = @Location ORDER BY Reputation DESC; GO
And, let’s check the execution plan again.
This time, SQL Server used an Index Seek. It means SQL Server knew the starting point when reading the index on
Location. And the execution plan didn’t go parallel. We don’t have the black arrows in yellow circles in the operators.
Notice, this time we have a thinner arrow on the first operator. Let’s see how many rows SQL Server read this time.
After that change, SQL Server only read 463 records. That was way better than reading the whole index.
Voilà! If we want to write faster queries, let’s stop using functions around columns in our WHEREs. That screws SQL Server estimates. For example, let’s not use LOWER or UPPER around our columns. By the way, SQL Server string searches are case insensitive by default. we don’t need those functions at all.
To read more SQL Server content, check What are implicit conversions and why you should care, how to optimize queries with GROUP BY, and Just listen to SQL Server index recommendations.
Happy SQL time!