Don't use functions around columns in your WHEREs: The worst T-SQL mistake
24 Jan 2022 #tutorial #sqlThere’s one thing we could do to write faster queries in SQL Server: don’t use functions around columns in WHERE clauses. I learned it the hard way. Let me share this lesson with you.
Don’t use user-defined or built-in functions around columns in the WHERE clause of queries. It prevents SQL Server from estimating the right amount of rows out of the function. Write queries with operators and comparisons to make SQL Server better use the indexes it has.
With functions around columns in WHEREs
To prove this point, let’s query a local copy of the StackOverflow database. Yes, the StackOverflow we all know and use.
StackOverflow has a Users
table that contains, well…, all registered users and their profiles. Among other things, every user has a display name, location, and reputation.
Let’s find the first 50 users by reputation in Colombia.
To make things faster, let’s create an index on the Location field. It’s an NVARCHAR(100)
column.
CREATE INDEX Location ON dbo.Users(Location);
This is the query we often write,
DECLARE @Location NVARCHAR(20) = N'Colombia';
SELECT TOP 50 DisplayName, Location, CreationDate, Reputation
FROM dbo.Users
-- Often, we put LOWER on both sides of the comparison
WHERE LOWER(Location) = LOWER(@Location)
-- ^^^^^ ^^^^^
ORDER BY Reputation DESC;
GO
Did you notice the LOWER
function on both sides of the equal sign?
We all have written queries like that one. I declared myself guilty too. Often, we use LOWER
and UPPER
or wrap the column around RTRIM
and LTRIM
.
But, let’s see what happened in the Execution Plan.
Here, SQL Server chose to scan the index Location
first. And let’s notice the width of the arrow coming out of the first operator. When we place the cursor on it, it shows “Number of Rows Read.”
In this copy of the StackOverflow database, there are 2,465,713 users, only 463 of them living in Colombia. SQL Server had to read the whole content of the index to execute our query. Arrrggg!
It means that to find all users in Colombia, SQL Server had to go through all users in the index. We could use that index in a better way.
Write queries with comparisons, functions, and operators around parameters. This way SQL Server could properly use indexes and have better estimates of the contents of tables. But, don’t write functions around columns in the WHERE clauses.
The same is true when joining tables. Let’s not put functions around the foreign keys in our JOINs either.
Rewrite your queries to avoid functions around columns in WHERE
Let’s go back and rewrite our query without any functions wrapping columns. This way,
DECLARE @Location NVARCHAR(20) = N'Colombia';
SELECT TOP 50 DisplayName, Location, CreationDate, Reputation
FROM dbo.Users
-- We remove LOWER on both sides
WHERE Location = @Location
ORDER BY Reputation DESC;
GO
And, let’s check the execution plan again.
This time, SQL Server used an Index Seek. It means SQL Server didn’t have to read the whole content of the Location
index to run our query. And the execution plan didn’t go parallel. We don’t have the black arrows in yellow circles in the operators.
Let’s notice that this time we have a thinner arrow on the first operator. Let’s see how many rows SQL Server read this time.
After that change, SQL Server only read 463 records. That was way better than reading the whole index.
Voilà! If we want to write faster queries, let’s stop using functions around columns in our WHEREs. That screws SQL Server estimates. For example, let’s not use LOWER or UPPER around our columns. By the way, SQL Server string searches are case insensitive by default, we don’t need those functions at all.
By the way, this is the same mistake we make when comparing DateTime in our SQL queries.
To read more SQL Server content, check what implicit conversions are and why you should care, how to optimize queries with GROUP BY, and just listen to SQL Server index recommendations.
Happy SQL time!