TIL: T-SQL doesn't have constants and variables aren't a good idea

Today I learned how to use constants in SQL Server stored procedures. While getting a stored procedure reviewed, I got one comment to remove literal values. This is how to bring constants in T-SQL.

SQL Server doesn’t have a keyword for constants. To introduce constants in stored procedures, write literal values next to an explaining comment or use single-row views with the constant values as columns.

1. Don’t use variables as constants

From C# and other programming languages, we’ve learned to use constants or enums instead of magic values all over our code. Often, we would like to bring constants to our T-SQL queries. But…

T-SQL doesn’t have a keyword for constants. And SQL Server engine doesn’t inline variables when executing stored procedures.

The first thing we try by mistake to emulate constants is to use variables.

For example, let’s find all StackOverflow users with two reputation points. That’s not a popular reputation among StackOverflow users. We write something like this,

/* An index to speed things up a bit */
CREATE INDEX Reputation ON dbo.Users(Reputation)
GO

CREATE OR ALTER PROC dbo.usp_GetUsers
AS
BEGIN
    /* This is how we often emulate constants */
    DECLARE @Reputation INT = 2;

    SELECT *
    FROM dbo.Users u
    WHERE u.Reputation = @Reputation
    ORDER BY u.CreationDate DESC;
END
GO

This is the execution plan. Let’s keep an eye on the number of estimated users.

StackOverflow users with reputation = 2
Execution plan of finding users with 2-point reputation

But, there’s a downside. Variables inside stored procedures trigger a different behavior in SQL Server.

Variables and execution plans

When executing a stored procedure, SQL Server creates an execution plan for the first set of parameters it sees. And, the next time we run that stored procedure, SQL Server reuses the same execution plan, even if we use different parameters. We call this behavior Parameter Sniffing.

SQL Server uses statistics (histograms built from samples of our data) to choose the shape of execution plans. SQL Server has to choose the first table to read, the number of threads, and the amount of memory, among other things.

But, when there are variables in a stored procedure, SQL Server builds execution plans, not from statistics (samples of our data), but from an “average value.”

Variables make SQL Server build different execution plans, probably not suited for the set of parameters we’re calling our stored prcedures with. That’s why variables aren’t a good idea to replace constants.

2. Literal values and comments

The simplest solution to constants in T-SQL is to use literal values.

To make stored procedures more maintainable, it’s a good idea to write an explaining comment next to the literal value.

Let’s rewrite our stored procedure with a literal and a comment.

CREATE OR ALTER PROC dbo.usp_GetUsers
AS
BEGIN
    SELECT DisplayName, Location, CreationDate
    FROM dbo.Users u
    WHERE u.Reputation = /* Interesting reputation */2
    ORDER BY u.CreationDate DESC;
END
GO

This is the execution plan.

StackOverflow users with reputation = 2
This time, we're back to a literal value and a comment

Do you remember the estimated number of users from our example with variables? It was 123 users. Now, we have a more accurate estimated number. It’s 1,854 users. SQL Server isn’t using an average value anymore. It has better estimates this time!

We even have an index recommendation in our execution plan. By the way, don’t blindly follow index recommendations, just listening to them. They’re only a list of columns to consider indexing.

3. Create a view for constants

The hardcoded value and an explanatory comment are OK if we have our “constant” in a few places.

A more maintainable solution to literal values is to create a single-row view with columns named after the constants to declare.

CREATE OR ALTER VIEW dbo.vw_Constant
AS
SELECT (2) InterestingReputation
GO

With that view in place, we can replace the hardcoded values in our stored procedure.

CREATE OR ALTER PROC dbo.usp_GetUsers
AS
BEGIN
    SELECT *
    FROM dbo.Users u
    /* The view with our constant */
    INNER JOIN dbo.vw_Constant c 
    ON u.Reputation = c.InterestingReputation
    ORDER BY u.CreationDate DESC;
END
GO

A more maintainable alternative while keeping good estimates.

Voilà! That’s how to use constants with a view in SQL Server. I found a proposal to introduce a constant keyword in SQL Server. I learned about the trick with views from this StackOverflow question and in this one too.

For more content on SQL Server, check my other posts on functions and WHERE clauses, implicit conversions and case-sensitive searches.

Happy coding!

SQL Server Index recommendations: Just listen to them

I guess you have seen SQL Server index recommendations on actual execution plans. But, you shouldn’t take them too seriously. This is what I learned about SQL Server index recommendations.

SQL Server builds index recommendations based on the WHERE and SELECT clauses of a query, without considering GROUP BY or ORDER BY clauses. Use index recommendations as an starting point to craft better indexes.

What’s a nonclustered index anyways?

If you’re wondering… A nonclustered index is a redundant, sorted, and smaller copy of one table to make queries go faster.

Imagine you want to find a particular book in a library. Would you go shelve by shelve, book by book until you find it? Or would you use the library catalog to go directly to the one you want? Well, these days, I guess libraries have software for that. But that’s the same idea. That library catalog or reference software works like an index.

After that aside…

Rijksmuseum, Amsterdam, Netherlands
Rijksmuseum, Amsterdam, Netherlands. Photo by Will van Wingerden on Unsplash

The next time you see an index recommendation on actual execution plans or from the Tuning Advisor, don’t rush to create it. Just listen to them!

To prove why we shouldn’t blindly create every index recommendation, let’s use the StackOverflow database to write queries and indexes.

Index recommendations and Scans

Let’s start with no indexes and a simple query to find all users from Colombia. Let’s check the actual execution plan.

SELECT Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Location = 'Colombia'

This is the execution plan. Do you see any index recommendations? Nope!

Users from Colombia with no index recommendation
Execution plan of finding all users from Colombia

If SQL Server has to scan the object, the actual execution plan won’t show any index recommendations.

Now, let’s change the query a bit. Let’s find the first 1,000 users from Colombia ordered by Reputation instead.

SELECT TOP 1000 Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Location = 'Colombia'
ORDER BY Reputation DESC

Now, we have the missing index recommendation.

Users from Colombia with an index recommendation
Ladies and gentlemen, now the index recommendation

For that query, SQL Server suggests an index on Location including DisplayName and Reputation.

Recommeded index for Users from Colombia
The recommended index

Indexes aren’t sorted by included columns. Indexes might have some columns appended or “included” to avoid looking back to the table to access them.

Index recommendations including all the columns

To point out the next reason not to blindly add recommended indexes, let’s change our query to bring all columns instead of four of them.

Don’t write SELECT * queries, by the way.

SELECT TOP 1000 *
FROM dbo.Users
WHERE Location = 'Colombia'
ORDER BY Reputation DESC

Let’s see what the execution plan looks like.

Execution plan for a 'SELECT *'
Execution plan for a 'SELECT *'

At first glance, the plan looks similar. But let’s focus on what changed on the recommended index. Here it is.

Recommended index with all columns in the Users table
Recommended index with all columns in the Users table

SQL Server recommended an index with all the columns in the table, even NVARCHAR(MAX) columns. Arrrggg!

Often, SQL Server recommends adding all the columns from the table into the INCLUDE part of indexes.

Indexes aren’t free. They take disk space. Even included columns take disk space. The more keys and included columns, the bigger the indexes get and the longer SQL Server will hold locks to insert, update, and delete data.

Index Recommendations and ORDER BY’s

The next thing to know about index recommendations has to do with the keys in the index.

SQL Server index recommendations are based on the WHERE and SELECT clauses. SQL Server doesn’t use GROUP BY or ORDER BY clauses to build index recommendations.

For our last query, let’s add the recommended index (without any included columns) and another one with the ORDER BY in mind. These are the two new indexes,

/* This is the recommended one */
CREATE INDEX Location ON dbo.Users(Location);

/* This one has Reputation, which is on the ORDER BY */
CREATE INDEX Location_Reputation ON dbo.Users(Location, Reputation);
GO

After creating these indexes, let’s run our query again,

SELECT TOP 1000 Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Location = 'Colombia'
ORDER BY Reputation DESC
GO

This time, the execution plan looks like this,

Recommended index with all columns in the Users table
Recommended index with all columns in the Users table

SQL Server recommended one index but used another, even when the recommended index was in place.

SQL Server only looks at the WHERE and SELECT clauses of queries to build recommendations. We can create better indexes than the recommended ones for queries with ORDER BY and GROUP BY clauses.

Next, let’s dig into the order of keys in recommended indexes.

Keys on recommended indexes are based on equality and inequality comparisons on the WHERE clause. Columns with equality comparisons are shown first, followed by columns with inequality comparisons.

Let’s add another comparison to our sample query. This time, let’s look for users from Colombia with more than 10 reputation points.

SELECT TOP 1000 Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Reputation > 10
AND Location = 'Colombia'
ORDER BY Reputation DESC

Let’s check the recommended index on the execution plan.

Recommended index with all columns in the Users table
Recommended index on Location followed by Reputation

The recommended index contains the Location column first, then the Reputation column. But, in the query, the filter on Reputation was first. What happened here?

SQL Server builds recommended indexes on equality comparisons followed by inequality comparisons. That’s why an apparent mismatch in the order of keys on the index and filters on the query.

Last thing about recommended indexes.

Index recommendations don’t take into account existing indexes.

Let’s check our existing indexes and see if we can combine the recommended indexes with our existing ones. If our existing indexes overlap with the recommended ones, let’s drop the old ones.

Let’s build as few indexes as possible to support our queries. Let’s keep around 5 indexes per table with around 5 columns per index.

Parting words

Voilà! These are some of the things I learned about SQL Server index recommendations. Remember, indexes aren’t free. The more indexes we add, the slower our queries will get.

Next time we see index recommendations on our execution plans, let’s check if we already have a similar index and modify that one. If we don’t, please let’s remember to at least change the recommended index’s name and not to include all the columns of our table.

I learned these things following Brent Ozar’s Master Index Tuning class. Great class!

For more content on SQL Server, check how to do case sensitive searches, how to optimize GROUP BY queries and what implicit conversions are.

Monday Links: Going solo, making friends and AutoMapper

11 things I wish I knew before I went Independent

A good read for all of us wanting to go solo. The author answers the question: “how you made the decision to go independent and how you got started?”. Among other things: find a niche, learn about sales and marketing, and talk to your customers… Definitively we don’t have to be the “I do anything for anyone” kind of guys. Read full article

“If you don’t want to be just another pair of hands, the first thing you need to figure out is: Who do you want to serve?”

How to build remote teams properly

I remember my first day at one past job. I spent like one week going to the office and doing nothing. I couldn’t even touch a computer until someone, who was on vacation, showed me all the security policies.

A clear onboarding is much more important these days of work-from-home than before. Have a standardized onboarding process and assign a buddy to the newcomers. Onboarding is only one step to building remote teams. Read full article

vintage cafe scene
Let's have a coffee and be friends...Photo by Les Anderson on Unsplash

This Is How To Make Friends As An Adult

Let’s switch gears for this one. Speaking of being adults and making friends…“being ‘close’ means a text message twice a year.” That got me reading. Reconnect with old friends, listen to people and ask them to tell more, check in every two weeks, and join a group. Read full article

Why I Decided to Ditch AutoMapper

This is an oldie. I have a love-and-hate relationship with AutoMapper. I can’t debug and easily find where the mappings are. Maybe, because in one of my client’s projects we use AutoMapper when we shouldn’t.

I liked that the AutoMapper’s author replied in the comments. He wrote, “use AutoMapper if you can Auto-Map 75% or more in a given mapping. If not, don’t use it. Just map things manually.” Read full article

Voilà! Another Monday Links, a bit diverse this time. How is the onboarding process at your job? Do you use AutoMapper? If not, what do you use instead? Stay tuned to the upcoming Monday Links.

In the meantime, check my Unit Testing 101 series. Don’t miss the previous Monday Links on Better programming, flags, and C# and Daily Meetings, Estimates, and Challenges.

TIL: How to optimize Group by queries in SQL Server

Let me share this technique I learned to improve queries with GROUP BY in SQL Server.

To improve queries with GROUP BY, write the SELECT query with the GROUP BY part using only the needed columns to do the grouping or sorting inside a common table expression (CTE) first. Then, join the CTE with the right tables to find other columns.

Usual GROUP BY: Find StackOverflow most down-voted questions

Let’s use this technique to tune the store procedure to find most down-voted questions on StackOverflow.

Here’s the store procedure. Let’s fire our local copy of the StackOverflow 2013 database to run it.

CREATE OR ALTER PROC dbo.MostDownVotedQuestions AS
BEGIN
    select top 20 count(v.PostId) as 'Vote count', v.PostId AS [Post Link],p.Body
    from Votes v 
    inner join Posts p on p.Id=v.PostId
    where PostTypeId = 1 and v.VoteTypeId=3
    group by v.PostId, p.Body
    order by 'Vote count' desc
END
GO

I ran this stored procedure on my local machine five times without any indexes. It took about 2 seconds each time. On my machine, SQL Server only had 8GB of RAM. Remember, by default SQL Server uses all available RAM.

This is the execution plan. Let’s notice the Clustered Index Seek on the dbo.Posts table. And, yes, SQL Server is recommending an index. But we’re not adding it.

StackOverflow most down-votes questions
StackOverflow most down-votes questions

Then, these are the metrics grabbed with sp_BlitzCache from the First Responder Kit. This stored procedure finds all the most CPU intensive queries SQL Server has recently executed.

Most CPU intensive queries
Most CPU intensive queries

To find the most down-voted questions, our stored procedure is grouping by Body. And, that’s an NVARCHAR(MAX) column, the actual content of StackOverflow posts.

Sorting and grouping on large data types is a CPU expensive operation.

Spoons full of Indian spices
Photo by Pratiksha Mohanty on Unsplash

Group and order inside CTE’s first

To improve queries with GROUP BY, group inside a common table expression (CTE) with only the required columns in the grouping. For example, IDs or columns covered by indexes. Then, join the CTE with the right tables to find other columns.

After grouping only by PostId inside a CTE first, our stored procedure looks like this,

CREATE OR ALTER PROC dbo.MostDownVotedQuestions AS
BEGIN
    WITH MostDownVoted AS (
        select top 20 count(v.PostId) as VoteCount, v.PostId /* We removed the Body column */
        from Votes v 
        inner join Posts p on p.Id=v.PostId
        where PostTypeId = 1 and v.VoteTypeId=3
        group by v.PostId /* Also, from here */
        order by VoteCount desc
    )
    select VoteCount as 'Vote count', d.PostId AS [Post Link], p.Body
    from MostDownVoted d
    inner join Posts p on p.Id = d.PostId
END
GO

This time, we are excluding the Body column from the GROUP BY part. Then, we are joining the MostDownVotes CTE to the dbo.Post table to show only the Body of the 20 resulting posts.

Again, this is the execution plan of grouping inside a CTE first.

StackOverflow most down-votes questions
StackOverflow most down-votes questions grouping inside a CTE

Notice the Clustered Index Seek operator on the left branch. That’s to find the body of only the 20 post SQL Server found as a result of grouping inside the CTE. This time, SQL Server is grouping and sorting fewer data. It made our stored procedure use less CPU time.

Let’s take another look at sp_BlitzCache. Before running the modified version of our store procedure five times, I ran DBCC FREEPROCCACHE to free up SQL Server’s plan cache.

Most CPU intensive queries
Most CPU intensive queries with our modified stored proc

Notice the “Total CPU” and “Avg CPU” columns, we’re using less CPU time after the change. I went from 36.151ms of total CPU time to 35.505ms. Hooray!

Now, imagine if that store procedure runs not only 5 times, but multiple times per minute. What if our stored procedure feeds a reporting screen in our app? That change with a CTE will make a huge difference in the overral CPU usage.

Voilà! That’s how we can improve queries with GROUP BY. Remember, group and sort inside CTE’s to take advantage of existing indexes and avoid expensive sorting operations. Use this technique with OFFSET/FETCH, SUM, AVG, and other group functions.

I learned this technique following Brent Ozar’s Mastering Query Tuning class.

For more SQL Server content, check don’t write functions around columns in WHERE, what are implicit conversions?, and How to do a case-sensitive search in SQL Server.

Happy coding!

TIL: How to do a case-sensitive search in SQL Server

Do you use LOWER or UPPER to do case-sensitive searches? Let’s see how to write a case-sensitive search in SQL Server.

To write a case-sensitive search in SQL Server, don’t use UPPER or LOWER functions around the column to filter. Instead, use the COLLATE keyword with a case-sensitive collation followed by the comparison.

Often by mistake, to do a case-sensitive search, we wrap a column around LOWER(). Something like this,

SELECT TOP 50 DisplayName, Location, CreationDate, Reputation
FROM dbo.Users
WHERE LOWER(DisplayName) LIKE 'john%'
--    ^^^^^
ORDER BY Reputation DESC;
GO

We tried to find the first 50 StackOverflow users with DisplayName containing lowercase ‘john’.

But, interestingly enough, some results don’t match our intended filter. They include both lowercase and uppercase ‘john’.

Result of finding users by wrapping a column with LOWER
Naive case insensitive search using LOWER

It would be the same if we use UPPER() instead. In general, let’s not use functions around columns in our WHEREs. That’s a common bad practice.

scrabble pieces
Photo by Brett Jordan on Unsplash

Collation and case sensitivity

In SQL Server, collations provide sorting rules and case and accent sensitivity for our data. For example, when we use an ORDER BY, the sort order of our results depends on the database collation.

In SQL Server Management Studio, we find a database collation after clicking on the database Properties and then on General. It’s under the Maintenance section.

Here’s mine.

SQL Server database properties
SQL Server database properties

The default collation for English is SQL_Latin1_General_CP1_CI_AS. This is a case-insensitive collation.

In collation names, CI means case insensitive. And, AS means accent sensitive.

That’s why when we wrote LOWER(DisplayName), SQL Server showed uppercase and lowercase results. SQL Server used the default collation, which was a case-insensitive one.

For more details about collations, check Microsoft docs on collations and Unicode support.

Case sensitive searches in WHERE clause

For case-sensitive searches in SQL Server, use COLLATE keyword with the case sensitive collation “SQL_Latin1_General_CP1_CS_AS” followed by the LIKE or = comparisons as usual.

When using the SQL_Latin1_General_CP1_CS_AS collation, a is different from A, and à is different from á. It’s both case and accent-sensitive.

Let’s rewrite our query with COLLATE,

SELECT TOP 50 DisplayName, Location, CreationDate, Reputation
FROM dbo.Users
WHERE DisplayName COLLATE SQL_Latin1_General_CP1_CS_AS LIKE '%john'
ORDER BY Reputation DESC;
GO

This time, we have the results we were expecting. Only users with lowercase ‘john’ in their display name.

Query to find StackOverflow users with COLLATE
Case sensitive query with COLLATE

Voilà! That’s how we can write case-sensitive searches in SQL Server. Remember, don’t use LOWER or UPPER. They won’t work for case-sensitive searches. Use a different collation instead.

For more content on SQL Server, check what are implicit conversions and why you should care, how to optimize GROUP BY queries, and just listen to index recommendations.

Happy coding!