Monday Links: Blog, Error Messages and Recruiters

Toxic Culture Is Driving the Great Resignation

I have read a couple of times about the Great Resignation. I guess the world isn't the same place after the pandemic. And that's reflected in the job market too. "A toxic corporate culture is the single best predictor of which companies suffered from high attrition in the first six months of the Great Resignation." Read full article

What’s in a Good Error Message?

I don’t know how many times I have debugged the “The given key was not present in the dictionary” error. In fact, I wrote two C# idioms to avoid exceptions when working with dictionaries. From the article, to write better error messages: give context, show the actual error, and tell how to mitigate it. Read full article
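As a toy illustration of those three qualities, here's a sketch in Python. The settings dictionary and names are made up for the example:

```python
# A made-up settings lookup that follows the article's advice:
# give context, show the actual error, and tell how to mitigate it.
settings = {"timeout": 30}

def get_setting(name):
    try:
        return settings[name]
    except KeyError:
        raise KeyError(
            f"Setting '{name}' was not found. "                    # the actual error
            f"Available settings: {sorted(settings)}. "            # context
            "Add it to the settings file or check the spelling."   # mitigation
        ) from None
```

Compare that to a bare "The given key was not present in the dictionary."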

Why don’t I have a blog?

I have always heard: "start your own blog." Well, I started my own blog…you're reading it. But this is a post about the opposite: why not to have a blog. The author points out that most posts only rephrase Reddit or StackOverflow comments. But I learned from Show Your Work that nothing is completely original. We all have some sort of inspiration. And precisely that, along with our own voice, makes our blogs unique. Read full article

office workers circa 1940s
Photo by Museums Victoria on Unsplash

Learn By Wrapping

The best advice to learn something is to learn by doing. Finding project ideas is often difficult. Another TODO app, really? This article shares a learning activity: Write a wrapper for a library or an API you’re already familiar with. Instead of jumping to the documentation, start writing a wrapper in the language you want to learn. Read full article
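Here's a toy version of that exercise in Python. The idea: wrap a small slice of an API you already know (the built-in json module here) to practice a new language's idioms. The class and method names are made up:

```python
import json

# Learning exercise: wrap a familiar API behind a tiny interface of our own.
class JsonStore:
    """A tiny wrapper exposing only the operations we care about."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Practice the new language's collection idioms here
        self._data[key] = value

    def dump(self) -> str:
        # Delegate the heavy lifting to the library we already know
        return json.dumps(self._data, sort_keys=True)
```

The point isn't the wrapper itself, but that a familiar API gives you a ready-made spec to implement against.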

Career Advice Nobody Gave Me: Never Ignore a Recruiter

These days, anyone with a LinkedIn profile and "Software engineer" in the headline is getting lots of spam messages and connection requests. I bet you already got a message along these lines: "Hi X, I have one company interested in your profile. Are you available for a quick call to share more details? Please, send me your CV at this email."

This article shares a template to reply to these spammy messages. If you use it, you'll ask for the company name, seniority, and compensation before starting any conversation. That would be enough feedback for recruiters too. Read full article

Voilà! Another five reads! Do you answer spammy messages or connection requests on LinkedIn? Do you have a template to answer recruiters? What strategies do you use to learn new programming languages?

Are you interested in unit testing? Check my Unit Testing 101 series. Don’t miss the previous Monday Links on Going solo, Making Friends and AutoMapper.

BugOfTheDay: Object definitions, spaces, and checksums

These days I was working with a database migration tool. This is what I learned after debugging an issue for almost an entire day.

In one of my client's projects, we use a custom migrator tool to create or update stored procedures: a wrapper on top of DbUp. I've already written about Simple.Migrator, a similar tool.

To avoid updating the wrong version of a stored procedure, we rely on checksums. Before updating a stored procedure, we calculate the checksum of its definition in the database using a command-line tool.

How to find object definitions in SQL Server?

By the way… To find the text of a stored procedure in SQL Server, use the OBJECT_DEFINITION() function with the object id of the stored procedure.

Like this,

SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.MyCoolStoredProc'))
GO

Checksums and CREATE statements

With the checksum of the stored procedure to update, we write a header comment in the new script file. Something like,

/*
Checksum: "A-SHA1-HERE-OF-THE-PREVIOUS-STORED-PROC"
*/
ALTER PROC dbo.MyCoolStoredProc
BEGIN
  ...
END

To upgrade the database, the migrator compares the checksums of objects in the database with the checksums in header comments. If they’re different, the migrator displays a warning message and stops the upgrade.
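That check might look something like this sketch (a hypothetical reimplementation in Python, not the actual migrator; the header format follows the comment shown above):

```python
import hashlib
import re

def checksum(definition: str) -> str:
    """SHA-1 of a stored procedure definition, as a hex string."""
    return hashlib.sha1(definition.encode("utf-8")).hexdigest()

def header_checksum(script: str) -> str:
    """Pull the expected checksum out of the script's header comment."""
    match = re.search(r'Checksum:\s*"([^"]+)"', script)
    if not match:
        raise ValueError("Script is missing a checksum header")
    return match.group(1)

def can_upgrade(script: str, current_definition: str) -> bool:
    """Stop the upgrade when the database has drifted from what we expect."""
    return header_checksum(script) == checksum(current_definition)
```

If the definition in the database doesn't hash to the value in the header, the upgrade stops.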

Here comes the funny part. When I ran the migrator on my local machine, it always reported a difference. Even when I was grabbing the checksum from the migrator tool itself. Arrggg!

After debugging for a while and isolating the problem, I found something. In the previous script for the same stored procedure, I had started with CREATE OR ALTER PROC. There's nothing wrong with that.

But, there’s a difference in the object definitions of a stored procedure created with CREATE and with CREATE OR ALTER.

CREATE PROC vs CREATE OR ALTER PROC

Let me show you an example. We’re creating the same stored procedure with CREATE and CREATE OR ALTER to see its object definition.

/* With just CREATE */
CREATE PROC dbo.test
AS
SELECT 1
GO

SELECT LEN(OBJECT_DEFINITION(OBJECT_ID('dbo.Test'))) AS Length, OBJECT_DEFINITION(OBJECT_ID('dbo.Test')) AS Text
GO

/* What about CREATE OR ALTER? */
CREATE OR ALTER PROC dbo.test
AS
SELECT 1
GO

SELECT LEN(OBJECT_DEFINITION(OBJECT_ID('dbo.Test'))) AS Length, OBJECT_DEFINITION(OBJECT_ID('dbo.Test')) AS Text
GO

Here’s the output.

SQL Server object definitions
Object definition of a stored procedure with CREATE and CREATE OR ALTER

Notice the length of the two object definitions. They’re different! Some spaces were making my life harder.

The migrator compared the checksum of the object definition in the database with the one in the header comment. The two definitions differed by a few spaces.

I had made the mistake of writing CREATE OR ALTER, and the migrator didn't account for those extra characters in the definition before computing checksums. I had to rewrite the previous script to use ALTER and recreate the checksums.
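To see why those extra characters matter, here's the whole bug in a few lines of Python (assuming the migrator hashes the raw definition text with SHA-1, as the header comment suggests):

```python
import hashlib

def checksum(definition: str) -> str:
    return hashlib.sha1(definition.encode("utf-8")).hexdigest()

# SQL Server stores the exact text used to create the object, so the
# "OR ALTER" words (and any extra spaces) change the definition's hash,
# even though both scripts create the same stored procedure.
with_create = "CREATE PROC dbo.test\nAS\nSELECT 1"
with_create_or_alter = "CREATE OR ALTER PROC dbo.test\nAS\nSELECT 1"
```

Two definitions that are logically identical but textually different produce different checksums, and the migrator can only see the text.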

Parting thoughts

But, what's in this story for you? We should create processes to prevent mistakes in the first place. Scripts to make sure developers commit properly formatted code. Checks to avoid applying data migrations to the wrong environment. Extensions or plugins to follow naming conventions. Scripts to install the right tools and dependencies to run a project. Up-to-date documentation for internal tools.

Often, code reviews aren't enough to enforce conventions. We're humans, and we all make mistakes. And, the more code someone reviews in a session, the more tired they get. And, the more reviewers we add, the less effective the process gets.

Voilà! That's what I learned these days: reading object definitions from SQL Server, polishing my debugging skills, and building processes around our everyday development practice.

For more SQL Server content, check how to do case sensitive searches, when to follow index recommendations and how to write constants in stored procedures.

Happy debugging!

Working with ASP.NET Core IDistributedCache Provider for NCache

As we learned last time, when I covered in-memory caching with ASP.NET Core, a cache is a storage layer between an application and an external resource (a database, for example) used to speed up future requests to that resource. In this post, let’s use ASP.NET Core IDistributedCache abstractions to write a data caching layer using NCache.
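In other words, the pattern we'll build boils down to cache-aside: check the cache first, fall back to the slow resource, and store the result for next time. A minimal sketch in plain Python (not NCache yet; the function names are made up):

```python
import time

cache = {}

def slow_database_call(key):
    """Stand-in for a slow query against an external resource."""
    time.sleep(0.01)
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]            # fast path: served from the cache
    value = slow_database_call(key)  # slow path: hit the database
    cache[key] = value               # remember it for future requests
    return value
```

The rest of this post swaps that dictionary for a real distributed cache behind the IDistributedCache interface.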

What’s NCache?

From NCache official page, “NCache is an Open Source in-memory distributed cache for .NET, Java, and Node.js applications.”

Among other things, we can use NCache as a database cache, NHibernate 2nd-level cache, Entity Framework cache, and web cache for sessions and responses.

NCache comes in three editions: Open Source, Professional, and Enterprise. The Open Source version supports up to two nodes and its cache server is only available for .NET Framework version 4.8. For a complete list of differences, check NCache edition comparison.

One of the NCache key features is performance. Based on their own benchmarks, “NCache can linearly scale to achieve 2 million operations per second with a 5-server cache cluster.”

How to install NCache on a Windows machine?

Let's see how to install an NCache server on a Windows machine. For this, we need the Windows installer and a trial license key. Let's install NCache Enterprise edition, version 5.2 SP1.

After running the installer, we need to select the installation type from three options: Cache server, remote client, and Developer/QA. Let’s choose Cache Server.

NCache Installation Types
Cache Server Installation Type

Then, we need to enter a license key. Let’s make sure to have a license key for the same version we’re installing. Otherwise, we will get an “invalid license key” error. We receive the license key in a message sent to the email address we used during registration.

NCache License Key
Enter NCache license key

Next, we need to enter the full name, email, and organization we used to register ourselves while requesting the trial license.

Enter User Information
Enter User Information

Then, we need to select an IP address to bind our NCache server to. Let’s stick to the defaults.

Configure IP Binding
Configure IP Binding

Next, we need to choose an account to run NCache. Let’s use the Local System Account.

Account to run NCache
Account to run NCache

Once the installation finishes, our default browser will open with the Web Manager. By default, NCache has a default cache named demoCache.

NCache Web Manager
NCache Web Manager

Next time, we can fire up the Web Manager by navigating to http://localhost:8251.

NCache's official site recommends a minimum of two servers for redundancy. But, for our sample app, let's use a single-node server for testing purposes.

So far, we have covered the installation instructions for a Windows machine. But we can also install NCache on Linux and in Docker containers. And we can run NCache on virtual machines in Azure and AWS.

Storage unit
A cache is a fast storage unit. Photo by Jezael Melgoza on Unsplash

How to add and retrieve data from an NCache cache?

Now, we’re ready to start using our NCache server from a .NET app. In Visual Studio, let’s create a solution with a .NET 6 “MSTest Test Project” and a class file to learn the basic caching operations with NCache.

Connecting to an NCache cache

Before connecting to our NCache server, we need to first install the client NuGet package: Alachisoft.NCache.SDK. Let’s use the latest version: 5.2.1.

To start a connection, we need the GetCache() method with a cache name. For our sample app, let’s use the default cache: demoCache.

Let’s start writing a test to add and retrieve movies from a cache.

using Alachisoft.NCache.Client;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace NCacheDemo.Tests;

[TestClass]
public class NCacheTests
{
    [TestMethod]
    public void AddItem()
    {
        var cacheName = "demoCache";
        ICache cache = CacheManager.GetCache(cacheName);

        // We will fill in the details later
    }
}

If you're new to unit testing, start by looking at how to write your first unit test in C# with MSTest.

Notice we didn’t have to use a connection string to connect to our cache. We only used a cache name. The same one as in the Web Manager: demoCache.

NCache uses a client.ncconf file instead of connection strings. We can define this file at the application or installation level. For our tests, we’re relying on the configuration file at the installation level. That’s why we only needed the cache name.

Adding items

To add a new item to the cache, we use the Add() or AddAsync() method with a key and a CacheItem. The key is an identifier and the CacheItem is a wrapper for the object to cache.

Every item to cache needs an expiration. The CacheItem has an Expiration property for that.

There are two basic expiration types: Absolute and Sliding.

A cached item with Absolute expiration expires after a given time. Let's say, a few seconds. But an item with Sliding expiration gets renewed every time it's accessed. If we don't retrieve the item within the sliding time, it expires.
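The two behaviors can be sketched in a few lines. This is a toy model in Python, not NCache's implementation:

```python
import time

class CacheEntry:
    """Toy model of Absolute vs Sliding expiration (illustration only)."""

    def __init__(self, value, ttl_seconds, sliding=False):
        self.value = value
        self.ttl = ttl_seconds
        self.sliding = sliding
        self.expires_at = time.monotonic() + ttl_seconds

    def get(self):
        now = time.monotonic()
        if now >= self.expires_at:
            return None  # the item expired
        if self.sliding:
            # a successful read pushes the deadline forward
            self.expires_at = now + self.ttl
        return self.value
```

With sliding=False, the deadline is fixed at creation time; with sliding=True, every read buys the item more time.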

Let’s update our test to add a movie to our cache.

using Alachisoft.NCache.Client;
using Alachisoft.NCache.Runtime.Caching;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;
using System.Threading.Tasks;

namespace NCacheDemo.Tests;

[TestClass]
public class NCacheTests
{
    private const string CacheName = "demoCache";

    [TestMethod]
    public async Task AddItem()
    {
        var movie = new Movie(1, "Titanic");
        var cacheKey = movie.ToCacheKey();
        var cacheItem = ToCacheItem(movie);

        ICache cache = CacheManager.GetCache(CacheName);
        // Let's add Titanic to the cache...
        await cache.AddAsync(cacheKey, cacheItem);

        // We will fill in the details later
    }

    private CacheItem ToCacheItem(Movie movie)
        => new CacheItem(movie)
        {
            Expiration = new Expiration(ExpirationType.Absolute, TimeSpan.FromSeconds(1))
        };
}

[Serializable]
public record Movie(int Id, string Name)
{
    public string ToCacheKey()
        => $"{nameof(Movie)}:{Id}";
}

Notice, we used two helper methods: ToCacheKey() to create the key from every movie and ToCacheItem() to create a cache item from a movie.

We used records from C# 9.0 to create our Movie class. Also, we needed to annotate it with the [Serializable] attribute.

Retrieving items

After adding items, let’s retrieve them. For this, we need the Get<T>() method with a key.

Let’s complete our first unit test to retrieve the object we added.

using Alachisoft.NCache.Client;
using Alachisoft.NCache.Runtime.Caching;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;
using System.Threading.Tasks;

namespace NCacheDemo.Tests;

[TestClass]
public class NCacheTests
{
    private const string CacheName = "demoCache";

    [TestMethod]
    public async Task AddItem()
    {
        var movie = new Movie(1, "Titanic");
        var cacheKey = movie.ToCacheKey();
        var cacheItem = ToCacheItem(movie);

        ICache cache = CacheManager.GetCache(CacheName);
        await cache.AddAsync(cacheKey, cacheItem);

        // Let's bring Titanic back...
        var cachedMovie = cache.Get<Movie>(cacheKey);
        Assert.AreEqual(movie, cachedMovie);
    }

    private CacheItem ToCacheItem(Movie movie)
        => new CacheItem(movie)
        {
            Expiration = new Expiration(ExpirationType.Absolute, TimeSpan.FromSeconds(1))
        };
}

[Serializable]
public record Movie(int Id, string Name)
{
    public string ToCacheKey()
        => $"{nameof(Movie)}:{Id}";
}

Updating items

If we try to add an item with the same key using the Add() or AddAsync() methods, they will throw an OperationFailedException. Try to add a unit test to prove that.

To either add a new item or update an existing one, we should use the Insert() or InsertAsync() methods instead. Let's use them in another test.

using Alachisoft.NCache.Client;
using Alachisoft.NCache.Runtime.Caching;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;
using System.Threading.Tasks;

namespace NCacheDemo.Tests;

[TestClass]
public class NCacheTests
{
    // Our previous test is the same
    
    [TestMethod]
    public async Task UpdateItem()
    {
        ICache cache = CacheManager.GetCache(CacheName);

        var movie = new Movie(2, "5th Element");
        var cacheKey = movie.ToCacheKey();
        var cacheItem = ToCacheItem(movie);
        // Let's add the 5th Element here...
        await cache.AddAsync(cacheKey, cacheItem);

        var updatedMovie = new Movie(2, "Fifth Element");
        var updatedCacheItem = ToCacheItem(updatedMovie);
        // There's already a cache item with the same key...
        await cache.InsertAsync(cacheKey, updatedCacheItem);

        var cachedMovie = cache.Get<Movie>(cacheKey);
        Assert.AreEqual(updatedMovie, cachedMovie);
    }

    // Rest of the file...
}

Notice we used the InsertAsync() method to add an item with the same key. When we retrieved it, it contained the updated version of the item.

There are two other basic methods: Remove() and RemoveAsync(). We can guess what they do. Again, try to write a test to prove that.

How to use ASP.NET Core IDistributedCache with NCache?

Up to this point, we have NCache installed and know how to add, retrieve, update, and remove items.

Let’s revisit our sample application from our post about using a Redis-powered cache layer.

Let's remember the example from that post. We had an endpoint that used a service to access a database, but it took a couple of seconds to complete. Think of retrieving complex object graphs or doing some computations with the data before returning it.

Something like this,

using DistributedCacheWithNCache.Responses;

namespace DistributedCacheWithNCache.Services;

public class SettingsService
{
    public async Task<SettingsResponse> GetAsync(int propertyId)
    {
        // Beep, boop...Aligning satellites...
        await Task.Delay(3 * 1000);

        return new SettingsResponse
        {
            PropertyId = propertyId,
            Value = "Anything"
        };
    }
}

Notice we emulated a database call with a 3-second delay.

Also, we wrote a set of extension methods on top of IDistributedCache to add and retrieve objects from a cache.

These were the extension methods we wrote last time,

using Microsoft.Extensions.Caching.Distributed;
using Newtonsoft.Json;

namespace DistributedCacheWithNCache.Services;

public static class DistributedCacheExtensions
{
    public static readonly DistributedCacheEntryOptions DefaultDistributedCacheEntryOptions
        = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(60),
            SlidingExpiration = TimeSpan.FromSeconds(10),
        };

    public static async Task<TObject> GetOrSetValueAsync<TObject>(this IDistributedCache cache,
                                                                  string key,
                                                                  Func<Task<TObject>> factory,
                                                                  DistributedCacheEntryOptions options = null)
        where TObject : class
    {
        var result = await cache.GetValueAsync<TObject>(key);
        if (result != null)
        {
            return result;
        }

        result = await factory();

        await cache.SetValueAsync(key, result, options);

        return result;
    }

    private static async Task<TObject> GetValueAsync<TObject>(this IDistributedCache cache,
                                                              string key)
        where TObject : class
    {
        var data = await cache.GetStringAsync(key);
        if (data == null)
        {
            return default;
        }

        return JsonConvert.DeserializeObject<TObject>(data);
    }

    private static async Task SetValueAsync<TObject>(this IDistributedCache cache,
                                                     string key,
                                                     TObject value,
                                                     DistributedCacheEntryOptions options = null)
        where TObject : class
    {
        var data = JsonConvert.SerializeObject(value);

        await cache.SetStringAsync(key, data, options ?? DefaultDistributedCacheEntryOptions);
    }
}

Notice we used Newtonsoft.Json to serialize and deserialize objects.

NCache and the IDistributedCache interface

Now, let’s use a .NET 6 “ASP.NET Core Web App,” those extension methods on top of IDistributedCache, and NCache to speed up the SettingsService.

First, we need to install the NuGet package NCache.Microsoft.Extensions.Caching. This package implements the IDistributedCache interface using NCache, of course.

After installing that NuGet package, we need to register the cache in the ASP.NET Core dependency container in the Program.cs file. To achieve this, we need the AddNCacheDistributedCache() method.

// Program.cs
using Alachisoft.NCache.Caching.Distributed;
using DistributedCacheWithNCache;
using DistributedCacheWithNCache.Services;

var (builder, services) = WebApplication.CreateBuilder(args);

services.AddControllers();
// We add the NCache implementation here...
services.AddNCacheDistributedCache((options) =>
{
    options.CacheName = "demoCache";
    options.EnableLogs = true;
    options.ExceptionsEnabled = true;
});
services.AddTransient<SettingsService>();

var app = builder.Build();
app.MapControllers();
app.Run();

Notice we continued to use the same cache name: demoCache. Also, we relied on a Deconstruct extension method to deconstruct the builder and services variables. I took this idea from Khalid Abuhakmeh's Adding Clarity To .NET Minimal Hosting APIs.

Back in the SettingsService, we can use the IDistributedCache interface injected into the constructor and the extension methods in the DistributedCacheExtensions class. Like this,

using DistributedCacheWithNCache.Responses;
using Microsoft.Extensions.Caching.Distributed;

namespace DistributedCacheWithNCache.Services
{
    public class SettingsService
    {
        private readonly IDistributedCache _cache;

        public SettingsService(IDistributedCache cache)
        {
            _cache = cache;
        }

        public async Task<SettingsResponse> GetAsync(int propertyId)
        {
            var key = $"{nameof(propertyId)}:{propertyId}";
            // Here we wrap the GetSettingsAsync method around the cache logic
            return await _cache.GetOrSetValueAsync(key, async () => await GetSettingsAsync(propertyId));
        }

        private static async Task<SettingsResponse> GetSettingsAsync(int propertyId)
        {
            // Beep, boop...Aligning satellites...
            await Task.Delay(3 * 1000);

            return new SettingsResponse
            {
                PropertyId = propertyId,
                Value = "Anything"
            };
        }
    }
}

Notice we wrapped GetSettingsAsync(), which holds the actual retrieval logic, with the caching logic in GetOrSetValueAsync(). At some point, we will have the same data in our caching and storage layers.

With the caching in place, if we hit an endpoint that uses that service, we will see faster response times after the first call, which still takes the full 3 seconds.

Faster response times after using NCache
A few milliseconds reading it from NCache

Also, if we go back to NCache Web Manager, we should see some activity in the server.

NCache Dashboard
NCache Dashboard showing our first request

In this scenario, all the logic to add and retrieve items is abstracted behind the IDistributedCache interface. That’s why we didn’t need to directly call the Add() or Get<T>() method. Although, if we take a look at the NCache source code, we will find those methods here and here.

Voilà! That's NCache and how to use it with the IDistributedCache interface. With NCache, we get a distributed cache server with little configuration and an out-of-the-box dashboard. Also, we can put all the caching logic into decorators and keep our services as clean as possible.

To follow along with the code we wrote in this post, check my NCache Demo repository over on GitHub.

canro91/NCacheDemo - GitHub

To read more content, check my Unit Testing 101 series to learn from how to write your first unit tests to mocks.

I wrote this post in collaboration with Alachisoft, NCache creators.

Happy coding!

TIL: T-SQL doesn't have constants and variables aren't a good idea

Today I learned how to use constants in SQL Server stored procedures. While getting a stored procedure reviewed, I got a comment asking me to remove literal values. This is how to emulate constants in T-SQL.

SQL Server doesn't have a keyword for constants. To introduce constants in stored procedures, write literal values next to an explanatory comment, or use single-row views with the constant values as columns.

Don’t use variables as constants

From C# and other programming languages, we’ve learned to use constants or enums instead of magic values all over our code. Often, we bring constants to our T-SQL queries. But…

T-SQL doesn’t have a keyword for constants. And, SQL Server engine doesn’t inline variables when executing stored procedures.

The first thing we mistakenly try to emulate constants is variables.

For example, let’s find all StackOverflow users with two reputation points. That’s not a popular reputation among StackOverflow users. Something like this,

/* An index to speed things up a bit */
CREATE INDEX Reputation ON dbo.Users(Reputation)
GO

CREATE OR ALTER PROC dbo.usp_GetUsers
AS
BEGIN
    /* This is how we often emulate constants */
    DECLARE @Reputation INT = 2;

    SELECT *
    FROM dbo.Users u
    WHERE u.Reputation = @Reputation
    ORDER BY u.CreationDate DESC;
END
GO

This is the execution plan. Let’s keep an eye on the number of estimated users.

StackOverflow users with reputation = 2
Execution plan of finding users with 2-point reputation

But, there’s a downside. Variables inside stored procedures trigger a different behavior in SQL Server.

Variables and execution plans

When executing a stored procedure, SQL Server creates an execution plan for the first set of parameters it sees. And, SQL Server reuses the same execution plan the next time we run that stored procedure. We call this behavior Parameter Sniffing.

SQL Server uses statistics (histograms built from samples of our data) to choose the shape of the execution plan. Which table to read first, how to read that table, the number of threads, the amount of memory, among other things.

But, when there are variables in a stored procedure, SQL Server builds execution plans, not from statistics (i.e., samples of our data), but from an "average value."

Variables make SQL Server build different execution plans, probably not suited for the set of parameters we're calling our stored procedures with. That's why variables aren't a good idea to replace constants.

Literal values and comments

The simplest way to emulate constants in T-SQL is to use literal values.

To make stored procedures more maintainable, it's a good idea to write an explanatory comment next to the literal value.

Let’s rewrite our stored procedure with a literal and a comment.

CREATE OR ALTER PROC dbo.usp_GetUsers
AS
BEGIN
    SELECT DisplayName, Location, CreationDate
    FROM dbo.Users u
    WHERE u.Reputation = /* Interesting reputation */2
    ORDER BY u.CreationDate DESC;
END
GO

This is the execution plan.

StackOverflow users with reputation = 2
This time, we're back to a literal value and a comment

Do you remember the estimated number of users from our example with variables? Now, we have a more accurate estimated number. SQL Server isn’t using an average value. It has better estimates this time.

We even have an index recommendation in our execution plan. By the way, don't blindly follow index recommendations; just listen to them. They're only a list of columns to consider indexing.

Create a view for constants

The hardcoded value and an explanatory comment are OK if we have our “constant” in a few places.

A more maintainable solution to literal values is to create a single-row view with columns named after the constants to declare.

CREATE OR ALTER VIEW dbo.vw_Constant
AS
SELECT (2) InterestingReputation
GO

With that view in place, we can replace the hardcoded values in our stored procedure.

CREATE OR ALTER PROC dbo.usp_GetUsers
AS
BEGIN
    SELECT *
    FROM dbo.Users u
    /* The view with our constant */
    INNER JOIN dbo.vw_Constant c 
    ON u.Reputation = c.InterestingReputation
    ORDER BY u.CreationDate DESC;
END
GO

A more maintainable alternative while keeping good estimates.

Voilà! That’s how to use constants with a view in SQL Server. I found a proposal to introduce a constant keyword in SQL Server. I learned about the trick with views while getting my code reviewed. But I also found the same idea in this StackOverflow question and in this one too.

For more content on SQL Server, check my other posts on functions and WHERE clauses, implicit conversions and index recommendations.

Happy coding!

SQL Server Index recommendations: Just listen to them

I guess you have seen SQL Server index recommendations on actual execution plans. But, you shouldn’t take them too seriously. This is what I learned about SQL Server index recommendations.

SQL Server builds index recommendations based on the WHERE and SELECT clauses of a query, without considering GROUP BY or ORDER BY clauses. Use index recommendations as a starting point to craft better indexes.

What’s a nonclustered index anyways?

If you’re wondering… A nonclustered index is a redundant, sorted, and smaller copy of one table to make queries go faster.

Imagine you want to find a particular book in a library. Would you go shelf by shelf, book by book, until you find it? Or would you use the library catalog to go directly to the one you want? Well, these days, I guess libraries have software for that. But it's the same idea. That library catalog or reference software works like an index.
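That catalog analogy translates almost directly to code. Here's a toy model in Python (an illustration, not SQL Server internals): the index is a sorted copy of one column, where each entry points back to its row in the table.

```python
import bisect

# The "table": rows in insertion order, like a heap or clustered index.
table = [
    {"Id": 1, "DisplayName": "Ada", "Location": "Colombia"},
    {"Id": 2, "DisplayName": "Linus", "Location": "Finland"},
    {"Id": 3, "DisplayName": "Grace", "Location": "Colombia"},
]

# A "nonclustered index" on Location: sorted (key, row position) pairs.
location_index = sorted((row["Location"], i) for i, row in enumerate(table))

def find_by_location(location):
    """Seek into the sorted index instead of scanning every row."""
    keys = [key for key, _ in location_index]
    start = bisect.bisect_left(keys, location)   # binary search, like a seek
    matches = []
    for key, position in location_index[start:]:
        if key != location:
            break                                # sorted, so we can stop early
        matches.append(table[position])          # follow the pointer to the row
    return matches
```

A scan reads every row; the index lets us jump straight to the matching range, at the cost of maintaining that redundant sorted copy.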

After that aside…

Rijksmuseum, Amsterdam, Netherlands
Rijksmuseum, Amsterdam, Netherlands. Photo by Will van Wingerden on Unsplash

The next time you see index recommendations on actual execution plans or from the Tuning Advisor, don't rush to create them. Just listen to them.

To prove why we shouldn’t blindly create every index recommendation, let’s use the StackOverflow database to write queries and indexes.

Index recommendations and Scans

Let’s start with no indexes and a simple query to find all users from Colombia. Let’s check the actual execution plan.

SELECT Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Location = 'Colombia'

This is the execution plan. Do you see any index recommendations? Nope.

Users from Colombia with no index recommendation
Execution plan of finding all users from Colombia

If SQL Server has to scan the object, the actual execution plan won’t show any index recommendations.

Now, let’s change the query a bit. Let’s find the first 1000 users from Colombia ordered by Reputation instead.

SELECT TOP 1000 Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Location = 'Colombia'
ORDER BY Reputation DESC

Now, we have the missing index recommendation.

Users from Colombia with an index recommendation
Ladies and gentlemen, now the index recommendation

For that query, SQL Server suggests an index on Location including DisplayName and Reputation.

Recommended index for Users from Colombia
The recommended index

Indexes aren’t sorted by included columns. Indexes might have some columns appended or “included” to avoid looking back to the table to access them.
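A toy continuation of the catalog idea: the key column decides the sort order, while "included" columns just ride along in each entry so the query never goes back to the table. This is an illustration only, not SQL Server's storage format:

```python
# An "index" on Location with DisplayName and Reputation included.
# Only the first element of each tuple (the key) drives the ordering.
index_with_includes = sorted([
    ("Colombia", {"DisplayName": "Ada", "Reputation": 2}),
    ("Finland",  {"DisplayName": "Linus", "Reputation": 100}),
    ("Colombia", {"DisplayName": "Grace", "Reputation": 50}),
], key=lambda entry: entry[0])

def display_names(location):
    # Everything the SELECT needs lives in the index entry itself,
    # so there's no lookup back to the base table.
    return [columns["DisplayName"]
            for key, columns in index_with_includes
            if key == location]
```

That's why wide INCLUDE lists are tempting for reads and expensive for storage: every included column is copied into every index entry.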

Index recommendations including all the columns

To point out the next reason not to blindly add recommended indexes, let’s change our query to bring all columns instead of four of them.

Don’t write SELECT * queries, by the way.

SELECT TOP 1000 *
FROM dbo.Users
WHERE Location = 'Colombia'
ORDER BY Reputation DESC

Let’s see what the execution plan looks like.

Execution plan for a 'SELECT *'
Execution plan for a 'SELECT *'

At first glance, the plan looks similar. But let’s focus on what changed on the recommended index. Here it is.

Recommended index with all columns in the Users table
Recommended index with all columns in the Users table

SQL Server recommended an index with all the columns in the table. Even NVARCHAR(MAX) columns. Arrrggg!

Often, SQL Server recommends adding all the columns from the table into the INCLUDE part of indexes.

Indexes aren’t free. They take disk space. Even included columns take disk space. The more keys and included columns, the bigger the indexes get and the longer SQL Server will hold locks to insert, update, and delete data.

Index Recommendations and ORDER BY’s

The next thing to know about index recommendations has to do with the keys in the index.

SQL Server index recommendations are based on the WHERE and SELECT clauses. SQL Server doesn’t use GROUP BY or ORDER BY clauses to build index recommendations.

For our last query, let’s add the recommended index and another one with the ORDER BY in mind. These are the two new indexes,

/* This one has Reputation, which is on the ORDER BY */
CREATE INDEX Location_Reputation ON dbo.Users(Location, Reputation);

/* This is the recommended one */
CREATE INDEX Location ON dbo.Users(Location);
GO

After creating these indexes, let’s run our query again,

SELECT TOP 1000 Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Location = 'Colombia'
ORDER BY Reputation DESC
GO

This time, the execution plan looks like this,

Execution plan using the index on Location and Reputation
SQL Server used the Location_Reputation index, not the recommended one

SQL Server recommended one index but used another. Even when the recommended index was in place.

SQL Server only looks at the WHERE and SELECT parts of queries to build recommendations. We can create better indexes than the recommended ones for queries with ORDER BY and GROUP BY clauses.

Next, let’s dig into the order of keys in recommended indexes.

Keys on the recommended indexes are based on equality and inequality comparisons on the WHERE. Columns with equality comparisons are shown first, followed by columns with inequality comparisons.

Let’s add another comparison to our sample query. This time, let’s look for users from Colombia with more than 10 reputation points.

SELECT TOP 1000 Id, DisplayName, Reputation, Location
FROM dbo.Users
WHERE Reputation > 10
AND Location = 'Colombia'
ORDER BY Reputation DESC

Let’s check the recommended index on the execution plan.

Recommended index on Location and Reputation
Recommended index on Location followed by Reputation

The recommended index contains the Location column first, then the Reputation column. But, in the query, the filter on Reputation was first. What happened here?

SQL Server builds recommended indexes with equality comparisons first, followed by inequality comparisons. That's why there's an apparent mismatch between the order of keys in the index and the order of filters in the query.

Last thing about recommended indexes.

Index recommendations don’t take into account existing indexes.

Check your existing indexes. See if you can combine the recommended indexes with your existing ones. If your existing indexes overlap with the recommended ones, drop the old ones.

Build as few indexes as possible to support your queries. Keep around 5 indexes per table with around 5 columns per index.

Parting words

Voilà! These are some of the things I learned about SQL Server index recommendations. Remember, indexes aren't free. The more indexes you add, the slower your inserts, updates, and deletes will get.

Next time you see index recommendations on your execution plans, check if you already have a similar index you can modify. If you don't, please remember to at least change the index's name, and not to include all the columns of your table.

I learned these things following Brent Ozar’s Master Index Tuning class. Great class!

For more content on SQL Server, check how to do case sensitive searches, how to optimize GROUP BY queries and what implicit conversions are.