SQL Server compares columns and parameters with the same data types. But, if the two data types are different, weird things happen. Let’s see what implicit conversions are and why we should care.
An implicit conversion happens when the data types of columns and parameters in comparisons are different. And SQL Server has to convert between them, following type precedence rules. Often, implicit conversions lead to unneeded index or table scans.
An implicit conversion that scans
Let’s see an implicit convention! For this, let’s create a new table from the StackOverflow Users table. But, this time, let’s change the Location data type from NVARCHAR to VARCHAR.
Let’s find all users from Colombia. To prove a point, let’s query the dbo.Users_Varchar table instead of the original dbo.Users table.
DECLARE@LocationNVARCHAR(20)=N'Colombia';SELECTId,LocationFROMdbo.Users_Varchar/* The column is VARCHAR, but the parameter NVARCHAR */WHERELocation=@Location;GO
Notice we have declared @Location as NVARCHAR. We have a type mismatch between the column and the variable.
Let’s see the execution plan.
SQL Server had to scan the index on Location. But, why?
Let’s notice the warning sign on the execution plan. When we hover over it, it shows the cause: SQL Server had to convert the two types. Yes! SQL Server converted between VARCHAR and NVARCHAR.
SQL Server data type precedence
To determine what types to convert, SQL Server follows a data type precedence order. This is a short version:
datetimeoffset
datetime2
datetime
smalldatetime
date
time
decimal
bigint
int
timestamp
uniqueidentifier
nvarchar (including nvarchar(max))
varchar (including varchar(max))
SQL Server convert “lower” types to “higher” types. We don’t need to memorize this order. Let’s remember SQL Server always has to convert VARCHAR to other types.
In our example, the VARCHAR type was on the column, on the left side of the comparison in the WHERE. It means SQL Server had to read the whole content of the index to convert and then compare. That’s more than 2 million rows. That’s why the index scan.
Let’s use the original dbo.Users table with Location as NVARCHAR and repeat the query. This time, switching the variable type to VARCHAR. What would be different?
DECLARE@LocationVARCHAR(20)='Colombia';SELECTId,Location/* We're filtering on the original Users table */FROMdbo.Users/* This time, the column is NVARCHAR, but the parameter VARCHAR */WHERELocation=@Location;GO
Now, the VARCHAR type is on the right of the comparison. It means SQL Server has to do one single conversion: the parameter.
This time we don’t have a yellow bang on our execution plan. And, we have an Index Seek. Not all implicit conversions are bad.
In stored procedures and queries, use input parameters with the same types as the columns on the tables.
To identify which queries on your SQL Server have implicit conversions issues, we can use the third query from these six performance tuning tips from Pinal Dave. But, after taking Brent Ozar’s Mastering courses, I learned to start working with the most expensive queries instead of jumping to queries with implicit convertion issues right away.
Voilà! Those are implicit conversions and why you should care. Let’s use input parameters with the right data types on your queries and store procedures. Otherwise, we will pay the performance penalty of converting and comparing types. Implicit conversions are like functions around columns, implicitly added by SQL Server itself.
Let’s remember that not all implicit conversions are bad. When looking at execution plans, let’s check how many rows SQL Server reads to convert and compare things.
There’s one thing we could do to write faster queries in SQL Server: don’t use functions around columns in WHERE clauses. I learned it the hard way. Let me share this lesson with you.
Don’t use user-defined or built-in functions around columns in the WHERE clause of queries. It prevents SQL Server from estimating the right amount of rows out of the function. Write queries with operators and comparisons to make SQL Server better use the indexes it has.
With functions around columns in WHEREs
To prove this point, let’s query a local copy of the StackOverflow database. Yes, the StackOverflow we all know and use.
StackOverflow has a Users table that contains, well…, all registered users and their profiles. Among other things, every user has a display name, location, and reputation.
Let’s find the first 50 users by reputation in Colombia.
To make things faster, let’s create an index on the Location field. It’s an NVARCHAR(100) column.
CREATEINDEXLocationONdbo.Users(Location);
This is the query we often write,
DECLARE@LocationNVARCHAR(20)=N'Colombia';SELECTTOP50DisplayName,Location,CreationDate,ReputationFROMdbo.Users-- Often, we put LOWER on both sides of the comparisonWHERELOWER(Location)=LOWER(@Location)-- ^^^^^ ^^^^^ORDERBYReputationDESC;GO
Did you notice the LOWER function on both sides of the equal sign?
We all have written queries like that one. I declared myself guilty too. Often, we use LOWER and UPPER or wrap the column around RTRIM and LTRIM.
But, let’s see what happened in the Execution Plan.
Here, SQL Server chose to scan the index Location first. And let’s notice the width of the arrow coming out of the first operator. When we place the cursor on it, it shows “Number of Rows Read.”
In this copy of the StackOverflow database, there are 2,465,713 users, only 463 of them living in Colombia. SQL Server had to read the whole content of the index to execute our query. Arrrggg!
It means that to find all users in Colombia, SQL Server had to go through all users in the index. We could use that index in a better way.
Write queries with comparisons, functions, and operators around parameters. This way SQL Server could properly use indexes and have better estimates of the contents of tables. But, don’t write functions around columns in the WHERE clauses.
The same is true when joining tables. Let’s not put functions around the foreign keys in our JOINs either.
Rewrite your queries to avoid functions around columns in WHERE
Let’s go back and rewrite our query without any functions wrapping columns. This way,
DECLARE@LocationNVARCHAR(20)=N'Colombia';SELECTTOP50DisplayName,Location,CreationDate,ReputationFROMdbo.Users-- We remove LOWER on both sidesWHERELocation=@LocationORDERBYReputationDESC;GO
And, let’s check the execution plan again.
This time, SQL Server used an Index Seek. It means SQL Server didn’t have to read the whole content of the Location index to run our query. And the execution plan didn’t go parallel. We don’t have the black arrows in yellow circles in the operators.
Let’s notice that this time we have a thinner arrow on the first operator. Let’s see how many rows SQL Server read this time.
After that change, SQL Server only read 463 records. That was way better than reading the whole index.
Voilà! If we want to write faster queries, let’s stop using functions around columns in our WHEREs. That screws SQL Server estimates. For example, let’s not use LOWER or UPPER around our columns. By the way, SQL Server string searches are case insensitive by default, we don’t need those functions at all.
This episode of Monday Link is a bit diverse. From getting better at programming to why it’s harder for juniors to get hired. And, a rant about the C# evolution.
I couldn’t avoid adding the last article to this Monday Links. I try to only share five articles. But, that’s a great story of establishing priorities and putting life, health, and work in a balance. Enjoy!
A Path to Better Programming by Robert “Uncle Bob” Martin and Allen Holub
This is a conversation between Uncle Bob and Allen Holub (I didn’t know about Allen before this conversation) about principles, isolation, mob programming, and leadership.
By the way, the military has been leading large companies for years. How do they do it? The answer mentioned in the video is to empower people to do things. But there’s a whole book on the subject.
“If it’s a good idea, the good idea tends to spread, unless there’s somebody at the management level working hard to make it not spread”
Allen Holub
Also, I didn’t know there was a “Clean Agile” book. I would like to see what Agile meant for the creators and how it differs from what we call “Agile” today. Watch full video
Why flags do not represent languages
These days I needed to translate my Unit Testing 101 workshop instructions to Spanish. And call it fate or not, I found this article about not using flags to represent languages and what to do instead. How often do you see on web pages the US flag to signal English? Read full article
Is C# Getting Too Complex?
I like the evolution of C# as a language. I wrote about some of the newest and coolest C# features. But, I don’t like some of the new features. Some features, quoting the article, “just add to the complexity of the language without bringing anything to the table besides saving a few lines of code or a couple of curly braces.” Some features make the language less consistent. I share the main point of this article. Read full article.
Read the source code of your dependencies
I first heard about the concept of reading code as a learning exercise from a friend. To learn how star developers code, read their code. This post takes this concept to the next level. Use only dependencies you’re able to read its source code. Read full article.
Why nobody hires junior developers and what happens next
Another consequence of the pandemic. Nobody hires juniors. One thing I’ve noticed is companies don’t have clear career plans and salary policies for developers. Someone starts in a company and after a few years, what? Close tickets forever? Work extra hours to get noticed and promoted? That’s why juniors as soon as they feel more confident, move to a better-paid place. Brain drain is expensive. Read full article
Fetch the bolt cutters
Maybe I don’t have serious eye strain or carpal tunnel syndrome (yet?). But, I can totally relate to this story. And, it’s something more frequent these days. One quote that made me wow and stand in ovation: “In a choice between a job and me, I’m always going to choose me”. Such an amazing way to finish that story. Read full article
Voilà! Another five six reads! Do you also think C# is getting too complex? Have you seen a clear career path at companies where you’ve been?
In 2021, I switched from posting whenever I had an idea about anything to posting regularly every other week about some topics.
I wrote a whole series of posts about unit testing. From how to write your first unit test with MSTest to what stubs and mocks are. In fact, I wrote my first ebook Unit Testing 101 with some of the posts of the series. By the way, you can download it for free. No email asked.
These are the 5 posts I wrote in 2021 you read the most. In case you missed any of them, here they are:
TIL: How to convert 2-digit year to 4-digit year in C#. Last year, I worked with a Stripe-powered payment module in a reservation management system. And, I needed to parse card expiration dates from a 2-digit year to a 4-digit year. I almost just added 2000 to them. But I didn’t. This is what I found and did instead.
The Art of Readable Code is the perfect companion for the Clean Code. It contains simple and practical tips to improve your code at the function level. It isn’t as dogmatic as Clean Code. Tips aren’t as strict as the ones from Clean Code. But, it still deserves to be read.
These are some of my notes on The Art of Readable Code.
1. Code should be easy to understand
Code should be written to minimize the time for someone else to understand it. Here, understanding means solving errors, spotting bugs, and making changes.
It’s good to write compact code. But, compact doesn’t always mean more readable.
If something is critical to understand, put it in a name.
Choose specific words for your method names. For example, does def GetPage(url) get the page from the network, a cache or a database? Use FetchPage(url) or DownloadPage(url) instead.
Avoid empty names like retval, tmp, foo. Instead, use a variable name that describes the value. In the next code sample, use sum_squares instead of retval.
Variables i, j, k don’t always work for loop indices. Prefer more concrete names. Indices could be misused or interchanged.
Use concrete over abstract names. Instead of --run-locally, use --extra-logging or --use-local-database.
Attach extra information to your names. For example, encode units. Prefer delay_secs over delay, size_mb over size and degrees over angle. Also, encode extra information. Prefer plaintext_password over password.
Do not include needless words in your names. Instead of ConvertToString, use ToString.
Names can’t be misconstructed
Make your names resistant to misinterpretation. Does a method on arrays named filter pick or get rid of elements? If it picks elements, use select(). And, if it gets rid of elements, use exclude().
// Does results have elements that satisfy the condition? Or elements that don't?results=Database.all_objects.filter("year <= 2011")
Use Min and Max for inclusive limits. Put min or max in front of the thing being limited. For example,
Use First and Last for inclusive limits. What’s the result of print integer_range(start=2, stop=4)? Is it [2,3] or [2,3,4]? Prefer First and Last. For example, print integer_range(first=2, last=4) to mean [2,3,4].
Use Begin and End for inclusive/exclusive ranges. For example, to find all events on a date, prefer PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am") instead of PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59.999am").
For booleans variables, make clear what true or false means. Use is, has, can, should, need as prefixes. For example, SpaceLeft() or HasSpaceLeft().
Avoid negated booleans. For example, instead of disable_ssl = false, use use_ssl = true.
Don’t create false expectation. With these two names GetSize() and ComputeSize(), we expect GetSize() to be a lightweight operation.
Aesthetics
Similar code should look similar. Pick a meaningful order and maintain it. If the code mentions A, B, and C, don’t say B, C, and A in other places.
Organize declarations into blocks, like sentences. Break code into paragraphs.
Knowing what to comment
Don’t comment what can be derived from code. GOOD CODE > BAD CODE + COMMENTS.
Comment the why’s behind a decision. Anticipate likely questions and comment the big picture. Comment why you choose a particular value or what it’s a valid range. For example, MAX_THREADS = 8; // Up to 2*num of procs
Making comments precise and compact
Use comments to show examples of input and output values.
Use named parameters or use comments to make the same effect. Prefer connect(timeout: 10, use_ssl: false) over connect(10, false). In languages without named parameters, use connect(/*timeout_ms=*/10, /*use_ssl=*/false).
3. Simplifying loops and logic
Making control flow easy to read
When doing conditionals, write the changing variable first, followed by an operator and by stable expression. Prefer if (length >= 10) over if (10 <= length).
When doing if-then-else, treat the positive case first or the simplest case or the most interesting case first.
Use the ternary operator ?: with simple statements, not to squeeze logic into a single line. For example, time_str += (hour > 12) ? "pm" : "am"
Avoid do-while loops. Do-while loops break the convention of keeping the condition first. Use while instead.
Breaking down giant expressions
Write explaining variables and summary variables. For example,
ifline.split(':')[0]=='root':# ...
# vs
username=line.split(':')[0]ifusername=='root':# ...
Use De Morgan’s Laws. For example,
if(!(file_exists&&!is_protected))Error()# vs
if(!file_exists||is_protected)Error()
Eliminate intermediate results. Do your task as quickly as possible. For example, avoid writing functions like this
List all the things your code is doing. And try to separate every task into a separate function.
Defragment your code to do one type of thing at a time. For example, initialize to default values, then calculate some data and lastly update some other values.
Writing less code
The most readable code is no code at all.
Keep your codebase as small and lightweight as possible. Create as much utility code to remove duplication. Remove unused code. Keep your project separated into isolated sub-projects
Know the capabilities of your libraries. Every once in a while spend 15 minutes reading the names of all functions/modules/types in your standard libraries. Write less code as possible.
For example, don’t be tempted to write your own de-duplicate code in Python.
defunique(elements):tmp={}foreinelements:tmp[e]=Nonereturntmp.keys()unique([2,1,2])# vs
unique=list(set([2,1,2]))
5. Testing and readability
Tests should be easy to understand. Coders are afraid of changing code, so coders don’t add new tests.
Hide less important details from the user, so he can focus only on the important ones.
Create the minimal test statement. Most tests can be reduced to: “given an input, expect this output.” Ideally a unit test is just 3-line long.
Create mini-languages.
Choose the simplest set of inputs that exercise your code. Prefer clean and simple test values. Embrace the Least astonishing principle.
Write smaller test cases, instead of a single perfect one-size-fits-all test case. Every test pushes your code in a different direction.
Test names should indicate a unit of work (or code under test), situation or bug being tested, and expected result.
Voilà! These are some of my notes. One tip I started to practice after reading this book was to “extract unrelated problems.”
The Art of Readable Code is a good starting point to introduce the concept of readability and clean code to your team. These tips and tricks are a good reference for code standards and reviews.