TIL: How to do a case-sensitive search in SQL Server
21 Feb 2022 #todayilearned #sqlDo you use LOWER or UPPER to do case-sensitive searches? Let’s see how to write a case-sensitive search in SQL Server.
To write a case-sensitive search in SQL Server, don’t use UPPER or LOWER functions around the column to filter. Instead, use the COLLATE keyword with a case-sensitive collation followed by the comparison.
Naive case sensitive search
Often by mistake, to do a case-sensitive search, we wrap a column around LOWER()
. Something like this,
SELECT TOP 50 DisplayName, Location, CreationDate, Reputation
FROM dbo.Users
WHERE LOWER(DisplayName) LIKE 'john%'
-- ^^^^^
ORDER BY Reputation DESC;
GO
We tried to find the first 50 StackOverflow users with DisplayName containing lowercase ‘john’.
But, interestingly enough, some results don’t match our intended filter. They include both lowercase and uppercase ‘john’.
It would be the same if we use UPPER()
instead. In general, let’s not use functions around columns in our WHEREs. That’s a common bad practice.
Collation and case sensitivity
In SQL Server, collations provide sorting rules and case and accent sensitivity for our data. For example, when we use an ORDER BY, the sort order of our results depends on the database collation.
In SQL Server Management Studio, we find a database collation after clicking on the database Properties and then on General. It’s under the Maintenance section.
Here’s mine.
The default collation for English is SQL_Latin1_General_CP1_CI_AS. This is a case-insensitive collation.
In collation names, CI means case insensitive. And, AS means accent sensitive.
That’s why when we wrote LOWER(DisplayName)
, SQL Server showed uppercase and lowercase results. SQL Server used the default collation, which was a case-insensitive one.
For more details about collations, check Microsoft docs on collations and Unicode support.
Case sensitive searches in WHERE clause
For case-sensitive searches in SQL Server, use COLLATE keyword with the case sensitive collation “SQL_Latin1_General_CP1_CS_AS” followed by the LIKE or = comparisons as usual.
When using the SQL_Latin1_General_CP1_CS_AS collation, a is different from A, and à is different from á. It’s both case and accent-sensitive.
Let’s rewrite our query with COLLATE,
SELECT TOP 50 DisplayName, Location, CreationDate, Reputation
FROM dbo.Users
WHERE DisplayName COLLATE SQL_Latin1_General_CP1_CS_AS LIKE '%john'
ORDER BY Reputation DESC;
GO
This time, we have the results we were expecting. Only users with lowercase ‘john’ in their display name.
Voilà! That’s how we can write case-sensitive searches in SQL Server. Remember, don’t use LOWER or UPPER. They won’t work for case-sensitive searches. Use a different collation instead.
For more content on SQL Server, check what are implicit conversions and why you should care, how to optimize GROUP BY queries, and just listen to index recommendations.
Happy coding!