Peeking into LINQ DistinctBy source code
11 Jul 2022 #tutorial #csharp
“Don’t use libraries you can’t read their source code.”
That’s a bold statement I found and shared in a past Monday Links episode. Inspired by that, let’s see what’s inside the new LINQ DistinctBy method.
What DistinctBy does
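DistinctBy returns the elements of a collection with unique values for a given key, often one of their properties. When two or more elements share the same key, it keeps the first one found. It works on collections of complex objects, not just on plain values.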
DistinctBy is one of the new LINQ methods introduced in .NET 6.
Here’s how to find unique movies by release year.
var movies = new List<Movie>
{
    new Movie("Schindler's List", 1993, 8.9f),
    new Movie("The Lord of the Rings: The Return of the King", 2003, 8.9f),
    new Movie("Pulp Fiction", 1994, 8.8f),
    new Movie("Forrest Gump", 1994, 8.7f),
    new Movie("Inception", 2010, 8.7f)
};

// Here we use the DistinctBy method with the ReleaseYear property
var distinctByReleaseYear = movies.DistinctBy(movie => movie.ReleaseYear);
//                                 ^^^^^^^^^^
foreach (var movie in distinctByReleaseYear)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}
// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]

record Movie(string Name, int ReleaseYear, float Score);
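We used the DistinctBy method directly on a list of movies. We didn’t have to extract the release years first, find the unique ones, and then look for one movie for each year. Notice that 1994 appears twice in the list and only the first movie from that year, Pulp Fiction, made it into the result.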
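For comparison, here’s a minimal sketch of one way to get the same result without DistinctBy: grouping by the key and keeping the first element of each group. To keep it short, it uses tuples instead of the Movie record, which is a simplification on my side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the Movie record to keep this sketch short
var movies = new List<(string Name, int ReleaseYear)>
{
    ("Schindler's List", 1993),
    ("Pulp Fiction", 1994),
    ("Forrest Gump", 1994),
    ("Inception", 2010)
};

// Without DistinctBy: group by the key and keep the first element of each group
var withGroupBy = movies
    .GroupBy(movie => movie.ReleaseYear)
    .Select(group => group.First())
    .ToList();

// With DistinctBy: one call, no intermediate groups
var withDistinctBy = movies
    .DistinctBy(movie => movie.ReleaseYear)
    .ToList();

Console.WriteLine(withGroupBy.SequenceEqual(withDistinctBy));
// Output:
// True
```

Both versions keep Pulp Fiction for 1994, but the DistinctBy version says what we mean in a single call.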
Before looking at DistinctBy source code, how would you implement it?
LINQ DistinctBy source code
This is the source code for the DistinctBy method. Source
Well, it doesn’t look that complicated. Let’s go through it.
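In case the image isn’t at hand, here’s a rough sketch of what that code looks like, written from my reading of the source and not copied verbatim. The names MyDistinctBy and MyDistinctByIterator are hypothetical, they are plain local functions instead of extension methods, and the initial capacity of 7 is an assumption. The numbered comments match the steps we go through next.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DistinctBy, written as a local function
IEnumerable<TSource> MyDistinctBy<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer = null)
{
    // 1. Check parameters and hand over to the iterator method
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(keySelector);

    return MyDistinctByIterator(source, keySelector, comparer);
}

// Hypothetical stand-in for DistinctByIterator
IEnumerable<TSource> MyDistinctByIterator<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer)
{
    // 2. Get the underlying enumerator with a using declaration
    using IEnumerator<TSource> enumerator = source.GetEnumerator();

    // 3. Move to the first element. An empty collection skips everything else
    if (enumerator.MoveNext())
    {
        // 4. Keep track of the keys already found (capacity of 7 is an assumption)
        var set = new HashSet<TKey>(7, comparer);
        do
        {
            // 5. Return the current element only if its key is new
            TSource element = enumerator.Current;
            if (set.Add(keySelector(element)))
            {
                yield return element;
            }
        }
        // 6. Repeat until the end of the collection
        while (enumerator.MoveNext());
    }
}

var years = new[] { 1993, 2003, 1994, 1994, 2010 };
var distinctYears = string.Join(", ", MyDistinctBy(years, year => year));
Console.WriteLine(distinctYears);
// Output:
// 1993, 2003, 1994, 2010
```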
1. Iterating over the input collection
First, DistinctBy() starts by checking its parameters and calling DistinctByIterator().
This is a common pattern in other LINQ methods: checking parameters in one method and then calling a separate iterator method to do the actual logic. That way, invalid parameters throw right away instead of when the result is enumerated. (See 1. in the image above)
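To see that split in action, here’s a small sketch with two hypothetical helpers, LazyCheck and EagerCheck. The body of an iterator method (one with yield return) doesn’t run until someone starts enumerating it, so a check placed inside it stays silent at call time.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper: the check lives inside the iterator method itself
IEnumerable<int> LazyCheck(IEnumerable<int>? source)
{
    // Nothing here runs until the first MoveNext() on the result
    if (source is null) throw new ArgumentNullException(nameof(source));
    foreach (var item in source) yield return item;
}

// Hypothetical helper: a regular method checks, then calls the iterator method
IEnumerable<int> EagerCheck(IEnumerable<int>? source)
{
    // A regular method runs right away, so the check happens at call time
    if (source is null) throw new ArgumentNullException(nameof(source));
    return EagerCheckIterator(source);
}

IEnumerable<int> EagerCheckIterator(IEnumerable<int> source)
{
    foreach (var item in source) yield return item;
}

var lazyThrewAtCallTime = false;
var eagerThrewAtCallTime = false;

try { LazyCheck(null); }
catch (ArgumentNullException) { lazyThrewAtCallTime = true; }

try { EagerCheck(null); }
catch (ArgumentNullException) { eagerThrewAtCallTime = true; }

Console.WriteLine($"LazyCheck threw at call time: {lazyThrewAtCallTime}");
Console.WriteLine($"EagerCheck threw at call time: {eagerThrewAtCallTime}");
// Output:
// LazyCheck threw at call time: False
// EagerCheck threw at call time: True
```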
Then, DistinctByIterator() initializes the underlying enumerator of the input collection with a using declaration. The IEnumerable type has a GetEnumerator() method. (See 2.)
The IEnumerator type has:
- a MoveNext() method to advance the enumerator to the next position
- a Current property to hold the element at the current position.
If a collection is empty or if the iterator reaches the end of the collection, MoveNext() returns false. And, when MoveNext() returns true, Current gets updated with the element at that position. Source
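Here’s a short sketch of walking a collection by hand with GetEnumerator(), MoveNext(), and Current. It’s roughly what a foreach loop does for us. The sample list of years is made up for the example.

```csharp
using System;
using System.Collections.Generic;

var years = new List<int> { 1993, 2003, 1994 };
var seen = new List<int>();

// Walk the list by hand instead of using foreach
using IEnumerator<int> enumerator = years.GetEnumerator();
while (enumerator.MoveNext())
{
    // Current holds the element at the position MoveNext() just reached
    seen.Add(enumerator.Current);
}
Console.WriteLine(string.Join(", ", seen));

// With an empty collection, the very first MoveNext() returns false
using IEnumerator<int> emptyEnumerator = new List<int>().GetEnumerator();
var movedOnEmpty = emptyEnumerator.MoveNext();
Console.WriteLine(movedOnEmpty);
// Output:
// 1993, 2003, 1994
// False
```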
Then, to start reading the input collection, the enumerator is placed at the first element of the collection by calling MoveNext(). (See 3.) This first if avoids allocating memory for the set created in the next step when the collection is empty.
2. Keeping only unique values
After that, DistinctByIterator() creates a set with a default capacity and an optional comparer. This set keeps track of the unique keys already found. (See 4.)
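That optional comparer comes from the DistinctBy overload that takes an IEqualityComparer for the keys. As a quick sketch, assuming we want to treat keys that differ only in casing as the same key, we could pass StringComparer.OrdinalIgnoreCase. The sample names are made up.

```csharp
using System;
using System.Linq;

var names = new[] { "Pulp Fiction", "PULP FICTION", "Inception" };

// Default comparer: casing matters, so all three keys are different
var caseSensitive = names
    .DistinctBy(name => name)
    .ToList();

// Custom comparer: keys that differ only in casing count as the same key
var caseInsensitive = names
    .DistinctBy(name => name, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(caseSensitive.Count);
Console.WriteLine(caseInsensitive.Count);
// Output:
// 3
// 2
```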
The next step is to read the current element and add its key to the set. (See 5.)
If the set doesn’t already contain an element, Add() adds it and returns true. Otherwise, it returns false. And, when the set exceeds its capacity, it gets resized. Source
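A tiny sketch of that behavior, using a HashSet of made-up release years:

```csharp
using System;
using System.Collections.Generic;

var seenYears = new HashSet<int>();

// First time: the key is new, so it's added
var firstAdd = seenYears.Add(1994);

// Second time: the key is already there, so nothing changes
var secondAdd = seenYears.Add(1994);

Console.WriteLine(firstAdd);
Console.WriteLine(secondAdd);
Console.WriteLine(seenYears.Count);
// Output:
// True
// False
// 1
```

This return value is what lets a single call both record a key and tell whether it was new.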
If the current element’s key was added to the set, the element is returned with a yield return statement. This way, DistinctByIterator() returns one element at a time.
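Returning one element at a time means DistinctBy only reads as much of the input as the caller asks for. Here’s a sketch with a hypothetical endless source, Naturals(), to show it: the program finishes because Take(3) stops asking after the third distinct element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical endless source: 0, 1, 2, 3, ...
IEnumerable<int> Naturals()
{
    for (var i = 0; ; i++)
    {
        yield return i;
    }
}

// The key is n / 2, so 0 and 1 share a key, 2 and 3 share a key, and so on
var firstThree = Naturals()
    .DistinctBy(n => n / 2)
    .Take(3)
    .ToList();

Console.WriteLine(string.Join(", ", firstThree));
// Output:
// 0, 2, 4
```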
Step 5 is wrapped inside a do-while loop. It runs until the enumerator reaches the end of the collection. (See 6.)
Voilà! That’s the DistinctBy source code. Simple but effective. Not that intimidating, after all. The trick was to use a set.
To learn about LINQ, check my quick guide to LINQ, five common LINQ methods in pictures and what’s new in LINQ with .NET 6.0.
Happy coding!