Looking inside a box

Peeking into LINQ DistinctBy source code

“Don’t use libraries you can’t read their source code.”

That’s a bold statement I found and shared in a past Monday Links episode. Inspired by that, let’s see what’s inside the new LINQ DistinctBy method.

What DistinctBy does

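DistinctBy returns the elements of a collection that have unique values for a key computed from each element, usually one of its properties. When several elements share the same key, only the first one is kept. It works on collections of complex objects, not just on plain values.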

DistinctBy is one of the new LINQ methods introduced in .NET 6.

Here’s how to find unique movies by release year.

var movies = new List<Movie>
{
    new Movie("Schindler's List", 1993, 8.9f),
    new Movie("The Lord of the Rings: The Return of the King", 2003, 8.9f),
    new Movie("Pulp Fiction", 1994, 8.8f),
    new Movie("Forrest Gump", 1994, 8.7f),
    new Movie("Inception", 2010, 8.7f)
};

// Here we use the DistinctBy method with the ReleaseYear property
var distinctByReleaseYear = movies.DistinctBy(movie => movie.ReleaseYear);
//                                 ^^^^^^^^^^

foreach (var movie in distinctByReleaseYear)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}

// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]

record Movie(string Name, int ReleaseYear, float Score);

We called DistinctBy directly on the list of movies. We didn’t have to extract the release years first and then look up one movie for each unique year.
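For comparison, here is a rough sketch of how the same result could be obtained before .NET 6, by grouping and taking the first element of each group. The trimmed-down tuple list is only for illustration. Both versions rely on the first movie of each year being kept, which is what the current implementation does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data: tuples instead of the Movie record, to keep the sketch short
var movies = new List<(string Name, int ReleaseYear)>
{
    ("Schindler's List", 1993),
    ("Pulp Fiction", 1994),
    ("Forrest Gump", 1994),
    ("Inception", 2010)
};

// Before .NET 6: group by the key, then keep the first element of each group
var withGroupBy = movies
    .GroupBy(movie => movie.ReleaseYear)
    .Select(group => group.First())
    .ToList();

// With .NET 6: DistinctBy does it in a single call
var withDistinctBy = movies
    .DistinctBy(movie => movie.ReleaseYear)
    .ToList();

// Both keep "Pulp Fiction" for 1994, since it comes first in the list
Console.WriteLine(withGroupBy.SequenceEqual(withDistinctBy)); // True
Console.WriteLine(string.Join(", ", withDistinctBy.Select(movie => movie.Name)));
```

The GroupBy version builds every group in memory before returning anything, while DistinctBy only needs to remember the keys it has already seen.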

Before looking at DistinctBy source code, how would you implement it?

[Image: puppy looking inside a gift bag. Let's peek into DistinctBy source code. Photo by freestocks on Unsplash]

LINQ DistinctBy source code

This is the source code for the DistinctBy method. Source

[Image: DistinctBy source code, annotated with steps 1 to 6]
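In case the image isn't available, here is a sketch of what that code looks like, reconstructed from the walkthrough in this post. It is not a verbatim copy of the .NET source. The names MyDistinctBy and MyDistinctByIterator are hypothetical, and the methods are written as local functions instead of extension methods so the whole thing runs as a single file. The numbers in the comments match the steps discussed below.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var words = new[] { "apple", "avocado", "banana", "blueberry", "cherry" };
var byFirstLetter = MyDistinctBy(words, word => word[0]);
Console.WriteLine(string.Join(", ", byFirstLetter)); // apple, banana, cherry

// (1) Check the parameters, then hand off to the iterator method
IEnumerable<TSource> MyDistinctBy<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer = null)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(keySelector);

    return MyDistinctByIterator(source, keySelector, comparer);
}

IEnumerable<TSource> MyDistinctByIterator<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer)
{
    // (2) Get the underlying enumerator with a using declaration
    using IEnumerator<TSource> enumerator = source.GetEnumerator();

    // (3) Move to the first element; skip everything if the collection is empty
    if (enumerator.MoveNext())
    {
        // (4) Set of keys already seen; 7 mirrors DefaultInternalSetCapacity
        var set = new HashSet<TKey>(7, comparer);
        do
        {
            // (5) Return the element only if its key is new
            TSource element = enumerator.Current;
            if (set.Add(keySelector(element)))
            {
                yield return element;
            }
        }
        while (enumerator.MoveNext()); // (6) Until the end of the collection
    }
}
```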

Well, it doesn’t look that complicated. Let’s go through it.

1. Iterating over the input collection

First, DistinctBy() starts by checking its parameters and calling DistinctByIterator().

This is a common pattern in other LINQ methods: checking parameters in one method and then calling a separate iterator method to do the actual work. (See 1. in the image above.)
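One likely reason for this split: the body of a method that uses yield return doesn't run until someone starts enumerating the result. If the null checks lived inside the iterator, a bad argument would go unnoticed at the call site. The sketch below shows the difference with two hypothetical methods, EvensLazy and EvensEager, that are not part of LINQ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Single-method version: the check sits inside the iterator body
IEnumerable<int> EvensLazy(IEnumerable<int> source)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    foreach (var n in source)
    {
        if (n % 2 == 0) yield return n;
    }
}

// Two-method version: check first, then call the iterator method
IEnumerable<int> EvensEager(IEnumerable<int> source)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    return EvensIterator(source);
}

IEnumerable<int> EvensIterator(IEnumerable<int> source)
{
    foreach (var n in source)
    {
        if (n % 2 == 0) yield return n;
    }
}

var lazyThrewOnCall = false;
try { EvensLazy(null!); }
catch (ArgumentNullException) { lazyThrewOnCall = true; }

var eagerThrewOnCall = false;
try { EvensEager(null!); }
catch (ArgumentNullException) { eagerThrewOnCall = true; }

Console.WriteLine(lazyThrewOnCall);  // False: nothing ran yet
Console.WriteLine(eagerThrewOnCall); // True: the check ran right away
```

With the two-method version, the exception points at the line that passed the wrong argument, not at some later foreach.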

Then, DistinctByIterator() initializes the underlying enumerator of the input collection with a using declaration. The IEnumerable type has a GetEnumerator() method. (See 2.)

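The IEnumerator type has a MoveNext() method, which advances the enumerator to the next element, a Reset() method, and a Current property, which holds the element at the enumerator's position.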

If a collection is empty or if the iterator reaches the end of the collection, MoveNext() returns false. And, when MoveNext() returns true, Current gets updated with the element at that position. Source

Then, to start reading the input collection, the iterator is placed at the first element of the collection by calling MoveNext(). (See 3.) This first if avoids allocating memory for the set created in the next step when the collection is empty.
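To make the enumerator steps concrete, here is a small sketch that walks a list by hand, the same way a foreach would behind the scenes. The years list is only sample data.

```csharp
using System;
using System.Collections.Generic;

IEnumerable<int> years = new List<int> { 1993, 2003, 1994 };
var seen = new List<int>();

// GetEnumerator() returns an enumerator placed before the first element
using IEnumerator<int> enumerator = years.GetEnumerator();

// The first MoveNext() moves to the first element, or returns false if empty
if (enumerator.MoveNext())
{
    do
    {
        // Current holds the element at the enumerator's position
        seen.Add(enumerator.Current);
    }
    while (enumerator.MoveNext());
}

Console.WriteLine(string.Join(", ", seen)); // 1993, 2003, 1994
```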

2. Keeping only unique values

After that, DistinctByIterator() creates a set with a default capacity and an optional comparer. This set keeps track of the unique keys already found. (See 4.)

[Image: DefaultInternalSetCapacity declaration, DefaultInternalSetCapacity = 7]
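As a quick illustration of that capacity-plus-comparer shape, the sketch below builds a HashSet the same way and then uses the DistinctBy overload that accepts a comparer for the keys. The genre strings are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A set with an initial capacity of 7 and a case-insensitive comparer
var keys = new HashSet<string>(7, StringComparer.OrdinalIgnoreCase);
keys.Add("drama");
Console.WriteLine(keys.Contains("DRAMA")); // True

// The comparer passed to DistinctBy decides when two keys are "the same"
var genres = new[] { "Drama", "drama", "Crime", "CRIME", "Sci-Fi" };
var distinctGenres = genres
    .DistinctBy(genre => genre, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(string.Join(", ", distinctGenres)); // Drama, Crime, Sci-Fi
```

When no comparer is passed, the set falls back to the default equality comparer for the key type.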

The next step is to read the current element and add its key to the set. (See 5.)

If a set doesn’t already contain the same element, Add() returns true and adds it to the set. Otherwise, it returns false. And, when the set exceeds its capacity, the set gets resized. Source
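A tiny example of that return value, using the release years from the movies above:

```csharp
using System;
using System.Collections.Generic;

var seenYears = new HashSet<int>();
var results = new List<bool>();

foreach (var year in new[] { 1993, 2003, 1994, 1994, 2010 })
{
    // Add() returns false for the second 1994, since it's already in the set
    results.Add(seenYears.Add(year));
}

Console.WriteLine(string.Join(", ", results)); // True, True, True, False, True
Console.WriteLine(seenYears.Count);            // 4
```

That single false is the reason Forrest Gump didn't make it into the output earlier.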

If the current element’s key was added to the set, the element is returned with the yield return keywords. This way, DistinctByIterator() returns one element at a time.

Step 5 is wrapped inside a do-while loop. It runs until the enumerator reaches the end of the collection. (See 6.)
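Because elements are returned one at a time, DistinctBy only reads as much of the input as the caller asks for. The sketch below tries this with a hypothetical endless source, Naturals(), that counts how many elements were requested from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pulled = 0;

// Hypothetical endless source: 0, 1, 2, 3, ...
IEnumerable<int> Naturals()
{
    for (var n = 0; ; n++)
    {
        pulled++;
        yield return n;
    }
}

// Keys are n % 3, so the first three numbers already cover every distinct key
var firstThree = Naturals()
    .DistinctBy(n => n % 3)
    .Take(3)
    .ToList();

Console.WriteLine(string.Join(", ", firstThree)); // 0, 1, 2
Console.WriteLine(pulled);                        // 3
```

Without the Take(3), this would never finish: the loop would keep looking for a fourth distinct key that doesn't exist.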

Voilà! That’s the DistinctBy source code. Simple but effective. Not that intimidating, after all. The trick was to use a set.

To learn about LINQ, check my quick guide to LINQ, five common LINQ methods in pictures and what’s new in LINQ with .NET 6.0.

Happy coding!