Looking inside a box

Peeking into LINQ DistinctBy source code

“Don’t use libraries you can’t read their source code.” That’s a bold statement I found and shared in a past Monday Links. So I decided to look into the LINQ DistinctBy source code. For a while, I thought the C# standard library was a black box that only experienced wizard programmers could understand. I was wrong. Let’s see what’s inside the new LINQ DistinctBy method.

What does the LINQ DistinctBy method do?

DistinctBy returns the elements of a collection that have unique values for a key, usually one of their properties. It works on collections of complex objects, not just on plain values.

DistinctBy is one of the new LINQ methods introduced in .NET 6.

The next code sample shows how to find unique movies by release year.

var movies = new List<Movie>
{
    new Movie("Schindler's List", 1993, 8.9f),
    new Movie("The Lord of the Rings: The Return of the King", 2003, 8.9f),
    new Movie("Pulp Fiction", 1994, 8.8f),
    new Movie("Forrest Gump", 1994, 8.7f),
    new Movie("Inception", 2010, 8.7f)
};

// Here we use the DistinctBy method with the ReleaseYear property
var distinctByReleaseYear = movies.DistinctBy(movie => movie.ReleaseYear);
//                                 ^^^^^^^^^^

foreach (var movie in distinctByReleaseYear)
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");

// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]

record Movie(string Name, int ReleaseYear, float Score);

Notice we used the DistinctBy method directly on a list of movies. We didn’t first build a list of release years and then find one movie for each unique release year.

Before looking at the DistinctBy source code, how would you implement it?
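One possible answer, and a common workaround before .NET 6, is to group elements by key and keep the first element of each group. The next sketch is not how DistinctBy is implemented. It uses tuples instead of the Movie record only to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the Movie record to keep this sketch self-contained
var movies = new List<(string Name, int ReleaseYear)>
{
    ("Schindler's List", 1993),
    ("The Lord of the Rings: The Return of the King", 2003),
    ("Pulp Fiction", 1994),
    ("Forrest Gump", 1994),
    ("Inception", 2010)
};

// Group movies by release year and keep the first movie of each group
var firstPerYear = movies
    .GroupBy(movie => movie.ReleaseYear)
    .Select(group => group.First())
    .ToList();

foreach (var movie in firstPerYear)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}

// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]
```

It works, but GroupBy has to read the whole collection and build every group before returning the first result.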

Puppy looking inside a gift bag
Let's peek into DistinctBy source code. Photo by freestocks on Unsplash

LINQ DistinctBy source code

This is the source code for the DistinctBy method. Source

DistinctBy source code

Well, it doesn’t look that complicated. Let’s go through it.

1. Iterating over the input collection

First, DistinctBy() starts by checking its parameters and calling DistinctByIterator(). This is a common pattern in other LINQ methods: check parameters in one method and then call a child iterator method to do the actual logic. (See 1. in the image above)
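One consequence of this split comes from how C# iterators work: the body of a method with yield return doesn't run until somebody starts enumerating the result. Here is a minimal sketch of the difference. EvensSingleMethod, Evens and EvensIterator are hypothetical names made up for this example, not LINQ methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: validation and iteration live in the same iterator method,
// so the null check only runs once enumeration starts
IEnumerable<int> EvensSingleMethod(IEnumerable<int>? source)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    foreach (var number in source)
    {
        if (number % 2 == 0)
        {
            yield return number;
        }
    }
}

// Hypothetical: following the pattern, check parameters first...
IEnumerable<int> Evens(IEnumerable<int>? source)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    return EvensIterator(source);
}

// ...and do the actual work in a separate iterator method
IEnumerable<int> EvensIterator(IEnumerable<int> source)
{
    foreach (var number in source)
    {
        if (number % 2 == 0)
        {
            yield return number;
        }
    }
}

// No exception here: the iterator body hasn't started running
var deferred = EvensSingleMethod(null);

var threwOnEnumeration = false;
try
{
    foreach (var number in deferred) { }
}
catch (ArgumentNullException)
{
    threwOnEnumeration = true;
}

var threwOnCall = false;
try
{
    Evens(null);
}
catch (ArgumentNullException)
{
    threwOnCall = true;
}

Console.WriteLine(threwOnEnumeration); // True
Console.WriteLine(threwOnCall);        // True
```

With two methods, a wrong parameter fails at the call site instead of later, when the result is finally iterated.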

Then, DistinctByIterator() initializes the underlying enumerator of the input collection with a using declaration. The IEnumerable type has a GetEnumerator() method. (See 2.)

The IEnumerator type has a MoveNext() method to advance the enumerator to the next position and a Current property to hold the element at the current position.

If a collection is empty or if the iterator reaches the end of the collection, MoveNext() returns false. And, when MoveNext() returns true, Current gets updated with the element at that position. Source

Then, to start reading the input collection, the iterator is placed at the initial position of the collection by calling MoveNext(). (See 3.) This first if avoids allocating memory for the set created in the next step when the collection is empty.
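To see GetEnumerator(), MoveNext() and Current working together, here is a small sketch that walks a list by hand, roughly what a foreach does for us. The list of years is made up for the example.

```csharp
using System;
using System.Collections.Generic;

IEnumerable<int> years = new List<int> { 1993, 2003, 1994 };
var visited = new List<int>();

// GetEnumerator() returns an enumerator placed before the first element
using IEnumerator<int> enumerator = years.GetEnumerator();

// The first MoveNext() places the enumerator at the first element, if any
if (enumerator.MoveNext())
{
    do
    {
        // Current holds the element at the current position
        visited.Add(enumerator.Current);
    }
    while (enumerator.MoveNext()); // false when we reach the end
}

Console.WriteLine(string.Join(", ", visited)); // 1993, 2003, 1994

// With an empty collection, the very first MoveNext() returns false
IEnumerable<int> empty = new List<int>();
using IEnumerator<int> emptyEnumerator = empty.GetEnumerator();
var hasElements = emptyEnumerator.MoveNext();

Console.WriteLine(hasElements); // False
```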

2. Keeping only unique values

After that, DistinctByIterator() creates a set with a default capacity and an optional comparer. This set keeps track of the unique keys already found. (See 4.)

DefaultInternalSetCapacity declaration
DefaultInternalSetCapacity = 7

The next step is to read the current element and add its key to the set. (See 5.)

If a set doesn’t already contain the same element, Add() returns true and adds it to the set. Otherwise, it returns false. And, when the set exceeds its capacity, the set gets resized. Source
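Here is a quick sketch of that Add() behavior. The capacity of 7 mirrors DefaultInternalSetCapacity, and the case-insensitive comparer is only an assumption to show what an optional comparer changes.

```csharp
using System;
using System.Collections.Generic;

// A set with an initial capacity and the default comparer
var seenYears = new HashSet<int>(7, EqualityComparer<int>.Default);

var addedFirst = seenYears.Add(1994);  // true: 1994 wasn't in the set
var addedAgain = seenYears.Add(1994);  // false: 1994 was already there

Console.WriteLine(addedFirst);      // True
Console.WriteLine(addedAgain);      // False
Console.WriteLine(seenYears.Count); // 1

// With a custom comparer, "same element" means whatever the comparer says
var seenNames = new HashSet<string>(7, StringComparer.OrdinalIgnoreCase);

var addedName = seenNames.Add("Inception");      // true
var addedUpperName = seenNames.Add("INCEPTION"); // false: equal ignoring case

Console.WriteLine(addedName);      // True
Console.WriteLine(addedUpperName); // False
```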

If the current element’s key was added to the set, the element is returned with the yield return keywords. This way, DistinctByIterator() returns one element at a time.

Step 5 is wrapped inside a do-while loop. It runs until the enumerator reaches the end of the collection. (See 6.)
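Putting the six steps together, this is a sketch that follows the walkthrough above. It's not the verbatim .NET source. DistinctBySketch, DistinctByIteratorSketch and DefaultCapacity are hypothetical names, and both methods are written as local functions instead of extension methods to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

const int DefaultCapacity = 7; // mirrors DefaultInternalSetCapacity

// 1. Check parameters, then call the iterator method
IEnumerable<TSource> DistinctBySketch<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer = null)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    if (keySelector is null) throw new ArgumentNullException(nameof(keySelector));

    return DistinctByIteratorSketch(source, keySelector, comparer);
}

IEnumerable<TSource> DistinctByIteratorSketch<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer)
{
    // 2. Get the underlying enumerator with a using declaration
    using IEnumerator<TSource> enumerator = source.GetEnumerator();

    // 3. Move to the first element. If the collection is empty, do nothing
    if (enumerator.MoveNext())
    {
        // 4. Create a set to keep track of the keys already found
        var set = new HashSet<TKey>(DefaultCapacity, comparer);
        do
        {
            // 5. Return the element only if its key wasn't in the set
            TSource element = enumerator.Current;
            if (set.Add(keySelector(element)))
            {
                yield return element;
            }
        }
        while (enumerator.MoveNext()); // 6. Repeat until the end
    }
}

var movies = new List<(string Name, int ReleaseYear)>
{
    ("Schindler's List", 1993),
    ("The Lord of the Rings: The Return of the King", 2003),
    ("Pulp Fiction", 1994),
    ("Forrest Gump", 1994),
    ("Inception", 2010)
};

var distinct = new List<(string Name, int ReleaseYear)>(
    DistinctBySketch(movies, movie => movie.ReleaseYear));

foreach (var movie in distinct)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}

// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]
```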

Voilà! That’s the DistinctBy source code. Simple but effective. Not that intimidating, after all. By no means do I want to diminish the work of .NET contributors. On the contrary, it’s a good exercise to read the source code of standard libraries to pick up conventions and patterns.

To learn about LINQ, check my quick guide to LINQ, five common LINQ methods in Pictures and What’s new in LINQ with .NET 6.

If you want to write more expressive code to work with collections, check my course Getting Started with LINQ on Educative, where I cover everything from what LINQ is, to refactoring conditionals with LINQ, to its new methods and overloads in .NET 6. All you need to know to start using LINQ in your everyday coding.

Happy coding!