Let's refactor a test: Remove duplicated emails

22 Dec 2022 #tutorial #csharp

This post is part of my Advent of Code 2022.

Recently, I’ve been reviewing pull requests as one of my main activities. This time, let’s refactor two tests I found in a code review session. The two tests check that an email doesn’t have duplicated addresses before sending it. But they share a common mistake: testing private methods directly. Let’s refactor these tests to use the public facade of the code under test.

Always write unit tests using the public methods of a class or a group of classes. Don’t make private methods public and static to test them directly. Test the observable behavior of classes instead.

Here are the tests to refactor

These tests belong to an email component in a Property Management Solution. This component stores all emails before sending them.

These are two tests to check we don’t try to send an email to the same address twice. Let’s pay attention to the class name and method under test.

public class SendEmailCommandHandlerTests
{
    [Fact]
    public void CreateRecipients_NoDuplicates_ReturnsSameRecipients()
    {
        var toEmailAddresses = new List<string>
        {
            "toMail1@mail.com", "toMail2@mail.com"
        };
        var ccEmailAddresses = new List<string>
        {
            "ccMail3@mail.com", "ccMail4@mail.com"
        };

        var recipients = SendEmailCommandHandler.CreateRecipients(toEmailAddresses, ccEmailAddresses);
        //                                       ^^^^^
        
        recipients.Should().BeEquivalentTo(
          new List<Recipient>
          {
              Recipient.To("toMail1@mail.com"),
              Recipient.To("toMail2@mail.com"),
              Recipient.Cc("ccMail3@mail.com"),
              Recipient.Cc("ccMail4@mail.com")
          });
    }

    [Fact]
    public void CreateRecipients_Duplicates_ReturnsRecipientsWithoutDuplicates()
    {
        var toEmailAddresses = new List<string>
        {
            "toMail1@mail.com", "toMail2@mail.com", "toMail1@mail.com"
        };
        var ccEmailAddresses = new List<string>
        {
            "ccMail1@mail.com", "toMail2@mail.com"
        };

        var recipients = SendEmailCommandHandler.CreateRecipients(toEmailAddresses, ccEmailAddresses);
        //                                       ^^^^^

        recipients.Should().BeEquivalentTo(
          new List<Recipient>
          {
              Recipient.To("toMail1@mail.com"),
              Recipient.To("toMail2@mail.com"),
              Recipient.Cc("ccMail1@mail.com"),
          });
    }
}

I slightly changed some names. But those are the real tests I had to refactor.

What’s wrong with those tests? Did you notice it? Also, can you point out where the duplicates are in the second test?

To have more context, here’s the SendEmailCommandHandler class that contains the CreateRecipients() method,

using MediatR;
using Microsoft.Extensions.Logging;
using MyCoolProject.Commands;
using MyCoolProject.Shared;

namespace MyCoolProject;

public class SendEmailCommandHandler : IRequestHandler<SendEmailCommand, TrackingId>
{
    private readonly IEmailRepository _emailRepository;
    private readonly ILogger<SendEmailCommandHandler> _logger;

    public SendEmailCommandHandler(
        IEmailRepository emailRepository,
        ILogger<SendEmailCommandHandler> logger)
    {

        _emailRepository = emailRepository;
        _logger = logger;
    }

    public async Task<TrackingId> Handle(SendEmailCommand command, CancellationToken cancellationToken)
    {
        // Imagine some validations and initializations here...

        var recipients = CreateRecipients(command.Tos, command.Ccs);
        //               ^^^^^
        var email = Email.Create(
            command.Subject,
            command.Body,
            recipients);

        await _emailRepository.CreateAsync(email, cancellationToken);

        return email.TrackingId;
    }

    public static IEnumerable<Recipient> CreateRecipients(IEnumerable<string> tos, IEnumerable<string> ccs)
    //                                   ^^^^^
        => tos.Select(Recipient.To)
              .UnionBy(ccs.Select(Recipient.Cc), recipient => recipient.EmailAddress);
}

public record Recipient(EmailAddress EmailAddress, RecipientType RecipientType)
{
    public static Recipient To(string emailAddress)
        => new Recipient(emailAddress, RecipientType.To);

    public static Recipient Cc(string emailAddress)
        => new Recipient(emailAddress, RecipientType.Cc);
}

public enum RecipientType
{
    To, Cc
}

The SendEmailCommandHandler processes all requests to send an email. It grabs the input parameters, creates an Email object, and stores it using a repository. It uses the free MediatR library to roll commands and command handlers.

Also, it parses the raw email addresses into a list of Recipient with the CreateRecipients() method. That’s the method under test in our two tests. Here the Recipient and EmailAddress work like Value Objects.
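To see why the second test expects only three recipients, here is a minimal sketch of how UnionBy() removes duplicates. It assumes .NET 6 or later and uses plain tuples as a stand-in for the Recipient record, so the names here are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var recipients = CreateRecipients(
    new[] { "toMail1@mail.com", "toMail2@mail.com", "toMail1@mail.com" },
    new[] { "ccMail1@mail.com", "toMail2@mail.com" }).ToList();

foreach (var (email, type) in recipients)
    Console.WriteLine($"{type}: {email}");
// To: toMail1@mail.com
// To: toMail2@mail.com
// Cc: ccMail1@mail.com

// Stand-in for the post's CreateRecipients(): tuples instead of Recipient records
static IEnumerable<(string Email, string Type)> CreateRecipients(
    IEnumerable<string> tos, IEnumerable<string> ccs)
    => tos.Select(email => (Email: email, Type: "To"))
          .UnionBy(ccs.Select(email => (Email: email, Type: "Cc")), recipient => recipient.Email);
```

UnionBy() keeps the first element it finds for each key. So an address that appears in both lists stays as a "To" recipient and its "Cc" copy is dropped.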

Now can you notice what’s wrong with our tests?

What’s wrong?

Our two unit tests test a private method directly. That’s not the appropriate way of writing unit tests. We shouldn’t test internal state and private methods. We should test them through the public facade of our logic under test.

In fact, someone made the CreateRecipients() method public to test it,

Diff showing a private method made public — Someone made the internals public to write tests

Making private methods public to test them is one of the most common mistakes in unit testing.

For our case, we should write our tests using the SendEmailCommand class and the Handle() method.

Don’t expose private methods

Let’s make CreateRecipients() private again. And let’s write our tests using the SendEmailCommand and SendEmailCommandHandler classes.

This is the test to validate that we remove duplicates,

[Fact]
public async Task Handle_DuplicatedEmailInTosAndCc_CallsRepositoryWithoutDuplicates()
{
    var duplicated = "duplicated@email.com";
    //  ^^^^^
    var tos = new List<string> { duplicated, "tomail@mail.com" };
    var ccs = new List<string> { duplicated, "ccmail@mail.com" };

    var fakeRepository = new Mock<IEmailRepository>();

    var handler = new SendEmailCommandHandler(
        fakeRepository.Object,
        Mock.Of<ILogger<SendEmailCommandHandler>>());

    // Let's write a factory method that receives these two email lists
    var command = BuildCommand(tos: tos, ccs: ccs);
    //            ^^^^^
    await handler.Handle(command, CancellationToken.None);

    // Let's write some assert/verifications in terms of the Email object
    fakeRepository
        .Verify(t => t.CreateAsync(It.Is<Email>(/* Assert something here using Recipients */), It.IsAny<CancellationToken>()));
    // Or, even better let's write a custom Verify()
    //
    // fakeRepository.WasCalledWithoutDuplicates();
}

private static SendEmailCommand BuildCommand(IEnumerable<string> tos, IEnumerable<string> ccs)
    => new SendEmailCommand(
        "Any Subject",
        "Any Body",
        tos,
        ccs);

Notice we wrote a BuildCommand() method to create a SendEmailCommand only with the email addresses. That’s what we care about in this test. This way we reduce the noise in our tests. And, to make our test values obvious, we declared a duplicated variable and used it in both destination email addresses.

To write the Assert part of this test, we can use the Verify() method from the fake repository to check that we have the duplicated email only once. Or we can use the Moq Callback() method to capture the Email being saved and write some assertions. Even better, we can create a custom assertion for that. Maybe, we can write a WasCalledWithoutDuplicates() method.
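As a rough sketch of the "capture and assert" idea, here is a custom WasCalledWithoutDuplicates() assertion. To keep it runnable on its own, it swaps the Moq fake for a hand-written one. The Recipient, Email, and IEmailRepository shapes are simplified assumptions, not the real types of the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// In the real test, handler.Handle(command, ...) would trigger this call
var repository = new RecordingEmailRepository();
await repository.CreateAsync(
    new Email("Any Subject", new List<Recipient>
    {
        new("duplicated@email.com", "To"),
        new("tomail@mail.com", "To"),
        new("ccmail@mail.com", "Cc")
    }),
    CancellationToken.None);

repository.WasCalledWithoutDuplicates();
Console.WriteLine("Saved one email without duplicated addresses");

// Simplified stand-ins for the post's types (assumed shapes)
public record Recipient(string EmailAddress, string RecipientType);
public record Email(string Subject, IReadOnlyList<Recipient> Recipients);

public interface IEmailRepository
{
    Task CreateAsync(Email email, CancellationToken cancellationToken);
}

// Hand-written fake that records every email it receives
public class RecordingEmailRepository : IEmailRepository
{
    private readonly List<Email> _saved = new();

    public Task CreateAsync(Email email, CancellationToken cancellationToken)
    {
        _saved.Add(email);
        return Task.CompletedTask;
    }

    // Custom assertion: exactly one email saved, each address only once
    public void WasCalledWithoutDuplicates()
    {
        if (_saved.Count != 1)
            throw new InvalidOperationException($"Expected 1 saved email, got {_saved.Count}");

        var duplicates = _saved[0].Recipients
            .GroupBy(recipient => recipient.EmailAddress)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();

        if (duplicates.Count > 0)
        {
            var addresses = string.Join(", ", duplicates);
            throw new InvalidOperationException($"Duplicated addresses: {addresses}");
        }
    }
}
```

The test body then reads as one line of intent, and the details of how we inspect the saved Email stay out of the way.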

That’s one of the two original tests. The other one is left as an exercise to the reader.

Voilà! That was today’s refactoring session. To take home: we shouldn’t test private methods directly. We should always write tests using the public methods of the code under test. We can remember this principle with the mnemonic: “Don’t let others touch our private parts.” That’s how I remember it.

For more refactoring sessions, check these two: store and update OAuth connections and generate payment reports. Don’t miss my Unit Testing 101 series where I cover from naming conventions to best practices.

Happy coding!

To Value Object or Not To: How I choose Value Objects

21 Dec 2022 #csharp #tutorial

This post is part of my Advent of Code 2022.

Today I reviewed a pull request and had a conversation about when to use Value Objects instead of primitive values. This is the code that started the conversation and my rationale to promote a primitive value to a Value Object.

Prefer Value Objects to encapsulate validations or custom methods on a primitive value. Otherwise, if a primitive value doesn’t have a meaningful “business” sense and is only passed around, consider using the primitive value with a good name for simplicity.

In case you’re not familiar with Domain-Driven Design and its artifacts, a Value Object represents a concept that doesn’t have an “identifier” in a business domain. Value objects are immutable and compared by value.

Value Objects represent elements of “broader” concepts. For example, in a Reservation Management System, we can use a Value Object to represent the payment method of a Reservation.

TimeStamp vs DateTime

This is the piece of code that triggered my comment during the code review.

public class DeliveryNotification : ValueObject
{
    public Recipient Recipient { get; init; }
    
    public DeliveryStatus Status { get; init; }
    
    public TimeStamp TimeStamp { get; init; }
    //     ^^^^^^

    protected override IEnumerable<object?> GetEqualityComponents()
    {
        yield return Recipient;
        yield return Status;
        yield return TimeStamp;
    }
}

public class TimeStamp : ValueObject
{
    public DateTime Value { get; }

    private TimeStamp(DateTime value)
    {
        Value = value;
    }
    
    public static TimeStamp Create()
    {
        return new TimeStamp(SystemClock.Now);
    }

    protected override IEnumerable<object> GetEqualityComponents()
    {
        yield return Value;
    }
}

public enum DeliveryStatus
{
    Created,
    Sent,
    Opened,
    Failed
}

We wanted to record when an email is sent, opened, and clicked. We relied on a third-party Email Provider to notify our system about these email events. The DeliveryNotification has an email address, status, and timestamp.

The ValueObject base class is Vladimir Khorikov’s ValueObject implementation.

Notice the TimeStamp class. It’s only a wrapper around the DateTime class. Mmmm…

Sand clock — Photo by Alexandar Todov on Unsplash

Promote Primitive Values to Value Objects

I’d dare to say that using a TimeStamp instead of a simple DateTime in the DeliveryNotification class was overkill. I guess “when we have a hammer, everything looks like a finger.”

This is my rationale to choose between value objects and primitive values:

If we need to enforce a domain rule or perform a business operation on a primitive value, let’s use a Value Object.
If we only pass a primitive value around and it represents a concept in the domain language, let’s wrap it in a record to give it a meaningful name.
Otherwise, let’s stick to the plain primitive values.
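The first two rules above could look like this. It’s only a sketch: it uses C# records instead of the ValueObject base class, and the validation rule for EmailAddress is a made-up example, not the one from the project.

```csharp
using System;

var address = EmailAddress.Create("  Someone@Mail.com ");
Console.WriteLine(address.Value);
// someone@mail.com
Console.WriteLine(address == EmailAddress.Create("someone@mail.com"));
// True

// Rule 1: there's a domain rule to enforce, so a Value Object pays off
public sealed record EmailAddress
{
    public string Value { get; }

    private EmailAddress(string value) => Value = value;

    public static EmailAddress Create(string value)
    {
        // Hypothetical rule, only for illustration
        if (string.IsNullOrWhiteSpace(value) || !value.Contains('@'))
            throw new ArgumentException("Invalid email address", nameof(value));

        return new EmailAddress(value.Trim().ToLowerInvariant());
    }
}

// Rule 2: no rules to enforce, only a meaningful name from the domain language
public record TimeStamp(DateTime Value);
```

With the private constructor, the only way to get an EmailAddress is through Create(), so every instance has already passed the rule.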

In our TimeStamp class, apart from Create(), we didn’t have any other methods. We might validate if the inner date is in this century. But that won’t be a problem. I don’t think that code will live that long.

And, there are cleaner ways of writing tests that use DateTime than using a static SystemClock. Maybe, it would be a better idea if we can overwrite the SystemClock internal date.
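One of those cleaner ways, as a sketch, is to pass the clock as a dependency instead of reading a static one. The DeliveryNotificationFactory class and the simplified DeliveryNotification record below are hypothetical names for this example only.

```csharp
using System;

// In a test, we control time by passing a fixed clock
var fixedNow = new DateTime(2022, 12, 21, 10, 30, 0, DateTimeKind.Utc);
var factory = new DeliveryNotificationFactory(() => fixedNow);

var notification = factory.Sent("someone@mail.com");
Console.WriteLine(notification.TimeStamp == fixedNow);
// True

// Simplified stand-in for the post's DeliveryNotification
public record DeliveryNotification(string Recipient, string Status, DateTime TimeStamp);

public class DeliveryNotificationFactory
{
    private readonly Func<DateTime> _now;

    // Production code would pass () => DateTime.UtcNow
    public DeliveryNotificationFactory(Func<DateTime> now) => _now = now;

    public DeliveryNotification Sent(string recipient)
        => new(recipient, "Sent", _now());
}
```

This way tests don’t share any static state, and each test decides what “now” means.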

I’d take a simpler route and use a plain DateTime value. I don’t think there’s a business case for TimeStamp here.

public class DeliveryNotification : ValueObject
{
    public Recipient Recipient { get; init; }
    
    public DeliveryStatus Status { get; init; }
    
    public DateTime TimeStamp { get; init; }
    //     ^^^^^^

    protected override IEnumerable<object?> GetEqualityComponents()
    {
        yield return Recipient;
        yield return Status;
        yield return TimeStamp;
    }
}

// Or alternatively, to use the same domain language
//
// public record TimeStamp(DateTime Value);

public enum DeliveryStatus
{
    Created,
    Sent,
    Opened,
    Failed
}

If in the “email sending” domain, business analysts or stakeholders use “timestamp,” for the sake of a ubiquitous language, we can add a simple record TimeStamp to wrap the date. Like record TimeStamp(DateTime Value).

Voilà! That’s a practical way to decide between Value Objects and primitive values. For me, the key is asking if there’s a meaningful domain concept behind the primitive value. Otherwise, we would end up either with too many Value Objects or with primitive obsession.

If you want to read more about Domain-Driven Design, check my takeaways from these books Hands-on Domain-Driven Design with .NET Core and Domain Modeling Made Functional.

Happy coding!

Dump and Load to squash old migrations

20 Dec 2022 #csharp #tutorial #showdev

This post is part of my Advent of Code 2022.

Recently, I stumbled upon the article Get Rid of Your Old Database Migrations. The author shows how Clojure, Ruby, and Django use the “Dump and Load” approach to compact or squash old migrations. This is how I implemented the “Dump and Load” approach in one of my client’s projects.

1. Export database objects and reference data with schemazen

In one of my client’s projects, we had so many migration files that we started to group them inside folders named after the year and month. Squashing migrations sounds like a good idea here.

For example, for a three-month project, we wrote 27 migration files. This is the Migrator project,

List of migration files in one of my projects — 27 migration files for a short-term project

For those projects, we use Simple.Migrations to apply migration files and a bunch of custom C# extension methods to write the Up() and Down() steps. Since we don’t use an all-batteries-included migration framework, I needed to generate the dump of all database objects.

I found schemazen on GitHub, a CLI tool to “script and create SQL Server objects quickly.”

This is how to script all objects and export data from reference tables with schemazen,

dotnet schemazen script --server (localdb)\\MSSQLLocalDB
    --database <YourDatabaseName>
    --dataTablesPattern=\"(.*)(Status|Type)$\"
    --scriptDir C:/someDir

Notice I used the --dataTablesPattern option with a regular expression to only export the data from the reference tables. In this project, we named our reference tables with the suffixes “Status” or “Type.” For example, PostStatus or ReceiptType.
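To check which table names that pattern picks up, we can try it with the .NET Regex class. This assumes schemazen evaluates the pattern as a regular .NET regular expression against the table name, which I haven’t verified in its source. The last two table names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

var referenceTables = new Regex("(.*)(Status|Type)$");

foreach (var table in new[] { "PostStatus", "ReceiptType", "Posts", "StatusHistory" })
    Console.WriteLine($"{table}: {referenceTables.IsMatch(table)}");
// PostStatus: True
// ReceiptType: True
// Posts: False
// StatusHistory: False
```

The $ anchor is what keeps a table like StatusHistory out: the suffix has to be at the very end of the name.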

I could simply export the objects from SQL Server directly. But those script files contain a lot of noise in the form of default options. Schemazen does it cleanly.

Schemazen generates one folder per object type and one file per object. And it exports data in a TSV format. I didn’t find an option to export the INSERT statements in its source code, though.

Schemazen generates a folder structure like this,

 |-data
 |-defaults
 |-foreign_keys
 |-tables
 props.sql
 schemas.sql

After this first step, I had the database objects. But I still needed to write the actual migration file.

Piles of used cars and trucks waiting to be recycled — Photo by Randy Laybourne on Unsplash

2. Process schemazen exported files

To write the squash migration file, I wanted to have all scripts in a single file and turn the TSV files with the exported data into INSERT statements.

I could write a C# script file, but I wanted to stretch my Bash/Unix muscles. After some Googling, I came up with this,

# It grabs the output from schemazen and compacts all dump files into a single one
FILE=dump.sql

# Merge all files into a single one
for folder in 'tables/' 'defaults/' 'foreign_keys/'
do
    find $folder -type f \( -name '*.sql' ! -name 'VersionInfo.sql' \) | while read f ;
    do
        cat $f >> $FILE;
    done
done

# Remove GO keywords and blank lines
sed -i '/^GO/d' $FILE
sed -i '/^$/d' $FILE

# Turn tsv files into INSERT statements
for file in data/*.tsv;
do
    echo "INSERT INTO $file(Id, Name) VALUES" | sed -e "s/data\///" -e "s/\.tsv//" >> $FILE
    # Split on tabs only, so names with spaces stay in a single column
    awk -F'\t' '{print "("$1",\047"$2"\047),"}' "$file" >> $FILE
    echo >> $FILE
    
    sed -i '/^$/d' $FILE
    sed -i '$ s/,$//g' $FILE
done

The first part merges all separate object files into a single one. I filtered out the VersionInfo table. That’s the table Simple.Migrations uses to keep track of already applied migrations.

The second part removes the GO keywords and blank lines.

And the last part turns the TSV files into INSERT statements. It grabs table names from the file name and removes the base path and the TSV extension. It assumes reference tables only have an id and a name.
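For comparison, this is roughly what that last part could look like as the C# script I decided not to write. It’s a sketch under the same assumption: every reference table only has an id and a name. The ToInsertStatement() helper and the sample rows are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In a real script, the lines would come from File.ReadLines("data/PostStatus.tsv")
var statement = ToInsertStatement("PostStatus", new[] { "1\tDraft", "2\tIn Review" });
Console.WriteLine(statement);
// INSERT INTO PostStatus(Id, Name) VALUES
// (1,'Draft'),
// (2,'In Review')

// Builds one INSERT statement from TSV lines with two columns: id and name
static string ToInsertStatement(string tableName, IEnumerable<string> tsvLines)
{
    var rows = tsvLines
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => line.Split('\t'))
        // Double the single quotes so names like O'Brien stay valid SQL
        .Select(columns => (Id: columns[0], Name: columns[1].Replace("'", "''")))
        .Select(row => $"({row.Id},'{row.Name}')");

    return $"INSERT INTO {tableName}(Id, Name) VALUES{Environment.NewLine}"
        + string.Join("," + Environment.NewLine, rows);
}
```

Joining the rows with a separator avoids the extra step of trimming the trailing comma.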

With this compact script file, I removed the old migration files except the last one. For the project in the screenshot above, I kept Migration0027. Then, I used all the SQL statements from the dump file in the Up() step of the migration. I had a squashed migration after that.

Voilà! That’s how I squashed old migrations in one of my client’s projects using schemazen and a Bash script. The idea is to squash our migrations after every stable release of our projects. From the reference article, one commenter said he does this once or twice a year. Another one, after every breaking change.

By the way, recently, I got interested in the Unix tools again. Check how to replace keywords in a file name and content with Bash and how to create ASP.NET Core Api project structure with dotnet cli.

Happy coding!

Lessons I learned as a code reviewer

19 Dec 2022 #career #codereview

This post is part of my Advent of Code 2022.

In the past month, for one of my clients, I became a default reviewer. I had the chance to check everybody else’s code and advocate for change. After dozens of Pull Requests (PRs) reviewed, these are the lessons I learned.

I’ve noticed that most of the comments fall into two categories. I will call them “babysitting” and “surprising solution.”

1. Babysitting

In these projects, before opening a PR, we have to cover all major code changes with tests, have zero analyzer warnings, and format all C# code. But try to guess what the most common comments are. Things like “please write tests to cover this method,” “address analyzers warnings,” and “run CodeMaid to format this code.”

As a reviewee, before opening a PR, wear the reviewer hat and review your own code. It’s frustrating when the code review process becomes a slow and expensive linting process.

To have a smooth code review, let’s automate some of the things checked during the review process. For example, let’s clean and format our files with a Git hook or Visual Studio extension. And let’s turn all warnings into compilation errors.

For example, with this idea of automation in mind, I ended up writing a Git pre-commit hook to format SQL files.

2. Surprising solution

Apart from making developers follow conventions, the next most common comments are clarification comments. Things like “why did you do that? Possibly, this is a simpler way.” Often, it’s easy when there’s a clear and better solution. Like when a developer used semaphores to prevent concurrent access to dictionaries. We have concurrent collections for that.
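As a reminder of what that simpler way looks like, here is a minimal sketch with ConcurrentDictionary. The counting scenario is made up. The point is that there’s no semaphore or lock around the dictionary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var hits = new ConcurrentDictionary<string, int>();

// Many threads update the same two keys without any explicit locking
Parallel.For(0, 1000, i =>
{
    var key = i % 2 == 0 ? "even" : "odd";
    hits.AddOrUpdate(key, 1, (_, current) => current + 1);
});

var even = hits["even"];
var odd = hits["odd"];
Console.WriteLine($"even={even}, odd={odd}");
// even=500, odd=500
```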

As a reviewee, use the PR description and comments to give enough context to avoid unnecessary discussion. The most frustrating PRs are the ones with only a ticket number in their title. Show the entry point of your changes, tell why you chose a particular design, and signal places where you aren’t sure if there’s a better way.

Voilà! These are some of the lessons I learned after being a reviewer. The next time you open a PR, review your own code first and give enough context to your reviewers.

But the one thing that improves code reviews the most is keeping PRs short. PRs everyone could review in 10 or 15 minutes without too much discussion. As a reviewer, I’d rather review multiple short PRs in a working session than one single long PR that exhausts all my mental energy.

Also, as a reviewer, I learned to stop using leading or tricky questions. And I taught others to use simple test values to write good unit tests.

If you’re new to code reviews, check these Tips for better code reviews.

Happy coding!

Lessons I learned from my ex-coworkers about software engineering

18 Dec 2022 #career

This post is part of my Advent of Code 2022.

For better or worse, we all have something to learn from our bosses and co-workers.

These are three lessons I learned from three of my ex-coworkers and ex-bosses about software engineering, designing, and programming.

I didn’t take the time to thank them when I worked with them. This is my thank you note.

1. Inspire Change

From Edgardo, the most senior of all developers, I learned to inspire change. He didn’t talk too much. But when he did, everybody listened.

He always brought new ideas to improve our development process. Instead of doing things himself, he dropped a seed on us. “Hey, what if we do something? Think of a way of achieving something else.”

He was the kind of guy who inspired trust to ask him anything, not only about coding. I tapped his shoulder: “hey, Edgardo. I have a question about life” and he dropped whatever he was doing to listen, answer, and inspire us all.

In emergencies, while everybody panicked, Edgardo was calm, going through log files and running diagnostics.

2. Stand on the shoulders of giants

From Javier, the architect, I learned to stand on the shoulders of giants.

When we ran into issues, he always said “you’re not the first one solving that problem” and “smarter people have already solved that.” He made us look out there first.

Every time I’m tempted to start something from scratch, I start looking at GitHub. Maybe I can stand on somebody else’s shoulders.

Recently, a coworker told me that reading an authorization token from a custom header with ASP.NET Core was impossible. And my first thought was: “we’re not the first ones doing that.” After some Googling, it turned out we definitely can do that. Javier was right!

Also, from Javier, I learned to read other people’s source code. He believed that’s the way of learning from others: by looking at their code.

3. Identify your users and their goals

From Pedro, the boss, I learned to keep in mind who our end users are.

More than once, I remember Pedro asking designers to change fonts and increase their size. He said: “you aren’t the one who’s going to use this app. This is for your dad and granddad. This is for oldies.”

Also, from Pedro, I learned to optimize for the most frequent scenario. Once we had to read and validate XML files, Pedro suggested storing the XML documents first and then validating them and continuing with the rest of the processing. Because “90% of the time, those documents are valid.”

Voilà! These are some of the lessons I learned from some of my past coworkers. What have you learned from your coworkers and bosses? I bet they have something to teach you.

For more career lessons, check things I wished I knew before becoming a software engineer, ten lessons learned after one year of remote work, and things I learned after a failed project.

Happy coding!
