A Simple Alternative to "If.." AKA Strategy Pattern Envy

The application that I am working on is a bit of a spy on other applications. One of the primary systems our business uses internally is a monstrous trading system that allows our traders to interact with the market making buys and sells of various securities. We put orders out on a system called FIX and vendors pick up those orders and fill them as the day goes by. At the end of the day, we need to send a report to big brother describing not only our actions but our vendors actions taken on our behalf for regulatory purposes. Our vendors send us this report which we forward on. Our vendors have various levels of accuracy and one misreport could mean thousands of dollars in fines. So the application I am working on attempts to monitor the entire process and catch mistakes before they cost my company big bucks.

The basic lifecycle of my problem domain revolves around the concept of an Order. The order is created in one system and has a bit of workflow there before it actually leaves the building for the market. Once the order is placed, messages start flying back and forth on the FIX network regarding that order. These messages are not maintained by the system that creates the order, in most cases the bits of info the system needs are picked out and used and some of it is munged into a message field in a order history table. My application monitors the FIX traffic and the order system and tries to reconcile the two with the reports coming back from the vendors at the end of the day. Seriously, it’s a spy.

Anyway, on to my point; in my domain I have an entity that represents an OrderHistory record. it looks something like this:

public class OrderHistory : IAmReadOnly
{
    public virtual decimal OrderId { get; set; }
    public virtual string Action { get; set; }
    public virtual decimal Shares { get; set; }
    public virtual string Message { get; set; }
    public virtual DateTime ActTimestamp { get; set; }
    public virtual decimal Seq { get; set; }

    public virtual string GetClientOrderId()
    {
       //todo: figure out how to get client order id from message
    }

}

The client order id is one of those bits of data that gets munged into the message property. A valid test case might look something like this:

[TestFixture]
public class given_a_message_with_a_long_client_order_id_to_parse : with_an_order_history
{
    private string result;

    public override void Context()
    {
        base.Context();
        orderHistory.Message = "Rec'd execution report: New status: New on Client Order ID:1507532299 Order ID:3341-60 Exec.  ID:603341 from ITG.";
    }

    public override void Because_of()
    {
        result = orderHistory.GetClientOrderId();
    }

    [Test]
    public void I_should_see_the_client_order_id()
    {
        result.ShouldBe("1507532299");
    }
}

The simplest way to get this test to pass is to use a regular expression to pick out the client order id like so:

public virtual string GetClientOrderId()
    {

        var match = new Regex("Client Order ID:(?<clientOrderId>\\w*)\\W").Match(Message);
        if (match.Success)
            return match.Groups["clientOrderId"].Captures[0].Value;
        else
    return string.Empty
}

But here comes the rub, client order id can show up in many different ways in the message text. An order might have thirty-five or more order history records written by different processes written by different developers who apparently don’t talk to each other. So the following is a valid test case as well.

[TestFixture]
public class given_a_message_with_a_short_client_order_id_to_parse : with_an_order_history
{
    private string result;

    public override void Context()
    {
        base.Context();
        orderHistory.Message = "Sent New Order Buy (1 - Market) 4200.0 to ITG (ClOrdID: 1507532299).";
    }

    public override void Because_of()
    {
        result = orderHistory.GetClientOrderId();
    }

    [Test]
    public void I_should_see_the_client_order_id()
    {
        result.ShouldBe("1507532299");
    }
}

So now we need to modify our GetClientOrderId method to handle different formats for the client order id found in message text. At first stab we might do something like this:

public virtual string GetClientOrderId()
        {
            Match match;

            match = new Regex("Client Order ID:(?\\w*)\\W").Match(Message);
            if (match.Success)
                return match.Groups["clientOrderId"].Captures[0].Value;

            match = new Regex("(ClOrdID: (?\\w*))").Match(Message);
            if (match.Success)
                return match.Groups["clientOrderId"].Captures[0].Value;

            return string.Empty;
        }

And this would work, but there are several smells about the approach. Every time I discover a new message format, I have to modify the OrderHistory class. That is a clear violation of the Open Closed Principal. Also, what does all this regular expression goo have to do with my OrderHistory type? It smells like a violation of the Single Responsibility Principal. The even bigger smell is the sheer number of “if” statements I would need if I discovered say forty client order id formats in messages. This method would quickly grow out of control have tons of duplication and be very difficult to read.

So what to do about it… I have two passing tests, so I am going to refactor the regular expression goo out into a parser class. I want my new GetClientOrderId method to simply delegate the parsing to the new class. My first stab at this refactoring looks like this:

internal class ClientOrderIdParser
    {
        internal static string Parse(string message)
        {
            Match match;

            match = new Regex("Client Order ID:(?\\w*)\\W").Match(message);
            if (match.Success)
                return match.Groups["clientOrderId"].Captures[0].Value;

            match = new Regex("(ClOrdID: (?\\w*))").Match(message);
            if (match.Success)
                return match.Groups["clientOrderId"].Captures[0].Value;

            return string.Empty;
        }
    }

Notice that I have extracted the exact logic out into a static method hanging off a new ClientOrderParser class. This is a technique I picked up reading Michael Feather’s amazing book “Working Effectively with Legacy Code”. The idea being wrap the existing functionality first to isolate it. I also put this class in the same file right beside the OrderHistory type and marked it as internal. This new static class is only useful to the OrderHistory class and I like to keep things like that close together. My GetClientOrderId method can now look like this and all my tests should still pass:

public virtual string GetClientOrderId()
        {
            return ClientOrderIdParser.Parse(Message);
        }

The resulting method is a simple clean one line statement that reveals the intent of the underlying logic without polluting the OrderHistory class with all that regular expression goo. Now as new message formats are discovered the OrderHistory type can stay the same removing the OCP smell. It also delegates the parsing responsibility to another type removing the SRP smell. OrderHistory is looking to be in pretty good shape. Our new ClientOrderIdParser is kinda ugly though. It still suffers from the “if” statement issue and code duplication. To clean this up let’s try implementing a simple Strategy pattern that the Regex class will handle for us.

internal class ClientOrderIdParser
    {
        private static IEnumerable regularExpressions()
        {
            return new[]
                       {
                           new Regex("Client Order ID:(?\\w*)\\W"),
                           new Regex("(ClOrdID: (?\\w*))"),
                       };
        }

        internal static string Parse(string messageText)
        {
            var resultingMatch = string.Empty;
            regularExpressions().Where(regex => regex.IsMatch(messageText))
                                .Each(regex => resultingMatch = regex.Match(messageText)
                                                                     .Groups["clientOrderId"]
                                                                     .Captures[0].Value);

            return resultingMatch;
        }
    }

With this refactoring we have completely eliminated the duplication smell and there are no “if” statements to give me heart burn. The parse method simply uses a linq expression to select the correct regular expression and use it. I have split out the definition of the regular expressions into its own method as well. This gives me one single point to modify to add new message formats and a single place to refactor in the future if I decide we need something more robust for defining these regular expressions. Maybe next week I will want to pull these strings out of a configuration file, but for now this will do.

This code is not perfect, there are other improvements that could be made. But it is a significant improvement over the original code and dramatically reduces the amount of noise “if” statements add to code.

Follow me on Mastodon!