Implementing a Generic File Export System

You may find this shocking, but the financial world runs on text files, delimited text files to be exact. At work we jokingly call it the file system message bus. If you want to get information from one system into yours, you typically write a file to a request directory and then watch a response directory for a file to be dropped with your data in it. The project I am currently working on loads our internal data into a large trading platform called Charles River. To load data into Charles River, you guessed it, you write a delimited file to a specific directory and then Charles River sucks it up for processing. It is all very old school.

I was tasked with writing a Windows service that hosts jobs which kick off at specific times during the day, take data that has been staged in a database, and write it out to flat files. The data must be in a very specific format for each data type. We are exporting things like Securities, Accounts and Positions: basically the stuff the traders need to make good investment choices for our customers.

Starting from the inside, I need a type that will represent the data I am going to be working with.

using System;
using System.Collections.Generic;
using System.Domain;   // PropertyCSVDumper lives here (shown later in the post)

namespace System.DTO
{
    // Marks a DTO property for inclusion in the export file.
    // Index can be set to pin an explicit column order.
    [AttributeUsage(AttributeTargets.Property)]
    public class ExportableAttribute : Attribute
    {
        public int Index { get; set; }

        public ExportableAttribute()
        { }

        public ExportableAttribute(int index)
        {
            Index = index;
        }
    }

    // Contract for every exportable DTO: a load date to query the latest batch by,
    // and a way to render itself as a single delimited line.
    public interface ILoadable
    {
        DateTime LoadDate { get; }
        string ToExportString();
    }

    public class Account : ILoadable
    {
        public virtual int Id { get; set; }
        public virtual string ProductLine { get; set; }
        public virtual DateTime LoadDate { get; set; }
        [Exportable]
        public virtual string ACCT_CD { get; set; }
        [Exportable]
        public virtual string ACCT_NAME { get; set; }
        [Exportable]
        public virtual string ACCT_TYP_CD { get; set; }
        [Exportable]
        public virtual string CRRNCY_CD { get; set; }

        public virtual string ToExportString()
        {
            return PropertyCSVDumper.Dump(this);
        }
    }
}

This code represents several concepts. First we have the Account itself. It exposes public properties and lines up almost exactly with the file format we want to export; if we wrote the properties out in declaration order we would pretty much have the file we need. The next concept is the Exportable attribute. It lets me decorate the properties on the Account DTO that I actually want written to the file, which means I can exclude properties that exist only for queries or database record IDs. The third concept is the ILoadable interface. This is our first step toward a generic system that can be applied to every DTO that needs to be dumped to a file. The feature I am working on requires that I grab the latest loaded data for each data type and write it to a file, and we determine "latest" by the LoadDate column in the database, so the interface requires each type to expose a LoadDate. It also requires a ToExportString() method, which brings us to the last concept in this file: we need a way to represent an Account (and every other type) as a delimited string. To do this I wrote the following class.

using System;
using System.Collections.Generic;
using System.DTO;   // ExportableAttribute and the DTOs it decorates
using System.Linq;
using System.Reflection;

namespace System.Domain
{
    public class PropertyCSVDumper
    {
        public static string Dump(object item)
        {
            // Pull the values of every [Exportable] property, in export order,
            // and join them with the delimiter the file format expects.
            var values = item.GetType().GetProperties()
                .Where(IsExportable)
                .OrderBy(GetExportOrder)
                .Select(x => convertValueToString(x.GetValue(item, null)));
            return string.Join("~", values.ToArray());
        }

        static int GetExportOrder(PropertyInfo prop)
        {
            // Properties that don't specify an index all default to 0, so the
            // stable OrderBy above falls back to the order GetProperties returns
            // them in (declaration order in practice).
            return Attribute.GetCustomAttributes(prop)
                            .Where(IsExportable)
                            .Cast<ExportableAttribute>()
                            .FirstOrDefault().Index;
        }

        static bool IsExportable(Attribute attribute)
        {
            return attribute.GetType() == typeof(ExportableAttribute);
        }

        static bool IsExportable(PropertyInfo prop)
        {
           return Attribute.GetCustomAttributes(prop).Any(IsExportable);
        }

        static string convertValueToString(object value)
        {
            if (value is DateTime || value is DateTime?)
                return ConvertDateToString(value);
            return value != null ? value.ToString() : string.Empty;
        }

        static string ConvertDateToString(object value)
        {
            if (value == null)
                return string.Empty;

            return ((DateTime) value).ToString("MM/dd/yyyy");
        }
    }
}

The PropertyCSVDumper class uses reflection and a little LINQ magic to pull the values out of a simple DTO, exclude the unattributed properties and generate a delimited string. At first glance this might seem like overkill for something that could be as simple as hand-formatting a string for each type, but doing it this way lets me write the code once, in one place, which is a good thing. Also, as you can see, I have to write dates in a specific format; if I were hand-assembling strings for each type and that format changed, I would literally have to hunt down every DateTime.ToString call and edit it. Which would make me want to kick puppies.
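
To make that concrete, here is a quick, self-contained illustration of what the dumper produces for the Account above; the values are made up purely for the example.

using System;
using System.DTO;

class ExportStringExample
{
    static void Main()
    {
        var account = new Account
        {
            Id = 42,                    // not exported: no [Exportable] attribute
            LoadDate = DateTime.Today,  // not exported either
            ACCT_CD = "ABC123",
            ACCT_NAME = "Sample Account",
            ACCT_TYP_CD = "EQ",
            CRRNCY_CD = "USD"
        };

        // Only the attributed properties are emitted, joined with "~":
        // ABC123~Sample Account~EQ~USD
        Console.WriteLine(account.ToExportString());
    }
}

And if a format ever demands a column order different from the declaration order, the attribute already supports it: decorate the properties with [Exportable(1)], [Exportable(2)] and so on, and the OrderBy in Dump does the rest.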

So now we have our DTO types and a way to convert them to a delimited string; next up is a way to get the data from the database. To do this I used NHibernate and created a simple Table Data Gateway to access the data. It looks something like this. Notice this is where we start doing some serious generic love.

using System;
using System.Collections.Generic;
using System.DTO;   // ILoadable
using System.Linq;
using NHibernate.Linq;
using NHibernate.Transform;

namespace System.Data
{
    public interface IGateway<T>
    {
        DateTime GetMaxDate();
        IEnumerable<T> GetFor(DateTime loadDate);
    }

    public class Gateway<T> : IGateway<T> where T : ILoadable
    {
        protected readonly IUnitOfWorkManager worker;

        public Gateway(IUnitOfWorkManager worker)
        {
            this.worker = worker;
        }

        public DateTime GetMaxDate()
        {
            return worker.On(CRDStaging =>
                                     CRDStaging.Query<T>()
                                         .Max(x => x.LoadDate)
                                 );
        }

        public virtual IEnumerable<T> GetFor(DateTime loadDate)
        {
            // Evaluate the search date up front so the query only contains a
            // simple comparison the LINQ provider can translate.
            var searchDate = getSearchDate(loadDate);
            return worker.On(CRDStaging =>
                                     CRDStaging.Query<T>()
                                         .Where(x => x.LoadDate >= searchDate)
                                         .OrderBy(x => x.LoadDate)
                                 );
        }

        protected DateTime getSearchDate(DateTime loadDate)
        {
            // Strip the time component so we pick up everything loaded that day.
            return loadDate - loadDate.TimeOfDay;
        }
    }
}

Notice that the gateway is constrained to types that implement the ILoadable interface, which gives us access to the LoadDate property of our DTOs and is very handy. The gateway exposes two methods: one to get the max load date for a type and one to get an enumerable set of records for a given load date. The IUnitOfWorkManager interface represents some shared functionality we use here at work that basically wraps an NHibernate session in a transaction and manages that process. It is out of scope for this post, but I may write about it in the future.
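
For anyone trying to follow the code, here is a minimal sketch of the shape that interface would need in order to satisfy the calls above. The member names are assumptions inferred from usage, not the real internal API.

using System;
using NHibernate;

// Sketch only: inferred from how the gateway and the export job use it.
public interface IUnitOfWorkManager
{
    // Runs a query against an open NHibernate session and returns the result.
    TResult On<TResult>(Func<ISession, TResult> query);

    // Explicit transaction boundaries, used by the export job further down.
    void Begin();
    void Complete();
}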

Now we have a DTO, a method to represent that DTO as a delimited string and a method for querying the database to get a set of DTOs. Next up I need a way to write files generically.

namespace System.IO
{
    public interface IWriteFiles
    {
        void WriteFile(Type type, Action<StreamWriter> action);
    }

    public class FileWriter : IWriteFiles
    {
        readonly IEnvironmentService environment;
        readonly IFileSystem fileSystem;

        public FileWriter(IEnvironmentService environment, IFileSystem fileSystem)
        {
            this.environment = environment;
            this.fileSystem = fileSystem;
        }

        public void WriteFile(Type type, Action<StreamWriter> action)
        {
            using(var writer = new StreamWriter(getFileStream(type)))
            {
                action(writer);
            }
        }

        Stream getFileStream(Type type)
        {
            var fileLocation = getFileLocation(type);

            // Clear out any file left over from a previous run before creating a new one.
            if(fileSystem.FileExists(fileLocation))
                fileSystem.DeleteFile(fileLocation);

            return fileSystem.OpenFile(fileLocation, FileMode.CreateNew,
                                           FileAccess.Write, FileShare.None);
        }

        string getFileLocation(Type type)
        {
            var writeDirectory = environment.GetSetting(type.Name + "Path");
            var fileName = environment.GetSetting(type.Name + "FileName");

            return Path.Combine(writeDirectory, fileName);
        }
    }
}

The FileWriter class also depends on some internal tools: IEnvironmentService and IFileSystem, which are abstractions over ConfigurationManager and System.IO respectively. The file writer simply takes a type and an Action&lt;StreamWriter&gt;, looks up the file to write to using the environment service, and executes the action against that file using the file system service. The FileWriter has no idea what it is writing; that bit of functionality is provided somewhere else.
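
Neither interface is shown in this post, but their shape is easy to infer from how FileWriter uses them. A rough sketch, with names and members that are my assumptions rather than the real internal API:

using System.IO;

public interface IEnvironmentService
{
    // Wraps ConfigurationManager.AppSettings, so an Account export reads the
    // "AccountPath" and "AccountFileName" settings (type.Name plus a suffix).
    string GetSetting(string key);
}

public interface IFileSystem
{
    bool FileExists(string path);
    void DeleteFile(string path);
    Stream OpenFile(string path, FileMode mode, FileAccess access, FileShare share);
}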

Finally we need to tie all these objects together. We use Quartz.NET to schedule jobs, so the next class is a generic Quartz.NET job for exporting files.

using System.Collections.Generic;
using System.Data;   // IGateway<T>
using System.DTO;    // ILoadable
using System.IO;     // IWriteFiles
using System.Linq;
using Quartz;

namespace System.Jobs
{
    public class FileExport<T> : IStatefulJob where T : ILoadable
    {
        readonly IGateway<T> gateway;
        readonly IWriteFiles fileWriter;
        readonly IUnitOfWorkManager uowManager;

        public FileExport(IGateway<T> gateway, IWriteFiles fileWriter, IUnitOfWorkManager uowManager)
        {
            this.gateway = gateway;
            this.fileWriter = fileWriter;
            this.uowManager = uowManager;
        }

        public void Execute(JobExecutionContext context)
        {
            uowManager.Begin();
            var exportDate = gateway.GetMaxDate();
            var loadables = gateway.GetFor(exportDate);

            writeRecords(loadables);
            uowManager.Complete();
        }

        void writeRecords(IEnumerable<T> records)
        {
            // typeof(T) avoids a null reference when the query comes back empty.
            fileWriter.WriteFile(typeof(T),
                writer =>
                {
                    foreach (var record in records)
                        writer.WriteLine(record.ToExportString());
                });
        }
    }
}

The FileExport job takes dependencies on the gateway, the file writer and the unit of work manager and simply orchestrates their interaction. The whole export is a single transaction, so the unit of work manager wraps it in one. I get the max load date, use it to fetch the list of loadables, and write them to a file using the file writer.
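
One detail worth calling out: out of the box Quartz.NET builds jobs through their parameterless constructor, so a job with constructor dependencies like FileExport&lt;T&gt; needs a custom IJobFactory backed by an IoC container. The wiring isn't shown in this post, but a sketch of such a factory might look something like this (the resolve delegate is a stand-in for a real container, not our actual code):

using System;
using Quartz;
using Quartz.Spi;

public class ContainerJobFactory : IJobFactory
{
    readonly Func<Type, object> resolve;   // stand-in for your container's Resolve

    public ContainerJobFactory(Func<Type, object> resolve)
    {
        this.resolve = resolve;
    }

    public IJob NewJob(TriggerFiredBundle bundle)
    {
        // JobType is the closed generic type named in the job-type element of the
        // Quartz configuration below, e.g. FileExport<Account>, so the container
        // can supply the gateway, file writer and unit of work manager.
        return (IJob)resolve(bundle.JobDetail.JobType);
    }
}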

The last piece we have is to actually wire the job up to be triggered. We lean on Quartz.NET to do this based on configuration. Here is what the Quartz.NET configuration file looks like.

<?xml version="1.0" encoding="utf-8" ?>
<quartz xmlns="http://quartznet.sourceforge.net/JobSchedulingData"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="1.0" overwrite-existing-jobs="true">
     <!-- These jobs use CRON scheduling
         Read more here: http://quartznet.sourceforge.net/tutorial/lesson_6.html
    -->
    <job>
        <job-detail>
            <name>Account Export Job</name>
            <job-type>System.Jobs.FileExport`1[[System.DTO.Account,System]], System</job-type>
            <durable>true</durable>
        </job-detail>
        <trigger>
            <cron>
                <name>account-cron-trigger</name>
                <job-name>Account Export Job</job-name>
                <cron-expression>0 30 03 * * ?</cron-expression>
            </cron>
        </trigger>
        <trigger>
            <simple>
                <name>account-startup-trigger</name>
                <job-name>Account Export Job</job-name>
                <start-time>1982-06-28T12:24:00.0Z</start-time>
                <repeat-count>0</repeat-count>
                <repeat-interval>2000</repeat-interval> <!-- milliseconds, i.e. 2 seconds -->
            </simple>
        </trigger>
    </job>
</quartz>
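
A second export is then just another job entry with the generic parameter swapped out. For example, a Position export (assuming a System.DTO.Position DTO that follows the same pattern; the names and trigger time here are illustrative) would look something like this:

<job>
    <job-detail>
        <name>Position Export Job</name>
        <!-- same generic job type, closed over a different DTO -->
        <job-type>System.Jobs.FileExport`1[[System.DTO.Position,System]], System</job-type>
        <durable>true</durable>
    </job-detail>
    <trigger>
        <cron>
            <name>position-cron-trigger</name>
            <job-name>Position Export Job</job-name>
            <cron-expression>0 45 03 * * ?</cron-expression>
        </cron>
    </trigger>
</job>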

We simply create a job for each of the file exports we want to do, specifying the DTO type in the job-type element, and then create a schedule using CRON notation (the expression above, 0 30 03 * * ?, fires at 3:30 AM every day). Happy file exporting.
