Interlocked

One of the things I enjoy most about computer programming is multi-threading. It’s hard but very rewarding when you find an improvement that makes a tangible difference to a system. It’s also interesting how sometimes it’s actually desirable to allow some redundant processing in order to reduce contention an thus improve performance overall. I have recently written a class that processes a message asynchronously. The requirement is that only the most recently received message should be processed and that messages are not processed concurrently.

T Event
0 Message A received.
1 Process begin execute asynchronously on message A.
2 Message B received.
3 Message C received
4 Process end execute asynchronously on message A.
5 Process begin execute asynchronously on message C.
… and so on

It’s a trivial problem, but I enjoyed trying different implementations that produced a wide range of results in terms of speed, thread use and memory footprint. As is more often than not the case there is a balancing act between various concerns. I want the least amount of work done on the receive message call so that there is as little delay to the client application as possible. I want the least amount of latency between the receive and process tasks which discounts any type of polling. Finally, as is standard good practice, I want to block the least amount of threads for the least possible time. I settled on the following:


using System;
using System.Threading;
 
namespace MPM.Threading
{
    public class MostRecentMessageProcessor<TMessage> : IMessageProcessor<TMessage> where TMessage : class
    {
        private delegate void ProcessMessageDelegate();
 
        private readonly Action<TMessage> _action;
        private readonly ProcessMessageDelegate _processMessageDelegate;
        private readonly object _lock = new object();
 
        private TMessage _last;
        private TMessage _current;
 
        private int _workItemsQueued;
 
        public MostRecentMessageProcessor(Action<TMessage> action)
        {
            _action = action;
            _processMessageDelegate = ProcessMessage;
        }
        
        public void Process(TMessage message)
        {
            _current = message;
 
            if (Interlocked.CompareExchange(ref _workItemsQueued, 1, 0)==0)
                _processMessageDelegate.BeginInvoke(null, null);
        }
 
        private void ProcessMessage()
        {
            lock (_lock)
            {
                Interlocked.Decrement(ref _workItemsQueued);
 
                if (_current == _last) return;
 
                _action(_last = _current);
            }
        }
    }
}

I’m not for a moment supposing that this is the best possible solution. I’m wise enough to know that I am not wise enough to know that. This is the best solution I found in the time I spent looking at the problem. I’d be very interested to know of any optimizations or entirely different approaches that might be an improvement over this.

I have created and packaged a library that includes this code and will grow over time as I produce more solutions in this space.

https://github.com/myles-mcdonnell/MPM.Threading
http://nuget.org/packages/MPM.Threading

Branches and Forks

This isn’t a blog post about gardening, if it’s gardening that floats your boat see Alan Titchmarsh. This is a blog post describing a strategy for using Git and GitHub to manage source code of a project that has several, perhaps dozens, of developers. When we talk about scalability we are usually referring to runtime scalability, but scalability of development effort is the focus of this post, at least as far as source control has any impact.

The Hub Repository

Yes Git is a Distributed VCS, but this strategy defines a hub repository. This doesn’t break the DVCS model, every clone and fork still holds the entire history for each tracked branch. Commits are never made to the hub repository directly. Changes to the hub are always merges from forked repositories

Forks

GitHub makes forking easy. A fork is just a copy of a repository, nothing more; the copy doesn’t know where it came from (GutHub does, the repository itself does not). The strategy described here dictates that all development takes place on a fork. Developers fork the hub repository to their own GitHub account, clone locally, make changes, rebase and squash commits (this is optional but often desirable), commit changes, push to GitHub (repeat n times) before sending pull requests back to the hub repository. A pull request is nothing more than a message saying “please merge this set of changes that I made to my copy to your copy”.

Branches

Master

The master branch is always in sync. with the current production (or most up to date and stable, however the latest release is defined) version of the code base. The master branch is only ever committed to as a result of a merge from a release branch. Break this rule, or any of the rules in this post, and you are on your own.

Integration

The integration branch is where the action is. This is where new features are merged. Commits to integration are either the result of a merged pull request from a forked repository or a merge from a release hot fix and NOTHING ELSE! Commits are not made to the hub repository directly. The reason for this is that it gets progressively harder to manage the repository the more developers you have committing to it directly. Quality is king. Be very specific, measured and detailed about the changes you integrate. Integrations that break things are EXPENSIVE * DEVELOPERS!

Feature

Any number of feature branches may exist at any time. Feature branches are seeded from integration and are merged back to integration. The beauty of the forking model is that feature branches need not appear in the central repository. Developers can branch, commit, push and merge to integration (repeat n times) then send a pull request to the hub repository thus making a clean separation between the fine grained process of committing changes (work may be committed and pushed simply to ensure it is persisted beyond a local failure) and actually integrating features into the hub repository.

Release Candidate

A release candidate branch is seeded from integration. The lifetime of the release branch is up to deployment at which time it is merged to master, or abandonment. The only changes made to an RC branch are via merges from hot fix branches.

Hot Fix

A hot fix branch may be seeded from a release candidate branch or from master. The lifetime of a hot fix branch is up to the point where it is merged back to it’s origin (either master or RC branch) and integration. A hot fix branch is unique amongst the branches as it is the only branch that is merged to two branches at it’s termination.

Branching Strategy

In Branches and Forks – Pt 2 I will go through some example work flows and detail the various git commands required, including a work flow that enables multiple developers to work on a single feature branch.

Enumeration Pattern – Pt II

Last month I had the good fortune of attending the DDD immersion course lead by Eric Evans. The opening gambit had nothing to do with computers, we looked at a map. A map that did not represent the world geographically, much like the London tube map. Why? To understand modelling. A model exists to express something in a particular context. It is not necessary, or perhaps even possible, to create a model that represents it’s subject through every perspective. The tube map is a model of the underground network that enables commuters to navigate terminals, and it does this flawlessly. Try using the tube map to work out how far away Cockfosters is from Woodford Green? Forget it, you are barking up the wrong tree.

So, how does any of this relate to Enumeration Pattern – Pt I? The solution put forward in that post includes the concept of degrees. An insightful colleague of mine, Ashter, pointed out that it was superfluous to the requirement and I agreed with him. I could sense that it was not quite right, but now I have revisited the problem domain I fully comprehend my mistake; I allowed my understanding of orientation to influence the model and in so doing I added the concept of degrees which was completely unnecessary in order to satisfy the requirement and in fact added complexity that didn’t need to be there.

Here is my latest solution:


public class Orientation
{
    public static readonly Orientation North =  new Orientation("N", 0);
    public static readonly Orientation East = new Orientation("E", 1);
    public static readonly Orientation South = new Orientation("S", 2);
    public static readonly Orientation West = new Orientation("W", 3);

    private static readonly Orientation[] All = new [] { North, East, South, West };

    private readonly string _mnemonic;
    private readonly ushort _sequence;

    protected Orientation(string mnemonic, ushort sequence)
    {
        _mnemonic = mnemonic;
        _sequence = sequence;
    }

    public string Mnemonic
    {
        get { return _mnemonic; }
    }

    public Orientation TurnLeft()
    {
        return All[(_sequence+3) %4];
    }

    public Orientation TurnRight()
    {
        return All[(_sequence + 1) % 4];
    }
        
    public override string ToString()
    {
        return _mnemonic;
    }

    public static Orientation Parse(string mnemonic)
    {
        if (string.IsNullOrWhiteSpace(mnemonic) || All.All(o => o.Mnemonic != mnemonic.Trim().ToUpper()))
            return null;

        return All.First(o=>o.Mnemonic==mnemonic.Trim().ToUpper());
    }
}

and tests of course;


[TestFixture]
public abstract class OrientationStaticsTests
{
    [Test]
    public void Parse()
    {
        Orientation.Parse("w").Should().Be(Orientation.West);
        Orientation.Parse(" S ").Should().Be(Orientation.South);
        Orientation.Parse(" e").Should().Be(Orientation.East);
        Orientation.Parse(" N  ").Should().Be(Orientation.North);
    }
}

public abstract class OrientationTests
{
    protected abstract Orientation Orientation { get; }
    protected abstract string Mnemonic { get; }
    protected abstract Orientation Left { get; }
    protected abstract Orientation Right { get; }

    [Test]
    public void OrientationValues()
    {
        Orientation.Mnemonic.Should().Be(Mnemonic);
    }

    [Test]
    public void Turn()
    {
        Orientation.TurnLeft().Should().Be(Left);
        Orientation.TurnRight().Should().Be(Right);
    }
}

[TestFixture]
public class NorthOrientationTests : OrientationTests
{
    protected override Orientation Orientation
    {
        get { return Orientation.North; }
    }

    protected override string Mnemonic
    {
        get { return "N"; }
    }

    protected override Orientation Left
    {
        get { return Orientation.West; }
    }

    protected override Orientation Right
    {
        get { return Orientation.East; }
    }
}

[TestFixture]
public class EastOrientationTests : OrientationTests
{
    protected override Orientation Orientation
    {
        get { return Orientation.East; }
    }

    protected override string Mnemonic
    {
        get { return "E"; }
    }

    protected override Orientation Left
    {
        get { return Orientation.North; }
    }

    protected override Orientation Right
    {
        get { return Orientation.South; }
    }
}

[TestFixture]
public class SouthOrientationTests : OrientationTests
{
    protected override Orientation Orientation
    {
        get { return Orientation.South; }
    }

    protected override string Mnemonic
    {
        get { return "S"; }
    }

    protected override Orientation Left
    {
        get { return Orientation.East; }
    }

    protected override Orientation Right
    {
        get { return Orientation.West; }
    }
}

[TestFixture]
public class WestOrientationTests : OrientationTests
{
    protected override Orientation Orientation
    {
        get { return Orientation.West; }
    }

    protected override string Mnemonic
    {
        get { return "W"; }
    }

    protected override Orientation Left
    {
        get { return Orientation.South; }
    }

    protected override Orientation Right
    {
        get { return Orientation.North; }
    }
}

Download the code here: https://dl.dropbox.com/u/30149716/Blog/EnumPart2.zip

Flyweight Concurrency

Thinking about Flyweight today. The wikipedia entry said nothing about concurrency (I have now contributed the content in this post to the article) and the java example is flawed in a multi-threaded context as it is possible to create two flavour instances for the same flavour. To make matters worst the Java CoffeeFlavour does not deal with equality and thus two separate instances of mocha are not equal. This is less than ideal as the whole point of Flyweight is to enable objects to be shared, which more often than not means between threads as well as clients.

In DDD parlance Flyweight types are by definition value types. They are not entities, they have no unique identity, they are immutable and one instance is equal to another instance of the same value. Here is a C# implementation of CoffeeFlavour which satisfies this criteria:


public class CoffeeFlavour
{
    private readonly string _flavour;

    public CoffeeFlavour(string flavour)
    {
        _flavour = flavour;
    }

    public string Flavour
    {
        get { return _flavour; }
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        return obj is CoffeeFlavour && Equals((CoffeeFlavour)obj);
    }

    public bool Equals(CoffeeFlavour other)
    {
        return string.Equals(_flavour, other._flavour);
    }

    public override int GetHashCode()
    {
        return (_flavour != null ? _flavour.GetHashCode() : 0);
    }

    public static bool operator ==(CoffeeFlavour a, CoffeeFlavour b)
    {
        return Equals(a, b);
    }

    public static bool operator !=(CoffeeFlavour a, CoffeeFlavour b)
    {
        return !Equals(a, b);
    } 
}

and tests:


[TestFixture]
public class CoffeeFlavourTests
{
    [Test]
    public void Equality()
    {
        var coffeeFlavourA = new CoffeeFlavour("Mocha");
        var coffeeFlavourB = new CoffeeFlavour("Mocha");
        var coffeeFlavourC = new CoffeeFlavour("Almond");

        //Identity of this value type is of no concern.  
        coffeeFlavourA.Equals(coffeeFlavourB).Should().BeTrue();
        coffeeFlavourB.Equals(coffeeFlavourC).Should().BeFalse();
    }

    [Test]
    public void EqualityOperators()
    {
        var coffeeFlavourA = new CoffeeFlavour("Mocha");
        var coffeeFlavourB = new CoffeeFlavour("Mocha");
        var coffeeFlavourC = new CoffeeFlavour("Almond");

        (coffeeFlavourA == coffeeFlavourB).Should().BeTrue();
        (coffeeFlavourA == coffeeFlavourC).Should().BeFalse();

        (coffeeFlavourA != coffeeFlavourB).Should().BeFalse();
        (coffeeFlavourA != coffeeFlavourC).Should().BeTrue();
    }  
}

Now with regard to creating our instances, we have options. If we know all of the possible values in advance we can create them all ahead of time. The alternative is dynamic flyweight instantiation in which case we can choose between having the minimum memory footprint with some thread contention or no thread contention with the possibility of having more than one instance per value. The latter is only viable because our flyweight is a true value type, has overridden equals and overloaded the == and != operators.

Example in C# of two factory implementations, the names of which are self-explanatory:


using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public interface ICoffeeFlavourFactory
{
    CoffeeFlavour GetFlavour(string flavour);
}

public class ReducedMemoryFootprint : ICoffeeFlavourFactory
{
    private readonly object _cacheLock = new object();
    private readonly IDictionary<string, CoffeeFlavour> _cache = new Dictionary<string, CoffeeFlavour>();

    public CoffeeFlavour GetFlavour(string flavour)
    {
        if (_cache.ContainsKey(flavour))
            return _cache[flavour];

        var coffeeFlavour = new CoffeeFlavour(flavour);

        ThreadPool.QueueUserWorkItem(AddFlavourToCache, coffeeFlavour);

        return coffeeFlavour;
    }

    private void AddFlavourToCache(object state)
    {
        var coffeeFlavour = (CoffeeFlavour)state;
        if (_cache.ContainsKey(coffeeFlavour.Flavour)) return;

        lock (_cacheLock)
        {
            if (!_cache.ContainsKey(coffeeFlavour.Flavour))
                _cache.Add(coffeeFlavour.Flavour, coffeeFlavour);
        }
    }
}

public class MinimumMemoryFootprint : ICoffeeFlavourFactory
{
    private readonly ConcurrentDictionary<string, CoffeeFlavour> _cache = new ConcurrentDictionary<string, CoffeeFlavour>();

    public CoffeeFlavour GetFlavour(string flavour)
    {
        return _cache.GetOrAdd(flavour, flv => new CoffeeFlavour(flv));
    }
}

Download source code here.

Splitting a Git repository in two

Until now I have managed with only a handful of git commands and Git Extensions. I know how incredibly flexible Git is and I ‘get’ the DVCS paradigm, but I have not needed to use it, until that is I had to split a large repository with lots of branches in two. The new repository must have a subset branches from the source repository and the 4-5 developers working those branches much be able to continue to work seamlessly post transition.

With Git it seems there are always at least a dozen ways to achieve the same result. Initially I spent time looking at the filter-branch command thinking that I would re-write history at the same time as splitting the repos to save space. I decided against this as space isn’t as issue (ok, the repo is nearly 700Mb so I do need to consider the size, but more on that later) and I’d rather not re-write history without a very good reason. I’m being conservative, I’ll take the lightest touch approach. What I settled on is quite obvious really, as are most things most once you have worked them out;

  • Create the new GitHub repo but do not initialize it
  • Clone the source repo locally
  • Track the origin branches I want to migrate
  • Checkout those branches and pull
  • Add the new GitHub repo as a remote
  • Push each of the branches to the new remote
  • Checkout the integration branch
  • Remove the code not required in the new repo from Integration branch, commit and push

Script like this:

#!/bin/bash

sourceRepoPath="{git hub repo url}"
sourceRepoName="{repo name}"
sinkRepoPath="{git hub repo url}"
sinkRemoteName="{anyname}"

branches="
	master
	Integration
	featureBranch1
	featureBranch2
	featureBranch3
	"

git clone $sourceRepoPath
git pull

cd $sourceRepoName

git remote add $sinkRemoteName $sinkRepoPath
git push $sinkRemoteName --set-upstream Integration

for branch in $branches
do
	git branch -t $branch $sinkRemoteName/$branch
	git checkout $branch
	git pull origin/$branch
	git push $sinkRemoteName --set-upstream $branch
done

Now the devs on the new repo can continue on their feature branches and merge from Integration branch at their leisure to remove the redundant code. With regard to the size of the repo I’m assuming that at some stage in the future I can truncate the history of the new repo to a time after every branch had the code removal commit merged and before the inception of every current branch thus enabling git to determine all common ancestors.

Enumeration Pattern

UPDATE: I have since followed up this post with Enumeration Pattern – Pt II

In this post I want to discuss a pattern that I use regularly; Enumeration. I won’t claim this pattern as my own because it’s a fairly obvious pattern and I expect (although I have not researched) many people have come up the same or very similar pattern long before me, but I will describe in detail how I implement it in C# and why it is better than using vanilla C# enums.

The Problem

Even the newest of C# noob knows about enums, right?

public enum Orientation
{
    North,
    South
}

Handy, so now I can write conditional logic using Orientation, e.g.:

public void Foo(Orientation orientation)
{
    switch (orientation)
    {
        case Orientation.North:
            ...
            break;
        case Orientation.South:
            ...
             break;
    }
}

There is more to be said about enums, such as the static methods on the base class, use of flags, integer mapping etc., but that isn’t the point of this post as none of these things help with the problem I will go on describe.

What I don’t like about enums is that they are static; you can’t extend their behaviour via inheritance. In fact they have no behaviour and wherever there are enums invariably there is conditional logic. Switch statements and if..else blocks are often the enemy of extensibility and in this case encapsulation. For example, if I build a system that makes extensive use of the above enum and I now want to support NorthEast, SouthEast, SouthWest and NorthWest I have to revisit every conditional block that references Orientation. This is not the fertile promised land of OO <milkAndHoney/>, this is procedural wasteland <tumbleweed/>.

“Wait! you forgot extension methods” I hear you say. Well no, I didn’t. Yes, you can use extension methods to extend enums and rid yourself of all those conditional blocks in the client code, but extension methods are no panacea.  A full discussion of the pros and cons of extension methods is beyond the scope of this post, but sufficed you say that with regard to enums this approach only moves the problem, it doesn’t solve it. There will still be lots of conditional logic, albeit in one place and there will be lots of extrinsic data kicking around that should be intrinsic to the enum. There are other problems too such as you can’t override extension methods and you can’t mock enums (ok, there would be no point even if you could as they have no instance members, but you see where I am going with this)

What I do like about enums is that they are:

  • Immutable
  • Referenced statically (i.e. Direction.North so no need to instantiate anything explicitly)

So how can we get the best of both worlds?

The Solution

Below are two subtly different implementations of the enumeration pattern which are equivalent to the above with some bells and whistles to demonstrate the power of this pattern over plain enums. This is a subset of the complete enumeration pattern solution (VS2010):

using System.Collections.Generic;
using System.Linq;

namespace EnumerationPattern
{
    public interface IOrientation
    {
        string Name { get; }
        Degrees Degrees { get; }
    }

    public abstract class Orientation : IOrientation
    {
        public static readonly IOrientation North = new NorthOrientation();
        public static readonly IOrientation South = new SouthOrientation();

        public static readonly IEnumerable All = new[]
        {
            North, South
        };

        public IOrientation Parse(string name)
        {
            if (string.IsNullOrWhiteSpace(name))
                return null;

            name = name.ToLower().Trim();

            return All.FirstOrDefault(
                o => o.Name.ToLower().Trim().Equals(name));
        }

        private readonly string _name;
        private readonly Degrees _degrees;

        protected Orientation(string name, Degrees degrees)
        {
            _name = name;
            _degrees = degrees;
        }

        public Degrees Degrees
        {
            get { return _degrees; }
        }

        public string Name
        {
            get { return _name; }
        }

        public override string ToString()
        {
            return Name;
        }

        private class NorthOrientation : Orientation
        {
            public NorthOrientation()
                : base("North", Degrees.Zero) {}
        }

        private class SouthOrientation : Orientation
        {
            public SouthOrientation()
                : base("South", Degrees.OneHundredEighty) {}
        }
    }

    public struct Degrees
    {
        public static readonly Degrees Zero = new Degrees(0);
        public static readonly Degrees OneHundredEighty = new Degrees(180);

        public static readonly Degrees[] All =
        {
            Zero,
            OneHundredEighty,
        };

        private readonly uint _value;

        private Degrees(uint value)
        {
            _value = value;
        }

        public static Degrees operator+(Degrees left, Degrees right)
        {
            var newValue = left._value + right._value;

            newValue = (newValue < 360) ? newValue : newValue - 360;
            return All.First(d => d._value == newValue);
        }

        public static Degrees operator -(Degrees left, Degrees right)
        {
            var leftValue = (right._value > left._value)
                                ? left._value + 360
                                : left._value;

            var newValue = leftValue - right._value;

            return All.First(d => d._value == newValue);
        }

        public static bool operator ==(Degrees left, Degrees right)
        {
            return left._value == right._value;
        }

        public static bool operator !=(Degrees left, Degrees right)
        {
            return left._value != right._value;
        }

        public override int GetHashCode()
        {
            return "Degrees".GetHashCode() + _value.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            if (obj is Degrees)
                return this == (Degrees) obj;

            return false;
        }

        public override string ToString()
        {
            return _value.ToString();
        }
    }
}

Now we can use inheritance, we can introduce private state and we can define instance members to our hearts content.  We also have an IOrientation interface which is invaluable when it comes to mocking for TDD whilst maintaining immutability and static referencing as is the case with an enum.

You can download a sample solution here that makes use of the enumeration pattern. If the advantage is not immediately obvious then as an exercise try to write an implementation that satisfies the test suite where Orientation, Degrees and RotationDirection are all enums.

Test Driven

At least ten years ago a clever chap I had the good fortune to work with introduced me to Test Driven Development. He was far less fortunate as at the time I didn’t fully appreciate the benefits of this approach to developing software. After paying some ear service and making minimal effort I lost interest and continued to struggle on the old way. More fool me.

Around 6 or 7 years ago I tried again. I picked up TDD (Beck) put in the effort to get over the hump and never looked back. Now I’m fully hooked; line and sinker. I liken working on a code base without a comprehensive set of unit tests to poking it with a stick until it works. Of course you can never say with a good enough degree of certainty that it ‘works’ without the test suite, you just observe that it does what you are trying to make it do at the time along with everything else you can remember that it should do. The truth is, I’m not exactly the best programmer in the world, and as my wife will tell you I can only manage to do one thing at a time, and sometimes barely that. That is why I need unit tests and why I need to use TDD; I can only hold so much in my mind at any one time.  When writing even a relatively simple application a developer must consider many factors and eventualities and so I argue that unless you are a genius to rival Einstein or you are writing Hello World you too need TDD.

Not only does TDD change the way you write software, it changes the software you write. It forces design decisions to facilitate testing, and this is a very good thing. All modern combustion engines have diagnostic computer systems. The engine has been designed with fault diagnosis as a key design goal. The result is as a user you are warned before the engine suffers a breakdown and you spend less time waiting for it to be repaired when it does. A unit test suite is a diagnostic system. A code base without a comprehensive unit test suite is a liability.

A few years ago I read a job ad. for a ‘Unit Tester’. The job specification made it clear that this role involved taking code written by the development team and writing unit tests for it. Only someone with absolutely no understanding or experience of unit testing could have conceived that position and even less so anyone who actually took the job. The TDD mantra is Test First.  It is not Test Last. Writing and satisfying tests is an integral part of the development process, not something done post implementation.

In a nutshell:

  • Following TDD enables a developer to focus on what is being built separately from how it is being implemented.  Design the interface and the interaction between types before the implementation.
  • TDD naturally leads to loosely coupled code.  Mocking is an essential part of unit testing, in order to mock effectively classes must be loosely coupled, typically via an interface.
  • TDD results in high level of test coverage. Combined with continuous integration this is the best protection from regression bugs and also serves as an invaluable document of exactly how the system is designed to behave.

Enough waxing lyrical, there is plenty of TDD evangelism on the web for you to peruse. It’s not exactly a new topic, but it does come up a lot on StackOverflow so in following posts I’m going to write up a story board detailing exactly how I go about TDD using C#, Visual Studio with ReSharper (which IMHO is absolutely essential) and the excellent MOQ library.