Control Charts

A couple of posts ago I promised you a post on Control Charts. Here it is. For those of you who have never come across these before, they come from the field of Statistical Process Control (no... really... don't go... stay with me here... it's worth it, I promise). They provide a means of charting process data in a way that answers the single most important question you should be thinking of when looking at a chart of process data. No, it's not "when can I go home?", or even "I wonder whether stabbing myself in the eye with this pencil will be more interesting?". It's - "what's normal?". When does the chart show normal variation and when does it show something I should be concerned about? Is this spike in the data something I need to investigate, or is it normal?

There are about a dozen different types of control chart for different types of data and you can use the various types to build a chart for just about any metric you choose, but for an agile project the most useful ones to chart are Velocity and Lead/Cycle Time. Even better, these two metrics use the simplest possible type of control chart – the IMR chart or Individual & Moving Range chart.

The IMR chart isn't a single chart but a pair of charts that work together. The first chart plots the value you are interested in (velocity per sprint or cycle time per story). The second plots the moving range: the absolute difference between each value and the one before it, so if the last 2 values were 4 and 1, you would plot a 3 on the bottom chart. Both charts are time series, with the oldest values on the left and the newest on the right. Probably easier if I draw a picture -

IMR Chart - a line graph with the scale in black numbered 1 through 8 and a blue, moving graph point line.

So far this looks just like a regular time series. So what? The key to control charts is those three lines marked AVG, UCL and LCL. The AVG is of course the average (or for the maths folks out there the arithmetic mean) of the values on the chart. The other two lines are the upper and lower control limits. These lines are calculated to fall 3 standard deviations above and below the average. In plain language, they mark the range that almost every value from a stable process should fall into: for roughly normally distributed data, about 99.7% of values land between the LCL and the UCL (a 95% band would sit at about 2 standard deviations). The exact formulas (for the maths geeks) are -
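In the usual textbook form for an IMR chart they look like this. Treat it as the standard version under one assumption: the moving range is taken over two consecutive points, which is where the constants come from (2.66 is 3 divided by 1.128, and 3.267 is the matching constant for the range chart). Here $\bar{X}$ is the average of the plotted values and $\overline{MR}$ is the average of the moving ranges.

$$
\begin{aligned}
\mathrm{UCL}_X &= \bar{X} + 2.66\,\overline{MR} \\
\mathrm{LCL}_X &= \bar{X} - 2.66\,\overline{MR} \\
\mathrm{UCL}_{MR} &= 3.267\,\overline{MR}
\end{aligned}
$$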


One thing to note here - if your lower control limit is less than zero, which happens frequently for cycle time and velocity charts, just leave off the LCL and stop the chart at 0. This applies particularly to the MR chart, which is why I have only shown the formula for its UCL. The LCL there is so often 0 that there is not much point. Having a negative cycle time makes no sense unless you own a time machine, and if your team has a negative velocity then a control chart is not going to help you.
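If you would rather see the calculation than trust a plugin, here is a minimal Python sketch. It assumes the standard IMR constants (2.66 and 3.267, for a moving range over two consecutive points) and clamps the lower limit at 0 as described above. The function name `imr_limits` and the sample cycle times are made up for illustration.

```python
from statistics import mean


def imr_limits(values):
    """Return the centre lines and control limits for an IMR chart.

    Assumes the standard constants for a moving range of two points.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a moving range")

    # Moving range: absolute difference between each value and the one before it.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg = mean(values)
    mr_avg = mean(moving_ranges)

    return {
        "avg": avg,
        "ucl": avg + 2.66 * mr_avg,
        # A negative limit makes no sense for cycle time or velocity: stop at 0.
        "lcl": max(0.0, avg - 2.66 * mr_avg),
        "mr_avg": mr_avg,
        "mr_ucl": 3.267 * mr_avg,
        "moving_ranges": moving_ranges,
    }


# Hypothetical cycle times in days, oldest first.
cycle_times = [4, 1, 6, 3, 2, 8, 5, 3]
limits = imr_limits(cycle_times)
print(f"AVG {limits['avg']:.2f}  UCL {limits['ucl']:.2f}  LCL {limits['lcl']:.2f}")
print(f"MR average {limits['mr_avg']:.2f}  MR UCL {limits['mr_ucl']:.2f}")
```

Note that the first two values, 4 and 1, give a moving range of 3, matching the example earlier.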

Fortunately, there are plugins available for tools like Excel that do the calculation for you so you don't need to worry too much about the maths. If you, like me, do worry about the maths, there is a full derivation in the Wiki page here.

So when someone asks how long a story will take to deliver, your cycle time chart will give you an average and an upper limit - "Well, on average we finish a story in 4 days and we finish almost all of them within 15 days." That's a good story to be able to tell. And we haven't even started to look at the full power of the chart.

The real power of a control chart is in determining what is normal and what needs to be looked at further. Essentially, anything within the control limits is regarded as normal. Variation there is the natural result of variation within the process. There is nothing you can do about it other than to change the process to reduce variation. This is usually referred to as "common cause" variation. Anything outside the control limits, though, is not a result of normal process variation and should be looked into further. This is known as special or assignable cause variation.
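Checking for points outside the limits is easy to automate. The sketch below is one way to do it, assuming the standard IMR constant of 2.66 and a lower limit clamped at 0. The name `special_cause_points` and the data are invented for the example.

```python
from statistics import mean


def special_cause_points(values):
    """Return (index, value) pairs that fall outside the control limits."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg = mean(values)
    spread = 2.66 * mean(moving_ranges)  # standard IMR constant (assumed)
    ucl = avg + spread
    lcl = max(0.0, avg - spread)  # negative limits are clamped to 0
    return [(i, v) for i, v in enumerate(values) if v > ucl or v < lcl]


# Hypothetical cycle times in days; one story took far longer than the rest.
cycle_times = [4, 3, 5, 4, 3, 4, 5, 3, 4, 20, 4, 3]
flagged = special_cause_points(cycle_times)
print(flagged)
```

With this data only the 20-day story is flagged. Everything else sits inside the limits and counts as normal variation.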


You can also look at patterns of data inside the control limits that may indicate systematic issues with the process, such as values repeatedly alternating above and below the mean. In a velocity chart that would be a sure sign that the team was carrying work over from one sprint to the next - not claiming much in one sprint and claiming a lot in the next as they finish things off.

Other patterns like long runs of data above or below the mean, or a continually increasing or decreasing pattern, could indicate a fundamental shift in the process that needs further investigation. How long a run indicates a real signal is a matter of heated debate in statistical process control circles (as heated as those folks ever get anyway) but 5 seems to be a good place to start.
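Both patterns can be checked with a few lines of Python. The sketch below uses the run length of 5 suggested above as its threshold. That number is a starting point, not a rule, and the function names and data are invented for illustration.

```python
from statistics import mean

RUN_THRESHOLD = 5  # starting point suggested above; tune to taste


def longest_run_one_side(values):
    """Longest run of consecutive points all above, or all below, the mean."""
    avg = mean(values)
    best = current = 0
    previous = 0
    for v in values:
        side = (v > avg) - (v < avg)  # +1 above, -1 below, 0 on the mean
        if side != 0 and side == previous:
            current += 1
        else:
            current = 1 if side != 0 else 0
        best = max(best, current)
        previous = side
    return best


def longest_trend(values):
    """Longest run of points that keep strictly rising, or strictly falling."""
    best = current = 1 if values else 0
    previous = 0
    for a, b in zip(values, values[1:]):
        direction = (b > a) - (b < a)  # +1 rising, -1 falling, 0 flat
        if direction != 0 and direction == previous:
            current += 1
        else:
            current = 2 if direction != 0 else 1
        best = max(best, current)
        previous = direction
    return best


# Hypothetical velocities: steady at first, then climbing sprint after sprint.
velocities = [20, 22, 19, 21, 25, 26, 27, 28, 29, 30]
run = longest_run_one_side(velocities)
trend = longest_trend(velocities)
print(f"one-sided run: {run}, trend: {trend}")
if run >= RUN_THRESHOLD or trend >= RUN_THRESHOLD:
    print("Possible shift in the process - worth a closer look")
```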

There are sets of established rules for interpreting control charts (the Western Electric and Nelson rules being the most common) but you really don't need to go to that level of analysis to get the value out of the chart. Wiki, as usual, has some good pages on this - Western Electric and Nelson.

Just because all your points are within the control limits doesn't mean that all is rosy either. What the chart is showing is the normal level of variation for your current process. That may not be acceptable. The fact that your velocity bounces around between 5 and 100 may be normal but it's probably not acceptable. What you want to do is change the process to reduce variation (usual suspects apply here - limit WIP, reduce batch size). You can then use the chart to see whether the process change you made has had the desired effect. What you should see are your control limits moving towards the mean, so your range of normal variation becomes smaller. You can also see whether the process change has made things flow faster (or slower), as the mean will shift up or down as well. You could also see a situation where you improve the mean (higher average velocity) but at the expense of increased variation. Whether that's acceptable is up to you, but at least with the control chart in place you can see that it's happening.
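One simple way to check a process change is to calculate the average and the width of the control band separately for the sprints before and after the change. The sketch below does that with the standard 2.66 constant (assumed). The velocities and the helper name `centre_and_spread` are made up.

```python
from statistics import mean


def centre_and_spread(values):
    """Return (average, distance from the average to each control limit)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(values), 2.66 * mean(moving_ranges)


# Hypothetical velocities before and after the team started limiting WIP.
before = [20, 35, 12, 40, 15, 38]
after = [28, 31, 27, 30, 29, 32]

avg_before, spread_before = centre_and_spread(before)
avg_after, spread_after = centre_and_spread(after)

print(f"before: average {avg_before:.1f}, limits +/- {spread_before:.1f}")
print(f"after:  average {avg_after:.1f}, limits +/- {spread_after:.1f}")
if spread_after < spread_before:
    print("Control limits have moved towards the mean: less variation")
```

With only a handful of sprints on each side the limits are rough, so treat the comparison as a hint rather than proof.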

So, that's control charts. One simple set of charts that will give you an average and control limits, show you what's normal and what's not, and show you whether any process changes you make are having the desired effect. Not bad. Told you it was worth sticking around for.