The Importance of Slack

Hi Folks

In many companies, resource utilization is considered an important measurement. The idea is that resources, whether they be machines or people, should be occupied as close to 100% of the time as possible doing chargeable work. On the surface this looks pretty sensible. No point having people spending time doing things that the company doesn't make money from, is there? The more time people spend working on chargeable tasks, the more money the company makes. Right?

But look deeper and things aren't quite that simple. It's all to do with the behaviour of processes under load. A project is not made up of individual resources working independently. It is a process made up of individual process steps with dependencies between them. Development depends on the BAs for requirements. Testers depend on the coders and so on. Let's ignore the individual steps for the moment and focus on the overall process. The important thing about a process is its overall throughput - how much finished product the process can produce in a given time. Also important is the cycle time - how long material spends in process. In a development project, the cycle time is measured from the moment the customer's requirement is discovered to the time that requirement is delivered. Together, cycle time and throughput give an accurate measure of any process.
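To make the two measures concrete, here is a minimal sketch of how they could be calculated for a project. The requirement dates are invented for illustration, and the function names are just ones I've picked for the example.

```python
from datetime import date

# Hypothetical requirements: (date discovered, date delivered).
items = [
    (date(2024, 1, 1), date(2024, 1, 11)),
    (date(2024, 1, 3), date(2024, 1, 17)),
    (date(2024, 1, 8), date(2024, 1, 20)),
    (date(2024, 1, 10), date(2024, 1, 31)),
]


def average_cycle_time_days(items):
    """Mean number of days from discovery to delivery."""
    return sum((done - start).days for start, done in items) / len(items)


def throughput_per_week(items, period_start, period_end):
    """Items delivered per week between period_start and period_end (inclusive)."""
    delivered = sum(1 for _, done in items if period_start <= done <= period_end)
    weeks = ((period_end - period_start).days + 1) / 7
    return delivered / weeks


print(average_cycle_time_days(items))
print(throughput_per_week(items, date(2024, 1, 1), date(2024, 1, 28)))
```

Notice that neither number says anything about how busy any individual person was.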

The important thing about a project is not how busy the individual people are but how fast and how much the overall project can deliver.

You can think of a process as being a pipeline taking raw material in at one end and spitting finished product out the other. The cycle time is the length of the pipe (shorter pipes mean that the material spends less time in the pipe) and the throughput is the width of the pipe (a larger pipe can fit more stuff through).

The thing with any process is that in order for material to flow through, there must be sufficient capacity in the process to take it. Let's try an experiment. If you take a pipe and start pouring water in slowly, it will all go through smoothly. If you slowly increase the speed of the water you will hit a point where the water starts to back up. You will also notice at this point that the amount of water flowing through the pipe drops right back to almost zero. What is happening is that the water flow through the pipe becomes turbulent and the flow rate drops off - water can't get past the turbulent flow. The really interesting thing is that this point is not reached when the pipe is at 100% capacity. It will happen significantly before the pipe is full.

Any process has a point at which the flow through the process breaks down. Flow through the process becomes turbulent and the output drops back to almost zero. Networks experience this behaviour. For shared Ethernet the figure usually quoted is about 60% of full capacity. Above that, the network suffers what are called collision storms and grinds to a halt (OK, for all the network engineers out there, yes, this assumes a single collision domain). Your PC will reach a point well below 100% utilization where it starts thrashing madly, the screen freezes, the CPU jumps to 100% and the whole thing becomes unresponsive. Projects are no exception. They have a point at which flow becomes turbulent and throughput drops.

If flow through a process starts to become turbulent, each step in the process will either even out the turbulent flow or make the turbulence worse. The key here is slack. It takes extra time to even out the flow, and if a process step has insufficient spare capacity to do that, it will magnify the turbulence and pass it both upstream and downstream. Unless the upstream and downstream steps have enough slack to even things out again, the turbulence is magnified and passed on again until the system breaks down completely.

The trouble with a resource utilization view is that it misses this completely. In a turbulent project, throughput is very low but all resources are fully (or more than fully) utilized. Fully utilized but producing very little.
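One way to put rough numbers on what a utilization-only view hides is the textbook single-server queue (the M/M/1 model). This is my assumption for the sake of illustration, not something a real project follows exactly: work arrives at random, one step processes it, and the average time an item spends waiting plus being worked on is the service time divided by (1 - utilization). It is a simpler picture than turbulence, and it shows delay growing rather than output collapsing, but it points the same way.

```python
def mm1_time_in_system(service_time, utilization):
    """Average time an item spends queued plus in service in an M/M/1 queue.

    service_time: average time to process one item with no queue
    utilization:  fraction of time the step is busy, 0 <= utilization < 1
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return service_time / (1 - utilization)


# Assume each item takes one day of actual work.
for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    days = mm1_time_in_system(1.0, u)
    print(f"{u:.0%} utilized -> about {days:.0f} days per item")
```

Under these assumptions, a step that is busy half the time turns a one-day task around in about two days. At 90% busy the same task takes about ten days, and at 99% about a hundred, even though nobody is working any slower.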

So what causes turbulence in a development project? A key person off sick. A critical bug. A customer issue. A requirement change. Anything that interrupts the smooth flow will cause turbulence and unless there is sufficient slack, the turbulence will spread and the project will grind to a halt until the blockage is removed.
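A small sketch shows how much slack matters once an interruption hits. It assumes a single process step, a steady stream of incoming work, and a stoppage (say a key person off sick) during which work keeps arriving but nothing gets done. All the numbers and names are made up for the example.

```python
def days_to_recover(arrivals_per_day, capacity_per_day, outage_days):
    """Working days needed after an outage to clear the backlog it created."""
    if arrivals_per_day >= capacity_per_day:
        raise ValueError("no slack: the backlog never clears")
    # Work keeps arriving during the outage while nothing is processed.
    backlog = arrivals_per_day * outage_days
    days = 0
    while backlog > 0:
        # Each day new work arrives and the step works at full capacity.
        backlog += arrivals_per_day - capacity_per_day
        days += 1
    return days


# A step that can handle 10 items a day, stopped for 5 days.
for arrivals in (9, 7, 5):
    print(f"{arrivals * 10}% utilized -> {days_to_recover(arrivals, 10, 5)} days to recover")
```

In this toy model the step running at 90% needs 45 days to get back to normal after a five-day stoppage, the one at 70% needs 12, and the one at 50% needs 5. Until the backlog clears, everything downstream is waiting on late work.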

If we focus on driving up the utilization of individual people, we ensure that there is insufficient slack in the process to absorb turbulence and overall project throughput will suffer. In Lean terms this is called sub-optimization - optimizing an individual process step at the expense of the whole system.

If we focus on the whole system (principle 7 of Lean - See The Whole) and build in enough slack, the project team will produce more while working less. The team will be happier and less stressed. The customer will be happier. The company will make more money. Everyone wins.