Estimation Part 5 - Story Counting

Last time we looked at T-shirt sizing and some of the benefits and problems that method has. We found that its greatest benefit was also its biggest disadvantage. The use of something completely abstract (T-shirt sizes) removes all our cognitive biases around numbers, but without numbers we can’t really compare estimates against each other or make predictions, except by converting back to numbers, which of course brings our biases back.

We can make T-shirt sizes useful if we adjust the scale we use. Rather than having Small, Medium, Large and Extra Large, let's just have Small and Extra Large. Now, this would obviously never work for clothing, because people come in a range of sizes. Stories come in a range of sizes as well, so what gives? What makes this useful? The trick here is that, unlike with people, where we can’t dictate what size someone should be (outside the modelling industry and certain trendy nightclubs), we can, and should, be pretty strict about what size a story can be before we accept it onto a sprint.

Remember our criteria for good stories – INVEST? The S in INVEST is for Small. We want small stories in our sprints not big ones. There are many reasons why small stories are good. Small stories are more likely to get completed than large ones, small estimates are more accurate, small batches flow through a process faster than large ones, and so on.

Many teams set a point limit for stories: nothing bigger than a 13 (or nothing bigger than an 8) can be considered Ready to Code, and anything larger needs to be broken up. Putting this hard limit on the size of a story has an interesting effect. Because of the scale we use for estimation, where larger estimates are further apart than smaller ones, the average size of a story in a team is invariably one size smaller than the maximum allowed. If the team has a 13-point limit, their average story size will be 8 points. If their limit is 8, their average will be 5, and so on. Don’t believe me? Take a look at your backlog and do the maths. Happy now? OK. It's all to do with the nonlinear scale we use. Regardless of how this comes about (I could go on for ages about distributions and statistics, but I value your sanity and ongoing readership too highly), we can make use of it.
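If you want to do the maths on your own backlog, a minimal Python sketch like the one below averages the estimates at or below a limit and snaps the result to the nearest value on the scale. It assumes a modified Fibonacci scale (1, 2, 3, 5, 8, 13, 20, 40, 100), and the helper name `average_ready_story` is hypothetical. The sample backlog in the note afterwards is made up purely to show the mechanics; it is not evidence for the claim.

```python
# Assumed estimation scale: a modified Fibonacci sequence, as on many
# Planning Poker decks. Replace it with whatever scale your team uses.
SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 100]


def average_ready_story(estimates: list[int], limit: int) -> tuple[float, int]:
    """Average the estimates at or below `limit` (the stories allowed into a
    sprint) and return that average plus the nearest value on SCALE."""
    ready = [e for e in estimates if e <= limit]
    if not ready:
        raise ValueError("no stories at or below the limit")
    avg = sum(ready) / len(ready)
    # Snap the average to the closest size the team would actually use.
    nearest = min(SCALE, key=lambda size: abs(size - avg))
    return avg, nearest
```

For a made-up backlog of `[3, 5, 8, 8, 13, 5, 8, 13, 20, 2]` with a limit of 13, the 20 is excluded, the nine remaining stories average about 7.2, and the nearest scale value is 8. Substitute your own estimates to check the pattern for your team.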

If we call anything at or below our limit a Small and everything above it an Extra Large, we really don’t need to do any more estimation than that. The “small” stories will average out, so we don’t need to keep track of their individual sizes. All we need to do is count how many we deliver each sprint. Our velocity is then measured in “stories delivered” rather than points. The Extra Large stories are too big for a sprint and need to be broken down.
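Here is a minimal sketch of the mechanics in Python, assuming each story carries nothing but a Small or Extra Large label. The `Size`, `Story`, `plan_sprint` and `velocity` names are hypothetical and not taken from any tool: only Small stories are accepted onto the sprint, Extra Large ones are sent back for splitting, and velocity is simply the number of stories finished.

```python
from dataclasses import dataclass
from enum import Enum


class Size(Enum):
    SMALL = "S"         # at or below the team's size limit
    EXTRA_LARGE = "XL"  # too big for a sprint; must be split first


@dataclass
class Story:
    title: str
    size: Size
    done: bool = False


def plan_sprint(backlog: list[Story]) -> tuple[list[Story], list[Story]]:
    """Accept only Small stories; return (sprint, stories needing a split)."""
    sprint = [s for s in backlog if s.size is Size.SMALL]
    needs_splitting = [s for s in backlog if s.size is Size.EXTRA_LARGE]
    return sprint, needs_splitting


def velocity(sprint: list[Story]) -> int:
    """Velocity in stories delivered: a count, with no points involved."""
    return sum(1 for s in sprint if s.done)
```

Note that no size beyond the two-value label is stored anywhere; the count of finished stories is the whole measurement.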

This story-counting method has a few interesting effects. For a start, it makes the team’s velocity a bit more concrete. Rather than delivering “points” or whatever your estimation units are, the team is delivering stories, and a story represents a user need fulfilled (if your stories don’t smell). So the team’s velocity is now measured directly in terms of user needs fulfilled. This tends to reduce a lot of the gaming of the system that we see with more abstract estimation units. If the team uses points and the boss insists that they double their velocity, the easiest thing to do is double the point estimates. Same amount of work, double the velocity. You can’t do that with stories delivered. You can only slice them so small before they no longer represent a finished piece of work. Any increase in the number of stories delivered is a real increase in the number of user needs being met.
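The gaming problem is easy to see with a little arithmetic. This hypothetical Python sketch compares a sprint’s velocity in points and in stories delivered, before and after every estimate is doubled for exactly the same work; the sprint data is invented for illustration.

```python
def points_velocity(delivered: list[int]) -> int:
    """Velocity in points: the sum of the finished stories' estimates."""
    return sum(delivered)


def story_velocity(delivered: list[int]) -> int:
    """Velocity in stories delivered: how many stories were finished."""
    return len(delivered)


# Hypothetical sprint: four finished stories with these point estimates.
honest = [3, 5, 2, 8]
# The same four stories re-estimated at double the points; no extra work done.
inflated = [points * 2 for points in honest]

print(points_velocity(honest), points_velocity(inflated))  # 18 36
print(story_velocity(honest), story_velocity(inflated))    # 4 4
```

Doubling the estimates doubles the points velocity, while the story count does not move at all.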

The other thing this does is encourage the team to slice their stories as small as possible. Everyone wants a high velocity, so the incentive is to have really small stories. This is a really good thing: small stories are more likely to get completed than large ones, small estimates are more accurate, small batches flow through a process faster than large ones, and so on. When you count stories, big stories hurt the team’s velocity, because a big story counts as just one story however much work it takes. If the team uses points, they can accept large stories onto the sprint without hurting velocity, as they will get 8 or 13 or whatever points if they finish it. This leads to larger stories on the backlog and all the issues that come with them: large stories are less likely to complete, have slower cycle times, and so on. Take what I said before and reverse it.
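A similar sketch shows why slicing pays off only when you count stories. It assumes, for simplicity, that a 13-point story splits cleanly into stories of 5, 5 and 3 points covering the same work (real re-estimates rarely add up so neatly); the numbers are illustrative only, and `velocities` is a hypothetical helper.

```python
def velocities(delivered: list[int]) -> tuple[int, int]:
    """Return (points velocity, stories-delivered velocity) for one sprint."""
    return sum(delivered), len(delivered)


# Hypothetical: the same sprint's work, delivered as one big story or sliced.
whole = [13]        # one 13-point story
sliced = [5, 5, 3]  # assumed split of the same work into three stories

for label, stories in (("whole", whole), ("sliced", sliced)):
    points, count = velocities(stories)
    print(f"{label}: points={points}, stories={count}")
# whole: points=13, stories=1
# sliced: points=13, stories=3
```

Points velocity is 13 either way, while the story count goes from 1 to 3, so only counting rewards the team for slicing.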

So counting stories directly drives some good behaviours, and it makes some bad behaviours much less likely. It also gets us away from a lot of our biases around accuracy of estimation. What’s not to like?

There are some common objections to this method: how can immature teams ever hope to slice stories properly? How can you look forward if you haven’t estimated the big stories? And there are many more. I’ll look at those objections, and how they can be overcome, next time…