Estimation Part 3 - Story Points

Last time we looked at the concepts of accuracy and precision and how getting the two mixed up can lead to all sorts of problems. We also looked a little at the cognitive bias that has us assuming precise numbers are automatically accurate. The upshot is that we humans are absolutely terrible at estimation. We mistake precision for accuracy and our accuracy is really bad to begin with.

That last statement is only half true. We are really, really bad at things like guessing how many jelly beans are in a jar, how tall that person is, or how much that thing weighs. What we are bad at is absolute estimates. To make up for that, we are really, really good at relative estimates.

So what is a relative estimate? That rock is twice as heavy as this rock. There are twice as many jelly beans in that jar as there are in this jar. We may not know the absolute number of jelly beans or the weight of either rock, but we can tell with a very high degree of accuracy how their sizes relate to each other.

Apparently this is an evolutionary thing. A lot of animals can do this as well - that pile of berries is larger than this pile of berries, that tasty-looking snail is bigger than this tasty-looking snail, or that buffalo weighs way more than I do and what's more it looks mad so I might just slink off and look for something easier to eat.

If only there were some way to harness this accuracy when doing estimates. Well, of course there is. We call it relative estimation and Agile methodologies have been using it for years. We don't estimate the size of anything absolutely. We estimate stories relative to one another in a dimensionless unit (usually called a story point), then use a secondary measure like velocity to figure out how long they will take to do.
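To make the "points plus velocity" idea concrete, here is a minimal sketch. It assumes velocity means the number of points a team finishes per sprint, and the function name and sample backlog are made up for illustration.

```python
import math


def sprints_needed(story_points, velocity):
    """Rough forecast: whole sprints needed to finish the given stories.

    story_points: relative estimates, one per story (illustrative data).
    velocity: points the team typically completes per sprint (assumed).
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(sum(story_points) / velocity)


# Hypothetical backlog totalling 40 points, team velocity of 20 per sprint.
backlog = [1, 2, 3, 5, 8, 13, 5, 3]
print(sprints_needed(backlog, velocity=20))  # 2
```

Note that no story is ever given a duration directly. Time only appears once the relative sizes are divided by the measured velocity.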

While we are accurate with relative estimates, there are some limitations. We are much more accurate estimating the relative sizes of small things than we are of large things. "That pebble is twice as big as that other pebble" will be more accurate than "that huge boulder is twice the size of that other huge boulder". Gauging the relative number of jelly beans in small jars will be more accurate than gauging the relative number in huge industrial vats. Fortunately, the developers of those relative estimation methods have already thought of that, and built in mechanisms to make it obvious.

The traditional scale we use is the modified Fibonacci numbers (1, 2, 3, 5, 8, 13, 20, 40, 100). We use them not just because we are geeks and love maths, but also because the numbers get further apart the larger they are. So the system makes it clear that large estimates are inherently less accurate than small ones.
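A quick sketch shows how the gaps on that scale widen. The helper names are hypothetical, and the rule of rounding ties up to the larger value is an assumption for illustration, not part of any standard.

```python
# The modified Fibonacci scale quoted above.
SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 100]


def gaps(scale):
    """Distance between each value on the scale and the next one up."""
    return [b - a for a, b in zip(scale, scale[1:])]


def snap_to_scale(raw, scale=SCALE):
    """Nearest value on the scale; ties go to the larger value (assumed rule)."""
    return min(scale, key=lambda s: (abs(s - raw), -s))


print(gaps(SCALE))        # [1, 1, 2, 3, 5, 7, 20, 60]
print(snap_to_scale(9))   # 8
print(snap_to_scale(30))  # 40
```

At the small end you can choose between neighbours that differ by a single point. At the large end the scale simply refuses to let you say "30", which is its way of admitting how rough those estimates are.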

So far relative estimates sound pretty good. And they are. But they aren't perfect. There are still some significant problems with them and typically they all come down to people and our relationship with numbers.

The main problem with story points is that they are numbers. People are weird about numbers. Numbers make us particularly prone to the precision equals accuracy bias. We love to get our numbers "just right". Couple that with the usual demands for "more accurate" (meaning precise) estimates, and teams can get very hung up on getting the story point estimates right.

I have seen otherwise sane and sensible teams agonise for hours over whether a story should be a 2 or 3 pointer. Mind you, this was a team with a velocity of over 100 points per sprint so a 1 point difference in an estimate was really not going to make a difference either way. But, due to pressure from above for "accuracy", and our own cognitive biases, they agonised away on each and every story.
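The arithmetic behind that is worth spelling out. This is a minimal sketch using the figures from the anecdote above, with a made-up function name.

```python
def share_of_sprint(point_difference, velocity):
    """Fraction of a sprint's capacity that a difference in points represents."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return point_difference / velocity


# A 2-versus-3 debate is a 1 point difference; the team's velocity was about 100.
print(f"{share_of_sprint(1, 100):.0%}")  # 1%
```

An hour of whole-team debate to settle 1% of a sprint's capacity is a poor trade whichever way the decision goes.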

Story points were designed as a fast and accurate way to estimate. And they are. When done properly. But our built-in bias towards precise numbers will often cause story points to become a very heavyweight estimation method. Most of the estimation methods using story points say that you should involve the whole team in estimation. So if the team is getting caught in the precision trap, the whole team is stuck in endless estimation meetings instead of getting things done. By contrast, a lot of traditional methods, while even more heavyweight than story points, involve only one or two people, so their overall impact on the team is smaller.

The other problem with story points is that you really need a baseline for them to work properly. You need something to estimate new stories relative to. For a new team, this can be quite difficult. The usual way is to take the smallest story in the backlog and make that an arbitrary 1 pointer and go from there. For a new team though, often with a very incomplete or poorly formed backlog, that arbitrary one pointer could still be very large. So as they refine their backlog, they may end up with half or quarter point stories or need to re-baseline their estimates with a smaller 1 point reference. Once this happens, the precision = accuracy cognitive bias starts to kick in again and they start to worry about whether their 1 point reference is really 1 point or whether there is something else that should be 1 point. Teams can get into continual cycles of estimation and re-estimation.

Of course, not every team uses story points wrongly. For many teams, they work perfectly well and they do just what it says on the box - provide an accurate, fast, lightweight way to estimate. But even the best team gets caught in the "is it a 2 or 3 point" trap every so often, and for the teams that don't (or can't) get story points working, it can really become the worst of both worlds - a heavyweight, time-consuming estimation method that occupies the whole team instead of just a few experts. I have seen enough teams that can't get them working to start questioning whether they really are the best way to estimate.

If numbers are the problem, what about methods that don't use numbers at all? They do exist, and we will look at them next time.