
Lies, Damned Lies, and Two-Week Estimates

One of the biggest problems in software development is figuring out when something will be finished. Unfortunately, we do not understand the software development process as well as we understand the process of building a house or a car. Or maybe that is not actually true. People make this comparison because there are similarities in the end products, but there are also differences that need to be understood for the metaphor to work.

One of these differences is in the nature of available components. If you have all of the pieces well designed, then making a car is very easy. This is where the metaphor falls down. No one asks, “What if you had to design a gearbox as part of building a car?” And this is really where the problem exists.

Most attempts at creating a pipeline of features or products do not take into account the design elements needed as a part of creating new software features or products. This makes the process hard to plan out because there is generally geometrically increasing complexity in the interaction of systems as you put them together.

So how do we make this process more manageable? Ultimately this question is what birthed Agile Programming. As I say that magical phrase, half of you are cheering and half of you are rolling your eyes. I want to apologize to all of you because I am not going where you think I am going next.

There are some good things to Agile Programming and there are some bad things to Agile Programming. The best thing that comes out of Agile for me is the ability to develop work in sprints. I like two-week sprints for most projects. I will slow it down to three-week sprints sometimes when we are building very deep infrastructure and engineering teams need time to really think about what they are building. I have yet to move to a four-week sprint but I can believe there is a software system out there that would benefit from it.

So why do I like two-week sprints?

I like two-week sprints because two weeks is the amount of time I will tell you it will take me to build something that I do not know how to build. Two weeks is a safe number to say because it buys me some time to think about something and it buys me some time to experiment with some software interfaces.

And it is almost always 100% wrong.

I like two-week sprints because it is really difficult to make a great piece of software in two weeks.

I will tell people this from time to time. I want to be called out when I wave my engineering magic wand in the air and close my eyes and soothingly say “this will take me two weeks”. I want to be told I am wrong and I want to readjust my estimate up or down based on an examination of facts and perhaps a shorter window of time to do some experimenting.

I have led some great teams using Agile Programming to build software. Some of these amazing teams delivered very usable software in a timely manner that improved tremendously over the life of the project using two-week sprint-based development. I am happy to tell you all how this worked out for me. There are a few things that an Agile Programming approach has that work wonders on making software. Let’s start by discussing an anti-pattern: Fake story points.

We all love to hate story points. I recall the visceral reactions from project managers when they are told one story is worth 8 points and another 5, and they have no basis to translate that into an effective number other than “the 8-point story is more complicated than the 5-point story”. The difference has no basis in reality for planners and managers, and they feel utterly lost without a fixed reference point. To get to a shared understanding, I decided to throw away the abstract points and pin them to a very rough engineering cost. This is why I call them “fake story points”.

Here is how that works:

Every story point should approximate a half day of work for a simple task.

All stories should be 1, 2, 3, 5, or 8 story points. If it is 13 or 21, you likely need to break it down further before accepting the work.

Regardless of the previous rule, every external dependency, whether on an internal or an external team, automatically bumps the estimate up one size, and that bump should be preserved because dependencies are a gigantic risk.
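A minimal sketch of these rules in Python (the function name and the 13-point dependency overflow are my own assumptions, not a prescribed implementation):

```python
# Sketch of the "fake story point" rules above. Names are illustrative;
# assume one story point approximates a half day of work.
SIZES = [1, 2, 3, 5, 8, 13]  # 13 exists only as a dependency bump

def estimate(points, external_dependency=False):
    """Validate a story estimate and apply the dependency bump."""
    if points not in (1, 2, 3, 5, 8):
        raise ValueError("break the story down before accepting the work")
    if external_dependency:
        # Dependencies are a gigantic risk: bump to the next size up
        # and preserve that bump in the estimate.
        points = SIZES[SIZES.index(points) + 1]
    return points

print(estimate(5))                            # → 5
print(estimate(8, external_dependency=True))  # → 13
```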

So why does this work so well?

For starters, while it violates the steeple-fingered engineering abstraction that some people crave from their story points, it provides a basis for actual estimates that people can measure. In a two-week sprint, with each point a half day of work, you are looking at 20 story points of capacity, minus 2 to 5 points depending on how involved your planning is. I would say that 15 points is safe, assuming you are doing sprint planning, grooming, a sprint review, and a retrospective. While each person’s numbers will differ based on their own skill level, this helps planners assign a velocity score for the team.
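The capacity arithmetic, as a quick sketch (the ceremony overhead is an assumption you should tune per team):

```python
# Per-person sprint capacity under the half-day-per-point rule.
WORKING_DAYS = 10        # a two-week sprint
POINTS_PER_DAY = 2       # one point approximates half a day of work
CEREMONY_OVERHEAD = 5    # planning, grooming, review, retro (2-5 points; tune this)

raw_capacity = WORKING_DAYS * POINTS_PER_DAY   # 20 points
safe_capacity = raw_capacity - CEREMONY_OVERHEAD
print(safe_capacity)  # → 15
```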

The other nice thing about this is that if you are using 1, 2, 3, 5, or 8 point stories, getting to 15 is relatively easy. However, getting to the right 15 story points is a very complicated process that involves looking very critically at each task and figuring out what is most important. Fitting all of the work into these 15 points is a critical negotiation between engineering and project management that ultimately results in both parties having a stronger shared understanding.

After applying fake story points, it is important to capture all of that data in a spreadsheet. Keep a running log of appetite vs. accomplishment, sprint over sprint. You will notice trends in the work asked for and the work completed, whether you have one team or five.

For a solid team that is growing familiar with the project management requirements and more comfortable with the designs of the systems they need to stand up, you will see the appetite vs. accomplishment numbers converge nicely over time.

If the numbers are not converging, you have identified a problem that needs to be discussed at planning and at retrospectives to figure out why.
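One way to keep that running log, sketched in Python with invented numbers:

```python
# Sketch: a running log of appetite (points planned) vs. accomplishment
# (points completed), sprint over sprint. The numbers are made up.
log = [
    {"sprint": 1, "appetite": 20, "accomplished": 11},
    {"sprint": 2, "appetite": 18, "accomplished": 13},
    {"sprint": 3, "appetite": 16, "accomplished": 14},
    {"sprint": 4, "appetite": 15, "accomplished": 15},
]

for row in log:
    gap = row["appetite"] - row["accomplished"]
    print(f'sprint {row["sprint"]}: gap {gap}')

# A shrinking gap means appetite and accomplishment are converging;
# a flat or growing gap is something to raise at retrospectives.
```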

I was recently working at a company with a large number of teams who all had tools tasks. One of the things I observed is that the estimates for tools tasks never converged to a solid number. Every tools story carried a very high cost, and fewer of them were completed compared to other tasks.

This could tell you a number of interesting things, any of which may hold true for you.

  1. The tool designs were not specced out enough to be meaningful
  2. People hated working in the technology used to build the tools, which is usually different from the technology they use to build their core software
  3. People love building products more than they love building tools
  4. After completing the tools that were asked for, teams were later asked to just write a CSV importer, because the tools stopped working well once the software developed emergent complexity or emergent use cases requiring nearly immediate iteration. This is a special case of 1) that needs to be called out on its own

If any of the above holds true consistently, it identifies a gap in either designing or building tools that is not closing. You can solve this by creating an explicit tools team whose mandate is to make these problems go away. If that does not work, then it is potentially a place where you can leverage an outsourcing company.

The last thing to do is to be very public with the aggregate data. I have seen people become very ashamed or upset when they do not make their numbers. It should be called out very early and very frequently that this data will be available to everyone for review. It is important to communicate that before sharing, or people will be very upset. By having all of the data available for review, you will spark pattern recognition across the whole organization and spur meaningful conversations on how to address problems or highlight great successes. Improvement over time is something everyone should be invested in.

Fake story points, measurement, and sharing may sound simple, but it takes time and discipline to get to a great place.

Let’s discuss some times when these rules fail, because they will. Almost every time they fail you can spend time discussing how and why they failed and what to do about it.

The team is too small

The first case where this does not work is when you have 3 or fewer people on a team. Even real, abstract story points fall apart when you have too few people and too few data points to get good measurement. If you are trying to figure out a small team’s velocity, the points will not do much to help unless you have a large number of sprints of data to stare at.

There is only one team

Another case where this does not work well is when you only have one team. Similar to the above, you will need to amass a large number of sprints to extract meaningful data about how the team is doing on story points. You are better off using a much simpler metric, maybe even just stories accepted vs. stories completed.

Team instability

One of the biggest problems with any software team is churn. When people are joining or leaving teams, it causes a lot of instability in productivity and in the ability to measure. There is a trite “don’t shake the Jello” comparison here. The best teams are the ones that get to gel over time. The more time you give them, the more successful they are. If they are not improving sprint-over-sprint on their own, it is entirely possible that the team does not work well together and that there needs to be some overall rearranging. This is a tool of last resort, since it will cause the numbers to thrash a little.

Lack of shared understanding

Finally, you might just be a victim of a failure to communicate. It is very hard to improve at building systems if you do not have a shared understanding between the people who need the systems built and the people who are building them. I am pretty quick to ask “do we have shared understanding?” when I see failures in accepting delivered work. If work is not declared “done”, that generally means the process of communicating the work to the builders could be in need of improvement. This is one of the hardest problems to solve.

I was introduced to “grooming” by one of my team members a long time ago as a means of solving this problem. Grooming is a painful weekly meeting where the product owners bring stuff to the engineering team to review without a commitment to accept the work. Anyone is able to ask questions, and the first few of these meetings will bring out some amazing questions that should be educational to all parties. After a few months of grooming, I have observed that there are often repeated questions from engineers that turn into answers being added proactively once the pattern of questions is clear. This generally means that as shared understanding improves, the meeting gets shorter and shorter.

So there you have it! If you read this and are willing to adapt some of these practices to your work or teams, I should warn you that it takes over six weeks to really change your team’s behavior for the better. If you are two months into changing your team’s processes and not seeing a positive result, then you at least have a good framework for measuring the gaps, one that can be freely discussed by all parties.

Thank you for reading this week’s article. Please comment, share, or otherwise Socials my weekly brain-spray into the ether. Feedback is a gift and who doesn’t love getting gifts?

By jszeder


