Sunday 5 April 2015

Scrum - Part X: Planning spikes and low-level design


Last time, I talked about a small technique that eases sprint management: task pinning. What I skimmed over was its elder brother, the planning spike, for reasons of prior art: Google helpfully pointed out the commonality of this approach.

However, when I started digging into said prior art, it differed somewhat from what I've been using. Agile gurus treat planning spikes as a form of high-level design and/or UX planning ahead of a major project. Moreover, some recommend treating them as the exception rather than the rule, and fixing planning gaps during retrospectives.

I have already expressed the opinion that people put too much faith in retrospectives. No post-mortem can fix technical gaps, and these, not processes, are the most common root causes of struggling sprints. They are much better handled before people start the sprint and raise their left (or, as the case may be, right) foot over the chasm.

So what are planning spikes?

My bad. I spent three paragraphs talking about what the subject isn't, rather than saying what it is.
Easy to fix - here's the definition: a planning activity that provides a fine-grained, low-level task breakdown.

No, it's not going to be carved in granite any time soon, and neither is it meant to amaze the community of Scrum practitioners. Frankly, it's one of the most obvious definitions you're going to meet.

The more interesting and harder question is:

How often should I do planning spikes?

This is where my road starts diverging a bit from the wisdom unearthed by the Google search. My take is that all sizeable tasks should have a planning spike - with a few notable exceptions due to be unveiled.

The reason is simple: people get estimates wrong. This is not a subjective opinion, nor is it peculiar to one specific company. It is an observation.

Software engineering and testing are very complex disciplines that depend on many unknowns. If an estimate is more than a day and a half (or 5, 13, 21 story points - name your poison), or if the person did not spend more than a few minutes dropping in a cost tag, then it is very likely to be inaccurate.

This looks like an overgeneralisation, so let's pull out a few reasons why tasks get under- or over-estimated:

  1. Missed error cases.
  2. Did not account for release and code merging activities.
  3. Code review took longer than expected.
  4. Missed code reuse opportunities; refactoring post initial code delivery.
  5. Test bed unstable or not ready.
  6. Missed downstream dependencies.
  7. Low-level interface had assumptions which proved to be incorrect.
  8. Specific test cases were hard to execute; required cumbersome setup (e.g. install of an atypical OS).
  9. Expected to reuse specific functions, but these had somewhat differing interfaces.
I am not compiling an encyclopedia, but I am sure you have a private list of grievances from past efforts.

Now, many people realise that a monolithic estimate of 4 days is not worth a lot, and that breaking it down is important. The breakdown they come up with then looks like this:

  • Design - one day
  • Implementation - two days
  • Testing - one day
I spent two minutes typing this, and by coincidence, that is exactly how long the estimation process takes in those cases. This helps neither man nor beast.

The important point is not how fine-grained an estimate is, but the thinking behind it. If you planned a work item through, then granularity is a beneficial by-product.
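
As an illustration only - the task, the activities, and the numbers below are all invented for this post, not taken from a real backlog - here is the kind of breakdown that falls out of actually thinking a work item through, sketched as runnable Python:

    # A hypothetical fine-grained breakdown. Every activity and number is
    # invented for illustration; nothing here is prescriptive.
    breakdown_hours = [
        ("Read the existing interface and its callers", 2),
        ("Extend the interface; agree changes with downstream owners", 4),
        ("Implement the happy path", 6),
        ("Error cases: invalid input, timeouts, partial failures", 6),
        ("Unit tests, code review and rework", 6),
        ("Merge, release notes and deployment checks", 4),
    ]

    # Roll the activities up into the headline estimate.
    total = sum(hours for _, hours in breakdown_hours)
    print(f"Total: {total} hours (~{total / 8:.1f} working days)")

The language is incidental; the point is that each line is small enough to reason about, and each one came out of walking through the work rather than typing for two minutes.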

So why do retrospectives not help?

Now, with a few examples and definitions behind us, it is easier to revisit the reasoning behind retrospectives (and their limits).

Let's say we missed a set of error cases in the long-running telemetrics example. The middleware team's task of recording the set of formats took twice as long as expected, because they had not considered invalid codecs, codec families, and so on.
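
For concreteness, here is a minimal sketch of what such a format-recording routine might look like - entirely hypothetical, since the real middleware code never appears in this series - with the branches the estimate missed marked in comments:

    # A hypothetical sketch of the format-recording task. The happy path is
    # three lines; the commented branches are where the extra week went.
    KNOWN_FAMILIES = {"h264", "vp8", "aac", "mp3"}

    def record_format(codec: str) -> str:
        if not codec or not codec.strip():
            raise ValueError("empty codec identifier")   # missed: invalid input
        name = codec.strip().lower()
        family, _, variant = name.partition(".")         # missed: codec families
        if family not in KNOWN_FAMILIES:
            # missed: invalid codecs must be handled, not silently recorded
            raise ValueError(f"unrecognised codec family: {family!r}")
        return f"{family}/{variant or 'default'}"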
Now the team conducts a retrospective. What corrective actions can they agree on?

(a) "Never miss error cases."

While we're at it, we can also solve world poverty or stop writing bugs. Easier said than done.

(b) "Consider invalid formats when dealing with codec-specific tasks."

This looks better, but won't help us with the next wrong estimate (e.g. processing and categorisation of client headers).

(c) "Always consider error cases when planning tasks"

This moves the sanity dial towards the middle, but still does not provide the ultimate answer, since error cases are not that easy to flush out. Few people can look at a task, spend ten minutes, and rattle off the full and final list of error conditions.
Moreover, look back at the sample list of other reasons a wrong estimate can occur; our dial might still be erring on the "too specific" side.

(d) "Perform low-level design prior to starting risky tasks"

This creates a better tradeoff, but you do not need to wait for retrospectives to come up with something like that.

Anyhow, the (a)-(d) exploration above is not a mathematical proof, so if you think there is a gaping hole, feel free to point it out in the comments.

The upshot of all of this is the same point I made a few posts back. Retrospectives are good for process corrections, while most sprint under-achievements are down to technical reasons.

The return of the low-level design

You might have picked up on a key phrase above: "low-level design". Yep, I slipped it in on purpose.

In not so many words: a planning spike is often a low-level design, and, of course, it is not restricted to development activities only. Laying down detailed QA test cases and mapping out a DevOps deployment or troubleshooting plan are all fair game.

At this point, you might have a slight feeling of betrayal. Many paragraphs of rhetoric and examples to read, all just to say that we should do low-level design and detailed planning? This is exactly why I considered skipping this topic; however, the devil is in the details, and it's time to say
What low-level design is and isn't

While advocating a planning spike (this modern name looks fancier than just saying "figure out what you're going to do"), it is important to define its boundaries:

a) The output is 100% technical. Only people who are going to do the task, and/or who can provide useful insight, need to see it.

b) It is not a living document. It's a trampoline to do the work, and it is going to become obsolete shortly after the work is complete.

c) Its job is to flush out time-consuming activities. It need not explore every nook and cranny, especially if those are small enough.

Most importantly, 

d) It need not be too long! If we estimate a task from afar at 4 days, we should not spend longer than a day or two doing the low-level design.

When planning spikes are unnecessary

I have talked at length about the benefits of upfront planning. There are, however, a few notable cases where planning spikes do more harm than good.

Exactly the same task was done before. Manual regression testing or repetitive deployment activities are examples that come to mind. Yes, they should be automated, but there is the occasional distance between "should" and the reality of today.

The tasks are minor. For example, we have a set of five-odd defect fixes that we understand at a superficial level. Some might take a bit longer, some might take half an hour, but the law of averages works in our favour.

There is urgency. We have a critical fix to make, and it had to be done yesterday. We still need to figure out what to do, but we have to do it all this sprint, and can't accommodate a separate planning spike.

Isn't this the same as backlog grooming?

Yes and no. The idea and purpose are the same.
The execution is somewhat different: doing low-level design in a meeting just does not cut it. It needs preparation - a dedicated offline effort from someone. Review can happen in a meeting, though note my previous comments on the validity of those in diverse teams.

In short, people need dedicated time and the right environment to plan their work. Backlog grooming meetings are better suited to elucidating requirements.

Sprint duration

The main weakness of spraying planning spikes around is the time they take to execute. Let's assume the worst case and say that your sprint is four weeks long.
You can't stick both the spike and the actual work in the same sprint; that would defeat the purpose of design as a means of getting fine-grained, accurate sprint commitments.
This in turn means that we face eight weeks between deciding to do something and delivering it. That might be fine in some situations, but it's not that Agile(TM).

In other words, this technique encourages shorter sprints: ideally two, and definitely not longer than three, weeks.
There are forces pushing against shorter sprints, which I'll come to in future posts. Getting the duration right is an important, but by no means straightforward, decision for a given department.

Typical acceptance criteria

As a reference, here are typical success criteria I have been using while wearing a Product Owner hat:

Development. Description of main functions and interfaces to be changed. For more complex changes, flow diagrams (UML optional). Intended unit-test coverage, with possible TDD (Test-Driven Development) focus.

Testing. Description of the test environment. Low-level design of the automation, e.g. how many new UI elements we need to control. A reasonable estimate of the number of test cases.

Deployment. Quality gates during deployment stages. Manual effort required to validate those, if any. Collection of metrics if, for example, we are replacing or augmenting a technology. Training sessions for Technical Support and other interested folk.
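
To make the Development criteria concrete: a spike's output can be as slim as an interface sketch plus intended test coverage. Here is a hypothetical fragment - every name in it is invented for this post:

    # Hypothetical planning-spike output: the interface to be changed and the
    # intended unit-test coverage, deliberately without the implementation.
    from dataclasses import dataclass

    @dataclass
    class FormatRecord:
        family: str      # e.g. "h264"
        variant: str     # e.g. "baseline"
        valid: bool      # new field: invalid codecs must now be recorded too

    def classify_codec(raw: str) -> FormatRecord:
        """To be changed: return a record for invalid input instead of raising.

        Unit tests to cover: empty string, unknown family, family without
        a variant, mixed case. TDD candidates: the invalid-input paths.
        """
        raise NotImplementedError  # the body is sprint work, not spike work

Note that the body stays unwritten: producing it is the sprint's job, not the spike's.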

Summary

Agile techniques do not obviate the need to do low-level design prior to starting on tasks. To properly and confidently slot tasks into sprints, preparatory work is required, and this can't be done purely through meetings and ceremonies.

There are situations where a separate planning spike does not work, and using this technique in anger requires shorter sprints.
