Methodology Talk

1. Preamble

  • People in the business of paying for software development always have two questions on their mind: can you guess?
    • How long is it going to take?
    • How much is it going to cost?
  • So much energy has been spent on answering these questions
    • But people take the premise for granted!
  • I'm going to argue that these are actually the wrong questions, and that there is a better question we could be asking and answering instead, one that does the same work we want these questions to do.
  • But first, I need to lay the foundation for this argument.

2. Ground truths/point of departure

  • First of all, I'm gonna just simplify here and assert that in software development, "how long is it going to take?" and "how much is it going to cost?" are roughly the same question, insofar as the answer to either is going to be proportional to the number of developer-hours.
  • When somebody asks you how long will it take and how much will it cost, what information do you need to answer that question?
    • You need to know precisely what you're being asked to estimate
    • Then you try to remember something (or ideally many things) you did before that was similar for all or part of the job
      • Even better: you have data from years of timesheets, meticulously tagged with said similar things
    • And you try to get some decent coverage with prior art and extrapolate the rest
    • And that's your basis for the time estimate
      • (how you manipulate that number afterward is up to you)
  • Later on I'm going to get to why this is such a slippery topic

3. Combination (decomposition)

  • Everybody knows that to solve a complex problem, you split it into subproblems, and you keep on splitting the subproblems until you have a bunch of subproblems that are small enough to be tractable
    • You then solve each of the subproblems and assemble your results upward to the complete solution
  • This is the thesis of the architect Christopher Alexander's 1964 PhD dissertation, called Notes on the Synthesis of Form
  • Or rather, his thesis is that there exists an objective, mathematically optimal way to decompose a given problem based on its inherent structure
    • Furthermore he asserts that if the problem is not decomposed in the optimal way, the non-optimal decomposition pattern will introduce flaws and failures in the design, and that will translate directly to cost overruns and benefit shortfalls
  • So Alexander's idea was that you go and find all the basic concerns of the problem you're trying to solve, which he called fitness variables
    • These fitness variables can be understood as nodes in a graph, and each node is in a state of being satisfied or unsatisfied
    • And the connections between the nodes represent mutual influence
      • Such that flipping one fitness variable will affect the ones connected to it, either satisfying them also, or otherwise flipping them from a satisfied to an unsatisfied state
      • So when you are trying to solve for one of these concerns, you need to take all of its neighbours into account
      • And all their neighbours, and so on
    • So Alexander's solution is to find the partitioning line that cuts the smallest number of these connections
      • And then of course you repeat the process recursively with the two pieces, all the way down until you have a hierarchical structure which is tractable
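  • To make the recursion concrete, here is a minimal Python sketch of the idea, using networkx and its Kernighan-Lin bisection heuristic as a stand-in for whatever partitioning algorithm you prefer (the graph here is a random toy, not real fitness variables):

      # A sketch of Alexander-style recursive decomposition: bisect along
      # an (approximately) minimal cut, then recurse into the two halves.
      import networkx as nx
      from networkx.algorithms.community import kernighan_lin_bisection

      def decompose(graph, leaf_size=4):
          """Return a nested tree of node groups small enough to be tractable."""
          if len(graph) <= leaf_size:
              return sorted(graph.nodes)
          left, right = kernighan_lin_bisection(graph)  # cut as few edges as possible
          return [decompose(graph.subgraph(left), leaf_size),
                  decompose(graph.subgraph(right), leaf_size)]

      # Toy example: a random graph standing in for 20 interacting concerns.
      print(decompose(nx.gnm_random_graph(20, 40, seed=1)))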

3.1. Salience

  • Perhaps the most salient aspect of this book for me was the sheer magnitude of wrong answers in relation to the one right one
    • That is, when you have a connected graph of N nodes, you have 2^N possible ways to cut it up
    • What this means is that there are 2^N - 1 ways to get this pattern wrong and only one way to get it right
    • And he related one insight here that instantly stuck out as being important:
      • If you overprescribe the categories that the parts and pieces of the problem are meant to reside in, you are practically guaranteed to get it wrong
      • And this is just a feature of the fact that the possible substructures vastly outnumber the words in the English language
        • So picking a category heading, like "database", for instance, is going to contort and pollute the solution, because it's going to draw an arbitrary boundary around some subset of the system, that is almost certainly not going to be consistent with the actual structure of the problem
  • The good news is that Alexander is describing a problem well known in computer science, graph partitioning (a constrained cousin of min-cut), precisely because it has so many applications
    • The bad news is that the problem of finding an optimal graph partition is NP-hard
      • If you're not familiar, that just means exact solutions become intractably expensive as the problem grows.
    • The good news, again, is that since Alexander published his dissertation in 1964, there have been considerable advances in algorithms that yield approximate solutions
    • And finally, the bad news is that even though this topological decomposition business is a fascinating and useful theory to have in your head, it may not be as widely applicable as it looks
      • Here is what I mean by that:

3.2. Relevance

  • When I found this book in 2008, I was like, yes, this will be great for effort estimation
    1. All I have to do is get all the requirements and hook them together in this structure
    2. Then feed the structure to a min-cut algorithm and it will spit out the hierarchical decomposition
    3. Then I can take the results and estimate the elementary parts, and add up the individual estimates and that will provide a basis for an estimate for the whole thing
  • What are the problems with this?
    1. Getting all the requirements is a lot of work!
      • They can't just be nebulous statements like "must be easy to use"!
      • In order for this process to work you will need hundreds of fitness variables and they will have to be pretty fine-grained
    2. Changing the underlying structure radically alters the decomposition pattern!
      • In practice this means adding any new information is going to completely reconfigure your project plan and require you to re-estimate
    3. Estimating a project by adding up the individual estimates of its parts is worthy of its own discussion, which I will get to later
  • It is worth asking, briefly, why this approach works for construction projects in 1964
    • Why it works for construction projects is because if you look at the cost profile of a construction project, something like 80% of the money goes to materials and labour of building the actual building
      • The remaining 20% is split between the architect and the structural engineer
      • (note I'm not counting the cost of the land, which would cut this fraction in half)
      • So with respect to the empirical task of obtaining these fitness variables and hooking them together, that's what the architect is there on the project to do
    • Why this approach works in 1964 is because your options for computing are punch cards on rented hardware: you aren't going to go back and run this a second time if or when you get new information
      • Even if you could, there's so much inertia in a conventional construction project it wouldn't matter if you did
  • So the take-home here is that the structure of the project affects the estimate, and new information will affect the structure
    • Getting this information is empirical work and is irreducible
    • Also, if you use arbitrary categories rather than math, you are overwhelmingly likely to get this structure wrong
  • Let's just say there is a reason why Alexander abandoned this method (which I will get to eventually)

4. Permutation (sequence of operations)

  • So let's suppose you do manage to get an optimal hierarchical decomposition pattern for your project
    • Herbert Simon (in 1969) called these systems nearly-decomposable, to the extent that most of the parts of the system don't interact with most of the other parts; they only interact with a few of them
    • In the everyday world, we can understand this problem this way:
      • It doesn't matter what order you put your shirt or your pants on, but it absolutely does matter that your pants go on before your shoes
      • What happens if you get this wrong? You have to take your shoes off, put your pants on, and then put your shoes back on
      • In other words, it costs you
    • So on a project you could have a hundred elements that don't interact with each other, but they cluster into five groups that absolutely do need to be carried out in a strict sequence
      • How do you know you got the sequence right?
      • What is the penalty for being wrong?
    • Let's imagine a toy model where each of our five path-dependent steps takes one day to complete, so our estimate is 5 days
      • So we say, we'll do this in an arbitrarily prescribed order: 1 2 3 4 5
      • What we don't know is, the actual order of dependency is 5 4 3 2 1
      • So we try 1, find out we can't complete it, then we try 2, then 3, then 4, then we finally get success with 5
      • Then with 5 complete we say, okay that one step was clearly out of order, but let's get back to the plan
        • So you try 1 again, then 2, then 3, then finally 4
        • And so on
      • For 5 path-dependent steps for which we don't know the order of dependency, your worst case, with do-overs, is 15 days
        • So you say, I'll be clever and triple my estimate!
        • Problem with that is these numbers are triangular, so if there were six steps the worst case would be 21; 3 times 6 is 18, so tripling is not enough
      • The size of the worst-case penalty is (n^2 + n)/2; this is verified in the sketch at the end of this section
      • And that's nothing next to the odds of getting the sequence wrong in the first place!
        • The number of permutations of the elements of a set, that is to say all the possible ways you can put them in order, is the factorial of the number of elements
        • Five factorial is a hundred and twenty, so our five-element sequence has 119 wrong configurations to one right one
          • Six factorial is seven hundred and twenty
  • So you have the impossible odds of getting the right structure, multiplied by the impossible odds of getting the right order
    • Granted the wrongness varies immensely in degree
    • So you will be wrong in your estimates, potentially by a lot
  • The point of these last several minutes is to illustrate precisely to what degree the deck is completely stacked against you
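  • As promised, here is a minimal simulation of the toy model; the names and the one-day-per-attempt accounting are my own, for illustration:

      # n path-dependent steps attempted in exactly the wrong order,
      # counting every failed attempt as a lost day.
      from math import factorial

      def worst_case_days(n):
          days = 0
          remaining = list(range(n, 0, -1))   # true order of dependency
          while remaining:
              for step in sorted(remaining):  # we stubbornly follow the plan
                  days += 1                   # one day spent per attempt
                  if step == remaining[0]:    # only the next true step succeeds
                      remaining.pop(0)
                      break
          return days

      for n in (5, 6, 7):
          print(n, worst_case_days(n), (n**2 + n) // 2, factorial(n) - 1)
      # -> 5 15 15 119 / 6 21 21 719 / 7 28 28 5039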

5. How is it then that people succeed? (thin vs fat tails)

  • How do we reconcile these odds with the people who succeed?
    • Overruns are caused by surprise, in the sense that something doesn't go to plan
    • Namely the aggregate total of things not going to plan exceeds your budget for things not going to plan
    • I just detailed two mechanisms for things not going to plan:
      • Not cutting the project up into the right pieces
      • Not doing the path-dependent pieces in the right order
      • These reduce to not getting, or otherwise appropriately processing, the information needed to do this important work
  • How do you eliminate, or otherwise absorb surprise?
    • Only committing to the most conservative, well-trodden work
      • This I am going to expand on in a second, but a really good way to eliminate surprises is to just stick to the things where all the surprises have been wrung out, preferably by somebody else.
    • Padding the absolute hell out of your estimates
      • Like, to the point of being actually embarrassed; to the point the client would fire you if they knew any better.
    • Charging an absolute buttload of money
      • If you charge a lot of money, like a really, really obscene amount of money, there are a lot of situations you can just buy your way out of. Not original development, mind you, but you might be able to buy your way out of something else to free up the resources to slide in under the wire.
    • Bullshitting your way out of it
      • I mean…
  • I suspect the people who are consistently successful at effort estimation — at least to onlookers — are doing at least one of those things.

5.1. My pet theory

  • Here's my pet theory about what's going on with effort estimates:
    • We're all taught to empirically determine a plausible average number and then pad for safety
    • What are we actually doing in this scenario though?
    • Unless we have hard sample data to draw from, we're just recalling our own experience with similar projects
    • What are we most likely to recall?
      • The most common outcome
      • As in the mode
    • Worth noting that in a normal distribution, the mean, median, and mode are the same number
    • And when you pad your estimates on a normally-distributed task, you're implicitly covering enough of the variance that you net nearly all the possible outcomes
      • If you're in the habit of doubling your estimates, you'll probably go way over two standard deviations
  • So I'm just going to assert here that there is good reason to assume that tasks that don't have a lot of room for surprise are going to have completion times that are normally distributed
  • And I'm going to argue that software development has a tendency to be not normally distributed.
    • Note, this is not a property unique to software development, nor do all aspects of software development exhibit it; I'm just arguing that software is a fertile place to look for processes that are what are called fat-tailed.
    • And this has to do with the fact that there are a lot of surprises in software development, and furthermore, those surprises are capable of generating more surprises.
      • This right here is a recipe for fat-tailed processes.
  • What this means on the ground is that if your process is fat-tailed, no amount of padding is safe.
    • You could pad 10x and it wouldn't matter; you'd get blown out of the water by the next 11x overrun
  • So that's what I think is happening with effort estimation: the padding heuristic works for thin-tailed processes, and the people doing fat-tailed processes, who don't acknowledge that's what they're doing, are going to be routinely disappointed.
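  • Here is a minimal simulation of this claim; the distributions and parameters are illustrative assumptions, not data:

      # Contrast padding on a thin-tailed task vs a fat-tailed one.
      import random
      random.seed(42)

      N = 100_000
      thin = [max(0.1, random.gauss(10, 2)) for _ in range(N)]  # normal, mode ~10 days
      fat = [10 * random.paretovariate(1.2) for _ in range(N)]  # fat tail, typical ~10 days

      for name, xs in (("thin", thin), ("fat", fat)):
          estimate = 10                # the "most common outcome" we recall
          for pad in (2, 10):          # double it, or pad 10x
              blown = sum(x > pad * estimate for x in xs) / N
              print(f"{name}: padded {pad}x, blown {blown:.1%} of the time")
      # Doubling all but guarantees the thin-tailed task; the fat-tailed
      # one still blows through a 10x pad a few percent of the time.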

5.2. Square footage

  • There is another constraint that is observable in fields like construction, where accurate cost estimation is a business-critical practice:
  • Any building project can be estimated to a first approximation if you know one number: the square footage of the building.
  • With that one number, or actually two, if you include the area of the lot, you can determine roughly:
    • how many floors
    • how many elevators
    • how many stairwells
    • how much concrete
    • how much glass and other materials
    • how much piping, ducting, cabling
    • and, if you have historical data, how long it will take to build.
  • A construction estimator can then work within this parameter to — painstakingly, over weeks to months — parse reams of already-completed specifications from the architect and structural engineer, to come up with their bid.
  • If you want an example from the tech industry that has a nicely-behaved initial constraint like this, I submit user research is a good candidate, as the size of the job is proportional to the number of participants.
  • There have been attempts to do this kind of thing for programming, such as function points and story points, but they don't represent objective constraints in the environment, and I don't consider them to be especially rigorous.
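  • Here is what that first approximation looks like as a sketch; every ratio and price in it is a placeholder assumption, not industry data:

      # A parametric first-pass estimate driven by one observable number.
      def rough_building_estimate(square_feet, cost_per_sqft=350.0,
                                  sqft_per_floor=15_000, sqft_per_elevator=120_000):
          floors = max(1, round(square_feet / sqft_per_floor))
          return {
              "floors": floors,
              "elevators": max(1, round(square_feet / sqft_per_elevator)),
              "stairwells": max(2, floors // 10 + 2),
              "rough_cost": square_feet * cost_per_sqft,
          }

      print(rough_building_estimate(90_000))
      # The point is the shape of the move: one constraint in the environment
      # parameterizes everything else, to a first approximation.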

6. Fermi estimates versus whatever this nonsense is

  • Now I want to turn attention to the premise underpinning "how long will it take?" / "how much will it cost?" and why I think we can do better.
  • I'm going to assume the purpose of effort estimation in general is to answer these questions
    • The purpose of answering these questions is to set budgets and deadlines
    • The purpose of setting budgets and deadlines is, presumably, to control resources and to set expectations
  • So here's the rub: effort estimation is actually a structurally, mathematically bad way of achieving this.
    • I have to credit Carlos Bueno for what I'm about to tell you, this head-smackingly obvious revelation that has nevertheless escaped me almost my entire career:
    • When you estimate by adding and multiplying up from zero, your errors accumulate, and they do so in the bad direction.
    • There is, instead, a technique called a Fermi estimate, which starts at a maximum and subtracts and divides its way downward.
      • Furthermore, the errors tend to cancel each other out.
  • Before I proceed with how to do a Fermi estimate on software I want to remark on one other thing with regard to estimation:
    • Have you ever noticed that it's easier to estimate some development work that you don't want to do?
      • Like "uggh that's going to take so long"
      • People chalk this up to some psychological explanation, like we're optimistic about the things we want and pessimistic about the things we don't want
      • There may indeed be some truth to that but I don't believe it's the whole story; I think there's more going on there
      • When we estimate for deadlines, the question we're asking is "how much time is this thing not going to take more than?"
      • When we're estimating for rhetorical purposes, the question we're actually asking is "how much time is this thing not going to take less than?"
      • See the difference?
        • The former is exhaustive; it's expensive, you have to identify and eliminate as many sources of surprise as you can find
        • The latter, all you have to do is start counting until you hit some convenient threshold number; indeed it works better if you stop counting because you can just handwave the rest
        • I have come to call the ordinary "no-more-than" estimates "right-side" estimates, since you're coming at it from the right on the number line, and the rhetorical "no-less-than" estimates "left-side" estimates because they are opposite.
      • So I want you to keep this idea in mind

6.1. Value per run

  • The way you do a Fermi estimate with software is to recognize that all software can be ascribed a value per run
    • And the number of runs per unit time can be measured
    • So it's quite feasible to derive a value per, say, year
      • Or more appropriately, come up with a plausible lifecycle for the software
    • This is going to be easiest when the particular software intervention is tied directly to revenue, but it's not too hard to come up with proxy metrics for situations like nonprofit or institutional entities
      • Even in for-profit scenarios, very little of the code is going to contribute directly to revenue, so we will have to use proxy metrics most of the time
  • So the question we ask is something like, "how many runs is this piece of software going to need to get in a given interval to be worth spending X amount of time on?" followed by "what are the odds of that?"
  • And we can discount this too in a number of ways:
    • For starters, the valuation itself should be calculated using the discounted cash flow method
      • If you aren't familiar with discounted cash flow, it's like compound interest in reverse:
        • It's what tells you how many dollars you'd have to get in a year to be worth the same to you as, say, a hundred dollars today
    • The "how many runs?" valuation question should also be padded with a profit margin or analogue thereof
    • If applicable, there should be the option to set a date after which the intervention is worthless, i.e. discounted to zero
    • Finally, we should discount the whole thing by the probability of success
    • What this is going to amount to is a big hairy differential equation, but we have computers for that
      • Actually this is a good thing because this scheme will very likely undervalue specific interventions
  • The final remark is that any work that is not itself conspicuously valuable is almost always a dependency of more than one element that is conspicuously valuable, so it can be said to partake in a fraction of the aggregate value of all its dependents
  • So at the end of the day we control resources not by asking how long it's going to take or how much it's going to cost, but by figuring out how much it's worth, and saying "if you can't do it in less than this amount of time, don't even try"
  • But more to the point, every discrete intervention valuated this way is going to begin with a tiny valuation because the discounts will start off huge
    • But every intervention will be greenlit to spend some time gathering more information about it, which will affect both valuations and left-side estimates
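  • Here is the whole valuation move as a minimal sketch; every parameter name and number is an illustrative assumption, and it uses a simple discounted sum rather than the full differential treatment:

      # Value per run -> discounted cash flow -> probability-of-success
      # haircut -> a time budget instead of a time estimate.
      def intervention_value(value_per_run, runs_per_year, years=3,
                             discount_rate=0.10, p_success=0.3, margin=0.25):
          # DCF: a dollar received next year is worth 1/(1+r) dollars today.
          dcf = sum(value_per_run * runs_per_year / (1 + discount_rate) ** t
                    for t in range(1, years + 1))
          return dcf * p_success * (1 - margin)

      def time_budget_hours(value, hourly_cost=150.0):
          # "If you can't do it in less than this amount of time, don't try."
          return value / hourly_cost

      v = intervention_value(value_per_run=0.05, runs_per_year=200_000)
      print(f"worth ${v:,.0f}; budget {time_budget_hours(v):,.0f} hours")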

6.2. Recap

  • So to recap the principles:
    • Instead of trying to answer how much is a particular software intervention gonna cost, we try to figure out how much it's worth
    • Then we figure out how well it would have to perform in order to "earn back" different investments of time, and ascribe probabilities
      • If it looks too unlikely a given intervention will yield a return, you don't do it
    • Note that this is absolutely gambling; we are gambling on every unit of time we spend
      • Most of the time we will lose
      • But the gains are asymmetric, so we will win on average

7. And now the moment you've all been waiting for

  • Armed with this valuation mechanism it's time to paint a picture of how to operationalize it
  • I want to preface this by saying I'm not claiming any of this is especially original, so if it resembles what somebody else is doing, that is a coincidence
    • I spent all my time trying to figure this out myself; I didn't check to see if anybody else came to the same or similar conclusion but wouldn't be surprised if somebody did
  • I wanna make an observation first about in-house teams:
    • Everybody's on salary; the cost of the team is fixed
    • If they're Agile™, they regularly deliver software functionality, and presumably that software functionality is worth more to their employer than the team costs to maintain
    • An in-house team could switch to this methodology I've been describing pretty easily; there wouldn't be much to change
  • The hard nut to crack is how do you do this on a contractual basis
    • What precisely do you promise a client you're going to do for them?
    • How do you make it sound attractive?
  • Going to have to address the marketing aspect some other time; right now I'm talking about how the contract works and how it interfaces with project management
  • What we agree to is a longitudinal relationship with the client, at the scale of their financial year
    • That is, we allocate a fixed number of hours for a fixed number of dollars, which we distribute over the engagement
    • Over this allocation we have checkpoints at regular intervals
      • Monthly is ideal; you could conceivably do every two or even three months, for a premium, but going longer than that kind of obviates the point, which is to lower risk
      • At this time all the work done in the preceding interval is billed, and rights to the work product are transferred when payment is received
  • We set low and high watermarks for hours per interval, such that the average between the two will run the allocation down to zero exactly by the end of the engagement.
    • So we're billing hours, but we're billing hours on rails.
    • The client can't get billed more than the total amount per engagement, and they can't get billed more than the high water mark per interval
    • If, however, there's enough capacity, we can just keep working all the way up to the high water mark and the engagement will run to completion that much sooner
      • I should footnote here that this is not a profit maximization strategy; this is a risk minimization strategy.
        • If you want to maximize profit you should be selling products, not doing custom software development.
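  • Here is a minimal sketch of the hours-on-rails arithmetic; the numbers are illustrative:

      # A fixed allocation drawn down between per-interval watermarks.
      def watermarks(total_hours, intervals, spread=0.25):
          """Low/high marks whose midpoint exactly exhausts the allocation."""
          avg = total_hours / intervals
          return avg * (1 - spread), avg * (1 + spread)

      total, intervals = 1200, 12      # e.g. a year of monthly checkpoints
      low, high = watermarks(total, intervals)
      print(f"bill between {low:.0f} and {high:.0f} hours per interval")

      remaining = total
      for month in range(1, intervals + 1):
          worked = min(high, remaining)    # never over the high watermark...
          remaining -= worked              # ...nor over the total allocation
          if remaining <= 0:
              print(f"worked at capacity: done in month {month}")
              break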

7.1. How the work gets done

  • Now, about the actual content of the work itself and how it gets done
    • The basic unit of motivation is something I'm calling the issue
      • An issue is nothing more than a statement that there is some state of affairs in the world that needs something done about it.
      • Issues don't say what to do about them.
        • They only attempt to articulate the problem, not any particular solution.
        • That comes later.
    • Issues are what get ascribed a value, and the initial set of issues should be workshopped with the client or written by the client themselves
      • We should also leave open a mechanism for the client to add more issues once they get the hang of the format
    • This work comes together very quickly, and will likely result in more issues than we could possibly address in the time allocated
      • This is fine; it's actually kind of the point
      • Our goal is to create more value than we cost, not necessarily to do any particular task, so in these issues we are amassing as many opportunities to create value as we can get our hands on
  • Now you may want to know about how we prioritize
    • Here I want to interject an insight from Alexander's dissertation:
      • People generally agree on whether a given issue is valid; where they tend to disagree is around how important it is
      • With this regime we capture everything because it's cheaper to capture an issue than argue about whether or not this or that issue is important enough to be captured
  • Now, let's imagine a horizon representing technical detail
    • Above the horizon is the initial set of issues, ideally articulated in the client's own words
    • Beneath the horizon is where we start to address what it's actually going to take to address the issues
    • We respond to issues with positions, which are proposals for specific responses to one or more issues
      • And while each issue gets ascribed a value, each position has a cost
        • These are those left-side estimates I was talking about, the ones that are cheap and easy to do
      • And we can continue this further by registering arguments about why a particular position should or should not be adopted
      • This process continues along and we build up more and more of these connections
      • With all of these nodes connected in the system, we can do things like compare our left-side, minimum-cost estimates on positions to the valuations of the issues, and furthermore derive a sequence of operations informed by the structure of the system (a concrete sketch of this structure closes out this section)
  • Where we gain our efficiency is in not going against the inherent structure of the project
    • We can see the structure for what it is, and thus determine how to take it apart and what order to do the parts in, without having to commit to a composition or sequence of operations up front.
  • What we get from this exercise is a structure that looks very similar to Alexander's fitness variables, and can be manipulated accordingly
    • It is worth remarking that this technique of connecting together a network of issues, positions, and arguments was invented by Horst Rittel, who worked with Alexander at Berkeley in the 70s
    • So all of this theory is quite mature, and has even been given the digital treatment numerous times, dating back to the 80s
    • My contribution is mainly the valuation mechanism, and figuring out what the contract needs to say in order to make this process viable outside of a big company
      • Oh, and I've also written some very crude prototype software to help capture this structure.
  • Now, some people in the Agile community might look at this and say "this is just a bug tracker" and "this is just user stories"
    • To which I say, good, then you should have no trouble adopting it
    • But cheeky remarks aside, I can highlight two major differences:
      1. The issues in the system have arbitrary scope, and are more than just software bugs or feature requests
      2. We separate the declaration of the problem from the proposals for a solution, as well as from the discussion of those proposals; an ordinary bug tracker typically has a comment section but doesn't distinguish between types of comment
  • And finally, I want to address the UX community, who, if there are any of you still watching, are probably wondering where stuff like user research, personas, scenarios, content, IA, and so on fit in
    • What I would say is this: one very important role of this structure is to get stakeholders to acknowledge that specific issues are worth something to resolve
      • And again this is not all about writing code
      • Indeed this structured argumentation is intended to concentrate information and aid in decision-making
      • For example, a number of these issues are going to raise questions that can only be answered empirically
      • In other words, user research
      • Well, within this framework, we should be able to price the answers to empirical questions
      • In other words, provide the rhetorical basis for, for instance, a user research budget.
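  • As promised above, here is a minimal sketch of the issue/position structure and the viability check it enables; the field names are my own shorthand, not a spec:

      from dataclasses import dataclass, field

      @dataclass
      class Issue:
          statement: str      # a state of affairs that needs something done
          value: float = 0.0  # what resolving it is worth (discounted)

      @dataclass
      class Position:
          proposal: str       # a specific response to one or more issues
          addresses: list     # the Issues it responds to
          min_cost_hours: float = 0.0   # the cheap "left-side" estimate
          arguments: list = field(default_factory=list)  # for/against, as text

          def viable(self, hourly_rate=150.0):
              # Worth pursuing only if its issues' value covers its cost floor.
              return sum(i.value for i in self.addresses) > self.min_cost_hours * hourly_rate

      slow = Issue("Checkout page takes 9 seconds to load", value=40_000)
      fix = Position("Cache the tax-rate lookup", [slow], min_cost_hours=20)
      print(fix.viable())  # True: a 20-hour floor against a $40k issue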

8. Epilogue

  • This presentation represents years of research and experimentation, and this is the first time I have put the entire thing together, from theory to concrete implementation
  • It is my goal that at the very least it provokes some discussion around how software development is procured, contracted, designed, and managed.
  • I'm looking forward to discussing this approach with all of you.

Author: dorian

Created: 2024-04-19 Fri 09:38
