Following the 2003 Space Shuttle Columbia disaster, an extensive investigation set out to determine the cause of the accident and prevent it from happening again. This piece from The Atlantic explores the investigation, its methods, and its findings. While most of us will never work on a project with the life-and-death consequences of space exploration, the way the investigation was run offers strategies for anticipating and recovering from mistakes within projects. Here are three strategies the investigation team used that may be useful to project teams:
You can’t investigate yourself.
Following the Columbia disaster, and amid suspicion that systemic problems at NASA had contributed to the tragedy, an independent board (the Columbia Accident Investigation Board) was formed to lead the investigation. Its members held no stake in the status quo at NASA and were therefore better positioned to take an objective look at the organization.
At a project scale, many high-performing development teams separate testing from development: the developer who wrote the code is not the one who tests it. A separate tester is less likely to share the author's assumptions about how a function is supposed to behave, which keeps the testing objective and focused on the requirements.
Project leadership can also use this technique to bring in another set of eyes to look at project planning documentation and identify gaps, inconsistencies, or risks. Sometimes a project leader can be too close to a project to see certain flaws.
Only rule out a cause when there is data to prove it.
When investigating the cause of a failure, it’s easy to let biases and opinions decide which causes get investigated and which are ruled out without examination. The Columbia investigative team listed every cause they could think of, regardless of likelihood, and ruled one out only when the evidence disproved it.
The same approach can be applied to a technology failure. A project team can list all possible causes, even seemingly unlikely ones, including contributing factors such as the environment and the infrastructure, and then systematically shorten that list through targeted testing.
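As a rough illustration of this elimination process, the Python sketch below keeps a list of candidate causes and refuses to rule one out unless a piece of evidence is recorded against it. The class names, cause names, and evidence strings are all hypothetical and stand in for whatever a real team would track.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One candidate cause of a failure (hypothetical structure)."""
    name: str
    evidence_against: list[str] = field(default_factory=list)

    @property
    def ruled_out(self) -> bool:
        # A cause is eliminated only once evidence contradicting it is recorded.
        return bool(self.evidence_against)


class Investigation:
    """Tracks every candidate cause until data rules it out."""

    def __init__(self, causes: list[str]) -> None:
        self.causes = {name: Cause(name) for name in causes}

    def rule_out(self, name: str, evidence: str) -> None:
        # Opinions are not enough: an empty evidence string is rejected.
        if not evidence.strip():
            raise ValueError("a cause cannot be ruled out without evidence")
        self.causes[name].evidence_against.append(evidence)

    def open_causes(self) -> list[str]:
        # Causes that still need targeted testing, in their original order.
        return [c.name for c in self.causes.values() if not c.ruled_out]


# Example data below is invented for illustration only.
investigation = Investigation(
    ["bad deploy config", "database timeout", "network partition", "expired certificate"]
)
investigation.rule_out("expired certificate", "certificate expiry date is after the outage")
investigation.rule_out("database timeout", "query latency stayed normal during the outage")
print(investigation.open_causes())
```

Anything still returned by `open_causes()` has not been disproved and stays on the list, however unlikely it may seem.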
Proactively, creating a comprehensive test plan can help the team anticipate a wide range of potential anomalies and reduce their impact with improved requirements and risk-based prioritization.
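One common way to do risk-based prioritization, offered here only as a sketch, is to score each risk as likelihood multiplied by impact and test the highest scores first. The 1 to 5 scales, the `Risk` class, and the example risks are assumptions made for illustration, not part of the Columbia investigation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A single project risk (hypothetical structure)."""
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple likelihood-times-impact scoring.
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Example risks are invented for illustration only.
risks = [
    Risk("payment API rejects new request format", likelihood=2, impact=5),
    Risk("report export is slow on large accounts", likelihood=4, impact=2),
    Risk("login breaks for single sign-on users", likelihood=3, impact=5),
]

for risk in prioritize(risks):
    print(risk.score, risk.description)
```

The ordering gives the team a starting point for deciding which test cases to write and run first; the scores themselves are judgment calls and should be revisited as the project changes.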
Hold a pre-mortem before a deployment.
In the case of the shuttle Columbia, many engineers and technicians were aware of the risks posed by the foam insulation on the external fuel tank. Because of the bureaucratic and hierarchical culture at NASA, these concerns either failed to reach leadership or did not carry the weight needed to influence decisions.
In an IT project, one way to address this issue is to hold a project pre-mortem. Before a release or product launch, sit down with the project team and ask what is most likely to go wrong. If the project were to fail, why do they think that would be? Often, these conversations can illuminate the issues that those closest to the project are most concerned about, but haven’t voiced. These concerns can lead to preventative measures and more comprehensive risk assessment.
These techniques helped the Columbia investigation team make recommendations to improve communication and thoroughness at NASA that continue to this day. Using the same methods, project teams can reduce the chance of recurring issues or, better, prevent them from happening in the first place.