Every year, hundreds of new students go through our BlueDragon workshops. These workshops include realistic case studies to highlight and reinforce the lessons taught during the course. ("Realistic" means that every case study is based on actual incidents in various industries, but the names and specifics have been altered.) One of my favorite case studies is a scenario that takes place on Christmas Eve. As the scenario unfolds, every cause-and-effect path seems to lead back to a common root cause: Christmas Eve.
On the morning of December 24th, two workers were given a work order to complete a rather dangerous valve replacement on a chemical system and were told they could go home as soon as their work was done. The workers rushed through their task, making serious errors of judgement and assumptions that led to an incident. They were distracted and did not follow specific work controls and other procedure requirements. At the root of it all was the fact that they wanted to get home in time for Christmas dinner and other festivities.
The Operator who isolated the valve to be worked that morning did not help the workers walk down their job site before they started, because the Operator was the only one on site taking log readings. The workers ended up on the wrong valve, which exposed them to hydrofluoric acid at system pressure. At the root of this cause-and-effect sequence was that the Operator (and the entire site) was at minimum staffing, because it was Christmas Eve.
The supervisor on shift that morning did not provide the workers with a pre-job briefing, which would have prompted them to read the work package. The work package contained warnings that the valve configuration was the reverse of the normal configuration. Distracted and in a hurry, the workers incorrectly assumed that this valve configuration was the same as the others and did not bother to check the isolation boundaries. The reason why the supervisor did not hold a pre-job briefing? He did not have time because he was the only one on site during minimum staffing, because - you guessed it - it was Christmas Eve.
So it is clear that Christmas Eve shows up as the root of all our problems and should be immediately banned... said the RCA Grinch. As amusing as this scenario might seem, a version of it happens all too often: root cause efforts stop too soon, and we end up taking corrective actions on symptoms that don't really address the deepest-seated causes of our problems.
After three decades as a root cause practitioner in the commercial nuclear power industry, the Nuclear Regulatory Commission and the Department of Energy, I have seen that there can be many reasons why our root cause analyses stop too soon.
Here is our BlueDragon Tip of the Week: my top five reasons why your root cause analysis efforts stop too soon or fall short:
Traditional definitions of a “Root Cause” are misleading: Definitions that have been in place and readily accepted for over 50 years have inadvertently caused root cause analysis efforts to stop too soon. A simple version of the accepted definition is as follows: “Root causes are the deepest-seated causes for an event or condition that, when corrected or eliminated, would preclude the event or condition from happening again.” This traditional definition misleads us into looking only for the "event root causes": the root causes of the event being investigated. There are causes that lie much deeper than the event causes, but we must have a methodology that does not allow us to stop too soon. The deepest-seated causes explain not only the event being investigated but also many past events, and they will continue to cause future problems as well. However, organizations are often too eager to identify the obvious causes and move on. Getting to the deepest-seated root causes requires a disciplined, rigorous approach that critically analyzes available data, attacks the problem from multiple perspectives and does not immediately stop at the event root causes.
Not Asking the Right Questions: Some individuals believe that asking the right questions is the most important aspect of problem solving. They include notable figures such as Peter Drucker and Albert Einstein. An all-too-common mistake is to limit your Lines of Inquiry to the initial set of evidence and to focus on what mistakes were made (and by whom). More thorough Lines of Inquiry can be developed by:
- Analyzing available data or creating your own data set to analyze;
- Critically reviewing available information (including the sequence of events and personnel statements);
- Identifying at-risk behaviors and error-likely situations;
- Identifying and evaluating the programs, processes and procedures that should have prevented the event;
- Identifying and evaluating equipment and material interfaces;
- Identifying potential issues in the work environment; and,
- Identifying potential issues with management and oversight.
It also helps to ask about similar events that happened in the past and about the effectiveness of previous corrective actions.
Not Identifying the Common Causes: All root causes are common causes. By definition, they are the source of scores of problems; they are common to many cause-and-effect sequences and lead to problems that range from minor issues to significant events. To uncover the root causes, we work backwards from an event using cause-and-effect analysis, attacking the problem from multiple perspectives. Using a good set of Lines of Inquiry and a multi-pronged approach, we can identify the common causes, the deepest of which will be the root causes. To validate these root causes, we should be able to trace how they led all the way up to the symptoms that manifested themselves during the event.
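To make the idea of a common cause concrete, here is a minimal sketch in Python. It is an illustration only, not part of the BlueDragon methodology: the chain contents are paraphrased from the Christmas Eve case study above, and the names `CHAINS` and `common_causes` are hypothetical. Each chain lists the causes along one cause-and-effect path, from symptom down to the deepest cause identified so far.

```python
from collections import Counter

# Hypothetical cause-and-effect chains paraphrased from the case study.
# Each list runs from the observed symptom down to the deepest cause found.
CHAINS = {
    "workers": [
        "rushed through the task",
        "wanted to get home early",
        "Christmas Eve",
    ],
    "operator": [
        "no job-site walk-down",
        "operator alone taking log readings",
        "minimum staffing",
        "Christmas Eve",
    ],
    "supervisor": [
        "no pre-job briefing",
        "supervisor alone on shift",
        "minimum staffing",
        "Christmas Eve",
    ],
}


def common_causes(chains):
    """Return causes that appear in more than one chain, with chain counts."""
    counts = Counter()
    for chain in chains.values():
        # Count each cause at most once per chain.
        counts.update(set(chain))
    return {cause: n for cause, n in counts.items() if n > 1}


if __name__ == "__main__":
    for cause, n in sorted(common_causes(CHAINS).items(), key=lambda kv: -kv[1]):
        print(f"{cause}: appears in {n} of {len(CHAINS)} chains")
```

Note that the result can only be as deep as the chains themselves. Here the most common cause is "Christmas Eve", which is exactly the sign that the chains stopped too soon and need further Lines of Inquiry.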
Not Gaining Enough Insights from Data Analysis: To greatly improve our chances of uncovering the root causes during our cause-and-effect analysis, we first conduct data analysis using available information. There are hundreds of data analysis tools we can use, and the conclusions from the data analyses provide additional insights that help us develop more Lines of Inquiry, adding rigor to our investigation. We can draw great insights from:
- Pareto Analysis of corrective action and other databases;
- Process Mapping of affected processes;
- Value-Stream Mapping of inefficient processes;
- Affinity Diagrams to manage large amounts of disparate data; and,
- Fault Trees for equipment issues.
For example, we can use a Fault Tree to identify how a motor failed. Then we can develop questions to ask why or how those failure modes were created, to get to the root causes using cause-and-effect analysis. The more insights we can generate from the data, the more Lines of Inquiry we can generate, which adds more rigor and greater insight to our analysis.
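As one illustration of the data analysis tools mentioned above, here is a minimal Pareto Analysis sketch in Python. The cause codes and their counts are invented for the example and do not come from any real corrective action database; the function name `pareto` is likewise hypothetical.

```python
from collections import Counter


def pareto(cause_codes):
    """Rank cause codes by frequency and add a cumulative percentage."""
    ranked = Counter(cause_codes).most_common()
    total = sum(count for _, count in ranked)
    rows = []
    running = 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


if __name__ == "__main__":
    # Invented sample: one cause code per corrective action record.
    records = (
        ["procedure not followed"] * 5
        + ["no pre-job briefing"] * 3
        + ["inadequate staffing"] * 2
    )
    for cause, count, cumulative in pareto(records):
        print(f"{cause:<25} {count:>3} {cumulative:>6}%")
```

The causes at the top of the ranking are candidates for new Lines of Inquiry; the ranking itself does not explain why those causes recur.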
Not Getting the Proper Management Support: Even if a Causal Analyst is using the best available methodology, completing an RCA is a team sport that requires strong support from management. That support includes making personnel available from various organizations as needed to support the analysis, providing a space for the team to work, and most importantly, not trying to steer the team’s analysis and conclusions. A shortfall in any of these can hamper the effort and cause the analysis to fall short. Therefore, it is crucial for managers to understand what is required to conduct a thorough root cause analysis and that their participation and cooperation are integral to the RCA Team’s success.
For more information on how to solve complex, human-centric problems, visit us on the web at: https://www.dle-services.com/