Root Cause Analysis (RCA) is a problem-solving method for identifying the root causes of faults or problems. In software testing, it is used to identify the root causes of defects and prevent them, rather than treat the symptoms.
There are two types of RCA in Software Defects:
- RCA of appearing defects: To answer the question of why these defects happen during the development phase.
- RCA of escaped defects: To answer the question of why these defects happen after release or promotion to the next environment (Staging or Production).
- To recognize the root causes of issues and draw lessons learned that prevent them from happening again. The principle is “Prevention is better than cure”.
- Cast the net wide: dig up the roots, then investigate and fix the underlying cause instead of just fixing the problem at the point of discovery.
- RCA of escaped defects improves your testing skills, because you learn to recognize the patterns behind them.
- It depends on the context
- As a good practice, carry it out every sprint, every release, or whenever defects leak after deployment.
- Indicators to trigger RCA sessions:
- Bugs arise consistently
- Many bugs in the system
- A need to improve the development process
- Business needs: customer complaints, stakeholder requests
Set up and categorize the Root Cause
To collect data, add a root cause field to your defect tracking system, such as Jira or Bugzilla.
Depending on your context, the root causes typically fall into the following categories.
|Root Cause||Definition||Potential recommendation for prevention actions|
|RCA for appearing defects during the development phase|
| ||Missing ACs or items mentioned in the requirement. Unclear or ambiguous requirement: this leaves the tester and the developer with different views.||
|Design Issue||Wrong, overly complex, or overkill architecture design, etc.||
|Coding Issue||Wrong implementation, missed items in the ACs, or lack of unit tests.||
|Testing Issue||Missing bugs that are in the program; reporting bugs that are not in the program; poor tracking and follow-up.||
|Environment Issue||Differences, inconsistencies, or a special case in a particular environment, etc.||
|Deployment Issue||Related to infrastructure, configuration, a wrong build, missing deployment steps, CI/CD issues, scaling issues, Docker container issues, etc.||
|Out of Scope||The issue does not require any action to close.||
|RCA for escaped defects after releasing or promoting to the next environment|
|Known Issue||The issue was already reported in the bug tracking system.||
|Missed in SCRUM||The issue comes from missing test cases, test cases that were not executed, or a bug missed in executed test cases, on new implementations or change requests from the SCRUM team.||
|Missed in Regression Test or Unknown Impacted Areas||The issue did not occur in previous deployments but occurs in the current version. The team's risk and impact analysis was wrong, or it did not have enough resources for regression testing because of time or budget constraints.||
|Poor Coverage or Poor Test Design||This tends to be a tester's mistake. The issue comes from unimplemented APIs, or from missing test cases for implemented APIs or API changes. The issue is not caught by existing test scripts because of test script design issues or missing test data. Some cases occur rarely but can still occur: testing outside the base assumptions, or finding unintended ways to use a feature.||
|Enhancement||The issue relates to usability, charisma, the user's view, etc., to make the product usable, useful and successful.||
|Other Team or 3rd Party||The issue comes from an integrated system or from another team that should be in charge.||
|Configuration and Infrastructure||The issue comes from deployment issues: a flipped flag, configurations, special settings, etc. Scaling issue (AWS memory), CI/CD issue, Docker container issue, or performance issue.||
|Invalid Issue||The issue does not require any action to close.||N/A|
Where to get data
The data comes from your defect tracking tool. If you use Jira, you can easily create queries to filter it.
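Once the defects are exported, counting them per root cause is a small scripting task. The Python below is a minimal sketch, assuming a CSV export with a `Root Cause` column; that column name, the `count_root_causes` helper and the sample rows are hypothetical and only for illustration, so adjust them to whatever your tracker actually exports.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export from a defect tracker. The "Root Cause" column is an
# assumed custom field, not something every tracker provides out of the box.
SAMPLE_EXPORT = """Key,Summary,Root Cause
BUG-1,Login fails on Staging,Environment Issue
BUG-2,Wrong total in cart,Coding Issue
BUG-3,Missing validation rule,Coding Issue
BUG-4,Report layout broken,Missed in Regression Test
"""


def count_root_causes(csv_text, column="Root Cause"):
    """Return a Counter of defects per root-cause category."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Defects without a root cause yet are grouped under "Unclassified".
        cause = (row.get(column) or "").strip() or "Unclassified"
        counts[cause] += 1
    return counts


if __name__ == "__main__":
    for cause, total in count_root_causes(SAMPLE_EXPORT).most_common():
        print(f"{cause}: {total}")
```

Grouping unlabelled defects under a separate bucket keeps them visible, which is a useful signal that the root cause field is not being filled in consistently.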
- Determine problem
- Identify possible causes
- Propose preventive and corrective actions
- Analyze and monitor the effectiveness and usefulness of actions
- Collect the right information
- Create a fear-free incident-reporting environment; there is no need to use MIP (Mention in Person).
- Get everyone to buy in, and make sure that RCA is used to uncover faults in the process rather than in individuals. Build a positive atmosphere of collaboration to figure out the root cause and appropriate actions, and avoid dwelling on who caused the issue.
- Ask questions: use the 5-Whys technique and/or a Fishbone diagram.
- Apply Pareto, the 80-20 rule: about 80% of your issues will come from 20% of the causes. Therefore, focus your attention where you can have the greatest impact.
- Leverage tooling rather than cumbersome manual work to be as effective as possible.
- To reduce escaped defects and increase your bug detection capability, you can draw on the lessons in the following two books to generate better test ideas.
- “50 Quick Ideas to Improve your Tests” by Gojko Adzic, David Evans, Tom Roden
- “Explore It! Reduce Risk and Increase Confidence with Exploratory Testing” by Elisabeth Hendrickson
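The Pareto tip above can be turned into a quick calculation. The following Python is a minimal sketch, assuming you already have defect counts per root cause; the `pareto_focus` helper and the example numbers are hypothetical and only illustrate the idea of finding the few causes that account for most defects.

```python
def pareto_focus(counts, threshold=0.8):
    """Return the most frequent causes that together cover `threshold` of all defects."""
    total = sum(counts.values())
    if total == 0:
        return []
    focus = []
    covered = 0
    # Walk the categories from most to least frequent.
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        focus.append(cause)
        covered += n
        if covered / total >= threshold:
            break
    return focus


# Hypothetical tallies, for illustration only.
example_counts = {
    "Coding Issue": 50,
    "Missed in Regression Test": 32,
    "Environment Issue": 10,
    "Design Issue": 5,
    "Known Issue": 3,
}

if __name__ == "__main__":
    print(pareto_focus(example_counts))
```

With these example numbers, the first two categories already cover more than 80% of the defects, so they are where prevention actions would pay off first.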