Measuring the Unmeasurable: Gage R&R for Transactional Six Sigma Projects
Posted: 02/19/2010 12:00:00 AM EST
How many times has this happened to you? You’re leading a Six Sigma project on a transactional process of some kind, something not directly tied to manufacturing or measurement of product quality. You get to the Measure phase of your Six Sigma project and struggle to figure out how to satisfy the requirement for a Gage R&R statistic to interpret. If that’s ever happened to you, read on for a solution to this sticky problem.
Where Gage R&R Fits into a Six Sigma Project
Before we get into the details, I want to spend a few words talking about where a gage study fits into a Six Sigma project and a little bit on the “spirit” of the Gage R&R requirement. Gage R&R is the second step in the Measure phase of the Six Sigma DMAIC process. Gage R&R comes after process mapping and building a data collection plan, and before we calculate the baseline capability of our process to be improved. Gage R&R also comes up again in the Control phase of the Six Sigma DMAIC process for the purpose of ensuring that we are able to measure the critical control parameters adequately to maintain the gains that we have achieved.
There are good reasons why Gage R&R is placed where it is in the process. A gage study follows process mapping because we must understand the process we are trying to improve and where the data about that process can be found before we can measure it. A gage study precedes calculating baseline capability because we need to be able to ensure that the data is good before we use it.
The reason we want to do a gage study boils down to confidence and good decision making. In Measure, we do a gage study of the data used to generate the Project Y or Critical To Quality (CTQ) measurement. This is the issue that is most important to the customer of the process. Why do we need to have confidence in this data? So we can be confident that, as we carry that data forward to capability analysis and root cause analysis in the Analyze phase, we can trust the conclusions that we will draw and the results we will see. That’s it, confidence and good decision making.
To understand the importance of a gage study, imagine your automobile speedometer for a moment. Imagine you’re driving down the road and your speedometer indicates that you're traveling at 55 miles per hour just as you pass a parked police car. Imagine your surprise if that policeman pulled you over and wrote you a ticket for going 65 miles per hour! It would have been great to know that your speedometer was inaccurate by 10 miles per hour. You might have made a different decision while passing the parked police car.
Getting Back to our Initial Six Sigma Project Problem
We are leading a Six Sigma project with attribute data rather than continuous data measured on a device. What do we do to ensure that we can trust the data, decisions, and conclusions that will follow? Attribute agreement is the answer.
Attribute agreement is a method of comparing the responses made by “appraisers” when judging the characteristic of interest. In an attribute agreement study there are four possible levels of analysis of the responses: 1. appraiser against themselves; 2. appraiser against other appraisers; 3. appraiser against a standard (if one exists); and 4. overall appraiser capability.
A case study helps explain the tool and how to interpret the results.
A Six Sigma project has been chartered to look into the high occurrence of Off-Quality product due to expired shelf life. This type of off-quality product typically accounts for about $1 million annually.
- Our Data: Classification codes of Off-Quality reasons from ERP
- Our Problem: Determine whether we can trust that everything classified as shelf life is really a shelf-life issue
- Possible Choices: SL=Shelf Life; EP=Experimental Product; RT=Retained Sample
- Each appraiser judged the samples twice
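The four levels of analysis described earlier can be illustrated with a short Python sketch. It assumes, hypothetically, that each appraiser's ratings are stored as a list of trials per sample using the case study codes (SL, EP, RT), and that a known standard exists for each sample. The data below is invented for illustration and is not the study's data, and the function names are hypothetical. A sample counts as matched only when every relevant rating agrees.

```python
from typing import Dict, List

# Hypothetical layout: ratings[appraiser][sample] -> codes from each trial
Ratings = Dict[str, List[List[str]]]


def pct(matched: int, total: int) -> float:
    return 100.0 * matched / total


def within_appraiser(ratings: Ratings) -> Dict[str, float]:
    """Level 1: % of samples where an appraiser gave the same code on every trial."""
    return {a: pct(sum(len(set(t)) == 1 for t in samples), len(samples))
            for a, samples in ratings.items()}


def between_appraisers(ratings: Ratings) -> float:
    """Level 2: % of samples where all appraisers gave the same code on all trials."""
    per_sample = list(zip(*ratings.values()))  # regroup every appraiser's trials by sample
    matched = sum(len({c for trials in s for c in trials}) == 1 for s in per_sample)
    return pct(matched, len(per_sample))


def appraiser_vs_standard(ratings: Ratings, standard: List[str]) -> Dict[str, float]:
    """Level 3: % of samples where every trial by an appraiser matches the standard."""
    return {a: pct(sum(all(c == std for c in t) for t, std in zip(samples, standard)),
                   len(standard))
            for a, samples in ratings.items()}


def all_vs_standard(ratings: Ratings, standard: List[str]) -> float:
    """Level 4: % of samples where every appraiser, on every trial, matches the standard."""
    per_sample = zip(*ratings.values())
    matched = sum(all(c == std for trials in s for c in trials)
                  for s, std in zip(per_sample, standard))
    return pct(matched, len(standard))


# Invented example: 3 appraisers, 4 samples, 2 trials each
standard = ["SL", "EP", "RT", "SL"]
ratings = {
    "Appraiser 1": [["SL", "SL"], ["EP", "SL"], ["RT", "RT"], ["SL", "SL"]],
    "Appraiser 2": [["SL", "SL"], ["EP", "EP"], ["RT", "RT"], ["SL", "SL"]],
    "Appraiser 3": [["SL", "EP"], ["EP", "EP"], ["SL", "RT"], ["SL", "SL"]],
}
print(within_appraiser(ratings))                # {'Appraiser 1': 75.0, 'Appraiser 2': 100.0, 'Appraiser 3': 50.0}
print(between_appraisers(ratings))              # 25.0
print(appraiser_vs_standard(ratings, standard)) # {'Appraiser 1': 75.0, 'Appraiser 2': 100.0, 'Appraiser 3': 50.0}
print(all_vs_standard(ratings, standard))       # 25.0
```

The strict "every rating must agree" rule is one reasonable choice; a scheme that gives partial credit for agreeing on some trials would produce higher percentages and answer a slightly different question.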
Once the proper selections have been made, go ahead and conduct the analysis; you’ll get results like those shown in Figure 5.
The graphs are interpreted as follows. The graph on the left shows how consistently each appraiser agrees with their own earlier decisions across successive trials. Here it indicates that we may have a training issue with appraiser 3 regarding their understanding of the criteria for the decision. The graph on the right shows each appraiser's percentage of agreement with the standard, if one exists. (If no standard is chosen, this panel will be blank.) Here it indicates that appraiser 2 agrees 100 percent with the standard, while appraisers 1 and 3 appear to be somewhat confused about the standard.
Fleiss’ Kappa Statistic
Next we interpret the session window statistics, but first, a brief explanation of the Kappa statistic.
The basis for the Kappa statistic is a comparison to random chance. Imagine flipping a coin to make a quality decision on a process; that’s random chance. Kappa compares the agreement observed in the study with the agreement that would be expected if the results had been generated randomly, as if by flipping a coin or rolling a die.
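In symbols, this comparison to chance takes the general form shared by Cohen's and Fleiss' kappa, where P_o is the observed proportion of agreement and P_e is the proportion of agreement expected by chance alone:

```latex
\kappa = \frac{P_o - P_e}{1 - P_e}
```

When observed agreement equals chance agreement (P_o = P_e), Kappa is 0; perfect agreement (P_o = 1) gives a Kappa of 1.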
Kappa ranges from -1 to +1, with a value of 0 indicating agreement no better than random chance. The closer Kappa gets to 1, the less likely it is that the results are due to random chance. Put another way, the less the results resemble random chance, the more likely it is that the appraisers (getting back to the Six Sigma project case study) are actually able to discern differences between the categories.
Kappa values less than 0 indicate that the responses are worse than random chance would generate. It’s sort of the statistical equivalent of the old test-taking advice to answer C when you don’t know the answer: you’ll be right some of the time. A negative Kappa indicates that the appraiser cannot distinguish the categories or is not willing to try.
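To make the comparison to chance concrete, here is a minimal Python sketch of the standard Fleiss' kappa calculation. It assumes each sample's ratings are collected into one list (for example, every rating given to that sample across appraisers and trials); the function name and the example data are hypothetical, not taken from the study or from any particular software package. Observed agreement is the average share of agreeing rating pairs per sample, and chance agreement comes from the overall proportion of ratings falling in each category.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for N samples, each rated m times (m >= 2).

    ratings[i] holds the m category codes assigned to sample i,
    e.g. ["SL", "SL", "EP"].
    """
    n_samples = len(ratings)
    m = len(ratings[0])
    if m < 2 or any(len(r) != m for r in ratings):
        raise ValueError("every sample needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]   # ratings per category, per sample
    categories = set().union(*counts)

    # Observed agreement: share of agreeing rating pairs per sample, averaged
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - m) / (m * (m - 1)) for cnt in counts
    ) / n_samples

    # Chance agreement: sum of squared overall category proportions
    total = n_samples * m
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating uses one category")
    return (p_bar - p_e) / (1 - p_e)


# Invented example: 4 samples, each rated 3 times
samples = [["SL", "SL", "SL"], ["EP", "SL", "EP"], ["RT", "RT", "RT"], ["SL", "SL", "EP"]]
print(round(fleiss_kappa(samples), 3))  # 0.467
```

A result of 0.467 sits well above 0 (chance) but far from 1, which in the terms used here would suggest appraisers who can partly, but not reliably, tell the categories apart.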
The hypotheses for Kappa are as follows:
- H0: The agreement within appraiser is due to chance
- H1: The agreement within appraiser is not due to chance
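These hypotheses are typically tested with a z statistic: Kappa divided by its standard error under H0, compared against the standard normal distribution. The sketch below assumes the large-sample null variance for Fleiss' kappa commonly attributed to Fleiss, Nee and Landis (1979); whether a given software package uses exactly this formula should be checked against its own documentation. The function name and inputs are hypothetical: it takes an already computed Kappa, the overall share of ratings in each category, the number of samples, and the number of ratings per sample.

```python
import math
from typing import Sequence, Tuple


def kappa_z_test(kappa: float, shares: Sequence[float],
                 n_samples: int, m: int) -> Tuple[float, float]:
    """One-sided z test of H0 (agreement due to chance) vs H1 (kappa > 0).

    kappa:     Fleiss' kappa computed from the study
    shares:    overall proportion of ratings in each category (sums to 1)
    n_samples: number of samples rated
    m:         ratings per sample (m >= 2)
    Returns (z, p_value).
    """
    pq = [p * (1.0 - p) for p in shares]
    s = sum(pq)
    if s == 0:
        raise ValueError("test is undefined when all ratings fall in one category")
    # Large-sample variance of kappa when H0 is true
    var0 = (2.0 / (n_samples * m * (m - 1))) * (
        s * s - sum(x * ((1.0 - p) - p) for x, p in zip(pq, shares))
    ) / (s * s)
    z = kappa / math.sqrt(var0)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for Z ~ N(0, 1)
    return z, p_value


# Hypothetical inputs: kappa = 0.467 from 4 samples rated 3 times each,
# with 50% of ratings SL, 25% EP and 25% RT
z, p = kappa_z_test(0.467, [0.5, 0.25, 0.25], n_samples=4, m=3)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (commonly judged against 0.05) leads to rejecting H0 in favor of H1. With only a handful of samples, as in this invented example, the normal approximation is rough, so a real study would want considerably more samples.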
One way to relate the Kappa statistic to a typical Gage R&R result is to subtract Kappa from 1 to get an approximate Gage R&R value. So if Kappa is 0.9, subtract 0.9 from 1; the remainder is 0.1, or 10 percent Gage R&R. This is simply a way to translate the Kappa result into terms that Six Sigma Master Black Belts and Black Belts understand. The same rules used to interpret a gage study result apply to attribute studies. As a refresher, the AIAG guidelines for acceptability of gage studies are:
- Gage R&R > 30 percent = unacceptable, measurement system needs improvement
- Gage R&R between 10 percent and 30 percent = marginal, measurement system needs improvement
- Gage R&R < 10 percent = acceptable
Interpret attribute studies using the same rules.
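The rule of thumb above is easy to automate. This Python sketch simply applies the article's approximation (Gage R&R percent ≈ (1 − Kappa) × 100) and the AIAG bands listed; the function names are hypothetical, and the result is only as good as the approximation itself.

```python
def kappa_to_grr_percent(kappa: float) -> float:
    """Approximate Gage R&R % from Kappa, using the 1 - Kappa rule of thumb."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    # Round away floating-point noise so Kappa = 0.9 gives exactly 10.0, not 9.999...
    return round((1.0 - kappa) * 100.0, 6)


def aiag_verdict(grr_percent: float) -> str:
    """Classify a Gage R&R % using the AIAG bands: <10, 10-30, >30."""
    if grr_percent < 10.0:
        return "acceptable"
    if grr_percent <= 30.0:
        return "marginal"
    return "unacceptable"


for kappa in (0.95, 0.9, 0.6):
    grr = kappa_to_grr_percent(kappa)
    print(f"Kappa {kappa:.2f} -> Gage R&R {grr:.1f}% -> {aiag_verdict(grr)}")
```

The rounding step matters at the band edges: without it, a Kappa of exactly 0.9 would be reported as just under 10 percent and classified as acceptable rather than marginal.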
Below are the statistical results for the two-panel graph shown in Figure 5, along with the specific interpretation (Figures 6 and 7).
Six Sigma Project Case Study Conclusion
The final conclusion from this Six Sigma project case study was that something needed to be done to improve the ability of the engineers making this decision to categorize scrap product correctly. Correcting this one finding reduced the occurrence of the problem by nearly 50 percent and allowed the team to correctly interpret the magnitude of the problem as originally stated. Had the attribute agreement issues not been addressed, the team would have arrived at a very different set of solutions from the ones it reached after this problem was corrected.
Use Attribute Agreement Analysis for Good Decision Making
Attribute agreement analysis is an effective method for putting a statistical interpretation on subjective judgments made by people, allowing fact-based improvements to be identified, implemented and measured. It allows those leading Six Sigma projects without continuous data to measure the quality of their data and to build confidence in the capability of the system and in the decisions made to improve it.
"A little knowledge is a dangerous thing."
1) Gage R&R is a special application of ANOVA - before jumping into a gage R&R, learn all you can about ANOVA: its assumptions, applications and limitations.
2) Not every project (six sigma or otherwise) will need a "gage R&R" component. Before investing the relatively high resource requirement of that method, understand the nature and severity of the issues - much can be significantly improved *before* resorting to gage R&R (if it's even needed at all).
3) HerbRobbins is correct - the speedometer example has to do with accuracy (or bias, or offset), *not* precision. Gage R&R can do *nothing* for you about accuracy - that would require a calibration study.
4) While Attribute Agreement Analysis gives you a measure of agreement within an appraiser ("repeatability") and across appraisers ("reproducibility"), it doesn't tell you the total (common cause) variation in the system, nor does it break that variation down into its contributing components - a key feature of gage R&R (i.e., ANOVA) - which can be used as clues to where to focus variation-reduction efforts (if indeed that's the key issue).
5) And finally, given the little information provided about context, it would appear that what statisticians call a Type III Error is being committed: working on the wrong problem. If the issue is "occurrence of Off-Quality product due to expired shelf life," isn't it (or shouldn't it be) standard practice to label (record) the expiration date / experimental nature / retained-sample status of each batch / lot / drum so its status is *not* a judgment call?
James, thanks for the nicely written paper. I've performed numerous Kappa and Gauge R&R studies. One comment about the police and the speed: a GR&R will not identify inaccuracy; this tool identifies variation.
The Kappa study will provide the difference to the expert (or standard), although only if this information is included. As our Central Limit Theorem tells us ... take extra care when selecting the "Expert" or standard. My experience tells me that several folks, working together to derive the "Expert" rating is the way to go. - Herb
Is this method applicable to leading a Six Sigma project to improve customer satisfaction in the contact center industry?
We (a contact center / BPO) provide service for a large American firm (the client), which also created, delivers and collates the customer satisfaction surveys that gauge the local contact center's CSAT performance.
I have used Attribute Agreement Analysis in situations where users categorize complaints into predefined categories. In large groups with high employee turnover, the Gauge R&R tends to be high.