Pocket Stats: Quick Significance Tests You Can Remember
Contributor: Andy Sleeper
Posted: 03/23/2009 5:51:00 PM EDT
The ability to make the right decision quickly is essential in business. When decisions involve interpreting data, it is tempting to go with “gut feel” rather than wait for a statistician or a computer analysis. But what if you could quickly decide whether your data represents a significant effect, and state your confidence level, with no computer and almost no math? What if you could apply these statistical tools anywhere, anytime, as you might pull a multipurpose tool out of your pocket? Well, now you can, with Pocket Stats!
One of the great benefits of statistical tools is the ability to separate signals in your data from the surrounding noise. Signals are important messages, such as a change in the performance of a system. When random variation or noise is combined with the signal, it may be difficult to correctly identify the signal. When the signal is clearly larger than the noise, we say it is “significant,” as shown in Figure 1.
Figure 1–Signals and Noise
Hypothesis tests are a powerful set of statistical tools for identifying significant effects and measuring their level of significance. Whenever someone says, “I am 95 percent confident that the new system is better,” that statement ought to be supported by an appropriate hypothesis test. Hypothesis tests are typically complicated and confusing, requiring specialized terminology and software.
But in many common situations, there is a very simple hypothesis test that you can remember! With only basic math, you can examine a set of data and reliably state whether the change is significant or not.
Deciding Between One Sample and Two Samples
The first challenge is to identify the correct statistical tool for your situation. A sample is a set of things representing some larger population. After we collect sample data by measuring some aspect of those things, we want to use that data to infer something useful about the population. A one-sample problem involves one sample representing one larger population.
Example 1: The mortgage department at a local bank has started a project to improve loan processing time. The objective is for 50 percent of loans to be processed from application to closing in 10 days or less. In other words, the median time should be 10 days or less. Baseline measurements of processing time on 20 loan applications are listed below:
13  20  9  12  15  9  12  11  10  14 
20  19  12  19  9  28  16  11  27  9 
In this sample, five loans took 10 days or less, and 15 did not. Does this data indicate that the median loan processing time is longer than 10 days, or is this just a string of bad luck?
In Example 1, loan applications are the things being measured. This sample of 20 loans is intended to represent the population of all loans processed by the same group in the same way. This is a one-sample problem.
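The tally in Example 1 is easy to check. Here is a minimal sketch in Python using the 20 processing times listed above. The names `loan_days` and `TARGET_DAYS` are illustrative and do not come from the original article.

```python
from statistics import median

# Processing times in days for the 20 baseline loans in Example 1.
loan_days = [13, 20, 9, 12, 15, 9, 12, 11, 10, 14,
             20, 19, 12, 19, 9, 28, 16, 11, 27, 9]
TARGET_DAYS = 10  # project objective: a median of 10 days or less

# Count the loans at or under the target, and the rest.
on_target = sum(1 for d in loan_days if d <= TARGET_DAYS)
over_target = len(loan_days) - on_target

print(on_target, over_target)  # 5 15
print(median(loan_days))       # 12.5
```

The sample median of 12.5 days is above the target, but a sample median alone does not say whether the gap is more than bad luck. That is the question a hypothesis test answers.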
Example 2: Two lapping machines are compared by splitting a lot of 20 parts randomly into two sets of 10, running one set through machine A and one set through machine B. The surface roughness of a critical surface is then measured for all parts. Here are the measurements:
Machine A:  28  29  30  31  33  30  26  29  30  29 
Machine B:  33  30  34  35  35  35  34  39  37  27 
Is there a significant difference in the performance of these two machines?
Example 2 compares two machines by running a separate set of parts over each machine. This is a two-sample problem.
The issue of one or two samples may seem trivial, but cases like the following often confuse people:
Example 3: A coil supplier and customer disagree about whether certain parts conform to specifications. To investigate, one lot of 10 parts is measured first at the supplier and then at the customer. Here are the inductance measurements:
Coil ID:  1  2  3  4  5  6  7  8  9  10 
Supplier:  220  216  221  215  224  213  219  223  221  224 
Customer:  218  215  222  212  223  210  218  221  221  222 
Is there a significant difference between the supplier’s measurements and the customer’s measurements?
Is Example 3 a one-sample or a two-sample problem? Since there are two sets of data to be compared, it looks like a two-sample problem, but this is incorrect. The data in this example represents two measurements on a single sample of parts. The only relevant information in this data is the difference between the two measurements, which is a single data set, shown here:
Coil ID:  1  2  3  4  5  6  7  8  9  10 
S - C:  +2  +1  -1  +3  +1  +3  +1  +2  0  +2 
This is actually a one-sample problem. Whenever one set of parts is measured twice (before and after, here and there, or by two gages), it is a one-sample problem. This might also be called a paired-sample or repeated-measures problem.
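Reducing the paired readings to a single data set takes one subtraction per coil. The following Python sketch illustrates this with the Example 3 measurements. The variable names are assumptions made for this example.

```python
# Inductance readings for coils 1 through 10 in Example 3.
supplier = [220, 216, 221, 215, 224, 213, 219, 223, 221, 224]
customer = [218, 215, 222, 212, 223, 210, 218, 221, 221, 222]

# One difference per coil: supplier reading minus customer reading.
differences = [s - c for s, c in zip(supplier, customer)]
print(differences)  # [2, 1, -1, 3, 1, 3, 1, 2, 0, 2]

# Tally the signs of the differences.
positive = sum(1 for d in differences if d > 0)
negative = sum(1 for d in differences if d < 0)
zero = differences.count(0)
print(positive, negative, zero)  # 8 1 1
```

Once the differences are computed, the two original columns are no longer needed. The list of differences is the one sample.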
Testing Two Samples with Tukey’s End Count Test
Now that you have seen the data in the above examples, can you decide whether there is a significant effect? Are there any signals bigger than the noise? It is very difficult to see patterns or effects in a table of numbers, but a graph makes patterns much easier to recognize. As Ellis Ott was fond of saying, “Always, always, always plot the data!” Since this article is about Pocket Stats, what kind of Pocket Stats graph can you make without a computer?
The easiest kind of graph to sketch involves a number line with dots or crosses representing the data, like the graph below made from the data in Example 2.
Figure 2–Dot graph made from Example 2 data
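A dot graph of this kind can also be approximated in plain text. The sketch below is a minimal Python illustration, assuming whole-number readings. The function name `dot_rows` is hypothetical, and the number line runs down the page rather than across it.

```python
from collections import Counter

# Surface roughness readings from Example 2.
machine_a = [28, 29, 30, 31, 33, 30, 26, 29, 30, 29]
machine_b = [33, 30, 34, 35, 35, 35, 34, 39, 37, 27]

def dot_rows(a, b):
    """Return one text row per integer value, with an x for each reading."""
    count_a, count_b = Counter(a), Counter(b)
    lo, hi = min(a + b), max(a + b)
    rows = []
    for value in range(lo, hi + 1):
        marks_a = "x" * count_a[value]  # Counter gives 0 for absent values
        marks_b = "x" * count_b[value]
        rows.append(f"{value:>3} | A {marks_a:<4} | B {marks_b}")
    return rows

for row in dot_rows(machine_a, machine_b):
    print(row)
```

Read from top to bottom, the rows for machine A cluster at the low end and the rows for machine B at the high end, with some overlap in the middle.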
Looking at the graph, can you now say there is a significant difference between these two machines? How do you know? How confident are you in this conclusion?
John Tukey (1959) devised a simple two-sample test to answer these questions using what he called the “end count” of the data. Assume that sample 1 has some values less than all values in sample 2. The end count is the count of values in sample 1 less than all values in sample 2, plus the count of values in sample 2 greater than all values in sample 1. Figure 3 highlights the values included in the end count of the Example 2 data, which is 1 + 7 = 8.
Figure 3–Example 2 data, showing end count of 8
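The end count definition translates directly into a few lines of code. The Python sketch below is one possible reading of it, not Tukey's own formulation. The function name is hypothetical, comparisons are strict so tied values are not counted, and the function returns 0 when one sample holds both extremes.

```python
# Surface roughness readings from Example 2.
machine_a = [28, 29, 30, 31, 33, 30, 26, 29, 30, 29]
machine_b = [33, 30, 34, 35, 35, 35, 34, 39, 37, 27]

def tukey_end_count(sample_1, sample_2):
    """Return the end count, or 0 if the samples do not separate at both ends."""
    # Let `low` be the sample that holds the smallest value overall.
    if min(sample_1) <= min(sample_2):
        low, high = sample_1, sample_2
    else:
        low, high = sample_2, sample_1

    # Values in the low sample below every value in the high sample.
    low_end = sum(1 for x in low if x < min(high))
    # Values in the high sample above every value in the low sample.
    high_end = sum(1 for x in high if x > max(low))

    # If either end is empty, one sample spans the other: no end count.
    if low_end == 0 or high_end == 0:
        return 0
    return low_end + high_end

print(tukey_end_count(machine_a, machine_b))  # 8
```

For the Example 2 data, the low end contributes 1 (the reading of 26 on machine A) and the high end contributes 7 (the machine B readings above 33). The machine B reading of 33 ties machine A's largest value, so it is not counted.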
Tukey found that an end count of at least 7 provides 95 percent confidence that the two population distributions are different. Here is a table of significant end counts for three levels of confidence:
| End count | Confidence (percent) |
|---|---|
| 7 | 95 |
| 10 | 99 |
| 13 | 99.9 |
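Reading the table amounts to finding the largest critical value that the observed end count meets. Here is a small Python sketch of that lookup, assuming the two samples are nearly the same size. The names `CRITICAL_END_COUNTS` and `confidence_for_end_count` are illustrative.

```python
# Critical end counts from the table above, largest first.
CRITICAL_END_COUNTS = [(13, 99.9), (10, 99.0), (7, 95.0)]

def confidence_for_end_count(end_count):
    """Return the highest tabulated confidence level met, or None."""
    for critical, confidence in CRITICAL_END_COUNTS:
        if end_count >= critical:
            return confidence
    return None  # below 7: not significant at any tabulated level

print(confidence_for_end_count(8))  # 95.0
print(confidence_for_end_count(6))  # None
```

With an end count of 8, the Example 2 data clears the 95 percent threshold but not the 99 percent one.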
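Tukey’s end count test is simple enough to be remembered and applied without any computers or mathematical aids. Remarkably, this test does not depend on the distribution of the data, nor do the critical end counts change as the sample sizes change, as long as both samples are nearly the same size. When the two samples are of different size, the critical end counts increase slightly. For details on this correction, see Tukey (1959) or Sleeper (2006), p. 537.
Tukey’s end count test will not work on all problems. If one sample contains both the highest and lowest values in both samples, there is no end count.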
If you have statistical training, you may know about the two-sample t-test. Needless to say, the two-sample t-test is more complicated than the Tukey end count test, and there are other differences as well. The table below summarizes some of these differences:
| | Tukey end count test | Two-sample t-test |
|---|---|---|
| Null hypothesis | The two populations have the same distribution | The two populations have the same mean |
| Alternative: the test might prove... | ...that the distributions are different | ...that the population means are different |
| Assumed population distribution family | No assumption | Normal |
| Additional assumptions | None | Two versions are available: one version assumes equal variances, the other does not |
If you take the same data and apply both the Tukey end count test and the two-sample t-test, you may get different answers. This is reasonable because the two are very different tools. The Tukey end count test tests for differences in distribution by looking only at the extreme values of samples, without assuming any particular distribution family. The two-sample t-test tests for differences in mean, by looking at all the data, assuming a normal distribution.
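To see how much more arithmetic the t-test involves, here is a hedged Python sketch of the unequal-variance (Welch) version of the two-sample t statistic, applied to the Example 2 data. It is a standard textbook formula rather than anything given in this article, and the function name `welch_t` is an assumption.

```python
from math import sqrt
from statistics import mean, variance

# Surface roughness readings from Example 2.
machine_a = [28, 29, 30, 31, 33, 30, 26, 29, 30, 29]
machine_b = [33, 30, 34, 35, 35, 35, 34, 39, 37, 27]

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom), unequal variances."""
    # Squared standard error contributed by each sample.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(machine_a, machine_b)
print(round(t, 2), round(df, 1))
```

For this data the statistic comes out near -3.6 with roughly 14 degrees of freedom, which would then be compared against a t table. Unlike the end count, every reading contributes to the result.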
The next article in this column will discuss the Pocket Stats version of Fisher’s one-sample sign test, another test worth remembering for one-sample problems.
References:
Sleeper, A. D. (2006). Design for Six Sigma Statistics: 59 Tools for Diagnosing and Solving Problems in DFSS Initiatives. McGraw-Hill.
Tukey, J. W. (1959). “A Quick, Compact, Two-Sample Test to Duckworth’s Specifications.” Technometrics, Vol. 1, No. 1, Feb., pp. 31-48.
