# Pocket Stats: Quick Significance Tests You Can Remember

Andy Sleeper
03/23/2009

The ability to make the right decision quickly is essential in business. When decisions involve interpreting data, it is tempting to go with "gut feel" rather than wait for a statistician or a computer analysis. But what if you could quickly decide whether your data represents a significant effect, and state your confidence level, with no computer and almost no math? What if you could apply these statistical tools anywhere, anytime, as you might pull a multi-purpose tool out of your pocket? Well, now you can, with Pocket Stats!

One of the great benefits of statistical tools is the ability to identify signals in your data from the surrounding noise. Signals are important messages, such as a change in the performance of a system. When random variation or noise is combined with the signal, it may be difficult to correctly identify the signal. When the signal is clearly larger than the noise, we say it is "significant," as shown in Figure 1.

Figure 1–Signals and Noise

Hypothesis tests are a powerful set of statistical tools for identifying significant effects and measuring their level of significance. Whenever someone says, "I am 95 percent confident that the new system is better," that statement ought to be supported by an appropriate hypothesis test. Hypothesis tests are typically complicated and confusing, requiring specialized terminology and software.

But in many common situations, there is a very simple hypothesis test that you can remember! With only basic math, you can examine a set of data and reliably state whether an effect is significant or not.

## Deciding Between One Sample and Two Samples

The first challenge is to identify the correct statistical tool for your situation. A sample is a set of things representing some larger population. After we collect sample data by measuring some aspect of those things, we want to use that data to infer something useful about the population. A one-sample problem involves one sample representing one larger population.

Example 1: The mortgage department at a local bank has started a project to improve loan processing time. The objective is for 50 percent of loans to be processed from application to closing in 10 days or less. In other words, the median time should be 10 days or less. Baseline measurements of processing time on 20 loan applications are listed below:

 13 20 9 12 15 9 12 11 10 14 20 19 12 19 9 28 16 11 27 9

In this sample, five loans took 10 days or less, and 15 did not. Does this data indicate that the median loan processing time is longer than 10 days, or is this just a string of bad luck?
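The counting in Example 1 takes only a few lines of Python. The sketch below also gives a rough answer to the "bad luck" question: it computes how often 5 or fewer of 20 loans would finish in 10 days or less if the true median were exactly 10 days. It assumes each loan independently has a 50 percent chance of meeting the goal under that hypothesis. This is the ordinary binomial tail calculation, not necessarily the Pocket Stats shortcut the next article describes, and the variable names are illustrative.

```python
from math import comb

# Example 1 baseline processing times, in days
times = [13, 20, 9, 12, 15, 9, 12, 11, 10, 14, 20, 19, 12, 19, 9, 28, 16, 11, 27, 9]

target = 10  # goal: median processing time of 10 days or less
met = sum(1 for t in times if t <= target)
missed = len(times) - met
print(f"{met} met the goal, {missed} did not")  # 5 met the goal, 15 did not

# Assumption: if the true median were exactly 10 days, each loan would meet the
# goal with probability 0.5, so `met` would follow a Binomial(n, 0.5) distribution.
n = len(times)
p_tail = sum(comb(n, k) for k in range(met + 1)) / 2 ** n
print(f"P({met} or fewer out of {n}) = {p_tail:.4f}")  # P(5 or fewer out of 20) = 0.0207
```

Under that assumption, a result this lopsided would turn up only about 2 percent of the time.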

In Example 1, loan applications are the things being measured. This sample of 20 loans is intended to represent the population of all loans processed by the same group in the same way. This is a one-sample problem.

Example 2: Two lapping machines are compared by splitting a lot of 20 parts randomly into two sets of 10, running one set through machine A and one set through machine B. The surface roughness of a critical surface is then measured for all parts. Here are the measurements:

 Machine A: 28 29 30 31 33 30 26 29 30 29

 Machine B: 33 30 34 35 35 35 34 39 37 27

Is there a significant difference in the performance of these two machines?

Example 2 compares two machines by running a separate set of parts over each machine. This is a two-sample problem.

The issue of one or two samples may seem trivial, but cases like the following often confuse people:

Example 3: A coil supplier and customer disagree about whether certain parts conform to specifications. To investigate, one lot of 10 parts is measured first at the supplier and then at the customer. Here are the inductance measurements:

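| Coil ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Supplier | 220 | 216 | 221 | 215 | 224 | 213 | 219 | 223 | 221 | 224 |
| Customer | 218 | 215 | 222 | 212 | 223 | 210 | 218 | 221 | 221 | 222 |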

Is there a significant difference between the supplier’s measurements and the customer’s measurements?

Is Example 3 a one-sample or a two-sample problem? Since there are two sets of data to be compared, it looks like a two-sample problem, but this is incorrect. The data in this example represents two measurements on a single sample of parts. The only relevant information in this data is the difference between the two measurements, which is a single data set, shown here:

| Coil ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| S - C | +2 | +1 | -1 | +3 | +1 | +3 | +1 | +2 | 0 | +2 |

This is actually a one-sample problem. Whenever one set of parts is measured twice (before and after, here and there, or by two gages), this is a one-sample problem. This might also be called a paired-sample or repeated-measures problem.
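Pairing comes down to one subtraction per coil. This minimal sketch turns the two columns of Example 3 into the single sample of S - C differences. The list names are illustrative.

```python
# Example 3: each coil measured once by the supplier and once by the customer
supplier = [220, 216, 221, 215, 224, 213, 219, 223, 221, 224]
customer = [218, 215, 222, 212, 223, 210, 218, 221, 221, 222]

# Both lists must be in the same coil-ID order for the pairing to be valid
assert len(supplier) == len(customer)

# One difference per coil: the paired data collapse into a single sample
diffs = [s - c for s, c in zip(supplier, customer)]
print(diffs)  # [2, 1, -1, 3, 1, 3, 1, 2, 0, 2]
```

Any later one-sample analysis works on `diffs` alone. The original two columns are no longer needed.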

## Testing Two Samples with Tukey’s End Count Test

Now that you have seen the data in the above examples, can you decide whether there is a significant effect? Are there any signals bigger than the noise? It is very difficult to see patterns or effects in a table of numbers, but a graph makes patterns much easier to recognize. As Ellis Ott was fond of saying, "Always, always, always plot the data!" Since this article is about Pocket Stats, what kind of Pocket Stats graph can you make without a computer?

The easiest kind of graph to sketch involves a number line with dots or crosses representing the data, like the graph below made from the data in Example 2.

Figure 2–Dot graph made from Example 2 data
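If the data are already on a computer, a text version of the same dot graph takes only a few lines. The sketch below uses a hypothetical helper, `dot_plot`. It assumes integer measurements and turns the number line on its side: one row per value, with each sample's label repeated once per observation at that value.

```python
from collections import Counter

def dot_plot(samples):
    """Return one text row per integer value from the overall min to max.
    samples: dict mapping a one-letter label to a list of integer values."""
    counts = {label: Counter(values) for label, values in samples.items()}
    lo = min(min(v) for v in samples.values())
    hi = max(max(v) for v in samples.values())
    lines = []
    for x in range(lo, hi + 1):
        # Repeat each label once per observation at this value; Counter gives 0 if absent
        marks = " ".join(label * c[x] for label, c in counts.items() if c[x])
        lines.append(f"{x:>3} | {marks}".rstrip())
    return lines

machine_a = [28, 29, 30, 31, 33, 30, 26, 29, 30, 29]
machine_b = [33, 30, 34, 35, 35, 35, 34, 39, 37, 27]
for line in dot_plot({"A": machine_a, "B": machine_b}):
    print(line)
```

The output runs from 26 to 39. The A's cluster near the top of the output (low values) and most B's sit near the bottom (high values), which matches the pattern in the sketched graph.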

Looking at the graph, can you now say there is a significant difference between these two machines? How do you know? How confident are you in this conclusion?

John Tukey (1959) devised a simple two-sample test to answer these questions using what he called the "end count" of the data. Assume that sample 1 has some values less than all values in sample 2. The end count is the count of values in sample 1 less than all values in sample 2, plus the count of values in sample 2 greater than all values in sample 1. Figure 3 highlights the values included in the end count of the Example 2 data, which is 1 + 7 = 8.

Figure 3–Example 2 data, showing end count of 8
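The end count definition translates directly into code. Below is a minimal sketch with a hypothetical function `end_count`. It first checks which sample holds the overall minimum, so the order of the arguments does not matter. It returns `None` when one sample holds both extremes, because then there is no end count. Values tied with the other sample's extreme are simply not counted here; that is an assumption of this sketch, so check Tukey (1959) for how ties should be handled.

```python
def end_count(sample1, sample2):
    """Tukey-style end count: values of the 'low' sample below every value of
    the other sample, plus values of the 'high' sample above every value of
    the low sample. Returns None when no end count exists."""
    # Orient the samples so `low` holds the overall minimum and `high` the maximum
    if min(sample1) < min(sample2) and max(sample2) > max(sample1):
        low, high = sample1, sample2
    elif min(sample2) < min(sample1) and max(sample1) > max(sample2):
        low, high = sample2, sample1
    else:
        return None  # one sample holds both extremes (or the extremes tie)
    below = sum(1 for x in low if x < min(high))
    above = sum(1 for x in high if x > max(low))
    return below + above

machine_a = [28, 29, 30, 31, 33, 30, 26, 29, 30, 29]
machine_b = [33, 30, 34, 35, 35, 35, 34, 39, 37, 27]
print(end_count(machine_a, machine_b))  # 8
```

For Example 2, this counts one value of machine A (26) below 27 and seven values of machine B above 33. The total matches the end count of 8.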

Tukey found that an end count of at least 7 provides 95 percent confidence that the two population distributions are different. Here is a table of significant end counts for three levels of confidence:

| End count | Confidence (percent) |
|---|---|
| 7 | 95 |
| 10 | 99 |
| 13 | 99.9 |

Tukey’s end count test is simple enough to be remembered and applied without any computers or mathematical aids. Remarkably, this test does not depend on the distribution of the data, nor do the critical end counts change as the sample sizes change, as long as both samples are nearly the same size. When the two samples differ in size, the critical end counts increase slightly. For details on this correction, see Tukey (1959) or Sleeper (2006), p. 537.
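The table of critical end counts can be encoded as a small lookup. This is a sketch only, with the hypothetical name `end_count_confidence`. It assumes the two samples are nearly the same size and does not apply the unequal-sample-size correction described in Tukey (1959) and Sleeper (2006).

```python
# Critical end counts from the Pocket Stats table (nearly equal sample sizes only)
CRITICAL_END_COUNTS = [(13, 99.9), (10, 99.0), (7, 95.0)]  # highest first

def end_count_confidence(count):
    """Return the highest tabulated confidence (percent) that an end count
    reaches, or None if it is below 7 or there is no end count at all."""
    if count is None:
        return None
    for critical, confidence in CRITICAL_END_COUNTS:
        if count >= critical:
            return confidence
    return None

print(end_count_confidence(8))  # 95.0
```

The Example 2 end count of 8 clears the 95 percent threshold of 7 but not the 99 percent threshold of 10.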

Tukey’s end count test will not work on every problem. If one sample contains both the highest and the lowest values of the combined data, there is no end count.

If you have statistical training, you may know about the two-sample t-test. Needless to say, the two-sample t-test is more complicated than the Tukey end count test, and there are other differences as well. The table below summarizes some of these differences:

| | Tukey end count test | Two-sample t-test |
|---|---|---|
| Null hypothesis | The two populations have the same distribution | The two populations have the same mean |
| Alternative: the test might prove... | ...that the distributions are different | ...that the population means are different |
| Assumed population distribution family | No assumption | Normal |
| Additional assumptions | None | Two versions are available: one version assumes equal variances, the other does not |

If you take the same data and apply both the Tukey end count test and the two-sample t-test, you may get different answers. This is reasonable because the two are very different tools. The Tukey end count test looks for differences in distribution by examining only the extreme values of the samples, without assuming any particular distribution family. The two-sample t-test looks for differences in mean by using all the data and assuming a normal distribution.
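To see the "all the data" side of this contrast, here is a sketch of the t statistic for the unequal-variance version of the two-sample t-test (Welch's form). It uses only Python's standard library. The function name `welch_t` is hypothetical. It computes only the statistic; turning it into a p-value or confidence level requires the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances.
    Every observation affects the result through the means and variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

machine_a = [28, 29, 30, 31, 33, 30, 26, 29, 30, 29]
machine_b = [33, 30, 34, 35, 35, 35, 34, 39, 37, 27]
t = welch_t(machine_a, machine_b)
print(f"t = {t:.2f}")  # t = 3.61
```

Changing any single interior value, such as one of machine A's 29s, would change this statistic. The end count would stay the same unless the change moved a value past the other sample's extreme.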

The next article in this column will discuss the Pocket Stats version of Fisher’s one-sample sign test, another test worth remembering for one-sample problems.

References:

Sleeper, A. D. (2006). Design for Six Sigma Statistics: 59 Tools for Diagnosing and Solving Problems in DFSS Initiatives. McGraw-Hill.

Tukey, J. W. (1959). "A Quick, Compact, Two-Sample Test to Duckworth’s Specifications." Technometrics, Vol. 1, No. 1 (Feb.), pp. 31-48.