The Right Tool Matters!

Andy Sleeper

Keep things as simple as possible, but not simpler. – Albert Einstein (paraphrased)

When I was a kid, my big sister drove a red 1967 VW Bug. In its day, this was an innovative car. It had 54 horsepower, a two-speed windshield wiper and zero computers. The heat was inadequate and it broke down a lot, but it got 100 miles per gallon—of oil. The Bug was simple, and VW rightly marketed it as everyman’s car.

Now in 2009, suppose you are shopping for a new car. You walk into a dealership and see a whole room stuffed with shiny, new 1967 VW Bugs. You might say, "Awww—how cute! Where did you put the real cars?"

You might explain to the salesman the context for your car search. Perhaps you need a reliable car for your two kids and a dog. Perhaps you need a car for commuting, plus summer camping and winter skiing trips. This doesn’t look like the right car for you.

But the salesman persists, citing reliability statistics: "97.3 percent of these models work fine, 97.5 percent of the time. This kind of behavior is the very definition of robustness." The salesman insists that you do not need any other car, saying, "Anyone who tells you anything to the contrary is simply trying to complicate your life unnecessarily."

How long would you listen to someone puffing up a car that is clearly inadequate for today’s world, and more importantly, clearly wrong for you? The right car matters.

Strangely, we face exactly this situation today with one of our most important statistical tools, the control chart. In two recent articles, renowned author Don Wheeler argues that the individual X chart with three-sigma control limits works for any process, regardless of its underlying distribution. Using the very words and statistics I attributed to the car salesman, Wheeler argues that transforming data can be fatal to your analysis, and he labels anyone who transforms data a "leptokurtophobe."

When shopping for a statistical tool, what are we looking for? Usually, we need help making a decision. Rather than just going with a gut feel or staring at a huge table of data, we need a tool to summarize data in a meaningful way, finding signals that may be relevant to our decision. Knowing that decisions have risks, we want to manage the probability of making a wrong decision. Specifically, the risk of a false alarm should be small and known. If we find that the risk of decision errors is unpredictable or unreliable, then we can’t trust the tool.

When the data are process measurements in time order, we often want to know whether the process is stable or whether assignable causes of variation are present. Control charts are the ideal tool for this task. On a control chart, we can quickly see trends and other nonrandom patterns. Along with these signals, points outside the control limits indicate probable assignable causes of variation. But which control chart should we use?

If the data are collected in rational subgroups, then the standard X-bar-S chart works very well in almost every case. But for individual data, the individual X chart may or may not work well, depending on the distribution of the data.

In "Do You Have Leptokurtophobia?" Wheeler makes one excellent point: "The first principle for understanding data is that no data have meaning apart from their context." But then Wheeler ignores this principle, encouraging readers to disregard what the context says about the underlying process distribution.

Figure 1 illustrates how context affects decisions about process stability. Somewhere in the Ocean of Reality is an island I call Dataland. Nothing is real on Dataland, but the models we create there can be useful representations of reality.

On Dataland, there are two decisions that depend on each other. We want to decide whether a process is stable, but this decision depends on a probability model. Before we can select a probability model, we need a sample of data representing stable process behavior. Context can resolve this dilemma. If we know from the context that the process typically follows a certain distribution family, we can use this knowledge to separate stable, common-cause variation from unstable, assignable-cause variation. On the other hand, if we have a dataset that we know to be stable or unstable, we can use it to select the most appropriate distribution model.

Wheeler argues that one probability model works for every situation: specifically, that observations beyond the three-sigma control limits are unlikely to come from a stable process. As an example, Wheeler uses a set of "hot metal transit times." A standard individual X chart shows several out-of-control conditions, but a transformation suggested by a computer algorithm makes those signals disappear. Are the signals real or not? In this case, context says that the process is unstable; as Wheeler puts it, "they still have no idea when the hot metal will arrive at the ladle house." Here, context supports the normal-based individual X chart as the appropriate tool.

Table 1: Runout data

12 17 6 8 16 8 18 9 10 11
8 6 16 12 11 9 4 2 3 8
1 5 5 13 9 6 8 8 9 4
9 8 5 18 12 5 3 6 3 9
9 16 5 9 5 1 17 4 6 14
12 4 12 6 12 12 24 5 12 26
3 7 4 2 1 4 10 4 4 4
6 1 5 6 5 6 5 9 8 2
4 2 3 16 16 6 9 13 17 7
10 3 6 5 17 1 13 9 6 13

Consider another dataset of 100 measurements, listed in Table 1. Suppose you are given the data with no context at all. You might construct a histogram, shown in Figure 2.

The statistics provided with this SigmaXL histogram include an Anderson-Darling normality test with a p-value of 0.0000, which indicates very strong evidence of nonnormality. According to Wheeler, however, this should not matter. Following Wheeler's advice, we chart the data anyway: Figure 3 is a standard normal-based individual X control chart.
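The Anderson-Darling result reported with Figure 2 can be reproduced approximately in a few lines of Python. This is a minimal sketch, not SigmaXL's implementation: it assumes the common form of the test in which the mean and standard deviation are estimated from the data, the small-sample adjustment A*² = A²(1 + 0.75/n + 2.25/n²), and the piecewise p-value approximation of D'Agostino and Stephens. The names `TABLE_1_ROWS`, `RUNOUT`, `normal_cdf` and `anderson_darling_normal` are hypothetical.

```python
import math
import statistics

# Table 1, row by row as printed
TABLE_1_ROWS = [
    [12, 17, 6, 8, 16, 8, 18, 9, 10, 11],
    [8, 6, 16, 12, 11, 9, 4, 2, 3, 8],
    [1, 5, 5, 13, 9, 6, 8, 8, 9, 4],
    [9, 8, 5, 18, 12, 5, 3, 6, 3, 9],
    [9, 16, 5, 9, 5, 1, 17, 4, 6, 14],
    [12, 4, 12, 6, 12, 12, 24, 5, 12, 26],
    [3, 7, 4, 2, 1, 4, 10, 4, 4, 4],
    [6, 1, 5, 6, 5, 6, 5, 9, 8, 2],
    [4, 2, 3, 16, 16, 6, 9, 13, 17, 7],
    [10, 3, 6, 5, 17, 1, 13, 9, 6, 13],
]
# Order does not matter for a normality test, so a simple flatten is enough
RUNOUT = [v for row in TABLE_1_ROWS for v in row]


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling_normal(data):
    """Return (A-squared, approximate p-value), estimating mean and sd from data."""
    n = len(data)
    mu = statistics.mean(data)
    sd = statistics.stdev(data)
    cdf = [normal_cdf((v - mu) / sd) for v in sorted(data)]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Piecewise p-value approximation (D'Agostino and Stephens)
    if a2_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star ** 2)
    elif a2_star >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star ** 2)
    elif a2_star >= 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star ** 2)
    return a2, p


a2, p_value = anderson_darling_normal(RUNOUT)
print(f"A-squared = {a2:.3f}, p-value = {p_value:.5f}")
```

On the Table 1 data the adjusted statistic should land well above the usual critical values, so the approximate p-value is tiny, consistent with the 0.0000 in the SigmaXL output.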

The control chart in Figure 3 shows two points (#66 and #96) outside the control limits. What caused these signals? You might also notice a suspicious gap at the bottom of the chart: why are there no values near the lower control limit? With no context for this dataset, these questions have no answers.
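The limits of a normal-based individual X chart follow a standard recipe: the center line is the average, and the limits sit 2.66 average moving ranges on either side (2.66 = 3/d2, with d2 = 1.128 for ranges of two consecutive points). The sketch below applies that recipe to Table 1. It assumes time order runs down each column of the table, which is the reading under which points #66 and #96 are the two largest values (24 and 26); SigmaXL's calculation and rounding may differ slightly. All names are hypothetical.

```python
import statistics

TABLE_1_ROWS = [
    [12, 17, 6, 8, 16, 8, 18, 9, 10, 11],
    [8, 6, 16, 12, 11, 9, 4, 2, 3, 8],
    [1, 5, 5, 13, 9, 6, 8, 8, 9, 4],
    [9, 8, 5, 18, 12, 5, 3, 6, 3, 9],
    [9, 16, 5, 9, 5, 1, 17, 4, 6, 14],
    [12, 4, 12, 6, 12, 12, 24, 5, 12, 26],
    [3, 7, 4, 2, 1, 4, 10, 4, 4, 4],
    [6, 1, 5, 6, 5, 6, 5, 9, 8, 2],
    [4, 2, 3, 16, 16, 6, 9, 13, 17, 7],
    [10, 3, 6, 5, 17, 1, 13, 9, 6, 13],
]
# Assumption: time order runs down each column, then on to the next column
RUNOUT = [row[col] for col in range(10) for row in TABLE_1_ROWS]

E2 = 2.66  # 3 / d2, where d2 = 1.128 for moving ranges of two points


def individuals_limits(x):
    """Normal-based individual X chart: returns (LCL, center line, UCL)."""
    center = statistics.mean(x)
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    mr_bar = statistics.mean(moving_ranges)
    return center - E2 * mr_bar, center, center + E2 * mr_bar


lcl, center, ucl = individuals_limits(RUNOUT)
# 1-based point numbers, matching the labels used in the text
signals = [i + 1 for i, v in enumerate(RUNOUT) if v < lcl or v > ucl]
print(f"LCL = {lcl:.2f}, CL = {center:.2f}, UCL = {ucl:.2f}")  # LCL = -6.63, CL = 8.23, UCL = 23.09
print("Points outside the limits:", signals)                   # [66, 96]
```

The lower limit comes out negative, which accounts for the empty band near the LCL: runout cannot be negative, so no point can ever approach that limit.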

Now include some context. This dataset represents measurements of runout, or eccentricity, of a critical diameter on an oil pump shaft. Because of the way runout is measured, negative measurements are impossible, so zero forms a lower boundary. This helps to explain the skewed appearance of the histogram.

The control chart in Figure 3 has to be wrong. The lower control limit should really be zero, and the upper control limit should be higher to compensate. Are the two highest values really unusual?

To answer these questions, try a Box-Cox transformation, which is designed for datasets with a lower boundary of zero. This algorithm suggests a square-root transformation. Figure 4 shows an individual X chart based on this transformation.
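A Box-Cox search can be sketched by comparing the profile log-likelihood of a few conventional exponents, then building the individual X chart on the transformed scale and squaring the limits back into original units. This is a minimal sketch assuming a grid of rounded lambda values and the same 2.66 moving-range factor used for ordinary individuals charts; SigmaXL's search and rounding rules are not described here and may differ. It also assumes time order runs down each column of Table 1. All names are hypothetical.

```python
import math
import statistics

TABLE_1_ROWS = [
    [12, 17, 6, 8, 16, 8, 18, 9, 10, 11],
    [8, 6, 16, 12, 11, 9, 4, 2, 3, 8],
    [1, 5, 5, 13, 9, 6, 8, 8, 9, 4],
    [9, 8, 5, 18, 12, 5, 3, 6, 3, 9],
    [9, 16, 5, 9, 5, 1, 17, 4, 6, 14],
    [12, 4, 12, 6, 12, 12, 24, 5, 12, 26],
    [3, 7, 4, 2, 1, 4, 10, 4, 4, 4],
    [6, 1, 5, 6, 5, 6, 5, 9, 8, 2],
    [4, 2, 3, 16, 16, 6, 9, 13, 17, 7],
    [10, 3, 6, 5, 17, 1, 13, 9, 6, 13],
]
# Assumption: time order runs down each column of Table 1
RUNOUT = [row[col] for col in range(10) for row in TABLE_1_ROWS]

E2 = 2.66  # 3 / d2 for moving ranges of two points


def box_cox(x, lam):
    """Box-Cox transform of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def box_cox_loglik(data, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    y = [box_cox(v, lam) for v in data]
    n = len(data)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in data)
    return -0.5 * n * math.log(statistics.pvariance(y)) + jacobian


def best_rounded_lambda(data, grid=(-2, -1, -0.5, 0, 0.5, 1, 2)):
    """Pick the conventional exponent with the highest log-likelihood."""
    return max(grid, key=lambda lam: box_cox_loglik(data, lam))


def sqrt_individuals_limits(x):
    """Individual X chart on sqrt(x); limits squared back into x units."""
    y = [math.sqrt(v) for v in x]
    center = statistics.mean(y)
    mr_bar = statistics.mean([abs(b - a) for a, b in zip(y, y[1:])])
    lower = max(center - E2 * mr_bar, 0.0)  # runout has a natural floor at zero
    upper = center + E2 * mr_bar
    # center ** 2 is the back-transformed center line, not the raw average
    return lower ** 2, center ** 2, upper ** 2


lam = best_rounded_lambda(RUNOUT)
lcl, center, ucl = sqrt_individuals_limits(RUNOUT)
signals = [i + 1 for i, v in enumerate(RUNOUT) if v < lcl or v > ucl]
print("Suggested lambda:", lam)
print(f"LCL = {lcl:.2f}, CL = {center:.2f}, UCL = {ucl:.2f}")
print("Points outside the limits:", signals)
```

Because the Box-Cox transform with lambda = 0.5 is just a linear rescaling of the square root, charting the square root directly gives the same back-transformed limits. The upper limit should come out near 29, above both 24 and 26, and the lower limit is essentially zero, so no points fall outside.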

Figure 4 plots the data in their original units, but the control limits are adjusted according to the square-root transformation. Based on this new control chart, there is nothing unusual about the two highest runout values.

We can measure the risk of false alarms in control charts in terms of average run length (ARL). When normally distributed data are plotted on an individual X chart, the ARL for rule one (a single point outside the three-sigma limits) is 370: on average, one point in 370 will fall outside either the upper or lower control limit, even with no assignable causes of variation. This is generally accepted as a reasonable risk of false alarms.
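The figure of 370 follows directly from the normal distribution: the chance that a single point from a stable, normal process falls outside three-sigma limits is 2(1 - Φ(3)), and the rule-one ARL is the reciprocal of that probability. A minimal check using only Python's standard library (the function name is hypothetical):

```python
import math


def rule_one_false_alarm_probability(k=3.0):
    """P(|Z| > k) for a standard normal Z: the two-sided tail beyond k sigma."""
    return math.erfc(k / math.sqrt(2.0))


p = rule_one_false_alarm_probability()
arl = 1.0 / p
print(f"false-alarm probability = {p:.5f}, ARL = {arl:.1f}")  # 0.00270, 370.4
```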

In arguing against transformations, Wheeler claims that most distributions have more than 97.5 percent of their probability covered by symmetric three-sigma limits. Coverage of 97.5 percent equates to an ARL of 40 on the normal-based individual X chart, which means roughly nine times as many false alarms as an ARL of 370. Wheeler argues that this level of unreliability in the control chart is acceptable. I have great respect for Don Wheeler's expertise and encyclopedic body of work, but I disagree with him on this point. As Black Belts and quality experts, if we allow the risk of bad decisions to vary so widely and without control, we will lose the respect of managers who rely on us for timely and precise assessments of quantitative data.
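The comparison between 97.5 percent coverage and the normal case is simple arithmetic: if a fraction c of a stable process falls inside the limits, each point has probability 1 - c of producing a false alarm, so the rule-one ARL is 1/(1 - c). The sketch below, with a hypothetical function name, reproduces the ARL of 40 and the roughly ninefold increase in false alarms.

```python
import math


def arl_from_coverage(coverage):
    """Rule-one ARL when a fraction `coverage` of a stable process lies inside the limits."""
    return 1.0 / (1.0 - coverage)


normal_coverage = 1.0 - math.erfc(3.0 / math.sqrt(2.0))  # about 0.9973 for normal data
normal_arl = arl_from_coverage(normal_coverage)
low_coverage_arl = arl_from_coverage(0.975)
ratio = normal_arl / low_coverage_arl

print(f"ARL at 97.5% coverage: {low_coverage_arl:.1f}")          # 40.0
print(f"ARL for normal data:   {normal_arl:.1f}")                # 370.4
print(f"False alarms increase by a factor of about {ratio:.1f}")  # 9.3
```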

Since 1924, when Walter Shewhart first devised the simple and elegant control chart, technology has advanced exponentially. Normal-based control charts are still the fundamental tool for statistical process control, and rightly so. But the enormous diversity of real processes demands a variety of process control tools, including transformation methods. Real measurement systems, such as the one in the runout example, transform underlying dimensional distributions into new ones never seen in Shewhart's day. To reject the benefits of transformations, even when the process context clearly rejects normality, is to ignore our customers who need reliable data-based tools to support their decisions.


Wheeler, Don (8/5/2009) "Do You Have Leptokurtophobia?"
Wheeler, Don (9/9/2009) "Transforming the Data Can Be Fatal to Your Analysis"
Graphs provided by SigmaXL software,