Variation, So Meaningful Yet So Misunderstood - Deming's SoPK Part III


Lynda M. Finn

When businesses use the wrong data or don't recognize the different types of variation, the resulting decisions and actions tend to increase costs, reduce quality, reduce productivity, and foster frustration throughout the organization, says contributor Lynda M. Finn in the third of a 4-part series on Deming's system of management, SoPK. Here are the 6 common mistakes that businesses make - and why they make them.

Read Part I: Systems Thinking and the Three Musketeers

Read Part II: The Trouble with Motivation

Read Part IV: How Do We Know What We Know?

What did the unemployed mathematician say to his hungry parrot that really ruffled the parrot’s feathers? Read on to find out the answer.

It seems that most business executives were not trained to understand processes and variation. They study how to manage people and money, but not how to listen to a process through data and use that data to make improvements. Because many are not familiar with Dr. W. Edwards Deming’s insights on data and variation, they are unaware of the importance of process data, that different types of variation exist, and that those different types of variation require different types of responses. As Deming said, "How would they know?" If no one ever taught them (or, even worse, if they were taught approaches that seem to work even though in reality they sometimes do more harm than good), indeed, how would they know?

The point is this: when the wrong data is used or different types of variation go unrecognized, undiagnosed, or are confused, the resulting decisions and actions tend to increase costs, reduce quality, reduce productivity, and foster frustration throughout the organization.

Simply put, Dr. Deming emphasized in his writings that business leaders have typically been taught to treat everything they don’t like as having a "special cause", and thus want to investigate which one thing or person was responsible for the "aberration". People in general seem to be wired and trained to go looking for THE reason that something bad or good happened. This problematic approach is often reinforced because we can usually find "something unusual" associated with the thing we are investigating. Unfortunately, this "something unusual" is rarely the cause of the problem.

Let me illustrate some of the mishandlings of data and variation with a scenario from a company that wanted to reduce the cost of field service:

Mistake #1: Failure to plot data over time

The monthly management reports provided the managers with performance numbers, but the reports didn’t plot the data. Without plots over time it is virtually impossible to spot patterns and trends, and it is impossible to decide if the degree of variation observed is typical "common cause" or atypical "special cause" variation. Okay, so now you know the two key types of variation that exist in a process. How do you know when you have one or the other, though? Control charts can help make this distinction, but in most cases a plot of the data over time and some simple rules about what constitutes a special cause (e.g. 6 points in a row heading up) can be enough to separate common cause from special cause situations. If your management reports show tables of numbers instead of plots, your organization is likely making this mistake.
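To make the "simple rules" idea concrete, here is a minimal Python sketch, not the method the company used. It checks a series for the "6 points in a row heading up" rule mentioned above and, as one common control-chart option, computes limits for an individuals (XmR) chart using the standard 2.66 multiplier on the average moving range. The function names and the monthly figures are made up for illustration.

```python
def longest_rising_run(values):
    """Length, in points, of the longest strictly increasing run."""
    best = current = 1 if values else 0
    for prev, cur in zip(values, values[1:]):
        current = current + 1 if cur > prev else 1
        best = max(best, current)
    return best


def has_trend_signal(values, run_length=6):
    """True if at least `run_length` consecutive points head upward."""
    return longest_rising_run(values) >= run_length


def xmr_limits(values):
    """Lower limit, center line, and upper limit for an individuals (XmR) chart."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def points_outside_limits(values):
    """Indexes of points that fall outside the XmR limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly cost figures (illustrative only).
monthly_cost = [102, 98, 105, 99, 101, 97, 104, 100, 103, 99, 96, 102]

print("Trend signal:", has_trend_signal(monthly_cost))
print("Points outside limits:", points_outside_limits(monthly_cost))
```

If neither check fires, the month-to-month ups and downs are best treated as common cause variation. A plot of the same series, with the limits drawn on it, tells the story faster than any table.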

Mistake #2: Neglecting to normalize

When the company’s costs were plotted over time, there was clearly an increase in overall service costs. But in a growing nationwide business, it is not surprising that service costs are growing. What is of more interest is the cost per unit being maintained. Once the monthly data on costs were normalized (that is, divided by the number of units in service that month), it was clearer that the cost situation was a common cause problem, built into the system they currently use to deliver field service, and not a special cause change in costs that had occurred only recently. If you are in a growing or shrinking business, and your key cost metrics aren’t looked at on a "per unit sold" or "per unit supported" basis, you may be missing key information.
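The normalization described above is a single division per month. The sketch below uses made-up figures and a hypothetical function name, and shows how total cost can climb while the cost per unit in service stays flat.

```python
def cost_per_unit(total_costs, units_in_service):
    """Divide each month's total cost by the units in service that month."""
    if len(total_costs) != len(units_in_service):
        raise ValueError("need one unit count per monthly cost figure")
    return [cost / units for cost, units in zip(total_costs, units_in_service)]


# Hypothetical monthly data (illustrative only): the business is growing.
total_costs = [50000, 55000, 60000, 66000]
units_in_service = [1000, 1100, 1200, 1320]

normalized = cost_per_unit(total_costs, units_in_service)
print(normalized)
```

In this invented example the raw cost rises by about a third, yet every month works out to the same cost per unit, which points to the system rather than to a recent change.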

Mistake #3: Neglecting to stratify

The company had two main classes of equipment in service in the field. When the two classes were looked at together, the picture was completely different from the one that emerged when the data was separated by type of unit. Costs, number of repairs, time to repair, and so on looked quite different from one class to the other. Once the data was divided, it was much clearer where effort needed to be focused. If your reports aren’t broken down into your important customer or product groups, you may not have the real picture of what’s going on.
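Stratifying simply means computing the same summary separately for each group. Here is a minimal sketch, assuming each service record is a pair of equipment class and repair cost. The class labels and costs are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def stratified_means(records):
    """Average the value separately for each equipment class."""
    groups = defaultdict(list)
    for equipment_class, value in records:
        groups[equipment_class].append(value)
    return {name: mean(values) for name, values in groups.items()}


# Hypothetical repair costs (illustrative only), tagged by equipment class.
records = [
    ("A", 100), ("A", 120), ("A", 110),
    ("B", 400), ("B", 380), ("B", 420),
]

overall = mean(cost for _, cost in records)
by_class = stratified_means(records)
print("Overall average:", overall)
print("By class:", by_class)
```

The overall average in this example describes neither class well. Only the per-class averages show where the money is going.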

Mistake #4: Treating a continuous metric as discrete

In addition to cost, the other number reported on a monthly basis was the percent of service calls completed in a timely manner. This figure, usually around 95%, was used to illustrate how good a job the group was doing in rapidly servicing equipment that failed. Here, the response time, a continuous measure, was collapsed into one of two discrete categories: on time or late. In doing so, much of the information necessary for analysis and improvement was lost.

One of the major problems with this approach was the fluidity of the requirements for how soon a unit must be serviced. There were frequent changes in how each field unit was categorized with respect to expected response time (e.g. 4 hours, same day, 24 hours, 48 hours, etc.). As field folks shuffled equipment from category to category, their timeliness statistic remained high. Though it made them feel good about their performance, it became useless for understanding how quickly they were really responding to outages. Had they instead focused on collecting and studying the continuous data on response time, they could more easily have seen where they were responding well, what the trends were over time, and where they needed to improve. This would have increased productivity and quality, thus reducing costs.

Oh well, sometimes it’s better to look good than to be good, right?!

Seriously, though, an organization whose managers do not understand the different types of variation and the correct responses for each type ENSURES that people will spend more time trying to make the numbers look good than trying to figure out how to actually improve the processes that will consistently deliver much better numbers. If your organization has many percent on-time metrics, consider monitoring the measured speed instead.
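A small sketch shows how much the on-time percentage can hide. It assumes each service call is recorded as a pair of actual response hours and the target hours for its category. The two quarters of data are invented, with the second quarter's equipment moved into looser categories.

```python
from statistics import median


def percent_on_time(calls):
    """Percent of calls whose response time met the target for their category."""
    on_time = sum(1 for hours, target in calls if hours <= target)
    return 100.0 * on_time / len(calls)


def response_summary(calls):
    """Median and worst response time, ignoring the category targets."""
    hours = sorted(h for h, _ in calls)
    return {"median": median(hours), "max": hours[-1]}


# Hypothetical calls as (response_hours, target_hours); illustrative only.
quarter_1 = [(3, 4), (3.5, 4), (6, 24), (20, 24)]
quarter_2 = [(10, 24), (20, 24), (30, 48), (40, 48)]

for label, calls in (("Q1", quarter_1), ("Q2", quarter_2)):
    print(label, percent_on_time(calls), response_summary(calls))
```

Both invented quarters are "on time" for every call, yet the median response in the second quarter is several times slower. Only the continuous measure reveals it.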

Mistake #5: Not identifying key metrics

Since this company made the equipment they serviced, one of the most useful metrics for overall system performance is mean time between failures. It reflects how well the engineering group is doing at designing the equipment and how well the purchasing group is doing at buying the proper parts, as well as how well the field group is doing with installation, preventive maintenance, and lasting repair. Working on metrics such as this helps encourage systems thinking and discourage sub-optimization within departmental silos. This metric also gave them insight into just how often they were visiting each piece of equipment, and led to some policy changes around preventive maintenance that resulted in big savings. Do your metrics reflect what is really important to the organization as a whole, and encourage systems thinking?
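Mean time between failures is commonly computed as total operating time divided by the number of failures. The sketch below applies that definition across a small fleet. The unit names, hours, and failure counts are hypothetical, and the company in this story may have calculated the figure differently.

```python
def mean_time_between_failures(fleet):
    """Total operating hours divided by total failures across all units.

    `fleet` maps a unit name to (operating_hours, failure_count).
    """
    total_hours = sum(hours for hours, _ in fleet.values())
    total_failures = sum(failures for _, failures in fleet.values())
    if total_failures == 0:
        raise ValueError("no failures recorded, so MTBF is undefined")
    return total_hours / total_failures


# Hypothetical fleet history (illustrative only).
fleet = {
    "unit-1": (2000, 4),
    "unit-2": (1500, 1),
    "unit-3": (2500, 5),
}

mtbf_hours = mean_time_between_failures(fleet)
print("Fleet MTBF (hours):", mtbf_hours)
```

Plotted over time and split by equipment class, a figure like this speaks to design, purchasing, and field service all at once, which is what makes it a system-level metric.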

Mistake #6: Acting inappropriately in the face of common cause variation

When faced with a common cause system of expensive-to-maintain equipment, managers still tended to favor special cause approaches to reducing variation. Some examples of their special cause approaches:

  • Let’s observe the best worker (or manager, or equipment type) and find out what they are doing that’s different
  • There were a lot of outages last week, let’s figure out what happened last week
  • Let’s send notices to (reps, managers, regions) with higher than average failure rates asking them to improve
  • Let’s blame the problem on a particular individual, thinking that replacing him or her will fix the issue

Instead it was clear that common cause issues (causes of variation that are present for each and every field service call) were driving the frequent need for repairs and the costliness of the site visits. Some of these common cause sources of variation were:

  • High failure rates on certain parts
  • Barriers between the engineering group that designed the equipment and field service group that installed and maintained it
  • Incomplete or unclear work instructions
  • Not having all needed parts available before beginning a repair or install
  • Maintenance policies that drove up costs in the long term

Identifying these problems required observation of the work and analysis of all the service history, not just attention to the results that management liked or disliked. Recognizing that most problems are built into the system, and are not the result of a lack of effort from a particular individual, is a key shift in mindset toward identifying and addressing the right issues in the right ways. In other words, assuming an issue is the result of a special cause will send you on a hunt for the special cause. Walter Shewhart and Deming showed that special cause thinking will lead you astray most of the time. So, if in your company there is often a search for whom or what is to blame before anyone questions whether the problem is built into the current processes and systems, then you too are likely wasting time and misidentifying causes.

Most companies have a wealth of data available, but are too often unable to turn that data into helpful insights that could guide their action. Some simple steps I’ve found particularly helpful are:

  • Use a diverse group of people to brainstorm the key things you want to know about your process.
  • Narrow the list down to the top 10 or so, making sure that the list is balanced; that is, that it includes measures covering financial health, customer satisfaction, and internal efficiency, as well as preparedness for the future.
  • Get help from someone who is good with graphing to make graphs of how the above perform over time. Depending on the level of management required, the time basis may be monthly, daily or even hourly.
  • Identify which problems have chronic common causes and which have special causes, and choose the right improvement action for the situation. Realize that the majority of problems are built into the company’s processes and systems and are not the failure of one particular individual.
  • Employ Pareto charts to decide how to approach a problem. Break the problem down into categories and look for situations where just a few categories account for the majority of the problems.
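The last step can be sketched in a few lines. A Pareto chart is built from a table of categories sorted by frequency, with a running cumulative percentage. The example below produces that table with Python's standard library. The category names and counts are invented, loosely echoing the common cause sources listed earlier.

```python
from collections import Counter


def pareto_table(categories):
    """Rows of (category, count, cumulative percent), most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for name, count in counts.most_common():
        cumulative += count
        rows.append((name, count, round(100.0 * cumulative / total, 1)))
    return rows


# Hypothetical reasons logged for repeat service visits (illustrative only).
visit_reasons = (
    ["part failure"] * 50
    + ["missing parts"] * 30
    + ["unclear instructions"] * 12
    + ["other"] * 8
)

for row in pareto_table(visit_reasons):
    print(row)
```

In this made-up data, two categories account for 80 percent of the visits, so that is where improvement effort would go first.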


Bits and bites of information are not knowledge; they do not reveal what is really going on. Making the assumption that every bad bit or good bit of data comes from a special cause means you will be wrong quite often. Simply asking the question, "Hmmm, is there really a special cause or is what I’m seeing built into our processes and systems?" will mean you are much more likely to avoid the default tendency to jump to special cause solutions and thus be right more of the time.

Some helpful questions are:

  • Could this have happened to someone else?
  • Could this happen again under our usual conditions?

If the answer is yes, then treat the event as resulting from common cause. As a result your actions are less likely to cause a witch hunt for false culprits and more likely to help you and others understand how to improve the system. Some examples of common cause actions are: creating better work instructions, studying the failure rates on certain parts, and breaking down barriers between departments. With these guidelines in mind you can keep variation from biting you.

So just what did the mathematician say to the parrot? Polynomial.

Author’s Note: Thank you to Kelly Allan and Stuart Finn for their thoughtful comments and additions.

Copyright 2011 by Lynda M. Finn


Deming's SoPK Series on PEX Network

System of Profound Knowledge (SoPK) is the main subject of Deming’s second book on management, The New Economics. SoPK is Deming's system of management and has four interdependent areas:

  • Appreciation for a System - how to lead a system, and systems thinking
  • Knowledge about Variation, including statistical variation
  • Theory of Knowledge - the study of how we know what we know
  • Psychology - understanding the human aspect of management, and especially intrinsic motivation vs. extrinsic motivation.

Dr. Deming pointed out that one need not be an expert in any of the four elements, but viewing the world through the lens of the four elements with some proficiency would provide the viewer with profound knowledge of how to lead, diagnose data and issues, plan for the future, have everyone work together to optimize the system, innovate, and create an exciting win/win environment for customers, suppliers, employees, and managers alike.


Editor’s Note: The columns published in THE DEMING FILES have been written under the Editorial Guidelines set by The W. Edwards Deming Institute. The Institute views these columns as opportunities to enhance, extend, and illustrate Dr. Deming’s theories. The authors have knowledge of Dr. Deming’s body of work, and the content of each column is the expression of each author’s interpretation of the subject matter.