The Perils of Polling: Statistical Universes, Non-Sampling Errors & Market Analysis Blunders

PEX Network Editorial
06/11/2015

Editor's Note:

The results of polls and sample surveys are hotly debated these days by political strategists, by the friends and critics of business, in management seminars, and in marketing journals.

Of late, we have been annoyed by the frequency with which bogus statistical evidence collected via polling is used intentionally and unintentionally by some to sell their pet ideas and false narratives.

Further, almost every business desires to grow by converting noncustomers into customers, as "Salesforce.com did with its on-demand CRM software which opened up a new market space by winning over small and midsize firms that had previously rejected CRM enterprise software".

As this article will soon discuss, many executives, especially those in marketing, have been reasonably brought up to believe the customer is king. And right they are.

But this orientation reflexively causes them to fixate on existing customers and neglect noncustomers (i.e., people or organizations that should be customers but are not).

Keep all this in mind as you read this article, which discusses the importance of correctly defining what statisticians call a "statistical universe." Incomplete or incorrect definitions of a statistical universe lead to faulty polling conclusions and market analysis blunders.

Introduction

More than 100 years ago H.G. Wells said: "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."

Polling of public opinion has become an industry of its own, one of our more thriving growth industries.

Today we ask people what they think about collective bargaining… public service unions… the ever-growing budget deficit… the entitlement crisis… the homeland threat of radical groups… the welfare mess and the like, regardless of whether they have given the matter serious thought (or know anything about the subject).

We ask workers what they find wrong with their jobs, with perhaps no better result than to activate their natural human instinct for dissatisfaction.

We ask CEOs and CFOs what they see for the immediate economic future, knowing that they cannot know––and when there's been little time for sober thought.

Media headlines quote daily the result of polls. But––and this is a big but––the public's "opinion" on almost any issue will be a function of three factors, namely, the questions asked, the responses, and the analysis.

Many books and articles have been devoted to the subject of misuses and abuses of polling and sample surveys. But in today's fast-paced media world, the science of correctly collecting, summarizing, analyzing, and using data seems bothersome.

Lawmakers seem more interested in the results of polls than in how they were obtained. Until recently, it did not occur to them that the results of polls and how they are obtained are inseparable.

A Neglected Truth About Opinion Polls

Neil Postman, in a marvelous book entitled Technopoly, delightfully exposed an inherent weakness in interpreting survey results:

"Pollsters ask questions that will elicit yes or no answers. Polling ignores what people know about the subjects they are queried on.

In a culture that is not obsessed with measuring and ranking things, this omission would probably be regarded as bizarre.

But let us imagine what we would think of opinion polls if the questions came in pairs, indicating what people 'believe' and what they 'know' about the subject.

If I may make up some figures, let us suppose we read the following:

The latest poll indicates that 72 percent of the American public believes we should withdraw economic aid from Nicaragua…

Of those who expressed this opinion, 28 percent thought Nicaragua was in Central Asia, 18 percent thought it was an island near New Zealand... and 27.4 percent believed that ‘Africans should help themselves,’ obviously confusing Nicaragua with Nigeria...

Moreover, of those polled, 61.8 percent did not know that Americans give economic aid to Nicaragua, and 23 percent did not know what ‘economic aid’ means…

Were pollsters inclined to provide such information, the prestige and power of polling would be considerably reduced."

Polls Show Support for Embattled Public Sector Workers

On February 28, 2011, a New York Times/CBS poll reported a majority of people opposed efforts to weaken collective bargaining rights… and were against cutting the pay or benefits of public employees to reduce state budget deficits.

Similarly, a recent national Gallup survey suggested that a majority of Americans oppose measures like the one proposed in Wisconsin that restricts collective bargaining rights for public employees… a result near-identical to the New York Times/CBS News poll.

It would be interesting to apply Postman's comments relating to what those sampled knew about the issues involved. Readers can draw their own conclusions.

We should say, however, that one sharp commentator on the New York Times website summed it up this way: "Americans prefer to have their taxes raised rather than decrease benefits to public employees… that would certainly be news if it were true."

How to Guarantee Drawing the Wrong Conclusions from Polls or Surveys

Let's get right to it. The surest way is to mis-define what classically trained statisticians call the statistical universe, or population.

What is a statistical universe? The statistical universe consists of all things about which conclusions are to be drawn.

For example, in a study of fluctuations in the price of heating oil in New York City from 2007 to 2010, the statistical universe would include every price change that occurred during the specified time interval.

If the scope of the study were expanded to cover a larger territory or a longer period of time, the statistical universe would be correspondingly enlarged.

Obviously the term statistical universe is an elastic one, and its precise connotation varies with every statistical undertaking.

In general, the statistical universe may be defined as a totality embracing every item which might have been brought under observation had a complete enumeration been effected.

The Need for Sampling

Lack of time and money renders it impossible to make a complete survey of most statistical universes. Thankfully, it's not necessary to survey the entire statistical universe.

Why? Because hard-working, brilliant statisticians discovered how to get the same information from carefully selected, relatively small samples.

Making correct inferences from small samples taken from large, or sometimes infinite, statistical universes is the subject matter of basic and advanced statistics.

If the sampling process is properly carried out, an analysis of the samples makes it possible to infer information about the statistical universe within the limits of unavoidable chance errors of sampling––the so-called margin of error.
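As a rough illustration of the margin of error, the sketch below uses the standard normal-approximation formula for a sample proportion, z * sqrt(p(1 - p) / n). The function name `margin_of_error` and the 95% multiplier of 1.96 are choices made for this example, and the formula assumes a simple random sample drawn from a correctly defined statistical universe.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling from a correctly defined universe;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)

# p = 0.5 is the worst case: it gives the widest interval.
for n in (100, 1_000, 50_000, 2_400_000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>9,}: +/- {100 * moe:.2f} percentage points")
```

Note that this calculation measures chance sampling error only. It shrinks as the sample grows, but it says nothing about errors that come from sampling the wrong universe.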

Unfortunately, many courses in basic statistics fail to emphasize one critical point, namely, if the statistical universe is improperly defined, the powerful techniques of inferential statistics (making inferences on the basis of samples) are of little or no value.

An error that arises from an improperly defined statistical universe is called a non-sampling error. Statistical techniques designed to measure sampling error are valueless if the group conducting the poll/survey has committed the biggest statistical error of them all – non-sampling error.

An Example of a Poorly Defined Statistical Universe

Undoubtedly the most widely publicized illustration of a poorly defined statistical universe is the one concerning the Literary Digest's error in predicting the winner of the presidential election of 1936. Indeed, this is the example most often cited by W. Edwards Deming.

During the 1936 election campaign between Democrat Franklin D. Roosevelt and Republican Alfred M. Landon, the Literary Digest magazine sent mock ballots to a large list of people whose names appeared in telephone directories and automobile registration records. (Their lists also included their own magazine subscribers, club members, and the like.)

Over 10 million mock ballots were sent out; 2.4 million ballots were returned by respondents; 7.6 million were not returned.

On the basis of 2.4 million returned ballots, the Digest predicted Landon would win by a comfortable margin––indeed, a landslide.

As it turned out, however, Roosevelt received 61% of the votes cast, a proportion representing one of the largest majorities in American presidential history.

How Could They be so Wrong?

Polls only represent the people who are in a statistical universe and who respond to them. Despite the sample's huge size, this election became a textbook case of a biased sample: all the sample's component groups were heavily Republican.

Let's get more specific. There were two important reasons for the erroneous prediction: (1) an incorrectly defined statistical universe and (2) non-response bias.

In 1936, people with telephones and automobiles were generally in a higher economic group than people without these two "luxuries." There was a bias inherent in the statistical universe.

A large percentage of the voting population would not show up in telephone directories, automobile registrations, and club memberships.

The statistical universe was improperly defined––it tended to be biased in favor of higher income groups. Higher income groups tended to be Republican.

In the 1936 election there was a strong relationship between income and party preference. Lower income groups tended to be Democratic.

Bias in the Sample Selection Process

Classically trained statisticians define a bias as a persistent error in one direction. What does this mean?

No matter whom you sampled from the Literary Digest's statistical universe, there was a high probability a relatively affluent person would be selected.

To repeat: the statistical universe selected was slanted towards middle and upper-class voters… and excluded most lower-income voters. And, in reality, there were a great many low-income voters in 1936.

Nine million people were unemployed in 1936.

"With regard to economic status, the Literary Digest poll was far from being a representative cross-section of the population. Then as now, voters are generally known to vote with their pocketbooks."

It should be mentioned––indeed, emphasized––that George Gallup was able to predict a victory for Roosevelt using a much smaller sample of about 50,000 people.

His statistical universe consisted of a representative cross-section of the population.

The Literary Digest poll sample size was 2.4 million people. This illustrates that a poorly defined statistical universe cannot be cured by increasing the size of the sample, which in fact just compounds the mistakes.
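A small simulation can make this point concrete. The sketch below is not a reconstruction of the 1936 data: the population size, the share of voters who appear in the sampling frame, and the support rates inside and outside the frame are all hypothetical numbers, chosen so that overall support comes out near 61%.

```python
import random

random.seed(1936)  # fixed seed so the run is repeatable

# Hypothetical electorate (all numbers are illustrative assumptions):
# 30% of voters appear in the sampling frame (phone/car lists) and
# 40% of them back candidate A; the other 70% are outside the frame
# and 70% of them back candidate A.
N = 200_000
IN_FRAME_SHARE = 0.30
SUPPORT_IN_FRAME = 0.40
SUPPORT_OUT_OF_FRAME = 0.70

population = []
for _ in range(N):
    in_frame = random.random() < IN_FRAME_SHARE
    rate = SUPPORT_IN_FRAME if in_frame else SUPPORT_OUT_OF_FRAME
    population.append((in_frame, random.random() < rate))

def share_for_a(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(votes_a for _, votes_a in voters) / len(voters)

true_share = share_for_a(population)

# Huge sample, wrong universe: drawn only from the frame.
frame = [v for v in population if v[0]]
big_biased = random.sample(frame, 40_000)

# Small sample, right universe: drawn from all voters.
small_random = random.sample(population, 1_000)

print(f"true support for A:         {true_share:.3f}")
print(f"biased sample (n = 40,000): {share_for_a(big_biased):.3f}")
print(f"random sample (n = 1,000):  {share_for_a(small_random):.3f}")
```

Under these assumptions the large sample settles near the frame's 40% rather than the electorate's roughly 61%, while the much smaller random sample typically lands within a few points of the truth. Making the biased sample larger only makes the wrong answer more precise.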

Non-Response Bias

The second problem with the Literary Digest poll was that out of the 10 million people whose names were on the original mailing list, only 2.4 million responded to the survey.

It was then a fact that individuals of higher educational and higher economic status were more likely to respond to mail questionnaires than those of lower economic and educational status.

Therefore, the non-response group of 7.6 million people contained a high percentage of the lower economic status group. The 2.4 million people who responded to the questionnaire tended to be from a higher educational and economic status group.
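The arithmetic behind non-response bias can be sketched in a few lines. The 10 million mailed and 2.4 million returned ballots come from the account above; the two income groups, their sizes, their response rates, and their candidate preferences are hypothetical values chosen only to show the mechanism.

```python
MAILED = 10_000_000    # ballots sent out (from the article)
RETURNED = 2_400_000   # ballots returned (from the article)

response_rate = RETURNED / MAILED
print(f"response rate: {response_rate:.0%}")

# Hypothetical mailing list split into two groups. Each entry holds
# (share of mailing list, probability of responding, support for A).
groups = {
    "higher income": (0.50, 0.36, 0.40),
    "lower income":  (0.50, 0.12, 0.70),
}

def support_among_everyone(groups):
    """Support for A across the whole mailing list."""
    return sum(share * support for share, _, support in groups.values())

def support_among_respondents(groups):
    """Support for A among those who actually reply."""
    replies = sum(share * respond for share, respond, _ in groups.values())
    replies_for_a = sum(
        share * respond * support
        for share, respond, support in groups.values()
    )
    return replies_for_a / replies

print(f"support on the full list:  {support_among_everyone(groups):.3f}")
print(f"support among respondents: {support_among_respondents(groups):.3f}")
```

With these made-up inputs, the group that replies three times as often dominates the returned ballots, so the respondents understate support for candidate A even though every name on the list was mailed a ballot.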

A case study involving the Roosevelt/Landon poll from the University of Pennsylvania's Wharton School describes the situation as follows:

"When the response rate is low (as it was in this case, 24%), a survey is said to suffer from non-response bias. This is a special type of selection bias where reluctant and non-responsive people are excluded from the sample."

Another Example of a Poorly Defined Statistical Universe: Neglecting the Inclusion of Noncustomers

Peter F. Drucker taught us the importance of studying both customers and noncustomers. A noncustomer can be defined as somebody who should be a customer but is not.

Most companies are focused on existing customers. Endless surveys are conducted on what existing customers buy and how they buy. But – and this is a very big but – noncustomers are neglected. Yet noncustomers always outnumber customers.

Today's new emphasis on big data and predictive analytics focuses on knowing as much as possible about one's existing customers – the area, perhaps, where exploratory data analysis and information technology are making the most rapid advances.

Said Drucker: "But the first signs of fundamental change rarely appear within one's own organization or among one's own customers. Almost always they show up first among one's noncustomers…

… In fact, the best… example of the importance of the noncustomer is U.S. department stores… At their peak… department stores served 30% of the U.S. nonfood retail market… But… paid no attention to the 70% of the market who were not their customers… They questioned their customers constantly, studied them, surveyed them…

[But failed to question noncustomers]… They saw no reason why they should… Their theory of the business assumed that most people who could afford to shop in department stores did… Many years ago, that assumption fit reality…

… [When baby boomers came of age, the game changed]… For the dominant group among baby boomers – women in educated two-income families – it was not money that determined where to shop… Time was the primary factor, and this generation's women could not afford to spend their time shopping in department stores…

… Because department stores looked only at their own customers, they did not recognize this change until [it was almost too late]… By then, business was already drying up… And it was [very difficult] to get a significant share of this baby boomer market back…"

Still Another Example of Failing to Include Noncustomers: Sony's Colossal E-Reader Market Blunder

In a recent Harvard Business Review article ("Red Ocean Traps," March 2015), W. Chan Kim and Renée Mauborgne provided this extremely insightful example of (what we would call) an incomplete definition of the statistical universe:

"Consider Sony's launch of the Portable Reader System (PRS) in 2006. The company's aim was to unlock a new market space in books by opening the e-reader market to a wide customer base…

… To figure out how to realize that goal, it looked to the experience of existing e-reader customers, who were dissatisfied with the size and poor display quality of current products…

… Sony's response was a thin, lightweight device with an easy-to-read screen..."

Granted, the media praised Sony's PRS and customers were delighted beyond expectations. But the PRS suffered the ultimate defeat on the corporate battlefield: it failed to convert indifferent buyers (noncustomers) into solid paying customers.

Why? Sony sampled a poorly defined (i.e., incomplete) statistical universe. It neglected to drill down on the reasons noncustomers initially rejected e-readers.

Bottom line: Noncustomers were not purchasing e-readers because there was a shortage of worthwhile books.

By sampling only existing e-reader customers, Sony misdirected its efforts, focusing primarily on improving the device experience of e-readers.

Stated differently, noncustomers wanted a rich choice of titles and a quick and easy way to download them. That was the unfulfilled need that had to be addressed. Sony failed to meet that need.

The result? Noncustomers stuck to print books.

Amazon asked the right question: Who is the noncustomer, the person who does not buy an e-reader, even though he/she is (or might be) in the market? And can we find out why he/she is a noncustomer?

Said Kim and Mauborgne: "Amazon… [answered this question] when it launched the Kindle in 2007, offering more than four times the number of e-titles available from the PRS and making them easily downloadable over Wi-Fi…

… Within six hours of their release, Kindles sold out, as print book customers rapidly became e-reader customers as well."

Sony exited the e-reader market. Amazon's Kindle is growing by leaps and bounds. It now offers 2.5 million e-titles and is credited with growing the e-book market from a "mere 2% of total book buyers in 2008 to 28% in 2014."

Summary and Conclusions

It's a major error to take a sample from a statistical universe that differs considerably from the "ideal" statistical universe about which you want to draw valid conclusions.

Considerable time must be spent in defining the ideal statistical universe and every attempt must be made to draw a representative sample from that universe.

Every day we hear about making, say, pricing decisions (to attract more customers) based on sample information derived from polling existing customers.

The statistical universe in many of these cases should/must also include noncustomers, those people who do not buy the company's product even though they are (or might be) in the market.

Existing customers are already buying the product. Would noncustomers buy the product if the price were lowered? If the existing customers buy the product, but think the price is too high, what does that really mean?
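To see why a customers-only universe can mislead a pricing decision, consider a back-of-the-envelope calculation. The 30% customer share echoes Drucker's department-store example; the survey answers are hypothetical and stand in for whatever a poll of each group would actually report.

```python
def market_wide_rate(customer_share: float,
                     rate_customers: float,
                     rate_noncustomers: float) -> float:
    """Weighted rate across the whole market (customers + noncustomers)."""
    if not 0.0 <= customer_share <= 1.0:
        raise ValueError("customer_share must be between 0 and 1")
    return (customer_share * rate_customers
            + (1.0 - customer_share) * rate_noncustomers)

# Hypothetical survey question: "Would you buy at the lower price?"
customer_share = 0.30     # share of the market that already buys
yes_customers = 0.90      # assumed: existing customers mostly say yes
yes_noncustomers = 0.10   # assumed: noncustomers mostly say no

customers_only = yes_customers
whole_market = market_wide_rate(customer_share, yes_customers, yes_noncustomers)

print(f"customers-only estimate: {customers_only:.0%}")
print(f"whole-market estimate:   {whole_market:.0%}")
```

With these made-up inputs, polling customers alone suggests that 90% would buy, while the market as a whole is closer to a third. The gap comes entirely from who was left out of the statistical universe.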

Always be on the lookout for a poorly defined statistical universe. Fancy calculations relating to sampling errors, interval estimates, and statistical tests are meaningless if the statistical universe from which conclusions are drawn is incorrect.

Finally, beware of non-response bias. Said the Wharton School case study: "People who respond to surveys are different from people who don't, not only in the obvious way (their attitude toward surveys) but also in more subtle and significant ways."
