# Using Regression Analysis to Enhance Cause and Effect Analysis

Raymond Ernst
01/19/2010

Intuitively, problems or opportunities associated with a process, product or service are commonly known; however, proving and quantifying the problem or opportunity and moving the organization to change are many times difficult. Traditional Cause and Effect diagrams provide a good qualitative picture. Applying a simple regression analysis model using basic features of Excel can provide the quantitative data. Additionally, the results can be used in a cost benefit analysis to get a more definitive and convincing value proposition for those Six Sigma problems and projects.

A Regression Analysis Example

Table 1 contains sales data from a sample of repeat customers. Interactions include exposure to cross-selling opportunities and ads as measured by the website and Customer Relationship Management (CRM). Questions include: "Are sales correlated to interactions, and, if so, how strong is the correlation and how can it be used?" (Click on diagram to enlarge.)

Figure 1 provides a scatter diagram of the data showing a near linear relation. As interactions increase, sales go up. (Click on diagram to enlarge.)

Using the Excel "Data Analysis" add-in, the regression analysis in Table 2 is produced. "Multiple R," the coefficient of correlation, of .85 shows that there is a strong positive correlation of sales to interactions. [Note: This value goes from -1 perfect negative correlation to +1 a perfect positive correlation. A value of zero indicates no correlation.] The value "R Square" in Table 2 is the coefficient of determination that is the proportion of the total variation that is attributed to the independent variable, Interactions. That is, 73.9 percent of the variation in Sales is attributed to Interactions. This analysis shows that the relationship between Interactions and Sales is not perfect, although there is a strong correlation. (Click on diagram to enlarge.)

The Excel Data Analysis add-in can be used to generate the descriptive statistics for both sales and Interactions as presented in Table 3. The mean of the Interactions is 4.08 and the mean of Sales is \$34.58; that is, on average, there are four interactions and \$34.58 of sales per customer for the period. (Click on diagram to enlarge.)

Using the Regression Analysis Model for Projected Sales

Suppose by spending "x" dollars, the average number of interactions can be increased to 6. What would be the projected sales? Regression analysis can provide the values to calculate the sales. As shown in the scatter diagram (Figure 1), the relationship of sales to interactions is near linear. The regression analysis, as illustrated in Table 2, calculates a y-intercept of \$6.74 and the slope (interactions) at 6.82 (see highlighted values in Table 2). Using the standard formula for a straight line the projected value of sales can be calculated as follows:

 Projected Sales at 6 Interactions: Sales = slope Ø Interactions + Intercept or Sales = \$6.82 Ø 6.00 + \$6.74 = \$47.66

That is, by spending "x" dollars to raise the average interactions from 4 to 6 per customer, average sales should increase to from \$34.58 to \$47.66. This data can be used to generate the cost benefit analysis and return on investment.

This same model can be used in "what if" analysis. For example, suppose the interactions could be changed to 12 for select customers. What would be the forecast of sales? Assuming a linear relationship, the projected sales would be \$88.58.

 Projected Sales at 12 Interactions: Sales = slope Ø Interactions + Intercept or Sales = \$6.82 Ø 12.00 + \$6.74 = \$88.58

The assumption on the near-linearity needs to be scrutinized as linearity can change with plus or minus extensions of the range; that is, the trend could turn up, down, or flatten.

Conclusion: Regression Analysis Will Benefit Your Six Sigma Analysis

Causal relationships are important aspects of Six Sigma analysis and business improvements. In many processes, the output or effect (dependent variable) can be measured and plotted for each input or cause (independent variable) as in Figure 1. If the plot shows that the relationship is near linear, regression analysis can be used to quantify the relationship of the variables and to project the effect of changes as in the example presented. Cases involving multiple different causes (for example, Web, radio, TV) to an effect (for example, sales) can be modeled using a multiple regression analysis process that is also supported in Excel.