Question

Regression analysis is an important statistical method for the analysis of business data. It enables the identification and characterization of relationships among factors and enables the identification of areas of significance.

The performance and interpretation of linear regression analysis are subject to a variety of pitfalls. Comment on what these pitfalls may be and how you would avoid them. Use an example if it helps to clarify the point.

Solutions

Expert Solution

Pitfalls and solutions for regression analysis

We often have information on two numeric characteristics for each member of a group and believe that these are related to each other, i.e., that values of one characteristic vary depending on the values of the other. For instance, in a recent study, researchers had data on body mass index (BMI) and mid-upper arm circumference (MUAC) for 1373 hospitalized patients, and they set out to determine whether there was a relationship between BMI and MUAC.[1] In such a situation, as we discussed in a recent piece on “Correlation” in this series,[2] the researchers would plot the data on a scatter diagram. If the dots fall roughly along a straight line, sloping either upwards or downwards, they would conclude that a linear relationship exists. As a next step, they may ask whether, knowing the value of one variable (MUAC), it is possible to predict the value of the other variable (BMI) in the study group. This can be done using “simple linear regression” analysis, also sometimes referred to as “linear regression.” The variable whose value is known (MUAC here) is referred to as the independent (or predictor or explanatory) variable, and the variable whose value is being predicted (BMI here) is referred to as the dependent (or outcome or response) variable. The independent and dependent variables are, by convention, referred to as “x” and “y” and are plotted on the horizontal and vertical axes, respectively.
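As a rough illustration of the fitting step described above, the following sketch computes a least-squares line y = a + b·x in plain Python. The MUAC and BMI numbers are invented for illustration and are not taken from the cited study; the function name `fit_line` is likewise only an assumption of this sketch.

```python
from statistics import mean

def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: covariation of x and y divided by the variation of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical values, invented for illustration only
muac_cm = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]  # independent variable x
bmi = [16.1, 18.0, 19.8, 22.1, 23.9, 26.0]      # dependent variable y

a, b = fit_line(muac_cm, bmi)
predicted_bmi = a + b * 25.0  # MUAC = 25 cm lies inside the observed range
print(f"BMI = {a:.3f} + {b:.3f} * MUAC; predicted BMI at 25 cm: {predicted_bmi:.2f}")
```

In practice one would usually rely on a statistics package rather than hand-written sums; the explicit version is shown only to make the roles of x, y, slope, and intercept visible.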

At times, one is interested in predicting the value of a numerical response variable based on the values of more than one numeric predictor. For instance, one study found that whole-body fat content in men could be predicted using information on thigh circumference, triceps and thigh skinfold thickness, biceps muscle thickness, weight, and height.[3] This is done using “multiple linear regression.” We will not discuss this more complex form of regression.

Although the concepts of “correlation” and “linear regression” are related and share some assumptions, they also differ in important ways, as we discuss later in this piece.

ASSUMPTIONS

Regression analysis makes several assumptions, which are quite akin to those for correlation analysis, as we discussed in a recent issue of the journal.[1] To recapitulate, first, the relationship between x and y should be linear. Second, all the observations in a sample must be independent of each other; thus, this method should not be used if the data include more than one observation on any individual. Third, the data must not include one or a few extreme values, since these may create a false impression of a relationship in the data even when none exists. If these assumptions are not met, the results of linear regression analysis may be misleading.
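The effect of a single extreme value can be seen in a small sketch. The numbers below are made up so that the first five points show no linear relationship at all; adding one extreme point is enough to produce a correlation close to 1. The helper `pearson_r` is written out here only for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Invented data: no linear trend among these five points
x = [1, 2, 3, 4, 5]
y = [2, 3, 1, 3, 2]
r_without = pearson_r(x, y)

# The same data plus one extreme observation at (20, 20)
r_with = pearson_r(x + [20], y + [20])

print(f"r without the extreme point: {r_without:.3f}")
print(f"r with the extreme point:    {r_with:.3f}")
```

Plotting the data before fitting, as recommended above, is the simplest way to spot such a point.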

CORRELATION VERSUS REGRESSION

Correlation and regression analyses are similar in that both assess the linear relationship between two quantitative variables. However, they look at different aspects of this relationship. Simple linear regression (i.e., its coefficient or “b”) describes the nature of the association: it provides a means of predicting the value of the dependent variable from the value of the predictor variable. It indicates how much and in which direction the dependent variable changes on average for a unit increase in the predictor variable. By contrast, correlation (i.e., the correlation coefficient or “r”) provides a measure of the strength of the linear association: a measure of how closely the individual data points lie to the regression line. The values of “b” and “r” always carry the same sign; either both are positive or both are negative. However, their magnitudes can differ widely. For the same value of “b,” the magnitude of “r” can vary from 1.0 to close to 0.
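The last point, that the same “b” can go with very different values of “r,” can be checked with a small sketch. The data are constructed (not observed): y is 2·x plus a scatter term that sums to zero and is uncorrelated with x, so the fitted slope stays at 2 however large the scatter is, while r shrinks as the scatter grows.

```python
from math import sqrt
from statistics import mean

def slope_and_r(x, y):
    """Return (b, r): least-squares slope and Pearson correlation."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / s_xx, s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
scatter = [1, -2, 2, -2, 1]  # sums to zero and has zero covariance with x

def make_y(k):
    """Constructed response: exact line 2*x plus k times the scatter term."""
    return [2 * xi + k * ei for xi, ei in zip(x, scatter)]

b_tight, r_tight = slope_and_r(x, make_y(0.1))  # points hug the line
b_noisy, r_noisy = slope_and_r(x, make_y(10))   # points widely scattered

print(f"small scatter: b = {b_tight:.2f}, r = {r_tight:.3f}")
print(f"large scatter: b = {b_noisy:.2f}, r = {r_noisy:.3f}")
```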

ADDITIONAL CONSIDERATIONS

Some points must be kept in mind when interpreting the results of regression analysis. The absolute value of the regression coefficient (“b”) depends on the units used to measure the two variables. For instance, in a linear regression equation of BMI (dependent) versus MUAC (independent), the value of “b” will be 2.54-fold higher if the MUAC is expressed in inches instead of in centimeters (1 inch = 2.54 cm); alternatively, if the MUAC is expressed in millimeters, the regression coefficient will become one-tenth of the original value (1 mm = 1/10 cm). A change in the unit of “y” will also lead to a change in the value of the regression coefficient. This must be kept in mind when interpreting the absolute value of a regression coefficient.
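The unit dependence of “b” can be verified numerically. The sketch below refits the same invented MUAC and BMI values (illustrative only, not study data) with MUAC expressed in centimeters, inches, and millimeters, and then with y multiplied by a hypothetical factor of 1000 to mimic a change in the unit of the dependent variable.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope b of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical values, invented for illustration only
muac_cm = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
bmi = [16.1, 18.0, 19.8, 22.1, 23.9, 26.0]

muac_in = [v / 2.54 for v in muac_cm]  # same arm, measured in inches
muac_mm = [v * 10 for v in muac_cm]    # same arm, measured in millimeters

b_cm = slope(muac_cm, bmi)
b_in = slope(muac_in, bmi)
b_mm = slope(muac_mm, bmi)

# Rescaling y (assumed factor of 1000) rescales b by the same factor
b_y_scaled = slope(muac_cm, [v * 1000 for v in bmi])

print(f"b per cm:   {b_cm:.4f}")
print(f"b per inch: {b_in:.4f}  (ratio to cm: {b_in / b_cm:.2f})")
print(f"b per mm:   {b_mm:.4f}  (ratio to cm: {b_mm / b_cm:.2f})")
print(f"b with y x 1000: {b_y_scaled:.1f}")
```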

Similarly, the value of the “intercept” also depends on the unit used to measure the dependent variable. Another important point to remember about the “intercept” is that its value may not be biologically or clinically interpretable. For instance, in the MUAC-BMI example above, the intercept was −0.042, a negative value for BMI, which is clearly implausible. This happens when, in real life, the value of the independent variable cannot be 0, as was the case for the MUAC-BMI example above (think of MUAC = 0; it simply cannot occur in real life).

Furthermore, a regression equation should be used for prediction only for those values of the independent variable that lie within the range of its values in the data originally used to develop the regression equation.
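One way to enforce this restriction in practice is to store the observed range of x with the fitted line and refuse predictions outside it. The sketch below is one possible arrangement, with invented data and hypothetical helper names (`fit_with_range`, `predict`); it is not a standard library interface.

```python
from statistics import mean

def fit_with_range(x, y):
    """Fit y = a + b*x and remember the observed range of x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b, (min(x), max(x))

def predict(model, x_new):
    """Predict y only if x_new lies inside the range used for fitting."""
    a, b, (lo, hi) = model
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} is outside the fitted range [{lo}, {hi}]; "
            "the equation should not be used for extrapolation"
        )
    return a + b * x_new

# Hypothetical values, invented for illustration only
muac_cm = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
bmi = [16.1, 18.0, 19.8, 22.1, 23.9, 26.0]
model = fit_with_range(muac_cm, bmi)

print(f"Predicted BMI at MUAC = 25 cm: {predict(model, 25.0):.2f}")
try:
    predict(model, 0.0)  # MUAC = 0 is far outside the data, like the intercept
except ValueError as err:
    print(f"Refused: {err}")
```

The refused call at MUAC = 0 is the same situation as the uninterpretable intercept: the equation is being evaluated where no data were observed.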

