Question

Illustrate why outliers are so important to identify in the correlation/regression context.

Solutions

Expert Solution

Hello,

Regression analysis is a statistical technique for analysing and modelling the relationship between a dependent variable and one or more independent variables. It uses a mathematical equation to describe the relationship between the variables. It is a predictive modelling technique used for forecasting and for finding causal relationships between the variables.

With one independent variable x and a dependent variable y, the equation of the straight line relating the two variables is y = a + bx, where a is the intercept and b is the slope.

The difference between the observed value of y and the fitted straight line is a statistical error ε. It is a random variable that accounts for the failure of the model to fit the data exactly.
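As a minimal sketch of how a and b can be estimated and how the errors show up as residuals, the Python below fits the line by ordinary least squares. The function name fit_line and the data values are made up for illustration and are not part of the original solution.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Hypothetical data, roughly following y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)

# Residuals: observed y minus the fitted value a + b*x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"a = {a:.2f}, b = {b:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

When the line includes an intercept, the least-squares residuals sum to zero (up to rounding), so no single residual can be large without the others compensating.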

The major assumptions of regression analysis are as follows [4]:

i. The relationship between the response y and the regressor x is linear, at least approximately.

ii. The error term ε has zero mean.

iii. The error term ε has constant variance σ².

iv. The errors are uncorrelated.

v. The errors are normally distributed.
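In symbols, the model and the assumptions about the error term can be written as below. The index i over observations is notation assumed here; it does not appear in the original answer.

```latex
\begin{aligned}
y_i &= a + b x_i + \varepsilon_i, \\
\mathrm{E}[\varepsilon_i] &= 0, \\
\mathrm{Var}(\varepsilon_i) &= \sigma^2, \\
\mathrm{Cov}(\varepsilon_i, \varepsilon_j) &= 0 \quad (i \neq j), \\
\varepsilon_i &\sim N(0, \sigma^2).
\end{aligned}
```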

Outliers

Data points that diverge sharply from the overall pattern are called outliers. There are four ways that a data point might be considered an outlier.

  • It could have an extreme X value compared to other data points.
  • It could have an extreme Y value compared to other data points.
  • It could have extreme X and Y values.
  • It might be distant from the rest of the data, even without extreme X or Y values.
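The last case in the list is the easiest to miss by eye, so here is a rough sketch of two common diagnostics: the residual (distance from the fitted line) and the leverage (how far x is from the mean of x). The data are hypothetical and the function name is an assumption. The leverage formula 1/n + (x - mean_x)^2 / Sxx is the standard one for a straight-line fit with an intercept.

```python
def simple_regression_diagnostics(xs, ys):
    """Return (residuals, leverages) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Leverage grows with the squared distance of x from the mean of x
    leverages = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]
    return residuals, leverages


# Hypothetical data: ten points on y = 2x, plus one point (5, 19) whose
# x and y are both inside the range of the others but far from the pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5]
ys = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 19]

residuals, leverages = simple_regression_diagnostics(xs, ys)

largest_residual_index = max(range(len(xs)), key=lambda i: abs(residuals[i]))
highest_leverage_index = max(range(len(xs)), key=lambda i: leverages[i])

print("largest residual at point", (xs[largest_residual_index], ys[largest_residual_index]))
print("highest leverage at point", (xs[highest_leverage_index], ys[highest_leverage_index]))
```

In this made-up data set the point (5, 19) has the largest residual even though neither coordinate is extreme, while the highest leverage belongs to the point with the x value farthest from the mean.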

Each type of outlier is depicted graphically in the scatterplots below.

Influential Points

An influential point is an outlier that greatly affects the slope of the regression line. One way to test the influence of an outlier is to compute the regression equation with and without the outlier.

This type of analysis is illustrated below. The scatterplots are identical, except that one plot includes an outlier. When the outlier is present, the slope is flatter (-3.32 with the outlier vs. -4.10 without it); so this outlier would be considered an influential point.

The charts below compare regression statistics for another data set with and without an outlier. Here, one chart has a single outlier, located at the high end of the X axis (where x = 24). As a result of that single outlier, the slope of the regression line changes greatly, from -2.5 to -1.6; so the outlier would be considered an influential point.
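The with-and-without comparison can be sketched in a few lines of Python. The numbers below are hypothetical and do not reproduce the data behind the charts; only the position of the outlier at x = 24 is borrowed from the example. The function name fit_line is likewise an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data with a clear downward trend
xs = [2, 4, 6, 8, 10, 12]
ys = [25, 21, 14, 11, 4, 1]

# The same data plus one made-up outlier at the high end of the X axis
xs_with = xs + [24]
ys_with = ys + [10]

_, slope_without = fit_line(xs, ys)
_, slope_with = fit_line(xs_with, ys_with)

print(f"slope without outlier: {slope_without:.2f}")
print(f"slope with outlier:    {slope_with:.2f}")
```

A single added point is enough to make the fitted slope much flatter here, which is the behaviour that marks a point as influential.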

Sometimes, an influential point will cause the coefficient of determination to be bigger; sometimes, smaller. In the first example above, the coefficient of determination is smaller when the influential point is present (0.55 with the point vs. 0.94 without it). In the second example, it is bigger (0.52 with the point vs. 0.46 without it).
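Both directions can be reproduced with small made-up data sets. The sketch below computes the coefficient of determination as 1 - SSE/SST for a straight-line fit; the function name r_squared and all data values are assumptions for illustration, not the data from the examples above.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total
    return 1 - sse / sst


# Data set A (hypothetical): strong trend, outlier far off the line
xs_a = [2, 4, 6, 8, 10, 12]
ys_a = [25, 21, 14, 11, 4, 1]
r2_a_without = r_squared(xs_a, ys_a)
r2_a_with = r_squared(xs_a + [24], ys_a + [10])

# Data set B (hypothetical): weak trend, distant point in line with the trend
xs_b = [1, 2, 3, 4, 5]
ys_b = [3, 1, 4, 2, 5]
r2_b_without = r_squared(xs_b, ys_b)
r2_b_with = r_squared(xs_b + [15], ys_b + [15])

print(f"A: R^2 without = {r2_a_without:.2f}, with = {r2_a_with:.2f}")
print(f"B: R^2 without = {r2_b_without:.2f}, with = {r2_b_with:.2f}")
```

In data set A the extra point pulls the line away from the other points and the coefficient of determination drops. In data set B the extra point stretches the range of x along the trend and the coefficient rises, even though the other five points are no better explained than before.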

If your data set includes an influential point, here are some things to consider.

  • An influential point may represent bad data, possibly the result of measurement error. If possible, check the validity of the data point.
  • Compare the decisions that would be made based on regression equations defined with and without the influential point. If the equations lead to contrary decisions
