Question

In: Statistics and Probability

On this worksheet, make an XY scatter plot linked to the following data, given as (X, Y) pairs: (1.01, 2.8482), (1.48, 4.2772), (1.8, 4.788), (1.81, 5.3757), (1.07, 2.5252), (1.53, 3.0906), (1.46, 4.3362), (1.38, 3.2016), (1.77, 4.3542), (1.88, 4.8692), (1.32, 3.8676), (1.75, 3.9375), (1.94, 5.7424), (1.19, 2.4752), (1.31, 26.2), (1.56, 4.5708), (1.16, 2.842), (1.22, 2.44), (1.72, 5.1256), (1.45, 4.3355), (1.43, 4.2471), (1.19, 3.5343), (2, 5.46), (1.6, 3.84), (1.58, 3.8552).

Add a trendline, the regression equation and R-squared to the plot. Add this title: "Scatterplot of X and Y Data". The scatterplot reveals a point outside the point pattern. Copy the data to a new location in the worksheet. You now have 2 sets of data. Data that are more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are considered outliers and must be investigated. It was determined that the outlying point resulted from a data entry error. Remove the outlier in the copy of the data. Make a new scatterplot linked to the cleaned data without the outlier, and add a title ("Scatterplot without Outlier"), a trendline, and a regression equation label. Compare the regression equations of the two plots. How did removal of the outlier affect the slope and R-squared?

Solutions

Expert Solution

First, select all of the data and go to the 'Insert' tab. In the 'Charts' section, open the scatter chart dropdown and choose the plain scatter plot.

This gives you the scatter plot.

Next, click on the plot and use the '+' button on its right side to add extra elements to the chart. Check 'Axis Titles', then go to 'Trendline' and choose 'More Options'.

This opens a panel on the right-hand side titled 'Format Trendline'. Near the bottom, check both 'Display Equation on chart' and 'Display R-squared value on chart'.

Then click on the axis title text to rename the axes 'X' and 'Y', and click on the chart title to change it to "Scatterplot of X and Y Data". The result is a scatter plot with a linear trendline, its regression equation and its R-squared value.

_________________________________

We can clearly see that there is a point lying far away from the cluster of points. That point is (1.31, 26.2).
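The 1.5 IQR rule quoted in the question can be used to confirm what the plot shows. The following is a minimal Python sketch, not part of the Excel workflow, that applies the rule to the Y values. The variable names are illustrative, and the "inclusive" quartile method is an assumption chosen because it should correspond to Excel's QUARTILE.INC; other quartile conventions give slightly different fences.

```python
from statistics import quantiles

# (X, Y) pairs copied from the question (25 points).
data = [
    (1.01, 2.8482), (1.48, 4.2772), (1.8, 4.788), (1.81, 5.3757),
    (1.07, 2.5252), (1.53, 3.0906), (1.46, 4.3362), (1.38, 3.2016),
    (1.77, 4.3542), (1.88, 4.8692), (1.32, 3.8676), (1.75, 3.9375),
    (1.94, 5.7424), (1.19, 2.4752), (1.31, 26.2), (1.56, 4.5708),
    (1.16, 2.842), (1.22, 2.44), (1.72, 5.1256), (1.45, 4.3355),
    (1.43, 4.2471), (1.19, 3.5343), (2, 5.46), (1.6, 3.84),
    (1.58, 3.8552),
]

ys = [y for _, y in data]

# Quartiles of Y; method="inclusive" is an assumption (see lead-in).
q1, _median, q3 = quantiles(ys, n=4, method="inclusive")
iqr = q3 - q1

# Fences from the rule in the question: 1.5 IQR below Q1 / above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [(x, y) for x, y in data if y < lower_fence or y > upper_fence]

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Under these assumptions the only point flagged is (1.31, 26.2), which agrees with the point picked out by eye.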

Removing that point from the copy of the data leaves the following 24 points:

X Y
1.01 2.8482
1.48 4.2772
1.8 4.788
1.81 5.3757
1.07 2.5252
1.53 3.0906
1.46 4.3362
1.38 3.2016
1.77 4.3542
1.88 4.8692
1.32 3.8676
1.75 3.9375
1.94 5.7424
1.19 2.4752
1.56 4.5708
1.16 2.842
1.22 2.44
1.72 5.1256
1.45 4.3355
1.43 4.2471
1.19 3.5343
2 5.46
1.6 3.84
1.58 3.8552

Then follow the same steps as before on the cleaned data, this time using the title "Scatterplot without Outlier", to get the second scatter plot.

The regression equation with the outlier was: Y = 0.6346X + 3.9309

The regression equation without the outlier is: Y = 2.9907X - 0.526

The slope changed drastically after the outlier was removed. It is now about 4.7 times larger (2.9907 / 0.6346) than the slope fitted to the data with the outlier.

The R-squared value also changed enormously, from 0.0015 with the outlier to 0.7536 without it. R-squared is the proportion of the variation in Y that is explained by the linear relationship with X. With the outlier, the fitted line explains only about 0.15% of the variation in Y. Without it, the line explains about 75.36% of the variation in Y.
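As a cross-check on the trendline values read from Excel, here is a small Python sketch that fits the ordinary least-squares line to both data sets and computes R-squared. The helper name fit_line is hypothetical, and the sketch assumes Excel's linear trendline is a simple least-squares fit of Y on X.

```python
# (X, Y) pairs copied from the question (25 points).
data = [
    (1.01, 2.8482), (1.48, 4.2772), (1.8, 4.788), (1.81, 5.3757),
    (1.07, 2.5252), (1.53, 3.0906), (1.46, 4.3362), (1.38, 3.2016),
    (1.77, 4.3542), (1.88, 4.8692), (1.32, 3.8676), (1.75, 3.9375),
    (1.94, 5.7424), (1.19, 2.4752), (1.31, 26.2), (1.56, 4.5708),
    (1.16, 2.842), (1.22, 2.44), (1.72, 5.1256), (1.45, 4.3355),
    (1.43, 4.2471), (1.19, 3.5343), (2, 5.46), (1.6, 3.84),
    (1.58, 3.8552),
]


def fit_line(points):
    """Return (slope, intercept, r_squared) of the least-squares line.

    Hypothetical helper for illustration; it uses the textbook sums of
    squares Sxx, Sxy and Syy.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return slope, intercept, r_squared


# Copy of the data with the data-entry error removed (24 points).
cleaned = [p for p in data if p != (1.31, 26.2)]

slope_all, intercept_all, r2_all = fit_line(data)
slope_clean, intercept_clean, r2_clean = fit_line(cleaned)

print(f"with outlier:    Y = {slope_all:.4f}X + {intercept_all:.4f}, R^2 = {r2_all:.4f}")
print(f"without outlier: Y = {slope_clean:.4f}X + {intercept_clean:.4f}, R^2 = {r2_clean:.4f}")
print(f"slope ratio: {slope_clean / slope_all:.2f}")
```

The printed coefficients should agree with the two equations above to the precision Excel displays, which makes this a quick way to catch a mistyped value in either copy of the data.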

Thus, a single outlier can affect the whole model drastically.

_________________________________________________________

