Question

On this worksheet, make an XY scatter plot linked to the following data:

X 22 48 37 30 24 10 42 30 41 29 16 36 45 11 31 26 31 33 46 22 13 22 32 49 35

Y 3872 9312 5217 4230 4536 1820 8274 121 6314 3828 2448 6156 7515 1309 3534 4576 5797 4983 6670 2464 2197 3278 5408 7497 5705

Add a trendline, the regression equation, and R² to the plot. Add this title: "Scatterplot of X and Y Data". The scatterplot reveals a point outside the point pattern.

Copy the data to a new location in the worksheet. You now have two sets of data. Data that are more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are considered outliers and must be investigated.

It was determined that the outlying point resulted from a data-entry error. Remove the outlier in the copy of the data.

Make a new scatterplot linked to the cleaned data without the outlier, and add the title "Scatterplot without Outlier", a trendline, and the regression equation label. Compare the regression equations of the two plots.

How did removal of the outlier affect the slope and R²?

Solutions


This is a simple problem of visualizing data with a scatter plot and seeing how removing an outlier improves the quality of the inferences drawn from the data.

We start with a scatter plot of the raw data and read off the attributes of the fitted straight line: its slope, intercept, and R².

Then we remove the outlier and see how much the fit of the line improves.
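Excel's linear trendline is an ordinary least-squares line, and its R² equals the squared correlation of X and Y (the worksheet functions SLOPE, INTERCEPT, and RSQ return the same three numbers). As a cross-check outside Excel, here is a minimal pure-Python sketch that computes them for the raw data; `linear_fit` is a hypothetical helper name, not a library function.

```python
# Raw data from the question (25 points, including the suspect point (30, 121)).
X = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
Y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828,
     2448, 6156, 7515, 1309, 3534, 4576, 5797, 4983, 6670, 2464,
     2197, 3278, 5408, 7497, 5705]


def linear_fit(xs, ys):
    """Ordinary least-squares line: return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, R^2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r2 = linear_fit(X, Y)
sign = "+" if intercept >= 0 else "-"
print(f"y = {slope:.2f}x {sign} {abs(intercept):.2f}   R^2 = {r2:.4f}")
```

The printed R² should agree with the 73.19% quoted for the first plot.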

With the scatter plot of the raw data made, let us make a scatter plot of the cleaned data, following the outlier rule the question specifies.

We need to weed out all values lower than Q1 - 1.5 × IQR or higher than Q3 + 1.5 × IQR,

where the IQR (interquartile range) is Q3 - Q1.

I used Excel's QUARTILE function to find the quartiles of X and Y:

Quartile X Y
Q1 22 3278
Q3 37 6156
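To reproduce these quartiles outside Excel, here is a sketch with Python's standard `statistics.quantiles`, assuming its `method="inclusive"` interpolation matches Excel's QUARTILE (equivalently QUARTILE.INC), which treats the minimum and maximum as the 0th and 100th percentiles. `q1_q3` is a hypothetical helper.

```python
from statistics import quantiles

X = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
Y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828,
     2448, 6156, 7515, 1309, 3534, 4576, 5797, 4983, 6670, 2464,
     2197, 3278, 5408, 7497, 5705]


def q1_q3(data):
    """First and third quartiles with inclusive (Excel QUARTILE-style) interpolation."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q1, q3


for name, data in (("X", X), ("Y", Y)):
    q1, q3 = q1_q3(data)
    print(f"{name}: Q1 = {q1:g}, Q3 = {q3:g}, IQR = {q3 - q1:g}")
```

With 25 values, Q1 and Q3 land exactly on the 7th and 19th sorted values, so no interpolation is needed and the output should match the table: 22 and 37 for X, 3278 and 6156 for Y.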

IQR(X) = 37 - 22 = 15

IQR(Y) = 6156 - 3278 = 2878

Now for X:

Q1 - 1.5 × IQR = 22 - 1.5 × 15 = -0.50

Q3 + 1.5 × IQR = 37 + 1.5 × 15 = 59.50

So we keep all X values between -0.50 and 59.50.

For Y:

Q1 - 1.5 × IQR = 3278 - 1.5 × 2878 = -1039

Q3 + 1.5 × IQR = 6156 + 1.5 × 2878 = 10473

So we keep all Y values between -1039 and 10473.
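The fence calculation can be sketched the same way. `iqr_fences` is a hypothetical helper, and `k = 1.5` is the multiplier from the question. On this data it flags no value in either column, which is worth knowing before deciding what counts as the outlier.

```python
from statistics import quantiles

X = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
Y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828,
     2448, 6156, 7515, 1309, 3534, 4576, 5797, 4983, 6670, 2464,
     2197, 3278, 5408, 7497, 5705]


def iqr_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


for name, data in (("X", X), ("Y", Y)):
    low, high = iqr_fences(data)
    # Values strictly outside the fences would count as outliers.
    flagged = [v for v in data if v < low or v > high]
    print(f"{name}: keep [{low:g}, {high:g}], flagged = {flagged}")
```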

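Every X value (10 to 49) and every Y value (121 to 9312) lies inside these fences, so the IQR rule applied to each column separately flags nothing. The point X = 30, Y = 121 is an outlier with respect to the scatterplot pattern: the other points with X between 29 and 31 have Y values between about 3,500 and 5,800. Since this point was traced to a data-entry error, it has to be taken out of the copy of the data.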
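Because the column-by-column fences look at X and Y separately, they cannot catch a point whose X and Y are each ordinary but whose combination is not. One common alternative, which is an assumption here and not the rule the question prescribes, is to fit the least-squares line and apply the same 1.5 × IQR fences to the residuals. A minimal sketch (the helper `least_squares` is hypothetical):

```python
from statistics import quantiles

X = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
Y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828,
     2448, 6156, 7515, 1309, 3534, 4576, 5797, 4983, 6670, 2464,
     2197, 3278, 5408, 7497, 5705]


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(X, Y)
# Residual = observed Y minus the Y predicted by the trendline.
residuals = [y - (slope * x + intercept) for x, y in zip(X, Y)]

# Same 1.5 x IQR rule as the question, applied to residuals (assumed variant).
q1, _median, q3 = quantiles(residuals, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [(x, y) for x, y, r in zip(X, Y, residuals) if r < low or r > high]
print(outliers)  # [(30, 121)]
```

On this data only (30, 121) falls outside the residual fences: its residual is roughly -4,486, while no other residual exceeds about 1,600 in size.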

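We then make a new scatter plot of the remaining 24 points, titled "Scatterplot without Outlier", with its trendline and regression equation.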

Now compare the two plots.

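Clearly R² has risen from 73.19% to 88.04%: the trendline now explains a larger share of the variability in Y, so the straight line in the second plot is a better fit. The slope, by contrast, barely changes (the equation goes from about y = 172.46x - 567.36 to about y = 171.79x - 359.88); removing the point mostly raises the intercept and tightens the scatter around the line.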
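The comparison can be checked numerically with a short sketch that fits both the raw data and the cleaned copy (with (30, 121) removed) and reports slope, intercept, and R² side by side. It is a pure-Python stand-in for the two Excel trendlines, assuming the same ordinary least-squares fit; `linear_fit` is a hypothetical helper.

```python
X = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
Y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828,
     2448, 6156, 7515, 1309, 3534, 4576, 5797, 4983, 6670, 2464,
     2197, 3278, 5408, 7497, 5705]


def linear_fit(xs, ys):
    """Ordinary least-squares line: return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Cleaned copy: drop the point traced to a data-entry error.
clean = [(x, y) for x, y in zip(X, Y) if (x, y) != (30, 121)]
X_clean = [x for x, _ in clean]
Y_clean = [y for _, y in clean]

results = {
    "With outlier": linear_fit(X, Y),
    "Without outlier": linear_fit(X_clean, Y_clean),
}
for label, (slope, intercept, r2) in results.items():
    print(f"{label:16} slope = {slope:.2f}  intercept = {intercept:.2f}  R^2 = {r2:.4f}")
```

Only the R² column should move noticeably; the two slopes differ by less than 1.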

