CHAPTER 7: REGRESSION/PREDICTION
Key Terms
----------------------------------------------------------------------------------------------------------------------------
Least squares prediction equation --- The equation that minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.
Standard error of prediction --- A rough measure of the average amount of predictive error
Squared correlation coefficient --- The proportion of the total variance in one variable that is predictable from its relationship with the other variable
Variance interpretation of r²---The proportion of variance explained by, or predictable from, the existing correlation
Text Review - FILL IN THE BLANKS
By “predicting” what is known in a correlation of two variables, we can predict what is _______. This is accomplished by placing a ________ line in such a way that it passes through the main cluster of dots in the scatterplot. Positive and negative errors are avoided by squaring the difference between the predicted value and the actual value. Thus, the prediction line is referred to as the ______ ______________________ line or the ____________________________________ line.
The search for the least squares prediction line would be a frustrating trial and error process, except for the availability of the least squares equation, ___________________. In this equation, Y' represents the ________________ value, and X represents the _________________ value. The other values in the equation must be computed. When the computation is complete, the equation has the very desirable property of minimizing the total of all squared predictive errors for known values of Y in the original correlation analysis. Note in the Key Terms section that this essentially defines the least squares prediction equation.
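To make the computation concrete, here is a minimal sketch in Python of fitting and using the least squares equation Y' = a + bX. The paired scores are made up purely for illustration; none of the numbers come from the text.

```python
# Minimal sketch of fitting the least squares prediction equation Y' = a + bX.
# The paired scores below are made up purely for illustration.
x = [1, 2, 3, 4, 5, 6]   # known X scores
y = [2, 1, 4, 3, 6, 5]   # known Y scores

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope b = (sum of cross-product deviations) / (sum of squared X deviations)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
b = sp_xy / ss_x
a = mean_y - b * mean_x   # intercept

def predict(known_x):
    """Return the predicted value Y' for a known X score."""
    return a + b * known_x

print(f"Y' = {a:.2f} + {b:.2f}X; prediction for X = 4: {predict(4):.2f}")
```

The fitted a and b are exactly the values that make the total of the squared prediction errors, the sum of (Y - Y')² over the original pairs, as small as possible.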
Two limitations exist for the application of the least squares prediction equation. One is that predictions may not be reliable if extended beyond the maximum value of X in the original data. Second, since there is no proof of cause-effect in correlation, the desired effect simply may not occur.
Graphs may be constructed to depict the prediction equation. However, this should be done for __________________________ purposes and not for prediction. It is more accurate to make the actual prediction from the ___________________________________________________________.
The least squares prediction equation is designed to reduce error in prediction, but it does not eliminate it. Therefore, we must estimate the amount of error, understanding that the smaller the error, the more accurate our prediction. The estimated predictive error is expressed by the ________________________________________________. This represents a rough measure of the average amount by which known Y values deviate from their predicted Y' values.
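For reference, the defining formula used in most introductory treatments (the symbols s_{y|x} and SS_y are standard notation, not taken from this excerpt) is:

```latex
s_{y|x} \;=\; \sqrt{\frac{\sum \left(Y - Y'\right)^2}{n-2}}
        \;=\; \sqrt{\frac{SS_y\left(1 - r^2\right)}{n-2}},
\qquad SS_y = \sum \left(Y - \bar{Y}\right)^2 .
```

The second form shows directly how the size of r controls the size of the predictive error.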
The value of r is extremely important in relation to predictive error. When r = 1, the predictive error will be __________________________. The most accurate predictions can be made when r values represent _______________________ relationships, whether positive or negative. Prediction should not be attempted when r values are _______________, representing weak or nonexistent relationships.
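A small sketch of this point, using the shortcut form of the standard error shown above; the values of SS_y and n are made up only to illustrate the pattern.

```python
# How the estimated predictive error shrinks as |r| grows.
# SS_y and n are made-up values used only to illustrate the pattern.
from math import sqrt

ss_y, n = 500.0, 50
for r in (0.00, 0.25, 0.57, 0.80, 1.00):
    s_yx = sqrt(ss_y * (1 - r ** 2) / (n - 2))
    print(f"r = {r:.2f} -> standard error of prediction = {s_yx:.2f}")
```

When r = 1.00 the printed error is 0.00, and the error is largest when r is near zero, matching the statements above.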
There are some _____________________ that must be met in order to apply the concepts of prediction we have been discussing. One is that using the prediction equation requires the underlying relationship to be __________________. Therefore, if the scatterplot for an original correlation analysis is _______________________, this procedure would not be appropriate. A second assumption is that the dots in the original scatterplot will be dispersed equally about all segments of the prediction line. This is known as ________________________. The third assumption is that for any given value of X, the corresponding distribution of Y values is _______________________ distributed. The final assumption is that the original data set of paired observations must be fairly large, usually in the hundreds.
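One informal way to eyeball the linearity and homoscedasticity assumptions is to plot the residuals (Y - Y') against X: curvature suggests a nonlinear relationship, and a fan shape suggests unequal spread. A minimal sketch, assuming matplotlib is available and reusing the made-up data and rounded coefficients from the earlier sketch:

```python
# Informal residual check for linearity and homoscedasticity.
# Data, intercept, and slope are illustrative only (rounded from the earlier sketch).
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
a, b = 0.60, 0.83
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("X")
plt.ylabel("Residual (Y - Y')")
plt.title("Residuals vs. X: look for curvature or a fan shape")
plt.show()
```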
The square of the _____________________________________, r², indicates the proportion of the total variance in one variable that is predictable from its relationship with the other variable. In order to understand r² (the correlation coefficient squared), think for a moment about variability in a single distribution, which we studied in Chapter 5. In a single variable, we described variability with the standard deviation. We studied the variability graphically by looking at the shape of the frequency polygon. In a scatterplot, two variables are depicted graphically. Imagine a frequency polygon along the horizontal axis of the scatterplot. This shows the shape of the distribution for variable X. Imagine also a frequency polygon along the vertical axis of the scatterplot. This shows the shape of the distribution for variable Y. As we examine the relationship between the two variables, we know that some of the variability in Y must be due to the variability in X.
Let’s substitute real data and think this through. Variable X represents SAT scores. Variable Y represents college GPA. These variables have a strong positive relationship, with r approximately equal to .57. There must be many reasons why GPA in college would vary. Some of that variability is probably due to preparation for college as reflected by SAT scores. The remaining variability might be due to such factors as whether the student works, how well the student adjusts to living away from home, and how many hours per week are devoted to studying or partying. The actual amount of variability in college GPA, variable Y, that can be explained by the variability in X, SAT scores, is reflected by the value of r². Thus we compute .57 squared = .3249, or .32. We interpret this value by saying that 32 percent of the variability of Y can be explained by the variability in X. The remaining 68 percent of the variability of Y would be explained by a combination of many other factors or variables, probably some of those mentioned earlier.
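Written out as the variance interpretation of r² for this example:

```latex
r^2 = (0.57)^2 = 0.3249 \approx 0.32
\qquad\text{and}\qquad
1 - r^2 = 0.6751 \approx 0.68 .
```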
The value of r² supplies a direct measure of the ____________________ of the relationship.
Sol:
Key Terms: each term is already matched with its correct definition as listed above.
Text Review answers, shown in context with each answer in quotation marks:
By “predicting” what is known in a correlation of two variables, we can predict what is “unknown.” This is accomplished by placing a “straight” line through the main cluster of dots, so the prediction line is referred to as the “least squares prediction” line or the “regression” line.
The least squares equation is “Y' = a + bX.” In this equation, Y' represents the “predicted” value, and X represents the “known” value.
Graphs of the prediction equation should be constructed for “illustrative” purposes, not for prediction; it is more accurate to make the actual prediction from the “least squares prediction equation” itself.
The estimated predictive error is expressed by the “standard error of prediction” (also called the standard error of estimate), a rough measure of the average amount by which known Y values deviate from their predicted Y' values.
When r = 1, the predictive error will be “zero.” The most accurate predictions can be made when r values represent “strong” relationships, whether positive or negative; prediction should not be attempted when r values are “near zero.”
The “assumptions” that must be met: the underlying relationship must be “linear,” so a “curvilinear” scatterplot rules the procedure out; the dots must be dispersed equally about all segments of the prediction line, known as “homoscedasticity”; for any given value of X, the corresponding Y values must be “normally” distributed; and the original data set of paired observations must be fairly large.
The square of the “correlation coefficient,” r², indicates the proportion of the total variance in one variable that is predictable from its relationship with the other variable, and the value of r² supplies a direct measure of the “strength” of the relationship.