Question

In: Statistics and Probability

CHAPTER 7: REGRESSION/PREDICTION

Key Terms

----------------------------------------------------------------------------------------------------------------------------

Least squares prediction equation --- The equation that minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.

Standard error of prediction --- A rough measure of the average amount of predictive error

Squared correlation coefficient --- The proportion of the total variance in one variable that is predictable from its relationship with the other variable

Variance interpretation of r²---The proportion of variance explained by, or predictable from, the existing correlation

Text Review - FILL IN THE BLANKS

By “predicting” what is known in a correlation of two variables, we can predict what is _______. This is accomplished by placing a ________ line in such a way that it passes through the main cluster of dots in the scatterplot. Positive and negative errors are avoided by squaring the difference between the predicted value and the actual value. Thus, the prediction line is referred to as the ______ ______________________ line or the ____________________________________ line.

The search for the least squares prediction line would be a frustrating trial and error process, except for the availability of the least squares equation, ___________________. In this equation, Y' represents the ________________ value, and X represents the _________________ value. The other values in the equation must be computed. When the computation is complete, the equation has the very desirable property of minimizing the total of all squared predictive errors for known values of Y in the original correlation analysis. Note in the Key Terms section that this essentially defines the least squares prediction equation.
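The computation described above can be sketched directly. Below is a minimal Python illustration of fitting the least squares equation Y' = bX + a; the data points are made up purely for illustration:

```python
# Minimal sketch: fitting the least squares prediction equation Y' = bX + a.
# The paired data below are hypothetical, purely for illustration.
X = [1, 2, 3, 4, 5]            # known (predictor) values
Y = [2.1, 3.9, 6.2, 8.1, 9.8]  # known outcome values

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
    sum((x - mean_x) ** 2 for x in X)
a = mean_y - b * mean_x  # the fitted line always passes through (mean_x, mean_y)

def predict(x):
    """Predicted value Y' for a known X value."""
    return b * x + a
```

Because b and a minimize the total of all squared prediction errors, no other straight line produces a smaller sum of squared residuals for these known Y values.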

Two limitations exist for the application of the least squares prediction equation. One is that predictions may not be reliable if extended beyond the maximum value of X in the original data. Second, since there is no proof of cause-effect in correlation, the desired effect simply may not occur.

Graphs may be constructed to depict the prediction equation. However, this should be done for __________________________ purposes and not for prediction. It is more accurate to make the actual prediction from the ___________________________________________________________.

The least squares prediction equation is designed to reduce error in prediction, but it does not eliminate it. Therefore, we must estimate the amount of error, understanding that the smaller the error, the more accurate our prediction. The estimated predictive error is expressed by the ________________________________________________. This represents a rough measure of the average amount by which known Y values deviate from their predicted Y' values.
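The "average amount of deviation" described above can be computed from the residuals. The sketch below uses hypothetical data with a precomputed least squares line, and divides by n − 2 (the usual convention, since two quantities, the slope and intercept, were estimated from the data):

```python
import math

# Sketch: estimating the standard error of prediction from residuals.
# Hypothetical paired data; b and a are the least squares slope and
# intercept already fitted to these points.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
b, a = 1.96, 0.14

residuals = [y - (b * x + a) for x, y in zip(X, Y)]
# Divide by n - 2 because two parameters (b and a) were estimated.
s_yx = math.sqrt(sum(r ** 2 for r in residuals) / (len(X) - 2))
```

The smaller s_yx is, the more tightly the known Y values hug the prediction line, and the more accurate predictions from that line will tend to be.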

            The value of r is extremely important in relation to predictive error. When r = 1, the predictive error will be __________________________. The most accurate predictions can be made when r values represent _______________________ relationships, whether positive or negative. Prediction should not be attempted when r values are _______________, representing weak or nonexistent relationships.

            There are some _____________________ that must be met in order to apply the concepts of prediction we have been discussing. One is that using the prediction equation requires the underlying relationship to be __________________. Therefore, if the scatterplot for an original correlation analysis is _______________________, this procedure would not be appropriate. A second assumption is that the dots in the original scatterplot will be dispersed equally about all segments of the prediction line. This is known as ________________________. The third assumption is that for any given value of X, the corresponding distribution of Y values is _______________________ distributed. The final assumption is that the original data set of paired observations must be fairly large, usually in the hundreds.
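The equal-dispersion assumption in the paragraph above can be eyeballed numerically as well as graphically. The sketch below is a deliberately crude illustration with made-up data: it compares the spread of residuals in the lower and upper halves of the X range, which should be roughly similar when dots are dispersed equally about all segments of the line:

```python
# Rough sketch of checking equal dispersion about the prediction line.
# Hypothetical data with an assumed fitted line b = 2.0, a = 0.0.
X = [1, 2, 3, 4, 5, 6]
Y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
b, a = 2.0, 0.0

residuals = [y - (b * x + a) for x, y in zip(X, Y)]
half = len(X) // 2
# Range of residuals in the lower-X half vs. the upper-X half.
spread_low = max(residuals[:half]) - min(residuals[:half])
spread_high = max(residuals[half:]) - min(residuals[half:])
# Comparable spreads are consistent with equal dispersion; a spread
# that grows or shrinks systematically with X suggests a violation.
```

In practice this judgment is usually made from a residual plot rather than from two numbers, but the idea is the same.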

            The square of the _____________________________________, r², indicates the proportion of the total variance in one variable that is predictable from its relationship with the other variable. In order to understand r² (the correlation coefficient squared), think for a moment about variability in a single distribution, which we studied in Chapter 5. For a single variable, we described variability with the standard deviation, and we studied it graphically by looking at the shape of the frequency polygon. In a scatterplot, two variables are depicted graphically. Imagine a frequency polygon along the horizontal axis of the scatterplot; this shows the shape of the distribution for variable X. Imagine also a frequency polygon along the vertical axis; this shows the shape of the distribution for variable Y. As we examine the relationship between the two variables, we know that some of the variability in Y must be due to the variability in X.

Let’s substitute real data and think this through. Variable X represents SAT scores. Variable Y represents college GPA. These variables constitute a strong positive relationship, approximately .57. There must be many reasons why GPA in college would vary. Some of that variability is probably due to preparation for college as reflected by SAT scores. The remaining variability might be due to such factors as whether the student works, how the adjustment is made to living away from home, and how many hours per week are devoted to studying or partying. The actual amount of variability in college GPA, variable Y, which can be explained by the variability in X, SAT scores, is reflected by the value of r². Thus we compute .57 squared = .3249, or .32. We interpret this value by saying that 32 percent of the variability of Y can be explained by the variability in X. The remaining 68% of the variability of Y would be explained by a combination of many other factors or variables, probably some of those mentioned earlier.
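The arithmetic in the SAT/GPA example above is worth making explicit:

```python
# The SAT/GPA arithmetic from the paragraph above: with r = .57,
# r squared gives the proportion of GPA variance explained by SAT scores.
r = 0.57
r_squared = r ** 2                 # 0.3249
explained = round(r_squared * 100)   # percent of Y's variance explained by X
unexplained = 100 - explained        # percent due to other factors
```

So roughly 32 percent of the variability in college GPA is predictable from SAT scores, leaving about 68 percent to be explained by other variables.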

            The value of r² supplies a direct measure of the ____________________ of the relationship.

Solutions

Expert Solution

Sol:

Least squares prediction equation " Y' = bX + a (the regression equation) " The equation that minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.

Standard error of prediction " s(y|x) " A rough measure of the average amount of predictive error.

Squared correlation coefficient " r² " The proportion of the total variance in one variable that is predictable from its relationship with the other variable.

Variance interpretation of r² " explained (predictable) variance " The proportion of variance explained by, or predictable from, the existing correlation.

Text Review - FILL IN THE BLANKS

By “predicting” what is known in a correlation of two variables, we can predict what is " unknown ". This is accomplished by placing a " straight " line in such a way that it passes through the main cluster of dots in the scatterplot. Positive and negative errors are avoided by squaring the difference between the predicted value and the actual value. Thus, the prediction line is referred to as the " least squares " line or the " regression " line.

The search for the least squares prediction line would be a frustrating trial and error process, except for the availability of the least squares equation, " Y' = bX + a ". In this equation, Y' represents the " predicted " value, and X represents the " known " value. The other values in the equation must be computed. When the computation is complete, the equation has the very desirable property of minimizing the total of all squared predictive errors for known values of Y in the original correlation analysis. Note in the Key Terms section that this essentially defines the least squares prediction equation.

Two limitations exist for the application of the least squares prediction equation. One is that predictions may not be reliable if extended beyond the maximum value of X in the original data. Second, since there is no proof of cause-effect in correlation, the desired effect simply may not occur.

Graphs may be constructed to depict the prediction equation. However, this should be done for " illustrative " purposes and not for prediction. It is more accurate to make the actual prediction from the " least squares prediction equation ".

The least squares prediction equation is designed to reduce error in prediction, but it does not eliminate it. Therefore, we must estimate the amount of error, understanding that the smaller the error, the more accurate our prediction. The estimated predictive error is expressed by the " standard error of prediction ". This represents a rough measure of the average amount by which known Y values deviate from their predicted Y' values.

For the remaining blanks: when r = 1, the predictive error will be " zero ". The most accurate predictions can be made when r values represent " strong " relationships, whether positive or negative. Prediction should not be attempted when r values are " small (near zero) ". The " assumptions " that must be met: the underlying relationship must be " linear ", so a " curvilinear " scatterplot would make the procedure inappropriate; equal dispersion of the dots about all segments of the prediction line is known as " homoscedasticity "; and for any given value of X, the corresponding distribution of Y values is " normally " distributed. The square of the " correlation coefficient ", r², indicates the proportion of predictable variance, and the value of r² supplies a direct measure of the " strength " of the relationship.

          


