Question

a. Develop a scatter plot with income as the dependent variable and age as the independent variable. Include the estimated regression equation and the coefficient of determination on your scatter plot. Briefly comment on the relationship between the two variables, and fully interpret the coefficient of determination.
b. Using Excel's Regression tool, develop the estimated regression equation to show how income (y = annual income in $1000s) is related to the independent variables education (x₁ = level of education attained, in number of years), age (x₂ = age in years), and gender (x₃ = dummy variable, 1 = female, 0 = male). Develop the dummy variable for the gender variable first.
c. Test whether the coefficients obtained in part (b) are significant at the 5% level. What is your conclusion? [5 points]
d. Fully interpret the meaning of the coefficient on gender, b₃.
e. Predict the annual income for a 45-year-old female with 10 years of education. How much would the predicted income change for a male?
f. Plot the standardized residuals against predicted income, ŷ, from the regression in part (b). Check for outliers and explain whether the residual plot supports the assumptions about ε. What is your conclusion? Submit the graph to earn full points.

Note: only half of the data set fit in this question; the other half was posted separately as another question.

GENDER  EDUCATION (X1)  DUMMY (X3)  AGE (X2)  INCOME ($1000s)
male 12 0 42 120
female 17 1 28 32.5
female 16 1 36 6.5
male 4 0 52 16.25
female 13 1 35 55
male 12 0 36 55
female 13 1 47 45
male 12 0 55 67.5
male 14 0 54 67.5
female 16 1 45 100
male 15 0 22 18.75
female 13 1 44 9
female 14 1 63 55
male 16 0 40 67.5
female 14 1 42 45
male 18 0 62 67.5
male 11 0 52 67.5
female 12 1 49 45
female 17 1 27 32.5
female 14 1 30 45
female 18 1 29 100
female 18 1 51 175
female 16 1 57 175
male 16 0 44 175
male 16 0 68 175

Expert Solution

a) [Scatter plot of income ($1000s) against age; image not available.]

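The coefficient of determination is R² = 0.181: the estimated regression equation with age as the only independent variable explains just 18.1% of the variation in annual income, and the remaining 81.9% is left unexplained. The linear relationship between age and income is therefore positive but weak. A low R² does not by itself show that the relationship is non-linear; it only shows that age alone is a poor predictor of income.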
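The trendline and R² from the missing scatter plot can be reproduced without Excel. The sketch below is a plain-Python least-squares fit of income on age using the 25 rows listed above; the function name `simple_ols` is hypothetical, and the printed R² should come out near the 0.181 reported in part (a).

```python
# Simple linear regression of income (y, $1000s) on age (x) by ordinary least squares.
ages = [42, 28, 36, 52, 35, 36, 47, 55, 54, 45, 22, 44, 63,
        40, 42, 62, 52, 49, 27, 30, 29, 51, 57, 44, 68]
incomes = [120, 32.5, 6.5, 16.25, 55, 55, 45, 67.5, 67.5, 100, 18.75, 9, 55,
           67.5, 45, 67.5, 67.5, 45, 32.5, 45, 100, 175, 175, 175, 175]


def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # share of the variation in y explained by x
    return intercept, slope, r_squared


b0, b1, r2 = simple_ols(ages, incomes)
print(f"y-hat = {b0:.2f} + {b1:.3f} x,  R^2 = {r2:.3f}")
```

The printed equation and R² are what Excel's "Display Equation on chart" and "Display R-squared value on chart" trendline options would show for the same data.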

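b) The estimated regression equation is
ŷ = -131.77 + 8.85 x₁ + 2.00 x₂ - 19.21 x₃
where ŷ is predicted annual income ($1000s), x₁ is education (years), x₂ is age (years), and x₃ is the gender dummy (1 = female, 0 = male).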

The Excel output is:

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.639925889
  R Square            0.409505143
  Adjusted R Square   0.325148735
  Standard Error      43.567122
  Observations        25

ANOVA
              df   SS            MS            F             Significance F
  Regression   3   27642.68849   9214.229498   4.854463961   0.010164928
  Residual    21   39859.97651   1898.094119
  Total       24   67502.665

                  Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
  Intercept       -131.7712464   59.02824031      -2.232342446   0.036613045   -254.5271917   -9.015301033
  EDUCATION (X1)  8.848443265    3.115728165      2.839927875    0.009809024   2.368931861    15.32795467
  DUMMY (X3)      -19.21093781   18.90789047      -1.01602756    0.321180021   -58.53204845   20.11017283
  AGE (X2)        2.002108147    0.761463744      2.629288868    0.015675596   0.418557607    3.585658686
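As a cross-check on Excel's Regression tool, here is a minimal pure-Python sketch that builds the gender dummy from the text column (1 = female, 0 = male) and solves the normal equations (X'X)b = X'y. The helper `solve` and the variable names are illustrative assumptions, not part of the original answer; on the 25 rows above the coefficients and R² should agree with the SUMMARY OUTPUT up to rounding.

```python
# Multiple regression of income on education (x1), age (x2) and a gender dummy (x3),
# fitted by solving the normal equations (X'X) b = X'y with Gaussian elimination.
DATA = [  # (gender, education in years, age in years, income in $1000s)
    ("male", 12, 42, 120), ("female", 17, 28, 32.5), ("female", 16, 36, 6.5), ("male", 4, 52, 16.25), ("female", 13, 35, 55),
    ("male", 12, 36, 55), ("female", 13, 47, 45), ("male", 12, 55, 67.5), ("male", 14, 54, 67.5), ("female", 16, 45, 100),
    ("male", 15, 22, 18.75), ("female", 13, 44, 9), ("female", 14, 63, 55), ("male", 16, 40, 67.5), ("female", 14, 42, 45),
    ("male", 18, 62, 67.5), ("male", 11, 52, 67.5), ("female", 12, 49, 45), ("female", 17, 27, 32.5), ("female", 14, 30, 45),
    ("female", 18, 29, 100), ("female", 18, 51, 175), ("female", 16, 57, 175), ("male", 16, 44, 175), ("male", 16, 68, 175),
]


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Develop the dummy first: x3 = 1 for female, 0 for male. Columns: [1, x1, x2, x3].
X = [[1.0, edu, age, 1.0 if gender == "female" else 0.0] for gender, edu, age, _ in DATA]
y = [row[3] for row in DATA]

xtx = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(4)]
coef = solve(xtx, xty)  # [b0, b1 (education), b2 (age), b3 (female dummy)]

y_bar = sum(y) / len(y)
fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

for name, value in zip(["Intercept", "EDUCATION (x1)", "AGE (x2)", "DUMMY (x3)"], coef):
    print(f"{name:15s} {value:10.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Solving the normal equations directly is fine for four coefficients and 25 rows; for larger or nearly collinear designs a QR-based solver would be the more stable choice.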

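c) Overall significance (F test): the p-value for the F statistic (F = 4.854) is 0.0102, which is less than 0.05, so we reject H₀: β₁ = β₂ = β₃ = 0 and conclude that the model as a whole is significant.
Individual significance (t tests): education (p = 0.0098) and age (p = 0.0157) are significant at the 5% level, but the gender dummy is not (p = 0.3212 > 0.05). The intercept is also significant (p = 0.0366).
Conclusion: education and age are significantly related to income, while there is not enough evidence that gender affects income once education and age are held constant.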
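The tests in part (c) can be checked with a few lines of arithmetic on the Excel output. This sketch assumes the table critical values t(0.025, 21) ≈ 2.080 and F(0.05; 3, 21) ≈ 3.07 rather than computing them, so it reproduces the decision rule, not Excel's exact p-values.

```python
# Overall F test and individual t tests at alpha = 0.05, using numbers from the Excel output.
# Critical values are assumed from standard t and F tables (not computed here).
T_CRIT = 2.080  # t_{0.025} with 21 residual degrees of freedom
F_CRIT = 3.07   # F_{0.05} with (3, 21) degrees of freedom

ss_regression, df_regression = 27642.68849, 3
ss_residual, df_residual = 39859.97651, 21
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)  # MSR / MSE

coefficients = {  # name: (estimate, standard error)
    "Intercept": (-131.7712464, 59.02824031),
    "EDUCATION (x1)": (8.848443265, 3.115728165),
    "AGE (x2)": (2.002108147, 0.761463744),
    "DUMMY (x3)": (-19.21093781, 18.90789047),
}
t_stats = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name for name, t in t_stats.items() if abs(t) > T_CRIT}

print(f"F = {f_stat:.3f}; reject H0 (all slopes zero): {f_stat > F_CRIT}")
for name, t in t_stats.items():
    print(f"{name:15s} t = {t:7.3f}  significant: {name in significant}")
```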

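d) The coefficient on gender is b₃ = -19.21. Because income is measured in $1000s and x₃ = 1 for females, this means that, holding education and age constant, the predicted annual income of a female is $19,210 lower than that of a male. However, b₃ is not significant at the 5% level (p = 0.3212), so this difference is not statistically significant.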

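e) Using the estimated regression equation
ŷ = -131.77 + 8.85 x₁ + 2.00 x₂ - 19.21 x₃
with x₁ = 10, x₂ = 45 and x₃ = 1 (female):
ŷ = -131.77 + 8.85(10) + 2.00(45) - 19.21(1) = 27.52, i.e., about $27,520.
For a male (x₃ = 0) with the same education and age:
ŷ = -131.77 + 8.85(10) + 2.00(45) - 19.21(0) = 46.73, i.e., about $46,730.
The predicted income would therefore be 19.21 (about $19,210) higher for a male.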
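The prediction arithmetic in part (e) as a small function, assuming the rounded coefficients of the estimated regression equation (`predict_income` is a hypothetical name):

```python
# Part (e): predictions from the rounded estimated regression equation.
def predict_income(education, age, female):
    """Predicted annual income in $1000s; `female` is the dummy x3 (1 = female, 0 = male)."""
    return -131.77 + 8.85 * education + 2.00 * age - 19.21 * female


female_45 = predict_income(education=10, age=45, female=1)
male_45 = predict_income(education=10, age=45, female=0)
print(f"female: {female_45:.2f}, male: {male_45:.2f}, difference: {male_45 - female_45:.2f}")
```

The male-female gap equals |b₃| = 19.21 for any combination of education and age, which is exactly what the dummy coefficient measures.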

f) [Plot of standardized residuals against predicted income ŷ; image not available.]

The residual plot does not support the assumptions about the error term ε: the residuals are not randomly scattered but are concentrated in the center of the plot, and the constant-variance (homoscedasticity) assumption does not appear to be met either.

No outliers were detected in the data, based on a Q-Q plot (image not available).
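To build the residual plot for part (f), the sketch below refits the model and computes standardized residuals as eᵢ / s, where s = √(SSE/(n − k − 1)) is the standard error of the estimate (43.567 in the Excel output). This is a simplifying assumption: it ignores leverage, so the values may differ slightly from Excel's standard residuals or a textbook's leverage-adjusted version. Points with |z| > 2 are flagged as candidate outliers; the (ŷ, z) pairs are what would be plotted.

```python
import math

# Rows from the data table: (gender, education in years, age in years, income in $1000s)
DATA = [
    ("male", 12, 42, 120), ("female", 17, 28, 32.5), ("female", 16, 36, 6.5), ("male", 4, 52, 16.25), ("female", 13, 35, 55),
    ("male", 12, 36, 55), ("female", 13, 47, 45), ("male", 12, 55, 67.5), ("male", 14, 54, 67.5), ("female", 16, 45, 100),
    ("male", 15, 22, 18.75), ("female", 13, 44, 9), ("female", 14, 63, 55), ("male", 16, 40, 67.5), ("female", 14, 42, 45),
    ("male", 18, 62, 67.5), ("male", 11, 52, 67.5), ("female", 12, 49, 45), ("female", 17, 27, 32.5), ("female", 14, 30, 45),
    ("female", 18, 29, 100), ("female", 18, 51, 175), ("female", 16, 57, 175), ("male", 16, 44, 175), ("male", 16, 68, 175),
]


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Design matrix [1, x1, x2, x3] with the gender dummy x3 = 1 for female, 0 for male.
X = [[1.0, edu, age, 1.0 if gender == "female" else 0.0] for gender, edu, age, _ in DATA]
y = [row[3] for row in DATA]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(4)]
coef = solve(xtx, xty)

fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
n, k = len(y), 3  # 25 observations, 3 independent variables
s = math.sqrt(sum(e * e for e in residuals) / (n - k - 1))  # standard error of the estimate
std_resid = [e / s for e in residuals]  # simple standardization, no leverage adjustment

# Print (y-hat, z) pairs in order of y-hat; these are the points of the residual plot.
for fi, z in sorted(zip(fitted, std_resid)):
    flag = "  <- |z| > 2, candidate outlier" if abs(z) > 2 else ""
    print(f"y-hat = {fi:7.2f}   z = {z:6.2f}{flag}")
```

Feeding `fitted` and `std_resid` to any charting tool (or an Excel XY scatter) gives the graph requested in part (f); a random band around zero with no funnel shape would support the assumptions about ε.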

