You are conducting a study to determine if there is a relationship between annual household income and a high school student's GPA. The school district you are studying is diverse and lower income.

a) Before you conduct the study, do you expect there to be an association between these two variables? Why or why not? Which should be the explanatory variable?

b) You collect data from a random sample of 15 students. The first row of the table is the household income of a particular student (in thousands of dollars) and the second row is the GPA of that particular student.

Household income (thousands of $) | 42 | 30 | 82 | 19 | 29 | 44 | 90 | 55 | 17 | 62 | 51 | 30 | 9 | 39 | 42
GPA | 3.1 | 2.6 | 3.8 | 2.7 | 2.3 | 3.5 | 3.8 | 3.2 | 2.4 | 3.3 | 3.1 | 2.8 | 1.6 | 3.4 | 3.2

c) Does the data have a scatterplot that shows a linear association? What is the correlation coefficient? What does it tell you about the association between these two variables?

d) Use the above data to make a linear (regression) model.

e) Use the model to predict the GPA of a high-schooler that comes from a family that has a household income of $48,000.

f) How accurate is the model's prediction of GPA for the family that makes $44,000?

g) If a family's income increases by $10,000, what is the amount of change in a student's GPA, as predicted by the model?

h) Statisticians often state "correlation is not necessarily causation." Would it be correct to conclude that household income is "causing" GPA? Is it possible that there are other variables that are "lurking," causing GPA and household income to be correlated? What might these variables be?
Answer(a):
Yes. We expect a positive association between these two variables: a household with a higher annual income can usually give a student more resources for studying (for example, a better school or better learning facilities), which may raise the student's GPA.
In this study, annual household income should be the explanatory (independent) variable x, and the student's high school GPA is the response (dependent) variable y.
Answer(b):
SN | Household income x (thousands of $) | GPA y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
--- | --- | --- | --- | --- | --- | --- | ---
1 | 42 | 3.1 | -0.733 | 0.113 | -0.083 | 0.538 | 0.013
2 | 30 | 2.6 | -12.733 | -0.387 | 4.924 | 162.138 | 0.150
3 | 82 | 3.8 | 39.267 | 0.813 | 31.937 | 1541.871 | 0.662
4 | 19 | 2.7 | -23.733 | -0.287 | 6.804 | 563.271 | 0.082
5 | 29 | 2.3 | -13.733 | -0.687 | 9.430 | 188.604 | 0.472
6 | 44 | 3.5 | 1.267 | 0.513 | 0.650 | 1.604 | 0.264
7 | 90 | 3.8 | 47.267 | 0.813 | 38.444 | 2234.138 | 0.662
8 | 55 | 3.2 | 12.267 | 0.213 | 2.617 | 150.471 | 0.046
9 | 17 | 2.4 | -25.733 | -0.587 | 15.097 | 662.204 | 0.344
10 | 62 | 3.3 | 19.267 | 0.313 | 6.037 | 371.204 | 0.098
11 | 51 | 3.1 | 8.267 | 0.113 | 0.937 | 68.338 | 0.013
12 | 30 | 2.8 | -12.733 | -0.187 | 2.377 | 162.138 | 0.035
13 | 9 | 1.6 | -33.733 | -1.387 | 46.777 | 1137.938 | 1.923
14 | 39 | 3.4 | -3.733 | 0.413 | -1.543 | 13.938 | 0.171
15 | 42 | 3.2 | -0.733 | 0.213 | -0.156 | 0.538 | 0.046
Total | 641 | 44.8 | 0 | 0 | 164.247 | 7258.933 | 4.977
Mean | 42.733 | 2.987 | | | | |
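The table columns can be recomputed with a short script. This is a minimal sketch using only the Python standard library; the data are copied from part (b), and the variable names (`dx`, `s_xy`, `s_xx`, `s_yy`) are illustrative choices, not part of the original answer.

```python
# Data from part (b): household income in thousands of dollars (x) and GPA (y)
x = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
y = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations from the means (the x - x̄ and y - ȳ columns)
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

# Column totals used later for r and the regression slope
s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of (x - x̄)(y - ȳ)
s_xx = sum(a * a for a in dx)              # sum of (x - x̄)^2
s_yy = sum(b * b for b in dy)              # sum of (y - ȳ)^2

print(f"mean x = {x_bar:.3f}, mean y = {y_bar:.3f}")
print(f"Sxy = {s_xy:.3f}, Sxx = {s_xx:.3f}, Syy = {s_yy:.3f}")
```

The printed totals should match the Total row of the table: 164.247, 7258.933 and 4.977.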
Answer(c):
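To get an idea of whether two variables are linearly related, we generally start with a scatter plot of one against the other. Below is the scatter plot of annual household income against the student's high school GPA.
The scatter plot shows a positive linear relationship between annual household income and the student's high school GPA: students from higher-income households tend to have higher GPAs.
The correlation coefficient between the two variables is given by the following formula, using the column totals from the table in part (b):
$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}} = \frac{164.247}{\sqrt{7258.933 \times 4.977}} \approx 0.8641$$
The value of the correlation coefficient is 0.8641, which indicates a strong positive linear association between the two variables.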
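As a check on the arithmetic, here is a minimal standard-library Python sketch of the Pearson correlation formula, assuming the data exactly as listed in part (b). The variable names are illustrative.

```python
import math

# Data from part (b): income in thousands of dollars (x) and GPA (y)
x = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
y = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")  # r = 0.8641
```

On Python 3.10 or later, `statistics.correlation(x, y)` computes the same Pearson coefficient and can be used as a cross-check.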
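Answer(d):
The regression equation between the two variables is
$$\hat{y} = b_0 + b_1 x$$
The estimates of $b_0$ and $b_1$ are obtained by the least squares method.
The least squares estimate of $b_1$ is
$$b_1 = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{164.247}{7258.933} \approx 0.02263$$
The estimate of $b_0$ is
$$b_0 = \bar{y} - b_1 \bar{x} = 2.98667 - 0.022627 \times 42.7333 \approx 2.0197$$
The final estimated regression line is
$$\hat{y} = 2.0197 + 0.02263x$$
where x is household income in thousands of dollars and $\hat{y}$ is the predicted GPA.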
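The least squares estimates can be reproduced with a few lines of standard-library Python. This is a minimal sketch assuming the part (b) data; `b0` and `b1` mirror the symbols in the equations.

```python
# Data from part (b): income in thousands of dollars (x) and GPA (y)
x = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
y = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares slope and intercept
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
print(f"y_hat = {b0:.4f} + {b1:.5f} x")  # y_hat = 2.0197 + 0.02263 x
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept as a named tuple.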
Answer(e): We have to predict the GPA of a high-schooler from a family with a household income of $48,000. Since income is measured in thousands of dollars, this means predicting y at x = 48 using the estimated regression equation:
$$\hat{y} = 2.0197 + 0.02263 \times 48 \approx 3.11$$
Hence the predicted GPA of a high-schooler from a family with a household income of $48,000 is about 3.1.
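The prediction is a single evaluation of the fitted line. In this sketch the coefficients are the rounded values from the least squares fit, and `predict_gpa` is a hypothetical helper name used only for illustration.

```python
# Rounded coefficients of the fitted line: y_hat = 2.0197 + 0.02263 x
b0, b1 = 2.0197, 0.02263

def predict_gpa(income_thousands: float) -> float:
    """Predicted GPA on the fitted line; income is in thousands of dollars."""
    return b0 + b1 * income_thousands

# $48,000 corresponds to x = 48
print(round(predict_gpa(48), 2))  # 3.11
```

Note that x = 48 lies inside the observed income range (9 to 90), so this is interpolation rather than extrapolation.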
Answer(f):
The model's prediction of GPA for the family that makes $44,000 (x = 44) is
$$\hat{y} = 2.0197 + 0.02263 \times 44 \approx 3.02$$
The actual GPA of the high-schooler from the family that makes $44,000 is 3.5, so the residual is 3.5 - 3.02 ≈ 0.48. That is nearly half a GPA point, which is large on a 4-point scale, so the prediction for this student is not very accurate.
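One way to judge whether a residual is "large" is to compare it with the residual standard error s, the typical size of a residual for this fit. The sketch below refits the line from the part (b) data (standard library only) and computes both; using s as the yardstick is an added illustration, not part of the original answer.

```python
import math

# Data from part (b): income in thousands of dollars (x) and GPA (y)
x = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
y = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual (actual minus predicted) for the student with income 44 (thousand dollars)
predicted_44 = b0 + b1 * 44
residual_44 = 3.5 - predicted_44

# Residual standard error: sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

print(f"predicted = {predicted_44:.2f}, residual = {residual_44:.2f}, s = {s:.2f}")
```

This gives a residual of about 0.48 against s ≈ 0.31, so the model misses this student by roughly one and a half times its typical error.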
Answer(g):
If a family's income increases by $10,000, x (measured in thousands of dollars) increases by 10, so the predicted GPA changes by 10 times the slope $b_1$:
Change in GPA = 10 × 0.0226
Change in GPA = 0.226
If a family's income increases by $10,000, the model predicts that the student's GPA increases by about 0.226.
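Because the model is a straight line, the predicted change depends only on the slope, not on the starting income. This sketch uses the rounded coefficients of the fitted line; `predict_gpa` is a hypothetical helper name.

```python
# Rounded coefficients of the fitted line: y_hat = 2.0197 + 0.02263 x
b0, b1 = 2.0197, 0.02263

def predict_gpa(income_thousands: float) -> float:
    """Predicted GPA on the fitted line; income is in thousands of dollars."""
    return b0 + b1 * income_thousands

# Income is recorded in thousands of dollars, so +$10,000 means x increases by 10
delta_x = 10_000 / 1_000
delta_gpa = b1 * delta_x
print(f"change in GPA per $10,000: {delta_gpa:.3f}")  # 0.226

# On a straight line the change is the same whatever the starting income
for start in (20, 50, 80):
    change = predict_gpa(start + delta_x) - predict_gpa(start)
    print(f"from {start} to {start + delta_x:.0f}: {change:.3f}")
```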
Answer(h):
No. Based on this evidence alone, it would not be correct to conclude that household income is "causing" GPA. The data come from an observational study, not a controlled experiment, so even a strong correlation (r = 0.8641) shows only that the two variables move together, not that one causes the other.
It is possible that other "lurking" variables are causing GPA and household income to be correlated. Possible lurking variables include the type of school the student attends, whether the student takes extra tutoring, the number of hours the student studies, the number of days the student attends school, and the parents' education level. Many variables affect a student's GPA.