Question

You are conducting a study to determine if there is a relationship between annual household income and a high school student’s GPA. The school district you are studying is diverse and lower income.

a) Before you conduct the study, do you expect there to be an association between these two variables? Why or why not? Which should be the explanatory variable?

b) You collect data from a random sample of 15 students. The first row of the table is the household income of a particular student (in thousands of dollars) and the second row is the GPA of that student.

Household income: 42 30 82 19 29 44 90 55 17 62 51 30 9 39 42
GPA: 3.1 2.6 3.8 2.7 2.3 3.5 3.8 3.2 2.4 3.3 3.1 2.8 1.6 3.4 3.2

c) Does the data have a scatterplot that shows a linear association? What is the correlation coefficient? What does it tell you about the association between these two variables?

d) Use the above data to make a linear (regression) model.

e) Use the model to predict the GPA of a high-schooler that comes from a family that has a household income of $48,000.

f) How accurate is the model’s prediction of GPA for the family that makes $44,000?

g) If a family’s income increases by $10,000, what is the amount of change in a student’s GPA, as predicted by the model?

h) Statisticians often state “correlation is not necessarily causation.” Would it be correct to conclude that household income is “causing” GPA? Is it possible that there are other variables that are “lurking,” causing GPA and household income to be correlated? What might these variables be?

Solutions

Expert Solution

Answer(a):

Yes, we expect an association between these two variables. A household with a higher annual income can usually give the student better resources for study (for example, a better school or better learning facilities), which is likely to affect the student’s high school GPA.

In this study, annual household income is the explanatory (independent) variable (x) and the student’s high school GPA is the response (dependent) variable (y).

Answer(b):

SN     Household income (x)   GPA (y)   x - x̄     y - ȳ    (x - x̄)(y - ȳ)     (x - x̄)²    (y - ȳ)²
1      42                     3.1       -0.733    0.113    -0.083             0.538       0.013
2      30                     2.6       -12.733   -0.387   4.924              162.138     0.150
3      82                     3.8       39.267    0.813    31.937             1541.871    0.662
4      19                     2.7       -23.733   -0.287   6.804              563.271     0.082
5      29                     2.3       -13.733   -0.687   9.430              188.604     0.472
6      44                     3.5       1.267     0.513    0.650              1.604       0.264
7      90                     3.8       47.267    0.813    38.444             2234.138    0.662
8      55                     3.2       12.267    0.213    2.617              150.471     0.046
9      17                     2.4       -25.733   -0.587   15.097             662.204     0.344
10     62                     3.3       19.267    0.313    6.037              371.204     0.098
11     51                     3.1       8.267     0.113    0.937              68.338      0.013
12     30                     2.8       -12.733   -0.187   2.377              162.138     0.035
13     9                      1.6       -33.733   -1.387   46.777             1137.938    1.923
14     39                     3.4       -3.733    0.413    -1.543             13.938      0.171
15     42                     3.2       -0.733    0.213    -0.156             0.538       0.046
Total  641                    44.8      0         0        164.247            7258.933    4.977
Mean   42.733                 2.987
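The totals and means in the table can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names (`income`, `gpa`, `s_xy`, `s_xx`, `s_yy`) are chosen here for illustration and do not come from the original solution.

```python
# Data from part (b): income in thousands of dollars, one GPA per student.
income = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
gpa = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]

n = len(income)
x_bar = sum(income) / n   # mean household income
y_bar = sum(gpa) / n      # mean GPA

# Sums of products and squares of the deviations from the means
# (the last three column totals of the table).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, gpa))
s_xx = sum((x - x_bar) ** 2 for x in income)
s_yy = sum((y - y_bar) ** 2 for y in gpa)

print(f"mean income = {x_bar:.3f}, mean GPA = {y_bar:.3f}")
print(f"S_xy = {s_xy:.3f}, S_xx = {s_xx:.3f}, S_yy = {s_yy:.3f}")
```

The printed values should agree with the Total and Mean rows up to rounding.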

Answer(c):

To judge whether the relationship between two variables is linear, we generally draw a scatter plot, here with annual household income on the horizontal axis and the student’s high school GPA on the vertical axis.

From the scatter plot, there is a positive linear relationship between annual household income and the student’s high school GPA.

The correlation coefficient between the two variables is given by the following formula, using the totals from the table in part (b):

r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² × Σ(y - ȳ)²]

r = 164.247 / √(7258.933 × 4.977) = 0.8641

The correlation coefficient is 0.8641, which indicates a strong positive linear association between the two variables: students from higher-income households tend to have higher GPAs.
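As a cross-check on the value 0.8641, the correlation coefficient can be computed directly from the raw data. The sketch below uses only the standard library; `pearson_r` is a helper name introduced here for illustration, not something from the original solution.

```python
import math

income = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
gpa = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]


def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(income, gpa)
print(f"r = {r:.4f}")
```

A value close to +1 means the points lie near an upward-sloping line; it says nothing by itself about why the two variables move together.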

Answer(d):

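The regression line relating the two variables has the form

ŷ = b0 + b1x

where ŷ is the predicted GPA and x is household income in thousands of dollars.

The estimates of b0 and b1 are obtained by the method of least squares.

The least-squares estimate of b1 is

b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² = 164.247 / 7258.933 = 0.0226

The estimate of b0 is

b0 = ȳ - b1x̄ = 2.987 - 0.0226 × 42.733 = 2.02 (rounded to two decimals)

The estimated regression line is therefore

ŷ = 2.02 + 0.0226x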
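The least-squares estimates can also be computed directly from the data, which avoids rounding in the intermediate steps. This is a minimal sketch rather than the original author's working; `fit_line` is a hypothetical helper name.

```python
income = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
gpa = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope: change in GPA per $1,000 of income
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(income, gpa)
print(f"GPA_hat = {b0:.4f} + {b1:.4f} * income")
```

With unrounded arithmetic the intercept comes out near 2.02 and the slope near 0.0226, matching the slope used in part (g).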

Answer(e):

We have to predict the GPA of a high-schooler who comes from a family with a household income of $48,000. Because income is measured in thousands of dollars, this means predicting y when x = 48 using the estimated regression line:

ŷ = 2.02 + 0.0226 × 48 = 3.1 (rounded to one decimal)

Hence the predicted GPA of a high-schooler from a family with a household income of $48,000 is about 3.1.

Answer(f):

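The model’s prediction of GPA for the family that makes $44,000 (x = 44) is

ŷ = 2.02 + 0.0226 × 44 = 3.0 (rounded to one decimal)

The actual GPA of the high-schooler in the sample whose family makes $44,000 is 3.5, so the residual (actual minus predicted) is about 0.5. That is a large value in the context of GPA, so the prediction for this family is not very accurate: the model underestimates this student’s GPA.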
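One way to put the size of this residual in context is to compare it with the typical residual over the whole sample. The sketch below refits the line and computes the residual at x = 44 together with the residual standard error (with n - 2 degrees of freedom). The standard-error comparison is an addition for illustration and is not part of the original answer; `fit_line` and `predict` are hypothetical helper names.

```python
import math

income = [42, 30, 82, 19, 29, 44, 90, 55, 17, 62, 51, 30, 9, 39, 42]
gpa = [3.1, 2.6, 3.8, 2.7, 2.3, 3.5, 3.8, 3.2, 2.4, 3.3, 3.1, 2.8, 1.6, 3.4, 3.2]


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(income, gpa)


def predict(income_thousands):
    """Predicted GPA for a household income given in thousands of dollars."""
    return b0 + b1 * income_thousands


# Residual = observed - predicted for the sampled student with income 44.
observed = gpa[income.index(44)]
predicted = predict(44)
residual = observed - predicted

# Residual standard error: the typical size of a residual in this sample.
sse = sum((y - predict(x)) ** 2 for x, y in zip(income, gpa))
s_e = math.sqrt(sse / (len(income) - 2))

print(f"predicted = {predicted:.3f}, observed = {observed}, residual = {residual:.3f}")
print(f"residual standard error = {s_e:.3f}")
```

If the residual at x = 44 is noticeably larger than the residual standard error, that supports the reading that this particular prediction is less accurate than the model's typical prediction.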

Answer(g):

Income is measured in thousands of dollars, so a $10,000 increase in income is a 10-unit increase in x. The predicted change in GPA is 10 times the slope of the regression line:

Change in GPA = 10*0.0226

Change in GPA = 0.226

If a family’s income increases by $10,000, the model predicts that the student’s GPA increases by about 0.226 points.

Answer(h):

No. Based on the above evidence alone, it would not be correct to conclude that household income is “causing” GPA. This is an observational study, and a strong correlation shows only that the two variables tend to move together, not that one produces the other.

It is possible that other “lurking” variables cause GPA and household income to be correlated. These may include the number of hours the student studies, the number of days the student is present in school, whether the student takes extra tutorials, and the type of school the student attends. Several variables affect a student’s GPA.

