After examining these data for all the jurisdictions, someone notes that certain areas have an unusually high “percent of 18-64 yr-olds with no high school diploma.” Based on this finding, this individual concludes that the high percentages are due to the rising population of immigrants in those areas. Further, the individual argues that any estimates of the associated “percent of low-income working families” in those areas should be recalculated after removing this sub-population from the data set, as they are causing the area to “look bad”. In addition to thinking critically, use the key rules about linear regression and extrapolation to write a statistically appropriate and socially responsible response to the individual’s conclusion and argument.
2011 Data
Jurisdiction | Percent of low income working families (<200% poverty level) | Percent of 18-64 year olds with no HS diploma |
Alabama | 37.3 | 15.3 |
Alaska | 25.9 | 8.6 |
Arizona | 38.9 | 14.8 |
Arkansas | 41.8 | 14 |
California | 34.3 | 17.6 |
Colorado | 27.6 | 10.1 |
Connecticut | 21.1 | 9.5 |
Delaware | 27.8 | 11.9 |
District of Columbia | 23.2 | 10.8 |
Florida | 37.3 | 13.1 |
Georgia | 36.6 | 14.9 |
Hawaii | 25.8 | 7.2 |
Idaho | 38.6 | 10.7 |
Illinois | 30.4 | 11.5 |
Indiana | 31.9 | 12.2 |
Iowa | 28.8 | 8.1 |
Kansas | 32 | 9.7 |
Kentucky | 34.1 | 13.6 |
Louisiana | 36.3 | 16.1 |
Maine | 30.4 | 7.1 |
Maryland | 19.5 | 9.7 |
Massachusetts | 20.1 | 9.1 |
Michigan | 31.6 | 10 |
Minnesota | 24.2 | 7.3 |
Mississippi | 43.6 | 17 |
Missouri | 32.7 | 11.1 |
Montana | 36 | 7 |
Nebraska | 31.1 | 8.7 |
Nevada | 37.4 | 16.6 |
New Hampshire | 19.7 | 7.3 |
New Jersey | 21.2 | 10.1 |
New Mexico | 43 | 16.2 |
New York | 30.2 | 13 |
North Carolina | 36.2 | 13.6 |
North Dakota | 27.2 | 5.9 |
Ohio | 31.8 | 10.3 |
Oklahoma | 37.4 | 13.2 |
Oregon | 33.9 | 10.8 |
Pennsylvania | 26 | 9.4 |
Rhode Island | 26.9 | 12 |
South Carolina | 38.3 | 14.2 |
South Dakota | 31 | 8.7 |
Tennessee | 36.6 | 12.7 |
Texas | 38.3 | 17.8 |
Utah | 32.3 | 9.9 |
Vermont | 26.2 | 6.6 |
Virginia | 23.3 | 10.2 |
Washington | 26.4 | 10.2 |
West Virginia | 36.1 | 12.9 |
Wisconsin | 28.7 | 8.5 |
Wyoming | 28.1 | 8 |
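Here x is the percent of low income working families and y is the percent of 18-64 year olds with no HS diploma.
quantity | Σx | Σy | SSxx = Σ(x-x̄)² | SSyy = Σ(y-ȳ)² | SSxy = Σ(x-x̄)(y-ȳ) |
total sum | 1595.1 | 574.8 | 1943.091765 | 486.6 | 681.10 |
mean | 31.28 | 11.27 | | | |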
Sample size: n = 51
x̄ = Σx / n = 1595.1 / 51 = 31.28
ȳ = Σy / n = 574.8 / 51 = 11.27
SSxx = Σ(x-x̄)² = 1943.0918
SSyy = Σ(y-ȳ)² = 486.6
SSxy = Σ(x-x̄)(y-ȳ) = 681.1
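The sums of squares can be reproduced directly from the table. The sketch below is only an illustration: it assumes x is the low income column and y is the no-HS-diploma column, and the variable names (`data`, `ss_xx`, and so on) are mine, not part of the original answer.

```python
# (x, y) pairs copied from the 2011 table, in table order.
# Assumption: x = percent low income working families, y = percent with no HS diploma.
data = [
    (37.3, 15.3), (25.9, 8.6), (38.9, 14.8), (41.8, 14.0), (34.3, 17.6),  # AL AK AZ AR CA
    (27.6, 10.1), (21.1, 9.5), (27.8, 11.9), (23.2, 10.8), (37.3, 13.1),  # CO CT DE DC FL
    (36.6, 14.9), (25.8, 7.2), (38.6, 10.7), (30.4, 11.5), (31.9, 12.2),  # GA HI ID IL IN
    (28.8, 8.1), (32.0, 9.7), (34.1, 13.6), (36.3, 16.1), (30.4, 7.1),    # IA KS KY LA ME
    (19.5, 9.7), (20.1, 9.1), (31.6, 10.0), (24.2, 7.3), (43.6, 17.0),    # MD MA MI MN MS
    (32.7, 11.1), (36.0, 7.0), (31.1, 8.7), (37.4, 16.6), (19.7, 7.3),    # MO MT NE NV NH
    (21.2, 10.1), (43.0, 16.2), (30.2, 13.0), (36.2, 13.6), (27.2, 5.9),  # NJ NM NY NC ND
    (31.8, 10.3), (37.4, 13.2), (33.9, 10.8), (26.0, 9.4), (26.9, 12.0),  # OH OK OR PA RI
    (38.3, 14.2), (31.0, 8.7), (36.6, 12.7), (38.3, 17.8), (32.3, 9.9),   # SC SD TN TX UT
    (26.2, 6.6), (23.3, 10.2), (26.4, 10.2), (36.1, 12.9), (28.7, 8.5),   # VT VA WA WV WI
    (28.1, 8.0),                                                          # WY
]

n = len(data)
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Sample means
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squared deviations and cross-products
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)

print(f"n = {n}, x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"SSxx = {ss_xx:.4f}, SSyy = {ss_yy:.4f}, SSxy = {ss_xy:.4f}")
```

Running this should agree with the totals quoted in the answer to within rounding.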
Estimated slope: β1 = SSxy / SSxx = 681.1 / 1943.092 = 0.3505
Intercept: β0 = ȳ - β1 · x̄ = 0.3074 (computed from the unrounded means and slope)
So the regression line is ŷ = 0.3074 + 0.3505 · x
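As a rough check of the fitted line, the slope and intercept can be recomputed from the summary statistics. The `predict` helper below is hypothetical (not part of the original answer); it refuses to predict outside the observed x range in the table (19.5 for Maryland to 43.6 for Mississippi), which is one simple way to respect the rule against extrapolation.

```python
# Summary statistics quoted in the answer
n = 51
sum_x, sum_y = 1595.1, 574.8
ss_xx, ss_xy = 1943.0918, 681.1

x_bar = sum_x / n
y_bar = sum_y / n

# Least-squares estimates
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# Observed range of x in the table (Maryland 19.5, Mississippi 43.6)
X_MIN, X_MAX = 19.5, 43.6


def predict(x):
    """Hypothetical helper: predict y only inside the observed x range."""
    if not (X_MIN <= x <= X_MAX):
        raise ValueError(
            f"x = {x} is outside the observed range [{X_MIN}, {X_MAX}]; "
            "refusing to extrapolate"
        )
    return intercept + slope * x


print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"predicted y at x = 30: {predict(30.0):.2f}")
```

The range check is a design choice for illustration: the line is only supported by data between those two x values, so a prediction at, say, x = 60 would be an extrapolation.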
SSE = (SSxx · SSyy - SSxy²) / SSxx = 247.861
Standard error: Se = √(SSE / (n-2)) = √(247.861 / 49) = 2.249
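The error sum of squares and the standard error of the estimate follow from the same three sums. This is a minimal sketch using the rounded summary values quoted in the answer, so the last digit of SSE may differ slightly from the figure above; the variable names are my own.

```python
import math

# Summary statistics quoted in the answer (rounded)
n = 51
ss_xx, ss_yy, ss_xy = 1943.0918, 486.6, 681.1

# Error (residual) sum of squares, same form as in the answer
sse = (ss_xx * ss_yy - ss_xy ** 2) / ss_xx

# Standard error of the estimate, with n - 2 degrees of freedom
se = math.sqrt(sse / (n - 2))

print(f"SSE = {sse:.3f}")
print(f"Se  = {se:.3f}")
```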
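Correlation coefficient: r = SSxy / √(SSxx · SSyy) = 681.1 / √(1943.092 × 486.6) = 0.7005
The relationship is positive and moderate: jurisdictions with a higher percent of low income working families tend to have a higher percent of 18-64 year olds with no HS diploma.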
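To put the strength of the relationship in context, the correlation and its square can be computed from the summary values. This is a sketch with my own variable names; r² is the usual "share of variation explained" by the line and is not reported in the original answer.

```python
import math

# Summary statistics quoted in the answer (rounded)
ss_xx, ss_yy, ss_xy = 1943.0918, 486.6, 681.1

# Pearson correlation coefficient
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Coefficient of determination
r_squared = r ** 2

print(f"r   = {r:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

With r near 0.70, r² is roughly one half, so about half of the variation in y is left unexplained by x. Neither column in the table measures immigration, so these figures describe an association between the two percentages only and do not by themselves support a claim about what causes the high values.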