After examining these data for all the jurisdictions, someone notes that certain areas have an unusually high “percent of 18-64 yr-olds with no high school diploma.” Based on this finding, this individual concludes that the high percentages are due to the rising population of immigrants in those areas. Further, the individual argues that any estimates of the associated “percent of low-income working families” in those areas should be recalculated after removing this sub-population from the data set, as they are causing the area to “look bad”. In addition to thinking critically, use the key rules about linear regression and extrapolation to write a statistically appropriate and socially responsible response to the individual’s conclusion and argument.
2011 Data

| Jurisdiction | Percent of low income working families (<200% poverty level) | Percent of 18-64 year olds with no HS diploma |
|---|---|---|
| Alabama | 37.3 | 15.3 |
| Alaska | 25.9 | 8.6 |
| Arizona | 38.9 | 14.8 |
| Arkansas | 41.8 | 14 |
| California | 34.3 | 17.6 |
| Colorado | 27.6 | 10.1 |
| Connecticut | 21.1 | 9.5 |
| Delaware | 27.8 | 11.9 |
| District of Columbia | 23.2 | 10.8 |
| Florida | 37.3 | 13.1 |
| Georgia | 36.6 | 14.9 |
| Hawaii | 25.8 | 7.2 |
| Idaho | 38.6 | 10.7 |
| Illinois | 30.4 | 11.5 |
| Indiana | 31.9 | 12.2 |
| Iowa | 28.8 | 8.1 |
| Kansas | 32 | 9.7 |
| Kentucky | 34.1 | 13.6 |
| Louisiana | 36.3 | 16.1 |
| Maine | 30.4 | 7.1 |
| Maryland | 19.5 | 9.7 |
| Massachusetts | 20.1 | 9.1 |
| Michigan | 31.6 | 10 |
| Minnesota | 24.2 | 7.3 |
| Mississippi | 43.6 | 17 |
| Missouri | 32.7 | 11.1 |
| Montana | 36 | 7 |
| Nebraska | 31.1 | 8.7 |
| Nevada | 37.4 | 16.6 |
| New Hampshire | 19.7 | 7.3 |
| New Jersey | 21.2 | 10.1 |
| New Mexico | 43 | 16.2 |
| New York | 30.2 | 13 |
| North Carolina | 36.2 | 13.6 |
| North Dakota | 27.2 | 5.9 |
| Ohio | 31.8 | 10.3 |
| Oklahoma | 37.4 | 13.2 |
| Oregon | 33.9 | 10.8 |
| Pennsylvania | 26 | 9.4 |
| Rhode Island | 26.9 | 12 |
| South Carolina | 38.3 | 14.2 |
| South Dakota | 31 | 8.7 |
| Tennessee | 36.6 | 12.7 |
| Texas | 38.3 | 17.8 |
| Utah | 32.3 | 9.9 |
| Vermont | 26.2 | 6.6 |
| Virginia | 23.3 | 10.2 |
| Washington | 26.4 | 10.2 |
| West Virginia | 36.1 | 12.9 |
| Wisconsin | 28.7 | 8.5 |
| Wyoming | 28.1 | 8 |
| | Σx | Σy | Σ(x-x̄)² | Σ(y-ȳ)² | Σ(x-x̄)(y-ȳ) |
|---|---|---|---|---|---|
| total sum | 1595.1 | 574.8 | 1943.091765 | 486.6 | 681.10 |
| mean | 31.28 | 11.27 | | | |
| symbol | | | SSxx | SSyy | SSxy |
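Sample size: n = 51

Here x is the percent of low income working families and y is the percent of 18-64 year olds with no HS diploma.

x̄ = Σx / n = 1595.1 / 51 = 31.28
ȳ = Σy / n = 574.8 / 51 = 11.27

SSxx = Σ(x - x̄)² = 1943.0918
SSyy = Σ(y - ȳ)² = 486.6
SSxy = Σ(x - x̄)(y - ȳ) = 681.1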
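
The sums above can be recomputed directly from the table. The following is a minimal Python sketch, assuming the 51 rows are typed in as (jurisdiction, x, y) tuples; the names `DATA` and `sums_of_squares` are illustrative and not part of the original answer.

```python
# (jurisdiction, x = % low income working families, y = % 18-64 with no HS diploma)
DATA = [
    ("Alabama", 37.3, 15.3),
    ("Alaska", 25.9, 8.6),
    ("Arizona", 38.9, 14.8),
    ("Arkansas", 41.8, 14.0),
    ("California", 34.3, 17.6),
    ("Colorado", 27.6, 10.1),
    ("Connecticut", 21.1, 9.5),
    ("Delaware", 27.8, 11.9),
    ("District of Columbia", 23.2, 10.8),
    ("Florida", 37.3, 13.1),
    ("Georgia", 36.6, 14.9),
    ("Hawaii", 25.8, 7.2),
    ("Idaho", 38.6, 10.7),
    ("Illinois", 30.4, 11.5),
    ("Indiana", 31.9, 12.2),
    ("Iowa", 28.8, 8.1),
    ("Kansas", 32.0, 9.7),
    ("Kentucky", 34.1, 13.6),
    ("Louisiana", 36.3, 16.1),
    ("Maine", 30.4, 7.1),
    ("Maryland", 19.5, 9.7),
    ("Massachusetts", 20.1, 9.1),
    ("Michigan", 31.6, 10.0),
    ("Minnesota", 24.2, 7.3),
    ("Mississippi", 43.6, 17.0),
    ("Missouri", 32.7, 11.1),
    ("Montana", 36.0, 7.0),
    ("Nebraska", 31.1, 8.7),
    ("Nevada", 37.4, 16.6),
    ("New Hampshire", 19.7, 7.3),
    ("New Jersey", 21.2, 10.1),
    ("New Mexico", 43.0, 16.2),
    ("New York", 30.2, 13.0),
    ("North Carolina", 36.2, 13.6),
    ("North Dakota", 27.2, 5.9),
    ("Ohio", 31.8, 10.3),
    ("Oklahoma", 37.4, 13.2),
    ("Oregon", 33.9, 10.8),
    ("Pennsylvania", 26.0, 9.4),
    ("Rhode Island", 26.9, 12.0),
    ("South Carolina", 38.3, 14.2),
    ("South Dakota", 31.0, 8.7),
    ("Tennessee", 36.6, 12.7),
    ("Texas", 38.3, 17.8),
    ("Utah", 32.3, 9.9),
    ("Vermont", 26.2, 6.6),
    ("Virginia", 23.3, 10.2),
    ("Washington", 26.4, 10.2),
    ("West Virginia", 36.1, 12.9),
    ("Wisconsin", 28.7, 8.5),
    ("Wyoming", 28.1, 8.0),
]


def sums_of_squares(rows):
    """Return n, the two means, and the corrected sums SSxx, SSyy, SSxy."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    ss_xx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    ss_yy = sum((y - y_bar) ** 2 for _, _, y in rows)
    ss_xy = sum((x - x_bar) * (y - y_bar) for _, x, y in rows)
    return n, x_bar, y_bar, ss_xx, ss_yy, ss_xy


n, x_bar, y_bar, ss_xx, ss_yy, ss_xy = sums_of_squares(DATA)
print(n, round(x_bar, 2), round(y_bar, 2))
print(round(ss_xx, 4), round(ss_yy, 4), round(ss_xy, 4))
```

Working from the raw rows rather than from rounded means avoids carrying rounding error into the later steps.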
       
          
       
Estimated slope: β₁ = SSxy / SSxx = 681.1 / 1943.092 = 0.3505

Intercept: β₀ = ȳ - β₁·x̄ = 0.3074

So the regression line is ŷ = 0.3074 + 0.3505·x
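
As a check on the slope and intercept, and to illustrate the extrapolation rule the question refers to, here is a small Python sketch built only from the summary values above. The function name `predict` and the range guard are assumptions added for illustration; the bounds 19.5 and 43.6 are the smallest and largest x values in the table (Maryland and Mississippi).

```python
# Summary statistics taken from the worked answer
N = 51
SUM_X, SUM_Y = 1595.1, 574.8
SS_XX, SS_XY = 1943.0918, 681.1

# Observed range of x (percent of low income working families) in the table
X_MIN, X_MAX = 19.5, 43.6

slope = SS_XY / SS_XX
intercept = SUM_Y / N - slope * (SUM_X / N)


def predict(x):
    """Predict y from x, refusing to extrapolate outside the observed x range."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError(
            f"x = {x} is outside the observed range [{X_MIN}, {X_MAX}]; "
            "the fitted line is not supported there"
        )
    return intercept + slope * x


print(round(slope, 4), round(intercept, 4))
print(round(predict(30.0), 2))
```

Raising an error outside the observed range is one possible design choice; a warning would also work. The point is that the line describes only the range of jurisdictions actually in the data.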
          
       
SSE = (SSxx·SSyy - SSxy²) / SSxx = 247.861

Standard error: sₑ = √(SSE / (n - 2)) = √(247.861 / 49) = 2.249
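
The two quantities above can be reproduced from the summary values. This is a minimal sketch; the variable names are illustrative, and because it uses SSyy rounded to 486.6 the SSE differs from 247.861 in the third decimal place.

```python
import math

# Summary statistics taken from the worked answer
N = 51
SS_XX, SS_YY, SS_XY = 1943.0918, 486.6, 681.1

# Two algebraically equivalent forms of the error sum of squares
sse = (SS_XX * SS_YY - SS_XY ** 2) / SS_XX
sse_alt = SS_YY - SS_XY ** 2 / SS_XX

# Residual standard error with n - 2 degrees of freedom
s_e = math.sqrt(sse / (N - 2))

print(round(sse, 2), round(s_e, 3))
```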
          
       
Correlation coefficient: r = SSxy / √(SSxx·SSyy) = 681.1 / √(1943.092 × 486.6) = 0.7005
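
A short Python sketch of the same calculation, which also reports r² (the share of variation in y accounted for by the line). The variable names are illustrative only.

```python
import math

# Summary statistics taken from the worked answer
SS_XX, SS_YY, SS_XY = 1943.0918, 486.6, 681.1

r = SS_XY / math.sqrt(SS_XX * SS_YY)
r_squared = r ** 2

print(round(r, 4), round(r_squared, 4))
```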
   
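The relationship is positive and moderately strong: with r ≈ 0.70, r² ≈ 0.49, so about 49% of the variation in one percentage across jurisdictions is accounted for by its linear relationship with the other. This is an association between jurisdiction-level percentages, not evidence of cause. The data set contains no information about immigration, so it cannot support the conclusion that immigrants explain the high percentages, and deleting a sub-population to change how an area looks would leave estimates that no longer describe the population the data were collected from.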