Y1 Y2 X3 X4 X5 X6 X7
478 184 40 74 11 31 20
494 213 32 72 11 43 18
643 347 57 70 18 16 16
341 565 31 71 11 25 19
773 327 67 72 9 29 24
603 260 25 68 8 32 15
484 325 34 68 12 24 14
546 102 33 62 13 28 11
424 38 36 69 7 25 12
548 226 31 66 9 58 15
506 137 35 60 13 21 9
819 369 30 81 4 77 36
541 109 44 66 9 37 12
491 809 32 67 11 37 16
514 29 30 65 12 35 11
371 245 16 64 10 42 14
457 118 29 64 12 21 10
437 148 36 62 7 81 27
570 387 30 59 15 31 16
432 98 23 56 15 50 15
619 608 33 46 22 24 8
357 218 35 54 14 27 13
623 254 38 54 20 22 11
547 697 44 45 26 18 8
792 827 28 57 12 23 11
799 693 35 57 9 60 18
439 448 31 61 19 14 12
867 942 39 52 17 31 10
912 1017 27 44 21 24 9
462 216 36 43 18 23 8
859 673 38 48 19 22 10
805 989 46 57 14 25 12
652 630 29 47 19 25 9
776 404 32 50 19 21 9
919 692 39 48 16 32 11
732 1517 44 49 13 31 14
657 879 33 72 13 13 22
1419 631 43 59 14 21 13
989 1375 22 49 9 46 13
821 1139 30 54 13 27 12
1740 3545 86 62 22 18 15
815 706 30 47 17 39 11
760 451 32 45 34 15 10
936 433 43 48 26 23 12
863 601 20 69 23 7 12
783 1024 55 42 23 23 11
715 457 44 49 18 30 12
1504 1441 37 57 15 35 13
1324 1022 82 72 22 15 16
940 1244 66 67 26 18 16
Y1 = Total reported crimes per million inhabitants
Y2 = Crimes of violence reported per 100,000 inhabitants
X3 = Annual police budget, in dollars per capita
X4 = % of people 25 years or older who finished high school
X5 = % of young people aged 16 to 19 who neither attend high school nor have graduated from it
X6 = % of young people aged 18 to 24 who attend university
X7 = % of people 25 years or older who completed a 4-year university degree
The attached Excel document presents the crime statistics in a city. Other important information about education is also presented.
The purpose of this exercise is to create two models of multiple linear regression where we try to predict
(1) Y1 using as predictors X3, X5, X6
(2) Y2 using as predictors X3, X4, X7
In each case you need:
A. The model (all beta coefficients) and the interpretation of each coefficient.
B. How significant each of the coefficients is
C. The coefficient of determination of the model (R squared)
D. The interpretation of R squared
E. In case (1) predict: What will be the rate of total crimes reported per million inhabitants if $50 per capita per year is assigned to the police, 10% of young people between 16 and 19 do not attend high school (and have not completed it), and 50% of young people between 18 and 24 attend university?
F. In case (2) predict: How many crimes of violence will be reported if 20 dollars per capita per year are allocated to the police, 60% of people 25 years or older have finished high school, and 5% of people 25 years or older completed a 4-year university degree?
G. After doing all this analysis, draw practical conclusions about the findings made in this city.
H. If you are a counselor for the authorities in that city, write a paragraph of recommendations to follow to try to reduce crime.
##First create a data file with ".txt" extension
d=read.table("data.txt",header=TRUE)
head(d)
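If a separate file is inconvenient, the same table can also be read from a character string. The sketch below is only an illustration: it uses the first three rows of the data above, and the names `txt` and `d_small` are made up for this example. The full 50 rows would be pasted in the same way.

```r
# Illustrative only: the header and the first three rows of the table, as text.
txt <- "Y1 Y2 X3 X4 X5 X6 X7
478 184 40 74 11 31 20
494 213 32 72 11 43 18
643 347 57 70 18 16 16"

# read.table() accepts a 'text' argument in place of a file name.
d_small <- read.table(text = txt, header = TRUE)

str(d_small)        # structure: one column per variable in the header
colMeans(d_small)   # quick sanity check on each column
```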
##Create two multiple linear regression models to predict
##(1) Y1 using as predictors X3, X5, X6
##(2) Y2 using as predictors X3, X4, X7
##Extract the columns by position (same order as the header)
Y1=d[,1]
Y2=d[,2]
X3=d[,3]
X4=d[,4]
X5=d[,5]
X6=d[,6]
X7=d[,7]
mod1=lm(Y1~X3+X5+X6)
mod1
Call:
lm(formula = Y1 ~ X3 + X5 + X6)
Coefficients:
(Intercept) X3 X5 X6
79.303 10.407 11.686 2.198
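##Coefficients: Intercept=79.303, X3=10.407, X5=11.686, X6=2.198
##Multiple linear regression equation is:
## Y1=79.303+ 10.407* X3+ 11.686*X5 + 2.198*X6
##Interpretation: if Xi (i=3,5,6) increases by one unit while the other
##predictors are held fixed, the predicted Y1 changes by the coefficient of Xi.
##X3: each extra dollar per capita of police budget goes with 10.407 more
##total crimes per million inhabitants.
##X5: each extra percentage point of 16-19 year olds out of high school goes
##with 11.686 more total crimes per million inhabitants.
##X6: each extra percentage point of 18-24 year olds at university goes with
##2.198 more total crimes per million inhabitants.
##The intercept, 79.303, is the predicted Y1 when X3=X5=X6=0.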
s1=summary(mod1)
s1
Call:
lm(formula = Y1 ~ X3 + X5 + X6)
Residuals:
Min 1Q Median 3Q Max
-333.6 -168.8 -85.3 114.7 787.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.303 213.860 0.371 0.712475
X3 10.407 2.741 3.797 0.000427 ***
X5 11.686 7.755 1.507 0.138650
X6 2.198 3.122 0.704 0.484833
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 250.4 on 46 degrees of freedom
Multiple R-squared: 0.3187, Adjusted R-squared: 0.2743
F-statistic: 7.174 on 3 and 46 DF, p-value: 0.0004747
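##H0: beta_i=0 (Xi has no linear effect on Y1) against H1: beta_i not equal to 0.
##The exercise does not state alpha; the conventional alpha=0.05 is used here.
##X3: p-value=0.000427 < 0.05, so H0 is rejected. X3 has a significant effect on Y1.
##X5: p-value=0.138650 > 0.05, so H0 is not rejected. X5 is not significant.
##X6: p-value=0.484833 > 0.05, so H0 is not rejected. X6 is not significant.
##The intercept (p-value=0.712475) is not significantly different from 0.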
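The t values and p-values in the coefficient table can be reproduced by hand, which makes the significance test explicit. This is a minimal sketch that copies the rounded estimates and standard errors from the `summary(mod1)` output above; the significance level `alpha = 0.05` is an assumption, since the exercise does not state one.

```r
# Estimates and standard errors copied from the summary(mod1) table.
est <- c(Intercept = 79.303, X3 = 10.407, X5 = 11.686, X6 = 2.198)
se  <- c(Intercept = 213.860, X3 = 2.741, X5 = 7.755, X6 = 3.122)
df_resid <- 46                  # residual degrees of freedom from the output

t_val <- est / se               # t statistic = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df = df_resid)   # two-sided p-value

alpha <- 0.05                   # assumed significance level
data.frame(t = round(t_val, 3),
           p = signif(p_val, 3),
           significant = p_val < alpha)
```

Because the inputs are rounded, the p-values may differ from the printed ones in the last digit.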
##Coefficient of determination: R squared=0.3187
##This means 31.87% of the variation in the response variable (Y1) is
##explained by the predictors X3, X5 and X6. About 68% remains unexplained.
mod2=lm(Y2~X3+X4+X7)
mod2
Call:
lm(formula = Y2 ~ X3 + X4 + X7)
Coefficients:
(Intercept) X3 X4 X7
702.18 22.20 -18.51 11.88
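##Coefficients: Intercept=702.18, X3=22.20, X4=-18.51, X7=11.88
##Multiple linear regression equation is:
## Y2=702.18 + 22.20* X3 -18.51 *X4 + 11.88 *X7
##Interpretation: if Xi (i=3,4,7) increases by one unit while the other
##predictors are held fixed, the predicted Y2 changes by the coefficient of Xi.
##X3: each extra dollar per capita of police budget goes with 22.20 more
##crimes of violence per 100,000 inhabitants.
##X4: each extra percentage point of high school graduates goes with 18.51
##fewer crimes of violence per 100,000 inhabitants (negative coefficient).
##X7: each extra percentage point of people with a 4-year university degree
##goes with 11.88 more crimes of violence per 100,000 inhabitants.
##The intercept, 702.18, is the predicted Y2 when X3=X4=X7=0.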
s2=summary(mod2)
s2
Call:
lm(formula = Y2 ~ X3 + X4 + X7)
Residuals:
Min 1Q Median 3Q Max
-814.88 -305.33 -84.35 199.02 1903.21
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 702.176 459.124 1.529 0.1330
X3 22.198 5.068 4.380 6.79e-05 ***
X4 -18.511 9.524 -1.944 0.0581 .
X7 11.884 18.411 0.645 0.5218
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 485.9 on 46 degrees of freedom
Multiple R-squared: 0.3268, Adjusted R-squared: 0.2829
F-statistic: 7.443 on 3 and 46 DF, p-value: 0.0003653
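##H0: beta_i=0 (Xi has no linear effect on Y2) against H1: beta_i not equal to 0.
##The exercise does not state alpha; the conventional alpha=0.05 is used here.
##X3: p-value=6.79e-05 < 0.05, so H0 is rejected. X3 has a significant effect on Y2.
##X4: p-value=0.0581 > 0.05, so H0 is not rejected at the 5% level (it would be at 10%).
##X7: p-value=0.5218 > 0.05, so H0 is not rejected. X7 is not significant.
##The intercept (p-value=0.1330) is not significantly different from 0.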
##Coefficient of determination: R squared=0.3268
##This means 32.68% of the variation in the response variable (Y2) is
##explained by the predictors X3, X4 and X7. About 67% remains unexplained.
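The adjusted R squared printed in both summaries follows directly from R squared, the sample size and the number of predictors. The sketch below recomputes it from the rounded values reported above. It takes `n = 50` from the number of data rows and `p = 3` predictors per model, which matches the 46 residual degrees of freedom in the output.

```r
n <- 50   # number of rows in the data table
p <- 3    # number of predictors in each model

# Multiple R-squared values copied from the two summaries.
r2 <- c(mod1 = 0.3187, mod2 = 0.3268)

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)

# Share of variation left unexplained by each model, in percent.
round(100 * (1 - r2), 2)
```

Both adjusted values sit below the unadjusted ones, as expected when some predictors contribute little.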
#E. What will be the rate of total crimes reported per million inhabitants if
#$50 per capita per year is assigned to the police, 10% of young people
#between 16 and 19 do not attend high school (and have not completed it), and
#50% of young people between 18 and 24 attend university?
#To find Y1 when X3=50, X5=10, X6=50
#Multiple linear regression equation is: Y1=79.303+ 10.407*X3+ 11.686*X5 + 2.198*X6
#Plug the given values into this equation to obtain the predicted value
E=79.303+ 10.407*(50)+ 11.686*(10) + 2.198*(50)
E
#[1] 826.413
round(E)
#[1] 826
#About 826 total crimes per million inhabitants are predicted.
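#F. How many crimes of violence will be reported if 20 dollars per capita per
#year are allocated to the police, 60% of people 25 years or older have
#finished high school and 5% of people 25 years or older completed a 4-year
#university degree?
#To find Y2 when X3=20, X4=60, X7=5
#Same procedure: plug the values into Y2=702.18 + 22.20*X3 - 18.51*X4 + 11.88*X7
#(the whole expression must sit on one line, or each broken line must end
#with an operator, otherwise R stops reading at the line break)
F=702.18 + 22.20*(20) - 18.51*(60) + 11.88*(5)
F
#[1] 94.98
round(F)
#[1] 95
#About 95 crimes of violence per 100,000 inhabitants are predicted.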
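Both predictions follow the same pattern: the intercept plus each coefficient times its predictor value. A minimal sketch of that arithmetic is given here, with the rounded coefficients copied from the two equations above; `predict_point` is a hypothetical helper written for this illustration, not a base R function.

```r
# Coefficients copied from the two fitted equations.
b1 <- c(Intercept = 79.303, X3 = 10.407, X5 = 11.686, X6 = 2.198)
b2 <- c(Intercept = 702.18, X3 = 22.20, X4 = -18.51, X7 = 11.88)

# Hypothetical helper: intercept plus the sum of coefficient * value,
# matching coefficients to values by name.
predict_point <- function(b, x) {
  unname(b["Intercept"] + sum(b[names(x)] * x))
}

pred1 <- predict_point(b1, c(X3 = 50, X5 = 10, X6 = 50))
pred2 <- predict_point(b2, c(X3 = 20, X4 = 60, X7 = 5))
pred1   # total crimes per million inhabitants
pred2   # crimes of violence per 100,000 inhabitants
```

With the fitted objects available, `predict(mod1, newdata = data.frame(X3 = 50, X5 = 10, X6 = 50))` should give nearly the same value using the unrounded coefficients, and adding `interval = "prediction"` would also return a range around it.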
Conclusion and Recommendations:
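##Overall, X3 (annual police budget) is the only predictor with a
##significant effect at alpha=0.05, and it is significant for both Y1 and Y2.
##As the annual police budget increases, total crimes and crimes of
##violence increase. This is an association, not proof of cause: places
##with more crime may simply spend more on police.
##"X4 = % of people 25 years or older who finished high school" has a
##negative coefficient for Y2 = crimes of violence (p-value=0.0581,
##significant at the 10% level but not at 5%).
##This suggests that higher high school completion goes with fewer crimes
##of violence. Therefore, people should be encouraged to complete their
##education, i.e. finish high school.
##Programs should be implemented so that X5 decreases: encourage more
##people between 16 and 19 years old to attend high school and graduate.
##(X5 = % of young people aged 16 to 19 who neither attend high school nor
##have graduated from it.) The X5 coefficient is positive but not
##significant (p-value=0.138650), so the data support this only weakly.
##Both models explain only about a third of the variation in crime
##(R squared=0.3187 and 0.3268), so factors outside this data also matter.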
##Several schemes may be: