Make a numerical and graphical summary of the data, commenting on any features that you find interesting. Limit the output you present to a quantity that a busy reader would find sufficient to get a basic understanding of the data. Please show me how to use R to solve this.
race - racial composition in percent minority
fire - fires per 100 housing units
theft - theft per 1000 population
age - percent of housing units built before 1939
volact - new homeowner policies plus renewals minus cancellations and non renewals per 100 housing units
involact - new FAIR plan policies and renewals per 100 housing units
income - median family income
Note that the first column of the data set is the Chicago ZIP code; the header row below names only the remaining seven columns.
race fire theft age volact involact income
60626 10.0 6.2 29 60.4 5.3 0.0 11744
60640 22.2 9.5 44 76.5 3.1 0.1 9323
60613 19.6 10.5 36 73.5 4.8 1.2 9948
60657 17.3 7.7 37 66.9 5.7 0.5 10656
60614 24.5 8.6 53 81.4 5.9 0.7 9730
60610 54.0 34.1 68 52.6 4.0 0.3 8231
60611 4.9 11.0 75 42.6 7.9 0.0 21480
60625 7.1 6.9 18 78.5 6.9 0.0 11104
60618 5.3 7.3 31 90.1 7.6 0.4 10694
60647 21.5 15.1 25 89.8 3.1 1.1 9631
60622 43.1 29.1 34 82.7 1.3 1.9 7995
60631 1.1 2.2 14 40.2 14.3 0.0 13722
60646 1.0 5.7 11 27.9 12.1 0.0 16250
60656 1.7 2.0 11 7.7 10.9 0.0 13686
60630 1.6 2.5 22 63.8 10.7 0.0 12405
60634 1.5 3.0 17 51.2 13.8 0.0 12198
60641 1.8 5.4 27 85.1 8.9 0.0 11600
60635 1.0 2.2 9 44.4 11.5 0.0 12765
60639 2.5 7.2 29 84.2 8.5 0.2 11084
60651 13.4 15.1 30 89.8 5.2 0.8 10510
60644 59.8 16.5 40 72.7 2.7 0.8 9784
60624 94.4 18.4 32 72.9 1.2 1.8 7342
60612 86.2 36.2 41 63.1 0.8 1.8 6565
60607 50.2 39.7 147 83.0 5.2 0.9 7459
60623 74.2 18.5 22 78.3 1.8 1.9 8014
60608 55.5 23.3 29 79.0 2.1 1.5 8177
60616 62.3 12.2 46 48.0 3.4 0.6 8212
60632 4.4 5.6 23 71.5 8.0 0.3 11230
60609 46.2 21.8 4 73.1 2.6 1.3 8330
60653 99.7 21.6 31 65.0 0.5 0.9 5583
60615 73.5 9.0 39 75.4 2.7 0.4 8564
60638 10.7 3.6 15 20.8 9.1 0.0 12102
60629 1.5 5.0 32 61.8 11.6 0.0 11876
60636 48.8 28.6 27 78.1 4.0 1.4 9742
60621 98.9 17.4 32 68.6 1.7 2.2 7520
60637 90.6 11.3 34 73.4 1.9 0.8 7388
60652 1.4 3.4 17 2.0 12.9 0.0 13842
60620 71.2 11.9 46 57.0 4.8 0.9 11040
60619 94.1 10.5 42 55.9 6.6 0.9 10332
60649 66.1 10.7 43 67.5 3.1 0.4 10908
60617 36.4 10.8 34 58.0 7.8 0.9 11156
60655 1.0 4.8 19 15.2 13.0 0.0 13323
60643 42.5 10.4 25 40.8 10.2 0.5 12960
60628 35.1 15.6 28 57.8 7.5 1.0 11260
60627 47.4 7.0 3 11.4 7.7 0.2 10080
60633 34.0 7.1 23 49.2 11.6 0.3 11428
60645 3.1 4.9 27 46.6 10.9 0.0 13731
ANSWER:
Description
The sat data frame has 50 rows and 7 columns. Data were collected to study the relationship between expenditures on public education and test results.
This data frame contains the following columns:
expend - current expenditure per pupil in average daily attendance in public elementary and secondary schools, 1994-95 (in thousands of dollars)
ratio - average pupil/teacher ratio in public elementary and secondary schools, Fall 1994
salary - estimated average annual salary of teachers in public elementary and secondary schools, 1994-95 (in thousands of dollars)
takers - percentage of all eligible students taking the SAT, 1994-95
verbal - average verbal SAT score, 1994-95
math - average math SAT score, 1994-95
total - average total score on the SAT, 1994-95
Source
"Getting What You Pay For: The Debate Over Equity in Public School Expenditures" D. Guber, Journal of Statistics Education, 1999
The basic summary statistics of the data are as follows:
          expend   ratio  salary  takers  verbal    math   total
Min.       3.656   13.80   25.99    4.00   401.0   443.0   844.0
1st Qu.    4.882   15.22   30.98    9.00   427.2   474.8   897.2
Median     5.768   16.60   33.29   28.00   448.0   497.5   945.5
Mean       5.905   16.86   34.83   35.24   457.1   508.8   965.9
3rd Qu.    6.434   17.58   38.55   63.00   490.2   539.5  1032.0
Max.       9.774   24.30   50.04   81.00   516.0   592.0  1107.0
Now let us look at these variables graphically, using boxplots.
The boxplots show that expend and ratio are roughly symmetric, though each has a few extreme values. salary and takers are positively skewed, and takers is far more dispersed than salary. verbal and math have about the same dispersion, with math scores higher than verbal scores, and both are skewed (the means exceed the medians in the summary table, so the skew is positive). Lastly, total is also positively skewed.
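The skewness read off the boxplots can be cross-checked numerically. The following is a rough sketch, assuming the faraway package is installed. It uses the gap between mean and median, scaled by the IQR, as an informal indicator rather than a formal skewness statistic; the name skew_hint is only illustrative.

```r
library(faraway)  # assumed to be installed; provides the sat data frame

# Informal skew indicator: (mean - median) / IQR for each column.
# Positive values suggest a longer right tail, negative a longer left tail.
skew_hint <- sapply(sat, function(v) (mean(v) - median(v)) / IQR(v))
print(round(skew_hint, 2))

# One boxplot per variable, each on its own axis, so that variables on
# very different scales (e.g. expend and total) do not squash one another.
op <- par(mfrow = c(2, 4))
for (v in names(sat)) boxplot(sat[[v]], main = v)
par(op)
```

Drawing each variable in its own panel avoids the shared-axis problem that arises when columns with different units are passed to a single boxplot() call.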
Now let's check for the pairwise relationships.
The following observations come from the pairwise scatter plots of the variables.
expend has a strong positive relationship with salary, but a weak negative correlation with the other variables, takers excepted. takers, in turn, has a strong negative relationship with verbal, math and total. Finally, verbal, math and total are all strongly and positively related to one another.
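These visual impressions can be backed up with a correlation matrix. A minimal sketch, again assuming the faraway package is installed; the 0.7 cutoff for calling a relationship "strong" is an arbitrary choice made for illustration, and the names r, idx and strong are hypothetical.

```r
library(faraway)  # assumed to be installed; provides the sat data frame

# Pearson correlations between all pairs of columns, rounded for reading
r <- round(cor(sat), 2)
print(r)

# List each pair once (upper triangle) whose |r| is at least 0.7.
# The 0.7 threshold is an illustrative assumption, not a standard rule.
idx <- which(abs(r) >= 0.7 & upper.tri(r), arr.ind = TRUE)
strong <- data.frame(var1 = rownames(r)[idx[, "row"]],
                     var2 = colnames(r)[idx[, "col"]],
                     r    = r[idx])
print(strong)
```

A short table of the strongest pairs is usually easier for a busy reader to scan than the full scatter plot matrix.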
CODE
library(faraway)   # provides the sat data frame
x <- sat
summary(x)

# Collect the six summary statistics of each column in one table
descr <- matrix(nrow = 6, ncol = 7)
for (i in 1:7) {
  descr[, i] <- summary(x[, i])
}
descr <- data.frame(descr,
                    row.names = c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max."))
colnames(descr) <- colnames(x)
print(descr)

# Boxplots, grouping variables that are on similar scales
boxplot(x[, 1:2])
boxplot(x[, 3:4])
boxplot(x[, 5:6])
boxplot(x[, 7], xlab = "total")

# Pairwise scatter plots of all variables
plot(x)
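The same steps carry over to the Chicago insurance data posted in the question. The sketch below uses base R only and reads just the first six rows, copied from the question, to keep it short; the remaining rows would be pasted into the string in the same format. The names chi_text and chi are illustrative. Because the header line has one field fewer than the data lines, read.table() uses the first column (the ZIP code) as row names.

```r
# First six rows of the posted data; paste the remaining rows here as well.
chi_text <- "race fire theft age volact involact income
60626 10.0 6.2 29 60.4 5.3 0.0 11744
60640 22.2 9.5 44 76.5 3.1 0.1 9323
60613 19.6 10.5 36 73.5 4.8 1.2 9948
60657 17.3 7.7 37 66.9 5.7 0.5 10656
60614 24.5 8.6 53 81.4 5.9 0.7 9730
60610 54.0 34.1 68 52.6 4.0 0.3 8231"

# The header has 7 names and each data row has 8 fields,
# so the ZIP codes become the row names of the data frame.
chi <- read.table(text = chi_text, header = TRUE)

# Numerical summary: six-number summary and correlations
print(summary(chi))
print(round(cor(chi), 2))

# Graphical summary: one boxplot per variable, then pairwise scatter plots
op <- par(mfrow = c(2, 4))
for (v in names(chi)) boxplot(chi[[v]], main = v)
par(op)
pairs(chi)
```

With only six rows the output is merely a demonstration of the workflow; conclusions about the data should be drawn after all 47 ZIP codes have been read in.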