Question

Make a numerical and graphical summary of the data, commenting on any features that you find interesting. Limit the output you present to a quantity that a busy reader would find sufficient to get a basic understanding of the data. Please show me how to use R to solve this.

race - racial composition in percent minority

fire - fires per 100 housing units

theft - theft per 1000 population

age - percent of housing units built before 1939

volact - new homeowner policies plus renewals minus cancellations and non renewals per 100 housing units

involact - new FAIR plan policies and renewals per 100 housing units

income - median family income

Note that the first column of the data set is the ZIP code in Chicago.

race fire theft age volact involact income
60626 10.0 6.2 29 60.4 5.3 0.0 11744
60640 22.2 9.5 44 76.5 3.1 0.1 9323
60613 19.6 10.5 36 73.5 4.8 1.2 9948
60657 17.3 7.7 37 66.9 5.7 0.5 10656
60614 24.5 8.6 53 81.4 5.9 0.7 9730
60610 54.0 34.1 68 52.6 4.0 0.3 8231
60611 4.9 11.0 75 42.6 7.9 0.0 21480
60625 7.1 6.9 18 78.5 6.9 0.0 11104
60618 5.3 7.3 31 90.1 7.6 0.4 10694
60647 21.5 15.1 25 89.8 3.1 1.1 9631
60622 43.1 29.1 34 82.7 1.3 1.9 7995
60631 1.1 2.2 14 40.2 14.3 0.0 13722
60646 1.0 5.7 11 27.9 12.1 0.0 16250
60656 1.7 2.0 11 7.7 10.9 0.0 13686
60630 1.6 2.5 22 63.8 10.7 0.0 12405
60634 1.5 3.0 17 51.2 13.8 0.0 12198
60641 1.8 5.4 27 85.1 8.9 0.0 11600
60635 1.0 2.2 9 44.4 11.5 0.0 12765
60639 2.5 7.2 29 84.2 8.5 0.2 11084
60651 13.4 15.1 30 89.8 5.2 0.8 10510
60644 59.8 16.5 40 72.7 2.7 0.8 9784
60624 94.4 18.4 32 72.9 1.2 1.8 7342
60612 86.2 36.2 41 63.1 0.8 1.8 6565
60607 50.2 39.7 147 83.0 5.2 0.9 7459
60623 74.2 18.5 22 78.3 1.8 1.9 8014
60608 55.5 23.3 29 79.0 2.1 1.5 8177
60616 62.3 12.2 46 48.0 3.4 0.6 8212
60632 4.4 5.6 23 71.5 8.0 0.3 11230
60609 46.2 21.8 4 73.1 2.6 1.3 8330
60653 99.7 21.6 31 65.0 0.5 0.9 5583
60615 73.5 9.0 39 75.4 2.7 0.4 8564
60638 10.7 3.6 15 20.8 9.1 0.0 12102
60629 1.5 5.0 32 61.8 11.6 0.0 11876
60636 48.8 28.6 27 78.1 4.0 1.4 9742
60621 98.9 17.4 32 68.6 1.7 2.2 7520
60637 90.6 11.3 34 73.4 1.9 0.8 7388
60652 1.4 3.4 17 2.0 12.9 0.0 13842
60620 71.2 11.9 46 57.0 4.8 0.9 11040
60619 94.1 10.5 42 55.9 6.6 0.9 10332
60649 66.1 10.7 43 67.5 3.1 0.4 10908
60617 36.4 10.8 34 58.0 7.8 0.9 11156
60655 1.0 4.8 19 15.2 13.0 0.0 13323
60643 42.5 10.4 25 40.8 10.2 0.5 12960
60628 35.1 15.6 28 57.8 7.5 1.0 11260
60627 47.4 7.0 3 11.4 7.7 0.2 10080
60633 34.0 7.1 23 49.2 11.6 0.3 11428
60645 3.1 4.9 27 46.6 10.9 0.0 13731

Solutions

Expert Solution

ANSWER:

Description

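Note that this answer uses the sat data frame from the faraway package rather than the Chicago insurance data in the question; the same R steps carry over to any numeric data frame. The sat data frame has 50 rows and 7 columns. Data were collected to study the relationship between expenditures on public education and test results.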

This data contains the following columns:

expend
Current expenditure per pupil in average daily attendance in public elementary and secondary schools, 1994-95 (in thousands of dollars)

ratio
Average pupil/teacher ratio in public elementary and secondary schools, Fall 1994

salary
Estimated average annual salary of teachers in public elementary and secondary schools, 1994-95 (in thousands of dollars)

takers
Percentage of all eligible students taking the SAT, 1994-95

verbal
Average verbal SAT score, 1994-95

math
Average math SAT score, 1994-95

total
Average total score on the SAT, 1994-95

Source

"Getting What You Pay For: The Debate Over Equity in Public School Expenditures" D. Guber, Journal of Statistics Education, 1999

The basic summary statistics of the data are as follows:

         expend  ratio  salary  takers  verbal   math   total
Min.      3.656  13.80   25.99    4.00   401.0  443.0   844.0
1st Qu.   4.882  15.22   30.98    9.00   427.2  474.8   897.2
Median    5.768  16.60   33.29   28.00   448.0  497.5   945.5
Mean      5.905  16.86   34.83   35.24   457.1  508.8   965.9
3rd Qu.   6.434  17.58   38.55   63.00   490.2  539.5  1032.0
Max.      9.774  24.30   50.04   81.00   516.0  592.0  1107.0

Now let us look at these graphically using boxplots.

From the plots above, expend and ratio look roughly symmetric, although both contain some extreme values. salary and takers are positively skewed, and takers is far more dispersed than salary. verbal and math have about the same dispersion, with math scores higher than verbal scores, and both are skewed. Lastly, total is also positively skewed.
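The readings from the boxplots can be cross-checked against the summary table above. The following is a minimal sketch in base R that uses only the quartiles, medians and means already reported. The names iqr and skew_hint are introduced here for illustration, and the mean-minus-median comparison is a rough heuristic rather than a formal test of skewness.

```r
# Summary statistics copied from the table above
# (rows: statistic, columns: variable)
descr <- rbind(
  "1st Qu." = c(4.882, 15.22, 30.98,  9.00, 427.2, 474.8,  897.2),
  "Median"  = c(5.768, 16.60, 33.29, 28.00, 448.0, 497.5,  945.5),
  "Mean"    = c(5.905, 16.86, 34.83, 35.24, 457.1, 508.8,  965.9),
  "3rd Qu." = c(6.434, 17.58, 38.55, 63.00, 490.2, 539.5, 1032.0)
)
colnames(descr) <- c("expend", "ratio", "salary", "takers",
                     "verbal", "math", "total")

# Interquartile range: a robust measure of spread
iqr <- descr["3rd Qu.", ] - descr["1st Qu.", ]

# Mean minus median, scaled by the IQR; positive values hint at right skew
skew_hint <- (descr["Mean", ] - descr["Median", ]) / iqr

print(round(rbind(IQR = iqr, skew_hint = skew_hint), 3))
```

In this table the mean exceeds the median for all seven variables, and the interquartile range of takers is several times that of salary, which agrees with the description above.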

Now let's check the pairwise relationships.

The pairwise scatter plot of the variables follows:

Note that expend has a strong positive relationship with salary but a weak negative correlation with the other variables, except takers. takers in turn has a strong negative relationship with verbal, math and total. Finally, verbal, math and total are strongly positively related to one another.

CODE

library(faraway)                  # provides the sat data frame
x <- sat

# Numerical summary: one column per variable, one row per statistic
summary(x)
descr <- as.data.frame(sapply(x, summary))
print(descr)

# Boxplots, grouped so that variables on similar scales share an axis
boxplot(x[, 1:2])                 # expend, ratio
boxplot(x[, 3:4])                 # salary, takers
boxplot(x[, 5:6])                 # verbal, math
boxplot(x[, 7], xlab = "total")   # total

# Pairwise scatter plots of all variables
plot(x)
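The same workflow can be applied to the Chicago data from the question. The following is a minimal sketch in base R, assuming the table is pasted in as text exactly as it appears in the question. The object name chicago and the 2-by-4 plot layout are choices made here for illustration, not part of the original answer.

```r
# Data copied from the question. The header has one field fewer than the
# data rows, so read.table() uses the first column (ZIP code) as row names.
chicago <- read.table(header = TRUE, text = "race fire theft age volact involact income
60626 10.0 6.2 29 60.4 5.3 0.0 11744
60640 22.2 9.5 44 76.5 3.1 0.1 9323
60613 19.6 10.5 36 73.5 4.8 1.2 9948
60657 17.3 7.7 37 66.9 5.7 0.5 10656
60614 24.5 8.6 53 81.4 5.9 0.7 9730
60610 54.0 34.1 68 52.6 4.0 0.3 8231
60611 4.9 11.0 75 42.6 7.9 0.0 21480
60625 7.1 6.9 18 78.5 6.9 0.0 11104
60618 5.3 7.3 31 90.1 7.6 0.4 10694
60647 21.5 15.1 25 89.8 3.1 1.1 9631
60622 43.1 29.1 34 82.7 1.3 1.9 7995
60631 1.1 2.2 14 40.2 14.3 0.0 13722
60646 1.0 5.7 11 27.9 12.1 0.0 16250
60656 1.7 2.0 11 7.7 10.9 0.0 13686
60630 1.6 2.5 22 63.8 10.7 0.0 12405
60634 1.5 3.0 17 51.2 13.8 0.0 12198
60641 1.8 5.4 27 85.1 8.9 0.0 11600
60635 1.0 2.2 9 44.4 11.5 0.0 12765
60639 2.5 7.2 29 84.2 8.5 0.2 11084
60651 13.4 15.1 30 89.8 5.2 0.8 10510
60644 59.8 16.5 40 72.7 2.7 0.8 9784
60624 94.4 18.4 32 72.9 1.2 1.8 7342
60612 86.2 36.2 41 63.1 0.8 1.8 6565
60607 50.2 39.7 147 83.0 5.2 0.9 7459
60623 74.2 18.5 22 78.3 1.8 1.9 8014
60608 55.5 23.3 29 79.0 2.1 1.5 8177
60616 62.3 12.2 46 48.0 3.4 0.6 8212
60632 4.4 5.6 23 71.5 8.0 0.3 11230
60609 46.2 21.8 4 73.1 2.6 1.3 8330
60653 99.7 21.6 31 65.0 0.5 0.9 5583
60615 73.5 9.0 39 75.4 2.7 0.4 8564
60638 10.7 3.6 15 20.8 9.1 0.0 12102
60629 1.5 5.0 32 61.8 11.6 0.0 11876
60636 48.8 28.6 27 78.1 4.0 1.4 9742
60621 98.9 17.4 32 68.6 1.7 2.2 7520
60637 90.6 11.3 34 73.4 1.9 0.8 7388
60652 1.4 3.4 17 2.0 12.9 0.0 13842
60620 71.2 11.9 46 57.0 4.8 0.9 11040
60619 94.1 10.5 42 55.9 6.6 0.9 10332
60649 66.1 10.7 43 67.5 3.1 0.4 10908
60617 36.4 10.8 34 58.0 7.8 0.9 11156
60655 1.0 4.8 19 15.2 13.0 0.0 13323
60643 42.5 10.4 25 40.8 10.2 0.5 12960
60628 35.1 15.6 28 57.8 7.5 1.0 11260
60627 47.4 7.0 3 11.4 7.7 0.2 10080
60633 34.0 7.1 23 49.2 11.6 0.3 11428
60645 3.1 4.9 27 46.6 10.9 0.0 13731")

# Numerical summary: min, quartiles, median, mean and max per variable
print(round(sapply(chicago, summary), 2))

# Graphical summary: one boxplot per variable, because the scales differ
op <- par(mfrow = c(2, 4))
for (v in names(chicago)) boxplot(chicago[[v]], main = v)
par(op)

# Pairwise scatter plots and the matching correlation matrix
pairs(chicago)
print(round(cor(chicago), 2))
```

One feature that stands out in the raw table is the theft value of 147 for ZIP 60607, which is well above every other ZIP code and should appear as an isolated point in the theft boxplot.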

