1. Use the R command X <- iris to assign Fisher's iris dataset to the data matrix X. Using the head(X) command, summarize what each column of the dataset measures and represents. Assign Y as a new matrix of dimension 150 by 4 which has the values of X without the species label.
2. Compute and interpret (in plain English) each of the summary statistics X̄, S, R using R.
3. Visualize the dataset by making a scatterplot of Sepal Length vs. Sepal Width and a scatterplot of Petal Length vs. Petal Width. The pairs function is useful here. Use your plots and stats from #2 to comment on any evident correlations.
Rcode.R
X <- iris  # assign the iris dataset to the data matrix X
head(X)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# The first column is Sepal.Length, which measures the length of the sepals
# The second column is Sepal.Width, which measures the width of the sepals
# The third column is Petal.Length, which measures the length of the petals
# The fourth column is Petal.Width, which measures the width of the petals
# The fifth column is Species, the species label of each flower
# (all four measurements are in centimetres)
Y <- as.matrix(X[, -5])  # Y is a 150 x 4 matrix with every column of X except the last one (Species)
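# A quick check of the result, sketched under the assumption that the task wants a true
# numeric matrix rather than a data frame. Dropping the fifth column of a data frame
# still returns a data frame, so as.matrix() is applied on top. The names Y_df and Y_mat
# are illustrative only and are not part of the original script.

```r
X <- iris

Y_df  <- X[, -5]          # dropping Species alone still gives a data.frame
Y_mat <- as.matrix(Y_df)  # convert to a numeric matrix

class(Y_df)       # class of the intermediate object
is.matrix(Y_mat)  # confirms the converted object is a matrix
dim(Y_mat)        # rows and columns of the matrix
```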
summary(X) #computes the summary for each column of X
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
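# summary() gives five-number summaries and means, but part 2 appears to ask for the
# sample mean vector, the sample covariance matrix S and the sample correlation matrix R.
# A minimal sketch of one way to compute them with base R follows. The name xbar is an
# assumption made here for illustration, and Y is rebuilt so the block runs on its own.

```r
Y <- as.matrix(iris[, -5])  # 150 x 4 numeric matrix (species label dropped)

xbar <- colMeans(Y)  # sample mean vector: one mean per measurement
S    <- cov(Y)       # sample covariance matrix (divisor n - 1)
R    <- cor(Y)       # sample correlation matrix

round(xbar, 3)
round(S, 3)
round(R, 3)
```

# Reading the results: xbar is the average flower, the diagonal of S holds the variance of
# each measurement and the off-diagonal entries show how pairs of measurements vary
# together, and R is S rescaled so every entry lies between -1 and 1.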
attach(X)
# Scatter plots
plot(Sepal.Length,Sepal.Width)
plot(Petal.Length,Petal.Width)
# For correlation interpretation
pairs(X[1:2])
# The graph shows little to no overall linear correlation between sepal length and sepal width (r is about -0.12)
pairs(X[3:4])  # The graph shows a strong positive correlation between petal length and petal width (r is about 0.96)
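# The visual impressions above can be backed up with numbers. The sketch below draws one
# pairs() plot of all four measurements and computes the two correlations that were
# discussed. Colouring the points by species is an optional choice made here, and the
# names r_sepal and r_petal are illustrative only.

```r
Y <- as.matrix(iris[, -5])  # numeric measurements only

# Scatterplot matrix of all four variables, one colour per species
pairs(Y, col = as.integer(iris$Species), pch = 19)

# Correlations for the two pairs of interest
r_sepal <- cor(Y[, "Sepal.Length"], Y[, "Sepal.Width"])
r_petal <- cor(Y[, "Petal.Length"], Y[, "Petal.Width"])

round(c(sepal = r_sepal, petal = r_petal), 3)
```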