Question

Q.1: The iris dataset (included with R) contains four measurements for 150 flowers representing three species of iris (Iris setosa, versicolor and virginica).
1. Inspect the iris data in R.
2. Use the summary function in R to perform a descriptive analysis. Paste the summary statistics in your report.
3. Draw a scatter plot of petal length vs. petal width.
4. Find all possible correlations between the quantitative variables.
5. Use the function lm to develop regression models for Petal.Width ~ Petal.Length and for Sepal.Length ~ Sepal.Width, and paste the summary of each model in your report.

Solutions

Expert Solution

>
> data = iris
>
> ### Data inspection
>
> dim(data)
[1] 150 5
>
> head(data)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
>
> tail(data)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
>
> ### Exploratory Analysis
>
> summary(data)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
>
>
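Because summary() pools all three species, a per-species breakdown can complement it. The following is a minimal sketch, not part of the original answer; the names `species_means` and `column_sds` are illustrative.

```r
# Mean of each measurement within each species (one row per species).
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Standard deviation of each numeric column, which summary() does not report.
column_sds <- sapply(iris[, 1:4], sd)
print(round(column_sds, 3))
```

The formula `. ~ Species` tells aggregate() to summarise every other column grouped by Species.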
> ### scatter plot, for petal length vs petal width
>
> plot(data$Petal.Length, data$Petal.Width, main = "Scatter Plot")
>
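The plot call above uses default axis labels. As a sketch of a more readable version (an addition, not the original answer's code), the points can be labelled and coloured by species. The measurements are in centimetres per the dataset's documentation; `species_cols` is an illustrative name.

```r
# Open a null PDF device so the sketch also runs non-interactively
# (assumption for portability; omit pdf(NULL) and dev.off() at an interactive console).
pdf(NULL)

# One integer colour code per flower, derived from the Species factor.
species_cols <- as.integer(iris$Species)

plot(iris$Petal.Length, iris$Petal.Width,
     col = species_cols, pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "Iris: petal width vs. petal length")
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)

dev.off()
```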


> ### all possible correlation between quantitative variables
>
> cor(data$Sepal.Length, data$Sepal.Width)
[1] -0.1175698
> cor(data$Sepal.Length, data$Petal.Length)
[1] 0.8717538
> cor(data$Sepal.Length, data$Petal.Width)
[1] 0.8179411
> cor(data$Sepal.Width, data$Petal.Length)
[1] -0.4284401
> cor(data$Sepal.Width, data$Petal.Width)
[1] -0.3661259
> cor(data$Petal.Length, data$Petal.Width)
[1] 0.9628654
>
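The six pairwise calls above can also be obtained in one step by passing the numeric columns to cor(), which returns the full correlation matrix. A minimal sketch (the names `numeric_cols` and `cor_matrix` are illustrative):

```r
# Select only the numeric columns; Species is a factor and must be excluded.
numeric_cols <- sapply(iris, is.numeric)

# 4 x 4 Pearson correlation matrix of the quantitative variables.
cor_matrix <- cor(iris[, numeric_cols])
print(round(cor_matrix, 4))
```

The off-diagonal entries correspond to the pairwise values listed above, for example the Petal.Length / Petal.Width entry of about 0.9629.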
> ### Regression model
>
> Model_1 = lm(data$Sepal.Length ~ data$Sepal.Width)
> summary(Model_1)

Call:
lm(formula = data$Sepal.Length ~ data$Sepal.Width)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5561 -0.6333 -0.1120  0.5579  2.2226

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        6.5262     0.4789   13.63   <2e-16 ***
data$Sepal.Width  -0.2234     0.1551   -1.44    0.152
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8251 on 148 degrees of freedom
Multiple R-squared: 0.01382, Adjusted R-squared: 0.007159
F-statistic: 2.074 on 1 and 148 DF, p-value: 0.1519

>
> Model_2 = lm(data$Petal.Width ~ data$Petal.Length)
> summary(Model_2)

Call:
lm(formula = data$Petal.Width ~ data$Petal.Length)

Residuals:
     Min       1Q   Median       3Q      Max
-0.56515 -0.12358 -0.01898  0.13288  0.64272

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.363076   0.039762  -9.131  4.7e-16 ***
data$Petal.Length  0.415755   0.009582  43.387  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2065 on 148 degrees of freedom
Multiple R-squared: 0.9271, Adjusted R-squared: 0.9266
F-statistic: 1882 on 1 and 148 DF, p-value: < 2.2e-16

>
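Both regressions requested in the question can also be fitted with the `data` argument of lm(), which keeps the coefficient names short and makes predict() usable on new data. The sketch below is an addition under that assumption; `model_petal`, `model_sepal` and the petal lengths in `new_flowers` are hypothetical illustration values.

```r
# Fit both models with formula + data instead of data$column references.
model_petal <- lm(Petal.Width ~ Petal.Length, data = iris)
model_sepal <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Intercept and slope of each fitted line.
print(coef(model_petal))
print(coef(model_sepal))

# In simple linear regression, R-squared equals the squared correlation.
r2_sepal <- summary(model_sepal)$r.squared
r_sepal  <- cor(iris$Sepal.Length, iris$Sepal.Width)
print(c(r.squared = r2_sepal, cor.squared = r_sepal^2))

# Predicted petal widths for two hypothetical petal lengths.
new_flowers <- data.frame(Petal.Length = c(1.5, 4.5))
predicted_widths <- predict(model_petal, newdata = new_flowers)
print(predicted_widths)
```

The identity between R-squared and the squared correlation explains why the sepal model, with a correlation of about -0.118, accounts for only about 1.4% of the variance.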

