Q.1: The iris dataset (included with R) contains four measurements for 150 flowers representing three species of iris (Iris setosa, versicolor and virginica).
1. Inspect the iris data in R.
2. Use the summary function in R to perform a descriptive analysis. Paste the summary statistics in your report.
3. Draw a scatter plot of petal length vs petal width.
4. Find all possible correlations between the quantitative variables.
5. Use the function lm to develop regression models for Petal.Width ~ Petal.Length and for Sepal.Length ~ Sepal.Width, and paste the summary of each model in your report.
>
> data = iris
>
> ### Data inspection
>
> dim(data)
[1] 150 5
>
> head(data)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
>
> tail(data)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
>
> ### Exploratory Analysis
>
> summary(data)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
>
>
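Because the three species differ considerably, a per-species breakdown can complement the overall summary. The following is a minimal sketch using base R's aggregate(); the variable name species_means is an assumption introduced here for illustration.

```r
data <- iris

# Mean of each of the four measurements within each species.
# The formula ". ~ Species" means "all other columns, grouped by Species".
species_means <- aggregate(. ~ Species, data = data, FUN = mean)
print(species_means)

# The same idea with summary(), applied to each species separately
# (column 5 is Species, so it is dropped before summarising).
by(data[, 1:4], data$Species, summary)
```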
> ### scatter plot, for petal length vs petal width
>
> plot(data$Petal.Length, data$Petal.Width, main = "Scatter Plot")
>
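A slightly more informative version of the same plot labels the axes and colours the points by species. This is only a sketch: the colours, plotting symbol, title, and legend position are assumptions chosen for illustration, not part of the original answer.

```r
data <- iris

# Hypothetical colour choice: one colour per species level.
species_cols <- c("red", "darkgreen", "blue")

# Scatter plot of petal length (x) vs petal width (y).
plot(data$Petal.Length, data$Petal.Width,
     col  = species_cols[as.integer(data$Species)],  # factor level -> colour
     pch  = 19,
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)",
     main = "Petal length vs petal width")

# Legend mapping each colour back to its species.
legend("topleft", legend = levels(data$Species),
       col = species_cols, pch = 19)
```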
> ### all possible correlations between quantitative variables
>
> cor(data$Sepal.Length, data$Sepal.Width)
[1] -0.1175698
> cor(data$Sepal.Length, data$Petal.Length)
[1] 0.8717538
> cor(data$Sepal.Length, data$Petal.Width)
[1] 0.8179411
> cor(data$Sepal.Width, data$Petal.Length)
[1] -0.4284401
> cor(data$Sepal.Width, data$Petal.Width)
[1] -0.3661259
> cor(data$Petal.Length, data$Petal.Width)
[1] 0.9628654
>
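The six pairwise calls above can also be obtained in one step by passing all numeric columns to cor(), which returns the full correlation matrix. A minimal sketch follows; the names num_vars and cor_matrix are assumptions introduced here.

```r
data <- iris

# Keep only the numeric columns (Species is a factor and is dropped).
num_vars <- data[, sapply(data, is.numeric)]

# Pearson correlation matrix for every pair of quantitative variables.
cor_matrix <- cor(num_vars)
print(round(cor_matrix, 4))

# A single pair can be read off by name.
cor_matrix["Petal.Length", "Petal.Width"]
```

The off-diagonal entries should agree with the pairwise values listed above; for example, the Petal.Length / Petal.Width entry corresponds to 0.9628654.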
> ### Regression model
>
> Model_1 = lm(data$Sepal.Length ~ data$Sepal.Width)
> summary(Model_1)
Call:
lm(formula = data$Sepal.Length ~ data$Sepal.Width)
Residuals:
Min 1Q Median 3Q Max
-1.5561 -0.6333 -0.1120 0.5579 2.2226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.5262 0.4789 13.63 <2e-16 ***
data$Sepal.Width -0.2234 0.1551 -1.44 0.152
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8251 on 148 degrees of freedom
Multiple R-squared: 0.01382, Adjusted R-squared: 0.007159
F-statistic: 2.074 on 1 and 148 DF, p-value: 0.1519
>
> Model_2 = lm(data$Petal.Width ~ data$Petal.Length)
> summary(Model_2)
>
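Both regressions can also be fitted with a formula plus the data argument, which keeps the coefficient names short (for example Petal.Length rather than data$Petal.Length). The sketch below also checks a standard property of simple linear regression: R-squared equals the squared Pearson correlation between the two variables. The names r2_model and r2_cor are assumptions introduced here.

```r
data <- iris

# Fit both models requested in the question, using data = ... so that
# variables are looked up inside the data frame.
Model_1 <- lm(Sepal.Length ~ Sepal.Width, data = data)
Model_2 <- lm(Petal.Width ~ Petal.Length, data = data)

print(summary(Model_1))
print(summary(Model_2))

# With a single predictor, R-squared is the squared correlation.
r2_model <- summary(Model_2)$r.squared
r2_cor   <- cor(data$Petal.Length, data$Petal.Width)^2
print(all.equal(r2_model, r2_cor))
```

The same relationship can be seen in the Model_1 output above: the reported Multiple R-squared of 0.01382 matches the square of the Sepal.Length / Sepal.Width correlation, (-0.1175698)^2, which is approximately 0.01382.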