Beneckea natriegens, a halophilic bacterium that grows very rapidly under optimal conditions, was observed over a period of seven hours. The bacterial count (number of cells per cubic centimeter) was recorded in successive samples taken during its growth.
1: Construct a two-way scatter plot of “incubation time” against “bacterial count” and, on a separate graph, a two-way scatter plot of “incubation time” against the log of “bacterial count”. Looking at the two graphs, explain which of the two you consider closest to a linear relationship.
2: Compute r, the Pearson correlation coefficient.
3: At the 0.05 level of significance, test the null hypothesis that the population correlation coefficient ρ between “incubation time” and “bacterial count” is equal to 0.
4: Use the regression equation to predict the “bacterial count” for an “incubation time” of 15 minutes.
| Incubation time (min) | Bacterial count (cells per cm³) |
|---|---|
| 20 | 47 |
| 40 | 62 |
| 60 | 73 |
| 90 | 103 |
| 120 | 220 |
| 180 | 537 |
| 240 | 1580 |
| 300 | 4500 |
| 360 | 9200 |
| 420 | 12800 |
1. I drew the scatter plots in R. The code and output are given below.
Two-way scatter plot for “incubation time” against the “bacterial count”.
> x<-c(20,40,60,90,120,180,240,300,360,420)
> y<-c(47,62,73,103,220,537,1580,4500,9200,12800)
>
> plot(x,y,main="Scatterplot 1",xlab="Incubation Time",ylab="Bacterial Count",pch=16)
Here, x indicates Incubation Time and y indicates Bacterial Count.
Two-way scatter plot for “incubation time” against log of the “bacterial count”.
> z<-log(y);z
[1] 3.850148 4.127134 4.290459 4.634729 5.393628 6.285998 7.365180
[8] 8.411833 9.126959 9.457200
> plot(x,z,main="Scatterplot 2",xlab="Incubation Time",ylab="log of Bacterial Count",pch=16)
Here, z indicates the log of Bacterial Count; R's log() is the natural logarithm (base e).
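Scatterplot 2, i.e., the scatter plot of "Incubation Time" against "log of Bacterial Count", is closer to a linear relationship than Scatterplot 1: its points lie almost on a straight line, whereas in Scatterplot 1 the count stays low for roughly the first two hours and then rises steeply, so the points follow a strongly curved pattern.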
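As a numerical check on this visual judgement, here is a minimal base-R sketch (the names r_raw and r_log are illustrative) that compares the Pearson correlation of incubation time with the raw count and with the natural-log count. A larger r on the log scale is consistent with Scatterplot 2 being closer to a straight line, although r alone does not prove linearity, so this supplements rather than replaces the plots.

```r
# Data from the table: incubation time (min) and bacterial count
x <- c(20, 40, 60, 90, 120, 180, 240, 300, 360, 420)
y <- c(47, 62, 73, 103, 220, 537, 1580, 4500, 9200, 12800)

# Pearson correlation on the original scale and on the natural-log scale
r_raw <- cor(x, y)
r_log <- cor(x, log(y))

# The scale whose r is closer to 1 shows the more nearly linear pattern
round(c(raw = r_raw, log = r_log), 4)
```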
2.
We use x (incubation time) and z (log of bacterial count) for the remaining computations, because the scatter plots show that the relationship between them is much closer to linear.
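The Pearson correlation coefficient between X and Z is given by
$$r = \operatorname{Cor}(X,Z) = \frac{\operatorname{Cov}(X,Z)}{s_X\, s_Z},$$
where $\operatorname{Cov}(X,Z)$ is the sample covariance given by
$$\operatorname{Cov}(X,Z) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(z_i-\bar{z}).$$
Here, n = sample size = 10, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 183$ and $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = 6.294327$. The sum of cross-products is $\sum_{i=1}^{n}(x_i-\bar{x})(z_i-\bar{z}) = 2725.661$, so $\operatorname{Cov}(X,Z) = 2725.661/9 = 302.851$.
$s_X$ = sample standard deviation of X = 141.1107
$s_Z$ = sample standard deviation of Z = 2.1579
$$\operatorname{Cor}(X,Z) = \frac{302.851}{141.1107 \times 2.1579} \approx 0.9946$$
Therefore, the Pearson correlation coefficient is r ≈ 0.9946.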
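The calculation of r can be reproduced step by step in R. This is a sketch assuming the natural log (R's log()) and the n − 1 divisor, which is also what sd() and cor() use; the variable names are illustrative.

```r
x <- c(20, 40, 60, 90, 120, 180, 240, 300, 360, 420)
y <- c(47, 62, 73, 103, 220, 537, 1580, 4500, 9200, 12800)
z <- log(y)        # natural log of the bacterial count
n <- length(x)     # sample size, 10

# Sample covariance with the n - 1 divisor
cov_xz <- sum((x - mean(x)) * (z - mean(z))) / (n - 1)

# Pearson r = Cov(X, Z) / (s_X * s_Z); sd() also divides by n - 1
r <- cov_xz / (sd(x) * sd(z))

c(cov = cov_xz, s_x = sd(x), s_z = sd(z), r = r)
all.equal(r, cor(x, z))   # the built-in cor() should agree
```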
3.
We test the hypotheses
H0 (null hypothesis): ρ = 0, i.e., the population correlation coefficient is equal to zero, versus
H1 (alternative hypothesis): ρ ≠ 0, i.e., the population correlation coefficient is different from zero, so there is a linear relationship between the two variables “incubation time” and “log of bacterial count”.
Level of significance: α = 0.05.
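Test statistic:
Under H0, the test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2},$$
where r = sample correlation coefficient = 0.9946 (already calculated in question no. 2) and n = sample size = 10.
Hence,
$$t = \frac{0.9946\sqrt{8}}{\sqrt{1-0.9946^2}} = 27.1062.$$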
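A small helper (the name t_stat is hypothetical) makes the arithmetic of the test statistic easy to check. With r rounded to 0.9946 it reproduces 27.1062; plugging in the unrounded r gives a slightly smaller t, because t is very sensitive to r when r is close to 1.

```r
# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

t_stat(0.9946, 10)   # r rounded to four decimals

# Same statistic with the unrounded sample correlation
x <- c(20, 40, 60, 90, 120, 180, 240, 300, 360, 420)
y <- c(47, 62, 73, 103, 220, 537, 1580, 4500, 9200, 12800)
t_stat(cor(x, log(y)), length(x))
```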
Critical value:
The critical value is $t_{\alpha/2,\,n-2}$.
At 5% level of significance, $t_{0.025,\,8} = 2.306$.
Critical value approach:
Since |t| = 27.1062 > $t_{0.025,\,8}$ = 2.306, we reject H0 and conclude that the population correlation coefficient is significantly different from zero, i.e., there is a significant linear relationship between “Incubation Time” and “log of Bacterial Count”.
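The critical value and the decision can be checked in R with qt(), the quantile function of the t distribution. This sketch hard-codes the test statistic 27.1062 from the hand calculation.

```r
alpha  <- 0.05
n      <- 10
t_calc <- 27.1062                          # test statistic from the hand calculation
t_crit <- qt(1 - alpha / 2, df = n - 2)    # two-sided critical value t_{0.025, 8}
round(t_crit, 3)

# Reject H0 when |t| exceeds the critical value
reject_H0 <- abs(t_calc) > t_crit
reject_H0
```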
P-value approach:
Because H1 is two-sided, the p-value is $p = 2\,P(T > 27.1062)$, where $T \sim t_8$. I calculated the upper-tail probability in R:
> 1-pt(27.1062,8)
[1] 1.848018e-09
so the p-value = 2 × 1.848018e-09 ≈ 3.70e-09.
Since the p-value is much less than the level of significance, we reject H0 and conclude that the population correlation coefficient is significantly different from zero.
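Base R's cor.test() runs the whole two-sided test in one call. In this sketch r is not rounded before the t statistic is formed, so the reported t value and p-value differ slightly from the hand calculation, while the conclusion is the same.

```r
x <- c(20, 40, 60, 90, 120, 180, 240, 300, 360, 420)
y <- c(47, 62, 73, 103, 220, 537, 1580, 4500, 9200, 12800)

# Two-sided Pearson test of H0: rho = 0 for incubation time vs log count
res <- cor.test(x, log(y), alternative = "two.sided", method = "pearson")

res$statistic   # t value (uses unrounded r)
res$parameter   # degrees of freedom, n - 2 = 8
res$p.value     # two-sided p-value
res$p.value < 0.05
```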
4.
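Let the estimated regression equation be $\hat{z} = a + b\,x$.
The least squares formulas for the regression coefficients are
$$b = \frac{S_{xz}}{S_{xx}} \quad\text{and}\quad a = \bar{z} - b\,\bar{x},$$
where $S_{xz} = \sum_{i=1}^{n}(x_i-\bar{x})(z_i-\bar{z})$ and $S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2$,
$\bar{z}$ = sample mean of Z = 6.294327, $\bar{x}$ = sample mean of X = 183,
and n = sample size = 10.
Then $S_{xz} = 2725.661$ and $S_{xx} = 179210$, so
$$b = \frac{2725.661}{179210} \approx 0.0152 \quad\text{and}\quad a = 6.294327 - 0.0152 \times 183 = 3.512727.$$
Hence the estimated regression equation is
$$\hat{z} = 3.512727 + 0.0152\,x.$$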
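The coefficients can be obtained with lm() and checked against the closed-form least squares formulas. This is a sketch; note that lm() does not round the slope, so its intercept can differ in the third decimal from a hand calculation that rounds b first.

```r
x <- c(20, 40, 60, 90, 120, 180, 240, 300, 360, 420)
y <- c(47, 62, 73, 103, 220, 537, 1580, 4500, 9200, 12800)
z <- log(y)

fit <- lm(z ~ x)   # least squares fit of z = a + b x
coef(fit)          # intercept a and slope b

# Same coefficients from the closed-form formulas b = Sxz / Sxx, a = zbar - b * xbar
b <- sum((x - mean(x)) * (z - mean(z))) / sum((x - mean(x))^2)
a <- mean(z) - b * mean(x)
c(a = a, b = b)
```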
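When x = incubation time = 15,
$$\hat{z} = 3.512727 + 0.0152 \times 15 = 3.740727.$$
Since $z = \ln y$, the bacterial count is $y = e^{z}$. Therefore,
$$\hat{y} = e^{3.740727} \approx 42.129.$$
Hence, the predicted “bacterial count” for an “incubation time” of 15 minutes is approximately 42.13. Carrying the slope unrounded (b = 0.015209, a = 3.511023) gives $\hat{z} = 3.739163$ and $\hat{y} \approx 42.06$, so the prediction is about 42 cells per cm³ either way. Note that 15 minutes is slightly below the smallest observed incubation time (20 minutes), so this prediction is a mild extrapolation.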
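The prediction and the back-transformation can be sketched with predict() on the log-scale model, followed by exp() to convert the predicted log count back to a count. Because no coefficient is rounded here, the result is about 42.06 rather than 42.129.

```r
x <- c(20, 40, 60, 90, 120, 180, 240, 300, 360, 420)
y <- c(47, 62, 73, 103, 220, 537, 1580, 4500, 9200, 12800)

fit <- lm(log(y) ~ x)   # regression of the natural-log count on incubation time

# Predicted log count at 15 minutes, then back-transformed to a count
z_hat <- predict(fit, newdata = data.frame(x = 15))
y_hat <- exp(z_hat)
y_hat
```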