Two teams in a laboratory use different methods to perform biological tests. The success rate of these tests depends on the size of the sample material used. The data are given below. The second column gives the success rate for a given sample size. The first 10 observations are the success rates of the first team, and the remaining 10 are those of the second team. The Team 2 column is an indicator equal to 1 for the second team and 0 for the first, and Team 2*Size is the product of that indicator and Size.
Size | Success | Team 2 | Team 2*Size |
1 | 0.219438 | 0 | 0 |
2 | 0.332883 | 0 | 0 |
3 | 0.304965 | 0 | 0 |
4 | 0.406052 | 0 | 0 |
5 | 0.414167 | 0 | 0 |
6 | 0.482654 | 0 | 0 |
7 | 0.595429 | 0 | 0 |
8 | 0.550863 | 0 | 0 |
9 | 0.663525 | 0 | 0 |
10 | 0.712554 | 0 | 0 |
1 | 0.284065 | 1 | 1 |
2 | 0.246359 | 1 | 2 |
3 | 0.219077 | 1 | 3 |
4 | 0.172007 | 1 | 4 |
5 | 0.165857 | 1 | 5 |
6 | 0.096087 | 1 | 6 |
7 | 0.082457 | 1 | 7 |
8 | 0.057341 | 1 | 8 |
9 | 0.019099 | 1 | 9 |
10 | 0.021757 | 1 | 10 |
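If data.csv is not at hand, the table above can be entered directly in R. This is a minimal sketch. The column names are chosen to match the names read.table produces from the CSV header (Team.2, Team.2.Size), which is an assumption about how the file was laid out. The last column is computed as the product of the Team 2 indicator and Size rather than typed in by hand.

```r
# Sample sizes 1..10, once for each team
size <- rep(1:10, times = 2)

# Success rates: the first 10 values are team 1, the last 10 are team 2
success <- c(0.219438, 0.332883, 0.304965, 0.406052, 0.414167,
             0.482654, 0.595429, 0.550863, 0.663525, 0.712554,
             0.284065, 0.246359, 0.219077, 0.172007, 0.165857,
             0.096087, 0.082457, 0.057341, 0.019099, 0.021757)

# Dummy variable: 0 for team 1, 1 for team 2
team2 <- rep(c(0, 1), each = 10)

d <- data.frame(Size = size,
                Success = success,
                Team.2 = team2,
                Team.2.Size = team2 * size)  # interaction column

head(d)
```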
I used R to solve this question.
R code and output:
> d=read.table('data.csv',header=T,sep=',')
> head(d)
Size Success Team.2 Team.2.Size
1 1 0.219438 0 0
2 2 0.332883 0 0
3 3 0.304965 0 0
4 4 0.406052 0 0
5 5 0.414167 0 0
6 6 0.482654 0 0
> attach(d)
The following objects are masked from d (pos = 3):
Size, Success, Team.2, Team.2.Size
> fit_1=lm(Success[which(Team.2==0)]~Size[which(Team.2==0)])
> summary(fit_1)
Call:
lm(formula = Success[which(Team.2 == 0)] ~ Size[which(Team.2 ==
0)])
Residuals:
Min 1Q Median 3Q Max
-0.047976 -0.024417 -0.001235 0.015226 0.048825
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.180965 0.023685 7.641 6.07e-05 ***
Size[which(Team.2 == 0)] 0.052234 0.003817 13.684 7.84e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03467 on 8 degrees of freedom
Multiple R-squared: 0.959, Adjusted R-squared: 0.9539
F-statistic: 187.3 on 1 and 8 DF, p-value: 7.836e-07
> plot(Size[which(Team.2==0)],Success[which(Team.2==0)])
> abline(fit_1)
> fit_2=lm(Success[which(Team.2==1)]~Size[which(Team.2==1)])
> summary(fit_2)
Call:
lm(formula = Success[which(Team.2 == 1)] ~ Size[which(Team.2 ==
1)])
Residuals:
Min 1Q Median 3Q Max
-0.0248730 -0.0087686 -0.0000112 0.0078027 0.0244016
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.306367 0.010182 30.09 1.61e-09 ***
Size[which(Team.2 == 1)] -0.030901 0.001641 -18.83 6.53e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0149 on 8 degrees of freedom
Multiple R-squared: 0.9779, Adjusted R-squared: 0.9752
F-statistic: 354.6 on 1 and 8 DF, p-value: 6.534e-08
> plot(Size[which(Team.2==1)],Success[which(Team.2==1)])
> abline(fit_2)
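As a cross-check on the two summaries above, recall that the least-squares estimates of a simple linear regression have the closed form b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x). The sketch below applies that formula to each team separately. The helper name ols_simple is illustrative and not part of the original solution.

```r
size <- rep(1:10, times = 2)
success <- c(0.219438, 0.332883, 0.304965, 0.406052, 0.414167,
             0.482654, 0.595429, 0.550863, 0.663525, 0.712554,
             0.284065, 0.246359, 0.219077, 0.172007, 0.165857,
             0.096087, 0.082457, 0.057341, 0.019099, 0.021757)
team2 <- rep(c(0, 1), each = 10)

# Closed-form simple regression: slope = Sxy / Sxx, intercept from the means
ols_simple <- function(x, y) {
  b1 <- cov(x, y) / var(x)
  b0 <- mean(y) - b1 * mean(x)
  c(intercept = b0, slope = b1)
}

coef_team1 <- ols_simple(size[team2 == 0], success[team2 == 0])
coef_team2 <- ols_simple(size[team2 == 1], success[team2 == 1])

# Should agree with the Estimate columns of summary(fit_1) and summary(fit_2)
round(rbind(team1 = coef_team1, team2 = coef_team2), 6)
```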
Que.a
Could you please tell me which variables should be used as the independent variables in the estimated regression equation?
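If the Team 2 and Team 2*Size columns are meant to be the additional independent variables (an assumption, since the question text does not say so explicitly), one pooled regression Success ~ Size + Team.2 + Team.2.Size reproduces both team equations at once. The Team.2 coefficient is the difference between the two intercepts, and the Team.2.Size coefficient is the difference between the two slopes. A minimal sketch:

```r
size <- rep(1:10, times = 2)
success <- c(0.219438, 0.332883, 0.304965, 0.406052, 0.414167,
             0.482654, 0.595429, 0.550863, 0.663525, 0.712554,
             0.284065, 0.246359, 0.219077, 0.172007, 0.165857,
             0.096087, 0.082457, 0.057341, 0.019099, 0.021757)
team2 <- rep(c(0, 1), each = 10)
d <- data.frame(Size = size, Success = success,
                Team.2 = team2, Team.2.Size = team2 * size)

# Pooled model with a team dummy and a team-by-size interaction
fit_all <- lm(Success ~ Size + Team.2 + Team.2.Size, data = d)
b <- coef(fit_all)

# Team 1 line: read directly from the baseline coefficients
team1_line <- unname(c(b["(Intercept)"], b["Size"]))
# Team 2 line: add the dummy (intercept shift) and interaction (slope shift)
team2_line <- unname(c(b["(Intercept)"] + b["Team.2"],
                       b["Size"] + b["Team.2.Size"]))

print(round(b, 6))
```

The point estimates match fit_1 and fit_2, but the pooled model assumes one common residual variance for both teams, so its standard errors and p-values differ from those in the two separate summaries.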
Que.b
Regression equation for the first team:
Predicted Success = 0.180965 + 0.052234*Size
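To use the first team's equation, substitute a sample size. In this small sketch, success_team1 is a hypothetical helper name, and the coefficients are copied from summary(fit_1).

```r
# Fitted equation for team 1 (coefficients from summary(fit_1))
success_team1 <- function(size) 0.180965 + 0.052234 * size

# Predicted success rate at sample size 5: 0.180965 + 0.052234 * 5
success_team1(5)  # 0.442135
```

Each one-unit increase in Size raises the first team's predicted success rate by about 0.052.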
Que.c
Regression equation for the second team:
Predicted Success = 0.306367 - 0.030901*Size
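The two fitted lines have slopes of opposite sign, so they cross. Setting the equations equal, 0.180965 + 0.052234*Size = 0.306367 - 0.030901*Size, gives Size = 0.125402 / 0.083135, or about 1.51. Below that size the second team's fitted success rate is higher, and above it the first team's is higher. This agrees with the table, where Team 2 has the higher observed rate only at Size 1. The same arithmetic as a short sketch:

```r
# Fitted coefficients from fit_1 and fit_2
b0_1 <- 0.180965; b1_1 <- 0.052234   # team 1
b0_2 <- 0.306367; b1_2 <- -0.030901  # team 2

# Solve b0_1 + b1_1 * s = b0_2 + b1_2 * s for the sample size s
crossover <- (b0_2 - b0_1) / (b1_1 - b1_2)
crossover  # about 1.508
```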
Que.d
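Scatter plot for the first team (points with the fitted line from abline(fit_1)):
Scatter plot for the second team (points with the fitted line from abline(fit_2)):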
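The two scatter plots can also be drawn on a single set of axes, which makes the opposite slopes easy to compare. This sketch fits each team with the subset argument of lm() instead of attach() and which(), so the coefficient names stay short. The colours and legend position are arbitrary choices.

```r
size <- rep(1:10, times = 2)
success <- c(0.219438, 0.332883, 0.304965, 0.406052, 0.414167,
             0.482654, 0.595429, 0.550863, 0.663525, 0.712554,
             0.284065, 0.246359, 0.219077, 0.172007, 0.165857,
             0.096087, 0.082457, 0.057341, 0.019099, 0.021757)
team2 <- rep(c(0, 1), each = 10)
d <- data.frame(Size = size, Success = success, Team.2 = team2)

# Fit each team separately; subset selects the rows, no attach() needed
fit_1 <- lm(Success ~ Size, data = d, subset = Team.2 == 0)
fit_2 <- lm(Success ~ Size, data = d, subset = Team.2 == 1)

# Scatter plot of both teams, coloured by team
plot(Success ~ Size, data = d,
     col = ifelse(d$Team.2 == 0, "blue", "red"), pch = 19,
     xlab = "Sample size", ylab = "Success rate")

# Overlay each team's fitted regression line
abline(fit_1, col = "blue")
abline(fit_2, col = "red")
legend("topleft", legend = c("Team 1", "Team 2"),
       col = c("blue", "red"), pch = 19, lty = 1)
```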