suspects that more failures occur when the outside temperature is higher. The data cover 18 days; temperatures are in °C.
temp (°C) | failures
21.9 | 13
22.8 | 12
17.4 | 16
21.9 | 17
22.7 | 15
19.8 | 16
19.5 | 17
18.1 | 15
21.8 | 16
22.5 | 16
18.6 | 15
21.7 | 16
20.5 | 17
19.3 | 16
19.0 | 16
21.1 | 14
20.2 | 17
18.5 | 14
Is there evidence that more computer failures occur during hotter weather?
1. What is the null hypothesis?
2. Fit a linear regression using R and interpret the result.
1. Here we want to test whether there is evidence that more computer failures occur during hotter weather. The null hypothesis is conventionally the hypothesis of no difference or no effect, and what the researcher wants to show (the researcher's claim) is the alternative hypothesis. So
Null hypothesis Ho is
Ho: There is no relation between computer failures and temperature.
Note that we want to test whether there is a significant positive correlation between temperature and computer failures. That directional claim is the alternative hypothesis H1, and the null hypothesis of no relation is as stated above.
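The hypotheses above can be checked directly with a correlation test. The following is a minimal sketch, assuming a one-sided Pearson test is appropriate for these data; the choice of `cor.test` with `alternative = "greater"` is an illustration and not part of the original answer.

```r
# Temperature (°C) and number of failures per day, copied from the question
x <- c(21.9, 22.8, 17.4, 21.9, 22.7, 19.8, 19.5, 18.1, 21.8,
       22.5, 18.6, 21.7, 20.5, 19.3, 19.0, 21.1, 20.2, 18.5)
y <- c(13, 12, 16, 17, 15, 16, 17, 15, 16,
       16, 15, 16, 17, 16, 16, 14, 17, 14)

# Sample Pearson correlation between temperature and failures
r <- cor(x, y)

# One-sided test of Ho: correlation = 0 against H1: correlation > 0
ct <- cor.test(x, y, alternative = "greater")

print(r)           # sign shows the direction of the sample relation
print(ct$p.value)  # a small value would support H1
```

A negative sample correlation can never support H1 in this one-sided setup, because the p-value is then larger than 0.5.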
2. Linear regression using R:
Observe that in this case computer failures are believed to depend on temperature, so
Independent variable (X): Temperature recorded on a day.
Dependent variable (Y): Number of computer failures on a day.
We will use R software to fit the regression equation with the variables stated above.
R software commands
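# x = temperature (°C), y = number of failures, in the same day order
x <- c(21.9, 22.8, 17.4, 21.9, 22.7, 19.8, 19.5, 18.1, 21.8, 22.5, 18.6, 21.7, 20.5, 19.3, 19, 21.1, 20.2, 18.5)
y <- c(13, 12, 16, 17, 15, 16, 17, 15, 16, 16, 15, 16, 17, 16, 16, 14, 17, 14)
# Fit the simple linear regression of y on x and print its summary
model <- lm(y ~ x)
summary(model)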
We created the data vectors x and y, used the lm(y ~ x) function to fit the regression model, and used the summary() function to print the coefficient estimates, their t-tests, and R-squared.
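The quantities used in the interpretation can also be pulled out of the fitted object instead of being read off the printed summary. This is a small sketch using the standard `coef()` accessor and the `r.squared` component of `summary()`; the variable names `coefs` and `r2` are only illustrative.

```r
# Same data vectors as in the commands above
x <- c(21.9, 22.8, 17.4, 21.9, 22.7, 19.8, 19.5, 18.1, 21.8,
       22.5, 18.6, 21.7, 20.5, 19.3, 19.0, 21.1, 20.2, 18.5)
y <- c(13, 12, 16, 17, 15, 16, 17, 15, 16,
       16, 15, 16, 17, 16, 16, 14, 17, 14)

model <- lm(y ~ x)

# Named vector with elements "(Intercept)" and "x" (the slope)
coefs <- coef(model)

# Multiple R-squared as a plain number
r2 <- summary(model)$r.squared

print(round(coefs, 4))
print(round(r2, 5))
```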
From the R output of summary(model), the fitted regression equation is
ŷ = 19.2865 - 0.1883 x
In words,
Computer failures = 19.2865 - 0.1883 (temperature).
That is, each additional 1 °C is associated with about 0.19 fewer predicted failures per day.
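To see where these two numbers come from, the least-squares formulas can be applied by hand: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below does that and then evaluates the line at a temperature of 20 °C, which is a hypothetical value chosen only for illustration and is not in the data.

```r
# Same data vectors as in the commands above
x <- c(21.9, 22.8, 17.4, 21.9, 22.7, 19.8, 19.5, 18.1, 21.8,
       22.5, 18.6, 21.7, 20.5, 19.3, 19.0, 21.1, 20.2, 18.5)
y <- c(13, 12, 16, 17, 15, 16, 17, 15, 16,
       16, 15, 16, 17, 16, 16, 14, 17, 14)

# Sum of squares of x and cross-products of x and y about their means
sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))

# Least-squares estimates of slope and intercept
b1 <- sxy / sxx
b0 <- mean(y) - b1 * mean(x)

# Hypothetical temperature (assumption for illustration only)
new_temp <- 20
predicted <- b0 + b1 * new_temp

print(c(intercept = b0, slope = b1, predicted = predicted))
```

The hand-computed values should agree with the coefficients reported by lm(y ~ x).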
Multiple R-squared measures the proportion of variation in the dependent variable that is explained by the independent variable, and it helps in judging how well the regression model fits. Here Multiple R-squared is 0.05108, i.e., only 5.108% of the variation in computer failures is explained by temperature, so the fit is poor. Moreover, the estimated slope is negative (-0.1883), which is opposite to the suspected direction. Hence these data provide no evidence that more computer failures occur during hotter weather.
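The R-squared value and the strength of evidence for a positive slope can be recomputed from the fitted model. This is a sketch under the assumption that the usual regression conditions (roughly normal errors with constant variance) are acceptable for count data like these; the one-sided p-value is obtained from the slope's t statistic with `pt()`.

```r
# Same data vectors as in the commands above
x <- c(21.9, 22.8, 17.4, 21.9, 22.7, 19.8, 19.5, 18.1, 21.8,
       22.5, 18.6, 21.7, 20.5, 19.3, 19.0, 21.1, 20.2, 18.5)
y <- c(13, 12, 16, 17, 15, 16, 17, 15, 16,
       16, 15, 16, 17, 16, 16, 14, 17, 14)

model <- lm(y ~ x)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
sse <- sum(residuals(model)^2)
sst <- sum((y - mean(y))^2)
r2 <- 1 - sse / sst

# t statistic for the slope and the residual degrees of freedom (n - 2)
t_slope <- summary(model)$coefficients["x", "t value"]
df <- model$df.residual

# One-sided p-value for H1: slope > 0 (upper tail of the t distribution)
p_one_sided <- pt(t_slope, df = df, lower.tail = FALSE)

print(c(r2 = r2, t = t_slope, df = df, p_one_sided = p_one_sided))
```

Because the t statistic is negative, the upper-tail p-value exceeds 0.5, which matches the conclusion that the data do not support the claim.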