You are a frequent flyer and have been wondering whether the number of delayed flights is influenced by how busy the airport you are flying into is. You have collected the number of delayed flights and the total number of flights at your hometown airport on 20 consecutive days. The data are presented below:
Day | Total Number of Flights | Number of Flights Delayed |
1 | 490 | 42 |
2 | 494 | 47 |
3 | 510 | 60 |
4 | 515 | 62 |
5 | 480 | 48 |
6 | 518 | 56 |
7 | 498 | 48 |
8 | 540 | 59 |
9 | 505 | 41 |
10 | 515 | 49 |
11 | 497 | 48 |
12 | 532 | 51 |
13 | 503 | 48 |
14 | 509 | 55 |
15 | 514 | 58 |
16 | 481 | 36 |
17 | 495 | 41 |
18 | 505 | 51 |
19 | 478 | 40 |
20 | 510 | 49 |
Using the data above, answer the following questions.
3. Predict the number of flights delayed if the total number of flights is 485.
4. What is the value of the t-test statistic when testing whether the number of flights delayed is influenced by the total number of flights?
5. What is the p-value of the t-test statistic when testing whether the number of flights delayed is influenced by the total number of flights?
6. True or False: The null hypothesis for testing whether the number of flights delayed is influenced by the total number of flights cannot be rejected if a 5% probability of committing a Type I error is desired. Explain WHY.
10. Should the Durbin-Watson statistic be calculated? Explain.
12. Construct a 95% confidence interval for the mean number of flights delayed when the total number of flights is 500. Interpret the confidence interval.
13. Construct a 95% prediction interval for the number of flights delayed when the total number of flights is 500. Interpret the prediction interval.
3. The predicted number of flights delayed is about 43 when the total number of flights is 485.
predict(flightmodel,data.frame(TotalFlightX=485))
1
43.19568
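The prediction above is the fitted line evaluated at 485 total flights. A minimal sketch of that arithmetic follows. It assumes the data frame is typed in from the table in the question rather than read from flightdata.xlsx, and it reuses the column names from the original code; `pred_485` is a name introduced here only for illustration.

```r
# Data typed in from the table in the question (assumption: transcribed correctly)
flightdata <- data.frame(
  TotalFlightX = c(490, 494, 510, 515, 480, 518, 498, 540, 505, 515,
                   497, 532, 503, 509, 514, 481, 495, 505, 478, 510),
  FlightsDelayedY = c(42, 47, 60, 62, 48, 56, 48, 59, 41, 49,
                      48, 51, 48, 55, 58, 36, 41, 51, 40, 49)
)
flightmodel <- lm(FlightsDelayedY ~ TotalFlightX, data = flightdata)

# Point prediction by hand: intercept + slope * x
b <- coef(flightmodel)
pred_485 <- unname(b["(Intercept)"] + b["TotalFlightX"] * 485)
print(pred_485)
```

This should agree with the `predict()` call above. Since 485 lies inside the observed range of total flights (478 to 540), the prediction is an interpolation rather than an extrapolation.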
4. The value of the t-test statistic is 4.373 when testing whether the number of flights delayed is influenced by the total number of flights.
5. The p-value of the t-test statistic (4.373) is 0.000367 when testing whether the number of flights delayed is influenced by the total number of flights.
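The t statistic and its p-value can be recovered from the slope estimate and its standard error. The sketch below assumes the data are typed in from the table in the question; `t_stat`, `p_value` and `df_resid` are illustrative names, not part of the original answer.

```r
# Data typed in from the table in the question (assumption: transcribed correctly)
flightdata <- data.frame(
  TotalFlightX = c(490, 494, 510, 515, 480, 518, 498, 540, 505, 515,
                   497, 532, 503, 509, 514, 481, 495, 505, 478, 510),
  FlightsDelayedY = c(42, 47, 60, 62, 48, 56, 48, 59, 41, 49,
                      48, 51, 48, 55, 58, 36, 41, 51, 40, 49)
)
flightmodel <- lm(FlightsDelayedY ~ TotalFlightX, data = flightdata)

# t = slope estimate / standard error of the slope
coefs <- summary(flightmodel)$coefficients
t_stat <- coefs["TotalFlightX", "Estimate"] / coefs["TotalFlightX", "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
df_resid <- df.residual(flightmodel)
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

print(c(t = t_stat, df = df_resid, p = p_value))
```

The two-sided p-value is used because the hypotheses are H0: slope = 0 against H1: slope ≠ 0.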
6. FALSE
The statement says that the null hypothesis (the number of flights delayed is not influenced by the total number of flights, i.e. the slope is zero) cannot be rejected if a 5% probability of committing a Type I error is desired.
The decision rule is: if the p-value is less than alpha, we reject the null hypothesis; otherwise we do not reject it. Here the p-value (0.000367) is far below alpha = 0.05, so the null hypothesis is rejected. There is significant evidence that the number of flights delayed is influenced by the total number of flights.
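10. Yes. The Durbin-Watson D test is used to check for autocorrelation, that is, whether the error terms are correlated with each other. Because the data were collected on 20 consecutive days, they form a time series, and the errors on neighbouring days may be correlated. The Durbin-Watson statistic should therefore be calculated to check the assumption of independent errors.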
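The Durbin-Watson statistic can be computed directly from the residuals taken in day order. The sketch below assumes the data are typed in from the table in the question, in the same day order; `dw` is an illustrative name. No value is quoted here because the original answer does not report one.

```r
# Data typed in from the table in the question, kept in day order (assumption)
flightdata <- data.frame(
  TotalFlightX = c(490, 494, 510, 515, 480, 518, 498, 540, 505, 515,
                   497, 532, 503, 509, 514, 481, 495, 505, 478, 510),
  FlightsDelayedY = c(42, 47, 60, 62, 48, 56, 48, 59, 41, 49,
                      48, 51, 48, 55, 58, 36, 41, 51, 40, 49)
)
flightmodel <- lm(FlightsDelayedY ~ TotalFlightX, data = flightdata)

# D = sum of squared successive differences of residuals / sum of squared residuals
e <- residuals(flightmodel)
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```

D lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. If the lmtest package happens to be installed, `lmtest::dwtest(flightmodel)` should report the same statistic together with a p-value.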
12. The 95% confidence interval for the mean number of flights delayed is (45.49, 50.55) when the total number of flights is 500.
predict(flightmodel, data.frame(TotalFlightX = 500), interval = "confidence", level = 0.95)
       fit      lwr      upr
1 48.01906 45.49178 50.54635
Interpretation: We are 95% confident that, when the total number of flights is 500, the mean number of flights delayed lies between 45.49 and 50.55.
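The confidence interval for the mean response can also be built by hand from the usual formula: fit ± t × s × sqrt(1/n + (x0 − x̄)² / Sxx). The sketch below assumes the data are typed in from the table in the question; `x0`, `se_mean` and `ci` are illustrative names.

```r
# Data typed in from the table in the question (assumption: transcribed correctly)
flightdata <- data.frame(
  TotalFlightX = c(490, 494, 510, 515, 480, 518, 498, 540, 505, 515,
                   497, 532, 503, 509, 514, 481, 495, 505, 478, 510),
  FlightsDelayedY = c(42, 47, 60, 62, 48, 56, 48, 59, 41, 49,
                      48, 51, 48, 55, 58, 36, 41, 51, 40, 49)
)
flightmodel <- lm(FlightsDelayedY ~ TotalFlightX, data = flightdata)

x0 <- 500
x <- flightdata$TotalFlightX
n <- length(x)
s <- summary(flightmodel)$sigma            # residual standard error
fit0 <- unname(predict(flightmodel, data.frame(TotalFlightX = x0)))

# Standard error of the estimated mean response at x0
se_mean <- s * sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
t_crit <- qt(0.975, df = n - 2)            # two-sided 95% critical value

ci <- c(lwr = fit0 - t_crit * se_mean, upr = fit0 + t_crit * se_mean)
print(ci)
```

The interval is fairly narrow because 500 is close to the mean of the observed total flights, which keeps the (x0 − x̄)² term small.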
13. The 95% prediction interval for the number of flights delayed is (36.85, 59.19) when the total number of flights is 500.
predict(flightmodel, data.frame(TotalFlightX = 500), interval = "prediction", level = 0.95)
       fit      lwr      upr
1 48.01906 36.85307 59.18506
Interpretation: We are 95% confident that, on a single day when the total number of flights is 500, the number of flights delayed will lie between 36.85 and 59.19.
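The prediction interval adds the variability of a single new observation to the uncertainty in the fitted mean, which is why there is an extra 1 under the square root: fit ± t × s × sqrt(1 + 1/n + (x0 − x̄)² / Sxx). The sketch below assumes the data are typed in from the table in the question; `se_pred` and `pi_500` are illustrative names.

```r
# Data typed in from the table in the question (assumption: transcribed correctly)
flightdata <- data.frame(
  TotalFlightX = c(490, 494, 510, 515, 480, 518, 498, 540, 505, 515,
                   497, 532, 503, 509, 514, 481, 495, 505, 478, 510),
  FlightsDelayedY = c(42, 47, 60, 62, 48, 56, 48, 59, 41, 49,
                      48, 51, 48, 55, 58, 36, 41, 51, 40, 49)
)
flightmodel <- lm(FlightsDelayedY ~ TotalFlightX, data = flightdata)

x0 <- 500
x <- flightdata$TotalFlightX
n <- length(x)
s <- summary(flightmodel)$sigma
fit0 <- unname(predict(flightmodel, data.frame(TotalFlightX = x0)))

# Standard error for one new observation at x0 (note the leading 1)
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
t_crit <- qt(0.975, df = n - 2)

pi_500 <- c(lwr = fit0 - t_crit * se_pred, upr = fit0 + t_crit * se_pred)
print(pi_500)
```

The prediction interval is always wider than the confidence interval at the same x0, because it covers an individual day rather than the average over all days with 500 flights.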
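# Read the data from Excel and fit the simple linear regression
library(readxl)
flightdata <- read_excel("flightdata.xlsx")
View(flightdata)      # optional: inspect the data in the viewer
attach(flightdata)    # makes the columns available by name
flightmodel <- lm(FlightsDelayedY ~ TotalFlightX)
summary(flightmodel)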
Call:
lm(formula = FlightsDelayedY ~ TotalFlightX)
Residuals:
Min 1Q Median 3Q Max
-8.6269 -3.0632 -0.1604 2.6664 9.1576
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -112.76025 37.10995 -3.039 0.007067 **
TotalFlightX 0.32156 0.07353 4.373 0.000367 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.177 on 18 degrees of freedom
Multiple R-squared: 0.5151, Adjusted R-squared: 0.4882
F-statistic: 19.12 on 1 and 18 DF, p-value: 0.0003666