You started taking the bus to work. The local transit authority says that a bus should arrive at your bus stop every five minutes. After a while, you notice you spend a lot more than five minutes waiting for the bus, so you start to keep a record.
You spend the next two months recording how long it takes for the bus to arrive at the bus stop. This gives a total of sixty observations that denote the number of minutes it took for the bus to arrive (rounded to the nearest minute). These observations are hosted at
https://mattbutner.github.io/data/bus_stop_time.csv
Load these data into R as a data frame titled bus_stop_time.
Create a histogram of the time_until_bus variable. Would you say that five minutes is a reasonable guess for the average arrival time based on this picture alone?
Create a 95% confidence interval for the bus arrival times using the Z distribution. Does 5 minutes fall within the 95% confidence interval?
How would you communicate your finding to the local transit authority?
Please see the complete R code below along with the results.
The confidence interval is given as
mean ± z * sd / sqrt(n), where z = 1.96 (from the z table), sd is the sample standard deviation, and n is the number of samples.
bus_stop_time <- read.csv("https://mattbutner.github.io/data/bus_stop_time.csv", header = TRUE)
# histogram, with a red vertical line at the sample mean
hist(bus_stop_time$time_until_bus, col = "skyblue")
abline(v = mean(bus_stop_time$time_until_bus),
       col = "red",
       lwd = 2)
## 95% confidence interval
## upper limit
mean(bus_stop_time$time_until_bus) +
  1.96 * sd(bus_stop_time$time_until_bus) / sqrt(nrow(bus_stop_time))
## lower limit
mean(bus_stop_time$time_until_bus) -
  1.96 * sd(bus_stop_time$time_until_bus) / sqrt(nrow(bus_stop_time))
The results are
> ## upper limit
> mean(bus_stop_time$time_until_bus) +
1.96*sd(bus_stop_time$time_until_bus)/sqrt(nrow(bus_stop_time))
[1] 6.31626
>
> ## lower limit
> mean(bus_stop_time$time_until_bus) -
1.96*sd(bus_stop_time$time_until_bus)/sqrt(nrow(bus_stop_time))
[1] 5.28374
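As a sketch of the same calculation, the formula mean ± z * sd / sqrt(n) can be wrapped in a small helper that looks up z with qnorm() instead of hard-coding 1.96. The function name z_conf_int and the sample vector below are hypothetical and chosen only for illustration; they are not the bus data.

```r
# Hypothetical helper: z-based confidence interval for the mean of x.
z_conf_int <- function(x, level = 0.95) {
  n <- length(x)
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% level
  margin <- z * sd(x) / sqrt(n)     # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Made-up waiting times in minutes, for illustration only
example_times <- c(4, 6, 5, 7, 8, 5, 6, 4, 9, 6)
ci <- z_conf_int(example_times)
print(ci)
```

With the real data, the call would be z_conf_int(bus_stop_time$time_until_bus). Because qnorm(0.975) is roughly 1.959964 rather than exactly 1.96, the limits may differ from the ones printed above in the later decimal places.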
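Based on the histogram alone, 5 minutes looks like a plausible guess for the average arrival time, since the mean line falls fairly close to 5.
However, 5 minutes does not fall within the 95% confidence interval: we are 95% confident that the true average arrival time lies between 5.28 and 6.32 minutes, and 5 is below the lower limit. For the transit authority, this suggests that buses arrive, on average, somewhat later than every five minutes.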
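The comparison of the claimed 5 minutes against the interval limits can be sketched in R as follows. This is a minimal illustration that uses the limits printed in the results; the variable names are assumptions made for the example.

```r
# Limits copied from the printed results
lower_limit <- 5.28374
upper_limit <- 6.31626
claimed_mean <- 5  # the transit authority's stated time between buses

# TRUE only if the claimed value lies inside the 95% confidence interval
inside <- claimed_mean >= lower_limit && claimed_mean <= upper_limit
print(inside)
```

The check prints FALSE, because 5 is smaller than the lower limit.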