You have been contracted by the City of Sydney Council to inspect an issue that has been reported by residents living in a western suburb. The residents claim that one of the roads in their suburb has lately had issues with traffic speeding. The City of Sydney Council wants you to investigate if traffic speeding is an issue and, if this is the case, they will invest in building asphalt speed bumps to enforce reduced traffic speeds on that particular road. In this problem you may assume that the speed of a vehicle follows a normal distribution.
(a.) The reason the residents complained is because they expect that the mean speed is at most 25 km per hour, especially considering that there is a kindergarten nearby. The council thinks that this is a reasonable figure. Write down the null and alternative hypothesis that you would test to carry out the task the City of Sydney Council has assigned you.
(b.) Given your hypothesis in (a.), can you explain in words the two different errors that may result from your investigation? Which of these errors do you think is worse to commit in this case? Explain.
(c.) Before collecting data, you want to ensure that you will take a sample size large enough to conduct a reliable hypothesis test. Given that there is a kindergarten nearby, you would like your test to have a 0.99 probability to conclude that traffic speeding is an issue if the mean speed is 30 km per hour. Moreover, if traffic speeding is not an issue, you want a probability of 0.05 to falsely conclude that speeding is an issue. Based on previous observations of speeds on similar roads in Sydney, you may assume a known standard deviation of 5 km per hour for the speeds. How many observations should be included in your sample to meet these requirements? Carefully state the Type I and II error probabilities implied from the question.
(d.) You obtain the following n = 16 measurements of car speeds (km/h): 35.13, 25.37, 27.15, 18.39, 31.56, 26.63, 19.73, 23.29, 35.73, 29.73, 37.55, 29.67, 25.63, 19.04, 31.38, 25.76. For these data, $\sum_{i=1}^{n} x_i = 441.74$ and $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 512.5844$. Compute the sample mean and sample variance from the information given. Test your hypothesis in (a.) using the significance level implied from your answer in (c.). What is your advice to the City of Sydney Council? Carefully state your assumptions, however, note that you may not assume that the variance is known, but you may still assume that the speeds of vehicles follow a normal distribution.
(e.) During a meeting to present your results to the council committee, you are asked to provide an interval for the true mean of the traffic speed. Provide such an interval and interpret it in words for the committee members. Keep in mind that they do not know statistical terminology, so you have to express yourself using non-technical terms. Can you use this interval to test the hypothesis in (a.)? Explain why/why not.
(f.) One of the committee members points out that the real danger for the residents is if a single vehicle speeds, and not that the average speed is large. Provide the committee members with an interval that can be used to study the speed of a future single vehicle.
(a)
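Let $\mu$ denote the mean speed (in km per hour) of vehicles on the road. We test the null hypothesis
$H_0: \mu \le 25$
against the alternative hypothesis
$H_1: \mu > 25$.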
(b)
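There are two types of error: (i) Type I error and (ii) Type II error. These are defined as follows.
Type I error: rejecting the null hypothesis when it is true, that is, concluding that speeding is an issue when the mean speed is in fact at most 25 km per hour.
Type II error: failing to reject the null hypothesis when it is false, that is, concluding that speeding is not an issue when the mean speed in fact exceeds 25 km per hour.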
Between these two, the Type I error is worse to commit. If it is committed, we would conclude that the average speed exceeds the permissible level (25 km per hour), and the council would invest in building asphalt speed bumps to enforce reduced traffic speeds on that road. This has two drawbacks. First, the council would have invested in a project that was not required. Second, once the speed bumps are built, traffic that was already slow enough would be slowed further, which was not required either.
(c)
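For this one-sided test, the critical region is given by
$\bar{X} > c$.
Also,
$\alpha = P(\text{Type I error}) = P(\bar{X} > c \mid \mu = 25) = 0.05$
and
$\beta = P(\text{Type II error}) = P(\bar{X} \le c \mid \mu = 30) = 1 - 0.99 = 0.01$.
Let the sample size be $n$. With known $\sigma = 5$, we have $\bar{X} \sim N(\mu, 25/n)$.
We have,
$\dfrac{c - 25}{5/\sqrt{n}} = z_{0.95} = 1.644854$, so $c = 25 + 1.644854 \cdot \dfrac{5}{\sqrt{n}}$ ... (i)
[Using R-code 'qnorm(1-0.05)']
$\dfrac{c - 30}{5/\sqrt{n}} = z_{0.01} = -2.326348$, so $c = 30 - 2.326348 \cdot \dfrac{5}{\sqrt{n}}$ ... (ii)
[Using R-code 'qnorm(0.01)']
Comparing (i) and (ii) we get,
$\sqrt{n} = \dfrac{(1.644854 + 2.326348) \cdot 5}{30 - 25} = 3.971202$, so $n = 15.77$.
Hence, at least 16 observations should be included in the sample to meet the given requirements.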
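The sample-size calculation can be checked numerically. The following is a minimal Python sketch, assuming the one-sided z-test with known standard deviation described in (c); the function name `sample_size` and its parameter names are illustrative and do not come from the original answer. `NormalDist().inv_cdf` plays the role of R's `qnorm`.

```python
from math import ceil
from statistics import NormalDist


def sample_size(mu0, mu1, sigma, alpha, beta):
    """Sample size for a one-sided z-test of H0: mu <= mu0 vs H1: mu > mu0.

    alpha is the Type I error probability at mu = mu0,
    beta is the Type II error probability at mu = mu1 (known sigma).
    Returns the exact (fractional) solution and the rounded-up integer.
    """
    z = NormalDist()                # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)  # qnorm(1 - 0.05) in R
    z_beta = z.inv_cdf(1 - beta)    # equals -qnorm(0.01) in R
    n_exact = ((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2
    return n_exact, ceil(n_exact)


n_exact, n = sample_size(mu0=25, mu1=30, sigma=5, alpha=0.05, beta=0.01)
print(round(n_exact, 2), n)  # 15.77 16
```

Rounding up rather than to the nearest integer ensures that both error requirements are met.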
(d)
Let the random variable $X$ denote the speed (in km per hour) of a vehicle.
We have the sample values, but the population standard deviation (or variance) is unknown, so we perform a one-sample t-test.
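We test the null hypothesis
$H_0: \mu \le 25$
against the alternative hypothesis
$H_1: \mu > 25$,
where $\mu$ is the mean speed in km per hour.
Our test statistic is given by
$t = \dfrac{\bar{x} - 25}{s/\sqrt{n}}$, which follows a t-distribution with $n - 1$ degrees of freedom when $\mu = 25$.
Here,
Sample size $n = 16$
Sample mean is given by $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{441.74}{16} = 27.60875$
Sample standard deviation is given by $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\dfrac{512.5844}{15}} = 5.845707$
Degrees of freedom $n - 1 = 15$
Observed test statistic $t = \dfrac{27.60875 - 25}{5.845707/\sqrt{16}} = 1.785071$
p-value $= P(t_{15} > 1.785071) \approx 0.047$
[Using R-code '1-pt(1.785071,15)']
Level of significance $\alpha = 0.05$
We reject our null hypothesis if p-value $< \alpha$.
Here, we observe that p-value $\approx 0.047 < 0.05$.
So, we reject our null hypothesis.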
Hence, based on the given data, there is significant evidence at the 5% level that the average traffic speed is higher than 25 km per hour.
Sample mean is 27.60875 km per hour.
Sample variance is 34.17229 (km per hour)$^2$.
Our advice to the City of Sydney Council is that the average traffic speed is higher than 25 km per hour, so building asphalt speed bumps to enforce reduced traffic speeds on that road is warranted.
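The figures in (d) can be reproduced from the raw measurements. The sketch below uses only the Python standard library, which has no t-distribution, so the one-sided 5% critical value for 15 degrees of freedom is hard-coded from standard t-tables (an assumption of this sketch; it corresponds to `qt(0.95, 15)` in R). The decision is therefore made by comparing the statistic with the critical value, which is equivalent to the p-value rule used in the answer.

```python
from math import sqrt
from statistics import mean, variance

speeds = [35.13, 25.37, 27.15, 18.39, 31.56, 26.63, 19.73, 23.29,
          35.73, 29.73, 37.55, 29.67, 25.63, 19.04, 31.38, 25.76]

mu0 = 25        # hypothesised mean speed under H0 (km per hour)
t_crit = 1.753  # table value of t_{0.05, 15}; assumption, not computed here

n = len(speeds)
x_bar = mean(speeds)    # sample mean
s2 = variance(speeds)   # sample variance with divisor n - 1
s = sqrt(s2)            # sample standard deviation

# One-sample t statistic for H0: mu <= 25 against H1: mu > 25
t_stat = (x_bar - mu0) / (s / sqrt(n))
reject = t_stat > t_crit

print(f"mean = {x_bar:.5f}, variance = {s2:.5f}, t = {t_stat:.6f}")
print("reject H0" if reject else "do not reject H0")
```

The statistic is only slightly above the critical value, so the rejection at the 5% level is a close call.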
(e)
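We know,
$\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$, so a 95% confidence interval for $\mu$ is $\bar{x} \pm t_{0.975,\,15} \cdot \dfrac{s}{\sqrt{n}}$, with $t_{0.975,\,15} = 2.13145$
[Using R-code 'qt(1-(1-0.95)/2,15)']
$27.60875 \pm 2.13145 \cdot \dfrac{5.845707}{\sqrt{16}} = 27.60875 \pm 3.11496$
Hence, the 95% confidence interval for the average speed (in km per hour) is (24.49379, 30.72371).
In non-technical terms: based on our measurements, we are 95% confident that the true average speed of vehicles on this road lies between about 24.5 and 30.7 km per hour.
This two-sided interval corresponds to a two-sided test at the 5% level, so it cannot be used directly for the one-sided hypothesis in (a.); it contains 25 even though the one-sided test in (d) rejects. The matching tool is a one-sided 95% lower confidence bound, $27.60875 - 1.75305 \cdot 1.461427 \approx 25.05 > 25$, which agrees with the rejection in (d).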
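As a check on the interval above, here is a short Python sketch that starts from the summary statistics given in the question. The critical value `t_crit` is hard-coded from t-tables (an assumption of this sketch; it corresponds to the R call `qt(1-(1-0.95)/2,15)` quoted above), and the variable names are illustrative.

```python
from math import sqrt

n = 16            # sample size
sum_x = 441.74    # sum of the observations, from the question
ss = 512.5844     # sum of squared deviations from the mean, from the question
t_crit = 2.13145  # table value of t_{0.975, 15}; assumption, not computed here

x_bar = sum_x / n          # sample mean
s = sqrt(ss / (n - 1))     # sample standard deviation
half_width = t_crit * s / sqrt(n)

ci = (x_bar - half_width, x_bar + half_width)
print(f"95% CI for the mean speed: ({ci[0]:.4f}, {ci[1]:.4f})")
```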
(f)
We can use the three-sigma rule for this. For a normal distribution, approximately 99.73% of values lie within 3 standard deviations of the mean.
Taking the permissible mean speed of 25 km per hour as the centre and the sample standard deviation 5.845707 as the spread, the interval is 25 ± 3*5.845707, that is (7.46288, 42.53712), so the cut-off speed (in km per hour) is 25+3*5.845707 = 42.53712. A single vehicle's speed can then be flagged as extreme if it exceeds 42.53712 km per hour.
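A common textbook alternative to the three-sigma cut-off, not the method used in the answer above, is a t-based prediction interval for one future observation: $\bar{x} \pm t_{0.975,\,n-1} \cdot s\sqrt{1 + 1/n}$. It is centred on the sample mean and widens the interval to account for the uncertainty in estimating the mean. The sketch below assumes normally distributed speeds and a future vehicle independent of the sample; the critical value is hard-coded from t-tables (an assumption of this sketch; `qt(0.975, 15)` in R).

```python
from math import sqrt

n = 16            # sample size
x_bar = 27.60875  # sample mean from (d), km per hour
s = 5.845707      # sample standard deviation from (d), km per hour
t_crit = 2.13145  # table value of t_{0.975, 15}; assumption, not computed here

# The factor sqrt(1 + 1/n) combines the variability of a single vehicle
# with the uncertainty in the estimated mean.
half_width = t_crit * s * sqrt(1 + 1 / n)

lower = x_bar - half_width
upper = x_bar + half_width
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f}) km per hour")
```

Under these assumptions, the speed of a single future vehicle would fall inside this range about 95% of the time, which makes it much wider than the confidence interval for the mean.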