Reading
Savneet’s brow furrowed as she examined the Excel file that Noor had emailed her. She let out a sigh as she started to think about how this revelation might affect the bottom line of Rent-o-Rama, her company’s budget rental car service. Noor had unintentionally noticed a troubling trend. Now it was up to Savneet to see if there was any validity to the alleged pattern. With car services like Uber showing up, people were renting cars for short trips less and less. What Noor noticed was that lately, when someone did rent a car, they were going greater distances than before, say to Banff rather than to Airdrie.
Noor had originally brought this to Savneet’s attention because she thought that it was good news. The more kilometers on the car, the more the customers paid, which meant more revenue. But Savneet knew that, though this was true, there were other, less obvious consequences. Greater mileage meant more frequent repairs on the rental cars and lower resale value after the car was a year old. Unfortunately, Savneet was not convinced that the increased revenue from customers would be enough to offset these additional costs and loss of revenue.
Before going to her supervisor with these concerns, Savneet needed to make sure that mileage on the cars was, in fact, going up on average. This is why she had Noor send her this Excel file. It was a random sample of 30 car rentals from each month in 2016 (see the Excel file). Savneet felt that it was a good representation of the population of mileages of all car rentals from 2016.
In addition to the data provided by Noor from 2016, Savneet also had a more recent random sample of the mileage of 20 car rentals from last month. This sample had a mean of 184.1 km with a standard deviation of 75.2 km.
As she stared at her screen, a plan formed in her head. For her analysis, Savneet would first use the data from Noor as her parent sample. Then she would create a sampling distribution from the parent sample. Once she had her sampling distribution, she would determine the probability of getting a sample mean of 184.1 km or more (as in her recent sample), under the assumption that the average mileage of car rentals had not changed. This probability would help her determine whether something had changed.
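Written as a formula, the probability in Savneet's plan is shown below. Here $\bar{X}_{20}$ is the mean mileage of a random sample of 20 rentals, $\mu$ is the current average mileage, and $\mu_{2016}$ is the 2016 average; this notation is chosen for illustration and is not part of the original problem. The event is "184.1 km or more" because the probability of a continuous sample mean equalling any single exact value is zero, and only unusually large means point to an increase.

$$p = P\left(\bar{X}_{20} \ge 184.1 \text{ km} \;\middle|\; \mu = \mu_{2016}\right)$$

If $p$ is small, a sample mean as large as 184.1 km would rarely occur when average mileage has not changed, which would be evidence that it has gone up.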
Answer the following questions to help Savneet do her analysis.
NOTE: There are two samples described above: a sample from 2016 and a recent sample.
Question
Savneet has a sample provided by Noor containing 30 car rentals from each month in 2016, i.e., this sample describes the car rentals of 2016. We can do Exploratory Data Analysis (calculate the mean, standard deviation, variance, etc., and plot a histogram) to understand the distribution of mileage in this 2016 sample.
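A minimal sketch of this exploratory step in Python, assuming the 2016 mileages (in km) have already been read from Noor's Excel file into a plain list. The function names `summarize_mileage` and `text_histogram` and the 50 km bin width are illustrative choices, not part of the original analysis. It uses only the standard-library `statistics` module and draws the histogram as text, so no plotting library is needed.

```python
import statistics


def summarize_mileage(mileages):
    """Return basic summary statistics for a list of rental mileages (km)."""
    return {
        "n": len(mileages),
        "mean": statistics.mean(mileages),
        # Sample standard deviation and variance (divide by n - 1).
        "sd": statistics.stdev(mileages),
        "variance": statistics.variance(mileages),
        "min": min(mileages),
        "max": max(mileages),
    }


def text_histogram(mileages, bin_width=50):
    """Count mileages in bins of width bin_width km and draw each bin as a row of '*'."""
    counts = {}
    for km in mileages:
        start = int(km // bin_width) * bin_width  # lower edge of this value's bin
        counts[start] = counts.get(start, 0) + 1
    rows = []
    for start in sorted(counts):
        rows.append(f"{start:>5}-{start + bin_width:<5} {'*' * counts[start]}")
    return "\n".join(rows)
```

For example, `summarize_mileage(mileages_2016)` gives the summary statistics and `print(text_histogram(mileages_2016))` shows the shape of the distribution, where `mileages_2016` is a hypothetical list name. One way to build that list is `pandas.read_excel` on Noor's file followed by `.tolist()` on the mileage column; the file path and column name depend on the file and are not given here.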
Savneet also has summary statistics for a recent sample of 20 car rentals: mean mileage = 184.1 km and standard deviation of mileage = 75.2 km.
Now, to check whether average mileage has increased, Savneet assumes that the average mileage of car rentals has not changed, i.e., that the recent sample of 20 car rentals comes from a population with the same average mileage as the 2016 car rentals.
Hypotheses (where $\mu$ is the current average mileage of all car rentals and $\mu_{2016}$ is the average mileage of all car rentals in 2016):
H0: $\mu = \mu_{2016}$ (the average mileage of car rentals has not changed)
H1: $\mu > \mu_{2016}$ (the average mileage of car rentals has increased)
To test this hypothesis, we use the 2016 sample as the parent sample: we draw many samples from it, calculate the mean of each, and then describe these sample means by their mean, standard deviation, and histogram.
Procedure:
1. First, we draw 10 samples, each of 20 rentals, from the 2016 sample with replacement. Each sample has 20 rentals so that its mean can be compared directly with the mean of Savneet's recent sample of 20 rentals, and sampling with replacement lets the samples differ from one another, mimicking repeated random samples from the 2016 population.
2. We calculate the means of these 10 samples: $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{10}$.
3. These 10 sample means form a new data set, and we describe its distribution by calculating the mean and standard deviation of the 10 sample means and plotting their histogram.
This is called the sampling distribution of sample means.
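The resampling steps can be sketched with Python's standard library as follows. This is an illustration under stated assumptions: `parent` is the list of 2016 mileages, `sampling_distribution` is a hypothetical helper name, `sample_size` defaults to 20 to match the recent sample of 20 rentals, `num_samples` defaults to 10 as in the procedure, and `random.Random.choices` provides the sampling with replacement.

```python
import random
import statistics


def sampling_distribution(parent, sample_size=20, num_samples=10, seed=None):
    """Draw num_samples samples of size sample_size from parent WITH replacement
    and return the list of their means (a simulated sampling distribution)."""
    rng = random.Random(seed)  # separate generator; a fixed seed makes draws reproducible
    means = []
    for _ in range(num_samples):
        sample = rng.choices(parent, k=sample_size)  # choices() samples with replacement
        means.append(statistics.mean(sample))
    return means
```

A call such as `sampling_distribution(parent, seed=2016)` returns the 10 sample means. Passing a larger `num_samples` (for example 1000) gives a smoother sampling distribution, and fixing `seed` makes the result reproducible.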
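Now, Savneet can check that the mean of the 10 sample means is close to the mean of the 2016 sample, $\frac{1}{10}\sum_{i=1}^{10} \bar{x}_i \approx \bar{x}_{2016}$, and calculate the standard deviation of the 10 sample means (the standard error), which should be close to $s_{2016}/\sqrt{n}$, where $s_{2016}$ is the standard deviation of the 2016 sample and $n$ is the number of rentals in each sample.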
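The theoretical check can be sketched as below, using a hypothetical helper that assumes `parent` holds the 2016 mileages and `means` holds the simulated sample means. Because the samples are drawn with replacement from the 2016 sample, the sketch treats that sample as the population and uses `statistics.pstdev`, so the predicted standard error is pstdev(parent)/√n; for a large parent sample this is practically the same as using the sample standard deviation.

```python
import math
import statistics


def check_sampling_distribution(parent, means, sample_size=20):
    """Compare a simulated sampling distribution with what theory predicts.

    Theory: the mean of the sample means is close to the parent mean, and their
    standard deviation (the standard error) is close to pstdev(parent) / sqrt(n).
    """
    return {
        "parent_mean": statistics.mean(parent),
        "mean_of_means": statistics.mean(means),
        # Resampling treats the parent as the whole population, hence pstdev.
        "predicted_se": statistics.pstdev(parent) / math.sqrt(sample_size),
        "simulated_se": statistics.stdev(means),  # needs at least two means
    }
```

With only 10 sample means, expect `simulated_se` to differ noticeably from `predicted_se`; the agreement improves as more samples are drawn.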
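Then, Savneet will find where the recent sample mean of 184.1 km falls in this sampling distribution. If only a small proportion of the sample means are 184.1 km or more (equivalently, if 184.1 km lies well above the mean of the sampling distribution, for example more than about 1.645 standard errors above it, the one-sided 5% cutoff under a normal approximation), a mean that large is unlikely when average mileage has not changed, which is evidence that average mileage has increased. The recent sample's standard deviation of 75.2 km describes individual rentals, not sample means, so it is not compared directly with the standard error.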
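Locating 184.1 km in the sampling distribution can be sketched as below. `locate_observed_mean` is a hypothetical helper that assumes `means` holds the simulated sample means (at least two of them). It returns the proportion of sample means that are 184.1 km or more, an estimate of the probability in Savneet's plan, and the number of standard errors 184.1 km lies above the centre of the sampling distribution.

```python
import statistics


def locate_observed_mean(means, observed=184.1):
    """Return (proportion of simulated means >= observed, z-score of observed).

    The proportion estimates P(sample mean >= observed) if average mileage has
    not changed; the z-score counts standard errors above the centre."""
    proportion = sum(1 for m in means if m >= observed) / len(means)
    z = (observed - statistics.mean(means)) / statistics.stdev(means)
    return proportion, z
```

With 10 sample means the estimated proportion can only be a multiple of 0.1, which is another reason to draw many more samples. Comparing the proportion with a significance level such as 0.05 (a common convention, not something fixed by the original problem) turns it into a decision.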
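This procedure uses only the 2016 data to show how much the mean of a sample of rentals varies by chance alone, so it tells Savneet whether 184.1 km is unusually large if average mileage has not changed. Drawing more samples (for example, 1,000 instead of 10) makes the sampling distribution smoother and the estimated probability more reliable.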