Q2d13A
After successfully “pitching” his project to management, Angus MacDonald has started to collect the data needed to build his model. Angus's first order of business is to collect data on shipping times for steel shipments between Nanticoke, ON and Halifax, NS. To estimate this time, Angus has obtained transit times from two shipping firms (A & B) that currently ship oil between refineries in Nanticoke and Dartmouth, NS (Dartmouth is located across the harbor from Halifax). Each firm reported its shipping time in days for its last 10 deliveries:
A | B |
10 | 8 |
8 | 9 |
10 | 7 |
2 | 18 |
11 | 8 |
8 | 12 |
6 | 4 |
11 | 5 |
11 | 6 |
9 | 11 |
a. Assuming transit times between the two ports to be normally distributed, determine if the 4th data entry for Firm A (2 days) can be considered to be an anomaly.
b. Considering only shipping Firm B, how many data samples will Angus need to collect to ensure that he can obtain an estimate for transit time that is within ±1 day 19 times out of 20?
c. Based on the results from this pilot study, which of the two firms should be selected to deliver steel to Halifax? State any assumptions that you make and justify your results.
d. Assume Angus collects 100 data points from Firm B only and obtains the following histogram:
Shipping Time | Count |
0 – 4.999 | 10 |
5 – 9.999 | 40 |
10 – 14.999 | 30 |
15 – 19.999 | 8 |
20+ | 12 |
Angus hypothesizes that the data comes from a Normal (8, 4) distribution. Apply an appropriate statistical test to determine if Angus's assumption is correct.
e. Comment on Angus's assumption that the transit time data is normally distributed. Strictly considering what the data represents (transit times), why might we be surprised if the data is normally distributed? Again, based only on what the data represents, what other input distributions should be considered for this data? Finally, if the data was shown not to be normally distributed, why might a t-test be inappropriate for comparing the transit times for Firm A and Firm B? If a t-test is not appropriate, what statistical test (or tests) could or should we apply? Please note, there are no calculations required to complete this question.
NOTE: This is an Industrial Engineering question in statistics...
The variables A and B are the delivery times (in days) for shipping oil between the refineries in Nanticoke and Dartmouth.
The data are continuous and, as the question states, we assume they are normally distributed.
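a) The 4th entry for Firm A (2 days) appears to be an anomaly,
i.e. an outlier or extreme value. Under the normality assumption, compare it with the sample mean and standard deviation:
Mean = mean(A)
= 8.6 ~ 9
SD = sd(A) = 2.84, so the entry lies (2 - 8.6) / 2.84 = -2.33 standard deviations from the mean, outside the ±1.96 range that contains 95% of a normal distribution.
One simple treatment is to replace the outlier with the rounded mean, so the 4th entry for Firm A would be 9.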
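The outlier check in part a) can be sketched in R as follows. This is a minimal illustration, assuming the sample mean and standard deviation stand in for the true normal parameters; the variable names (`z_all`, `z_rest`, `p_all`) are introduced here for illustration only.

```r
# Transit times (days) for Firm A, copied from the question's table
A <- c(10, 8, 10, 2, 11, 8, 6, 11, 11, 9)

# z-score of the 4th entry using all 10 observations
z_all <- (A[4] - mean(A)) / sd(A)

# z-score of the 4th entry measured against the other 9 observations only
rest <- A[-4]
z_rest <- (A[4] - mean(rest)) / sd(rest)

# Two-sided probability of a value at least this extreme under normality
p_all <- 2 * pnorm(-abs(z_all))

cat("z (all data):", round(z_all, 3), "\n")
cat("z (excluding entry 4):", round(z_rest, 3), "\n")
cat("two-sided tail probability:", round(p_all, 4), "\n")
```

Leaving the suspect point out of the mean and standard deviation makes it look even more extreme, because the point itself inflates the spread when it is included.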
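b) The pilot study took 10 samples from Firm B. The required sample size follows from n = (z * s / E)^2, with z = 1.96 (19 times out of 20), E = 1 day and s = sd(B) = 4.08, which gives n of about 64 samples.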
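For part b), the usual sample-size formula for estimating a mean, n = (z * s / E)^2, can be sketched in R. This assumes the pilot standard deviation of Firm B is a reasonable estimate of the true one and uses the normal quantile rather than a t quantile; `E` and `n_required` are names chosen here for illustration.

```r
# Transit times (days) for Firm B, copied from the question's table
B <- c(8, 9, 7, 18, 8, 12, 4, 5, 6, 11)

s <- sd(B)          # pilot estimate of the standard deviation
E <- 1              # desired margin of error: within +/- 1 day
z <- qnorm(0.975)   # "19 times out of 20" = 95% confidence, two-sided

# Smallest whole number of samples that meets the margin of error
n_required <- ceiling((z * s / E)^2)

cat("pilot sd:", round(s, 3), "\n")
cat("required sample size:", n_required, "\n")
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.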
c) To decide which firm is better for transport, we use a t-test.
The data are continuous and we are comparing the average performance of the two firms, so a two-sample t-test is appropriate.
Test of Hypothesis:
H0: the average shipping time of Firm A is less than or equal to that of Firm B
against,
H1: the average shipping time of Firm A is greater than that of Firm B
t-test by using R:
> t.test(A,B,alternative = "greater",var.equal = FALSE,conf.level = 0.95)
data: A and B
t = -0.12734, df = 16.058, p-value = 0.5499
alternative hypothesis: true difference in means is greater than 0
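# Decision Rule: If the p-value is greater than the 0.05 level of
significance, we fail to reject the null hypothesis.
# From the above results, p-value (0.5499) > 0.05
# so we fail to reject the null hypothesis here,
i.e. there is no evidence that the average shipping time of Firm A is greater than that of Firm B.
Firm A also has the lower sample mean (8.6 vs 8.8 days) and the smaller spread (sd 2.84 vs 4.08 days), so Firm A is preferred, assuming independent, normally distributed transit times.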
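The t-test output in part c) can be reproduced from the table in the question. The sketch below defines the two samples and runs the same Welch test; `res` is simply a name chosen here for the returned test object.

```r
# Transit times (days) copied from the question's table
A <- c(10, 8, 10, 2, 11, 8, 6, 11, 11, 9)
B <- c(8, 9, 7, 18, 8, 12, 4, 5, 6, 11)

# Welch two-sample t-test (unequal variances), H1: mean(A) > mean(B)
res <- t.test(A, B, alternative = "greater", var.equal = FALSE,
              conf.level = 0.95)

# Summary statistics that support the comparison
cat("mean A:", mean(A), " sd A:", round(sd(A), 2), "\n")
cat("mean B:", mean(B), " sd B:", round(sd(B), 2), "\n")
cat("t =", round(unname(res$statistic), 5),
    " df =", round(unname(res$parameter), 3),
    " p-value =", round(res$p.value, 4), "\n")
```

A large p-value here only means the data do not show Firm A to be slower; it does not by itself prove that Firm A is faster.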
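d) The frequency distribution table for this sample is shown below, where fi is the class count and mi is the class midpoint (the open-ended 20+ class is represented by 20):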
Xi | fi | mi | fi*mi |
0 – 4.999 | 10 | 2.4995 | 24.995 |
5 – 9.999 | 40 | 7.4995 | 299.98 |
10 – 14.999 | 30 | 12.4995 | 374.985 |
15 – 19.999 | 8 | 17.4995 | 139.996 |
20+ | 12 | 20 | 240 |
Total | 100 | 59.998 | 1079.956 |
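The mean of the grouped data is,
Mean = sum(fi * mi) / sum(fi)
= 1079.956 / 100
= 10.79956
The hypothesized distribution is N(8,4),
i.e. its mean of 8 does not match the sample mean of 10.79956,
which suggests the sample is not from N(8,4). Comparing means is only an informal check; the chi-square goodness-of-fit test is the appropriate formal test for binned counts like these.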
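A chi-square goodness-of-fit test for part d) can be sketched in R as below. Several choices here are assumptions rather than part of the original answer: the 4 in N(8, 4) is read as the standard deviation, the first class is treated as everything below 5, and the three upper classes are merged so that every expected count is at least 5. Reading the 4 as a variance (sd = 2) puts even less probability in the upper classes, so the test also rejects under that reading.

```r
# Observed counts from the histogram (classes: 0-5, 5-10, 10-15, 15-20, 20+)
observed <- c(10, 40, 30, 8, 12)

# Grouped mean, using the same midpoints as the frequency table
mids <- c(2.4995, 7.4995, 12.4995, 17.4995, 20)
grouped_mean <- sum(observed * mids) / sum(observed)

mu <- 8
sigma <- 4  # ASSUMPTION: the 4 in N(8, 4) is taken as the standard deviation

# Merge the sparse upper classes: below 5, 5 to 10, and 10 or more
breaks <- c(-Inf, 5, 10, Inf)
probs <- diff(pnorm(breaks, mean = mu, sd = sigma))
obs_merged <- c(observed[1], observed[2], sum(observed[3:5]))

# Goodness-of-fit test of the merged counts against N(mu, sigma)
gof <- chisq.test(x = obs_merged, p = probs)

cat("grouped mean:", grouped_mean, "\n")
cat("expected counts:", round(gof$expected, 2), "\n")
cat("X-squared =", round(unname(gof$statistic), 2),
    " df =", unname(gof$parameter),
    " p-value =", signif(gof$p.value, 3), "\n")
```

Because both parameters are specified by the hypothesis rather than estimated from the data, the degrees of freedom are the number of classes minus one.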
e) To check whether each variable follows a normal distribution, use the Shapiro-Wilk test,
H0: The sample distribution follows the normal distribution.
against,
H1: The sample distribution does not follow the normal distribution.
Shapiro Test using R:
> shapiro.test(A)
Shapiro-Wilk normality test
data: A
W = 0.90161, p-value = 0.2282
> shapiro.test(B)
Shapiro-Wilk normality test
data: B
W = 0.90757, p-value = 0.2647
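# Decision Rule: If the p-value is greater than the 0.05 level of
significance, we fail to reject the null hypothesis.
Both p-values > 0.05
i.e. we fail to reject the null hypothesis.
From the above decision rule, neither sample shows evidence against normality, although with only 10 observations per firm the test has little power to detect a departure.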
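To compare the average transit times of the two firms, use a t-test, which is reasonable here because neither sample rejects normality. Because transit times cannot be negative and delays create a long right tail, right-skewed distributions such as the lognormal, gamma or Weibull are also worth considering. If normality were rejected, the t-test's assumption would not hold for samples this small, and a non-parametric test such as the Wilcoxon rank-sum (Mann-Whitney) test could be used instead.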
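Part e) asks which test could replace the t-test if normality fails. One common choice is the Wilcoxon rank-sum (Mann-Whitney) test, sketched below on the pilot data. The question requires no calculation here, so this is only an illustration; `exact = FALSE` is set because the samples contain ties, and `wres` is a name chosen here.

```r
# Transit times (days) copied from the question's table
A <- c(10, 8, 10, 2, 11, 8, 6, 11, 11, 9)
B <- c(8, 9, 7, 18, 8, 12, 4, 5, 6, 11)

# Rank-based comparison of the two firms; it makes no normality assumption.
# exact = FALSE uses the normal approximation, since ties rule out
# the exact p-value.
wres <- wilcox.test(A, B, alternative = "two.sided", exact = FALSE)

cat("W =", unname(wres$statistic), " p-value =", round(wres$p.value, 4), "\n")
```

The test compares the ranks of the pooled observations, so a single extreme value such as the 18-day delivery for Firm B has far less influence than it does on a comparison of means.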