Consider the following sample:
120, 94, 88, 67, 82, 106, 140, 102, 87, 99, 106, 86, 105, 93
a) Calculate the sample mean and the sample standard deviation.
b) Calculate the sample range. What does it mean?
c) What is the mode of the data distribution?
d) Construct a box plot and interpret the result.
e) Identify the 45th percentile and interpret the result.
f) If the two largest values are removed from the data above, what is the impact on the results obtained in (a)?
> # storing the sample data in variable x
> x <- c(120,94,88,67,82,106,140,102,87,99,106,86,105,93)
> # a) calculating the sample mean and sample standard deviation of the data
> mean(x)
[1] 98.21429
> sd(x)
[1] 17.68171
> # b) calculating the sample range
> range(x)
[1] 67 140
> max(x)-min(x)
[1] 73
# The sample range is the difference between the maximum value (140) and the minimum value (67),
# so the observations span 73 units.
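The results for (a) and (b) can be cross-checked against the textbook definitions. The following is a minimal sketch, assuming the usual sample standard deviation with n - 1 in the denominator (which is what R's sd() uses); the names n, xbar, s and r are illustrative and not part of the original answer.

```r
# Sample data from the question
x <- c(120, 94, 88, 67, 82, 106, 140, 102, 87, 99, 106, 86, 105, 93)

n    <- length(x)                         # sample size
xbar <- sum(x) / n                        # sample mean from its definition
s    <- sqrt(sum((x - xbar)^2) / (n - 1)) # sample sd, n - 1 in the denominator
r    <- max(x) - min(x)                   # sample range

# Each comparison should return TRUE if the hand calculation
# agrees with the built-in function
all.equal(xbar, mean(x))
all.equal(s, sd(x))
all.equal(r, diff(range(x)))
```

Computing the statistics both ways is a cheap guard against calling a function on the wrong vector.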
# c) calculating the mode of the distribution
> # Create the function.
> getmode <- function(v) {
+   uniqv <- unique(v)
+   uniqv[which.max(tabulate(match(v, uniqv)))]
+ }
> print(getmode(x))
[1] 106
# Hence, the mode is 106, the only value that occurs more than once.
> # d) constructing the boxplot
> boxplot(x,main = "boxplot of x")
# From the boxplot we can say that the data are roughly symmetrically distributed
# about the median, with one outlier on the upper side whose value is 140.
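The reading of the box plot can be checked numerically. The sketch below assumes the default boxplot() convention in R: the box is drawn from the hinges returned by fivenum(), and points more than 1.5 times the hinge spread beyond a hinge are flagged as outliers. The names five, spread, lower_fence and upper_fence are illustrative.

```r
# Sample data from the question
x <- c(120, 94, 88, 67, 82, 106, 140, 102, 87, 99, 106, 86, 105, 93)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
five

# Hinge spread and the 1.5 * spread fences used by the default box plot
spread      <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * spread
upper_fence <- five[4] + 1.5 * spread

# Observations outside the fences are drawn as individual points
x[x < lower_fence | x > upper_fence]

# The same outliers as reported by the helper that boxplot() relies on
boxplot.stats(x)$out
```

Here the median (96.5) sits midway between the hinges (87 and 106), which supports the "roughly symmetric" description of the box, and 140 is the only value beyond the upper fence of 134.5.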
> # e) computing the 45th percentile
> quantile(x,probs = 0.45)
  45%
93.85
# The 45th percentile is 93.85; this means that about 45% of the observations lie below 93.85.
> # f) removing the two largest values from x
> # removing the largest value
> x1 <- x[-which.max(x)]
> # removing the second-largest value
> x2 <- x1[-which.max(x1)]
> print(x2)
[1] 94 88 67 82 106 102 87 99 106 86 105 93
> mean(x2)
[1] 92.91667
> sd(x2)
[1] 11.70438
# After removing the two largest values (140 and 120), the mean of the new data
# decreases from 98.21 to 92.92, and the standard deviation also decreases,
# from 17.68 to 11.70, because the most extreme upper values no longer contribute to the spread.
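As a cross-check on part (f), the sketch below recomputes both statistics on the reduced sample. It drops the two largest values by sorting, which is an alternative to calling which.max() twice and not the approach used in the transcript; the name x2 follows the transcript, and the comparison vectors are illustrative. Note that sd() has to be called on the reduced vector x2, not on x.

```r
# Sample data from the question
x <- c(120, 94, 88, 67, 82, 106, 140, 102, 87, 99, 106, 86, 105, 93)

# Drop the two largest observations: sort ascending, then remove the last two
x2 <- head(sort(x), -2)

# Compare the mean and standard deviation before and after the removal
c(mean_before = mean(x), mean_after = mean(x2))
c(sd_before = sd(x), sd_after = sd(x2))

# Both changes are negative: removing the two largest values
# lowers the mean and also shrinks the spread
mean(x2) - mean(x)
sd(x2) - sd(x)
```

Sorting changes the order of the remaining values, which does not matter for the mean or the standard deviation.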