Question

In: Statistics and Probability

a The following six sales figures (X) were randomly sampled:
$28, $34, $40, $44, $52, $54, with ∑X = $252 and ∑X² = 11,096

i Determine the median and mean. What do these suggest about the distribution of the sales data?

ii What is the standard deviation of the sales data?

iii If a frequency distribution for the sales was constructed, what would be the midpoint, frequency and relative frequency for the class “Sales $30 to under $40”?

iv Estimate the 1st quartile for the sales data.

b

You have obtained a data set for 2015 that contains all the workplace accidents that occurred at a state government department. The data includes: the gender of the worker involved, their education level (high school only, college, university), the cause of the accident (stairs, staplers, other), the cost to the department in dollars, the day of the week, and the number of stories in the building in which the accident occurred.

i Identify which variables are categorical and state whether they are nominal or ordinal.

ii Identify which variables are numerical and state whether they are discrete or continuous.

iii If you wished to use the data to predict the average cost to the department per accident for all state government departments in 2015, can the data be used for such a purpose? Why or why not? If so, how? If not, what additional data, or variables, would allow this?

iv If you wished to use the data to predict the chance of an accident occurring for Mondays in 2015 at the state government department, can the data be used for such a purpose? Why or why not? If so, how? If not, what additional data or variables would allow this?

Expert Solution

a) GIVEN:

The following six sales figures (X) were randomly sampled:
$28, $34, $40, $44, $52, $54, with ∑X = $252 and ∑X² = 11,096

Number of data points in sample (n) = 6

(i) MEAN:

The arithmetic mean is the sum of all of the data points divided by the number of data points:

Mean = ∑X / n = $252 / 6 = $42

MEDIAN:

The median is the middle value of an ordered data set: half of the data points are smaller than the median and half of the data points are larger.

To find the median:

Arrange the data points from smallest to largest.

$28, $34, $40, $44, $52, $54

If the number of data points is odd, the median is the middle data point in the list.
If the number of data points is even, the median is the average of the two middle data points in the list.

In our problem, the number of data points is 6, which is even, so the median is the average of the two middle data points, $40 and $44:

Median = ($40 + $44) / 2 = $42

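Since the mean and median are equal ($42), the sales data appear to be roughly symmetric, with no strong skew in either direction. Equal mean and median are consistent with symmetry but do not prove it, so with only six observations this is a tentative conclusion.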
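The mean, median and direction of skew can be checked with Python's standard-library statistics module. This is a minimal sketch; the variable names are illustrative, and the sum of cubed deviations is used only as a rough indicator of which way the data lean, not as a formal skewness test.

```python
from statistics import mean, median

# Sales figures from part (a), in dollars
sales = [28, 34, 40, 44, 52, 54]

x_bar = mean(sales)   # sum of X divided by n
med = median(sales)   # n is even, so this averages the two middle values

# Sum of cubed deviations from the mean: negative means a lean to the left,
# positive a lean to the right, zero is what perfect symmetry would give
cubed_dev_sum = sum((x - x_bar) ** 3 for x in sales)

print(x_bar, med, cubed_dev_sum)
```

For these six values the cubed deviations sum to -528, a small negative number, which suggests the distribution is close to symmetric with at most a slight left skew.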

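(ii) STANDARD DEVIATION:

Because the sales figures are a random sample, we use the sample standard deviation:

s = √[ ∑(X − X̄)² / (n − 1) ]

where the sample mean is X̄ = ∑X / n = 252 / 6 = $42.

X	X − X̄	(X − X̄)²
28	-14	196
34	-8	64
40	-2	4
44	2	4
52	10	100
54	12	144
Total	0	512

The standard deviation is:

s = √(512 / 5) = √102.4 ≈ $10.12

As a check, the given totals lead to the same sum of squares: ∑(X − X̄)² = ∑X² − (∑X)² / n = 11,096 − 252² / 6 = 11,096 − 10,584 = 512.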
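The calculation can be reproduced in Python both ways, from the deviations and from the given totals ∑X and ∑X², and compared against the standard library's statistics.stdev. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import math
from statistics import stdev

sales = [28, 34, 40, 44, 52, 54]
n = len(sales)
x_bar = sum(sales) / n

# Definitional form: sum of squared deviations divided by (n - 1)
ss_dev = sum((x - x_bar) ** 2 for x in sales)
s_def = math.sqrt(ss_dev / (n - 1))

# Shortcut form using the given totals: sum X = 252, sum X^2 = 11,096
sum_x, sum_x2 = 252, 11_096
s_short = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# statistics.stdev also divides by n - 1 (sample standard deviation)
s_lib = stdev(sales)

print(round(s_def, 2), round(s_short, 2), round(s_lib, 2))
```

Replacing stdev with statistics.pstdev divides by n instead of n − 1 and gives the population standard deviation, about $9.24, which would only be appropriate if these six figures were the entire population.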

(iii) FREQUENCY DISTRIBUTION OF SALES DATA:

CLASS	FREQUENCY (f)	RELATIVE FREQUENCY
$20 to under $30	1	1/6 = 0.167
$30 to under $40	1	1/6 = 0.167
$40 to under $50	2	2/6 = 0.333
$50 to under $60	2	2/6 = 0.333
Total	6	1.000

Thus the frequency for class “Sales $30 to under $40” is 1 (only $34 falls in it; $40 belongs to the next class) and its relative frequency is 1/6 ≈ 0.167.

The class midpoint is the average of the lower and upper class boundaries.

For class “Sales $30 to under $40”, the lower boundary is $30 and the upper boundary is $40.

Thus the midpoint for class “Sales $30 to under $40” is ($30 + $40) / 2 = $35.
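The frequency table and the class midpoint can be reproduced with a short Python sketch. It assumes half-open classes [lower, upper), which is how “$30 to under $40” reads, so a value of exactly $40 falls in the next class.

```python
sales = [28, 34, 40, 44, 52, 54]

# Half-open classes [lower, upper): "$30 to under $40" is (30, 40)
classes = [(20, 30), (30, 40), (40, 50), (50, 60)]
n = len(sales)

table = []
for lower, upper in classes:
    f = sum(1 for x in sales if lower <= x < upper)  # frequency
    rel = f / n                                      # relative frequency
    midpoint = (lower + upper) / 2                   # average of class boundaries
    table.append((lower, upper, f, rel, midpoint))

for lower, upper, f, rel, mid in table:
    print(f"${lower} to under ${upper}: f={f}, rel={rel:.3f}, midpoint={mid}")
```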

(iv) FIRST QUARTILE:

The first quartile, denoted by Q₁, is the median of the lower half of the data set. This means that about 25% of the numbers in the data set lie below Q₁ and about 75% lie above Q₁.

The median of the sales data is $42, so the lower half of the data (the values below the median) is $28, $34, $40. The first quartile is the median of $28, $34, $40.

Since the lower half contains 3 data points, which is odd, its median is the middle data point. Thus the first quartile is Q₁ = $34.
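The lower-half median method can be sketched in Python as follows. For comparison it also calls statistics.quantiles, whose default method interpolates between data points; this is a different quartile convention, included only to show that software may report a different Q₁ for the same small sample.

```python
from statistics import median, quantiles

sales = sorted([28, 34, 40, 44, 52, 54])
n = len(sales)

# Method used above: Q1 is the median of the lower half of the ordered data.
# n is even here, so the lower half is simply the first n // 2 values.
lower_half = sales[: n // 2]
q1_lower_half = median(lower_half)

# statistics.quantiles (default method="exclusive") interpolates instead
q1_interp = quantiles(sales, n=4)[0]

print(q1_lower_half, q1_interp)
```

Here the lower-half method gives $34, while statistics.quantiles with its default method gives $32.50; both are recognised conventions, so it is worth stating which one is used.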

b) GIVEN:

The data includes: the gender of worker involved, their education level (high school only, college, university), the cause of the accident (stairs, staplers, other), the cost to the department in dollars, the day of the week, and number of stories in the building in which the accident occurred.

(i) CATEGORICAL VARIABLES:

A categorical (qualitative) variable is one whose values are two or more categories rather than numerical measurements. Categorical variables can be further classified as nominal or ordinal; a categorical variable with exactly two categories is also called dichotomous.

NOMINAL:

Nominal variables are variables that have two or more categories, but which do not have an intrinsic order.

ORDINAL:

Ordinal variables have two or more categories, just like nominal variables, except that the categories can also be ordered or ranked.

The gender of the worker involved is a nominal categorical variable, since its categories (male, female) cannot be ordered.
The cause of the accident is a nominal categorical variable, since its three categories (stairs, staplers, other) cannot be ordered.
The day of the week is treated here as a nominal categorical variable with seven categories (Monday to Sunday). Because the days follow a fixed sequence, some texts classify it as ordinal instead.
The education level of the worker (high school only, college, university) is an ordinal categorical variable, because its three categories can be ranked: high school only, then college, then university.
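One way to see the nominal/ordinal distinction in code is Python's enum module: a plain Enum has no ordering, while an IntEnum can be ranked. The class and member names below are hypothetical, chosen only to mirror the variables in this data set.

```python
from enum import Enum, IntEnum

# Nominal: categories with no intrinsic order (a plain Enum does not support < or >)
class AccidentCause(Enum):
    STAIRS = "stairs"
    STAPLERS = "staplers"
    OTHER = "other"

# Ordinal: categories with a natural ranking (IntEnum members can be compared)
class EducationLevel(IntEnum):
    HIGH_SCHOOL_ONLY = 1
    COLLEGE = 2
    UNIVERSITY = 3

print(EducationLevel.HIGH_SCHOOL_ONLY < EducationLevel.UNIVERSITY)  # True

try:
    AccidentCause.STAIRS < AccidentCause.OTHER
except TypeError:
    print("nominal categories cannot be ordered")
```

The integer codes only encode rank; averaging them (for example, reporting a "mean education level" of 2.3) would treat an ordinal variable as if it were numerical.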

(ii) NUMERICAL VARIABLES:

The values of a numerical variable are numbers. They can be further classified into discrete and continuous variables.

DISCRETE:

A variable whose values are whole numbers (counts) is called discrete. For example, the number of items bought by a customer in a supermarket is discrete.

CONTINUOUS:

A variable that may contain any value within some range is called continuous. For example, the time that the customer spends in the supermarket is continuous.

The cost to the department in dollars is a continuous numerical variable, because it can take any value within a range (in practice it is recorded to the cent, but it is treated as continuous).
The number of stories in the building in which the accident occurred is a discrete numerical variable, because its values are whole numbers (counts).

(iii) If you wished to use the data to predict the average cost to the department per accident for all state government departments in 2015, can the data be used for such a purpose?

No, not reliably. The data set contains every accident at a single state government department, so it is a census of that one department rather than a sample of all state government departments. It gives the exact average cost per accident for that department in 2015 (total cost ÷ number of accidents), but there is no basis for assuming that one department is representative of the others, which may differ in size, type of work and buildings. To estimate the average cost per accident for all state government departments in 2015, we would need accident cost data from all departments, or at least from a random sample of departments, so that the sample mean cost per accident can be generalized to the population of departments.

(iv) If you wished to use the data to predict the chance of an accident occurring for Mondays in 2015 at the state government department, can the data be used for such a purpose?

Only partly. Because the data set contains all of the department's 2015 accidents and records the day of the week, we can find how the accidents are distributed across the days, for example the proportion of accidents that occurred on a Monday (Monday accidents ÷ total accidents). However, that proportion is the chance that an accident, given that one occurred, happened on a Monday; it is not the chance that an accident occurs on a Monday. The data set only records days on which accidents happened, so it says nothing about Mondays on which no accident occurred. Dividing the number of Monday accidents by the 52 Mondays in 2015 gives the average number of accidents per Monday, but not a probability, because several accidents may fall on the same Monday. To estimate the chance of an accident occurring on a Monday, we would also need the date of each accident (to count how many Mondays had at least one accident) or, for a per-worker risk, the number of staff working on each Monday.
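If the data set also recorded the calendar date of each accident (an additional variable the described data does not include), the share of 2015's Mondays with at least one accident could be estimated as in this sketch. The function names and the example dates are hypothetical, for illustration only.

```python
from datetime import date, timedelta


def count_weekday(year: int, weekday: int) -> int:
    """Count how many times a weekday (Monday = 0) occurs in a calendar year."""
    day = date(year, 1, 1)
    count = 0
    while day.year == year:
        if day.weekday() == weekday:
            count += 1
        day += timedelta(days=1)
    return count


def share_of_mondays_with_accident(accident_dates: list[date], year: int) -> float:
    """Fraction of the year's Mondays on which at least one accident occurred.

    Uses the accident date, not just the day of the week, so that several
    accidents on the same Monday are counted only once.
    """
    mondays_hit = {d for d in accident_dates if d.year == year and d.weekday() == 0}
    return len(mondays_hit) / count_weekday(year, 0)


# Hypothetical dates for illustration only (not from the source data set)
example = [date(2015, 1, 5), date(2015, 1, 5), date(2015, 1, 12), date(2015, 1, 6)]
print(count_weekday(2015, 0), share_of_mondays_with_accident(example, 2015))
```

In this toy input two accidents share the same Monday and one falls on a Tuesday, so only two distinct Mondays are counted out of 52. A per-worker risk would additionally need staffing numbers, which this sketch does not model.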

orchestra answered 2 years ago
