You have just started at a company and they present you with historical data for the time in minutes that it takes to prepare a solar panel. The company has 3 locations and they give you the following sample data:
Site 1 | Site 3
107.55 | 115.29
106.2 | 113.94
102.96 | 111.15
106.74 | 112.95
114.21 | 114.57
109.8 | 106.47
104.49 | 113.04
122.85 | 115.38
107.55 | 113.49
112.23 | 108.54
109.53 | 110.97
99.18 | 123.66
107.1 | 110.43
111.33 | 114.3
108.45 | 110.97
110.25 | 110.61
112.41 | 102.87
109.71 | 109.17
108.63 | 114.39
110.16 | 114.84
106.74 | 114.93
108.45 | 117.54
100.44 | 107.37
108.54 | 109.98
104.58 | 111.42
111.15 | 109.53
113.76 | 102.24
106.65 | 111.15
104.4 | 118.71
109.89 | 111.24
111.15 | 117.36
104.04 | 114.3
112.32 | 110.79
107.37 | 104.31
101.79 | 109.71
105.12 | 111.51
110.7 | 109.53
104.22 | 118.44
107.37 | 112.32
107.28 | 111.51
They are interested in characterizing the data at these three locations. The QC manager stated that their goal is 120 min or less because then they could (ideally) make 4 per 8 hour day.
Only the data for site 1 and site 3 were provided, so the following analysis is based on those two samples (40 observations each).
To check the normality assumption we use histograms, Q-Q plots and boxplots of the data in R. With the site 1 values stored in a vector a, the commands are hist(a), qqnorm(a), qqline(a) and boxplot(a). The same is then done for site 3.
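The plotting commands named above can be put together as a short R script. This is a minimal sketch, assuming the site 1 values are typed by hand into the vector a (the name used above); the plot titles, axis labels and three-panel layout are illustrative choices, not part of the original analysis.

```r
# Site 1 preparation times in minutes (40 observations from the sample data)
a <- c(107.55, 106.2, 102.96, 106.74, 114.21, 109.8, 104.49, 122.85,
       107.55, 112.23, 109.53, 99.18, 107.1, 111.33, 108.45, 110.25,
       112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
       104.58, 111.15, 113.76, 106.65, 104.4, 109.89, 111.15, 104.04,
       112.32, 107.37, 101.79, 105.12, 110.7, 104.22, 107.37, 107.28)

# Draw the three diagnostic plots side by side (layout is an assumption)
old_par <- par(mfrow = c(1, 3))
hist(a, main = "Site 1: histogram", xlab = "Time (min)")
qqnorm(a, main = "Site 1: normal Q-Q plot")
qqline(a)  # reference line through the first and third quartiles
bp <- boxplot(a, main = "Site 1: boxplot", ylab = "Time (min)")
par(old_par)  # restore the previous graphics settings
```

Replacing the values in a with the site 3 column gives the corresponding plots for site 3.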
For Site 1,
The histogram of site 1 is roughly bell-shaped, so a normal curve can be superimposed on it. In the Q-Q plot of site 1, most of the points lie on the Q-Q line. The boxplot is also roughly symmetrical, with the median near the centre of the box. So the site 1 data appear to satisfy the normality assumption.
-------------------------------------------------------------------------------
For Site 3,
The histogram of site 3 also looks like a bell curve, which supports the normality assumption. In the Q-Q plot of the site 3 observations, most of the points lie on the Q-Q line. The boxplot is again roughly symmetrical, with the median near the centre of the box. So the site 3 data also appear to satisfy the normality assumption.
(There is no hard and fast rule for verifying normality from a histogram, but the shape of the graph gives a reasonable indication.)
To check the normality assumption further we can apply the Shapiro-Wilk normality test to both sites. The null hypothesis of this test is that the data come from a normal distribution.
Running shapiro.test() on each sample gives
p-value for site 1 = 0.07834 > 0.01
p-value for site 3 = 0.3778 > 0.01
As both p-values are greater than 0.01, we fail to reject the null hypothesis of normality at the 1% level of significance, so the data are consistent with the normality assumption.
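The Shapiro-Wilk step can be reproduced with a few lines of R. This is a sketch only: the vector names site1 and site3 and the variable alpha are assumptions introduced for illustration, and the data are the two columns of the sample table typed in by hand.

```r
# Preparation times in minutes, 40 observations per site
site1 <- c(107.55, 106.2, 102.96, 106.74, 114.21, 109.8, 104.49, 122.85,
           107.55, 112.23, 109.53, 99.18, 107.1, 111.33, 108.45, 110.25,
           112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
           104.58, 111.15, 113.76, 106.65, 104.4, 109.89, 111.15, 104.04,
           112.32, 107.37, 101.79, 105.12, 110.7, 104.22, 107.37, 107.28)
site3 <- c(115.29, 113.94, 111.15, 112.95, 114.57, 106.47, 113.04, 115.38,
           113.49, 108.54, 110.97, 123.66, 110.43, 114.3, 110.97, 110.61,
           102.87, 109.17, 114.39, 114.84, 114.93, 117.54, 107.37, 109.98,
           111.42, 109.53, 102.24, 111.15, 118.71, 111.24, 117.36, 114.3,
           110.79, 104.31, 109.71, 111.51, 109.53, 118.44, 112.32, 111.51)

alpha <- 0.01  # significance level used in the text

# shapiro.test() returns an "htest" object with $statistic (W) and $p.value
sw1 <- shapiro.test(site1)
sw3 <- shapiro.test(site3)

cat("Site 1: W =", round(sw1$statistic, 4), " p-value =", signif(sw1$p.value, 4), "\n")
cat("Site 3: W =", round(sw3$statistic, 4), " p-value =", signif(sw3$p.value, 4), "\n")

# TRUE means we fail to reject normality at level alpha
normal_ok <- c(site1 = sw1$p.value > alpha, site3 = sw3$p.value > alpha)
print(normal_ok)
```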
-----------------------------------------------------------------------------
Mean time (in minutes) to make a solar panel at site 1 = 108.1822
Mean time (in minutes) to make a solar panel at site 3 = 112.023
Sample standard deviation of site 1 = 4.215293
Sample standard deviation of site 3 = 4.21784
(All values are computed in R.)
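These summary values can be recomputed from the sample data with mean() and sd(). The sketch below assumes the two columns have been entered as vectors named site1 and site3, and collects the results in a data frame called summary_stats; all three names are illustrative.

```r
# Preparation times in minutes, 40 observations per site
site1 <- c(107.55, 106.2, 102.96, 106.74, 114.21, 109.8, 104.49, 122.85,
           107.55, 112.23, 109.53, 99.18, 107.1, 111.33, 108.45, 110.25,
           112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
           104.58, 111.15, 113.76, 106.65, 104.4, 109.89, 111.15, 104.04,
           112.32, 107.37, 101.79, 105.12, 110.7, 104.22, 107.37, 107.28)
site3 <- c(115.29, 113.94, 111.15, 112.95, 114.57, 106.47, 113.04, 115.38,
           113.49, 108.54, 110.97, 123.66, 110.43, 114.3, 110.97, 110.61,
           102.87, 109.17, 114.39, 114.84, 114.93, 117.54, 107.37, 109.98,
           111.42, 109.53, 102.24, 111.15, 118.71, 111.24, 117.36, 114.3,
           110.79, 104.31, 109.71, 111.51, 109.53, 118.44, 112.32, 111.51)

# Sample size, mean and sample standard deviation for each site
summary_stats <- data.frame(
  site = c("Site 1", "Site 3"),
  n    = c(length(site1), length(site3)),
  mean = c(mean(site1), mean(site3)),
  sd   = c(sd(site1), sd(site3))  # sd() uses the n - 1 denominator
)
print(summary_stats)
```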
--------------------------------------------------------------------------------
* To test the QC manager's goal that the mean time is less than 120 min, we perform a one-sided Z-test for each site, where the hypotheses are
H0 : The mean time to make a solar panel is 120 min v/s H1 : The mean time is less than 120 min
So, H0 : mean = 120 v/s H1 : mean < 120
We use the test statistic
\[ Z = \frac{\text{mean} - 120}{\text{sd} / \sqrt{n}} \]
where mean is the sample mean, sd is the sample standard deviation and n = 40 is the sample size at each site. Because n is large, the sample standard deviation is used in place of the population value.
We reject H0 at the 1% level of significance if Z < -Zcritical, that is, if -Z > Zcritical,
where Zcritical = 2.33.
For Site 1, -Z = 17.7311
For Site 3, -Z = 11.96133
In both cases -Z > 2.33, hence we reject H0.
So we can say that the mean time required to make a solar panel is less than 120 min at both sites.
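The test statistics can be checked directly from the summary values reported earlier. The following is a minimal sketch, assuming n = 40 observations per site (the number of rows in the sample data); the variable names mu0, xbar, s, z_crit and reject are illustrative.

```r
mu0   <- 120   # hypothesised mean under H0 (QC manager's target, minutes)
alpha <- 0.01  # significance level
n     <- 40    # observations per site (assumed from the sample data)

# Sample means and standard deviations as reported in the text
xbar <- c(site1 = 108.1822, site3 = 112.023)
s    <- c(site1 = 4.215293, site3 = 4.21784)

z      <- (xbar - mu0) / (s / sqrt(n))  # one-sample Z statistic
z_crit <- qnorm(alpha)                  # lower-tail critical value, about -2.33
p_val  <- pnorm(z)                      # lower-tail p-value for H1: mean < 120
reject <- z < z_crit                    # TRUE means H0 is rejected

print(data.frame(z = z, z_crit = z_crit, p_value = p_val, reject_H0 = reject))
```

Both statistics fall far below the critical value, which agrees with the decision to reject H0 for each site.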
---------------------------------------------------------------------------------------------------