You have just started at a company and they present you with historical data for the time in minutes that it takes to prepare a solar panel. The company has 3 locations and they give you the following sample data:
Site 1 | Site 3
107.55 | 115.29
106.2 | 113.94
102.96 | 111.15
106.74 | 112.95
114.21 | 114.57
109.8 | 106.47
104.49 | 113.04
122.85 | 115.38
107.55 | 113.49
112.23 | 108.54
109.53 | 110.97
99.18 | 123.66
107.1 | 110.43
111.33 | 114.3
108.45 | 110.97
110.25 | 110.61
112.41 | 102.87
109.71 | 109.17
108.63 | 114.39
110.16 | 114.84
106.74 | 114.93
108.45 | 117.54
100.44 | 107.37
108.54 | 109.98
104.58 | 111.42
111.15 | 109.53
113.76 | 102.24
106.65 | 111.15
104.4 | 118.71
109.89 | 111.24
111.15 | 117.36
104.04 | 114.3
112.32 | 110.79
107.37 | 104.31
101.79 | 109.71
105.12 | 111.51
110.7 | 109.53
104.22 | 118.44
107.37 | 112.32
107.28 | 111.51
They are interested in characterizing the data at these three locations. The QC manager stated that their goal is 120 min or less per panel, because then they could (ideally) make 4 panels per 8-hour day.
To find the required values and graphs, I used RStudio to analyse the data.
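The commands that follow refer to two vectors, site1 and site3, which are never defined in the listing. A minimal sketch of how they could be entered, assuming the values in the table alternate Site 1, Site 3 (the vector names are taken from the later calls; the layout is my reading of the table):

```r
# Preparation times in minutes, typed in from the sample data table.
# Assumption: the listing alternates Site 1, Site 3 row by row.
site1 <- c(107.55, 106.20, 102.96, 106.74, 114.21, 109.80, 104.49, 122.85,
           107.55, 112.23, 109.53,  99.18, 107.10, 111.33, 108.45, 110.25,
           112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
           104.58, 111.15, 113.76, 106.65, 104.40, 109.89, 111.15, 104.04,
           112.32, 107.37, 101.79, 105.12, 110.70, 104.22, 107.37, 107.28)
site3 <- c(115.29, 113.94, 111.15, 112.95, 114.57, 106.47, 113.04, 115.38,
           113.49, 108.54, 110.97, 123.66, 110.43, 114.30, 110.97, 110.61,
           102.87, 109.17, 114.39, 114.84, 114.93, 117.54, 107.37, 109.98,
           111.42, 109.53, 102.24, 111.15, 118.71, 111.24, 117.36, 114.30,
           110.79, 104.31, 109.71, 111.51, 109.53, 118.44, 112.32, 111.51)

# Quick sanity check: both sites should have the same number of observations
n_obs <- c(site1 = length(site1), site3 = length(site3))
n_obs
```

Typing the values in by hand is error-prone, so it is worth checking the counts before running any of the statistics.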
### MEAN & SD ###
# site1 and site3 are numeric vectors holding the two data columns
mean1=mean(site1)
mean1
mean2=mean(site3)
mean2
SD1=sd(site1)
SD1
SD2=sd(site3)
SD2
output:
> mean1=mean(site1)
> mean1
[1] 108.1912
> mean2=mean(site3)
> mean2
[1] 112.023
> SD1=sd(site1)
> SD1
[1] 4.206597
>
###Checking_normally_distributed_or_not####
To check whether the data are normally distributed, we use a normal Q-Q plot in R and the Shapiro-Wilk test.
shapiro.test(site1)
> shapiro.test(site1)
Shapiro-Wilk normality test
data: site1
W = 0.94949, p-value = 0.07283
Here the p-value (0.07283) is greater than 0.05, so we do not reject the hypothesis that the site1 data are normally distributed.
shapiro.test(site3)
> shapiro.test(site3)
Shapiro-Wilk normality test
data: site3
W = 0.97067, p-value = 0.3778
Here the p-value (0.3778) is also greater than 0.05, so we do not reject the hypothesis that the site3 data are normally distributed.
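The decision rule for the Shapiro-Wilk test can be written out explicitly. This is only a sketch: it takes the two p-values reported in the output above and compares each with a 0.05 significance level (the names alpha, p_values and decision are my own):

```r
# p-values copied from the shapiro.test() output shown above
alpha    <- 0.05
p_values <- c(site1 = 0.07283, site3 = 0.3778)

# Null hypothesis: the sample comes from a normal distribution.
# It is rejected only when the p-value is at or below alpha.
decision <- ifelse(p_values > alpha,
                   "do not reject normality",
                   "reject normality")
print(decision)
```

Both p-values are above 0.05, so the rule gives the same answer for both sites. A large p-value does not prove normality; it only means the test found no evidence against it.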
Q-Q plots:
Q-Q plot for site1
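Here we can see that the points lie close to the reference line, so the site1 data appear to follow a normal distribution.
For site3, the points deviate more from the reference line than for site1. This visual impression should be weighed against the Shapiro-Wilk result for site3 (p-value = 0.3778), which does not reject normality at the 0.05 level.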
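The plotting commands are not shown, so here is a rough sketch of how Q-Q plots like these could be drawn in base R. It assumes the data vectors are entered as in the table (alternating Site 1, Site 3); the panel titles are my own choice:

```r
# Preparation times in minutes (assumed column layout: Site 1, Site 3)
site1 <- c(107.55, 106.20, 102.96, 106.74, 114.21, 109.80, 104.49, 122.85,
           107.55, 112.23, 109.53,  99.18, 107.10, 111.33, 108.45, 110.25,
           112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
           104.58, 111.15, 113.76, 106.65, 104.40, 109.89, 111.15, 104.04,
           112.32, 107.37, 101.79, 105.12, 110.70, 104.22, 107.37, 107.28)
site3 <- c(115.29, 113.94, 111.15, 112.95, 114.57, 106.47, 113.04, 115.38,
           113.49, 108.54, 110.97, 123.66, 110.43, 114.30, 110.97, 110.61,
           102.87, 109.17, 114.39, 114.84, 114.93, 117.54, 107.37, 109.98,
           111.42, 109.53, 102.24, 111.15, 118.71, 111.24, 117.36, 114.30,
           110.79, 104.31, 109.71, 111.51, 109.53, 118.44, 112.32, 111.51)

op <- par(mfrow = c(1, 2))   # two panels side by side
qqnorm(site1, main = "Q-Q plot for site1")
qqline(site1)                # reference line through the quartiles
qqnorm(site3, main = "Q-Q plot for site3")
qqline(site3)
par(op)                      # restore the previous plot layout
```

Points that stay close to the line suggest normality; systematic curvature or points far off at the ends suggest skewness or heavy tails.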
2)HISTOGRAM AND BOXPLOT:
A histogram is a graphical representation of a frequency distribution for numeric data. It is a bar chart that is often used as the first step to determine the probability distribution of a data set or a sample. It allows one to quickly and visually assess the shape of the distribution, the central tendency, the amount of variation in the data, and the presence of gaps, outliers or unusual data points.
The histogram for site1 shows that the most frequent preparation times fall in the 105-110 minute interval. The shape is roughly bell-shaped, which is consistent with a normal distribution.
For site3, the most frequent preparation times fall in the 110-115 minute interval.
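As a sketch of how these histograms could be produced and the fullest bin read off, assuming 5-minute bins from 95 to 125 (my choice of breaks, not necessarily the one used for the figures) and the column layout Site 1, Site 3:

```r
# Preparation times in minutes (assumed column layout: Site 1, Site 3)
site1 <- c(107.55, 106.20, 102.96, 106.74, 114.21, 109.80, 104.49, 122.85,
           107.55, 112.23, 109.53,  99.18, 107.10, 111.33, 108.45, 110.25,
           112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
           104.58, 111.15, 113.76, 106.65, 104.40, 109.89, 111.15, 104.04,
           112.32, 107.37, 101.79, 105.12, 110.70, 104.22, 107.37, 107.28)
site3 <- c(115.29, 113.94, 111.15, 112.95, 114.57, 106.47, 113.04, 115.38,
           113.49, 108.54, 110.97, 123.66, 110.43, 114.30, 110.97, 110.61,
           102.87, 109.17, 114.39, 114.84, 114.93, 117.54, 107.37, 109.98,
           111.42, 109.53, 102.24, 111.15, 118.71, 111.24, 117.36, 114.30,
           110.79, 104.31, 109.71, 111.51, 109.53, 118.44, 112.32, 111.51)

breaks <- seq(95, 125, by = 5)   # same 5-minute bins for both sites
op <- par(mfrow = c(1, 2))
h1 <- hist(site1, breaks = breaks, main = "Site 1",
           xlab = "Preparation time (min)")
h3 <- hist(site3, breaks = breaks, main = "Site 3",
           xlab = "Preparation time (min)")
par(op)

# Lower edge of the bin with the highest count at each site
modal_bin <- c(site1 = h1$breaks[which.max(h1$counts)],
               site3 = h3$breaks[which.max(h3$counts)])
modal_bin
```

Using the same breaks for both sites makes the two panels directly comparable, which is harder when hist() picks the bins for each data set separately.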
BOXPLOT:
Boxplots are primarily used when comparing several distributions against each other. They summarize key statistics from the data and display them in a box-and-whiskers format. They provide a quick way of examining the variation present in the data: a boxplot with a wider range indicates more variability. Boxplots are also used to check whether a process looks different after a process improvement initiative has been implemented.
Here the site1 boxplot is fairly narrow, which means the variability at site1 is low. The middle line of the box marks the median. There is one outlier, meaning that in one case the preparation time was much longer or much shorter than is typical for the site.
For site3, three data points fall beyond the whiskers; these are called outliers. In these 3 cases the preparation time was much longer or much shorter than usual.
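A sketch of how the boxplots could be drawn and the outliers listed, using base R's default whisker rule (1.5 times the box height beyond the hinges). The dashed line at 120 minutes is my addition, marking the QC manager's goal; the column layout Site 1, Site 3 is assumed as before:

```r
# Preparation times in minutes (assumed column layout: Site 1, Site 3)
site1 <- c(107.55, 106.20, 102.96, 106.74, 114.21, 109.80, 104.49, 122.85,
           107.55, 112.23, 109.53,  99.18, 107.10, 111.33, 108.45, 110.25,
           112.41, 109.71, 108.63, 110.16, 106.74, 108.45, 100.44, 108.54,
           104.58, 111.15, 113.76, 106.65, 104.40, 109.89, 111.15, 104.04,
           112.32, 107.37, 101.79, 105.12, 110.70, 104.22, 107.37, 107.28)
site3 <- c(115.29, 113.94, 111.15, 112.95, 114.57, 106.47, 113.04, 115.38,
           113.49, 108.54, 110.97, 123.66, 110.43, 114.30, 110.97, 110.61,
           102.87, 109.17, 114.39, 114.84, 114.93, 117.54, 107.37, 109.98,
           111.42, 109.53, 102.24, 111.15, 118.71, 111.24, 117.36, 114.30,
           110.79, 104.31, 109.71, 111.51, 109.53, 118.44, 112.32, 111.51)

# Side-by-side boxplots on a common axis
boxplot(list("Site 1" = site1, "Site 3" = site3),
        ylab = "Preparation time (min)")
abline(h = 120, lty = 2)   # 120-minute goal from the problem statement

# Points beyond the whiskers, using the same rule boxplot() applies
out1 <- boxplot.stats(site1)$out
out3 <- boxplot.stats(site3)$out
out1
out3
```

With the values as listed, this flags 122.85 at Site 1, and 102.24, 102.87 and 123.66 at Site 3. That matches the one and three outliers described above, and shows that the two high outliers are the only observations above the 120-minute goal.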