Describe the data
A researcher wants to know what proportion of the students who graduated from the WBI manicurist class in 2018 hold a California State Board Manicurist License. The researcher asks the following question:
“Do you have a California State Board Manicurist License, yes or no?” The source of bias is sampling bias, because the technique used to select the individuals for the sample tends to favor one part of the population over another.
Column | Std. dev. | Mean | Median | Range | Min | Max | Q1 | Q3 | IQR
students that have Manicurist License | 0.50573633 | 0.475 | 0 | 1 | 0 | 1 | 0 | 1 | 1
For each of these values, write a sentence explaining what it tells you about the results of your survey, that is, what each value tells us specifically about your sample.
Students who have a license are coded as 1, and those who do not are coded as 0.
Mean is the average of those scores. Because the responses are coded 0 and 1, the mean of 0.475 is the proportion of 1s: 47.5% of the students in the sample have the license.
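Standard deviation is the square root of the average squared distance from the sample mean (for a sample, the sum of squared distances is divided by n - 1). It measures the spread of the dataset. Here the standard deviation of about 0.506 is roughly the largest spread that 0/1 data can have, which means the responses are split almost evenly between 0 and 1 rather than concentrated on one answer.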
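The mean and standard deviation in the table can be checked with a short sketch. The survey write-up does not state the sample size, so the data below are a hypothetical reconstruction: 40 responses with 19 coded 1, which is an assumption chosen because it is consistent with the reported mean of 0.475 and standard deviation of 0.50573633.

```python
import math
import statistics

# Hypothetical reconstruction: 19 licensed (1) and 21 unlicensed (0) students.
# The sample size of 40 is an assumption, not stated in the survey write-up.
responses = [1] * 19 + [0] * 21

n = len(responses)
mean = statistics.mean(responses)  # proportion of responses coded 1
sd = statistics.stdev(responses)   # sample standard deviation (divides by n - 1)

# For 0/1 data the same value follows from the proportion alone:
# sd = sqrt(n / (n - 1) * p * (1 - p)), where p is the mean.
sd_from_proportion = math.sqrt(n / (n - 1) * mean * (1 - mean))

print(mean, sd, sd_from_proportion)
```

Because p * (1 - p) is largest when p = 0.5, a standard deviation near 0.5 indicates a nearly even split between the two answers.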
Median is the middle value of the dataset. If we line up the responses in order, all the 0's followed by all the 1's, the middle value is 0. Here 52.5% of the values are 0, so more than half of the students in the sample do not have the license and the median is 0.
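Range is the difference between the maximum and the minimum, here 1 - 0 = 1. Min is the smallest value observed, 0, which tells us that at least one student in the sample does not have the license. Max is the largest value observed, 1, which tells us that at least one student in the sample has it. With 0/1 coding, these three values only confirm that both answers appear in the sample.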
Q1 is the middle value of the lower half of the dataset. With the 0's and 1's arranged in order, the middle value of the first half of the arrangement is Q1. Here Q1 is 0, which tells us that at least the lowest 25% of the responses are 0.
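Q3 is the middle value of the upper half of the dataset, that is, of the values above the median. It marks the 75th percentile. Here Q3 is 1, which tells us that at least the highest 25% of the responses are 1.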
IQR is the difference between Q3 and Q1, here 1 - 0 = 1. It gives the spread of the middle 50% of the dataset. An IQR of 1 is the largest possible for 0/1 data, so the middle half of our sample contains both answers.
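The order-based values (min, max, range, median, Q1, Q3, IQR) can be reproduced with the "median of each half" rule described above. This is a minimal sketch under an assumption: the sample size is not given in the write-up, so the data are a hypothetical set of 40 responses with 19 coded 1, which matches the reported mean of 0.475.

```python
import statistics

# Hypothetical sample: 21 zeros and 19 ones (n = 40 is an assumption).
responses = sorted([0] * 21 + [1] * 19)

# Split into lower and upper halves; for an odd n this leaves out the median.
half = len(responses) // 2
lower, upper = responses[:half], responses[-half:]

summary = {
    "min": min(responses),
    "max": max(responses),
    "range": max(responses) - min(responses),
    "median": statistics.median(responses),
    "q1": statistics.median(lower),  # middle of the lower half
    "q3": statistics.median(upper),  # middle of the upper half
}
summary["iqr"] = summary["q3"] - summary["q1"]

print(summary)
```

Statistical packages use several different quartile conventions, so other software may report slightly different Q1 and Q3 values on other datasets. For this 0/1 sample the median-of-halves rule gives the same values as the table.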