Instructions:
Turn in:
1. Typed paragraph with your name and answers to the following questions: What is the quantitative variable you studied and the range of its values? What is your population and what is the size of your sample? Name the type of sampling you used and explain how you conducted the sampling. Name the type of survey you used and explain how you conducted your study. What question did you ask in your survey or how did you conduct your observations? (For your typed paragraph, do not write a list of questions and answers. The paragraph needs to be written in essay format. For example: "In this survey, the population was xxxx, and the sample size was xxxx..." You will be graded on providing appropriate answers to each question and on the clarity, quality, and explanation of each answer.)
2. Include a sheet with your name, list of raw data, 5-number summary, boxplot, mean, and standard deviation. Show all work to find the numbers in the 5-number summary.
My survey results were as follows:
How much money do you spend weekly on groceries?
$25 or less: 5 people
$50 or less: 19 people
$100 or less: 3 people
$150 or less: 3 people
Quantitative data are measurements or counts expressed as numbers. Examples include height, weight, shoe size, fingernail length, and exam scores.
Suppose the quantitative variable is the weight (in kg) of boys aged 18 to 20 years in city A.
We want to study the weights of these boys and see how many are healthy, overweight, or underweight.
"In this survey, the population was 200 boys in city A, and the sample size was 30."
A simple random sample of size 30 is drawn from the population of 200 boys aged 18 to 20 years in city A.
We use this sample of size 30 to estimate the weight of boys aged 18 to 20 in the city.
The study is conducted by recording the weights (in kg) of the boys in the simple random sample of size 30.
Of the 200 boys, 30 are selected at random and surveyed.
{Note: since we cannot conduct a real survey here, we need to generate some data for our survey. You can use a table of random numbers, or software such as R or Excel, to generate random numbers and draw a sample from them.}
I will use R to generate random data and to draw a random sample from the generated data.
R-code
{
# To generate data we use runif(), so the data are uniform between 40 kg and 110 kg
> G_Data=runif(200,40,110) # generated data for the survey
# Round the data to zero decimal places
> Population=round(G_Data,0)
# Population will be our data for the weights of all boys
> Population # can be used as our raw data
[1] 91 89 45 75 109 80 96 82 82 44 98 87 48 58 60 97 106 102
[19] 72 61 68 57 70 84 51 82 44 47 110 65 101 102 101 100 45 64
[37] 46 67 53 44 91 99 76 55 63 49 57 80 95 76 78 95 42 94
[55] 110 102 52 55 44 43 82 78 61 89 104 65 47 66 59 42 77 69
[73] 48 88 62 66 76 67 99 49 105 74 83 44 93 91 83 108 108 106
[91] 81 72 72 91 76 88 63 46 69 70 62 88 75 73 51 98 48 50
[109] 88 95 91 103 55 45 47 108 92 108 75 97 59 84 52 99 46 103
[127] 47 76 71 56 72 60 91 76 95 47 66 70 69 46 57 108 88 96
[145] 70 110 72 93 87 100 86 98 69 96 67 70 87 71 42 88 96 53
[163] 64 105 65 100 98 54 82 87 61 76 71 51 100 50 62 49 94 87
[181] 89 66 48 62 93 104 49 44 92 109 81 75 107 67 93 66 81 86
[199] 103 87
# Now we calculate the range of our data to verify that it is greater than 10
> max(Population) # highest observation
[1] 110
> min(Population) # lowest observation
[1] 42
# Note: range = highest observation - lowest observation
> Range=max(Population) - min(Population)
> Range
[1] 68
}
Thus the range is greater than 10. The variable "Population" created above will act as the population for our survey.
Now we take a sample survey of size 30 using simple random sampling.
From R
{
# sample(data, n) draws a random sample of size n (without replacement) in R
> spl=sample(Population, 30) # draw a sample of 30 observations from the population
> spl # the sample of size 30
[1] 47 91 68 44 63 42 87 48 89 70 105 100 110 101 65 50 89 69 44
[20] 76 87 91 91 106 70 75 58 76 74 97
}
So this is our sample of weights (kg) of boys used for our survey.
Sr No. | Weight | Sr No. | Weight | Sr No. | Weight |
1 | 47 | 11 | 105 | 21 | 87 |
2 | 91 | 12 | 100 | 22 | 91 |
3 | 68 | 13 | 110 | 23 | 91 |
4 | 44 | 14 | 101 | 24 | 106 |
5 | 63 | 15 | 65 | 25 | 70 |
6 | 42 | 16 | 50 | 26 | 75 |
7 | 87 | 17 | 89 | 27 | 58 |
8 | 48 | 18 | 69 | 28 | 76 |
9 | 89 | 19 | 44 | 29 | 74 |
10 | 70 | 20 | 76 | 30 | 97 |
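Because runif() and sample() are random, rerunning the commands above gives different numbers each time. A minimal sketch of how the generation and sampling steps could be made repeatable is shown here. The seed value 123 is an arbitrary assumption, not the seed behind the data above, so this script will not reproduce the table of 30 weights.

```r
# Arbitrary seed (an assumption); fixing it makes the random draws repeatable
set.seed(123)

# 200 simulated weights, uniform between 40 kg and 110 kg, rounded to whole kg
Population <- round(runif(200, 40, 110), 0)

# Simple random sample of 30 boys, drawn without replacement
spl <- sample(Population, 30)

print(range(Population))  # lowest and highest simulated weight
print(sort(spl))          # the sample in ascending order
```

Running the script twice with the same seed should print the same sample both times.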
To compute the 5-number summary, box plot, mean, and standard deviation
i)
A 5-number summary consists of five values: the minimum and maximum (the most extreme values in the data set), the lower and upper quartiles, and the median.
First we sort the sample in ascending order (n = 30, which is even):
42 44 44 47 48 50 58 63 65 68 69 70 70 74 75
76 76 87 87 89 89 91 91 91 97 100 101 105 106 110
From the sorted data we can see
Extreme values: maximum = 110
minimum = 42
Since n is even,
Median = { (n/2)th obs + (n/2 + 1)th obs } / 2 = { 15th obs + 16th obs } / 2
= { 75 + 76 } / 2
= 75.5
Lower quartile = median of the data below the median, i.e. the median of the values less than 75.5 (n = 15, odd)
Data less than 75.5: 42 44 44 47 48 50 58 63 65 68 69 70 70 74 75
Lower quartile = ((15 + 1)/2)th obs = 8th obs = 63
Thus lower quartile = 63
Upper quartile = median of the data above the median, i.e. the median of the values greater than 75.5 (n = 15, odd)
Data greater than 75.5: 76 76 87 87 89 89 91 91 91 97 100 101 105 106 110
Upper quartile = ((15 + 1)/2)th obs = 8th obs = 91
Thus upper quartile = 91
5-number summary is
Minimum | Lower Quartile | Median | Upper Quartile | Maximum |
42 | 63 | 75.5 | 91 | 110 |
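The hand calculation above can be checked with a short R sketch. It retypes the 30 sampled weights from the table and applies the same median-of-halves rule; the names x, lower_half, upper_half and five are illustrative only.

```r
# The 30 sampled weights (kg), copied from the sample table
spl <- c(47, 91, 68, 44, 63, 42, 87, 48, 89, 70,
         105, 100, 110, 101, 65, 50, 89, 69, 44, 76,
         87, 91, 91, 106, 70, 75, 58, 76, 74, 97)

x <- sort(spl)     # ascending order
n <- length(x)     # 30, an even number

lower_half <- x[1:(n / 2)]        # the 15 values below the median
upper_half <- x[(n / 2 + 1):n]    # the 15 values above the median

five <- c(Minimum = min(x),
          Q1      = median(lower_half),
          Median  = median(x),
          Q3      = median(upper_half),
          Maximum = max(x))
print(five)           # values: 42, 63, 75.5, 91, 110

# Base R's fivenum() (Tukey's five-number summary) agrees for this sample
print(fivenum(spl))
```

Note that quantile() and summary() use a different default interpolation rule, so the quartiles they report can differ slightly from the median-of-halves values.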
ii)
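To draw the box plot of the sample
{Note: it can be drawn manually from the 5-number summary, or directly using software as follows}
{
# to draw the box plot
> boxplot(spl,col=2,ylab="Weight")
# optionally, a bar plot of the individual observations
> SR_No=1:30
> barplot(spl,col=2,names.arg=SR_No,xlab="Sample_No",ylab="Weight")
# to draw a histogram
> hist(spl,col=2,xlab="Weight")
}
Note that the box plot, bar plot, and histogram can also be drawn manually.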
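When drawing the box plot by hand, it helps to know where the whiskers end and whether any point should be marked as an outlier. The sketch below assumes the common 1.5 x IQR rule, which is also the default in R's boxplot.stats(); the variable names are illustrative.

```r
# The 30 sampled weights (kg), copied from the sample table
spl <- c(47, 91, 68, 44, 63, 42, 87, 48, 89, 70,
         105, 100, 110, 101, 65, 50, 89, 69, 44, 76,
         87, 91, 91, 106, 70, 75, 58, 76, 74, 97)

bs <- boxplot.stats(spl)                  # default coef = 1.5

iqr         <- bs$stats[4] - bs$stats[2]  # 91 - 63 = 28
lower_fence <- bs$stats[2] - 1.5 * iqr    # 63 - 42 = 21
upper_fence <- bs$stats[4] + 1.5 * iqr    # 91 + 42 = 133

print(bs$stats)   # whisker end, hinge, median, hinge, whisker end
print(bs$out)     # observations outside the fences, if any
```

Every weight lies between 21 and 133, so under this rule there are no outliers and the whiskers run from the minimum (42) to the maximum (110).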
iii) Mean
mean = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ ; n = 30
where the $x_i$ are the sample values drawn above
mean = ( 47 + 91 + 68 + 44 + 63 + ..... + 75 + 58 + 76 + 74 + 97 ) / 30
= 2283 / 30
= 76.1
Thus mean = 76.1
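The arithmetic for the mean can be confirmed with a few lines of R. This is only a check of the sum and the division, using the 30 weights retyped from the sample table.

```r
# The 30 sampled weights (kg), copied from the sample table
spl <- c(47, 91, 68, 44, 63, 42, 87, 48, 89, 70,
         105, 100, 110, 101, 65, 50, 89, 69, 44, 76,
         87, 91, 91, 106, 70, 75, 58, 76, 74, 97)

total <- sum(spl)      # 2283
n     <- length(spl)   # 30

print(total / n)       # 76.1
print(mean(spl))       # built-in function, same value
```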
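iv) Standard deviation
Standard deviation = $s = \sqrt{\mathrm{var}(x)}$
var(x) = $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ ; where n = 30 and the sample mean $\bar{x}$ = 76.1
= { ( 47 - 76.1 )^2 + ( 91 - 76.1 )^2 + ( 68 - 76.1 )^2 + ( 44 - 76.1 )^2 + ....... + ( 74 - 76.1 )^2 + ( 97 - 76.1 )^2 } / ( 30 - 1 )
= 12102.7 / 29
var(x) = 417.3345
Standard deviation = $\sqrt{\mathrm{var}(x)} = \sqrt{417.3345}$ = 20.42877
Thus standard deviation = 20.42877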
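The variance and standard deviation steps can be retraced in R as a sketch. It follows the sample formula with the n - 1 divisor used above and compares the result with the built-in var() and sd(); the names ss, variance and std_dev are illustrative.

```r
# The 30 sampled weights (kg), copied from the sample table
spl <- c(47, 91, 68, 44, 63, 42, 87, 48, 89, 70,
         105, 100, 110, 101, 65, 50, 89, 69, 44, 76,
         87, 91, 91, 106, 70, 75, 58, 76, 74, 97)

n  <- length(spl)
ss <- sum((spl - mean(spl))^2)   # sum of squared deviations, about 12102.7

variance <- ss / (n - 1)         # sample variance, about 417.3345
std_dev  <- sqrt(variance)       # sample standard deviation, about 20.42877

print(c(ss = ss, variance = variance, std_dev = std_dev))
print(c(var(spl), sd(spl)))      # built-in functions use the same n - 1 divisor
```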
My survey results for the sample were as follows:
Weight of boys in city A aged 18 to 20 years
50 kg or less: 6 boys (underweight)
Between 55 kg and 80 kg: 11 boys (average, can be considered healthy)
90 kg or more: 9 boys (overweight)
The remaining 4 boys (87 to 89 kg) fall between the healthy and overweight ranges.
The estimated mean weight is 76.1 kg.
5-number summary is
Minimum | Lower Quartile | Median | Upper Quartile | Maximum |
42 | 63 | 75.5 | 91 | 110 |
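The weight categories in the survey results can be tallied directly from the sample instead of counting by hand. The sketch below uses the cut-offs stated in the results (50 kg, 55 to 80 kg, 90 kg); these are the thresholds chosen for this exercise, not a medical classification, and the variable names are illustrative.

```r
# The 30 sampled weights (kg), copied from the sample table
spl <- c(47, 91, 68, 44, 63, 42, 87, 48, 89, 70,
         105, 100, 110, 101, 65, 50, 89, 69, 44, 76,
         87, 91, 91, 106, 70, 75, 58, 76, 74, 97)

underweight <- sum(spl <= 50)               # 50 kg or less
healthy     <- sum(spl >= 55 & spl <= 80)   # between 55 kg and 80 kg
overweight  <- sum(spl >= 90)               # 90 kg or more

# Boys whose weight falls in none of the three stated ranges
uncategorised <- length(spl) - underweight - healthy - overweight

print(c(underweight = underweight, healthy = healthy,
        overweight = overweight, uncategorised = uncategorised))
```

Checking that the four counts add up to 30 is a quick way to confirm that no observation was missed or counted twice.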