In: Statistics and Probability
Develop your own questionnaire/survey to collect data on two variables of interest. One variable of interest should be used to estimate a population mean (quantitative) and the other should be used to estimate a proportion (qualitative). I need to have 50 data values.
Topic 2: Population proportion - For your second variable of interest (qualitative):
● Present the data you used
○ Use appropriate displays (graphs) of the data. (bar graph and pie chart)
○ Describe the data in appropriate terms of descriptive measures - mean and standard deviation (standard error) of the proportion
■ [you will need to choose one answer to be the point of reference. For example, if you are going to survey on political parties then you could choose to find the mean and standard deviation of the proportion of the population that are republicans] in the CONTEXT of the problem.
● Construct a 99% confidence interval for a population proportion.
Solution :
Develop your own questionnaire/survey to collect data on two variables of interest. One variable of interest should be used to estimate a population mean (quantitative) and the other should be used to estimate a proportion (qualitative).
Let us assume that we are collecting data from the students in a classroom about their maths score and gender. The questionnaire can be as follows:
1. What is your name?
2. Gender
male or female (qualitative)
3. Score in maths
numeric value (quantitative)
According to this questionnaire, we have to collect the data of 50 students.
In practice, one would collect the responses with the questionnaire,
but for the sake of this question we generate the data instead.
We assume that the scores are approximately normally distributed. This is an assumption of the simulation, not a consequence of n = 50.
We generate the scores of 50 students from a normal distribution with mean 50 and standard deviation 9:
> score=rnorm(50,50,9) # draw a random sample of 50 scores from N(mean 50, sd 9)
> score
[1] 39.02586 59.69658 49.71413 60.36263 52.30635 61.64889 71.16087 26.04375
[9] 54.71296 45.20572 45.18979 34.93852 53.05488 55.00140 54.56795 57.39267
[17] 48.48184 48.07627 41.08169 44.35217 60.32940 65.33978 53.85484 61.01796
[25] 55.02830 53.14677 53.27067 44.96156 66.07723 46.74030 30.23382 51.60648
[33] 41.64649 41.96700 55.94541 53.17542 55.77690 65.69723 49.78949 50.05801
[41] 53.33922 42.54690 28.97595 55.36800 54.66576 56.08236 35.03855 41.72340
[49] 48.84126 45.42642
mean of scores = sum of scores / 50
> mean(score) # mean of scores
[1] 50.39372
> v=(var(score)*49)/50 # R computes the variance with divisor n-1, so we rescale it to divisor n
> s=sqrt(v) # standard deviation
> v
[1] 91.6347
> s
[1] 9.572602
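Hence the sample mean is 50.39372 and the sample standard deviation is 9.572602. Since the random sample was drawn from a normal distribution with mean 50 and standard deviation 9, these sample values are close to, but not exactly equal to, those parameters.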
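Because rnorm draws a fresh random sample on every run, the values printed above cannot be reproduced exactly. A minimal sketch of how the simulation could be made repeatable, and how the two variance conventions relate; the seed value 123 is an arbitrary, hypothetical choice and was not used in the original run:

```r
set.seed(123)                          # hypothetical seed, only for repeatability
score <- rnorm(50, mean = 50, sd = 9)  # 50 simulated maths scores
n <- length(score)

sample_var <- var(score)               # var() uses divisor n - 1
pop_var <- sample_var * (n - 1) / n    # rescaled to divisor n, as in the transcript

# Sample mean and both versions of the standard deviation
c(mean = mean(score), sd_n = sqrt(pop_var), sd_n_minus_1 = sqrt(sample_var))
```

With n = 50 the two standard deviations differ only slightly, so either convention leads to the same description of the data.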
> hist(score) # plot the scores as a histogram
From the histogram we can say that 12 students scored between 50 and 55.
The lowest score is below 30, and 2 students scored less than 30. The highest score is above 70, and only one student scored above 70.
Now,
Topic 2: Population proportion - For your second variable of interest (qualitative):
We take gender as the qualitative variable.
We again generate a sample of 50 values to create the gender data.
> gender=runif(50,0,1) # a sample of 50 is drawn from U(0,1)
> gen=c()
> gender
[1] 0.049079956 0.375843633 0.076831215 0.102490610 0.481586385 0.623256587
[7] 0.227201329 0.140191786 0.905821839 0.732805556 0.118363147 0.773484795
[13] 0.626877714 0.313379633 0.980451874 0.850402601 0.302026239 0.319837980
[19] 0.671885089 0.787472117 0.947938807 0.421051805 0.431068553 0.395859919
[25] 0.514821038 0.025617093 0.515868011 0.586757038 0.824868963 0.364214285
[31] 0.743149746 0.007172561 0.398547864 0.481372865 0.798040321 0.264406967
[37] 0.337308415 0.310703768 0.367571265 0.480288736 0.460250761 0.701797157
[43] 0.253137683 0.372070091 0.133290598 0.572173288 0.925766191 0.977265383
[49] 0.887308048 0.025376214
# we have to convert these values into male or female
Hence we classify as follows:
if an observation is less than or equal to 0.5 we record it as male, and if it is greater than 0.5 we record it as female.
This is only a device to create the data.
> for( i in 1:50){
+ if (gender[i]<=0.5){gen[i]="male"}
+ else {gen[i]="female"}
+ }
> gen
[1] "male" "male" "male" "male" "male" "female" "male" "male"
[9] "female" "female" "male" "female" "female" "male" "female" "female"
[17] "male" "male" "female" "female" "female" "male" "male" "male"
[25] "female" "male" "female" "female" "female" "male" "female" "male"
[33] "male" "male" "female" "male" "male" "male" "male" "male"
[41] "male" "female" "male" "male" "male" "female" "female" "female"
[49] "female" "male"
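The loop above can also be written without explicit indexing. A minimal sketch using R's vectorized ifelse; the seed is a hypothetical choice for repeatability, so the resulting counts will generally differ from the ones in this solution:

```r
set.seed(1)                         # hypothetical seed, not from the original run
gender <- runif(50, 0, 1)           # 50 draws from U(0, 1)

# Same rule as the loop: values <= 0.5 become "male", the rest "female"
gen <- ifelse(gender <= 0.5, "male", "female")

table(gen)                          # frequency of each category
```

The vectorized form avoids growing gen element by element and applies the same cutoff of 0.5.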
Hence we have the data.
> number=1:50 # serial number of each student
> data=data.frame(number,gen,score) # create the data frame
> data
number gen score
1 1 male 39.02586
2 2 male 59.69658
3 3 male 49.71413
4 4 male 60.36263
5 5 male 52.30635
6 6 female 61.64889
7 7 male 71.16087
8 8 male 26.04375
9 9 female 54.71296
10 10 female 45.20572
11 11 male 45.18979
12 12 female 34.93852
13 13 female 53.05488
14 14 male 55.00140
15 15 female 54.56795
16 16 female 57.39267
17 17 male 48.48184
18 18 male 48.07627
19 19 female 41.08169
20 20 female 44.35217
21 21 female 60.32940
22 22 male 65.33978
23 23 male 53.85484
24 24 male 61.01796
25 25 female 55.02830
26 26 male 53.14677
27 27 female 53.27067
28 28 female 44.96156
29 29 female 66.07723
30 30 male 46.74030
31 31 female 30.23382
32 32 male 51.60648
33 33 male 41.64649
34 34 male 41.96700
35 35 female 55.94541
36 36 male 53.17542
37 37 male 55.77690
38 38 male 65.69723
39 39 male 49.78949
40 40 male 50.05801
41 41 male 53.33922
42 42 female 42.54690
43 43 male 28.97595
44 44 male 55.36800
45 45 male 54.66576
46 46 female 56.08236
47 47 female 35.03855
48 48 female 41.72340
49 49 female 48.84126
50 50 male 45.42642
This is the data of the 50 students.
Now we have to get the proportion.
> male=length(which(gen=="male")) # counts the male observations in the data
> male
[1] 29
> female=length(which(gen=="female")) # counts the female observations in the data
> female
[1] 21
There are 29 males and 21 females
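The counts and proportions can also be obtained in one step with table and prop.table. A minimal sketch; the gen vector here is rebuilt from the counts only (29 males, 21 females), so the order of observations is an assumption and does not match the original data:

```r
# Rebuild a gender vector with the same counts as the data above
gen <- rep(c("male", "female"), times = c(29, 21))

counts <- table(gen)          # frequency of each category
props <- prop.table(counts)   # relative frequency of each category

counts
props
```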
● Present the data you used
○ Use appropriate displays (graphs) of the data. (bar graph and pie chart)
Since gender is a qualitative variable with two categories, we treat "male" as a success, so that the number of males in the sample can be modelled by a binomial distribution.
Now we have to visualize it.
> x=c(male,female)
> barplot(x,xlab="Gender", ylab="Frequency", names.arg=c("Male","Female")) # draw the bar plot
In our data set there are more males than females.
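The question also asks for a pie chart. A minimal sketch using the base R pie function with the counts found above; the chart title and label format are illustrative choices:

```r
counts <- c(Male = 29, Female = 21)            # counts from the data above
pct <- round(100 * counts / sum(counts))       # percentage of each category

# Label each slice with its category and percentage
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = slice_labels, main = "Gender of surveyed students")
```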
Thus, the proportion of males = number of males / 50
= 29/50
= 0.58
The proportion of females = number of females / 50
= 21/50
= 0.42
○ Describe the data in appropriate terms of descriptive measures - mean and standard deviation (standard error) of the proportion
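Let p = sample proportion of males = 0.58
q = 1 - p = sample proportion of females = 0.42
The number of males in a sample of n = 50 follows a binomial distribution, so for the count of males:
mean = n*p
= 50*0.58
= 29
var = n*p*q
= 50*0.58*0.42
= 12.18
sd = sqrt(12.18) = 3.48999
For the sample proportion itself (the count divided by n):
mean = p = 0.58
var = p*q/n
= 0.58*0.42/50
= 0.004872
sd (standard error) = sqrt(0.004872) = 0.0698
In the context of the problem: about 58% of the surveyed students are male, and this estimate has a standard error of about 7 percentage points.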
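The measures on the count scale and on the proportion scale differ only by a factor of n. A minimal sketch of that relationship, using the counts from this data set; the variable names are illustrative:

```r
n <- 50
males <- 29
p_hat <- males / n                        # sample proportion of males
q_hat <- 1 - p_hat                        # sample proportion of females

sd_count <- sqrt(n * p_hat * q_hat)       # sd of the number of males
se_prop <- sqrt(p_hat * q_hat / n)        # standard error of the proportion

# Dividing the count sd by n gives the standard error of the proportion
c(sd_count = sd_count, se_prop = se_prop, sd_count_over_n = sd_count / n)
```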
● Construct a 99% confidence interval for a population proportion.
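The sample proportion is taken from the collected data; here P = 0.58 with n = 50.
The 99% confidence interval for the population proportion of males is given by
( P - z*sqrt(P*(1-P)/n) , P + z*sqrt(P*(1-P)/n) )
where z = 2.576 is the critical value of the standard normal distribution for 99% confidence, and sqrt(P*(1-P)/n) = sqrt(0.58*0.42/50) = 0.0698 is the standard error of the proportion.
Lcl = P - z*sqrt(P*(1-P)/n)
= 0.58 - 2.576 * 0.0698
= 0.4002
Ucl = P + z*sqrt(P*(1-P)/n)
= 0.58 + 2.576 * 0.0698
= 0.7598
Thus the 99% confidence interval for the population proportion of males is (0.4002, 0.7598). That is, we are 99% confident that between about 40% and 76% of the student population is male.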
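As a cross-check, a minimal sketch of the normal-approximation (Wald) interval on the proportion scale, using qnorm for the critical value; the variable names are illustrative, and other interval methods (for example the one used by prop.test) would give slightly different limits:

```r
n <- 50
p_hat <- 29 / n                          # sample proportion of males
conf_level <- 0.99

z <- qnorm(1 - (1 - conf_level) / 2)     # two-sided critical value
se <- sqrt(p_hat * (1 - p_hat) / n)      # standard error of the proportion

ci <- p_hat + c(-1, 1) * z * se          # lower and upper confidence limits
round(ci, 4)
```

Both limits lie between 0 and 1, as any interval for a proportion should.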