Question

Develop your own questionnaire/survey to collect data on two variables of interest. One variable of interest should be used to estimate a population mean (quantitative) and the other should be used to estimate a proportion (qualitative). I have to have 50 data values.

Topic 2: Population proportion. For your second variable of interest (qualitative):
● Present the data you used.
○ Use appropriate displays (graphs) of the data (bar graph and pie chart).
○ Describe the data in appropriate terms of descriptive measures: the mean and standard deviation (standard error) of the proportion, in the CONTEXT of the problem.
■ You will need to choose one answer to be the point of reference. For example, if you are going to survey on political parties, then you could choose to find the mean and standard deviation of the proportion of the population that are Republicans.
● Construct a 99% confidence interval for a population proportion.

Solution:

Develop your own questionnaire/survey to collect data on two variables of interest. One variable of interest should be used to estimate a population mean (quantitative) and the other should be used to estimate a proportion (qualitative).

Let us assume that we are collecting data from the students in a classroom about their maths score and gender. The questionnaire can be as follows:

1. What is your name?

2. Gender:

Male or female (qualitative)

3. Score in maths:

Numeric value (quantitative)

Using this questionnaire, we need to collect data from 50 students.

In practice, one would collect the data by administering the questionnaire.

For the sake of this question, however, we simulate the data.

We assume that the scores follow a normal distribution with mean 50 and standard deviation 9. (This is a modelling assumption. A sample size of n = 50 does not by itself make the scores normally distributed.)

We generate scores for 50 students from this normal distribution:

> score=rnorm(50,50,9) # draw a random sample of 50 scores from N(mean = 50, sd = 9)
> score
[1] 39.02586 59.69658 49.71413 60.36263 52.30635 61.64889 71.16087 26.04375
[9] 54.71296 45.20572 45.18979 34.93852 53.05488 55.00140 54.56795 57.39267
[17] 48.48184 48.07627 41.08169 44.35217 60.32940 65.33978 53.85484 61.01796
[25] 55.02830 53.14677 53.27067 44.96156 66.07723 46.74030 30.23382 51.60648
[33] 41.64649 41.96700 55.94541 53.17542 55.77690 65.69723 49.78949 50.05801
[41] 53.33922 42.54690 28.97595 55.36800 54.66576 56.08236 35.03855 41.72340
[49] 48.84126 45.42642
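The transcript does not call set.seed(), so rerunning rnorm(50, 50, 9) produces a different sample each time. Below is a minimal sketch of a reproducible draw. The seed 2024 is an arbitrary, hypothetical choice, so the scores it produces will not match the ones printed above.

```r
# Reproducible version of the simulated maths scores.
# The seed value 2024 is an arbitrary choice for illustration only.
set.seed(2024)
score <- rnorm(50, mean = 50, sd = 9)  # 50 draws from a normal with mean 50, sd 9

# Re-seeding and redrawing gives exactly the same sample.
set.seed(2024)
score_again <- rnorm(50, mean = 50, sd = 9)
identical(score, score_again)  # TRUE
```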

mean of scores = sum of scores /50


> mean(score) # mean of scores
[1] 50.39372
> v=(var(score)*49)/50 # var() divides by n-1; rescale by (n-1)/n to get the variance with divisor n
> s=sqrt(v) # standard deviation
> v
[1] 91.6347
> s
[1] 9.572602

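Hence the sample mean is 50.39372 and the standard deviation is 9.572602. The sample was drawn from N(50, 9), so these values are close to the population mean of 50 and standard deviation of 9. They are not exactly equal, and the difference is due to sampling variation.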

> hist(score) # plot a histogram of the scores

From the histogram, we can see that 12 students scored between 50 and 55.

The lowest score is below 30, and exactly 2 students scored less than 30. The highest score is above 70, and only one student scored above 70.
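The summary statistics and histogram readings above can be checked in a few lines. This sketch re-enters the 50 scores exactly as printed in the transcript. Because those values are rounded to 5 decimals, the results agree only to about that precision. The sketch then counts the scores in explicit 5-point bins from 25 to 75. That bin width is an assumption chosen to match the "50 to 55" reading, since R's default hist() breaks may differ.

```r
# Scores as printed in the transcript above (rounded to 5 decimals).
score <- c(39.02586, 59.69658, 49.71413, 60.36263, 52.30635, 61.64889, 71.16087, 26.04375,
           54.71296, 45.20572, 45.18979, 34.93852, 53.05488, 55.00140, 54.56795, 57.39267,
           48.48184, 48.07627, 41.08169, 44.35217, 60.32940, 65.33978, 53.85484, 61.01796,
           55.02830, 53.14677, 53.27067, 44.96156, 66.07723, 46.74030, 30.23382, 51.60648,
           41.64649, 41.96700, 55.94541, 53.17542, 55.77690, 65.69723, 49.78949, 50.05801,
           53.33922, 42.54690, 28.97595, 55.36800, 54.66576, 56.08236, 35.03855, 41.72340,
           48.84126, 45.42642)
n <- length(score)

m    <- mean(score)              # sample mean = sum(score) / n
v_n  <- sum((score - m)^2) / n   # variance with divisor n, as in the transcript
v_n1 <- var(score)               # R's var() uses divisor n - 1
s_n  <- sqrt(v_n)                # standard deviation with divisor n

round(c(mean = m, var_n = v_n, sd_n = s_n), 5)

# Bin counts over right-closed intervals (25,30], (30,35], ..., (70,75]
counts <- table(cut(score, breaks = seq(25, 75, by = 5)))
counts
```

With the divisor n - 1 (R's var() and sd()), the standard deviation would be slightly larger. That version is the usual choice when the sample is used to estimate the population standard deviation.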

Now,

Topic 2: Population proportion - For your second variable of interest (qualitative):

We have taken gender as the qualitative variable.

We again generate a sample of 50 values to create the gender data.

> gender=runif(50,0,1) # sample of 50 is taken from U(0,1)
> gen=c() # empty vector to hold the gender labels
> gender
[1] 0.049079956 0.375843633 0.076831215 0.102490610 0.481586385 0.623256587
[7] 0.227201329 0.140191786 0.905821839 0.732805556 0.118363147 0.773484795
[13] 0.626877714 0.313379633 0.980451874 0.850402601 0.302026239 0.319837980
[19] 0.671885089 0.787472117 0.947938807 0.421051805 0.431068553 0.395859919
[25] 0.514821038 0.025617093 0.515868011 0.586757038 0.824868963 0.364214285
[31] 0.743149746 0.007172561 0.398547864 0.481372865 0.798040321 0.264406967
[37] 0.337308415 0.310703768 0.367571265 0.480288736 0.460250761 0.701797157
[43] 0.253137683 0.372070091 0.133290598 0.572173288 0.925766191 0.977265383
[49] 0.887308048 0.025376214

We now need to convert these numbers into male or female.

We classify them as follows:

if an observation is less than or equal to 0.5, we record it as male, and if it is greater than 0.5, we record it as female.

This rule is used only to simulate the data.
> for( i in 1:50){
+ if (gender[i]<=0.5){gen[i]="male"}
+ else {gen[i]="female"}
+ }
> gen
[1] "male" "male" "male" "male" "male" "female" "male" "male"
[9] "female" "female" "male" "female" "female" "male" "female" "female"
[17] "male" "male" "female" "female" "female" "male" "male" "male"
[25] "female" "male" "female" "female" "female" "male" "female" "male"
[33] "male" "male" "female" "male" "male" "male" "male" "male"
[41] "male" "female" "male" "male" "male" "female" "female" "female"
[49] "female" "male"
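The for loop above can be written with R's vectorised ifelse(). It applies the same rule (values less than or equal to 0.5 become "male", the rest "female") to the whole vector in one call. Here is a sketch using the 50 uniform values printed above:

```r
# Uniform(0, 1) values as printed in the transcript above.
gender <- c(0.049079956, 0.375843633, 0.076831215, 0.102490610, 0.481586385, 0.623256587,
            0.227201329, 0.140191786, 0.905821839, 0.732805556, 0.118363147, 0.773484795,
            0.626877714, 0.313379633, 0.980451874, 0.850402601, 0.302026239, 0.319837980,
            0.671885089, 0.787472117, 0.947938807, 0.421051805, 0.431068553, 0.395859919,
            0.514821038, 0.025617093, 0.515868011, 0.586757038, 0.824868963, 0.364214285,
            0.743149746, 0.007172561, 0.398547864, 0.481372865, 0.798040321, 0.264406967,
            0.337308415, 0.310703768, 0.367571265, 0.480288736, 0.460250761, 0.701797157,
            0.253137683, 0.372070091, 0.133290598, 0.572173288, 0.925766191, 0.977265383,
            0.887308048, 0.025376214)

# Same classification rule as the loop, applied to every element at once.
gen <- ifelse(gender <= 0.5, "male", "female")

table(gen)  # female 21, male 29
```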
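This gives us the gender data. We now combine everything into a data frame, numbering the students 1 to 50:
> number=1:50 # student ID
> data=data.frame(number,gen,score) # we create data frame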
> data
number gen score
1 1 male 39.02586
2 2 male 59.69658
3 3 male 49.71413
4 4 male 60.36263
5 5 male 52.30635
6 6 female 61.64889
7 7 male 71.16087
8 8 male 26.04375
9 9 female 54.71296
10 10 female 45.20572
11 11 male 45.18979
12 12 female 34.93852
13 13 female 53.05488
14 14 male 55.00140
15 15 female 54.56795
16 16 female 57.39267
17 17 male 48.48184
18 18 male 48.07627
19 19 female 41.08169
20 20 female 44.35217
21 21 female 60.32940
22 22 male 65.33978
23 23 male 53.85484
24 24 male 61.01796
25 25 female 55.02830
26 26 male 53.14677
27 27 female 53.27067
28 28 female 44.96156
29 29 female 66.07723
30 30 male 46.74030
31 31 female 30.23382
32 32 male 51.60648
33 33 male 41.64649
34 34 male 41.96700
35 35 female 55.94541
36 36 male 53.17542
37 37 male 55.77690
38 38 male 65.69723
39 39 male 49.78949
40 40 male 50.05801
41 41 male 53.33922
42 42 female 42.54690
43 43 male 28.97595
44 44 male 55.36800
45 45 male 54.66576
46 46 female 56.08236
47 47 female 35.03855
48 48 female 41.72340
49 49 female 48.84126
50 50 male 45.42642
This is the data for the 50 students.

Next, we count the males and females to get the proportions.
> male=length(which(gen=="male")) # number of male observations
> male
[1] 29
> female=length(which(gen=="female")) # number of female observations
> female
[1] 21

There are 29 males and 21 females

● Present the data you used

○ Use appropriate displays (graphs) of the data. (bar graph and pie chart)

Gender is a qualitative variable with two categories. We take "male" as the success (the point of reference), so the number of males in the sample follows a binomial distribution.

Next, we visualize the counts with a bar graph.
> x=c(male,female)
> barplot(x,xlab="Gender", ylab= "frequency", names.arg=c("Male","Female")) # to get barplot

In our data set, there are more males (29) than females (21).
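The task also asks for a pie chart, which the transcript does not include. Below is a minimal base-R sketch using the counts above. The percentage labels and the chart title are illustrative additions, not part of the original answer.

```r
# Counts from the survey data above.
x <- c(Male = 29, Female = 21)

# Percentage label for each slice, e.g. "Male (58%)".
pct <- round(100 * x / sum(x))
slice_labels <- paste0(names(x), " (", pct, "%)")

barplot(x, xlab = "Gender", ylab = "Frequency")              # bar graph
pie(x, labels = slice_labels, main = "Gender of students")   # pie chart
```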

Thus, the proportion of males = number of males / 50

= 29/50

=0.58

The proportion of females = number of females / 50

= 21/50

=0.42

○ Describe the data in appropriate terms of descriptive measures - mean and standard deviation (standard error) of the proportion

Let p = proportion of males = 0.58 (males are the point of reference)

q = proportion of females = 0.42

Since the number of males in a sample of n = 50 follows a binomial distribution,

mean = n*p

= 0.58*50

=29

var = n*p*q

=50*0.58*0.42

= 12.18

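sd = sqrt(12.18) ≈ 3.49

These values are on the count scale (number of males out of 50). On the proportion scale, the mean of the sample proportion is p = 0.58 and its standard error is sqrt(p*q/n) = sqrt(0.58*0.42/50) ≈ 0.0698. In context, an estimated 58% of the students are male, with a standard error of about 7 percentage points.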
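The mean and standard deviation above are for the count of males. The short sketch below computes them alongside the proportion-scale mean p and standard error sqrt(p*q/n). It hard-codes the counts from above (29 males out of n = 50). The standard error of the proportion equals the count-scale standard deviation divided by n.

```r
n     <- 50
males <- 29
p_hat <- males / n   # sample proportion of males
q_hat <- 1 - p_hat   # sample proportion of females

# Count scale (binomial): mean n*p, standard deviation sqrt(n*p*q)
count_mean <- n * p_hat
count_sd   <- sqrt(n * p_hat * q_hat)

# Proportion scale: mean p, standard error sqrt(p*q/n)
se_p <- sqrt(p_hat * q_hat / n)

round(c(count_mean = count_mean, count_sd = count_sd, p_hat = p_hat, se_p = se_p), 4)
```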

● Construct a 99% confidence interval for a population proportion.

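First, we find the sample proportion from the collected data: p = 29/50 = 0.58, q = 1 - p = 0.42 and n = 50.

The 99% confidence interval for the population proportion of males is given by

$\left(p - z_{0.005}\sqrt{\dfrac{pq}{n}},\; p + z_{0.005}\sqrt{\dfrac{pq}{n}}\right)$, where $z_{0.005} \approx 2.576$.

The standard error is sqrt(p*q/n) = sqrt(0.58*0.42/50) = sqrt(0.004872) ≈ 0.0698.

Lcl = p - 2.576 * 0.0698

= 0.58 - 0.1798

= 0.4002

Ucl = p + 2.576 * 0.0698

= 0.58 + 0.1798

= 0.7598

Thus the 99% confidence interval for the population proportion of males is (0.4002, 0.7598). We are 99% confident that between about 40% and 76% of the students in the population are male.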
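The interval can also be computed directly in R. This is a sketch of the large-sample (Wald) interval p ± z * sqrt(p*q/n), wrapped in a small helper. The function name prop_ci is hypothetical and is not a base-R function. For 29 males out of 50 at 99% confidence, it gives limits of about 0.4002 and 0.7598.

```r
# Large-sample (Wald) confidence interval for a population proportion.
# Assumes n*p and n*(1 - p) are large enough for the normal approximation
# (here 29 and 21). prop_ci is a hypothetical helper, not part of base R.
prop_ci <- function(successes, n, conf = 0.99) {
  p_hat <- successes / n
  z     <- qnorm(1 - (1 - conf) / 2)       # about 2.5758 for 99%
  se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

ci <- prop_ci(29, 50, conf = 0.99)
round(ci, 4)  # lower 0.4002, upper 0.7598
```

R's built-in prop.test(29, 50, conf.level = 0.99) returns a Wilson score interval (with a continuity correction by default), so its limits differ somewhat from this Wald interval.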

