Question

Using R Studio

Now, set the seed to 348 with `set.seed()`. Then take a sample of size 10,000 from a normal distribution with a mean of 82 and a standard deviation of 11.

(a) Using sum() on a logical vector, how many draws are less than 60? Using mean() on a logical vector, what proportion of the total draws is that? How far is your answer from pnorm() in 1.1 above?


```{R}
set.seed(348)
x <- rnorm(10000, 82, 11)   # 10,000 draws from a normal with mean 82 and sd 11
sum(ifelse(x < 60, 1, 0))   # number of draws below 60
mean(ifelse(x < 60, 1, 0))  # proportion of draws below 60
pnorm(60, 82, 11)           # theoretical P(X < 60)
```

Using sum(), there are 128 draws that are less than 60, and using mean(), the proportion of the total draws is 0.0281. From these outputs we can say that the answer is quite close to the pnorm() value that has been calculated.

(b) What proportion of your sample is greater than 110 or less than 54?

(c) Why are your answers close to what you got above? Why are they not exactly the same?

(d) Using ggplot2, make a histogram of your sample. Set y=..density.. inside aes(). Overlay a normal distribution with stat_function(aes(samp), fun=dnorm, args=list(82,11)). Using geom_vline(xintercept=), add dashed vertical lines corresponding to the 2.5th and the 97.5th percentile of the sample

Solutions

Expert Solution

a) Using sum() on a logical vector, how many draws are less than 60? Using mean() on a logical vector, what proportion of the total draws is that? How far is your answer from pnorm() in 1.1 above?

R code with comments

#set the seed
set.seed(348)

#set the sample size
n<-10000
#take a sample of size n from normal(82,11)
x=rnorm(n,82,11)

#a
#number of draws that are less than 60
k<-sum(x<60)
sprintf('The number of draws that are less than 60 is %g',k)
# what proportion of the total draws is that?
prop<-mean(x<60)
sprintf('The proportion of the total draws that are less than 60 is %.4f',prop)
#How far is your answer from pnorm() in 1.1
sprintf('The theoretical value from pnorm() is %.4f', pnorm(60,82,11))

We can see that the theoretical value from pnorm() is close to the sample proportion.

Note: x<60 is a logical vector (made of TRUE and FALSE values), whereas ifelse(x<60,1,0) is a numeric vector of 0s and 1s, not a logical vector.
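
The note above can be checked directly. The following is a minimal sketch that reuses the same seed and sample; the names `below` and `coded` are introduced here only for illustration.

```r
set.seed(348)
x <- rnorm(10000, 82, 11)

below <- x < 60                  # logical vector: TRUE where the draw is below 60
coded <- ifelse(x < 60, 1, 0)    # numeric vector of 1s and 0s

is.logical(below)                # TRUE
is.logical(coded)                # FALSE

# sum() and mean() treat TRUE as 1 and FALSE as 0,
# so both forms give the same count and the same proportion
sum(below) == sum(coded)
all.equal(mean(below), mean(coded))
```

Both approaches give the same numbers, but the logical form is shorter and is what the question asks for.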

(b) What proportion of your sample is greater than 110 or less than 54?

R code

#b) proportion of your sample is greater than 110 or less than 54
prop<-mean(x<54 | x>110)
sprintf('The proportion of the sample that is greater than 110 or less than 54 is %.4f',prop)

(c) Why are your answers close to what you got above? Why are they not exactly the same?

R code

#c) How far is your answer from pnorm()
a<-pnorm(54,82,11)+(1-pnorm(110,82,11))
sprintf('The theoretical value from pnorm() is %.4f',a )
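
As a cross-check on the theoretical value, the same two-tail probability can be written in a couple of equivalent ways. This is a sketch only; the names `mu`, `sigma`, `a`, `b`, and `s` are introduced here for illustration.

```r
mu <- 82
sigma <- 11

# Sum of the lower and upper tails, as in the solution code
a <- pnorm(54, mu, sigma) + (1 - pnorm(110, mu, sigma))

# Same quantity, asking pnorm() for the upper tail directly
b <- pnorm(54, mu, sigma) + pnorm(110, mu, sigma, lower.tail = FALSE)

# 54 and 110 are both 28 units from the mean, so the two tails are equal
s <- 2 * pnorm(54, mu, sigma)

all.equal(a, b)
all.equal(a, s)
```

The symmetric form works here only because the two cutoffs happen to be equidistant from the mean of 82.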

We can see that the sample proportion from (b) is close to the theoretical proportion. They are close because the sample was drawn from the same normal population (mean 82, standard deviation 11), and with 10,000 draws the sample proportion is a precise estimate. They are not exactly the same because every sample is subject to random variation: the proportion calculated in (b) is a sample statistic, while the proportion from pnorm() is the population parameter. We use the sample statistic to estimate the population parameter, so the two agree closely but not exactly.
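
The size of this random variation can be sketched as follows, using the proportion below 60 from part (a) as the example. The extra seeds 349 to 351 are arbitrary values chosen for illustration and are not part of the original question.

```r
n <- 10000
p <- pnorm(60, 82, 11)         # population proportion below 60
se <- sqrt(p * (1 - p) / n)    # standard error of a sample proportion

# Sample proportion below 60 for a few different seeds
props <- sapply(c(348, 349, 350, 351), function(s) {
  set.seed(s)
  mean(rnorm(n, 82, 11) < 60)
})

props                          # each value is near p, but none need equal it
abs(props - p) / se            # distance from p in standard-error units
```

Each seed gives a slightly different proportion, and the standard error indicates how large those differences typically are for samples of this size.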

(d) Using ggplot2, make a histogram of your sample. Set y=..density.. inside aes(). Overlay a normal distribution with stat_function(aes(samp), fun=dnorm, args=list(82,11)). Using geom_vline(xintercept=), add dashed vertical lines corresponding to the 2.5th and the 97.5th percentile of the sample

R code

#d)
library(ggplot2)
#make a histogram of your sample
p<-ggplot(data.frame(x), aes(x=x))+
   geom_histogram(aes(y=..density..),binwidth=1)
#Overlay a normal distribution
p<-p+stat_function(aes(x), fun=dnorm, args=list(82,11),color="red")
#2.5th and the 97.5th percentile of the sample
q<-quantile(x,c(0.025,0.975))
p<-p+geom_vline(xintercept=q,color="blue", linetype="dashed")
#add the title
p+labs(title="histogram of the sample")
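
To relate the dashed lines to the overlaid curve, the sample percentiles can be compared with the corresponding percentiles of the normal distribution. This is a sketch in base R using the same seed; `q_sample`, `q_theory`, and `inside` are names introduced here for illustration.

```r
set.seed(348)
x <- rnorm(10000, 82, 11)

q_sample <- quantile(x, c(0.025, 0.975))      # positions of the dashed lines
q_theory <- qnorm(c(0.025, 0.975), 82, 11)    # roughly 82 -/+ 1.96 * 11

rbind(sample = q_sample, theory = q_theory)

# Share of the sample that falls strictly between the two dashed lines
inside <- mean(x > q_sample[1] & x < q_sample[2])
inside                                        # about 0.95
```

The sample percentiles should sit close to the theoretical ones, for the same reason the sample proportions in (a) and (b) sit close to the pnorm() values.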

