Question


In this topic, you'll do a calculation in RStudio to demonstrate the difference between using the paired and unpaired methods to analyze a difference of means. As an example data set, we'll use the county data set from the openintro package. This contains observations of several variables for each county in the USA, but we'll focus on two: pop2000, the county's population in the 2000 census, and pop2010, the population in the 2010 census.

Our goal is to calculate a 95% confidence interval for the difference between log10(pop2000) and log10(pop2010).
In R, log10 is the base-10 logarithm (I'll explain at the end of the prompt why we look at logs instead of raw numbers).
Use the following procedure:

Use sample() to select 250 rows at random from the county data set. Then, using your sample of 250 rows, calculate a 95% confidence interval for the difference in two ways:

  1. by treating the two censuses as paired samples, using the procedure for paired data covered in section 5.2.
  2. by treating the two censuses as independent samples, using the procedure for the difference of two means covered in section 5.3.

Note: you may use the same degrees of freedom, 249, for each procedure, instead of using the complicated formula from section 5.3 for the independent sample. At this sample size, the difference will be negligible.
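One way the procedure above might be carried out is sketched below. It assumes the openintro package is installed and that `county` has the `pop2000` and `pop2010` columns described in the prompt. The seed, the filtering of missing or non-positive populations, and all variable names are choices made for this sketch, not part of the assignment, and a different random sample will give different numbers from those posted in the solution.

```r
# Sketch only: assumes the openintro package is installed and that its
# `county` data frame has columns pop2000 and pop2010 (as the prompt states).
library(openintro)

set.seed(1)  # arbitrary seed, used here only so the sample is reproducible

# Keep counties with both populations recorded and positive so log10() is finite.
ok <- !is.na(county$pop2000) & !is.na(county$pop2010) &
  county$pop2000 > 0 & county$pop2010 > 0
dat <- county[ok, ]

n <- 250
rows <- sample(nrow(dat), n)       # 250 row indices, drawn without replacement
x00 <- log10(dat$pop2000[rows])
x10 <- log10(dat$pop2010[rows])

t_star <- qt(0.975, df = n - 1)    # df = 249 for both methods, per the note

# 1. Paired: one-sample interval on the per-county differences.
d <- x10 - x00
est_paired <- mean(d)
se_paired <- sd(d) / sqrt(n)
ci_paired <- est_paired + c(-1, 1) * t_star * se_paired

# 2. Independent: two-sample interval that ignores the pairing.
est_indep <- mean(x10) - mean(x00)
se_indep <- sqrt(var(x10) / n + var(x00) / n)
ci_indep <- est_indep + c(-1, 1) * t_star * se_indep

print(c(estimate = est_paired, se = se_paired,
        lower = ci_paired[1], upper = ci_paired[2]))
print(c(estimate = est_indep, se = se_indep,
        lower = ci_indep[1], upper = ci_indep[2]))
```

The two point estimates are always identical, because the mean of the differences equals the difference of the means; only the standard errors, and therefore the interval widths, differ.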

Include in your post:
1. the point estimate you got in each procedure
2. the standard error you got in each procedure
3. the confidence interval you got in each procedure

Comment briefly on the difference you see. In each case, are you able to conclude with 95% confidence that the population increased from 2000 to 2010 (i.e. that the difference is greater than zero)?

Why we use logs here: Populations tend to grow or shrink by percentages, so looking at absolute differences would not give a very good picture of the changes. In particular, the same percentage growth produces very different absolute changes in a county of 30,000 and in a county of 3 million. Taking logs converts multiplication to addition, which ensures we are looking at relative differences and puts large and small counties on the same scale.
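A small illustration of this point, using two made-up counties that both grow by 10% (the populations and variable names are hypothetical, chosen only for this sketch):

```r
# Hypothetical populations: both counties grow by 10%.
small <- c(before = 30000, after = 33000)
large <- c(before = 3e6, after = 3.3e6)

# The absolute changes differ by a factor of 100 ...
abs_small <- small[["after"]] - small[["before"]]   # 3,000 people
abs_large <- large[["after"]] - large[["before"]]   # 300,000 people

# ... but the log10 differences are the same, both equal to log10(1.10).
log_small <- log10(small[["after"]]) - log10(small[["before"]])
log_large <- log10(large[["after"]]) - log10(large[["before"]])
print(c(small = log_small, large = log_large, log10_of_1.10 = log10(1.10)))

# Going the other way: a log10 difference d corresponds to a growth factor 10^d.
growth_factor <- function(d) 10^d
print(growth_factor(log_small))   # recovers the factor 1.10
```

Because of this, a confidence interval for the mean log10 difference can be read as an interval for a typical growth factor by raising 10 to each endpoint.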

Solutions

Expert Solution

Hypothesis of interest: H0: population mean of log10(pop2010) - population mean of log10(pop2000) = 0

vs. H1: population mean of log10(pop2010) - population mean of log10(pop2000) > 0

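1. For the paired case, the sample estimate = mean of the log difference = 0.01877433

For the independent case, the sample estimates are

mean of log10(pop2010) = 4.450429 and mean of log10(pop2000) = 4.431655, so the point estimate is their difference, 0.018774, the same as in the paired case.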

2. For the paired case, the standard error for the mean of the log difference is approximately 0.0032.

  For the independent case, the standard error for the difference of the two mean logs is approximately 0.0518.
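The gap between the two standard errors comes from the correlation between a county's two census counts. The sketch below uses synthetic data, not the county data set: the means and standard deviations are invented so that values vary widely across units but change little within a unit, which is the situation the exercise describes.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 250

# Synthetic stand-ins (assumed values): large spread across units (sd 0.6),
# small change within each unit between the two measurements (sd 0.05).
x00 <- rnorm(n, mean = 4.4, sd = 0.6)
x10 <- x00 + rnorm(n, mean = 0.02, sd = 0.05)

d <- x10 - x00
se_paired <- sd(d) / sqrt(n)                    # uses only within-unit variation
se_indep <- sqrt(var(x10) / n + var(x00) / n)   # uses between-unit variation

# Identity linking the two: var(d) = var(x10) + var(x00) - 2 * cov(x10, x00).
# The covariance term is what the independent-samples formula leaves out.
var_d_check <- var(x10) + var(x00) - 2 * cov(x10, x00)

print(c(se_paired = se_paired, se_indep = se_indep, cor = cor(x00, x10)))
```

When the two measurements are strongly positively correlated, the covariance term cancels most of the between-unit variance, so the paired standard error is much smaller.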

3. For the paired case, the 95% confidence interval is [0.01241094 0.02513773].

For the independent case, the 95% confidence interval is [-0.08292507 0.12047374].
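As a rough check, the intervals can be rebuilt from the reported point estimate and standard errors using estimate ± t* × SE with 249 degrees of freedom. Because the standard errors above are rounded, this sketch should agree with the posted intervals only to about three decimal places; the variable names are illustrative.

```r
n <- 250
t_star <- qt(0.975, df = n - 1)   # critical value, about 1.97

est <- 0.01877433                 # point estimate reported above (same for both methods)
se <- c(paired = 0.0032, independent = 0.0518)   # rounded SEs reported above

ci <- cbind(lower = est - t_star * se, upper = est + t_star * se)
print(round(ci, 4))

# Does each interval exclude zero?
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

Only the paired interval excludes zero; the independent interval is wide enough to include it.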

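4. For the paired case, the interval lies entirely above zero, so we can conclude with 95% confidence that the population increased from 2000 to 2010 (equivalently, the p-value is less than 0.05).

  For the independent case, the interval contains zero, so we cannot conclude with 95% confidence that the population increased from 2000 to 2010 (the p-value is greater than 0.05). This is a failure to detect an increase, not evidence that the population stayed the same or fell.

  The point estimates are identical; the difference lies in the standard error, which is about 16 times larger in the independent case because that method ignores the strong correlation between a county's 2000 and 2010 populations.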

