Question



Explain Sampling Terminology, Methods of Selecting Random Samples, Introduction to Estimation.

Explain Sampling Distribution, Confidence Interval for a Mean, Total, Proportion, Standard Deviation, Difference between Means, Sample size selection.

Expert Solution

1)

Definition: Random sampling is a sampling technique in which each member of the population has an equal probability of being chosen. A randomly chosen sample is meant to be an unbiased representation of the total population. If, for some reason, the sample does not represent the population, the resulting difference between the sample and the population is called sampling error.

Description:

Random sampling is one of the simplest ways of collecting data from a population. Under random sampling, each member of the population has an equal chance of being chosen as part of the sample.
For example, suppose an organisation has a total workforce of 300 employees, and a group of 30 employees is selected to take part in a survey. In this case, the population is all 300 employees of the company, and the group of 30 employees is the sample.
Each employee has an equal chance of being chosen because the 30 participants were selected at random. However, there is always a possibility that the sample does not represent the population as a whole; in that case, the random variation is termed sampling error.
An unbiased random sample is important for drawing valid conclusions. For example, when we draw a sample of 30 employees from the population of 300, a researcher might end up picking more than 25 men even though the population consists of 200 men and 100 women (so only about 20 men would be expected in a sample of 30).
Hence, results drawn from a sample can differ from the corresponding population values; this variation is known as sampling error. One disadvantage of random sampling is that it requires a complete list of the population.
For example, if a company wants to carry out a survey using random sampling, it needs a complete list of all its employees; if the employees are spread across different regions, conducting the survey becomes more difficult.
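The employee example can be sketched in Python. This is a minimal illustration, assuming a hypothetical workforce in which employees numbered 0 to 199 are men and 200 to 299 are women. It draws simple random samples of 30 without replacement, and it computes how likely it is that more than 25 of the 30 sampled employees are men using the hypergeometric distribution, which describes such counts in a sample drawn without replacement. The function names are illustrative only.

```python
import math
import random

POPULATION_SIZE = 300  # total workforce
MEN = 200              # hypothetical labelling: employees 0..199 are men, 200..299 are women
SAMPLE_SIZE = 30


def men_in_random_sample(rng: random.Random) -> int:
    """Draw one simple random sample (without replacement) and count the men in it."""
    sample = rng.sample(range(POPULATION_SIZE), SAMPLE_SIZE)
    return sum(1 for employee in sample if employee < MEN)


def prob_more_men_than(k: int) -> float:
    """Hypergeometric probability that a sample of SAMPLE_SIZE contains more than k men."""
    women = POPULATION_SIZE - MEN
    favourable = sum(
        math.comb(MEN, x) * math.comb(women, SAMPLE_SIZE - x)
        for x in range(k + 1, SAMPLE_SIZE + 1)
    )
    return favourable / math.comb(POPULATION_SIZE, SAMPLE_SIZE)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    counts = [men_in_random_sample(rng) for _ in range(5)]
    print("men in five random samples:", counts)  # the expected count is 20 per sample
    print("P(more than 25 men):", round(prob_more_men_than(25), 4))
```

The probability of drawing more than 25 men is small but not zero, which is why a single random sample can still misrepresent the population.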

2)

Sampling Distributions:

Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution, and the standard deviation of this sampling distribution is called the standard error.
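To make the definition concrete, the sketch below enumerates every sample of size n drawn without replacement from a tiny, made-up population, applies a statistic to each sample, and reports the standard deviation of the resulting values as the standard error. The population values and the helper name `sampling_distribution` are illustrative assumptions; full enumeration is only feasible for very small populations.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def sampling_distribution(population, n, statistic):
    """Apply `statistic` to every possible sample of size n drawn without replacement."""
    return [statistic(sample) for sample in combinations(population, n)]


population = [2, 4, 6, 8, 10]        # hypothetical population, N = 5

# Sampling distribution of the mean for samples of size n = 2
means = sampling_distribution(population, 2, mean)
standard_error = pstdev(means)       # standard deviation of the statistic across all samples

print("number of samples:", len(means))
print("mean of the sample means:", mean(means))
print("standard error of the mean:", standard_error)

# The same helper works for any statistic, e.g. the median of samples of size 3
medians = sampling_distribution(population, 3, median)
print("standard error of the median:", pstdev(medians))
```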

Variability of a Sampling Distribution

The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:

N: The number of observations in the population.
n: The number of observations in the sample.
The way that the random sample is chosen.

If the population size is much larger than the sample size, then the sampling distribution has roughly the same standard error whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/20 or more) of the population, the standard error will be meaningfully smaller when we sample without replacement.
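The effect of sampling without replacement can be checked by brute force. The sketch below uses a tiny hypothetical population, enumerates every ordered sample drawn with replacement (via `itertools.product`) and every sample drawn without replacement (via `itertools.combinations`), and compares the two standard errors of the mean. Here n/N = 2/5, far above 1/20, so the difference is large.

```python
from itertools import combinations, product
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]   # hypothetical population, N = 5, population variance 8
n = 2

# With replacement: all N**n ordered samples are equally likely.
se_with = pstdev([mean(s) for s in product(population, repeat=n)])

# Without replacement: all C(N, n) unordered samples are equally likely.
se_without = pstdev([mean(s) for s in combinations(population, n)])

print(f"with replacement:    {se_with:.4f}")
print(f"without replacement: {se_without:.4f}")
```

With replacement the standard error equals σ / sqrt(n) = sqrt(8) / sqrt(2) = 2; without replacement it equals 2 * sqrt[ (5 - 2) / (5 - 1) ] = sqrt(3), about 1.7321, which is the finite population correction at work.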

Sampling Distribution of the Mean

Suppose we draw all possible samples of size n from a population of size N. Suppose further that we compute a mean score for each sample. In this way, we create a sampling distribution of the mean.
We know the following about the sampling distribution of the mean. The mean of the sampling distribution (μ_x̄) is equal to the mean of the population (μ), and the standard error of the sampling distribution (σ_x̄) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n). These relationships are shown in the equations below:

μ_x̄ = μ

σ_x̄ = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]

In the standard error formula, the factor sqrt[ (N - n) / (N - 1) ] is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one; and the standard error formula can be approximated by:

σ_x̄ = σ / sqrt(n).

You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
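The formulas above translate directly into a few lines of Python. This is a minimal sketch; the function names and the numbers in the example loop (σ = 10, n = 30, several population sizes) are hypothetical and chosen only to show how the fpc shrinks toward one as N grows.

```python
import math
from typing import Optional


def finite_population_correction(n: int, N: int) -> float:
    """sqrt[(N - n) / (N - 1)]; approaches 1 when N is large relative to n."""
    return math.sqrt((N - n) / (N - 1))


def standard_error_of_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of the sample mean; applies the fpc only when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= finite_population_correction(n, N)
    return se


def approximation_is_safe(n: int, N: int) -> bool:
    """Rule of thumb from the text: the fpc can be dropped when n is at most N / 20."""
    return n <= N / 20


# Hypothetical example: sigma = 10, sample of n = 30
for N in (300, 600, 100_000):
    exact = standard_error_of_mean(10, 30, N)
    approx = standard_error_of_mean(10, 30)
    print(N, round(exact, 4), round(approx, 4), approximation_is_safe(30, N))
```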

Sampling Distribution of the Proportion

In a population of size N, suppose that the probability of the occurrence of an event (dubbed a "success") is P; and the probability of the event's non-occurrence (dubbed a "failure") is Q. From this population, suppose that we draw all possible samples of size n. And finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.
We find that the mean of the sampling distribution of the proportion (μ_p) is equal to the probability of success in the population (P). And the standard error of the sampling distribution (σ_p) is determined by the standard deviation of the population (σ), the population size, and the sample size. These relationships are shown in the equations below:

μ_p = P

σ_p = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]

σ_p = sqrt[ PQ/n ] * sqrt[ (N - n ) / (N - 1) ]

where σ = sqrt[ PQ ].

Like the formula for the standard error of the mean, the formula for the standard error of the proportion uses the finite population correction, sqrt[ (N - n ) / (N - 1) ]. When the population size is very large relative to the sample size, the fpc is approximately equal to one; and the standard error formula can be approximated by:

σ_p = sqrt[ PQ/n ]

You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
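A short sketch of the proportion formulas, with a brute-force check: the hypothetical population below has three successes (1) and two failures (0), so P = 0.6 and N = 5, and every sample of size n = 2 is drawn without replacement. The helper name `standard_error_of_proportion` is illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pstdev
from typing import Optional


def standard_error_of_proportion(P: float, n: int, N: Optional[int] = None) -> float:
    """sqrt[PQ/n], multiplied by the fpc sqrt[(N - n)/(N - 1)] when N is given."""
    Q = 1 - P                      # probability of failure
    se = math.sqrt(P * Q / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Tiny hypothetical population: 1 = success, 0 = failure (P = 0.6, N = 5)
population = [1, 1, 1, 0, 0]
proportions = [mean(sample) for sample in combinations(population, 2)]

print("enumerated standard error:", pstdev(proportions))
print("formula with fpc:         ", standard_error_of_proportion(0.6, 2, 5))
print("formula without fpc:      ", standard_error_of_proportion(0.6, 2))
```

The first two lines agree at approximately 0.3; dropping the fpc overstates the standard error here because n/N = 2/5 is far above the 1/20 rule of thumb.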

Central Limit Theorem

The central limit theorem states that the sampling distribution of the mean of independent random observations, drawn from a population with finite variance, will be normal or nearly normal if the sample size is large enough.

How large is "large enough"? The answer depends on two factors.

Requirements for accuracy. The more closely the sampling distribution needs to resemble a normal distribution, the more sample points will be required.
The shape of the underlying population. The more closely the original population resembles a normal distribution, the fewer sample points will be required.

In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped. Others recommend a sample size of at least 40. But if the original population is distinctly not normal (e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers prefer an even larger sample size.
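A quick simulation can illustrate the theorem. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration), draws 5000 samples of size n, and summarises the sample means. Their average should be close to 1, their standard deviation close to 1 / sqrt(n), and their skewness should shrink toward 0 (the value for a normal distribution) as n grows.

```python
import math
import random
from statistics import fmean, pstdev


def simulate_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an exponential population with mean 1."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


def skewness(xs):
    """Population skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = fmean(xs)
    s = pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


rng = random.Random(2024)  # fixed seed so the run is repeatable
results = {}
for n in (5, 40):
    means = simulate_means(n, 5000, rng)
    results[n] = means
    print(f"n={n:>2}: mean={fmean(means):.3f}  sd={pstdev(means):.3f} "
          f"(theory {1 / math.sqrt(n):.3f})  skewness={skewness(means):.2f}")
```

The skewness of the sample means is noticeably lower at n = 40 than at n = 5, consistent with the advice that skewed populations need larger samples before the normal approximation is comfortable.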

How to Choose Between T-Distribution and Normal Distribution

The t-distribution and the normal distribution can both be used with statistics that have a bell-shaped distribution. This suggests that we might use either the t-distribution or the normal distribution to analyze sampling distributions. Which should we choose?

Guidelines exist to help you make that choice. Some focus on the population standard deviation.

If the population standard deviation is known, use the normal distribution.
If the population standard deviation is unknown, use the t-distribution.

Other guidelines focus on sample size.

If the sample size is large, use the normal distribution. (See the discussion above in the section on the Central Limit Theorem to understand what is meant by a "large" sample.)
If the sample size is small, use the t-distribution.

In practice, researchers employ a mix of the above guidelines. On this site, we use the normal distribution when the population standard deviation is known and the sample size is large. We might use either distribution when the population standard deviation is unknown and the sample size is very large. We use the t-distribution when the sample size is small, unless the underlying distribution is not normal. The t-distribution should not be used with small samples from populations that are not approximately normal.

Test Your Understanding

In this section, we offer two examples that illustrate how sampling distributions are used to solve common statistical problems. In each of these problems, the population standard deviation is known and the sample size is large, so you can use the Normal Distribution Calculator, rather than the t-Distribution Calculator, to compute probabilities for these problems.

venereology answered 1 year ago
