Question

Create a statistical model to determine whether a particular geographical region can support a high-tech manufacturing firm. Use data from the US Census Bureau. (I do not know what type of data I need to pull, which is why none is included here. Do I use census counts, ages, etc.? I need help identifying the type of data needed. If you could just make up numbers, tell me the type of data, and go from there. You only have to use about three numbers, so I can understand what the task is asking me to do.) Use 3 independent variables. Test the hypothesis of whether the region can support a high-tech manufacturing firm. Use descriptive and inferential statistics.

Solutions

Expert Solution

Simulation studies that are carefully designed under realistic survey conditions can be used to evaluate the quality of new statistical methodology for Census Bureau data. Furthermore, new computationally intensive statistical methodology is often beneficial because it can require less strict assumptions, offer more flexibility in sampling or modeling, accommodate complex features in the data, and enable valid inference where other methods might fail. Statistical modeling is at the core of the design of realistic simulation studies and the development of computationally intensive statistical methods. Modeling also enables one to efficiently use all available information when producing estimates. Such studies can benefit from software for data processing. Statistical disclosure avoidance methods are also developed and their properties studied.

Research Problem:

  • Systematically develop an environment for simulating complex surveys that can be used as a test-bed for new data analysis methods.
  • Develop flexible model-based estimation methods for survey data.
  • Develop new methods for statistical disclosure control that simultaneously protect confidential data from disclosure while enabling valid inferences to be drawn on relevant population parameters.
  • Investigate the bootstrap for analyzing data from complex sample surveys.
  • Develop models for the analysis of measurement errors in demographic sample surveys (e.g., Current Population Survey or the Survey of Income and Program Participation).
  • Identify and develop statistical models (e.g., loglinear models, mixture models, and mixed-effects models) to characterize relationships between variables measured in censuses, sample surveys, and administrative records.
  • Investigate noise multiplication for statistical disclosure control.
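
As a rough illustration of the bootstrap item in the list above, the following is a minimal sketch of a percentile bootstrap for the mean. It assumes a simple random sample, which real complex surveys are not: a design-aware bootstrap would resample within the survey's strata and clusters. The function name `bootstrap_ci`, the sample values, and the default settings are all hypothetical.

```python
import random
import statistics

# Made-up sample values for illustration only (not Census Bureau data).
SAMPLE = [12, 15, 9, 40, 22, 18, 95, 11, 14, 30, 17, 25]


def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval and standard error for `stat`.

    Assumes a simple random sample; no survey weights or design are used.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Resample n values with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = reps[round((alpha / 2) * n_boot)]
    upper = reps[round((1 - alpha / 2) * n_boot) - 1]
    std_error = statistics.stdev(reps)
    return lower, upper, std_error


lower, upper, std_error = bootstrap_ci(SAMPLE)
print(f"mean = {statistics.mean(SAMPLE):.2f}")
print(f"95% percentile interval = ({lower:.2f}, {upper:.2f}), SE = {std_error:.2f}")
```

The percentile interval makes no normality assumption, which is one reason the bootstrap is attractive for skewed variables and small samples.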

Potential Applications:

  • Simulating data collection operations using Monte Carlo techniques can help the Census Bureau make more efficient changes.
  • Use noise multiplication or synthetic data as an alternative to top coding for statistical disclosure control in publicly released data. Both noise multiplication and synthetic data have the potential to preserve more information in the released data than top coding does.
  • Rigorous statistical disclosure control methods allow for the release of new microdata products.
  • Using an environment for simulating complex surveys, statistical properties of new methods for missing data imputation, model-based estimation, small area estimation, etc. can be evaluated.
  • Model-based estimation procedures enable efficient use of auxiliary information (for example, Economic Census information in business surveys), and can be applied in situations where variables are highly skewed and sample sizes are not sufficiently large to justify normal approximations. These methods may also be applicable to analyze data arising from a mechanism other than random sampling.
  • Variance estimates and confidence intervals in complex surveys can be obtained via the bootstrap.
  • Modeling approaches with administrative records can help enhance the information obtained from various sample surveys.
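
The contrast between top coding and noise multiplication mentioned above can be sketched as follows. This is a toy illustration, not the Census Bureau's procedure: the income values are made up, the threshold of 150 is arbitrary, and the uniform noise distribution on (1 - epsilon, 1 + epsilon) is an assumption chosen only because its mean is 1.

```python
import random
import statistics

# Made-up incomes in $1000s (illustrative only), with a long right tail.
INCOMES = [32, 45, 51, 58, 64, 73, 88, 110, 240, 610]


def top_code(values, threshold):
    """Replace every value above `threshold` with the threshold itself."""
    return [min(v, threshold) for v in values]


def noise_multiply(values, epsilon=0.1, seed=0):
    """Multiply each value by independent noise with mean 1.

    Assumption: noise is Uniform(1 - epsilon, 1 + epsilon); other
    mean-one noise distributions could be used instead.
    """
    rng = random.Random(seed)
    return [v * rng.uniform(1 - epsilon, 1 + epsilon) for v in values]


print("original mean:        ", statistics.mean(INCOMES))
print("top-coded mean:       ", statistics.mean(top_code(INCOMES, 150)))
print("noise-multiplied mean:", round(statistics.mean(noise_multiply(INCOMES)), 1))
```

Top coding truncates the upper tail, so the released mean is pulled downward. Because the multiplier has mean 1, each noise-multiplied value equals the original in expectation, so information about the tail is perturbed rather than discarded.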

Accomplishments (October 2016 - September 2017):

  • Developed new methodology that uses the principle of sufficiency to create synthetic data whose distribution is identical to the distribution of the original data under the normal linear regression model.
  • Further developed and refined several data visualization methods for comparing populations and determining if there is a statistically significant difference between pairs of population parameters; applied the methodology to American Community Survey data.
  • Developed finite sample methodology for drawing inference based on multiply imputed synthetic data under the multiple linear regression model.
  • Evaluated bootstrap confidence intervals for unknown population ranks using simulation and proposed new uncertainty measures for estimated ranks using bootstrap.
  • Applied small area estimation methodology to compute state and county level estimates based on the Tobacco Use Supplement to the Current Population Survey.
  • Developed an interactive application using R Shiny to visualize high dimensional synthetic data and associated metrics.
  • Further developed methodology for modeling response propensity using data from the National Crime Victimization Survey Field Representatives.
  • Refined, expanded, and further developed a realistic artificial population that can now be used to simulate Monthly Wholesale Trade Survey data for a period representative of over four years.

Short-Term Activities (FY 2018):

  • Continue developing finite sample methodology for drawing inference based on multiply imputed synthetic data and extend to multivariate models.
  • Evaluate properties of bootstrap-based uncertainty measures for unknown population ranks.
  • Evaluate properties of synthetic data when data generating, imputation, and analysis models differ under multivariate models.
  • Use the constructed artificial population to implement simulation studies to evaluate properties of model-based estimation procedures for the Monthly Wholesale Trade Survey and other similar surveys.
  • Develop and refine visualizations for synthetic data in higher dimensions.
  • Implement model selection and diagnostics for a small area model applied to the Tobacco Use Supplement of the Current Population Survey.
  • Develop methodology for drawing inference based on singly imputed synthetic data.

Longer-Term Activities (beyond FY 2018):

  • Develop methodology for analyzing singly and multiply imputed synthetic data under various realistic scenarios.
  • Develop noise infusion methods for statistical disclosure control.
  • Study ways of quantifying the privacy protection/data utility tradeoff in statistical disclosure control.
  • Develop and study bootstrap methods for sample survey data.
  • Create an environment for simulating complex aspects of economic/demographic surveys.
  • Develop bootstrap and/or other methodology for quantifying uncertainty in statistical rankings, and refine visualizations.
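
Returning to the question as asked, here is a minimal sketch of one way to set it up, using made-up numbers as the asker requested. Everything specific in it is an assumption: the three independent variables (population, median household income, percent of adults with a bachelor's degree) are examples of the kind of data the Census Bureau publishes, the response (number of high-tech manufacturing establishments) is a hypothetical stand-in for "can support a firm", and a permutation test of R^2 is used for the inferential step in place of the usual F test so that the example needs only the standard library.

```python
import random
import statistics

# Made-up illustrative numbers, NOT real Census data. One row per region:
# (population in thousands, median household income in $1000s,
#  percent of adults with a bachelor's degree, high-tech establishments).
REGIONS = [
    (120, 48, 18, 14), (340, 62, 31, 41), (95, 41, 15, 9), (510, 71, 38, 66),
    (220, 55, 24, 27), (760, 80, 44, 95), (180, 52, 27, 25), (430, 58, 29, 48),
    (60, 39, 12, 6), (290, 67, 35, 39),
]
NAMES = ["population", "income", "pct_bachelors", "establishments"]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 + b3*x3; returns (betas, R^2)."""
    design = [[1.0, *x] for x in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta, 1 - ss_res / ss_tot


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """H0: the response is unrelated to all three variables (all slopes zero)."""
    rng = random.Random(seed)
    _, observed = fit_ols(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between predictors and response
        _, r2 = fit_ols(xs, shuffled)
        hits += r2 >= observed
    return (hits + 1) / (n_perm + 1)


xs = [row[:3] for row in REGIONS]
ys = [row[3] for row in REGIONS]

# Descriptive statistics: mean and standard deviation of each variable.
for i, name in enumerate(NAMES):
    col = [row[i] for row in REGIONS]
    print(f"{name}: mean={statistics.mean(col):.1f}, sd={statistics.stdev(col):.1f}")

# Inferential statistics: fitted model, R^2, and a permutation p-value.
beta, r2 = fit_ols(xs, ys)
print("coefficients (intercept, population, income, pct_bachelors):",
      [round(b, 3) for b in beta])
print(f"R^2 = {r2:.3f}, permutation p-value = {permutation_p_value(xs, ys):.4f}")
```

A small p-value would be evidence against the null hypothesis that the three variables are unrelated to the response. Deciding whether a specific region "can support" a firm would additionally require a threshold on the predicted response, which the question does not specify.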
