Question

The weather company you work for is performing an analysis on predicting sandstorms accurately in Phoenix. You have been given a large dataset and are asked to determine which factors could be used to accurately predict them.

The dataset you are provided with has over 50 variables. With your vast knowledge of Excel, you determine that 4 variables can be used as predictors to accurately predict sandstorms. Your boss is not convinced and decides to ask you some questions.

  1. You mentioned that before you even begin doing this “regression analysis” of yours, you discarded about half of the variables. How did you choose which variables to discard, and more importantly, why does it matter?

  1. You asked a friend and he told you that in his company they use both “Relative Humidity” and “Number of Sunny Hours” to predict sandstorms. In your report, you mention that you use only “Relative Humidity”, because it is highly correlated with “Number of Sunny Hours”. However, each is a good predictor by itself, so why didn’t you use both? Wouldn’t that make the model better?

  1. Looking at the correlation matrix you created between the independent variables and the dependent variable, which independent variables did you discard or keep based on each type of correlation, and why?
    1. Strong positive correlation
    2. Weak positive correlation
    3. No correlation
    4. Weak negative correlation
    5. Strong negative correlation

Solutions

Expert Solution

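A. Before choosing which predictor variables to exclude, identify outliers and influential points and consider excluding them, at least temporarily, because a few extreme observations can distort how useful a variable appears.

Predictor variables that are unnecessary, or that duplicate the job of another predictor, can then be discarded.

It matters to keep only the required predictor variables in the regression analysis for the following reasons: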

1) Unnecessary predictors add noise to the estimation of the other quantities we are interested in, and they waste degrees of freedom.

2) Collinearity is caused by having too many variables trying to do the same job.

3) If the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.
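The degrees-of-freedom point in reason 1 can be illustrated with adjusted R², which penalizes each added predictor. The sketch below is only an illustration: the R² values, the sample size of 100, and the predictor counts are hypothetical numbers, not results from the sandstorm dataset.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / residual_df


# Hypothetical figures (assumptions for illustration only):
# a 4-predictor model versus a 25-predictor model whose raw R^2
# is only slightly higher.
small_model = adjusted_r2(r2=0.80, n_obs=100, n_predictors=4)
large_model = adjusted_r2(r2=0.81, n_obs=100, n_predictors=25)

print(f"4 predictors:  adjusted R^2 = {small_model:.3f}")
print(f"25 predictors: adjusted R^2 = {large_model:.3f}")
```

With these assumed numbers, the larger model has the higher raw R² but the lower adjusted R², because the extra predictors use up degrees of freedom without explaining much more variation.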

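B. Because “Relative Humidity” and “Number of Sunny Hours” are highly correlated, including both would introduce collinearity into the model: the two variables are doing the same job. The second variable adds little new information, so using both would not necessarily make the model better, and it can make the coefficient estimates unstable and harder to interpret.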
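One common way to quantify the collinearity described in B is the variance inflation factor (VIF). For exactly two predictors it reduces to 1 / (1 - r²), where r is the correlation between them. The sketch below uses made-up humidity and sunny-hours readings (assumptions, not values from the dataset), and the "above 10" rule of thumb mentioned in the comment is a convention rather than something stated in the answer.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily readings (illustrative values only)
humidity = [10, 15, 20, 25, 30, 35, 40, 45]                 # percent
sunny_hours = [12.0, 11.4, 11.1, 10.2, 9.9, 9.1, 8.8, 8.0]  # hours

r = pearson(humidity, sunny_hours)

# With only two predictors, each one's VIF is 1 / (1 - r^2).
# A VIF above roughly 10 is a common warning sign (rule of thumb).
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation between predictors: {r:.3f}")
print(f"variance inflation factor:      {vif:.1f}")
```

A large VIF means the variance of each coefficient estimate is inflated by that factor relative to a model where the predictors are uncorrelated, which is why keeping only one of the two variables is a reasonable choice.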

C. The variables to keep or discard depend on the output you are interested in. The correlation values should not be filtered by sign or strength: if there is a correlation between an independent variable and the dependent variable, that variable is included in the analysis. If an independent variable has no correlation with the dependent variable, that variable is excluded from the analysis. Hence, variables with the following types of correlation are included in the analysis:

  1. Strong positive correlation
  2. Weak positive correlation
  3. Weak negative correlation
  4. Strong negative correlation

Variables with no correlation are excluded from the analysis.
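The screening rule in C can be sketched as a small script that computes each candidate's correlation with the dependent variable and keeps everything except the "no correlation" group. The variable names, the data, and the cutoffs (|r| below 0.1 treated as no correlation, |r| of 0.7 or more treated as strong) are all assumptions chosen for illustration; the answer itself does not specify thresholds.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify(r, none_cutoff=0.1, strong_cutoff=0.7):
    """Label a correlation; both cutoffs are assumed conventions."""
    if abs(r) < none_cutoff:
        return "no correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical data: dependent variable and four candidate predictors
sandstorm_index = [1, 2, 3, 4, 5, 6, 7, 8]
candidates = {
    "var_a": [2, 4, 5, 8, 10, 12, 13, 16],
    "var_b": [3, 1, 4, 1, 5, 9, 2, 6],
    "var_c": [5, 3, 3, 5, 5, 3, 3, 5],
    "var_d": [16, 13, 12, 10, 8, 5, 4, 2],
}

labels = {name: classify(pearson(values, sandstorm_index))
          for name, values in candidates.items()}
kept = [name for name, label in labels.items() if label != "no correlation"]

for name, label in labels.items():
    print(f"{name}: {label}")
print("kept:", kept)
```

Only the variable labeled "no correlation" is dropped here; variables with weak or strong correlation of either sign stay in, matching the list above.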

