When included in a classification or prediction model, highly correlated variables, or variables that are unrelated to the outcome of interest, can lead to overfitting, and reliability can suffer. T/F
It is best to normalize when the units of measurement are common for the variables and when their scale reflects their importance. T/F
This is a useful procedure for reducing the number of predictors in the model by analyzing the input variables. It is intended to be used with quantitative variables. A. Correlation analysis, B. Correspondence analysis, C. Principal components analysis, D. A, B, and C
This provides a sense of how dispersed the data are relative to the mean. A. Standard deviation, B. Variance, C. Mean, D. Average
1) When included in a classification or prediction model, highly correlated variables, or variables that are unrelated to the outcome of interest, can lead to overfitting, and reliability can suffer. T/F
True
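Because highly correlated variables carry largely redundant information, and variables unrelated to the outcome add only noise.
Either way the model gains extra parameters without gaining predictive signal, so it can fit noise in the training data. This leads to overfitting, and reliability on new data can suffer.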
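One way to spot a redundant predictor is to check its correlation with the other predictors. The following is a minimal sketch, not a full variable-selection method: the data are made up, `pearson_r` is a helper written for this illustration, and the 0.9 cutoff is an arbitrary assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors: the same height recorded in two different units.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]

r = pearson_r(height_cm, height_in)

# Assumed cutoff: treat |r| above 0.9 as "highly correlated".
is_redundant = abs(r) > 0.9
print(f"r = {r:.4f}, redundant = {is_redundant}")
```

Here the second variable is just a rescaled copy of the first, so keeping both adds a parameter to the model without adding any information.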
2) It is best to normalize when the units of measurement are common for the variables and when their scale reflects their importance. T/F
False.
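If the variables share common units of measurement and their scale reflects their importance, then we do not need to normalize them; normalizing would discard the information that the scale carries.
Variables measured in common units can be compared directly. Normalization is most useful when the units differ or when the scale does not reflect importance.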
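To see what normalization does when the units are not common, here is a small sketch using z-scores (subtract the mean, divide by the standard deviation). The income and age values are invented for illustration, and `z_scores` is a helper defined here, not a library function.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical variables on very different scales (dollars vs. years).
income = [30000, 45000, 60000, 75000, 90000]
age = [25, 35, 45, 55, 65]

z_income = z_scores(income)
z_age = z_scores(age)

# After standardizing, both variables are unit-free and on a comparable scale.
print([round(z, 3) for z in z_income])
print([round(z, 3) for z in z_age])
```

Before normalizing, income would dominate any distance-based calculation simply because its numbers are larger. If both variables were already in the same units and the larger scale really did mean greater importance, this step would erase that information.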
3) This is a useful procedure for reducing the number of predictors in the model by analyzing the input variables. It is intended to be used with quantitative variables. A. Correlation analysis, B. Correspondence analysis, C. Principal components analysis, D. A, B, and C
The correct option is C) Principal components analysis.
Because: Correlation analysis is used to investigate the strength of the relationship between quantitative variables.
Correspondence analysis is used to investigate the relationship between two qualitative (categorical) variables.
Principal components analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables (the principal components). Keeping only the first few components, which capture most of the variance, reduces the number of predictors. So the correct option is C.
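The idea can be sketched for the simplest case of two correlated variables, where the eigenvalues of the 2x2 covariance matrix have a closed form. This is only an illustration under stated assumptions: the data are made up, `pca_2d` is a helper written for this answer, and real work would normally use a library routine for more than two variables.

```python
import math

def pca_2d(x, y):
    """PCA for two variables via the eigen-decomposition of their covariance matrix."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [v - mean_x for v in x]  # centered x
    yc = [v - mean_y for v in y]  # centered y

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(v * v for v in xc) / (n - 1)
    c = sum(v * v for v in yc) / (n - 1)
    b = sum(p * q for p, q in zip(xc, yc)) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix (lam1 >= lam2).
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + half_gap, mid - half_gap

    # Unit eigenvector for lam1: the direction of the first principal component.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Project the centered data onto the first component.
    scores = [p * vx + q * vy for p, q in zip(xc, yc)]
    return lam1, lam2, (vx, vy), scores

# Made-up, strongly correlated measurements.
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

lam1, lam2, direction, scores = pca_2d(x, y)
explained = lam1 / (lam1 + lam2)
print(f"share of variance on the first component: {explained:.3f}")
```

Because the first component carries most of the total variance here, the single `scores` column could stand in for the two original predictors, which is the sense in which PCA reduces the number of predictors.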
4) This provides a sense of how dispersed the data are relative to the mean. A. Standard deviation, B. Variance, C. Mean, D. Average
The correct option is A) Standard deviation.
Because it measures the spread of the data around the mean, and it is expressed in the same units as the mean. The variance also measures spread, but in squared units, so it is harder to interpret relative to the mean.
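A quick numerical check of the difference in units, using Python's standard `statistics` module. The measurements are made up and assumed to be in centimetres.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical measurements in centimetres.
data_cm = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data_cm)           # in cm
spread_var = pvariance(data_cm)  # population variance, in cm squared
spread_sd = pstdev(data_cm)      # population standard deviation, in cm

print(f"mean = {center} cm")
print(f"variance = {spread_var} cm^2")
print(f"standard deviation = {spread_sd} cm")
```

The standard deviation is the square root of the variance, so it can be read directly against the mean: a typical value in this data set lies about 2 cm from the mean of 5 cm.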