1. This is especially valuable when we have subsets of measurements that are highly correlated. In that case it provides a few variables, each a weighted linear combination of the original variables, that retain most of the explanatory power of the full original set. A. Correlation analysis B. Affinity analysis C. Variance analysis D. Principal Component Analysis
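A minimal sketch of principal component analysis, the technique this question describes, using scikit-learn on made-up data with two highly correlated measurements (the variables and values are purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: three measurements, two of which are highly correlated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)                   # an independent measurement
X = np.column_stack([x1, x2, x3])

# PCA produces weighted linear combinations (principal components)
# ordered by the share of total variance each one explains.
pca = PCA()
scores = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # the first two components carry almost all the variance
```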
2. These are very helpful for data reduction. The information they convey can assist in combining categories, in choosing the variables to remove, and in assessing the level of information overlap between variables. A. Dimension Reduction B. Confusion Matrix C. Data Summaries D. Correlation Tables
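Of the options, a correlation table is the one that directly measures information overlap between numeric variables. A quick pandas sketch (column names and data are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"sqft": rng.normal(1500, 300, 100)})
df["rooms"] = df["sqft"] / 400 + rng.normal(0, 0.3, 100)   # overlaps heavily with sqft
df["age"] = rng.normal(30, 10, 100)                        # largely unrelated

# Pairwise correlations: a value near +/-1 flags candidates for removal or combination.
print(df.corr().round(2))
```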
3. The classical statistical approach has focused on this objective: A. Fitting the best model to the data in an attempt to learn about the underlying relationship in the population B. Predicting new observations C. Predicting the outcomes of new cases D. All of the above
4. This provides an estimate of the true classification and misclassification rates. If we have a large enough dataset and neither class is rare, it can provide reliable estimates. A. Classification matrix B. Lift charts C. Naive rule D. None of the above
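A small sketch of a classification (confusion) matrix and the rates it yields, using scikit-learn on made-up actual and predicted labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative actual classes and model predictions for a two-class problem.
actual    = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])

cm = confusion_matrix(actual, predicted)   # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()            # estimated correct-classification rate
error = (fp + fn) / cm.sum()               # estimated misclassification rate
print(cm)
print(f"accuracy={accuracy:.2f}, error rate={error:.2f}")
```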
5. The default cutoff value in two-class classifiers is A. 0.5 B. 1.0 C. 1.5 D. 2.0
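A sketch of how the cutoff turns predicted probabilities into class labels; the probabilities below are invented for illustration:

```python
import numpy as np

# Illustrative predicted probabilities of belonging to the class of interest.
probs = np.array([0.91, 0.48, 0.62, 0.30, 0.75, 0.05])

cutoff = 0.5                        # the usual default for a two-class classifier
labels = (probs >= cutoff).astype(int)
print(labels)                       # [1 0 1 0 1 0]

# Lowering the cutoff classifies more cases as the class of interest,
# trading more false positives for fewer false negatives.
print((probs >= 0.25).astype(int))  # [1 1 1 1 1 0]
```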
6. When the benefits and costs of correct and incorrect classification are known or can be estimated, this chart is still a useful presentation and decision tool. A. Lift Chart B. Bar Chart C. Matrix Chart D. Scatter Chart
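A minimal sketch of the numbers behind a cumulative lift chart, computed in pandas from invented validation scores and outcomes:

```python
import pandas as pd

# Illustrative scored validation records: propensity and actual outcome (1 = class of interest).
df = pd.DataFrame({
    "propensity": [0.95, 0.90, 0.80, 0.72, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10],
    "actual":     [1,    1,    0,    1,    1,    0,    0,    1,    0,    0],
})

df = df.sort_values("propensity", ascending=False).reset_index(drop=True)
df["cum_actual"] = df["actual"].cumsum()                       # cumulative gains (y-axis of the chart)
base_rate = df["actual"].mean()
df["lift"] = df["cum_actual"] / (base_rate * (df.index + 1))   # cumulative lift vs. random selection
print(df[["propensity", "actual", "cum_actual", "lift"]])
```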
7. In all discussions of this, we assume the common situation in which there are two classes, one of much greater interest than the other. Data with more than two classes do not lend themselves to this procedure. A. Overfitting B. Outliers C. Oversampling D. Jittering
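A sketch of oversampling the rare class by sampling it with replacement until the two classes are balanced, in pandas on an invented imbalanced training set:

```python
import pandas as pd

# Illustrative imbalanced training data: class 1 is the rare class of interest.
df = pd.DataFrame({
    "x":     range(20),
    "label": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

rare = df[df["label"] == 1]
common = df[df["label"] == 0]

# Oversample the rare class with replacement until the two classes are balanced.
rare_up = rare.sample(n=len(common), replace=True, random_state=0)
balanced = pd.concat([common, rare_up]).sample(frac=1, random_state=0)  # shuffle
print(balanced["label"].value_counts())
```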
8. This measure gives a percentage score of how much predictions deviate (on average) from the actual values. A. RMSE B. SSE C. MAE D. MAPE
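A sketch of the mean absolute percentage error computed directly from its definition, on made-up actual and predicted values:

```python
import numpy as np

actual    = np.array([100.0, 250.0, 80.0, 120.0])
predicted = np.array([110.0, 230.0, 95.0, 118.0])

# MAPE: the average of |error| / |actual|, expressed as a percentage.
mape = 100 * np.mean(np.abs((actual - predicted) / actual))
print(f"MAPE = {mape:.1f}%")
```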
9. This is the presence of two or more predictors sharing the same linear relationship with the outcome variable. Estimates of regression coefficients are likely to be unstable because of this. A. Multicollinearity B. Parsimony C. Correlated variables D. None of the above
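A small demonstration of the coefficient instability the question mentions: two nearly identical predictors are fit on two bootstrap samples, and the individual coefficient estimates vary wildly even though their joint contribution is stable. The data are synthetic and the setup is illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # nearly identical to x1: multicollinearity
y = 3 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])

# Refit on two bootstrap samples: the individual coefficients are highly unstable,
# even though the fitted values (x1 and x2 taken jointly) stay about the same.
for seed in (0, 1):
    idx = np.random.default_rng(seed).integers(0, n, n)
    model = LinearRegression().fit(X[idx], y[idx])
    print(model.coef_.round(2))
```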
10. Three popular iterative search algorithms are forward selection, backward elimination, and ___. A. Multiple Linear Regression B. Simple Linear Regression C. Stepwise Regression D. None of the above
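A sketch of forward selection, one of the search algorithms named in the question, using scikit-learn's SequentialFeatureSelector (which also supports backward elimination via direction="backward"); the dataset is synthetic and the subset size is an arbitrary choice for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Illustrative regression data: 8 candidate predictors, only a few of them informative.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0)

# Forward selection: start with no predictors and greedily add the one that
# improves cross-validated fit the most, stopping at the requested subset size.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
print(selector.get_support())   # mask of the predictors chosen by forward selection
```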