1. These learning algorithms are used in classification and prediction and require data in which the value of the outcome of interest is known. Simple linear regression analysis is an example of this. A. Correlation Analysis B. Supervised Learning C. Unsupervised Learning D. Confusion Matrix
2. This partition is used to assess the performance of each model so that you can compare models and pick the best one. A. Training Partition B. Test Partition C. Validation Partition D. None of the above
3. This error measure is often the most useful: it takes the square root of the average squared error and gives an idea of the typical error on the same scale as the original data. A. RMS Error B. Average Error C. Standard Deviation Error D. None of the above
4. These charts are useful for comparing a single statistic (for example, average, count, or percentage) across groups. A. Line Charts B. Bar Charts C. Scatter Plot D. Lift Charts
5. In a data mining context, these are especially useful for two purposes: for visualizing correlation tables and for visualizing missing values in the data. A. Box Plots B. Histograms C. Heatmaps D. None of the above
6. The purpose of this is to remove some of the observations from the plot in order to focus attention on certain data while eliminating noise created by other data. A. Filtering B. Panning C. Aggregation D. All of the above
7. In this plot, a vertical axis is drawn for each variable, and each observation is represented by a line that connects its values on the different axes, thereby creating a multivariate profile. A. Box Plot B. Scatter Plot C. Parallel Coordinates Plot D. None of the above
8. Basic charts and distribution plots in their basic form can display more than two variables and can therefore reveal high-dimensional information. A. True B. False
9. Distribution plots are useful in supervised learning for determining potential data mining methods and variable transformations. A. True B. False
10. These are interactive tables that can combine information from multiple variables and compute a range of summary statistics. A. Database tables B. Confusion Matrix C. Scatterplot Matrix D. Excel Pivot Tables
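
For reference, the calculation described in words in question 3 (square the errors, average them, take the square root) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `rms_error` and the sample numbers are made up for the example and do not come from the question set.

```python
import math


def rms_error(actual, predicted):
    """Square root of the average squared error (hypothetical helper)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference between each actual and predicted value.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Average the squared errors, then take the square root so the
    # result is on the same scale as the original data.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up illustration data, not taken from the questions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
result = rms_error(actual, predicted)
print(result)
```

Because the squared errors are averaged before the root is taken, the result is expressed in the same units as the outcome variable, which is what makes it readable as a "typical" error.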