1. You're processing data to be uploaded into a database. In what stage of data preprocessing would you deal with fields where there is no data, or data is missing? A. Data Consolidation B. Data Cleaning C. Data Transformation D. Data Reduction
2. Your company, Outdoor Excursions, just acquired another local tour company, Excursions Inc., and you've been tasked with merging Excursions Inc.'s database with yours. During data preprocessing, you encounter inconsistencies in the "marital status" column. Some values indicate "married", "single", or "widowed", others are represented by their first letters "m", "s", and "w", and others are left blank. What are some ways you could deal with that data? Select all that apply. Delete the column; Recode the existing values; Do nothing; Fill in missing values; All of the above
3. Explain the term "model fitting". Why is it important in data mining and machine learning?
1.) Choice (B) is correct, i.e., data cleaning. In data cleaning, the raw data is scanned for incorrect, irrelevant, null, or missing values, and these are handled accordingly.
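As a rough illustration of the scanning step in data cleaning, here is a minimal Python sketch. The records, the field names, and the helpers `is_missing`, `missing_fields`, and `drop_incomplete` are all made up for this example, and dropping incomplete rows is only one of several possible ways to handle them.

```python
# Hypothetical raw records; None and "" both stand for "no data".
records = [
    {"name": "Ana", "age": 34, "city": "Denver"},
    {"name": "Ben", "age": None, "city": ""},
    {"name": "", "age": 41, "city": "Boise"},
]


def is_missing(value):
    """Treat None and blank or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_fields(record):
    """Return the names of the fields in one record that have no data."""
    return [field for field, value in record.items() if is_missing(value)]


def drop_incomplete(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if not missing_fields(row)]


# One list of missing field names per record, in the original order.
report = [missing_fields(row) for row in records]
# Rows that survive the (assumed) drop-incomplete policy.
clean = drop_incomplete(records)
```

Whether to drop, correct, or fill such rows depends on the data set; the sketch only shows how the missing fields might be found.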
2.) Recode the existing values. Where "m", "s", or "w" is written, write code to replace those letters with "married", "single", or "widowed" so that every value matches the attribute's format. A consistent format makes the attribute easier to handle in later processing.
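A minimal Python sketch of recoding the "marital status" values from the question might look like this. The function name `recode_marital_status`, the sample list `raw`, and the "unknown" placeholder for blank entries are assumptions made for illustration, not part of the original question.

```python
# Map the one-letter codes from the question to the full labels.
CODES = {"m": "married", "s": "single", "w": "widowed"}
VALID = set(CODES.values())


def recode_marital_status(value, fill="unknown"):
    """Return a full label for one raw value.

    Blank or unrecognised values become `fill`, a placeholder
    chosen only for this sketch.
    """
    if value is None:
        return fill
    text = value.strip().lower()
    if text == "":
        return fill
    if text in VALID:
        return text  # already written out in full
    return CODES.get(text, fill)


# Hypothetical column values mixing full labels, codes, and blanks.
raw = ["married", "m", "S", "", "widowed", "w", None, "Single"]
cleaned = [recode_marital_status(value) for value in raw]
```

Lower-casing before the lookup is a design choice that lets "S" and "Single" be treated the same as "s" and "single".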
3.) Model fitting is a measure of how well a machine learning model generalises to data that is similar to the data used in training. A model is well fitted when its output is accurate for unseen data, i.e., data that was not used to train the model. Fitting means adjusting the model's parameters to improve the accuracy of the output for unseen input. Together, this helps the mined data to be used in an optimized way, which in turn improves machine learning.
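To make the idea concrete, here is a small Python sketch that fits a straight line by ordinary least squares and then checks it on data that was held out of training. The data points and the helper names `fit_line` and `mse` are invented for this example; a straight line is just one simple model chosen for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical data: the parameters are adjusted on the training set only.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]
# Unseen data that took no part in fitting.
test_x = [6.0, 7.0]
test_y = [13.1, 14.9]

slope, intercept = fit_line(train_x, train_y)
train_error = mse(train_x, train_y, slope, intercept)
test_error = mse(test_x, test_y, slope, intercept)
```

If `test_error` stays close to `train_error`, the model generalises well to unseen input; a much larger `test_error` would suggest a poor fit.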