Question

In: Computer Science

You would like to build a classifier for an Autism early detection application. Each data point in the dataset represents a patient. Each patient is described by a set of attributes such as age, sex, ethnicity, communication and development figures, etc. You know from domain knowledge that autism is more prevalent in males than females.

Suppose the dataset you are using to build the classifier is noisy and contains redundant attributes and missing values. If you are considering a decision tree classifier and a k-nearest neighbor classifier, explain how each of them can handle the three problems below:

1. Noise

2. Missing Values

3. Redundant Attributes

Solutions

Expert Solution

There are various methods to handle these problems; some of them are as follows:

1. Noise: You can use a feature correlation heatmap to find the correlation between each feature and the target variable. You can also select groups of features, evaluate each group with cross-validation, and find which groups or features give lower accuracy. Analyzing the attributes and data points that perform poorly helps you decide which data to remove.

2. Missing Values: In the health care domain, we have to use real-world patient values, so avoid imputing simulated or biased data into the dataset. If a missing value is in an attribute that is not part of your selected features, you don't have to do anything. If it is part of them, you can do some basic things: remove the affected records from the dataset, impute the mean or median value of that column, or impute the most frequent value in that column (for example, for age or gender).

3. Redundant Attributes: You can use a feature correlation heatmap to find and remove redundant attributes from the dataset. Alternatively, since the question considers a decision tree, you can use a feature importance method to find the best features for the tree classifier: the tree calculates how much each feature contributes to its splits automatically as it is built.
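The noise-screening idea in item 1 (Noise) can be sketched in plain Python, with no third-party libraries. The sketch computes each feature's Pearson correlation with the target and the cross-validated accuracy of feature groups. It uses a small k-nearest neighbor classifier (k = 3, 5 interleaved folds) as the model being evaluated. These choices, the function names, and the toy data are assumptions for illustration, not part of the original answer. In practice a library such as scikit-learn would normally do this work.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest (Euclidean) to the query."""
    # Sorting (distance, label) tuples breaks distance ties by label order.
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, features, k=3, folds=5):
    """Cross-validated k-NN accuracy using only the given feature (column) indices."""
    rows = [[r[i] for i in features] for r in X]
    n = len(rows)
    correct = 0
    for f in range(folds):
        # Interleaved folds: every `folds`-th row is held out in turn,
        # so each row is tested exactly once.
        test_idx = set(range(f, n, folds))
        train_X = [rows[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct += sum(knn_predict(train_X, train_y, rows[i], k) == y[i] for i in test_idx)
    return correct / n


# Hypothetical toy data: column 0 tracks the label, column 1 is unrelated to it.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
for j in range(2):
    print("feature", j, "correlation with target:", round(pearson([r[j] for r in X], y), 2))
for group in ([0], [1], [0, 1]):
    print("features", group, "CV accuracy:", cv_accuracy(X, y, group))
```

A feature group whose cross-validated accuracy is no better than chance (as column 1 is here) is a candidate for closer inspection or removal.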
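The options in item 2 (Missing Values) can be sketched in plain Python: either drop incomplete rows, or fill a selected column with its mean, median, or most frequent value. Patients are represented as dictionaries. The column names (`age`, `sex`, `notes`) and the records are hypothetical. Only the columns passed in are examined, which mirrors the point that gaps in unselected attributes can be left alone.

```python
from collections import Counter
from statistics import mean, median


def drop_incomplete(rows, columns):
    """Keep only rows with a value (not None) in every selected column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def impute(rows, column, strategy):
    """Return copies of rows with None in `column` replaced by the column's mean, median, or mode."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Build new dicts so the original records are left unchanged.
    return [{**r, column: fill} if r.get(column) is None else dict(r) for r in rows]


# Hypothetical patient records; None marks a missing value.
patients = [
    {"age": 4, "sex": "M", "notes": None},
    {"age": None, "sex": "M", "notes": "follow-up"},
    {"age": 6, "sex": None, "notes": None},
    {"age": 5, "sex": "F", "notes": None},
]
selected = ["age", "sex"]  # "notes" is not a selected feature, so its gaps are ignored

print(drop_incomplete(patients, selected))
filled = impute(impute(patients, "age", "median"), "sex", "mode")
print(filled)
```

The median suits a numeric attribute such as age. The mode is the natural choice for a categorical attribute such as sex.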
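Item 3 (Redundant Attributes) suggests using feature correlations to spot redundant columns. The sketch below assumes a greedy rule that is not from the original answer. It walks through the features in order and drops any feature whose absolute Pearson correlation with an already-kept feature reaches a threshold (0.9 here, an arbitrary choice). The feature names and values are hypothetical, and `age_months` is deliberately a copy of `age_years` in different units.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(X, names, threshold=0.9):
    """Greedily keep features that are not strongly correlated with any feature kept so far."""
    cols = [[row[j] for row in X] for j in range(len(names))]
    kept, dropped = [], []
    for j, name in enumerate(names):
        if any(abs(pearson(cols[j], cols[k])) >= threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(j)
    return [names[k] for k in kept], dropped


# Hypothetical data: age_months is age_years * 12, so it adds no new information.
names = ["age_years", "age_months", "score"]
X = [[2, 24, 7], [3, 36, 5], [4, 48, 9], [5, 60, 4]]
print(drop_redundant(X, names))
```

The answer also mentions a tree-based route. With scikit-learn, a fitted `DecisionTreeClassifier` exposes a `feature_importances_` attribute, and features with near-zero importance are candidates for removal.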

