Question

I am working with Cardiotocography data for a classification task. The data include 8 continuous and 13 discrete variables. How do I clean the data in R, handling missing values and noise?

Solutions

Expert Solution

Here is a detailed answer to the question; if anything is unclear, please let me know. The approaches below cover how to handle both missing values and noise in the data:

1. Deleting the observations

If your dataset has a large number of records and all the categories to be predicted are sufficiently represented in the training data, you can delete the observations (rows) that contain missing values, or exclude them during model building, for example by setting na.action=na.omit. After deleting the observations, make sure that you:

1. Still have sufficient data points, so the model doesn’t lose power.
2. Have not introduced bias (meaning disproportionate representation or non-representation of classes).

data("BostonHousing", package = "mlbench")  # example dataset from the mlbench package
lm(medv ~ ptratio + rad, data=BostonHousing, na.action=na.omit)
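
Before dropping rows, it can help to check how much data and which classes would be lost. The following is a minimal base-R sketch, assuming a toy data frame `df` with a class column named `label`; both are hypothetical stand-ins for the Cardiotocography data, not part of the original answer.

```r
# Toy data standing in for the real dataset (hypothetical values).
df <- data.frame(
  x1    = c(1.2, NA, 3.1, 4.0, 5.5, NA),
  x2    = c(10, 12, NA, 15, 11, 13),
  label = factor(c("a", "a", "b", "b", "a", "b"))
)

# Count missing values in each column.
na_per_column <- colSums(is.na(df))

# Keep only the rows that have no missing values.
df_complete <- na.omit(df)

# Compare class counts before and after deletion to spot introduced bias.
class_before <- table(df$label)
class_after  <- table(df_complete$label)

print(na_per_column)
print(nrow(df_complete))
print(rbind(before = class_before, after = class_after))
```

If the class counts shift strongly after deletion, one of the imputation approaches below may be a safer choice than removing rows.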

2. Deleting the variable

If one variable has far more missing values than the rest of the variables in the dataset, and removing that one variable would save many observations, then consider removing that variable, unless it is a really important predictor that makes a lot of business sense.
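
One way to make this decision concrete is to compare how many complete rows survive with and without the sparse variable. The sketch below uses a toy data frame and an assumed cut-off of 50% missing; the threshold is an illustrative choice, not a rule from the answer above.

```r
# Toy data: column b is mostly missing (hypothetical values).
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(NA, NA, NA, 4, NA),
  c = c(5, NA, 7, 8, 9)
)

# Share of missing values in each column.
na_share <- colMeans(is.na(df))

# Drop columns whose missing share exceeds the assumed threshold.
threshold  <- 0.5
df_reduced <- df[, na_share <= threshold, drop = FALSE]

# Complete rows available before and after dropping the sparse column.
rows_before <- sum(complete.cases(df))
rows_after  <- sum(complete.cases(df_reduced))

print(na_share)
print(c(before = rows_before, after = rows_after))
```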

3. Imputation with mean / median / mode

Replacing the missing values with the mean / median / mode is a crude way of treating missing values. Depending on the context, like if the variation is low or if the variable has low leverage over the response, such a rough approximation is acceptable and could possibly give satisfactory results.
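
A minimal base-R sketch of this kind of imputation follows. The data frame, its column names, and the helper `stat_mode` are hypothetical and only for illustration (base R has no built-in function for the statistical mode).

```r
# Hypothetical helper: most frequent non-missing value of a vector.
stat_mode <- function(x) {
  counts <- table(x)                 # table() ignores NA by default
  names(counts)[which.max(counts)]
}

# Toy data with one continuous and one discrete variable (hypothetical values).
df <- data.frame(
  heart_rate = c(120, 130, NA, 140, 150),
  grade      = factor(c("low", "high", "low", NA, "low"))
)

# Continuous variable: replace NA with the median (mean() works the same way).
df$heart_rate[is.na(df$heart_rate)] <- median(df$heart_rate, na.rm = TRUE)

# Discrete variable: replace NA with the most frequent level.
df$grade[is.na(df$grade)] <- stat_mode(df$grade)

print(df)
```

The median is usually the safer default for skewed continuous variables, while the mode is the natural choice for the discrete ones.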

4. Prediction

Prediction is the most advanced way to impute missing values and includes different approaches such as kNN imputation, rpart, and mice.

4.1 kNN Imputation

DMwR::knnImputation uses the k-Nearest Neighbours approach to impute missing values. In simpler terms, for every observation to be imputed, it identifies the ‘k’ closest observations based on Euclidean distance and computes the weighted average (weighted by distance) of these ‘k’ observations.

Code for this:

library(DMwR)
knnOutput <- knnImputation(BostonHousing[, !names(BostonHousing) %in% "medv"])  # perform knn imputation.

4.2 rpart

The limitation of DMwR::knnImputation is that it may not be appropriate when the missing value comes from a factor variable. Both rpart and mice have the flexibility to handle that scenario. The advantage of rpart is that you need only one of the predictor variables to be non-NA.
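
As a rough illustration of imputing a factor with rpart, the sketch below uses the built-in iris data as a stand-in, removes some values of the factor Species on purpose, and predicts them back with a classification tree. The choice of dataset, the number of removed values, and the seed are assumptions made only for this example.

```r
library(rpart)

# Use iris as stand-in data and knock out some values of the factor Species.
set.seed(1)
dat <- iris
missing_rows <- sample(nrow(dat), 15)
dat$Species[missing_rows] <- NA

# Fit a classification tree on the rows where Species is known.
fit <- rpart(Species ~ ., data = dat[!is.na(dat$Species), ], method = "class")

# Predict the missing factor levels and write them back.
dat$Species[missing_rows] <- predict(fit, dat[missing_rows, ], type = "class")

print(table(dat$Species, useNA = "ifany"))
```

For a continuous variable the same pattern applies with method = "anova" and a plain predict() call, which returns numeric predictions.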

4.3 mice

mice, short for Multivariate Imputation by Chained Equations, is an R package that provides advanced features for missing value treatment. It implements imputation in a slightly uncommon two-step way: mice() builds the model and complete() generates the completed data.
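
The two steps can be sketched as follows, assuming the mice package is installed. The example uses the built-in airquality data, which has missing values in Ozone and Solar.R; the settings (m = 5, predictive mean matching via method = "pmm", and the seed) are illustrative choices rather than recommendations from the answer above.

```r
library(mice)

# Step 1: build the imputation model on data that contains missing values.
imp <- mice(airquality, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Step 2: extract the first of the m completed datasets.
completed <- complete(imp, 1)

# Check for remaining missing values before and after imputation.
print(anyNA(airquality))
print(anyNA(completed))
```

For a data frame that mixes continuous and factor columns, leaving out the method argument lets mice pick a default method per column type.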

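To remove noise:

1. Use BilateralFilter. Note that the noise-filtering snippets in this section are Wolfram Language (Mathematica) code, not R: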

ListLinePlot[{data, 
  BilateralFilter[data, 2, .5, MaxIterations -> 25]}, 
 PlotStyle -> {Thin, Red}]

2. MeanShiftFilter can produce similar results:

ListLinePlot[{data, 
  MeanShiftFilter[data, 5, .5, MaxIterations -> 10]}, 
 PlotStyle -> {Thin, Red}]

3. Another way is to apply TrimmedMean over a sliding window. The snippet below is Wolfram Language, not R:

ListLinePlot[{data, ArrayFilter[TrimmedMean, data, 20]}, 
 PlotStyle -> {Thin, Red}]
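
Since the question asks about R, here is a rough base-R sketch of the sliding-window trimmed mean idea. The helper `rolling_trimmed_mean` and its parameters `radius` and `trim` are hypothetical names introduced for this example, and the result is not guaranteed to match the ArrayFilter output exactly, because edge handling and trimming defaults may differ.

```r
# Hypothetical helper: trimmed mean over a centred sliding window.
# radius = number of neighbours on each side; trim = share cut from each tail.
rolling_trimmed_mean <- function(x, radius = 2, trim = 0.25) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    window <- x[max(1, i - radius):min(n, i + radius)]  # window shrinks at the edges
    mean(window, trim = trim, na.rm = TRUE)
  }, numeric(1))
}

# A flat signal with one spike: trimming discards the spike in every window.
signal   <- c(1, 1, 1, 1, 50, 1, 1, 1, 1)
smoothed <- rolling_trimmed_mean(signal, radius = 2, trim = 0.25)

print(smoothed)
```

A running median, available in base R as stats::runmed(), is a closely related alternative for spike-like noise.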
