Question


A patient dataset from a hospital has been collected to identify
whether a patient has heart disease or not. The dataset contains
noisy data and some outliers. Choose suitable data preprocessing
tasks for this dataset, and explain how the outliers or noisy data
can be removed from it.

Expert Solution

Outliers can be detected using the following methods:

Extreme Value Analysis
Z-score method
K Means clustering-based approach
Visualizing the data

Extreme Value Analysis:

This is the most basic form of outlier detection. The key to this method is to understand the underlying distribution of the variable and find the values at the extreme ends.

In the case of a Gaussian distribution, values that lie outside the mean plus or minus 3 times the standard deviation of the variable are treated as outliers.

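If the variable is not normally distributed (not a Gaussian distribution), a good approach is to calculate the quartiles and then the inter-quartile range (IQR = Q3 - Q1). A common rule flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR as outliers.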
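Both rules can be sketched in a few lines of Python using only the standard library. The helper names and the heart-rate readings are made up for illustration, and the 3 and 1.5 multipliers are the conventional defaults rather than fixed requirements.

```python
import statistics

def gaussian_bounds(values, k=3.0):
    """Return (lower, upper) = mean -/+ k population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean - k * std, mean + k * std

def iqr_bounds(values, factor=1.5):
    """Return (lower, upper) fences based on the inter-quartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

def outside(values, bounds):
    """Return the values that fall outside the (lower, upper) bounds."""
    lower, upper = bounds
    return [v for v in values if v < lower or v > upper]

# Hypothetical resting heart-rate readings (beats per minute).
heart_rate = [62, 65, 68, 70, 71, 72, 74, 75, 78, 80, 190]

print(outside(heart_rate, iqr_bounds(heart_rate)))       # [190]
print(outside(heart_rate, gaussian_bounds(heart_rate)))  # [190]
```

In a sample this small the extreme value inflates the standard deviation, so 190 only just exceeds the 3-sigma bound, while the IQR fences flag it by a wide margin.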

Standard Score (Z Score):

A Z-score (or standard score) represents how many standard deviations a given measurement deviates from the mean. It merely re-scales, or standardizes, the data. A Z-score serves to specify the precise location of each observation within a distribution. The sign of the Z-score shows whether the score is above (+) or below (-) the mean.

The intuition behind the Z-score method of outlier detection is that, once we’ve centered and rescaled the data, anything that is too far from zero (the threshold is usually a Z-score of 3 or -3) should be considered an outlier.
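A minimal sketch of the Z-score method, using only the Python standard library. The function names and the cholesterol values are hypothetical, and the threshold of 3 is the usual convention mentioned above.

```python
import statistics

def z_scores(values):
    """Center and rescale: (x - mean) / standard deviation for every x."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes the values are not all identical
    return [(v - mean) / std for v in values]

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical serum cholesterol readings (mg/dl).
cholesterol = [180, 185, 190, 195, 198, 200, 202, 205, 208, 210,
               212, 215, 218, 220, 225, 230, 235, 240, 245, 564]

print(z_score_outliers(cholesterol))  # [564]
```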

Clustering Method:

Clustering is a popular technique used to group similar data points or objects into clusters. It can also be used as an important tool for outlier analysis. In this approach, similar objects are grouped together, and the outliers are separated automatically because they fall outside the main groups.
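One way to make this concrete, sketched below as an assumption rather than a standard recipe, is to run k-means and treat the members of very small clusters as outliers. The code is a tiny one-dimensional k-means written in plain Python. The function names, the 10% size limit, and the blood-pressure values are hypothetical, and in practice a library implementation of k-means would normally be used.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (k >= 2); centroids start evenly spaced from min to max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def small_cluster_outliers(values, k=2, max_share=0.1):
    """Return values in clusters holding at most max_share of the data."""
    _, clusters = kmeans_1d(values, k)
    limit = max_share * len(values)
    return sorted(v for c in clusters if len(c) <= limit for v in c)

# Hypothetical resting blood-pressure readings (mm Hg).
blood_pressure = [110, 112, 115, 118, 120, 120, 122, 125,
                  128, 130, 132, 135, 138, 140, 300]

print(small_cluster_outliers(blood_pressure))  # [300]
```

The result depends on the choice of k and on the size limit, so both should be checked against the data rather than taken as given.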

Graphical Approach:

Plots such as box plots, histograms, and scatter plots are commonly used to identify outliers in a dataset.
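As a rough stand-in for a real plotting library, the sketch below prints a text histogram with the standard library only. The function name, the bin width, and the readings are hypothetical. An outlier shows up as an isolated bar separated from the rest by a run of empty bins.

```python
from collections import Counter

def text_histogram(values, bin_width=10):
    """Return one line per bin, e.g. '  60-69  | ##', including empty bins."""
    # Map each integer value to the start of its bin and count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return [
        f"{start:>4}-{start + bin_width - 1:<4}| {'#' * counts[start]}"
        for start in range(first, last + bin_width, bin_width)
    ]

# Hypothetical integer heart-rate readings; 190 is the suspicious value.
heart_rate = [62, 65, 68, 70, 71, 72, 74, 75, 78, 80, 190]
print("\n".join(text_histogram(heart_rate)))
```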

Methods to Pre-Process Outliers:

Mean/Median or random Imputation
Trimming
Top, Bottom and Zero Coding

Mean / Median / Random Sampling:

If we have reason to believe that the outliers are due to mechanical error or problems during measurement, then they are similar in nature to missing data, and any method used for missing-data imputation can be used to replace them. The number of outliers is small (otherwise they would not be called outliers), so it is reasonable to use mean, median, or random imputation to replace them.
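A minimal sketch of median imputation, assuming the outliers are detected with the 1.5 × IQR rule. That detection rule is a choice made for this example, and the function name and readings are hypothetical. The median is computed from the non-outlying values so that the outliers do not influence the replacement value.

```python
import statistics

def impute_outliers_with_median(values, factor=1.5):
    """Replace values outside the IQR fences with the median of the inliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower = q1 - factor * (q3 - q1)
    upper = q3 + factor * (q3 - q1)
    inliers = [v for v in values if lower <= v <= upper]
    fill = statistics.median(inliers)
    # The original order and length of the column are preserved.
    return [v if lower <= v <= upper else fill for v in values]

# Hypothetical heart-rate column in record order; 190 is the outlier.
heart_rate = [72, 65, 190, 70, 80, 62, 75, 68, 78, 71, 74]

print(impute_outliers_with_median(heart_rate))
# [72, 65, 71.5, 70, 80, 62, 75, 68, 78, 71, 74]
```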

Trimming:

In this method, we discard the outliers completely, i.e., we eliminate the data points that are considered outliers. Trimming is a good and fast approach when it does not remove a large number of values from the dataset.
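Trimming can be sketched as a simple filter over the patient records. The record layout, the column name, and the bounds of 100 and 400 below are hypothetical; in practice the bounds would come from one of the detection methods above. The function also reports the share of records removed, since trimming is only reasonable when that share is small.

```python
def trim(records, column, lower, upper):
    """Keep only records whose `column` value lies within [lower, upper]."""
    kept = [r for r in records if lower <= r[column] <= upper]
    removed_share = 1 - len(kept) / len(records)
    return kept, removed_share

# Hypothetical patient records; the whole row is dropped, not just the value.
patients = [
    {"id": 1, "chol": 210},
    {"id": 2, "chol": 198},
    {"id": 3, "chol": 564},
    {"id": 4, "chol": 245},
    {"id": 5, "chol": 230},
]

kept, removed_share = trim(patients, "chol", lower=100, upper=400)
print([p["id"] for p in kept])  # [1, 2, 4, 5]
print(f"{removed_share:.0%} of records removed")
```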

Top / bottom / zero Coding:

Top coding means capping the maximum of the distribution at an arbitrarily set value. A top-coded variable is one for which data points above an upper bound are censored. By implementing top coding, the outlier is capped at a certain maximum value and looks like many other observations.

Bottom coding is analogous but on the left side of the distribution. That is, all values below a certain threshold are capped to that threshold. If the threshold is zero, then it is known as zero-coding. For example, for variables like “age” or “earnings”, it is not possible to have negative values. Thus it’s reasonable to cap the lowest value to zero.
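Top, bottom, and zero coding all reduce to clamping each value to a bound. A minimal sketch follows; the function name, the age values, and the upper bound of 100 are hypothetical.

```python
def cap(values, lower=None, upper=None):
    """Clamp values to [lower, upper]; a bound of None means no capping on that side."""
    capped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower   # bottom coding (zero coding when lower == 0)
        elif upper is not None and v > upper:
            v = upper   # top coding
        capped.append(v)
    return capped

# Hypothetical "age" column with an impossible negative value and an extreme one.
ages = [-3, 29, 45, 61, 130]

print(cap(ages, lower=0))             # [0, 29, 45, 61, 130]
print(cap(ages, lower=0, upper=100))  # [0, 29, 45, 61, 100]
```

Unlike trimming, capping keeps every record, so the dataset size does not change.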


venereology answered 5 months ago
