Question

You are given a data set containing the height, weight, age, and blood pressure of a representative sample of people from a major metropolitan area. Comment on the suitability of using a statistically-based versus a cluster-based outlier detection scheme to identify people with anomalous characteristics for this data set.   

Solutions

Expert Solution

Several kinds of methods are available for finding outliers, including graphical, statistically based, and cluster-based methods.

A statistically based approach assumes a parametric model that describes the distribution of the data, for example a normal distribution. Applying a statistical test depends on several things: the data distribution, the parameters of that distribution, and the expected number of outliers. It is a likelihood-based approach: points that are improbable under the assumed model are treated as outliers. The statistical method also has limitations.
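As a minimal sketch of the statistical approach for a single attribute, assuming the attribute is roughly normally distributed: estimate the mean and standard deviation from the sample and flag values whose z-score exceeds a cutoff. The function name zscore_outliers, the cutoff of 3, and the height values are illustrative assumptions, not part of the question's data set.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold.

    Assumes the attribute is approximately normally distributed.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical heights in cm; the last value is deliberately extreme.
heights = [160, 162, 165, 167, 168, 170, 171, 173, 175, 177, 178, 180, 230]
print(zscore_outliers(heights))
```

Note that an extreme value inflates the estimated standard deviation, so in small samples a fixed cutoff can be hard to exceed; the cutoff here is a choice for illustration only.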

First, most statistical approaches are designed for a single attribute. Second, in many cases the data distribution is not known, which makes the approach hard to apply. Third, for high-dimensional data it can be difficult to identify the true distribution.
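The single-attribute limitation can be illustrated with a small sketch. It screens each attribute separately with z-scores on made-up (height, weight) pairs in which the last person is short but heavy. The function name per_attribute_flags, the cutoff of 2, and all data values are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def per_attribute_flags(rows, threshold=3.0):
    """Return indices of rows with |z| > threshold on at least one attribute."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    flagged = []
    for i, row in enumerate(rows):
        if any(abs(x - m) / s > threshold for x, (m, s) in zip(row, stats)):
            flagged.append(i)
    return flagged

# Hypothetical (height cm, weight kg) pairs. Weight rises steadily with
# height for everyone except the last person, who is short but heavy.
people = [(150, 50), (155, 54), (160, 58), (165, 62), (170, 66),
          (175, 70), (180, 74), (185, 78), (190, 82), (152, 81)]

print(per_attribute_flags(people, threshold=2.0))
```

The last person's height and weight each fall inside the range of the other values, so a per-attribute check does not flag them, even though the combination is unlike anyone else in this made-up sample.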

The cluster-based approach works as follows. Cluster the data into groups of different density, then take the points in small clusters as candidate outliers. For each candidate point, compute the distance between it and the non-candidate clusters. Candidate points that are far from all non-candidate points are outliers.
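The steps above can be sketched as follows. This is only one possible reading: the clustering rule (linking points that lie within a radius eps of each other), the use of cluster centroids for the distance step, the parameter names, and the data values are all assumptions. The points stand for attributes already put on a comparable scale; math.dist accepts any number of dimensions, so the same code would take four attributes per person.

```python
import math
from statistics import mean

def link_clusters(points, eps):
    """Group point indices into clusters; points within eps are linked."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

def cluster_outliers(points, eps, max_small_size, min_distance):
    """Flag points in small clusters that lie far from every large cluster."""
    clusters = link_clusters(points, eps)
    large = [c for c in clusters if len(c) > max_small_size]
    small = [c for c in clusters if len(c) <= max_small_size]
    dims = range(len(points[0]))
    centroids = [tuple(mean([points[i][d] for i in c]) for d in dims)
                 for c in large]
    outliers = []
    for c in small:
        for i in c:  # each point in a small cluster is a candidate
            nearest = min((math.dist(points[i], ctr) for ctr in centroids),
                          default=math.inf)
            if nearest > min_distance:
                outliers.append(i)
    return sorted(outliers)

# Hypothetical scaled data: two dense groups, one point just outside the
# first group, and one isolated point.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (-0.2, 0.1), (0.0, -0.2),
          (3.0, 3.0), (3.2, 3.1), (2.9, 3.3), (3.1, 2.8),
          (0.9, 0.2), (8.0, -4.0)]

print(cluster_outliers(points, eps=0.5, max_small_size=1, min_distance=2.0))
```

In this sketch the point at index 9 forms its own small cluster but sits close to the first group, so it stays a candidate only; the point at index 10 is far from both groups and is reported as an outlier.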

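Here we have the height, weight, age, and blood pressure of a sample of people from a metropolitan area, so each person is described by several variables and the data are multivariate. The data are not high-dimensional, since there are only four attributes; sampling from a large city increases the number of people, not the number of dimensions. The relevant limitations of the statistical approach are that most tests handle one attribute at a time and that the joint distribution of the four attributes may not be known. Based on these points, I think the more suitable method for outlier detection is the cluster-based method.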

