Question

INTRODUCTION TO DATA MINING

Question 1: Outlier detection
Draw a boxplot and detect the outliers in the following data:
201, 199, 269, 236, 278, 271, 303, 291, 283, 301, 341

Solutions

Expert Solution

Let the data be

201, 199, 269, 236, 278, 271, 303, 291, 283, 301, 341

Therefore n = 11

Quartile positions refer to the ordered data, so first sort the values in ascending order:

199, 201, 236, 269, 271, 278, 283, 291, 301, 303, 341

Median (Q2) = 1/2(11+1)th term = 6th term

Q2 = 278

Lower Quartile (Q1) = 1/4(11+1)th term = 3rd term

Q1 = 236

Upper Quartile (Q3) = 3/4(11+1)th term = 9th term

Q3 = 301

Inter Quartile Range (IQR) = Q3 – Q1 = 301 – 236

IQR = 65

Lower Limit = Q1 – 1.5 IQR = 236 – 1.5 (65)

Lower Limit = 138.5

Upper Limit = Q3 + 1.5 IQR = 301 + 1.5 (65)

Upper Limit = 398.5

Any value below 138.5 or above 398.5 is an outlier. The smallest value in the series is 199 and the largest is 341, so every value lies within the limits and the series contains no outliers under the 1.5 IQR rule. Nothing needs to be discarded, and the full series can be used for further analysis.

The boxplot therefore has its box running from Q1 = 236 to Q3 = 301 with the median line at 278. The whiskers extend to the minimum, 199, and the maximum, 341, and no points are plotted beyond them.
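
The calculation can be checked with a short script. The sketch below is an illustration, not part of the original solution. It sorts the data, takes the quartiles at the 1/4(n+1), 1/2(n+1) and 3/4(n+1) positions using Python's statistics.quantiles with method="exclusive", and then applies the 1.5 IQR limits. The function name boxplot_summary and its parameter k are assumptions made for this example.

```python
from statistics import quantiles


def boxplot_summary(values, k=1.5):
    """Return the numbers needed to draw a boxplot and flag outliers.

    Hypothetical helper for illustration. Quartiles use the (n+1)
    position rule, which is what method="exclusive" implements.
    """
    data = sorted(values)  # quartile positions refer to the ordered data
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr   # lower limit
    upper = q3 + k * iqr   # upper limit
    inside = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower": lower,
        "upper": upper,
        # whiskers end at the most extreme values still inside the limits
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


series = [201, 199, 269, 236, 278, 271, 303, 291, 283, 301, 341]
summary = boxplot_summary(series)
print(summary["q1"], summary["median"], summary["q3"])  # 236.0 278.0 301.0
print(summary["lower"], summary["upper"])               # 138.5 398.5
print(summary["outliers"])                              # []
```

Other quartile conventions (for example, interpolating between neighbouring values) give slightly different Q1 and Q3 for this series, so plotting libraries may draw the box a little differently.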

