Question

Answer/discuss briefly the following questions.

(a) What is the motivation for using standardized lift in association rule learning?

(b) Using your own words, describe the appropriate process for evaluating and comparing multiple classifiers.

(c) Discuss and compare advantages and disadvantages of classification trees, random forests and boosting.

Answer

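a) The lift of an association rule A → B measures how much more often the antecedent A and the consequent B occur together than would be expected if they were independent: lift(A → B) = P(A and B) / (P(A) × P(B)) = confidence(A → B) / support(B). A lift above 1 means that A and B occur together more often than by chance, a lift of 1 indicates independence, and a lift below 1 means that they occur together less often than by chance. Lift is therefore a measure of a rule's importance relative to a random baseline, in the same spirit as the lift of a predictive model measures its performance against random targeting.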

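Standardising lift addresses a weakness of raw lift: the range of values lift can take depends on the supports of the antecedent and the consequent. Because P(A and B) can never exceed min(P(A), P(B)), lift can never exceed 1/max(P(A), P(B)), so a rule built from rare items can reach a very large lift while a rule built from common items cannot, and raw lift values are not directly comparable across rules. Standardised lift rescales the observed lift relative to the minimum and maximum values it could attain given those supports, standardised lift = (lift - minimum possible lift) / (maximum possible lift - minimum possible lift), which lies between 0 and 1. Ranking association rules by standardised lift therefore ranks each rule by the relative position of its lift between its own minimum and maximum potential values, which gives a natural and comparable (absolute) scale for ranking association rules.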
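The ranking idea can be made concrete with a short Python sketch. It computes lift from a list of transactions and standardises it with the bounds that lift can attain for the given supports: the upper bound 1/max(P(A), P(B)) follows from P(A and B) ≤ min(P(A), P(B)), and the lower bound assumes the rule occurs in at least one of the n transactions, so P(A and B) ≥ max(P(A) + P(B) - 1, 1/n). This is a plain min-max standardisation; it ignores the tighter bounds that apply when minimum support and confidence thresholds are also imposed. The function names and the toy baskets are hypothetical.

```python
from typing import List, Set, Tuple


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift_and_standardised_lift(
    transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> Tuple[float, float]:
    """Return (lift, standardised lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    p_a = support(transactions, antecedent)
    p_b = support(transactions, consequent)
    p_ab = support(transactions, antecedent | consequent)
    if p_ab == 0:
        raise ValueError("the rule never occurs in the transactions")
    lift = p_ab / (p_a * p_b)
    # Largest attainable lift: P(A and B) <= min(P(A), P(B)).
    upper = 1.0 / max(p_a, p_b)
    # Smallest attainable lift, assuming the rule occurs in at least one of
    # the n transactions: P(A and B) >= max(P(A) + P(B) - 1, 1/n).
    lower = max(p_a + p_b - 1.0, 1.0 / n) / (p_a * p_b)
    if upper == lower:
        # Lift cannot vary for these supports, so the rescaling is undefined.
        return lift, float("nan")
    # Min-max rescaling: 0 = lowest possible lift, 1 = highest possible lift.
    return lift, (lift - lower) / (upper - lower)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
    {"eggs"},
]
print(lift_and_standardised_lift(baskets, {"bread"}, {"milk"}))    # lift ≈ 1.11, standardised ≈ 0.5
print(lift_and_standardised_lift(baskets, {"bread"}, {"butter"}))  # lift ≈ 1.67, standardised ≈ 1.0
```

Both rules have lift above 1, but the bread → butter rule attains the largest lift possible for its supports (butter never appears without bread), which the standardised value of 1 makes explicit.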

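b) To evaluate and compare multiple classifiers, train every classifier on the same training data and then score each one on the same held-out data (a separate test set, or the same cross-validation folds) using the same performance measures; judging a classifier on the data it was trained on overstates its performance. Common measures include the confusion matrix, the ROC curve, the area under the ROC curve (AUC), and concordance/discordance.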
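AUC and concordance are closely linked: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, that is, the proportion of concordant (positive, negative) pairs, with ties counted as half. A brute-force Python sketch (it checks every pair, which is fine for small test sets) compares two hypothetical classifiers on the same held-out labels; the scores and the function name are made up for illustration.

```python
from typing import Sequence


def auc_by_concordance(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs that are concordant.

    A pair is concordant when the positive case gets the higher score and
    discordant when it gets the lower score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = sum(1 for p in pos for q in neg if p > q)
    tied = sum(1 for p in pos for q in neg if p == q)
    return (concordant + 0.5 * tied) / (len(pos) * len(neg))


# Hypothetical scores from two classifiers on the same held-out cases.
y_test = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
scores_b = [0.7, 0.3, 0.6, 0.6, 0.4, 0.2]
print(auc_by_concordance(y_test, scores_a))  # 8 of 9 pairs concordant
print(auc_by_concordance(y_test, scores_b))  # 6 concordant, 1 tied, 2 discordant
```

Because both classifiers are scored on the same cases, the difference (about 0.89 versus 0.72) reflects the models rather than the choice of test data.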

The most commonly used and most relevant of these is the confusion matrix. Its basic layout is:

                 Truth: +               Truth: -
Predicted: +     True Positives (TP)    False Positives (FP)
Predicted: -     False Negatives (FN)   True Negatives (TN)

Many parameters can be calculated from the confusion matrix:

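1) Sensitivity / True Positive Rate (recall) = TP / (TP + FN)

2) Specificity / True Negative Rate = TN / (TN + FP); the False Positive Rate is 1 - specificity = FP / (FP + TN)

3) Precision / Positive Predictive Value = TP / (TP + FP)

4) Negative Predictive Value = TN / (TN + FN)

5) Many others, such as the False Discovery Rate = FP / (FP + TP) and the False Omission Rate = FN / (FN + TN)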
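The measures listed above are simple ratios of the four cells of the confusion matrix. A minimal Python sketch, assuming binary labels coded 1 (positive) and 0 (negative), computes them for two hypothetical classifiers scored on the same test labels; the data and function names are illustrative only, and a ratio with a zero denominator is returned as NaN.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels coded 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": ratio(tp, tp + fn),            # true positive rate
        "specificity": ratio(tn, tn + fp),            # true negative rate
        "false_positive_rate": ratio(fp, fp + tn),    # 1 - specificity
        "precision": ratio(tp, tp + fp),              # positive predictive value
        "negative_predictive_value": ratio(tn, tn + fn),
        "false_discovery_rate": ratio(fp, fp + tp),   # 1 - precision
        "false_omission_rate": ratio(fn, fn + tn),    # 1 - NPV
    }


# Hypothetical test labels and predictions from two classifiers on the same cases.
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, confusion_counts(y_test, pred), confusion_metrics(y_test, pred))
```

Classifier B never raises a false alarm (specificity and precision of 1.0) but misses half of the positives (sensitivity 0.5), whereas A finds 3 of the 4 positives at the cost of one false positive; which one is better depends on the relative cost of the two kinds of error.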

c) The main advantages and disadvantages of each method are summarised below.

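Classification tree
Advantages: does not require normalization or scaling of the data; requires little effort at the data-preparation stage; self-explanatory and easy to explain to technical teams and business leads.
Disadvantages: a single tree is unstable, since small changes in the training data can produce a very different tree; deep trees over-fit the training data unless they are pruned, so predictive accuracy is usually lower than that of tree ensembles; less suited to predicting continuous outputs, because a regression tree gives piecewise-constant predictions.

Random forest
Advantages: predictive performance is among the best of the supervised learning techniques; averaging many trees grown on bootstrap samples removes much of the instability of a single tree; can be used for both classification and regression problems; robust to outliers.
Disadvantages: inherently less interpretable than a single classification tree; high computational cost; a large number of trees can make prediction slow.

Boosting
Advantages: trees are fitted sequentially, each one concentrating on the cases the previous trees got wrong, which reduces bias and often gives very high predictive accuracy; over-fitting can be curbed with shrinkage (a small learning rate) and early stopping.
Disadvantages: sensitive to outliers and mislabelled points, which receive ever-increasing weight; can over-fit if run for too many rounds; time-consuming and computationally expensive, because the trees must be trained one after another; much harder to interpret than a single tree.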
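The sequential, error-focused training behind boosting, and its sensitivity to outliers, can be seen in a minimal AdaBoost sketch. It is an illustration under simplifying assumptions rather than a production implementation: it uses plain (discrete) AdaBoost with one-split decision stumps on a toy one-dimensional dataset, whereas practical boosting libraries usually grow deeper trees and often use gradient boosting instead. After each round, misclassified points gain weight so the next stump concentrates on them; a mislabelled outlier would keep gaining weight in the same way. All names and data are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity)


def stump_predict(stump: Stump, x: float) -> int:
    """Predict +polarity at or below the threshold and -polarity above it."""
    threshold, polarity = stump
    return polarity if x <= threshold else -polarity


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    # Candidate thresholds: below, between and above the distinct x values.
    ux = sorted(set(xs))
    thresholds = [ux[0] - 0.5] + [(a + b) / 2 for a, b in zip(ux, ux[1:])] + [ux[-1] + 0.5]
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Pick the stump with the smallest weighted training error.
        best: Stump = (thresholds[0], 1)
        best_err = float("inf")
        for t in thresholds:
            for s in (1, -1):
                err = sum(w for w, x, y in zip(weights, xs, ys) if stump_predict((t, s), x) != y)
                if err < best_err:
                    best, best_err = (t, s), err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)      # vote of this stump
        model.append((alpha, best))
        # Reweight: misclassified points grow, correctly classified ones shrink.
        weights = [w * math.exp(-alpha * y * stump_predict(best, x)) for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Tuple[float, Stump]], x: float) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump separates these labels
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # → [1, 1, -1, -1, 1, 1]
```

A single stump can classify at most four of the six points correctly, while three boosting rounds classify all six correctly on this training data.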
