Question

In: Computer Science


Machine learning Adaboost question:

The AdaBoost algorithm has two drawbacks. Answer the following questions regarding these.
(I) Show mathematically why a weak learner with < 50% predictive accuracy presents a problem to AdaBoost.
(II) AdaBoost is susceptible to outliers. Suggest a simple heuristic that may alleviate this.

Solutions

Expert Solution

(I)

AdaBoost, short for Adaptive Boosting, is a machine learning method that is conceptually straightforward but less simple to grasp mathematically. One well-known disadvantage of AdaBoost is that it is sensitive to noisy data and outliers. The idea itself is easy to follow, but once we try to dig a little deeper into the math that supports it, we are faced with definitions and pseudocode like the following:

Boosting: combining many weak (simple) learners to produce a highly accurate prediction.

Weak learners: classifiers that produce predictions that are slightly better than random guessing. Random guessing is equivalent to 50% accuracy, like flipping a coin. This will be familiar to those who are conversant with information theory, particularly the idea of Shannon's entropy.

AdaBoost: the first practical boosting algorithm, invented by Freund and Schapire (1995). It is based on Vapnik and Chervonenkis' idea that for a trained classifier to be effective and accurate in its predictions, it should meet these three conditions:

1) the classifier should be trained on "enough" training examples

2) it should provide a good fit to these examples by producing low training error

3) it should be simple (in that simpler models are better than overly complex ones)

1) Given (x_1, y_1), …, (x_m, y_m) where x_i ∈ X, y_i ∈ {-1, +1}

Helpful Notations

∈: "element of"

{}: set

ex: if A = {1,2,3,7}, 2 ∈ A

(x_1, y_1): first training example, (x_m, y_m): m-th training example

Now that we have the notation down, we can read the first part of the algorithm as:

"Given the preparation set containing m tests where all x inputs are a component of the complete set X and where y yields are a component of a set including just two qualities, - 1 (negative class) and 1 (positive class)… "

2) Initialize: D_1(i) = 1/m for i = 1, …, m.

Here, D denotes the sample weights and i indexes the i-th training example. In other papers, D may be written as W. The expression therefore reads:

"… introduce all loads of your examples to 1 separated by number of preparing test… "

3) For t = 1, …, T:

* Train a weak learner using distribution D_t.

* Get a weak hypothesis h_t: X → {-1, +1}

* Aim: select the h_t with the lowest weighted error:

ε_t = Pr_{i~D_t}[h_t(x_i) ≠ y_i]

* Choose α_t = 1/2 * ln((1 - ε_t) / ε_t)

* Update, for i = 1, …, m:

D_{t+1}(i) = D_t(i) * exp(-α_t * y_i * h_t(x_i)) / Z_t

Useful Notations

Pr = probability

h_t = hypothesis/classifier

ε_t = the lowest weighted misclassification error for the model

α_t = weight for the classifier

exp = the exponential function, with Euler's number e ≈ 2.71828 as its base

Z_t = normalization factor, used to ensure that the weights form a true probability distribution

With this notation at hand, we can read the next part as:

"For t=1 to T classifiers, fit it to the preparation information (where every expectation is either - 1 or 1) and select the classifier with the least weighted order mistake."

The formula to compute ε can be written as:

ε = Σ_i w_i * 1[y_i ≠ h_j(x_i)] / Σ_i w_i

Let's break down this formula.

Useful Notations

Σ = sum

1[y_i ≠ h_j(x_i)] = 1 if sample i is misclassified and 0 if it is correctly classified

w_i = weight of training sample i

Thus, the formula reads: "The error equals the sum, over all training samples i, of w_i times the indicator of y_i not being equal to our prediction h_j(x_i) (which equals 1 if misclassified and 0 if correctly classified), divided by the sum of the weights."

Let us apply simple math to make sense of the formula. Consider having 4 different samples with weights 0.5, 0.2, 0.1, and 0.04. Imagine our classifier h predicted values 1, 1, -1, and -1, but the actual output values y were -1, 1, -1, 1.

predicted: 1 1 -1 -1

actual: -1 1 -1 1

weights: 0.5 0.2 0.1 0.04

1 or 0: 1 0 0 1

This leads to the following calculation for the misclassification rate:

misclassification rate / error = (0.5*1 + 0.2*0 + 0.1*0 + 0.04*1) / (0.5 + 0.2 + 0.1 + 0.04)

error = 0.64285714285
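
As a quick sanity check, the same calculation can be reproduced with a few lines of Python, using the values from the example above:

weights = [0.5, 0.2, 0.1, 0.04]
predicted = [1, 1, -1, -1]
actual = [-1, 1, -1, 1]

# 1 if misclassified, 0 if correctly classified
miss = [1 if p != a else 0 for p, a in zip(predicted, actual)]

error = sum(w * m for w, m in zip(weights, miss)) / sum(weights)
print(error)  # ≈ 0.642857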

Next, we choose our weight for the classifier, α, using the formula α = 1/2 * ln((1 - error) / error).

Simple math might explain this better than words. Assume, for instance, that we have errors 0.30, 0.70, and 0.50.

Our classifier weights would be calculated as follows:

ε = 0.3

α = 1/2 * ln((1 - 0.3) / 0.3) = 0.42365

ε = 0.7

α = 1/2 * ln((1 - 0.7) / 0.7) = -0.42365

ε = 0.5

α = 1/2 * ln((1 - 0.5) / 0.5) = 0
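
These three values can be verified with a short Python snippet (classifier_weight is just an illustrative helper name):

import math

def classifier_weight(error):
    # alpha = 1/2 * ln((1 - error) / error)
    return 0.5 * math.log((1 - error) / error)

for eps in (0.3, 0.7, 0.5):
    print(eps, round(classifier_weight(eps), 5))
# 0.3 0.42365
# 0.7 -0.42365
# 0.5 0.0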

Notice three interesting observations: 1) a classifier with accuracy higher than 50% gets a positive classifier weight (in other words, α > 0 if ε < 0.5), 2) a classifier with exactly 50% accuracy gets weight 0 and thus does not contribute to the final prediction, and 3) errors 0.3 and 0.7 lead to classifier weights with opposite signs. This is precisely why a weak learner with less than 50% accuracy is a problem: when ε > 0.5, α_t = 1/2 * ln((1 - ε) / ε) < 0, so the learner enters the ensemble with a negative vote, and the weight update D_{t+1}(i) = D_t(i) * exp(-α_t * y_i * h_t(x_i)) / Z_t then decreases the weights of misclassified samples and increases the weights of correctly classified ones, which is the opposite of what boosting intends.

(II)

AdaBoost can be sensitive to outliers and label noise because it fits a classification model (an additive model) to an exponential loss function, and the exponential loss function is sensitive to outliers and label noise.

Because boosting learns progressively, it is important to ensure that you have quality data. AdaBoost is extremely sensitive to noisy data and outliers, so if you plan to use AdaBoost, a simple heuristic is to detect and eliminate such points before training.
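
A minimal sketch of that heuristic, assuming NumPy arrays and an illustrative z-score threshold of 3 (neither of which comes from the original answer):

import numpy as np

def drop_outliers(X, y, z_threshold=3.0):
    # Keep only the rows whose every feature lies within z_threshold
    # standard deviations of that feature's mean.
    z = np.abs((X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12))
    keep = (z < z_threshold).all(axis=1)
    return X[keep], y[keep]

An equally simple alternative is to cap each sample weight D_t(i) at some maximum inside the boosting loop, so that no single noisy point can dominate the reweighting.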

To expand on why the exponential loss is the problem: at each stage, AdaBoost adds another weighted model to the overall classifier it is learning, with the objective of minimizing the exponential loss of the combined classifier on the training data. Comparing the exponential loss with other loss functions (such as the 0-1 loss) helps illustrate why it is problematic for data with outliers or label noise:

The issue is that the penalty for a misclassification grows exponentially with the magnitude of the predictive function's output.

For example, if an instance lies deep inside the positive-class region but is actually labeled as negative (because of label noise or because it is an outlier), an ideal classifier's prediction f(x) for this instance x may be a very large positive value; but since the actual label is negative, this prediction suffers an enormous loss/penalty, because the penalty is exponentiated (exp(-f(x)*y)). This means the ideal classifier may not be the one we arrive at using the algorithm, since the algorithm seeks the classifier that minimizes the total exponential loss, so this single outlier/mislabeled point can end up dominating the final model that is learned.
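
A tiny numerical illustration (the prediction value f(x) = 5 is just an assumed example) shows how large the exponential penalty becomes compared with a fixed 0-1 penalty:

import math

f_x = 5.0   # classifier is highly confident the point is positive
y = -1      # but the (possibly noisy) label says negative

exp_loss = math.exp(-f_x * y)             # exp(5) ≈ 148.4
zero_one_loss = 1 if f_x * y <= 0 else 0  # fixed penalty of 1
print(exp_loss, zero_one_loss)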

Ideally, we might want to penalize a training instance with only a fixed value if it is misclassified (0-1 loss), rather than penalizing it exponentially more for a greater magnitude of mis-prediction; however, minimizing the 0-1 loss is generally harder because it typically leads to non-convex optimization problems. There have been several papers on using other loss functions with boosting that result in less sensitivity to outliers and noise, such as SavageBoost.

