Question

In: Computer Science

Stochastic Gradient Ascent (SGA) for Logistic Regression. In this exercise, you will implement a logistic regression algorithm using SGA, similar to the logistic regression algorithm that you have seen in class. You will work with the datasets attached to the assignment and complete the logisticRegression.py file to learn the coefficients and predict binary class labels. The data comes from breast cancer diagnosis, where each sample (30 features) is labeled with a diagnosis: either M (malignant) or B (benign), recorded in the 31st column of the datasets. Read the main code, check the configuration parameters, and make sure the data is loaded and augmented correctly. Do not use logistic regression packages.
(a) [15 pts.] Complete the predict(x, w), gradient(x, y, w), and cross_entropy(y_hat, y) functions according to the instructions in logisticRegression.py. These functions will be used in the main SGA algorithm (logisticRegression_SGA).
(b) [15 pts.] Complete the logisticRegression_SGA(X, y, psi, epsilon, epochs) function. In class, we used a stopping criterion for the repeat loop in the SGA algorithm for logistic regression: the loop continued until the norm of the difference between w's in consecutive iterations was less than a predefined number ε (line 11 of the algorithm: ∥w̃_t − w̃_{t−1}∥ ≤ ε). Here, in addition to this criterion, we would like to limit the number of epochs (iterations over the whole dataset, or t, the step/iteration number, in the algorithms in the slides) to a predefined number (max_epochs). In the main function, max_epochs is initialized to 8.
(c) [15 pts.] Complete the rest of the main code to use the learned w to predict class labels for test datapoints (which have not been used for learning the w's) and to calculate and print the average cross-entropy error for the training and testing data.
(d) [15 pts.] Run the code on the cancer dataset with different psi and epsilon values. Check the change in cross-entropy values across iterations (in the plot) and the average training and testing cross-entropy errors. What do you observe about the losses and the number of iterations? What do you conclude?
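Parts (a)–(c) might be sketched roughly as follows, assuming X is already augmented with a leading 1-column and the labels are mapped to 0/1. These names and details are illustrative, not the assignment's reference solution; note in particular that the norm test is checked here once per epoch rather than once per SGA step:

```python
import numpy as np

def predict(x, w):
    # sigmoid of the (augmented) dot product: estimated P(y = 1 | x, w)
    return 1.0 / (1.0 + np.exp(-np.dot(x, w)))

def gradient(x, y, w):
    # per-sample gradient of the log-likelihood: (y - y_hat) * x
    return (y - predict(x, w)) * x

def cross_entropy(y_hat, y):
    # binary cross-entropy; eps guards against log(0)
    eps = 1e-12
    return -(y * np.log(y_hat + eps) + (1.0 - y) * np.log(1.0 - y_hat + eps))

def logisticRegression_SGA(X, y, psi, epsilon, max_epochs):
    # SGA with the two stopping criteria from part (b): the norm test
    # ||w_t - w_{t-1}|| <= epsilon and a cap of max_epochs passes over the data
    w = np.zeros(X.shape[1])
    for epoch in range(max_epochs):
        w_prev = w.copy()
        for i in range(X.shape[0]):
            w = w + psi * gradient(X[i], y[i], w)  # learning rate psi
        if np.linalg.norm(w - w_prev) <= epsilon:
            break
    return w

def average_cross_entropy(X, y, w):
    # part (c): mean cross-entropy error over a (train or test) dataset
    return np.mean([cross_entropy(predict(x_i, w), y_i) for x_i, y_i in zip(X, y)])
```

For the hard class labels in part (c), one would threshold predict(x, w) at 0.5 and map the result back to M/B.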

Solutions

Expert Solution

Answer:

A mini-batch stochastic gradient ascent implementation is given below. It assumes the helper functions predict_probability, feature_derivative, and compute_avg_log_likelihood are defined elsewhere in the assignment.

import numpy as np

def logistic_regression_SG(feature_matrix, sentiment, initial_coefficients,
                           step_size, batch_size, max_iter):
    coefficients = np.array(initial_coefficients)
    log_likelihood_all = []
    np.random.seed(seed=1)
    # shuffle once up front so mini-batches are drawn in random order
    permutation = np.random.permutation(len(feature_matrix))
    feature_matrix = feature_matrix[permutation, :]
    sentiment = sentiment[permutation]
    i = 0
    for itr in range(max_iter):
        # predictions and errors for the current mini-batch [i : i+batch_size)
        predictions = predict_probability(feature_matrix[i:i+batch_size, :], coefficients)
        indicator = (sentiment[i:i+batch_size] == +1)
        errors = indicator - predictions

        for j in range(len(coefficients)):
            derivative = feature_derivative(errors, feature_matrix[i:i+batch_size, j])
            coefficients[j] += step_size * derivative / batch_size

        lp = compute_avg_log_likelihood(feature_matrix[i:i+batch_size, :],
                                        sentiment[i:i+batch_size], coefficients)
        log_likelihood_all.append(lp)
        if itr <= 15 or (itr <= 1000 and itr % 100 == 0) \
                or (itr <= 10000 and itr % 1000 == 0) \
                or itr % 10000 == 0 or itr == max_iter - 1:
            data_size = len(feature_matrix)
            print('Iteration %*d: Average log likelihood (of data points [%0*d:%0*d]) = %.8f' %
                  (int(np.ceil(np.log10(max_iter))), itr,
                   int(np.ceil(np.log10(data_size))), i,
                   int(np.ceil(np.log10(data_size))), i + batch_size, lp))

        i += batch_size
        # reshuffle and wrap around when the next batch would run past the end
        if i + batch_size > len(feature_matrix):
            permutation = np.random.permutation(len(feature_matrix))
            feature_matrix = feature_matrix[permutation, :]
            sentiment = sentiment[permutation]
            i = 0
    return coefficients, log_likelihood_all

sample_feature_matrix = np.array([[1., 2., -1.], [1., 0., 1.]])
sample_sentiment = np.array([+1, -1])
# the call below was truncated after "step_si" in the original; the remaining
# argument values here are illustrative placeholders
coefficients, log_likelihood = logistic_regression_SG(sample_feature_matrix, sample_sentiment,
                                                      np.zeros(3), step_size=1.,
                                                      batch_size=2, max_iter=2)

Now run batch gradient ascent over feature_matrix_train for 200 iterations using:

initial_coefficients = np.zeros(194)
step_size = 5e-1
batch_size = len(feature_matrix_train)
max_iter = 200

coefficients_batch, log_likelihood_batch = logistic_regression_SG(
    feature_matrix_train, sentiment_train,
    initial_coefficients=np.zeros(194),
    step_size=5e-1,
    batch_size=len(feature_matrix_train),
    max_iter=200)

Iteration 0: Average log likelihood (of data points [00000:47780]) = -0.68308119

Iteration 1: Average log likelihood (of data points [00000:47780]) = -0.67394599

Iteration 2: Average log likelihood (of data points [00000:47780]) = -0.66555129

Iteration 3: Average log likelihood (of data points [00000:47780]) = -0.65779626

Iteration 4: Average log likelihood (of data points [00000:47780]) = -0.65060701

Iteration 5: Average log likelihood (of data points [00000:47780]) = -0.64392241

Iteration 6: Average log likelihood (of data points [00000:47780]) = -0.63769009

Iteration 7: Average log likelihood (of data points [00000:47780]) = -0.63186462

Iteration 8: Average log likelihood (of data points [00000:47780]) = -0.62640636

Iteration 9: Average log likelihood (of data points [00000:47780]) = -0.62128063

Iteration 10: Average log likelihood (of data points [00000:47780]) = -0.61645691

Iteration 11: Average log likelihood (of data points [00000:47780]) = -0.61190832

Iteration 12: Average log likelihood (of data points [00000:47780]) = -0.60761103

Iteration 13: Average log likelihood (of data points [00000:47780]) = -0.60354390

Iteration 14: Average log likelihood (of data points [00000:47780]) = -0.59968811

Iteration 15: Average log likelihood (of data points [00000:47780]) = -0.59602682

Iteration 100: Average log likelihood (of data points [00000:47780]) = -0.49520194

Iteration 199: Average log likelihood (of data points [00000:47780]) = -0.47126953

plt.plot(log_likelihood_batch)  # assumes matplotlib.pyplot is imported as plt
plt.show()
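To probe part (d)-style questions at a small scale, one can rerun the same kind of loop with different step sizes and watch how many epochs pass before the norm-based stop fires. The data and step-size grid below are synthetic stand-ins, not the assignment's cancer dataset:

```python
import numpy as np

def sga_epochs_to_converge(X, y, step_size, epsilon, max_epochs):
    # run per-sample SGA and count epochs until ||w_t - w_{t-1}|| <= epsilon,
    # capped at max_epochs as in part (b)
    w = np.zeros(X.shape[1])
    for epoch in range(1, max_epochs + 1):
        w_prev = w.copy()
        for i in range(X.shape[0]):
            y_hat = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w = w + step_size * (y[i] - y_hat) * X[i]
        if np.linalg.norm(w - w_prev) <= epsilon:
            return epoch
    return max_epochs

# synthetic 2-feature data, augmented with a leading 1-column
rng = np.random.default_rng(1)
X = np.hstack([np.ones((40, 1)), rng.normal(size=(40, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)
for step_size in (0.01, 0.1, 1.0):
    n = sga_epochs_to_converge(X, y, step_size, 1e-3, 8)
    print('step_size=%g: stopped after %d epoch(s)' % (step_size, n))
```

The qualitative pattern to look for matches the question: very small step sizes change w little per epoch (and can trip the ε test before the loss has flattened), while large step sizes move the loss quickly but less smoothly.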

