Use the confusion matrices to answer the questions below. Record your answers in a Word document. Explain your responses and include screenshots where requested. A 5000-row dataset was used to build a model to predict whether a person would accept a marketing offer for a personal loan (Personal Loan = Yes). It was partitioned into Training, Validation, and Test sets, with the model results above.
If you were to decrease the cutoff to 0.3, how would that impact your FP and FN counts? Would they increase or decrease? Would your Model Cost increase or decrease?
All code with sample data:
# creating a dataset of 7 features
import random
import pandas as pd
import numpy as np
n_rows = 5000 # number of rows
n_feat = 7 # number of features
features = list()
for loop1 in range(n_feat):
    feat = list()
    for loop2 in range(n_rows):
        feat.append(random.randint(-2,2))
    features.append(feat)
# creating the target once: Personal Loan = Yes is 1 else 0
target = list()
for loop2 in range(n_rows):
    target.append(random.randint(0,1))
# creating a dataframe of features
data = pd.DataFrame()
for loop3 in range(len(features)):
    data['feat_'+str(loop3+1)] = features[loop3]
data['target'] = target
data.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data.iloc[:,:7], data.iloc[:,7:], test_size = 0.20)
# converting into array
X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = (y_train.values).ravel()
y_test = (y_test.values).ravel()
type(X_train)
# Apply logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
clf = LogisticRegression(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)
# confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('False Positive: ',cm[0][1])
print('False Negative: ',cm[1][0])
# Now, with threshold
threshold = 0.3
y = clf.predict_proba(X_test) # column 0: P(class 0), column 1: P(class 1)
# defining our own threshold for prediction over the probability values
def pred(y):
    y_pred = []
    for i in y:
        # i[1] is the predicted probability of Personal Loan = Yes
        if i[1] >= threshold:
            y_pred.append(1)
        else:
            y_pred.append(0)
    return y_pred
y_pred = pred(y)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('False Positive: ',cm[0][1])
print('False Negative: ',cm[1][0])
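The row-by-row loop above can also be written as a single NumPy comparison. The following is a minimal sketch, not part of the original answer: the helper name `predict_with_cutoff` and the small `example_proba` array are hypothetical, and the array only mimics the two-column layout that `predict_proba` returns.

```python
import numpy as np

def predict_with_cutoff(proba, cutoff):
    """Label a row 1 when its positive-class probability reaches the cutoff."""
    proba = np.asarray(proba)
    # Column 1 holds the probability of the positive class (Personal Loan = Yes)
    return (proba[:, 1] >= cutoff).astype(int)

# Hypothetical probabilities, laid out like the output of predict_proba
example_proba = np.array([[0.90, 0.10],
                          [0.65, 0.35],
                          [0.55, 0.45],
                          [0.20, 0.80]])

labels_05 = predict_with_cutoff(example_proba, 0.5)
labels_03 = predict_with_cutoff(example_proba, 0.3)
print(labels_05)  # [0 0 0 1]
print(labels_03)  # [0 1 1 1]
```

With the objects from the code above, `predict_with_cutoff(clf.predict_proba(X_test), threshold)` should give the same labels as `pred(y)`.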
Results:
Data:
With threshold 0.5:
With threshold 0.3:
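FP has increased and FN has decreased. Lowering the cutoff from 0.5 to 0.3 labels more records as Personal Loan = Yes, so more actual No cases are flagged by mistake (FP goes up) and fewer actual Yes cases are missed (FN goes down). Whether the Model Cost increases or decreases depends on the cost assigned to each error type: if a false negative costs more than a false positive, the lower cutoff tends to reduce the total cost, and if a false positive costs more, it tends to raise it.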
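The effect of the cutoff on FP, FN, and cost can be checked on a small hand-made example. This is only a sketch under stated assumptions: the labels, the probabilities, and the per-error costs are hypothetical, and Model Cost is assumed to be FP count times FP cost plus FN count times FN cost, which the original question does not define.

```python
import numpy as np

def fp_fn_counts(y_true, proba_yes, cutoff):
    """Count false positives and false negatives at a given cutoff."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(proba_yes) >= cutoff).astype(int)
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    return fp, fn

def model_cost(fp, fn, cost_fp, cost_fn):
    """Assumed cost definition: weighted sum of the two error counts."""
    return fp * cost_fp + fn * cost_fn

# Hypothetical labels and predicted probabilities of Personal Loan = Yes
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
proba_yes = [0.10, 0.35, 0.40, 0.60, 0.32, 0.45, 0.70, 0.90]

for cutoff in (0.5, 0.3):
    fp, fn = fp_fn_counts(y_true, proba_yes, cutoff)
    cost_fn_heavy = model_cost(fp, fn, cost_fp=1, cost_fn=5)  # missed offers are expensive
    cost_fp_heavy = model_cost(fp, fn, cost_fp=5, cost_fn=1)  # wasted offers are expensive
    print(cutoff, fp, fn, cost_fn_heavy, cost_fp_heavy)
# 0.5 1 2 11 7
# 0.3 3 0 3 15
```

In this toy data the lower cutoff raises FP from 1 to 3 and lowers FN from 2 to 0. The cost falls when false negatives are the expensive error and rises when false positives are.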