From the MNIST dataset introduced in class, write code in Python that
a) Splits the 42000 training images into a training set (50% of all the data) and a test set (the rest). The labels should also be split accordingly. [10 points]
b) Use this training set for training a multi-class logistic regression model and test it on the test set. Report the score. [10 points]
c) Use this training set for training a multi-class support vector machine model and test it on the test set. Report the score.
Part a:
I used 60000 instances instead of the 42000 stated in the question,
because the MNIST training set contains 60000 images.
I then split the dataset into a 50% training set and a 50% test set,
as the question requires.
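The key requirement in part a is that each image stays paired with its label after the split. Below is a minimal stdlib sketch of a seeded 50/50 shuffle split that shows the idea. It is an illustration, not the code `train_test_split` runs internally, and the helper name `half_split` is hypothetical.

```python
import random

def half_split(images, labels, seed=42):
    """Shuffle the indices once and cut them in half, so images and labels stay aligned.

    Hypothetical helper for illustration only.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)  # one permutation shared by both lists
    n_test = (len(indices) + 1) // 2      # test half gets the extra item for odd sizes
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [images[i] for i in train_idx]
    X_test = [images[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data: each "image" is a one-element list that encodes its own label.
images = [[d] for d in range(10)] * 3
labels = list(range(10)) * 3
X_train, X_test, y_train, y_test = half_split(images, labels)
print(len(X_train), len(X_test))  # 15 15
```

Because both lists are indexed with the same shuffled positions, every `X_train[k]` still matches `y_train[k]`.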
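Code:
# Jupyter/Colab shell command; outside a notebook run "pip3 install python-mnist" in a terminal
!pip3 install python-mnist
import numpy as np
from mnist import MNIST
from sklearn.model_selection import train_test_split
mndata = MNIST('data')
print('Loading dataset: training images')
# Load the dataset from Google Drive, where I downloaded and stored the MNIST files
# (in Colab, mount the drive first: from google.colab import drive; drive.mount('/content/drive'))
X, y = mndata.load('/content/drive/My Drive/MNIST/train-images.idx3-ubyte',
                   '/content/drive/My Drive/MNIST/train-labels.idx1-ubyte')
X, y = np.array(X), np.array(y)  # convert to NumPy arrays for scikit-learn
print("No. of instances in data:", len(X))
# Split images and labels together: 50% train, 50% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
print("No. of instances in train data:", len(X_train))
print("No. of instances in test data:", len(X_test))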
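Output:
With 60000 instances and test_size=0.5, the script reports 60000 instances in the data, 30000 in the training set and 30000 in the test set.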
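The split in part a is purely random, so the digit proportions in the two halves can differ slightly. The question does not require it, but `train_test_split` also accepts `stratify=y` to keep the class proportions equal in both halves. The stdlib sketch below illustrates the idea behind stratification. `stratified_half_split` is a hypothetical name, and this is not scikit-learn's exact algorithm.

```python
import random
from collections import defaultdict

def stratified_half_split(labels, seed=42):
    """Return (train_idx, test_idx), splitting each class in half separately.

    Hypothetical helper illustrating stratification; not sklearn's implementation.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        half = len(idx) // 2          # for odd counts the test half gets the extra item
        train_idx.extend(idx[:half])
        test_idx.extend(idx[half:])
    return train_idx, test_idx

labels = [0] * 10 + [1] * 6 + [2] * 4
train_idx, test_idx = stratified_half_split(labels)
train_counts = {c: sum(labels[i] == c for i in train_idx) for c in (0, 1, 2)}
print(train_counts)  # {0: 5, 1: 3, 2: 2}
```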
Part b:
Code:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")  # hide warnings, e.g. ConvergenceWarning when lbfgs stops at the default max_iter
clf = LogisticRegression(random_state=0).fit(X_train, y_train)  # fit the logistic model on the train set
y_pred = clf.predict(X_train)  # predict labels for the train data
accuracy = accuracy_score(y_train, y_pred)  # accuracy on the train data
print("Train accuracy:", accuracy)
y_pred = clf.predict(X_test)  # predict labels for the test data
accuracy = accuracy_score(y_test, y_pred)  # accuracy on the test data
print("Test accuracy:", accuracy)
Output:
Training accuracy using the multi-class logistic regression model = 94.01 %
Testing accuracy using the multi-class logistic regression model = 91.37 %
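For more than two classes, scikit-learn's `LogisticRegression` with its default `lbfgs` solver fits a multinomial (softmax) model. It computes one score per class, turns the scores into probabilities with a softmax, and predicts the most probable class. `accuracy_score` is then the fraction of predictions that match the true labels. The stdlib sketch below illustrates both steps. The scores are made-up numbers for three classes, not values taken from the model above.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Predict the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical per-class scores (3 classes) for four samples.
score_rows = [
    [2.0, 0.5, -1.0],
    [0.1, 1.5, 0.3],
    [-0.5, 0.0, 2.2],
    [1.0, 1.2, 0.9],
]
y_true = [0, 1, 2, 0]
y_pred = [predict(row) for row in score_rows]
print(y_pred)                    # [0, 1, 2, 1]
print(accuracy(y_true, y_pred))  # 0.75
```

The same arithmetic applies to the reported numbers. A test accuracy of 91.37 % on the 30000 test images means roughly 27411 of them were classified correctly.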
Part c:
Code:
from sklearn import svm
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")  # hide warnings
clf = svm.SVC()  # defaults: RBF kernel; multi-class handled one-vs-one
clf.fit(X_train, y_train)  # fit the SVM model on the train set
y_pred = clf.predict(X_train)  # predict labels for the train data
accuracy = accuracy_score(y_train, y_pred)  # accuracy on the train data
print("Train accuracy:", accuracy)
y_pred = clf.predict(X_test)  # predict labels for the test data
accuracy = accuracy_score(y_test, y_pred)  # accuracy on the test data
print("Test accuracy:", accuracy)
Output:
Training accuracy using the multi-class support vector machine model = 98.8 %
Testing accuracy using the multi-class support vector machine model = 97.27 %
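`svm.SVC` handles the ten digit classes with a one-vs-one scheme. It trains a binary classifier for every pair of classes (10 × 9 / 2 = 45 pairs for MNIST) and predicts the class that wins the most pairwise votes. The sketch below shows only the voting step in plain Python. `pairwise_winner` is a hypothetical stand-in for the trained binary SVMs. Sending ties to the earliest class is a choice made for this sketch.

```python
from itertools import combinations

def ovo_predict(x, classes, pairwise_winner):
    """Predict by majority vote over all one-vs-one binary decisions.

    pairwise_winner(x, a, b) must return a or b; it stands in for the
    binary SVM trained on classes a and b (hypothetical here).
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(x, a, b)] += 1
    # max() returns the first maximal element, so ties go to the earliest class in `classes`.
    return max(classes, key=lambda c: votes[c])

# Toy stand-in: each "binary classifier" picks whichever class value is closer to the scalar x.
def closest(x, a, b):
    return a if abs(x - a) <= abs(x - b) else b

digits = list(range(10))
print(len(list(combinations(digits, 2))))  # 45
print(ovo_predict(6.8, digits, closest))   # 7
```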