In: Computer Science
Question. Programming question: Dimension Reduction
In this question, you are asked to run Singular Value Decomposition
(SVD) on the Fashion-MNIST dataset, interpret the output, and train
generative classifiers for multinomial classification of 10 classes.
More details about the Fashion-MNIST dataset can be found on the
original GitHub page or on Kaggle. In this assignment, you are
allowed to use a library implementation of SVD. For Python users, we
recommend scikit-learn's implementation TruncatedSVD.
Tasks:
- Load the training and test datasets from fashion-mnist_train.csv and fashion-mnist_test.csv. Each row uses a vector of dimension 784 with values between 0 (black) and 255 (white) on the gray color scale.
Guidelines:
- In this homework, you are allowed to use scikit-learn's
implementations of multinomial Logistic Regression, Naive Bayes, and
KNN directly.
Structure of my answer:
1. I have provided notes from the images of the Jupyter notebook.
2. At the end, the entire code is provided in a single code cell.
Jupyter notebook images:
The data was downloaded from Kaggle.
The data needs to be normalized to speed up training.
Note: n_components should be treated as a hyperparameter. The elbow method is used to determine a good value; if the value is too low, the model might miss important features.
A value above 500 reduces the Naive Bayes accuracy, so I have stuck with 300. A sketch of the elbow method is shown below.
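A minimal sketch of that elbow check (not part of the original notebook): it assumes x is the normalized training matrix defined in the code section below, and the 95% variance threshold is only an illustrative cut-off.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD

# fit once with a generous upper bound and inspect cumulative explained variance
svd_probe = TruncatedSVD(n_components=500)
svd_probe.fit(x)  # x: the normalized (60000, 784) training matrix from the code below
cumulative = np.cumsum(svd_probe.explained_variance_ratio_)

plt.plot(range(1, len(cumulative) + 1), cumulative)
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance ratio')
plt.title('Elbow plot for TruncatedSVD on Fashion-MNIST')
plt.show()

# smallest number of components capturing 95% of the variance (illustrative threshold)
k = int(np.argmax(cumulative >= 0.95)) + 1
print(f'components for 95% explained variance: {k}')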
Training models:
The value of n_neighbors can be varied to measure the model's performance; see the tuning sketch below.
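A minimal sketch of that n_neighbors sweep, assuming the transformed arrays x_new, y, x_test_new, and y_test from the code section below; the candidate values are arbitrary.

from sklearn.neighbors import KNeighborsClassifier

# try a few candidate neighbourhood sizes and report test accuracy for each
for k in (3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k, n_jobs=-1)
    knn.fit(x_new, y)
    print(f'n_neighbors={k}: test accuracy = {knn.score(x_test_new, y_test):.4f}')

Strictly, a hyperparameter like this should be chosen on a validation split rather than the test set, but the loop mirrors how the notebook measures performance.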
Code:
from sklearn.decomposition import TruncatedSVD
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
# load the Fashion-MNIST CSVs (first column is the label, remaining 784 are pixels)
train_data = pd.read_csv('fashion_mnist/fashion-mnist_train.csv')
test_data = pd.read_csv('fashion_mnist/fashion-mnist_test.csv')
train_data.head()
# split training labels and pixel features
y = train_data.iloc[:,0].values
y.shape
x = train_data.iloc[:,1:].values
x.shape
# loading test data
y_test = test_data.iloc[:,0].values
x_test = test_data.iloc[:,1:].values
y_test.shape,x_test.shape
# scale pixel values from [0, 255] to [0, 1]
x = x / 255.0
x_test = x_test / 255.0
# reduce the 784 pixel features to 300 SVD components (fit on the training data only)
svd = TruncatedSVD(n_components=300)
svd.fit(x)
x_new = svd.transform(x)
x_test_new = svd.transform(x_test)
x_test_new.shape
#### training models
# multinomial logistic regression on the SVD features
clf = LogisticRegression(C=10, max_iter=500)
clf.fit(x_new, y)
print(f'accuracy on test data = {clf.score(x_test_new, y_test)}')

# Gaussian Naive Bayes (suitable for the real-valued SVD features)
clf = GaussianNB()
clf.fit(x_new, y)
print(f'accuracy on test data = {clf.score(x_test_new, y_test)}')

# k-nearest neighbours with k = 7
clf = KNeighborsClassifier(n_neighbors=7, n_jobs=-1)
clf.fit(x_new, y)
print(f'accuracy on test data = {clf.score(x_test_new, y_test)}')