Apply PCA (Principal Component Analysis) in Python to the data set below, which is stored as a CSV file.
Then plot the result with different colors.
target   |  A |  B | C |  D |  E |  F |  G
surprise |  2 |  3 | 1 |  1 | 19 | 12 |  0
sad      |  2 |  0 | 0 |  2 | 12 |  1 | 15
angry    | 95 |  2 | 1 |  0 |  1 |  0 |  1
sad      |  4 | 56 | 2 |  0 |  0 |  3 |  1
neutral  |  1 |  2 | 2 |  0 | 39 |  0 | 11
happy    |  0 |  0 | 0 | 34 |  1 |  0 |  0
neutral  |  5 | 55 | 0 |  0 |  0 |  2 |  1
sad      |  0 | 33 | 3 |  0 |  0 | 12 |  1
happy    |  0 |  5 | 2 |  0 | 18 | 15 |  2
angry    |  0 |  0 | 0 | 19 | 37 |  0 |  0
happy    |  0 |  1 | 0 | 68 | 17 |  2 |  0
Answer:
NOTE: The code below computes the first 3 principal components.
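Before committing to 3 components, it can help to check how much of the total variance they actually retain. The sketch below is a minimal example under the assumption that the CSV holds exactly the values shown in the question; the table is typed in directly so the snippet runs without `data.csv`. It reads scikit-learn's `explained_variance_ratio_` attribute after fitting.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Assumption: these are the same values as in data.csv (copied from the question's table)
rows = [
    ["surprise", 2, 3, 1, 1, 19, 12, 0],
    ["sad", 2, 0, 0, 2, 12, 1, 15],
    ["angry", 95, 2, 1, 0, 1, 0, 1],
    ["sad", 4, 56, 2, 0, 0, 3, 1],
    ["neutral", 1, 2, 2, 0, 39, 0, 11],
    ["happy", 0, 0, 0, 34, 1, 0, 0],
    ["neutral", 5, 55, 0, 0, 0, 2, 1],
    ["sad", 0, 33, 3, 0, 0, 12, 1],
    ["happy", 0, 5, 2, 0, 18, 15, 2],
    ["angry", 0, 0, 0, 19, 37, 0, 0],
    ["happy", 0, 1, 0, 68, 17, 2, 0],
]
data = pd.DataFrame(rows, columns=["target", "A", "B", "C", "D", "E", "F", "G"])

# Features only (columns A-G), as a numeric float array
X = data.iloc[:, 1:].to_numpy(dtype=float)

pca = PCA(n_components=3)
pca.fit(X)

# Fraction of the total variance captured by each kept component (largest first)
ratios = pca.explained_variance_ratio_
for i, r in enumerate(ratios, start=1):
    print(f"PC{i}: {r:.1%} of the variance")
print(f"PC1-PC3 together: {ratios.sum():.1%}")
```

If the combined figure is low, the 2-D or 3-D plots will hide a good part of the structure in the data.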
# Importing modules
import pandas as pd
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Importing data (assumes a comma-separated data.csv with a header row; add sep='\t' if the file is tab-separated)
data = pd.read_csv('data.csv')
print(data.head())
# Separating the class labels (first column) from the features (remaining columns) as a float array
target = data.iloc[:, 0]
final_data = data.iloc[:, 1:].to_numpy(dtype=float)
# Computing the 3 principal components
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(final_data)
# Putting the 3 components into a DataFrame next to the targets
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['PC1', 'PC2', 'PC3'])
principalDf['target'] = target.to_numpy()
# Plotting PC1 against PC2, one color (and legend entry) per target class
for label, group in principalDf.groupby('target'):
    plt.scatter(group['PC1'], group['PC2'], label=label)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend(title='target')
plt.show()
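A self-contained version of the colored plot is sketched below, assuming the CSV matches the question's table (typed in directly here). It uses matplotlib's non-interactive `Agg` backend and saves to `pca_pc1_pc2.png`, a hypothetical filename; in a notebook or desktop session you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA

# Assumption: same values as data.csv (copied from the question's table)
rows = [
    ["surprise", 2, 3, 1, 1, 19, 12, 0],
    ["sad", 2, 0, 0, 2, 12, 1, 15],
    ["angry", 95, 2, 1, 0, 1, 0, 1],
    ["sad", 4, 56, 2, 0, 0, 3, 1],
    ["neutral", 1, 2, 2, 0, 39, 0, 11],
    ["happy", 0, 0, 0, 34, 1, 0, 0],
    ["neutral", 5, 55, 0, 0, 0, 2, 1],
    ["sad", 0, 33, 3, 0, 0, 12, 1],
    ["happy", 0, 5, 2, 0, 18, 15, 2],
    ["angry", 0, 0, 0, 19, 37, 0, 0],
    ["happy", 0, 1, 0, 68, 17, 2, 0],
]
data = pd.DataFrame(rows, columns=["target", "A", "B", "C", "D", "E", "F", "G"])
X = data.iloc[:, 1:].to_numpy(dtype=float)

# Project the 7 features onto the first 3 principal components
pcs = PCA(n_components=3).fit_transform(X)
principal_df = pd.DataFrame(pcs, columns=["PC1", "PC2", "PC3"])
principal_df["target"] = data["target"].to_numpy()

# One scatter call per class: matplotlib's color cycle gives each class its own color
fig, ax = plt.subplots(figsize=(6, 5))
for label, group in principal_df.groupby("target"):
    ax.scatter(group["PC1"], group["PC2"], label=label, s=60)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("PCA of the emotion data (PC1 vs PC2)")
ax.legend(title="target")
fig.savefig("pca_pc1_pc2.png", dpi=120)
```

Plotting one component against another, rather than each component against the class label, is what lets points from the same class cluster visibly in the reduced space.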
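NOTE: To compute a different number of components, change n_components=3 in the PCA(...) call to the integer you want (not a string), and give the DataFrame a matching list of column names, e.g. columns=['PC1', 'PC2'] for n_components=2.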
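Instead of fixing the count by hand, scikit-learn's `PCA` also accepts a float strictly between 0 and 1 for `n_components`, in which case it keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below assumes a 90% target (an arbitrary choice for illustration) and the question's table typed in directly; `svd_solver="full"` is set explicitly because that is the solver documented for this mode.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Assumption: same values as data.csv (copied from the question's table)
rows = [
    ["surprise", 2, 3, 1, 1, 19, 12, 0],
    ["sad", 2, 0, 0, 2, 12, 1, 15],
    ["angry", 95, 2, 1, 0, 1, 0, 1],
    ["sad", 4, 56, 2, 0, 0, 3, 1],
    ["neutral", 1, 2, 2, 0, 39, 0, 11],
    ["happy", 0, 0, 0, 34, 1, 0, 0],
    ["neutral", 5, 55, 0, 0, 0, 2, 1],
    ["sad", 0, 33, 3, 0, 0, 12, 1],
    ["happy", 0, 5, 2, 0, 18, 15, 2],
    ["angry", 0, 0, 0, 19, 37, 0, 0],
    ["happy", 0, 1, 0, 68, 17, 2, 0],
]
data = pd.DataFrame(rows, columns=["target", "A", "B", "C", "D", "E", "F", "G"])
X = data.iloc[:, 1:].to_numpy(dtype=float)

# Keep as many components as needed to explain more than 90% of the variance
pca = PCA(n_components=0.90, svd_solver="full")
pcs = pca.fit_transform(X)

# pca.n_components_ holds the number of components actually kept
columns = [f"PC{i}" for i in range(1, pca.n_components_ + 1)]
principal_df = pd.DataFrame(pcs, columns=columns)
print(pca.n_components_, "components explain", f"{pca.explained_variance_ratio_.sum():.1%}")
```

Building the column names from `pca.n_components_` keeps the DataFrame consistent with however many components PCA decides to keep.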