Apply PCA (Principal Component Analysis) in Python to the data set below, which is stored in a CSV file, and then plot the result. Thank you, I will upvote!
A | B | C | D | E | F | G |
---|---|---|---|---|---|---|
2 | 3 | 1 | 1 | 19 | 12 | 0 |
2 | 0 | 0 | 2 | 12 | 1 | 15 |
95 | 2 | 1 | 0 | 1 | 0 | 1 |
4 | 56 | 2 | 0 | 0 | 3 | 1 |
1 | 2 | 2 | 0 | 39 | 0 | 11 |
0 | 0 | 0 | 34 | 1 | 0 | 0 |
5 | 55 | 0 | 0 | 0 | 2 | 1 |
0 | 33 | 3 | 0 | 0 | 12 | 1 |
0 | 5 | 2 | 0 | 18 | 15 | 2 |
0 | 0 | 0 | 19 | 37 | 0 | 0 |
0 | 1 | 0 | 68 | 17 | 2 | 0 |
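If the table has not been exported to a CSV file yet, the same data can also be entered directly in Python (a minimal sketch; the answer below instead assumes the table was saved as Book1.csv in the Downloads folder):

import pandas as pd

# Build the DataFrame directly from the values in the table above
data = pd.DataFrame({
    'A': [2, 2, 95, 4, 1, 0, 5, 0, 0, 0, 0],
    'B': [3, 0, 2, 56, 2, 0, 55, 33, 5, 0, 1],
    'C': [1, 0, 1, 2, 2, 0, 0, 3, 2, 0, 0],
    'D': [1, 2, 0, 0, 0, 34, 0, 0, 0, 19, 68],
    'E': [19, 12, 1, 0, 39, 1, 0, 0, 18, 37, 17],
    'F': [12, 1, 0, 3, 0, 2, 2, 12, 15, 0, 2],
    'G': [0, 15, 1, 1, 11, 0, 1, 1, 2, 0, 0],
})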
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Read the data set from the CSV file into a DataFrame
data = pd.read_csv(".\\Downloads\\Book1.csv")
data
 | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
0 | 2 | 3 | 1 | 1 | 19 | 12 | 0 |
1 | 2 | 0 | 0 | 2 | 12 | 1 | 15 |
2 | 95 | 2 | 1 | 0 | 1 | 0 | 1 |
3 | 4 | 56 | 2 | 0 | 0 | 3 | 1 |
4 | 1 | 2 | 2 | 0 | 39 | 0 | 11 |
5 | 0 | 0 | 0 | 34 | 1 | 0 | 0 |
6 | 5 | 55 | 0 | 0 | 0 | 2 | 1 |
7 | 0 | 33 | 3 | 0 | 0 | 12 | 1 |
8 | 0 | 5 | 2 | 0 | 18 | 15 | 2 |
9 | 0 | 0 | 0 | 19 | 37 | 0 | 0 |
10 | 0 | 1 | 0 | 68 | 17 | 2 | 0 |
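The columns sit on very different scales (A reaches 95 while C never exceeds 3), which is why the next step standardizes the data before PCA; a quick look at the per-column spread makes this concrete (a small sketch, assuming data is the DataFrame above):

# Per-column standard deviation: A and D vary far more than C or G,
# so unscaled PCA would be dominated by the large-variance columns
print(data.std())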
from sklearn.preprocessing import StandardScaler

# Standardize every feature to zero mean and unit variance before PCA,
# so that large-valued columns do not dominate the components
features = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
x = data.loc[:, features].values
x = StandardScaler().fit_transform(x)
x
array([[-0.2933703 , -0.52600014,  0.        , -0.4944455 ,  0.42515456,  1.41041204, -0.59732271],
       [-0.2933703 , -0.66598404, -0.95742711, -0.44631364, -0.07849007, -0.59735098,  2.48262253],
       [ 3.15625978, -0.57266144,  0.        , -0.54257737, -0.86993163, -0.77987489, -0.39199303],
       [-0.21918471,  1.94704889,  0.95742711, -0.54257737, -0.94188086, -0.23230316, -0.39199303],
       [-0.3304631 , -0.57266144,  0.95742711, -0.54257737,  1.8641392 , -0.77987489,  1.6613038 ],
       [-0.36755589, -0.66598404, -0.95742711,  1.09390598, -0.86993163, -0.77987489, -0.59732271],
       [-0.18209191,  1.90038759, -0.95742711, -0.54257737, -0.94188086, -0.41482707, -0.39199303],
       [-0.36755589,  0.87383894,  1.91485422, -0.54257737, -0.94188086,  1.41041204, -0.39199303],
       [-0.36755589, -0.43267753,  0.95742711, -0.54257737,  0.35320532,  1.95798377, -0.18666335],
       [-0.36755589, -0.66598404, -0.95742711,  0.37192803,  1.72024074, -0.77987489, -0.59732271],
       [-0.36755589, -0.61932274, -0.95742711,  2.73038932,  0.28125609, -0.41482707, -0.59732271]])
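As a quick sanity check (a small extra sketch, not part of the original code), the standardized matrix should now have column means of roughly 0 and standard deviations of roughly 1:

# Column-wise mean and standard deviation of the scaled data;
# StandardScaler should make these ~0 and ~1 respectively
print(x.mean(axis=0).round(6))
print(x.std(axis=0).round(6))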
from sklearn.decomposition import PCA

# Project the standardized data onto the first two principal components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])
principalDf
 | principal component 1 | principal component 2 |
---|---|---|
0 | 0.535223 | 0.707044 |
1 | -1.236728 | 1.131952 |
2 | 0.073237 | -1.639883 |
3 | 1.898497 | -0.795800 |
4 | -0.773920 | 2.381402 |
5 | -1.238889 | -1.292917 |
6 | 0.811656 | -1.446545 |
7 | 2.613050 | 0.294014 |
8 | 1.288402 | 1.329610 |
9 | -1.799728 | 0.362192 |
10 | -2.170799 | -1.031071 |
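To see how much of the variance in the standardized data the two components actually capture, you can inspect the fitted PCA object (a small addition; explained_variance_ratio_ is the standard scikit-learn attribute for this):

# Fraction of the total variance explained by each principal component
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())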
# Scatter plot of the rows projected onto the first two principal components
plt.scatter(principalDf['principal component 1'], principalDf['principal component 2'])
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()
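If you also want to know which row of the original table each point corresponds to, a labelled variant of the same plot can help (a sketch building on the principalDf created above):

# Same scatter plot, with each point annotated by its row index
fig, ax = plt.subplots()
ax.scatter(principalDf['principal component 1'], principalDf['principal component 2'])
for i, row in principalDf.iterrows():
    ax.annotate(str(i), (row['principal component 1'], row['principal component 2']))
ax.set_xlabel("principal component 1")
ax.set_ylabel("principal component 2")
plt.show()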