Question

Apply PCA (Principal Component Analysis) in Python to the data set below, which is stored in a CSV file.

Then plot the result. Thank you, I will UPVOTE!

A B C D E F G
2 3 1 1 19 12 0
2 0 0 2 12 1 15
95 2 1 0 1 0 1
4 56 2 0 0 3 1
1 2 2 0 39 0 11
0 0 0 34 1 0 0
5 55 0 0 0 2 1
0 33 3 0 0 12 1
0 5 2 0 18 15 2
0 0 0 19 37 0 0
0 1 0 68 17 2 0
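
If the CSV file is not at hand, the table above can be loaded directly from text. This is only a sketch: it assumes the values are whitespace-separated exactly as displayed here, and the variable name `raw` is introduced purely for illustration.

```python
import io

import pandas as pd

# The table from the question, pasted as whitespace-separated text.
raw = """A B C D E F G
2 3 1 1 19 12 0
2 0 0 2 12 1 15
95 2 1 0 1 0 1
4 56 2 0 0 3 1
1 2 2 0 39 0 11
0 0 0 34 1 0 0
5 55 0 0 0 2 1
0 33 3 0 0 12 1
0 5 2 0 18 15 2
0 0 0 19 37 0 0
0 1 0 68 17 2 0
"""

# sep=r"\s+" splits on runs of whitespace (an assumption about the pasted text);
# a real comma-separated file would use the default sep=",".
data = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(data.shape)
```

For an actual comma-separated file on disk, `pd.read_csv(path)` with the default separator should be enough.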

Solutions

Expert Solution

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv(".\\Downloads\\Book1.csv")

data

     A   B  C   D   E   F   G
0    2   3  1   1  19  12   0
1    2   0  0   2  12   1  15
2   95   2  1   0   1   0   1
3    4  56  2   0   0   3   1
4    1   2  2   0  39   0  11
5    0   0  0  34   1   0   0
6    5  55  0   0   0   2   1
7    0  33  3   0   0  12   1
8    0   5  2   0  18  15   2
9    0   0  0  19  37   0   0
10   0   1  0  68  17   2   0

from sklearn.preprocessing import StandardScaler

features = ['A', 'B', 'C', 'D','E','F','G']

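# Select the feature columns as a NumPy array
x = data.loc[:, features].values

# Standardize each column to zero mean and unit variance
x = StandardScaler().fit_transform(x)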

x

array([[-0.2933703 , -0.52600014,  0.        , -0.4944455 ,  0.42515456,
         1.41041204, -0.59732271],
       [-0.2933703 , -0.66598404, -0.95742711, -0.44631364, -0.07849007,
        -0.59735098,  2.48262253],
       [ 3.15625978, -0.57266144,  0.        , -0.54257737, -0.86993163,
        -0.77987489, -0.39199303],
       [-0.21918471,  1.94704889,  0.95742711, -0.54257737, -0.94188086,
        -0.23230316, -0.39199303],
       [-0.3304631 , -0.57266144,  0.95742711, -0.54257737,  1.8641392 ,
        -0.77987489,  1.6613038 ],
       [-0.36755589, -0.66598404, -0.95742711,  1.09390598, -0.86993163,
        -0.77987489, -0.59732271],
       [-0.18209191,  1.90038759, -0.95742711, -0.54257737, -0.94188086,
        -0.41482707, -0.39199303],
       [-0.36755589,  0.87383894,  1.91485422, -0.54257737, -0.94188086,
         1.41041204, -0.39199303],
       [-0.36755589, -0.43267753,  0.95742711, -0.54257737,  0.35320532,
         1.95798377, -0.18666335],
       [-0.36755589, -0.66598404, -0.95742711,  0.37192803,  1.72024074,
        -0.77987489, -0.59732271],
       [-0.36755589, -0.61932274, -0.95742711,  2.73038932,  0.28125609,
        -0.41482707, -0.59732271]])
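
As a sanity check on the scaled array above, the same standardization can be reproduced by hand with NumPy. This is a minimal sketch: `StandardScaler` divides by the population standard deviation (`ddof=0`), so each column should end up with mean 0 and standard deviation 1. The array `X` is simply the question's table typed in by hand.

```python
import numpy as np

# The question's data, rows in the same order as the CSV.
X = np.array([
    [2, 3, 1, 1, 19, 12, 0],
    [2, 0, 0, 2, 12, 1, 15],
    [95, 2, 1, 0, 1, 0, 1],
    [4, 56, 2, 0, 0, 3, 1],
    [1, 2, 2, 0, 39, 0, 11],
    [0, 0, 0, 34, 1, 0, 0],
    [5, 55, 0, 0, 0, 2, 1],
    [0, 33, 3, 0, 0, 12, 1],
    [0, 5, 2, 0, 18, 15, 2],
    [0, 0, 0, 19, 37, 0, 0],
    [0, 1, 0, 68, 17, 2, 0],
], dtype=float)

# Column-wise z-score: subtract the mean, divide by the population std (ddof=0).
x = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

print(x.mean(axis=0).round(6))  # each entry should be (numerically) zero
print(x.std(axis=0).round(6))   # each entry should be one
```

Standardizing matters here because the columns have very different spreads (column A has a single value of 95, for example), and PCA on unscaled data would be dominated by the high-variance columns.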

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])

principalDf

    principal component 1  principal component 2
0                0.535223               0.707044
1               -1.236728               1.131952
2                0.073237              -1.639883
3                1.898497              -0.795800
4               -0.773920               2.381402
5               -1.238889              -1.292917
6                0.811656              -1.446545
7                2.613050               0.294014
8                1.288402               1.329610
9               -1.799728               0.362192
10              -2.170799              -1.031071
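
Before reading too much into a two-component plot, it can help to check how much of the total variance those two components keep and how each original column contributes to them. The sketch below is not part of the original answer; it refits the same pipeline on the question's data and inspects scikit-learn's `explained_variance_ratio_` and `components_` attributes. The names `X`, `ratio` and `loadings` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ["A", "B", "C", "D", "E", "F", "G"]

# The question's data, rows in the same order as the CSV.
X = np.array([
    [2, 3, 1, 1, 19, 12, 0],
    [2, 0, 0, 2, 12, 1, 15],
    [95, 2, 1, 0, 1, 0, 1],
    [4, 56, 2, 0, 0, 3, 1],
    [1, 2, 2, 0, 39, 0, 11],
    [0, 0, 0, 34, 1, 0, 0],
    [5, 55, 0, 0, 0, 2, 1],
    [0, 33, 3, 0, 0, 12, 1],
    [0, 5, 2, 0, 18, 15, 2],
    [0, 0, 0, 19, 37, 0, 0],
    [0, 1, 0, 68, 17, 2, 0],
], dtype=float)

x = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(x)

# Fraction of the total variance captured by each retained component.
ratio = pca.explained_variance_ratio_
print("explained variance ratio:", ratio, "total:", ratio.sum())

# Each row of components_ is one principal axis; transposing gives one row per feature.
loadings = pd.DataFrame(pca.components_.T, index=features, columns=["PC1", "PC2"])
print(loadings)
```

If the total is well below 1, the 2-D scatter plot hides a fair amount of structure, and adding a third component may be worth considering.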

plt.scatter(principalDf['principal component 1'],principalDf['principal component 2'])
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()
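
With only 11 points, labelling each one with its row index makes the plot easier to relate back to the table. The following is a sketch using the component scores printed above, copied in by hand; the output file name `pca_scatter.png` and the `Agg` backend line are assumptions made so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Scores copied from the principalDf output (row order 0..10).
pc1 = [0.535223, -1.236728, 0.073237, 1.898497, -0.773920, -1.238889,
       0.811656, 2.613050, 1.288402, -1.799728, -2.170799]
pc2 = [0.707044, 1.131952, -1.639883, -0.795800, 2.381402, -1.292917,
       -1.446545, 0.294014, 1.329610, 0.362192, -1.031071]

fig, ax = plt.subplots()
ax.scatter(pc1, pc2)

# Write the row index next to each point, offset slightly so it does not cover the marker.
for i, (px, py) in enumerate(zip(pc1, pc2)):
    ax.annotate(str(i), (px, py), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("principal component 1")
ax.set_ylabel("principal component 2")
fig.savefig("pca_scatter.png")  # hypothetical file name; use plt.show() interactively
```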

