Stock-price data (weekly rate of return)
Week | Allied Chemical | Du Pont | Union Carbide | Exxon | Texaco |
1 | 0.000000 | 0.000000 | 0.000000 | 0.039473 | 0.000000 |
2 | 0.027027 | -0.044855 | -0.003030 | -0.014466 | 0.043478 |
3 | 0.122807 | 0.060773 | 0.088146 | 0.086238 | 0.078124 |
4 | 0.057031 | 0.029948 | 0.066808 | 0.013513 | 0.019512 |
5 | 0.063670 | -0.003793 | -0.039788 | -0.018644 | -0.024154 |
6 | 0.003521 | 0.050761 | 0.082873 | 0.074265 | 0.049504 |
7 | -0.045614 | -0.033007 | 0.002551 | -0.009646 | -0.028301 |
8 | 0.058823 | 0.041719 | 0.081425 | -0.014610 | 0.014563 |
9 | 0.000000 | -0.019417 | 0.002353 | 0.001647 | -0.028708 |
10 | 0.006944 | -0.025990 | 0.007042 | 0.041118 | -0.024630 |
... | ... | ... | ... | ... | ... |
91 | -0.044068 | 0.020704 | -0.006224 | -0.018518 | 0.004694 |
92 | 0.039007 | 0.03854 | 0.024988 | -0.028301 | 0.032710 |
93 | -0.039457 | -0.029297 | -0.065844 | -0.015837 | -0.045758 |
94 | 0.039568 | 0.024195 | -0.006608 | 0.028423 | -0.009661 |
95 | -0.031142 | -0.007941 | 0.011080 | 0.007537 | 0.014634 |
96 | 0.000000 | -0.020080 | -0.006579 | 0.029950 | -0.004807 |
97 | 0.021429 | 0.049180 | 0.006622 | -0.002421 | 0.028985 |
98 | 0.045454 | 0.046375 | 0.074561 | 0.014563 | 0.018779 |
99 | 0.050167 | 0.036380 | 0.004082 | -0.011961 | 0.009216 |
100 | 0.019108 | -0.033303 | 0.008342 | 0.033898 | 0.004566 |
Using principal components, can the stock rates-of-return data be summarized in fewer than five dimensions? Explain.
Yes, the data can be summarized in fewer dimensions by projecting it onto its principal components. The calculations below can be reproduced with any statistical software, such as SPSS or R.
The eigenvalues of the correlation matrix of the data set are:
3.0718097, 0.8339257, 0.5030298, 0.3464342, 0.2448006
The eigenvalues sum to 5, the number of variables. The first three eigenvalues account for ((3.0718097 + 0.8339257 + 0.5030298)/5)*100 = 88.18% of the total variance.
Hence, the whole five-dimensional data set can be projected onto three dimensions while retaining 88.18% of the total variance.
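As a quick check (a minimal sketch, assuming the data have already been read into R as 'dat' with the read.csv() call used in the script below), the eigenvalues and the cumulative proportion of variance they explain can be obtained directly:
-----------------------------------------------------------------------------------------------------------
## Eigenvalues of the correlation matrix and the cumulative proportion
## of total variance they account for.
ev <- eigen(cor(dat))$values
ev                    # the five eigenvalues (should match the values above, up to rounding)
cumsum(ev) / sum(ev)  # cumulative proportion; the third entry is about 0.8818
-----------------------------------------------------------------------------------------------------------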
The following R script (it can be run in any R console) performs this principal component analysis.
-----------------------------------------------------------------------------------------------------------
## Set the working directory to the folder that contains the data set.
## Save the data in that folder as a .csv file named 'Stock Price Data.csv'.
dat <- read.csv('Stock Price Data.csv')[-1]
## Find the correlation matrix.
S <- round(cor(dat), 3)
## Find the loading matrix for the first three components, each of which is a
## linear combination of the five original variables.
load <- t(as.matrix(eigen(S)$vectors[, c(1,2,3)]))
load
## Calculate the new projected data set in three dimensions.
new <- list()
for(i in 1:nrow(dat)){
  new[[i]] <- t(load %*% t(as.matrix(dat[i, ])))
}
new.dat <- do.call('rbind', new)
colnames(new.dat) <- c('Y1', 'Y2', 'Y3')
new.dat
---------------------------------------------------------------------------------------------------------
The newly projected data set is stored in 'new.dat'.
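For a quick look at the result (an optional check, not part of the script above):
-----------------------------------------------------------------------------------------------------------
## Inspect the projected data: each row is one week expressed in the
## three new coordinates Y1, Y2, Y3.
head(new.dat)   # first few projected observations
dim(new.dat)    # number of rows matches the original data, with 3 columns
-----------------------------------------------------------------------------------------------------------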
If the variables are taken as:
X1 = Allied Chemical | X2 = Du Pont | X3 = Union Carbide | X4 = Exxon | X5 = Texaco |
From the loading matrix 'load', the three new projected variables Y1, Y2, Y3 are:
Y1 = -0.4297153*X1 -0.4634440*X2 -0.50550704*X3 -0.3209922*X4 -0.49192587*X5
Y2 = 0.2810179*X1 + 0.3851993*X2 -0.06884783*X3 -0.8756546*X4 + 0.03375592*X5
Y3 = 0.8501618*X1 -0.2965287*X2 -0.34006214*X3 + 0.1607006*X4 -0.21869750*X5
Y1, Y2 and Y3 constitute the new projected data.
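As an optional cross-check (a sketch using R's built-in prcomp(), which is not part of the script above; with scale. = TRUE it works from the correlation matrix, so it corresponds to the eigen() approach used here):
-----------------------------------------------------------------------------------------------------------
## Cross-check with prcomp(); component signs may be flipped relative to 'load'.
pc <- prcomp(dat, scale. = TRUE)
pc$rotation[, 1:3]   # loadings of the first three components
summary(pc)          # proportion of variance explained by each component
-----------------------------------------------------------------------------------------------------------
Note that the sign of each eigenvector is arbitrary, so prcomp() may return some components with opposite signs; this does not change the interpretation.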
Further, the data can also be projected onto four dimensions, retaining
((3.0718097 + 0.8339257 + 0.5030298 + 0.3464342)/5)*100 = 95.1%
of the total variance.
In this case the R script is:
----------------------------------------------------------------------------------------------------------
## Find the loading matrix for the first four components, each of which is a
## linear combination of the five original variables.
load <- t(as.matrix(eigen(S)$vectors[, c(1,2,3,4)]))
load
## Calculate the new projected data set in four dimensions.
new <- list()
for(i in 1:nrow(dat)){
  new[[i]] <- t(load %*% t(as.matrix(dat[i, ])))
}
new.dat <- do.call('rbind', new)
colnames(new.dat) <- c('Y1', 'Y2', 'Y3', 'Y4')
new.dat
---------------------------------------------------------------------------------------------------------
In this case the four projected variables are:
Y1 = -0.4297153*X1 -0.4634440*X2 -0.50550704*X3 -0.3209922*X4 -0.49192587*X5
Y2 = 0.2810179*X1 + 0.3851993*X2 -0.06884783*X3 -0.8756546*X4 + 0.03375592*X5
Y3 = 0.8501618*X1 -0.2965287*X2 -0.34006214*X3 + 0.1607006*X4 -0.21869750*X5
Y4 = 0.01077335*X1 -0.6106148*X2 -0.03688498*X3 -0.2331131*X4 + 0.75586462*X5
Y1, Y2, Y3 and Y4 constitute the new projected data.
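If the optional prcomp() cross-check above was run, the four-component case can be verified in the same way (again only a sketch, using the same 'pc' object):
-----------------------------------------------------------------------------------------------------------
## Loadings of the first four components and the cumulative proportion of
## variance they explain (the fourth entry should be about 0.951).
pc$rotation[, 1:4]
cumsum(pc$sdev^2) / sum(pc$sdev^2)
-----------------------------------------------------------------------------------------------------------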
N.B. The numerical values may differ slightly, because only the 20 rows shown above were used and all calculations were performed on this 20 x 6 matrix. The conclusions, however, are the same as those that would be reached from the complete original data set.