Question

In: Computer Science

How do outliers affect PC scores? Perform a PCA on the board stiffness dataset with and...

How do outliers affect PC scores? Perform a PCA on the board stiffness dataset with and without detected outliers.

Solutions

Expert Solution

PCA(Principal component analysis) is used for reducing dimensions of the dataset. It helps us to identify datapoints that are different from rest(outliers). When the data have outliers it will lead PCA to misleading conclusions.

#Installing all the required libraries

install.packages("SMLoutliers")
library(SMLoutliers)
data("stiff")
View(stiff)

par(mfrow = c(2, 2))

hist(stiff$x1, breaks = 20) #you can see the outlier.
max(stiff$x1) #this x1 value is the outlier
#[1] 2983

hist(stiff$x2, breaks = 20)
max(stiff$x2)
#[1] 2794

hist(stiff$x3, breaks = 20)
max(stiff$x3)
##[1] 2412

hist(stiff$x4, breaks = 20)
max(stiff$x4)
##[1] 2581  

#PCA WITH OUTLIERS
stiff.pca <- prcomp(stiff, center = TRUE, scale = TRUE)
print(stiff.pca)
plot(stiff.pca, type='l') 

#ploting part
install.packages("ggfortify")
library(ggplot2)
library(ggfortify)

autoplot(stiff.pca) #you can see only one PC giving us 90% of whole data variability.


#PCA without Outlier

#just cut the 9th row, take rest of the data.

data=stiff[-9,]

data.pca <- prcomp(data, center = TRUE, scale = TRUE)
print(data.pca)
plot(data.pca, type='l')

autoplot(data.pca)

#now we can see PC1 giving us only 85.4% variability, which was 90% before.

#So,from the above plot we can see that outlier makes a difference in PCA 

Output Screenshot


Related Solutions

Download the dataset CARS1 from BlackBoard. a. Do not worry about outliers. Assume the data is...
Download the dataset CARS1 from BlackBoard. a. Do not worry about outliers. Assume the data is correct and any outliers will remain in the dataset. b. Do scatterplot and analyze the results. c. Test for correlation (correlation coefficient) d. Regress weight (column 2) against gas mileage in the city (column 1). Make sure you make gas mileage the dependent (Y) variable. e. Determine and fully explain R2 MPG City Weight 19 3545 23 2795 23 2600 19 3515 23 3245...
What is an outlier? How would you scan for outliers in your dataset? What would you...
What is an outlier? How would you scan for outliers in your dataset? What would you do with data points that are considered outliers? [6 Points]
13a) Compute z-scores for the Sale Price variable. Do you note any outliers? 13b) Is there...
13a) Compute z-scores for the Sale Price variable. Do you note any outliers? 13b) Is there a relationship between Lot Size and the home's Age in years? What test do you perform and why? Now check for whether there is a difference in Lot Size for older versus younger homes (using a cutoff that makes sense). What test do you perform and why? Home ID Sale Price Lot Size Age Central Air Living Area Full Baths Half Baths Bedrooms Fireplaces...
How do store closures affect wages and spending? How do these factors affect employment? How do...
How do store closures affect wages and spending? How do these factors affect employment? How do these factors affect the demand curve? Will these factors cause the demand curve to move or shift? Explain.
How does the composition of the board of directors affect the the strategic decision making?
How does the composition of the board of directors affect the the strategic decision making?
Research topic the cause of stress and how it affect employees in the workplace to perform....
Research topic the cause of stress and how it affect employees in the workplace to perform. Help me write Literature Review
How do we choose the shape of PC sections for a given beam?
How do we choose the shape of PC sections for a given beam?
Materials Science: Composites (Chapter 16) How do we predict the stiffness and strength of the various...
Materials Science: Composites (Chapter 16) How do we predict the stiffness and strength of the various types of composites?
How do you graph outliers on a box plot when given a data set of numbers?...
How do you graph outliers on a box plot when given a data set of numbers? I found the median, lower and upper quartile numbers and have already plotted that but how do you plot outliers? Lets say the data is :1 2 3 4 5 6 7 8 9
Teaching how to do a tube feeding. How to perform an accucheck.
Teaching how to do a tube feeding. How to perform an accucheck.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT