In this lab you perform a PCA to a personality test, a bag of words and photo’s of faces.
For this lab you need the packages stylo
and psych
, and the data sets in Profession.Rdata
and faces.RData
. The package stylo
includes nine novels by the Bronthe sisters and Jane Austin, and functions to compute the relative frequencies of the words used in these novels.
Warning for Mac OS users: the stylo
package requires X11 support being installed. For details about installing X11, see the github page stylo: Installation Issues. If you do not want to install X11, do not install the package stylo
, but instead load the file novels.Rdata
which contains the data frame freqs
that you need for the PCA.
::opts_chunk$set(eval=params$answers)
knitr
library(stylo) # for Mac OS; only when X11 is installed
library(psych)
library(dplyr)
The file “Profession.Rdata” contains the data set Profession
with preference scores of 5389 subjects on 48 professions. The professions are subdivided in 6 sets of 8, with each set corresponding to different personality type (e.g. items R1 to R8 are professions that are supposed to be appealing to personality type “R = Realistic (Doers)”). The code book for the data is Profession.txt
. The aim of the analysis is to find a principal component for each of the six personality types.
profession
data. How many PCs does the plot suggest?load("Profession.RData")
VSS.scree(Profession)
prof_pca
.<- principal(Profession, nfactors = 6, rotate = "none") prof_pca
loadings
of the first 6 components with the function print()
, and set the cutoff to 0.4. Do the loadings correspond to the personality types?print(loadings(prof_pca), cutoff=0.4)
prof_rot
, and display the rotated component loadings. Did the rotation help interpretation of the personality types?<- principal(Profession, nfactors = 6)
prof_rot print(loadings(prof_rot), cutoff = 0.4)
data.frame(mean = apply(prof_pca$scores, 2 , mean), sd = apply(prof_pca$scores, 2, sd)) %>% round(4)
<- as.data.frame(prof_rot$scores)
d names(d) <- c("Organizer", "Doer", "Thinker", "Creator", "Helper", "Persuader")
sample(5389, 5), ] %>% round(1) d[
In this exercise you use PCA to analyze word frequencies in nine novels by the the three Bronthe sisters and Jane Austin. The data for this analysis are in the novels
object of the package stylo
. (Mac OS users without a X11 installation should skip exercise a.)
novels
object with the command data()
, and summarize the object with the function summary()
.data(novels)
summary(novels)
freqs
. (Mac OS users without a X11 installation should instead load the file novels.Rdata
.)<- txt.to.words.ext(novels,
tokens preserve.case = FALSE)
<- make.frequency.list(tokens,
frequent head = 10)
<- make.table.of.frequencies(tokens,
freqs features = frequent)
# load("novels.Rdata") for Mac OS users without X11
attributes(freqs)
freqs
object with the function prcomp()
, and save the result.<- prcomp(freqs) author_pca
prcomp
object. According to the elbow criterion, how many PCs do we need to extract?screeplot(author_pca, type="line")
biplot(author_pca, cex = .8)
The data set for this exercise is called faces
and in the file faces.RData
. The data for this analysis consist of 2410 pictures of faces of which the pixels are converted to gray scales in the range \((0, 255)\). The pictures all have 8064 pixels \((96\times84)\). The aim of the analysis to is to create a reduced number of principal components that summarize stereotype faces, a.k.a eigenfaces.
load()
.load("faces.RData")
faces
.summary(faces[, 1:5])
par(mfrow=c(3,3))
for(i in sample(1:2410, size=9)){
<- matrix(rev(faces[,i]), nrow = 84, ncol = 96)
face image(face, col = gray(0:255 / 255 ))
}
faces
using prcomp()
(may take a while), and save the result.<- prcomp(faces) faces_pca
par(mfrow=c(3, 3))
for(i in 1:9){
<- matrix(rev(faces_pca$x[, i]), nrow = 84, ncol = 96)
face image(face, col = gray(0:255 / 255 ))
}
par(mfrow=c(3, 3))
for(i in 2402:2410){
<- matrix(rev(faces_pca$x[, i]), nrow = 84, ncol = 96)
face image(face, col = gray(0:255 / 255 ))
}
plot()
function, and also do this for the 2nd component. These plots do not have any sensible interpretation, but they just look nice!par(mfrow=c(1, 2))
plot(faces_pca$x[, 1])
plot(faces_pca$x[, 2])
END OF LAB