knitr::opts_chunk$set(eval=params$answers, message=FALSE)
library(stylo)
library(psych)
library(factoextra)
library(dplyr)
For this lab you need the packages stylo, psych, and factoextra.
The wine data include 13 features of three different wine cultivars in Italy.
Load the wine.Rdata file into the workspace, and display a summary of the wine data.
load("data/wine.Rdata")
summary(wine)
Perform k-means analyses on the standardized wine data (excluding the variable Cultivar) with 1 to 6 centers, respectively, and save the objects. Then plot the total within sum of squares of the six analyses to determine the optimal number of clusters.
set.seed(1)
k_1 <- kmeans(scale(wine[, -1]), centers = 1, nstart = 10)
k_2 <- kmeans(scale(wine[, -1]), centers = 2, nstart = 10)
k_3 <- kmeans(scale(wine[, -1]), centers = 3, nstart = 10)
k_4 <- kmeans(scale(wine[, -1]), centers = 4, nstart = 10)
k_5 <- kmeans(scale(wine[, -1]), centers = 5, nstart = 10)
k_6 <- kmeans(scale(wine[, -1]), centers = 6, nstart = 10)
plot(1:6,
     c(k_1$tot.withinss,
       k_2$tot.withinss,
       k_3$tot.withinss,
       k_4$tot.withinss,
       k_5$tot.withinss,
       k_6$tot.withinss),
     type = "b", ylab = "within clusters ss", xlab = "k")
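The six separate calls above can also be written as a single loop. The following is a minimal sketch, assuming the built-in iris measurements as a stand-in for the wine data; the names `x` and `wss` are illustrative and not part of the lab.

```r
# Stand-in data (assumption): the four numeric iris measurements, standardized.
x <- scale(iris[, 1:4])

set.seed(1)
# Total within-cluster sum of squares for k = 1, ..., 6.
wss <- sapply(1:6, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Elbow plot: look for the k after which the curve flattens out.
plot(1:6, wss, type = "b", xlab = "k", ylab = "within clusters ss")
```

With a loop, changing the range of k or the value of nstart only has to be done in one place.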
Tabulate the clusters of the 3-cluster solution against the variable Cultivar of the wine data. Did the cluster analysis recover the cultivars?
table(obs = wine$Cultivar, est = k_3$cluster)
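Reading such a table takes some care, because the cluster numbers are arbitrary labels. The sketch below illustrates this on the built-in iris data, used here as an assumed stand-in with three known classes; `k3` and `tab` are illustrative names.

```r
# Stand-in data (assumption): iris has three known species,
# playing the role of the three cultivars.
x <- scale(iris[, 1:4])

set.seed(1)
k3 <- kmeans(x, centers = 3, nstart = 10)

# Rows: observed class; columns: estimated cluster.
# Cluster numbers are arbitrary, so look for one dominant cell per row
# rather than for a filled diagonal.
tab <- table(obs = iris$Species, est = k3$cluster)
tab

# Share of observations that fall in the dominant cluster of their class.
sum(apply(tab, 1, max)) / sum(tab)
```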
In the previous lab we did a PCA on the frequently used words in the novels by the Brontë sisters and Jane Austen. In this lab we perform a k-means analysis on these data.
Use the code from the previous lab for the novels data that produces the freqs object. Then perform k-means analyses on the standardized freqs with 1 to 6 centers, and plot the total within sum of squares.
k_1 <- kmeans(scale(freqs), centers = 1)
k_2 <- kmeans(scale(freqs), centers = 2)
k_3 <- kmeans(scale(freqs), centers = 3)
k_4 <- kmeans(scale(freqs), centers = 4)
k_5 <- kmeans(scale(freqs), centers = 5)
k_6 <- kmeans(scale(freqs), centers = 6)
plot(1:6,
     c(k_1$tot.withinss,
       k_2$tot.withinss,
       k_3$tot.withinss,
       k_4$tot.withinss,
       k_5$tot.withinss,
       k_6$tot.withinss),
     type = "b", ylab = "within clusters ss", xlab = "k")
Perform a k-means analysis on the standardized freqs object with 4 clusters, i.e. one cluster for each author. Tabulate the row names of freqs (author_novel) against the cluster numbers. How well did the k-means solution do?
table(rownames(freqs), k_4$cluster)
Plot the clusters with the function fviz_cluster of the package factoextra. To make this function work, you first have to convert the class of the freqs object from stylo.data to data.frame.
fviz_cluster(k_4, data = scale(as.data.frame(unclass(freqs))))
Perform a hierarchical cluster analysis on the standardized freqs data, plot the dendrogram, and add rectangles for 4 clusters. Do the clusters correspond to the authors?
cl <- hclust(dist(scale(freqs)))
plot(cl)
rect.hclust(cl, 4)
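To check the correspondence numerically rather than by eye, the tree can be cut into a fixed number of groups with cutree(). This is a minimal sketch on the built-in USArrests data, an assumed stand-in for freqs; `hc` and `groups` are illustrative names.

```r
# Stand-in data (assumption): USArrests, 50 states with 4 numeric features.
hc <- hclust(dist(scale(USArrests)))  # complete linkage is the default

plot(hc, cex = .6)
rect.hclust(hc, k = 4)  # draw boxes around the 4 clusters

# cutree() returns the cluster membership that the rectangles display.
groups <- cutree(hc, k = 4)
table(groups)
```

The resulting membership vector can be tabulated against known labels in the same way as the k-means cluster numbers.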
The data set Animals contains the brain and body weights for 28 species of land animals. These data lend themselves very well to making a taxonomy. We will perform a series of hierarchical cluster analyses with varying options.
Load the animals.Rdata file into the workspace, and display the row names and the summary of the animals data.
load("animals.Rdata")
rownames(animals)
summary(animals)
Make dendrograms for two hierarchical cluster analyses of the animals data, one with the default linkage method “complete”, and one with “ward.D2” (if you like, you can also look at other linkage methods). What do you expect to see, given that you did not standardize the features?
plot(hclust(dist(animals), method = "complete"), cex = .8)
plot(hclust(dist(animals), method = "ward.D2"), cex = .8)
# The same two analyses on the standardized features
plot(hclust(dist(scale(animals))), cex = .8)
plot(hclust(dist(scale(animals)), method = "ward.D2"), cex = .8)
# Boxplots of the standardized features
boxplot(scale(animals))
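The effect of standardizing can be seen directly in the distance matrix. The sketch below uses a small made-up data frame (`toy` and all its values are hypothetical) in which one feature is measured on a much larger scale than the other.

```r
# Hypothetical toy data: feature a varies in the thousands, feature b in units.
toy <- data.frame(a = c(1000, 2000, 1100), b = c(1, 2, 9),
                  row.names = c("p", "q", "r"))

# Unscaled: the Euclidean distances are almost identical to the
# distances computed on feature a alone, so b is effectively ignored.
d_raw <- dist(toy)
d_a   <- dist(toy["a"])
d_raw
d_a

# Scaled: both features have standard deviation 1, so differences
# on b now count as much as differences on a.
d_std <- dist(scale(toy))
d_std
```

Any clustering built on the unscaled distances therefore mostly reflects the feature with the largest spread.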
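Take the logarithm of the features bw and brw of animals.
animals <- animals %>% mutate_at(vars(bw, brw), log)
# Repeat the analyses on the log-transformed, standardized features
plot(hclust(dist(scale(animals))), cex = .8)
plot(hclust(dist(scale(animals)), method = "ward.D2"), cex = .8)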
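Why the log helps can be illustrated with a handful of values that span several orders of magnitude. This is a minimal sketch; the vector `w` and its rounded weights are hypothetical and are not taken from the animals data.

```r
# Hypothetical body weights (kg) spanning several orders of magnitude.
w <- c(mouse = 0.02, cat = 3, human = 60, cow = 500, elephant = 6000)

# Standardized raw values: the largest value sits far away from the rest,
# which are squeezed together at the bottom of the scale.
z_raw <- scale(w)[, 1]
round(z_raw, 2)

# Log first, then standardize: the values are spread much more evenly,
# so distances between the smaller animals become visible as well.
z_log <- scale(log(w))[, 1]
round(z_log, 2)
```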
END OF LAB