```r
knitr::opts_chunk$set(eval = params$answers, message = FALSE)
library(ISLR)
library(tidyverse)
library(glmnet)
library(caret)
library(gridExtra)
```
For this lab you need the packages `ISLR`, `ggplot2`, `glmnet`, `caret`, and `gridExtra`.
The data set for this lab is `College`; check its help page for a description of the variables.
The lab is structured as follows:

1. Explore, normalize, and standardize the data
2. Split the data into train and test sets
3. Train filter, wrapper, and embedded methods, and compare their test MSEs
4. Introduce two-way interaction effects, and train the lasso
```r
summary(College)
```

```r
College <- College %>%
  filter(Grad.Rate <= 100,
         PhD <= 100)
```
Standardize the variables (converting the factor `Private` into a numeric variable), and display the boxplots of the standardized variables in a single plot array.

```r
College %>%
  mutate(Private = as.numeric(Private),
         across(everything(), scale)) %>%
  pivot_longer(everything(), values_to = "z-score", names_to = "variable") %>%
  ggplot(aes(`z-score`, variable)) +
  geom_boxplot() +
  theme_minimal()
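As a standalone sketch of what `scale()` does (a toy vector, not the lab data):

```r
# scale() centers a variable at its mean and divides by its standard
# deviation, so the standardized values have mean 0 and sd 1
x <- c(2, 4, 6, 8, 10)
z <- as.numeric(scale(x))
mean(z)  # 0
sd(z)    # 1
```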
Reflect the left-skewed percentage variables, log-transform the right-skewed ones, standardize everything, and save the result as `College`.

```r
College <- College %>%
  mutate(Private = as.numeric(Private),
         across(c(PhD, Top25perc, Terminal), ~ {101 - .}),
         across(c(2, 3, 4, 5, 7, 8, 11, 12, 13, 14, 17), ~ log(. + 1)),
         across(everything(), scale))
```

Display the boxplots of the transformed variables:

```r
College %>%
  pivot_longer(everything(), values_to = "z-score", names_to = "variable") %>%
  ggplot(aes(`z-score`, variable)) +
  geom_boxplot() +
  theme_minimal()
```
Display the scatterplots with `Grad.Rate` on the y-axis and each of the 16 numeric predictors on the x-axis (so exclude `Private`). Check the scatterplots for non-linearity.

```r
par(mfrow = c(4, 4))
for (i in 2:17) plot(College[, c(i, 18)], cex = .7)
```
Create an object `train` with the row numbers of the train set, containing 75% of the cases. Set the seed to 1 for reproducibility of the results.

```r
set.seed(1)
train <- createDataPartition(y = College$Grad.Rate, p = .75, list = FALSE)
```
Filter methods select a set of features before the model is trained. Here we select features from the training set based on their correlation with `Grad.Rate`; as the selection criterion we use an absolute correlation larger than 0.3.
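A minimal sketch of this filter on simulated data (the variable names are invented for illustration):

```r
# Simulate three predictors; only x1 is strongly related to y
set.seed(42)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 0.8 * x1 + rnorm(n)
d  <- data.frame(x1, x2, x3, y)

# Correlations of the predictors with the outcome (last column)
cors <- cor(d)[-4, 4]

# Features passing the |r| > .3 filter
names(which(abs(cors) > .3))
```

Note that the filter looks at each predictor marginally: it ignores redundancy among the predictors and any effect that only shows up after adjusting for other features.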
Display the correlations of the features in the training set with `Grad.Rate`. Which features satisfy the criterion?

```r
cor(College[train, ])[-18, 18] %>% round(3)
```
Make an object `vars` containing the (named) column indices of the features that satisfy our selection criterion.

```r
vars <- which(abs(cor(College[train, ])[-18, 18]) > .3)
```
Train the linear model on the training set with `Grad.Rate` as outcome and the selected features as predictors, and save the object as `train_filter`.

```r
train_filter <- lm(Grad.Rate ~ ., data = College[train, c(vars, 18)])
```
Display the summary of `train_filter`. Although all predictors have an absolute correlation of at least .3 with `Grad.Rate`, some of them are not significant as predictors. Why is that?

```r
summary(train_filter)
```
Obtain the predictions for the test set, and compute the test MSE; save it as `filter_mse`.

```r
filter_pred <- predict(train_filter, newdata = College[-train, ])
filter_mse  <- mean((College[-train, "Grad.Rate"] - filter_pred)^2)
```
Wrapper methods select features while training the model. Here we use the backward step-wise selection procedure. Check the help page of the function `step()` for assistance on how to conduct a backward step-wise selection.
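A minimal sketch of backward elimination with `step()` on simulated data (variable names invented for illustration):

```r
# Fit a full model with two noise predictors, then let step() drop
# terms one at a time as long as doing so lowers the AIC
set.seed(7)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)               # only x1 truly matters
full    <- lm(y ~ x1 + x2 + x3)
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)                       # x1 survives; the noise terms usually do not
```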
Perform a backward step-wise selection with the `step()` procedure. Save the result under the name `train_step`.

```r
train_step <- step(lm(Grad.Rate ~ ., College[train, ]))
```
Display the summary of `train_step`, obtain the predictions for the test set, and compute the test MSE; save it as `step_mse`.

```r
summary(train_step)
pred_step <- predict(train_step, newdata = College[-train, ])
(step_mse <- mean((College[-train, "Grad.Rate"] - pred_step)^2))
```
In the embedded method, the feature selection process is embedded in the model training phase. We use the function `glmnet()` to fit the lasso and ridge models to the training data. This function computes the deviance for a sequence of values of the hyperparameter `lambda`, which determines the budget for the coefficients. The function `cv.glmnet()` is then applied to the object created with `glmnet()` to determine the optimal value of `lambda` with cross-validation.
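To make the "budget" idea concrete: in the special case of an orthonormal design, the lasso shrinks each least-squares coefficient by soft-thresholding, while the ridge rescales it. A standalone sketch of that textbook formula (this is not what `glmnet()` computes internally for general designs, which uses coordinate descent):

```r
# Soft-thresholding: S(b, lambda) = sign(b) * max(|b| - lambda, 0).
# Coefficients smaller than lambda in absolute value become exactly 0,
# which is why the lasso performs feature selection.
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)
soft_threshold(c(-2, -0.5, 0.3, 1.5), lambda = 1)
# -1.0  0.0  0.0  0.5

# Ridge shrinkage for the same case rescales but never hits zero exactly
ridge_shrink <- function(b, lambda) b / (1 + lambda)
ridge_shrink(c(-2, -0.5, 0.3, 1.5), lambda = 1)
```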
Create the objects `train_lasso` and `train_ridge` with `glmnet()`. Set the argument `alpha = 1` for the lasso, and `alpha = 0` for the ridge. Do not forget that the argument `x` has to be of class `matrix`!

```r
train_ridge <- glmnet(x = as.matrix(College[train, -18]),
                      y = College[train, 18],
                      alpha = 0)

train_lasso <- glmnet(x = as.matrix(College[train, -18]),
                      y = College[train, 18],
                      alpha = 1)
```
Plot the `glmnet` objects. Interpret the plots.

```r
par(mfrow = c(2, 1))
plot(train_ridge, label = TRUE)
plot(train_lasso, label = TRUE)
```
Use `cv.glmnet()` to find the optimal `lambda` values for `alpha = 0` and `alpha = 1`. Include the argument `type.measure = "mse"`. Save the objects as `cv_ridge` and `cv_lasso`.

```r
cv_ridge <- cv.glmnet(x = as.matrix(College[train, -18]),
                      y = College[train, 18],
                      alpha = 0,
                      type.measure = "mse")

cv_lasso <- cv.glmnet(x = as.matrix(College[train, -18]),
                      y = College[train, 18],
                      alpha = 1,
                      type.measure = "mse")
```
cv.glmnet
objects, and interpret.par(mfrow=c(1, 2))
plot(cv_ridge)
plot(cv_lasso)
Display the coefficients of the full linear model, `cv_ridge`, and `cv_lasso` in the columns. Round all coefficients to three decimals. Compare the effects of the shrinkage.

```r
round(cbind(coef(lm(Grad.Rate ~ ., College[train, ])), coef(cv_ridge), coef(cv_lasso)), 3)
```
Obtain the predicted values with `predict.cv.glmnet()`, and compute the test MSEs for the ridge and the lasso.

```r
pred_ridge <- predict(cv_ridge, newx = as.matrix(College[-train, -18]))
pred_lasso <- predict(cv_lasso, newx = as.matrix(College[-train, -18]))

ridge_mse <- mean((College[-train, "Grad.Rate"] - pred_ridge)^2)
lasso_mse <- mean((College[-train, "Grad.Rate"] - pred_lasso)^2)
```
Compare the test MSEs of the filter method, the `step()` function, and the ridge and lasso. Which method performs best on the test data?

```r
data.frame(filter_mse, step_mse, ridge_mse, lasso_mse)
```
So far we have not considered any pairwise interactions in our models. It may very well be that the effect of one feature on `Grad.Rate` is moderated by another feature. The problem with finding relevant interactions is that there are a lot of them (\({17\choose2}=136\)). The lasso, however, is particularly suited for such a task.
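A quick sketch of how `model.matrix()` expands pairwise interactions, shown on a toy data frame (variable names invented for illustration):

```r
# With 17 features there are choose(17, 2) = 136 candidate pairs
choose(17, 2)

# ~ .^2 generates all main effects plus all pairwise products; with
# 3 predictors that is 1 intercept + 3 main effects + choose(3, 2) = 3
# interaction columns, i.e. 7 columns in total
d <- data.frame(y = rnorm(5), a = rnorm(5), b = rnorm(5), c = rnorm(5))
colnames(model.matrix(y ~ .^2, d))
```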
Make the model matrix `Grad.Rate ~ .^2` for the training set (excluding the intercept), and cross-validate the lasso version of this model with `cv.glmnet()`. Save the result as `cv_lasso2`.

```r
cv_lasso2 <- cv.glmnet(x = model.matrix(Grad.Rate ~ .^2, College)[train, -1],
                       y = College$Grad.Rate[train],
                       alpha = 1,
                       type.measure = "mse")
```
Plot `cv_lasso2`, and display the coefficients rounded to three decimals. Which features are selected at the optimal `lambda`?

```r
plot(cv_lasso2)
round(coef(cv_lasso2), 3)
```
Compute the test MSE of `cv_lasso2`. How does it compare to the models without interactions?

```r
pred_lasso2 <- predict(cv_lasso2, newx = model.matrix(Grad.Rate ~ .^2, College)[-train, -1])
(lasso2_mse <- mean((College[-train, "Grad.Rate"] - pred_lasso2)^2))
```
END OF LAB