For this lab you need the R package ggplot2. Install the package if you haven’t done that yet, and load it into the workspace with the function library(). The aim of this lab is to get acquainted with building plots layer by layer with the ggplot() function, and with the basic geometries for displaying numeric and categorical variables.
For help on the ggplot2 functions, check the Data Visualization Cheat Sheet.
# Install ggplot2 only if it is not available yet
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
Make the exercises in the R Markdown template 1A-Lab-Template.Rmd. You’ll find instructions on how to work with R Markdown documents in the template.
The code for each exercise is shown directly after it, but first try to find the solution yourself.
In this lab we use the data set txhousing (run ?txhousing at the prompt to see its help page).
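Before plotting, it can help to look at the data. The following is a minimal sketch, assuming ggplot2 is installed; the name missing_per_column is only illustrative. It shows the structure of txhousing and counts the missing values per column, which are the source of the warnings that appear later in the lab.

```r
library(ggplot2)

# Structure of the data: column names, types, and the first few values
str(txhousing)

# Number of missing values in each column
missing_per_column <- colSums(is.na(txhousing))
missing_per_column
```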
The data and aesthetics arguments tell ggplot() where to find the data, and where to map the variables. Together they specify the axes of the plot, but they do not draw anything yet.
Run ggplot(data = txhousing). This creates an empty plot surface.
ggplot(data = txhousing)
Map volume to the x-axis. Check the result.
ggplot(data = txhousing, mapping = aes(x = volume))
Map sales to the y-axis. Again check the result.
ggplot(data = txhousing, mapping = aes(x = volume, y = sales))
The geoms determine how the data are represented in the plot. There are many different geoms, including those for displaying histograms, lines, points, densities, bars, boxplots, etc.
The variables of interest are volume and sales, and the common way to represent two numeric variables is a scatter plot. Find the appropriate geom to display a scatter plot of volume and sales. (ggplot generates a warning about missing values. You can prevent this by using na.omit(txhousing) as data specification.)
ggplot(data = na.omit(txhousing), mapping = aes(x = volume, y = sales)) +
geom_point()
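Note that na.omit() removes every row that has a missing value in any column, including columns that are not plotted. As an alternative sketch (not part of the lab template; the names n_all, n_complete and p_silent are illustrative), the na.rm argument of geom_point() keeps the full data set and drops incomplete points without a warning.

```r
library(ggplot2)

# Compare the number of rows before and after na.omit()
n_all <- nrow(txhousing)
n_complete <- nrow(na.omit(txhousing))
c(all = n_all, complete = n_complete)

# Keep all rows; the geom silently drops points with missing x or y
p_silent <- ggplot(data = txhousing, mapping = aes(x = volume, y = sales)) +
  geom_point(na.rm = TRUE)
p_silent
```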
Add a smoothed line to the scatter plot of volume and sales.
ggplot(data = na.omit(txhousing), mapping = aes(x = volume, y = sales)) +
geom_point() +
geom_smooth()
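Building a plot layer by layer also works with stored objects. The sketch below assumes nothing beyond ggplot2 itself; the object names base_plot, with_points and with_smooth are made up for illustration. It shows that each + returns a new plot object with one more layer.

```r
library(ggplot2)

# Data and aesthetic mappings only; this object has no layers yet
base_plot <- ggplot(data = na.omit(txhousing),
                    mapping = aes(x = volume, y = sales))

# Each `+` returns a new plot object with one more layer
with_points <- base_plot + geom_point()
with_smooth <- with_points + geom_smooth()

# Typing the name of a plot object at the prompt draws it
with_smooth
```

Storing the base plot this way makes it easy to try different geoms on the same data and mappings without retyping them.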
Use the aesthetic color to color the points in the scatter plot by year.
ggplot(data = na.omit(txhousing), mapping = aes(x = volume, y = sales, color = year)) +
geom_point() +
geom_smooth()
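By default, geom_smooth() chooses a smoothing method based on the size of the data and prints a message about its choice, so the fitted curve is not necessarily a straight regression line. If a linear fit is wanted, one option is to request it explicitly, as in this sketch (the name p_lm is illustrative):

```r
library(ggplot2)

p_lm <- ggplot(data = na.omit(txhousing),
               mapping = aes(x = volume, y = sales)) +
  geom_point() +
  # method = "lm" fits a straight line; formula = y ~ x states the model explicitly
  geom_smooth(method = "lm", formula = y ~ x)
p_lm
```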
The scale on which the axes are represented can be changed by adding a scale layer to the plot. For example, scale_y_sqrt() transforms the y-axis to the square root of the original values, and scale_y_log10() takes the base-10 logarithm of the original values. These kinds of transformations can be useful to reduce heteroscedasticity (non-constant variance around the regression line).
ggplot(data = na.omit(txhousing), mapping = aes(x = volume, y = sales, col = year)) +
geom_point() +
geom_smooth() +
scale_y_log10() +
scale_x_log10()
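For comparison, the transformation can also be applied to the variables inside aes() instead of through a scale layer. In this sketch (the names tx and p_logged are illustrative), the points land in the same positions, but the axes are labelled in log10 units rather than in the original values, which is usually harder to read.

```r
library(ggplot2)

tx <- na.omit(txhousing)

# Transform the data in the mapping instead of transforming the scale
p_logged <- ggplot(data = tx,
                   mapping = aes(x = log10(volume), y = log10(sales))) +
  geom_point()
p_logged
```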
Facets are multiples of the plot, split up by the levels of a categorical variable.
Remove year from the aesthetics, and use the facet_wrap() or facet_grid() function to make multiples by year.
ggplot(data = na.omit(txhousing), mapping = aes(x = volume, y = sales)) +
geom_point() +
geom_smooth() +
scale_y_log10() +
scale_x_log10() +
facet_wrap(vars(year))
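The two facet functions lay out the panels differently: facet_wrap() wraps the panels into rows and columns, while facet_grid() places them along a single row or column per variable. A small sketch of both, with illustrative object names and an arbitrary choice of four columns:

```r
library(ggplot2)

p <- ggplot(data = na.omit(txhousing),
            mapping = aes(x = volume, y = sales)) +
  geom_point() +
  scale_y_log10() +
  scale_x_log10()

# Wrap the yearly panels into a layout with four columns
p_wrap <- p + facet_wrap(vars(year), ncol = 4)
p_wrap

# Put all yearly panels next to each other in one row
p_grid <- p + facet_grid(cols = vars(year))
p_grid
```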
Themes are used to change the look of the plot.
ggplot(data = na.omit(txhousing), mapping = aes(x = volume, y = sales)) +
geom_point() +
geom_smooth() +
scale_y_log10() +
scale_x_log10() +
facet_wrap(vars(year)) +
theme_minimal()
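A complete theme such as theme_minimal() can be adjusted further by adding a theme() call after it. The settings in this sketch (a larger base font size and bold facet labels) are arbitrary examples, not recommendations from the lab, and p_theme is an illustrative name.

```r
library(ggplot2)

p_theme <- ggplot(data = na.omit(txhousing),
                  mapping = aes(x = volume, y = sales)) +
  geom_point() +
  scale_y_log10() +
  scale_x_log10() +
  facet_wrap(vars(year)) +
  # Start from a complete theme with a larger base font size
  theme_minimal(base_size = 14) +
  # Then override a single element: bold facet labels
  theme(strip.text = element_text(face = "bold"))
p_theme
```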
In the previous exercise we used geoms for two numeric variables to display a scatter plot with a regression line. We will now briefly explore some geoms for categorical variables, and for combinations of numeric and categorical variables.
Make a histogram of sales, with the x-axis displayed on a logarithmic scale.
ggplot(data = na.omit(txhousing), mapping = aes(x = sales)) +
geom_histogram() +
scale_x_log10()
Add the aesthetic fill = factor(month) to the plot.
ggplot(data = na.omit(txhousing), mapping = aes(x = sales, fill = factor(month))) +
geom_histogram() +
scale_x_log10()
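geom_histogram() uses 30 bins by default and prints a message suggesting that a better value be picked. The number of bins can be set with the bins argument, as in this sketch; the value 40 is an arbitrary choice for illustration, and p_hist is an illustrative name.

```r
library(ggplot2)

p_hist <- ggplot(data = na.omit(txhousing), mapping = aes(x = sales)) +
  # Set the number of bins explicitly instead of relying on the default
  geom_histogram(bins = 40) +
  scale_x_log10()
p_hist
```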
ggplot(data = na.omit(txhousing), mapping = aes(x = sales)) +
geom_histogram() +
scale_x_log10() +
facet_wrap(vars(month))
ggplot(data = na.omit(txhousing), mapping = aes(x = sales)) +
geom_density() +
scale_x_log10() +
facet_wrap(vars(month))
Make boxplots of sales for each month (specify month as factor).
ggplot(data = na.omit(txhousing), mapping = aes(x = factor(month), y = sales)) +
geom_boxplot() +
scale_y_log10()
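With factor(month) the x-axis shows the month numbers 1 to 12. As an optional sketch, the factor can be given month names as labels using R's built-in month.abb vector; the names tx, month_name and p_box are made up for this example.

```r
library(ggplot2)

tx <- na.omit(txhousing)

# Convert the month number to a factor labelled Jan, Feb, ..., Dec
tx$month_name <- factor(tx$month, levels = 1:12, labels = month.abb)

p_box <- ggplot(data = tx, mapping = aes(x = month_name, y = sales)) +
  geom_boxplot() +
  scale_y_log10()
p_box
```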
END OF LAB.