Dual y-Axis
An R tutorial on how to create plots with dual y-axes
Introduction
In some cases, a graph with two y-axes is desired for visualizing two different sets of data. However, this is sometimes frowned upon, since the relative scaling of the two axes is arbitrary and can be adjusted to fit a desired narrative.
With that said, there are still situations where dual y-axes are appropriate. This vignette will show you how to create them with ggplot2, despite the package authors' disagreement.
Prepare Data
# devtools::install_github("derekmichaelwright/agData")
library(agData)
library(dplyr)    # %>%, mutate(), as_tibble()
library(ggplot2)  # ggplot(), sec_axis()
# Prep data
myCaption <- c("www.dblogr.com/ or derekmichaelwright.github.io/dblogr/ | Data: AGILE")
xx <- read.csv("data_dual_y_axis.csv") %>%
  mutate(Date = as.Date(Date))
as_tibble(xx)
## # A tibble: 214 × 4
## Date Day.Length Soil.Temperature Air.Temperature
## <date> <dbl> <dbl> <dbl>
## 1 2016-11-29 9.38 5.56 8.56
## 2 2016-11-30 9.35 5.34 8.74
## 3 2016-12-01 9.33 4.65 8.75
## 4 2016-12-02 9.31 4.17 8.68
## 5 2016-12-03 9.29 3.53 8.99
## 6 2016-12-04 9.27 3.66 10.9
## 7 2016-12-05 9.26 3.21 10.6
## 8 2016-12-06 9.24 2.89 9.21
## 9 2016-12-07 9.22 2.96 11.0
## 10 2016-12-08 9.21 3.02 9.90
## # ℹ 204 more rows
Single y-axis
First, let's create a plot of air temperature.
mp1 <- ggplot(xx, aes(x = Date)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(aes(y = Air.Temperature, color = "1"),
            alpha = 0.7, linewidth = 1.25) +
  scale_color_manual(name = NULL, values = "red", labels = "Air Temp") +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  labs(title = "Environmental Data", x = NULL,
       y = "Temperature (\u00B0C)", caption = myCaption) +
  theme_agData(legend.position = "bottom")
mp1
Now let's add a second set of data (soil temperature), which uses the same unit as the first (°C).
mp2 <- mp1 +
  geom_line(aes(y = Soil.Temperature, color = "2"),
            alpha = 0.7, linewidth = 1.25) +
  scale_color_manual(name = NULL, values = c("red", "darkred"),
                     labels = c("Air Temp", "Soil Temp"))
mp2
But when we add a third data set (day length) that uses different units (hours), problems arise: its values are plotted against an axis labeled in °C.
mp3 <- mp2 +
  geom_line(aes(y = Day.Length, color = "3"),
            alpha = 0.7, linewidth = 1.25) +
  scale_color_manual(name = NULL, values = c("red", "darkred", "steelblue"),
                     labels = c("Air Temp", "Soil Temp", "Day Length"))
mp3
Data Scaling
In this case, the ranges of day length and temperature are drastically different:
max(xx$Air.Temperature) - min(xx$Air.Temperature)
## [1] 31.9055
max(xx$Day.Length) - min(xx$Day.Length)
## [1] 5.78
To present the data better, we need to rescale day length onto the temperature range:
\[y_{scaled}=(y_{2i}-min(y_2))*\frac{max(y_1)-min(y_1)}{max(y_2)-min(y_2)}+min(y_1)\]
where:
- \(y_1\) = Set of values you want to scale to
- \(y_2\) = Set of values to be rescaled to the min and max of \(y_1\)
- \(y_{2i}\) = Value from the \(y_2\) set to be rescaled
In our case:
- \(y_1\) = Air and soil temperature combined
- \(y_2\) = Day Length
- \(y_{2i}\) = Day Length on a specific day
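The formula above can be wrapped in a small R function. This is a minimal sketch: rescale_to() is a hypothetical helper (not part of agData or scales), and it assumes \(y_2\) is not constant, since the denominator would otherwise be zero.

```r
# Hypothetical helper: map y2 onto the range of y1 (min-max scaling)
# y_scaled = (y2i - min(y2)) * (max(y1) - min(y1)) / (max(y2) - min(y2)) + min(y1)
rescale_to <- function(y2, y1) {
  y1_rng <- range(y1, na.rm = TRUE)
  y2_rng <- range(y2, na.rm = TRUE)
  if (diff(y2_rng) == 0) stop("y2 must not be constant")
  (y2 - y2_rng[1]) * diff(y1_rng) / diff(y2_rng) + y1_rng[1]
}

# Made-up values: y2 spans 5 to 10, y1 spans 20 to 40
rescale_to(c(5, 7.5, 10), c(20, 40))
## [1] 20 30 40
```

With the tutorial data, rescale_to(xx$Day.Length, c(xx$Soil.Temperature, xx$Air.Temperature)) should reproduce the Day.Length_scaled values computed next.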
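# Range of the data on the primary (temperature) axis...
y1_min <- min(c(xx$Soil.Temperature, xx$Air.Temperature))
y1_max <- max(c(xx$Soil.Temperature, xx$Air.Temperature))
# ...and of the data to be rescaled (day length)
y2_min <- min(xx$Day.Length)
y2_max <- max(xx$Day.Length)
# Map day length onto the temperature range
xx <- xx %>%
  mutate(Day.Length_scaled = (Day.Length - y2_min) * (y1_max - y1_min) /
           (y2_max - y2_min) + y1_min)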
Scaling the data can also be done with the rescale() function from the scales package, which maps the observed range of a vector onto the range given by its to argument.
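Before applying rescale() to the tutorial data, a quick check on made-up values (not the tutorial's CSV) suggests it gives the same result as the manual formula. This sketch assumes the scales package is installed, which it is whenever ggplot2 is, since ggplot2 depends on it.

```r
# Made-up day lengths (hours) and a target temperature range (degrees C)
day_length <- c(9.2, 10.5, 12.0, 13.8, 15.0)
y1_range <- c(-5, 27)

# Manual min-max scaling onto y1_range
manual <- (day_length - min(day_length)) * diff(y1_range) /
  diff(range(day_length)) + y1_range[1]

# scales::rescale() maps the observed range of its input onto `to`
via_scales <- scales::rescale(day_length, to = y1_range)

all.equal(manual, via_scales)
## [1] TRUE
```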
xx <- xx %>%
  mutate(Day.Length_scaled = scales::rescale(Day.Length, to = c(y1_min, y1_max)))
mp4 <- mp3 +
  geom_line(data = xx, aes(y = Day.Length_scaled, color = "4"),
            alpha = 0.7, linewidth = 1.25) +
  scale_color_manual(name = NULL,
                     values = c("red", "darkred", "steelblue", "darkblue"),
                     labels = c("Air Temp", "Soil Temp", "Day Length", "Day Length*"))
mp4
That looks better. However, we still need a second y-axis that shows day length in hours, which requires inverting the scaling:
\[y_{2i}=(y_{scaled}-min(y_1))*\frac{max(y_2)-min(y_2)}{max(y_1)-min(y_1)}+min(y_2)\]
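The forward and inverse formulas are a matched pair, and the secondary axis needs the inverse. As a sketch, both can be built from the two ranges at once; make_rescalers() is a hypothetical helper, and the ranges (20 to 40 and 5 to 10) are made-up values.

```r
# Hypothetical helper: build forward and inverse rescaling functions
# from the ranges of y1 (primary axis) and y2 (data to rescale)
make_rescalers <- function(y1, y2) {
  r1 <- range(y1, na.rm = TRUE)
  r2 <- range(y2, na.rm = TRUE)
  list(
    # y2 values -> positions on the y1 scale
    forward = function(v) (v - r2[1]) * diff(r1) / diff(r2) + r1[1],
    # positions on the y1 scale -> y2 values (what the secondary axis needs)
    inverse = function(v) (v - r1[1]) * diff(r2) / diff(r1) + r2[1]
  )
}

rs <- make_rescalers(y1 = c(20, 40), y2 = c(5, 10))
rs$forward(7.5)
## [1] 30
rs$inverse(30)
## [1] 7.5
all.equal(rs$inverse(rs$forward(c(5, 6, 9.3))), c(5, 6, 9.3))
## [1] TRUE
```

Because r1 and r2 are computed when make_rescalers() is called, the returned functions keep those ranges even if global variables such as y1_min are reassigned later in the session. Since sec_axis() accepts a function as well as a formula, something like sec_axis(rs$inverse, name = "Hours") should then work with ranges taken from the tutorial data.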
Double y-axis
# Prep sec_axis: map positions on the temperature scale back to hours
mySA <- sec_axis(~ (. - y1_min) * (y2_max - y2_min) / (y1_max - y1_min) + y2_min,
                 name = "Hours", breaks = 9:14)
# Plot
mp5 <- mp2 +
  geom_line(data = xx, aes(y = Day.Length_scaled, color = "4"),
            alpha = 0.7, linewidth = 1.25) +
  scale_color_manual(name = NULL, values = c("red", "darkred", "darkblue"),
                     labels = c("Air Temp", "Soil Temp", "Day Length")) +
  scale_y_continuous(sec.axis = mySA)
mp5
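For readers without agData or the CSV file, the same technique can be sketched with ggplot2 alone. Everything here is made up for illustration (the data frame df, its columns, and the helper hours_from_scaled()); the secondary-axis transform is passed as a function rather than a formula, which sec_axis() also accepts.

```r
library(ggplot2)

# Made-up data: temperature (degrees C) and day length (hours) over 100 days
df <- data.frame(day = 1:100)
df$temp  <- 10 * sin(df$day / 15) + 5
df$hours <- 9 + 5 * df$day / 100

# Ranges of the primary series (temperature) and the series to rescale (hours)
t_rng <- range(df$temp)
h_rng <- range(df$hours)

# Forward formula: hours -> positions on the temperature scale
df$hours_scaled <- (df$hours - h_rng[1]) * diff(t_rng) / diff(h_rng) + t_rng[1]
# Inverse formula: temperature-scale positions -> hours (for the secondary axis)
hours_from_scaled <- function(y) (y - t_rng[1]) * diff(h_rng) / diff(t_rng) + h_rng[1]

p <- ggplot(df, aes(x = day)) +
  geom_line(aes(y = temp, color = "Temperature")) +
  geom_line(aes(y = hours_scaled, color = "Day Length")) +
  scale_color_manual(name = NULL,
                     values = c("Temperature" = "red", "Day Length" = "darkblue")) +
  scale_y_continuous(name = "Temperature (\u00B0C)",
                     sec.axis = sec_axis(hours_from_scaled, name = "Hours")) +
  theme(legend.position = "bottom")
p
```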
To better visualize this rescaling, consider a simpler example where \(y_1\) spans 20 to 40, \(y_2\) spans 5 to 10, and we rescale the value \(y_{2i} = 7.5\).
\[y_{scaled}=(y_{2i}-min(y_2))*\frac{max(y_1)-min(y_1)}{max(y_2)-min(y_2)}+min(y_1)\]
\[y_{scaled}=(7.5-5)*\frac{40-20}{10-5}+20=30\]
\[y_{2i}=(y_{scaled}-min(y_1))*\frac{max(y_2)-min(y_2)}{max(y_1)-min(y_1)}+min(y_2)\]
\[y_{2i}=(30-20)*\frac{10-5}{40-20}+5=7.5\]
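The same arithmetic shows where each secondary-axis break lands on the primary axis. Applying the forward formula to the breaks 5:10 used in the plot code that follows gives evenly spaced positions between 20 and 40, and the inverse recovers the worked example.

```r
# Forward formula with the example ranges: y1 in [20, 40], y2 in [5, 10]
y2_breaks <- 5:10
y1_positions <- (y2_breaks - 5) * (40 - 20) / (10 - 5) + 20
y1_positions
## [1] 20 24 28 32 36 40

# Inverse formula: 30 on the y1 axis corresponds to 7.5 on the y2 axis
(30 - 20) * (10 - 5) / (40 - 20) + 5
## [1] 7.5
```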
xx <- data.frame(x = 1:20, y = 1:20)
ggplot(xx, aes(x = x, y = y)) +
  geom_hline(yintercept = 30, color = "blue", alpha = 0.7, linewidth = 2) +
  theme_agData(axis.text.y = element_text(color = "red", size = 10)) +
  # Inverse transform: (y - min(y1)) * (max(y2) - min(y2)) / (max(y1) - min(y1)) + min(y2)
  scale_y_continuous(limits = c(20, 40),
                     sec.axis = sec_axis(~ (. - 20) * (10 - 5) / (40 - 20) + 5,
                                         name = "y2", breaks = 5:10)) +
  labs(caption = myCaption)