Dual y-Axis
An R tutorial on how to create a plots with dual y-axes
Introduction
In some cases, a graph with two y-axes is desired for visualizing two different sets of data. However, this is sometimes frowned upon since the required scaling of the data can be adjusted to fit the desired narrative.
With that said, there still are situations were dual y-axes are
appropriate. This vignette will show you how to do this in
with ggplot2
, despite the
package authors disagreements.
Prepare Data
# Prep data
myCaption <- c("www.dblogr.com/ or derekmichaelwright.github.io/dblogr/ | Data: AGILE")
xx <- read.csv("data_dual_y_axis.csv") %>%
mutate(Date = as.Date(Date))
as_tibble(xx)
## # A tibble: 214 × 4
## Date Day.Length Soil.Temperature Air.Temperature
## <date> <dbl> <dbl> <dbl>
## 1 2016-11-29 9.38 5.56 8.56
## 2 2016-11-30 9.35 5.34 8.74
## 3 2016-12-01 9.33 4.65 8.75
## 4 2016-12-02 9.31 4.17 8.68
## 5 2016-12-03 9.29 3.53 8.99
## 6 2016-12-04 9.27 3.66 10.9
## 7 2016-12-05 9.26 3.21 10.6
## 8 2016-12-06 9.24 2.89 9.21
## 9 2016-12-07 9.22 2.96 11.0
## 10 2016-12-08 9.21 3.02 9.90
## # ℹ 204 more rows
Single y-axis
First lets create a plot of Air Temperature
.
mp1 <- ggplot(xx, aes(x = Date)) +
geom_line(aes(y = Air.Temperature, color = "1"),
alpha = 0.7, size = 1.25) +
scale_color_manual(name = NULL, values = "red", labels = "Air Temp") +
scale_x_date(date_labels = "%b" , date_breaks = "1 month") +
labs(title = "Environmental Data", x = NULL,
y = "Temperature (\u00B0C)", caption = myCaption) +
theme_agData(legend.position = "bottom")
mp1
Now lets add a second set of data (Soil Temperature
), in
this case it uses the same unit as the first (°C).
mp2 <- mp1 +
geom_line(aes(y = Soil.Temperature, color = "2"),
alpha = 0.7, size = 1.25) +
scale_color_manual(name = NULL, values = c("red","darkred"),
labels = c("Air Temp","Soil Temp"))
mp2
But when we add another data set (Day Length
) using
different units (hours), problems arise.
mp3 <- mp2 +
geom_line(aes(y = Day.Length, color = "3"),
alpha = 0.7, size = 1.25) +
scale_color_manual(name = NULL, values = c("red","darkred","steelblue"),
labels = c("Air Temp","Soil Temp","DayLength"))
mp3
Data Scaling
In this case, the range of Day Length
and
Temperature
are drastically different.
## [1] 31.9055
## [1] 5.78
In order to present the data better, we need to rescale it
\[y_{scaled}=(y_{2i}-min(y_2))*\frac{max(y_1)-min(y_1)}{max(y_2)-min(y_2)}+min(y_1)\]
where:
- \(y_1\) = Set of values you want to scale to
- \(y_2\) = Set of values to be rescaled to min and max of y1
- \(y_{2i}\) = Value from the \(y_2\) set to be rescaled
in our case:
- \(y_1\) = Air + Soil Temperature
- \(y_2\) = Day Length
- \(y_{2i}\) = Day Length on a specific day
y1_min <- min(c(xx$Soil.Temperature, xx$Air.Temperature))
y1_max <- max(c(xx$Soil.Temperature, xx$Air.Temperature))
y2_min <- min(xx$Day.Length)
y2_max <- max(xx$Day.Length)
xx <- xx %>%
mutate(Day.Length_scaled = (Day.Length - y2_min) * (y1_max - y1_min) /
(y2_max - y2_min) + y1_min )
Scaling the data can also be done with the rescale
function from the scales
package
mp4 <- mp3 +
geom_line(data = xx, aes(y = Day.Length_scaled, color = "4"),
alpha = 0.7, size = 1.25) +
scale_color_manual(name = NULL,
values = c("red","darkred","steelblue","darkblue"),
labels = c("Air Temp","Soil Temp","Day Length","Day Length*"))
mp4
That looks better. However, we still need to add the second y-axis, which will require some more math.
\[y_{2i}=(y_{scaled}-min(y_1))*\frac{max(y_2)-min(y_2)}{max(y_1)-min(y_1)}+min(y_2)\]
Double y-axis
# Prep sec_axis
mySA <- sec_axis(~(. - y1_min) * (y2_max - y2_min) / (y1_max - y1_min) + y2_min,
name = "Hours", breaks = 9:14)
# Plot
mp5 <- mp2 +
geom_line(data = xx, aes(y = Day.Length_scaled, color = "4"),
alpha = 0.7, size = 1.25) +
scale_color_manual(name = NULL, values = c("red","darkred","darkblue"),
labels = c("Air Temp","Soil Temp","Day Length")) +
scale_y_continuous(sec.axis = mySA)
mp5
To help better visualize this rescaling, we will use a simpler example.
\[y_{scaled}=(y_{2i}-min(y_2))*\frac{max(y_1)-min(y_1)}{max(y_2)-min(y_2)}+min(y_1)\]
\[y_{scaled}=(7.5-5)*\frac{40-20}{10-5}+20)=30\]
\[y_{2i}=(y_{scaled}-min(y_1))*\frac{max(y_2)-min(y_2)}{max(y_1)-min(y_1)}+min(y_2)\]
\[y_{2i}=(30-20)*\frac{10-5}{40-20}+5=7.5\]
xx <- data.frame(x = 1:20, y = 1:20)
ggplot(xx, aes(x = x, y = y)) +
geom_hline(yintercept = 30, color = "blue", alpha = 0.7, size = 2) +
theme_agData(axis.text.y = element_text(color = "red", size = 10)) +
scale_y_continuous(limits = c(20, 40),
sec.axis = sec_axis(~ (. - 20) * (10 - 5) / (20 - 0) + 5,
name = "y2", breaks = 5:10)) +
labs(caption = myCaption)