dblogr.com/

Dual y-Axis

An R tutorial on how to create plots with dual y-axes


Introduction

In some cases, a graph with two y-axes is desired for visualizing two different sets of data. However, this is sometimes frowned upon, since the scaling of the second axis is arbitrary and can be adjusted to fit a desired narrative.

With that said, there are still situations where dual y-axes are appropriate. This vignette will show you how to create them with ggplot2, despite the package authors' disagreement.


Prepare Data

# devtools::install_github("derekmichaelwright/agData")
library(agData)
# Prep data
myCaption <- c("www.dblogr.com/ or derekmichaelwright.github.io/dblogr/ | Data: AGILE")
xx <- read.csv("data_dual_y_axis.csv") %>% 
  mutate(Date = as.Date(Date))
as_tibble(xx)
## # A tibble: 214 × 4
##    Date       Day.Length Soil.Temperature Air.Temperature
##    <date>          <dbl>            <dbl>           <dbl>
##  1 2016-11-29       9.38             5.56            8.56
##  2 2016-11-30       9.35             5.34            8.74
##  3 2016-12-01       9.33             4.65            8.75
##  4 2016-12-02       9.31             4.17            8.68
##  5 2016-12-03       9.29             3.53            8.99
##  6 2016-12-04       9.27             3.66           10.9 
##  7 2016-12-05       9.26             3.21           10.6 
##  8 2016-12-06       9.24             2.89            9.21
##  9 2016-12-07       9.22             2.96           11.0 
## 10 2016-12-08       9.21             3.02            9.90
## # ℹ 204 more rows

Single y-axis

First, let's create a plot of Air Temperature.

mp1 <- ggplot(xx, aes(x = Date)) +
  geom_line(aes(y = Air.Temperature, color = "1"), 
            alpha = 0.7, size = 1.25) +
  scale_color_manual(name = NULL, values = "red", labels = "Air Temp") +
  scale_x_date(date_labels = "%b" , date_breaks = "1 month") +
  labs(title = "Environmental Data", x = NULL,
       y = "Temperature (\u00B0C)", caption = myCaption) +
  theme_agData(legend.position = "bottom")
mp1

Now let's add a second set of data (Soil Temperature). In this case, it uses the same unit as the first (°C).

mp2 <- mp1 + 
  geom_line(aes(y = Soil.Temperature, color = "2"), 
            alpha = 0.7, size = 1.25) +
  scale_color_manual(name = NULL, values = c("red","darkred"), 
                     labels = c("Air Temp","Soil Temp"))
mp2

But when we add another data set (Day Length) using different units (hours), problems arise.

mp3 <- mp2 + 
  geom_line(aes(y = Day.Length, color = "3"), 
            alpha = 0.7, size = 1.25) +
  scale_color_manual(name = NULL, values = c("red","darkred","steelblue"), 
                     labels = c("Air Temp","Soil Temp","Day Length"))
mp3


Data Scaling

In this case, the ranges of Day Length and Temperature are drastically different.

max(xx$Air.Temperature) - min(xx$Air.Temperature)
## [1] 31.9055
max(xx$Day.Length) - min(xx$Day.Length)
## [1] 5.78

To present the data better, we need to rescale it:

\[y_{scaled}=(y_{2i}-min(y_2))*\frac{max(y_1)-min(y_1)}{max(y_2)-min(y_2)}+min(y_1)\]

where:

  • \(y_1\) = Set of values you want to scale to
  • \(y_2\) = Set of values to be rescaled to min and max of y1
  • \(y_{2i}\) = Value from the \(y_2\) set to be rescaled

in our case:

  • \(y_1\) = Air + Soil Temperature
  • \(y_2\) = Day Length
  • \(y_{2i}\) = Day Length on a specific day

y1_min <- min(c(xx$Soil.Temperature, xx$Air.Temperature))
y1_max <- max(c(xx$Soil.Temperature, xx$Air.Temperature))
y2_min <- min(xx$Day.Length)
y2_max <- max(xx$Day.Length)
xx <- xx %>%
  mutate(Day.Length_scaled = (Day.Length - y2_min) * (y1_max - y1_min) / 
                                 (y2_max - y2_min) + y1_min )
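
The rescaling formula above can also be wrapped in a small helper. The following is a minimal base R sketch; the function name `rescale_to_range` and the toy vectors are made up for illustration and are not part of agData or the data set used here.

```r
# Hypothetical helper (not from any package): linearly rescale `y2`
# so that its minimum and maximum match those of `y1`.
rescale_to_range <- function(y2, y1) {
  y1_rng <- range(y1, na.rm = TRUE)
  y2_rng <- range(y2, na.rm = TRUE)
  (y2 - y2_rng[1]) * diff(y1_rng) / diff(y2_rng) + y1_rng[1]
}

# Toy data, invented for this example
temp  <- c(-5, 0, 10, 25)      # plays the role of y1
hours <- c(9, 10.5, 12, 15)    # plays the role of y2

hours_scaled <- rescale_to_range(hours, temp)
range(hours_scaled)            # matches range(temp)
```

Taking the target range from a second vector, rather than from hard-coded numbers, keeps the helper usable when the data change.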

Scaling the data can also be done with the rescale function from the scales package:

xx <- xx %>% 
  mutate(Day.Length_scaled = scales::rescale(Day.Length, to = c(y1_min, y1_max)))
mp4 <- mp3 + 
  geom_line(data = xx, aes(y = Day.Length_scaled, color = "4"), 
            alpha = 0.7, size = 1.25) +
  scale_color_manual(name = NULL, 
    values = c("red","darkred","steelblue","darkblue"), 
    labels = c("Air Temp","Soil Temp","Day Length","Day Length*"))
mp4

That looks better. However, we still need to add the second y-axis, which will require some more math.

\[y_{2i}=(y_{scaled}-min(y_1))*\frac{max(y_2)-min(y_2)}{max(y_1)-min(y_1)}+min(y_2)\]


Double y-axis

# Prep sec_axis
mySA <- sec_axis(~(. - y1_min) * (y2_max - y2_min) / (y1_max - y1_min) + y2_min, 
                 name = "Hours", breaks = 9:14)
# Plot
mp5 <- mp2 +
  geom_line(data = xx, aes(y = Day.Length_scaled, color = "4"),
            alpha = 0.7, size = 1.25) +
  scale_color_manual(name = NULL, values = c("red","darkred","darkblue"), 
                     labels = c("Air Temp","Soil Temp","Day Length")) +
  scale_y_continuous(sec.axis = mySA)
mp5
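
Because the data scaling and the sec_axis transformation are inverses of each other, it can help to build both from the same four numbers. Below is a minimal sketch, assuming a hypothetical helper `make_dual_axis()` (not part of ggplot2 or agData) and illustrative ranges rather than the real data.

```r
# Hypothetical helper: returns a pair of functions built from the same ranges.
#   forward(): y2 units -> y1 units (apply to the data before plotting)
#   inverse(): y1 units -> y2 units (use for the secondary axis labels)
make_dual_axis <- function(y1_range, y2_range) {
  ratio <- diff(y1_range) / diff(y2_range)
  list(
    forward = function(y) (y - y2_range[1]) * ratio + y1_range[1],
    inverse = function(y) (y - y1_range[1]) / ratio + y2_range[1]
  )
}

# Illustrative ranges only (not the values from the data set)
ax <- make_dual_axis(y1_range = c(-20, 20), y2_range = c(8, 16))

hours  <- c(8, 10, 12, 16)
scaled <- ax$forward(hours)   # values on the temperature scale
back   <- ax$inverse(scaled)  # back on the hours scale
all.equal(back, hours)        # the round trip recovers the input
```

With ggplot2 loaded, the inverse could then be supplied as `scale_y_continuous(sec.axis = sec_axis(~ ax$inverse(.), name = "Hours"))`, so the plotted values and the axis labels are derived from one definition.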

To help visualize this rescaling, consider a simpler example in which \(y_1\) ranges from 20 to 40, \(y_2\) ranges from 5 to 10, and the value to be rescaled is 7.5.

\[y_{scaled}=(y_{2i}-min(y_2))*\frac{max(y_1)-min(y_1)}{max(y_2)-min(y_2)}+min(y_1)\]

\[y_{scaled}=(7.5-5)*\frac{40-20}{10-5}+20=30\]

\[y_{2i}=(y_{scaled}-min(y_1))*\frac{max(y_2)-min(y_2)}{max(y_1)-min(y_1)}+min(y_2)\]

\[y_{2i}=(30-20)*\frac{10-5}{40-20}+5=7.5\]
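
The arithmetic above is easy to check directly in R. This short snippet plugs in the same numbers (\(y_1\) from 20 to 40, \(y_2\) from 5 to 10); the variable names are chosen here for illustration only.

```r
y1_rng <- c(20, 40)  # range of the primary axis
y2_rng <- c(5, 10)   # range of the secondary axis

# Forward: rescale 7.5 (y2 units) onto the y1 scale
y_scaled <- (7.5 - y2_rng[1]) * diff(y1_rng) / diff(y2_rng) + y1_rng[1]
y_scaled
## [1] 30

# Inverse: map the scaled value back to y2 units
y_back <- (y_scaled - y1_rng[1]) * diff(y2_rng) / diff(y1_rng) + y2_rng[1]
y_back
## [1] 7.5
```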

xx <- data.frame(x = 1:20, y = 1:20)
ggplot(xx, aes(x = x, y = y)) + 
  geom_hline(yintercept = 30, color = "blue", alpha = 0.7, size = 2) +
  theme_agData(axis.text.y = element_text(color = "red", size = 10)) +
  scale_y_continuous(limits = c(20, 40), 
                     sec.axis = sec_axis(~ (. - 20) * (10 - 5) / (40 - 20) + 5,
                        name = "y2", breaks = 5:10)) +
  labs(caption = myCaption)





© Derek Michael Wright