GWAS Tutorial with gwaspr

An R tutorial on running genome-wide association studies (GWAS) with GAPIT and gwaspr

Introduction

This vignette is a tutorial on running Genome-Wide Association Studies (GWAS) in with the packages GAPIT and plotting the results with gwaspr.

Resources

GAPIT Website: https://www.maizegenetics.net/gapit

GAPIT Manual: http://www.zzlab.net/GAPIT/gapit_help_document.pdf

GAPIT github: https://github.com/jiabowang/GAPIT

GAPIT Issues: https://github.com/jiabowang/GAPIT/issues

gwaspr Website: https://derekmichaelwright.github.io/gwaspr/

gwaspr github: https://github.com/derekmichaelwright/gwaspr

Lentil Diversity Panel: https://derekmichaelwright.github.io/dblogr/academic/lentil_diversity_panel/

Install `GAPIT`

GAPIT github: https://github.com/jiabowang/GAPIT

install.packages("devtools")
devtools::install_github("jiabowang/GAPIT")

Install `gwaspr`

gwaspr package which contains tidyverse and ggplot packages for plotting the GAPIT results later.

devtools::install_github("derekmichaelwright/gwaspr")

library(gwaspr)

Data

Mandatory

myY.csv

myG_hmp.zip

Optional

myCV.csv

myM.csv

myY.csv: Phenotype file
myG_hmp.csv: Genotype file in hapmap format
myCV.csv: (Optional) Covariate file
myM.csv: (Optional) Metadata file for the genotypes.

All phenotype and genotype data is derived from and available at: https://knowpulse.usask.ca/

Associated papers:

LDP

Lentil Diversity Panel: https://derekmichaelwright.github.io/dblogr/academic/lentil_diversity_panel/

For this tutorial, we will use a Lentil Diversity Panel (LDP) consisting of 324 Lens culinaris cultivars, representative of the global genetic diversity of cultivated lentils.

Origins of the Lentil Diversity Panel (LDP).

Basic GWAS R Code

Load your Data

# Read in phenotype data
myY <- read.csv("Data/myY.csv")
# Read in genotype data (note: header = F)
myG <- read.csv("Data/myG_hmp.csv", header = F)

Run the GWAS

library(GAPIT)
# Simplest GWAS
myGAPIT <- GAPIT( Y = myY, G = myG )

# Custom GWAS
myGAPIT <- GAPIT( Y = myY,
                  G = myG, 
                  CV = myCV, # Optional Covariate File
                  KI = myKI, # Optional Kinship file
                  PCA.total = 4,
                  model = c("GLM","MLM","MLMM","FarmCPU","BLINK")
                )

GWAS Models

GLM General Linear Model: uses population structure (Q) as a cofactor.
MLM Mixed Linear Model: adds Kinship (K) as a random effect. AKA a Q+K model.
CMLM Compressed Mixed Linear Model: reduces the confounding effect between the testing markers and the individuals genetic effects by replacing them with their corresponding groups based on cluster analysis defined by the user. MLM is the equivelent of each individual being its own group, and GLM is the equivlent of having only one group for all individuals.
MLMM Multiple Locus Mixed Linear Model: uses forward-backward stepwise linear mixed-model regression to include associated markers as covariates.
SUPER Settlement of MLM Under Progressively Exclusive Relationship: uses a FaST-LMM algorithm to solve a computational burden for large samples. K is derived from a smaller number of associated markers.
FarmCPU Fixed and random model Circulating Probability Unification: uses iterations to fit associated markers selected using a maximum likelihood method as cofactors.
BLINK BLINK: similar to FarmCPU but eliminates the assumption of causal genes are evenly distributed across the genome. Markers that are in LD with the most significant marker are excluded. Also uses a different method to determine markers for K (BIC).

Data Preparation

The code above will produce errors and not run properly. This is due to the input data not being properly formatted and will be a common occurance. Lets explore why and try to fix these errors.

Phenotype Data

Our population (The LDP) has been phenotyped for three traits:

Cotyledon_Color: a qualitative trait describing cotyledon color (Red = 0, Yellow = 1).
DTF_Nepal_2017: a quantitative trait describing days from sowing to flowering in a 2017 Nepal field trial.
DTF_Sask_2017: a quantitative trait describing days from sowing to flowering in a 2017 Saskatchewan field trial.

# Read in phenotype data
myY <- read.csv("Data/myY.csv")

First we must convert our qualitative trait Cotyledon_Color from character to numeric format.

Red = 0

Yellow = 1

Green = NA

missing data = NA

# Convert qualitative trait to numeric
myY <- myY %>% 
  mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c("Red", "Yellow"), 0:1),
         Cotyledon_Color = as.numeric(Cotyledon_Color))

Now we can visualize our data.

# Prep data
xx <- myY %>%
  gather(Trait, Value, 2:ncol(.)) %>%
  mutate(Trait = factor(Trait, levels = unique(Trait)))
# Plot
mp <- ggplot(xx, aes(x = Value)) + 
  geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) + 
  facet_wrap(. ~ Trait, ncol = 3, scales = "free") +
  theme_gwaspr() +
  labs(title = "Phenotype Data")
# Save
ggsave("Fig_01_myY_Phenotypes.png", mp, width = 8, height = 3)

Genotype Data

Our population, the LDP, was genotyped using an exome caputure array and filtered with the following criteria to yield a data set of 336,367 SNPs, stored in our object myG.

Only Bi-allelic = TRUE
Minimum Read Depth = 3
Minor Allele Frequency = >5%
Maximum Missing Frequency = 25%
Heterozygous = <25%

# Read in genotype data (note: header = F)
myG <- read.csv("Data/myG_hmp.csv", header = F)

The gwaspr package is built for genotype data in the hapmap format. Although GAPIT can also be run with numeric formatted genotype data, when using hapmap formatted genotype data, it is important to set header = F. This will make the column names the first row of your myG object, as demonstrated below:

# Map + marker Info
myG[1:10,1:11]

##                      V1      V2    V3     V4     V5       V6     V7       V8        V9   V10    V11
## 1                    rs alleles chrom    pos strand assembly center protLSID assayLSID panel QCcode
## 2  Lcu.1GRN.Chr1p853882     A/G     1 853882   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 3  Lcu.1GRN.Chr1p854117     G/A     1 854117   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 4  Lcu.1GRN.Chr1p854159     A/C     1 854159   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 5  Lcu.1GRN.Chr1p854174     C/G     1 854174   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 6  Lcu.1GRN.Chr1p870836     A/G     1 870836   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 7  Lcu.1GRN.Chr1p870898     C/T     1 870898   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 8  Lcu.1GRN.Chr1p870903     G/A     1 870903   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 9  Lcu.1GRN.Chr1p871108     A/G     1 871108   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 10 Lcu.1GRN.Chr1p872492     T/C     1 872492   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>

# genotrype calls
myG[1:10,12:17]

##             V12             V13            V14            V15          V16               V17
## 1  X3156.11_AGL CDC_Asterix_AGL CDC_Cherie_AGL CDC_Glamis_AGL CDC_Gold_AGL CDC_Greenstar_AGL
## 2             G               A              G              A            G                 A
## 3             G               G              G              A            G                 G
## 4             C               C              C              C            C                 A
## 5             G               G              G              G            G                 C
## 6             G               G              N              G            G                 A
## 7             T               C              N              C            T                 C
## 8             G               G              N              A            G                 G
## 9             A               A              A              A            A                 A
## 10            T               T              T              T            T                 N

Marker Details

gwaspr has a funtion to output graphs and details about our marker data.

Note: This functions can be slow to run on large marker sets.

# Plot
gg_myG_Details(filename = "Data/myG_hmp.csv", myPrefix = "Fig_02")

Covariates + Kinship

GAPIT will automatically calculate kinship matrices and performs PCA on the genotype data to be used as a covariate to account for population structure. However, custom Kinship matrices and covariates can be used as well. Just make sure they are formatted similar to the output files GAPIT uses for covariates and kinship.

GWAS_Results/GAPIT.Genotype.PCA.csv

GWAS_Results/GAPIT.Genotype.Kin_Zhang.csv

For more information on my CV see the following papers:

# Read in my custom covariate data
myCV <- read.csv("Data/myCV.csv")

# Plot
mp <- ggplot(myCV, aes(x = b_coef * 10000)) + 
  geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) + 
  theme_gwaspr() +
  labs(title = "Covariate Data")
# Save
ggsave("Fig_03_myCV_Covariates.png", mp, width = 5, height = 3)

Common Errors

There are a number of common errors one might encounter when attempting to do GWAS with GAPIT. We will go through a few of these:

Incorect data uploading

myG <- read.csv("Genotype_Data.csv", header = T) when it should be header = F

Notice how the first row of our genotype data is actually the column names.

Numeric Phenotype Data

For our qualitative trait Cotyledon_Color, we needed to convert

Red to 0
Yellow to 1

Genotype Names

Genotype names not matching between the phenotype file and genotype file is an easy error to miss. R does not like column names that begin with a number. E.g., the genotype 3156-11_AGL should have its named changed to X3156-11_AGL. Another common problem can come from dashses (-) and periods (.), E.g.. CDC_QG.1_AGL or CDC_QG-1_AGL, or spaces () and underscores (_), E.g., CDC Asterix AGL or CDC_Asterix_AGL.

setdiff(myY$Name, t(myG[1,])) # should be character(0)

## [1] "CDC Asterix AGL" "3156-11_AGL"     "CDC_QG-1_AGL"

setdiff(t(myG[1,]), myY$Name) # should show just the first 11 columns of myG

##  [1] "rs"              "alleles"         "chrom"           "pos"             "strand"          "assembly"        "center"          "protLSID"       
##  [9] "assayLSID"       "panel"           "QCcode"          "X3156.11_AGL"    "CDC_Asterix_AGL" "CDC_QG.1_AGL"

# Change names of select individuals
myY <- myY %>% 
  mutate(Name = plyr::mapvalues(Name, 
           c("3156-11_AGL",  "CDC Asterix AGL", "CDC_QG-1_AGL"), 
           c("X3156.11_AGL", "CDC_Asterix_AGL", "CDC_QG.1_AGL")))

setdiff(myY$Name, t(myG[1,])) # should be character(0)

## character(0)

setdiff(t(myG[1,]), myY$Name) # should show just the first 11 columns of myG

##  [1] "rs"        "alleles"   "chrom"     "pos"       "strand"    "assembly"  "center"    "protLSID"  "assayLSID" "panel"     "QCcode"

Run GWAS

Load GAPIT

library(GAPIT)

For our purposes we will run GWAS with the following models: GLM, MLM, MLMM, FarmCPU, BLINK.

Y = phenotype file

G = genotype file

CV = covariate file

Run GWAS

setwd("GWAS_Results/")
# Custom GWAS
GAPIT( Y = myY, # Phenotype data
       G = myG, # Genotype data
       PCA.total = 4,
       model = c("GLM", "MLM", "MLMM", "FarmCPU", "BLINK"), 
       Random.model = F,  # Optional: use if GAPIT returns an error
       Phenotype.View = F # Optional: use if GAPIT returns an error
      )

Run GWAS with Covariates

We will rerun GWAS on the DTF traits using the b coefficient from a photothermal model derived from Wright et al. (2020). This should act to remove or diminish temperature related associations, and thus emphasize photoperiod related associations.

setwd("GWAS_Results/")
# Prep data
myY2 <- myY %>% select(Name, DTF_Sask_2017_b=DTF_Sask_2017)
# Run GWAS with covariate
GAPIT( Y = myY2, G = myG, 
       CV = myCV,
       model = c("MLM", "FarmCPU", "BLINK", "MLMM", "GLM"),
       PCA.total = 0, 
       Random.model = F,  # Optional: use if GAPIT returns an error
       Phenotype.View = F # Optional: use if GAPIT returns an error
      )

PCA

Why set PCA.total = 4?

Principle components of the marker data are used to account for population structure. By plotting our eigenvalues by the number of principle components we can use the “kink method” to choose how many PCs to use in our GWAS. From PC4 to PC5, there is little change in the eigenvalues, meaning that additional PCs will not account for more of the variance within our data (i.e., they are unnecessary).

In our case, we are using an output (the PCA of our marker data) from the GWAS to make this decision. If this is your first time running a GWAS, setting PCA.total = 3 is good practice, as your GWAS can be easily rerun if you later determine you need to adjust settings.

# Prep data
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA_eigenvalues.csv") %>% 
  mutate(PC = 1:324, `Eigen Values` = x,
         fill = ifelse(PC == 4, "PC", ""),
         `Percent of Variation` = 100 * `Eigen Values` / sum(`Eigen Values`) ) %>%
  slice(1:10)
# Plot
mp <- ggplot(xx,  aes(x = PC, y = `Percent of Variation`)) + 
  geom_line(linewidth = 1, alpha = 0.7) + 
  geom_point(aes(fill = fill), size = 3, pch = 21) +
  scale_fill_manual(values = c("darkgreen", "darkred")) +
  scale_x_continuous(breaks = 1:10) +
  theme_gwaspr(legend.position = "none") +
  labs(title = "PCA determination")
# Save
ggsave("Fig_04_PCA_Eigen_Values.png", width = 6, height = 4)

Post GWAS Preparation

After running a genome-wide association study, there are some quality control and post-GWAS steps we can take to evaluate our results and identify false positives.

Prepare Data

Convert our Cotyledon_Color back to its character values.

myY <- myY %>% 
  mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c(0,1), c("Red","Yellow")))

Reread in our myG file but with header = T.

# Reread in genotype data (Note: header = T)
myG <- read.csv("Data/myG_hmp.csv", header = T)

Check Results

Check which traits and models have ran with is.ran()

is_ran(folder = "GWAS_Results/")

##             Trait GLM MLM MLMM FarmCPU BLINK CMLM SUPER
## 1 Cotyledon_Color   X   X    X       X     X   NA    NA
## 2  DTF_Nepal_2017   X   X    X       X     X   NA    NA
## 3   DTF_Sask_2017   X   X    X       X     X   NA    NA
## 4 DTF_Sask_2017_b   X   X    X       X     X   NA    NA

Check if the results files are ordered by p.value with is_ordered(). If not this could cause problems

is_ordered(folder = "GWAS_Results/")

##             Trait GLM MLM MLMM FarmCPU BLINK CMLM SUPER FarmCPU_Kansas BLINK_Kansas
## 1 Cotyledon_Color   X   X    X       X     X   NA    NA              X            X
## 2  DTF_Nepal_2017   X   X    X       X     X   NA    NA              X            X
## 3   DTF_Sask_2017   X   X    X       X     X   NA    NA              X            X
## 4 DTF_Sask_2017_b   X   X    X       X     X   NA    NA              X            X

If they are not ordered, they can be with the order_GWAS_Results() function

order_GWAS_Results(folder = "GWAS_Results/")

Summarise Results

Create a summary table of all significant associations

# Create summary of GWAS results
xx <- table_GWAS_Results(folder = "GWAS_Results/", threshold = 6.8, sug.threshold = 5.3)
# Save
write.csv(xx, "Supplemtnal_Table_01.csv", row.names = F)

Significance threshold

suggestive association = \(p<5*10^{-6}=0.000005\) -> -log10(0.000005) = 5.3
bonferroni threshold = \(p<5*10^{-8}=0.00000005\) -> -log10(0.00000005) = 7.3
custom threshold = \(p<0.05/n(markers)=0.05/336367\) -> -log10(0.05/336367) = 6.8

Manually calculate significance threshold

-log10(0.05)

## [1] 1.30103

-log10(0.000005)

## [1] 5.30103

-log10(0.00000005)

## [1] 7.30103

-log10( 0.05 / (nrow(myG)-1) )

## [1] 6.827842

-log10( 0.01 / (nrow(myG)-1) )

## [1] 7.526812

use gwaspr function

calc_Threshold(myG, p = 0.05)

## [1] 6.827843

DTF Nepal 2017

Since Days To Flower (DTF) is a quantitative trait, we can expect many smaller effect QTLs.

Manhattan Plots

Facet = T

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Facet each model
  facet = T 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_01.png", mp, width = 12, height = 8, bg = "white")

Facet = F

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_02.png", mp, width = 12, height = 4, bg = "white")

Custom

Now lets customize the manhattan plot with some selected markers

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "DTF_Nepal_2017" 
  )

## # A tibble: 45 × 6
##    SNP                       Chr       Pos Traits         Models                         Max_LogP
##    <chr>                   <int>     <int> <chr>          <chr>                             <dbl>
##  1 Lcu.1GRN.Chr5p1658484       5   1658484 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM     34.0
##  2 Lcu.1GRN.Chr2p44546658      2  44546658 DTF_Nepal_2017 FarmCPU; GLM; MLM                  33.0
##  3 Lcu.1GRN.Chr2p44545877      2  44545877 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM     22.7
##  4 Lcu.1GRN.Chr2p44558948      2  44558948 DTF_Nepal_2017 GLM; MLM                           22.2
##  5 Lcu.1GRN.Chr3p389298069     3 389298069 DTF_Nepal_2017 GLM; MLM                           19.0
##  6 Lcu.1GRN.Chr3p389318049     3 389318049 DTF_Nepal_2017 GLM; MLM                           19.0
##  7 Lcu.1GRN.Chr3p389319560     3 389319560 DTF_Nepal_2017 GLM; MLM                           19.0
##  8 Lcu.1GRN.Chr5p1650591       5   1650591 DTF_Nepal_2017 FarmCPU; GLM; MLM                  18.7
##  9 Lcu.1GRN.Chr3p345157947     3 345157947 DTF_Nepal_2017 FarmCPU                            15.9
## 10 Lcu.1GRN.Chr5p2101990       5   2101990 DTF_Nepal_2017 FarmCPU                            14.8
## # ℹ 35 more rows

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","blue"),
  # Specify number of legend rows
  legend.rows = 1 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_03.png", mp, width = 12, height = 4, bg = "white")

Chr 2

facet = T

# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot just Chromosome 2
  chr = 2,
  pos1 = 40000000,
  pos2 = 50000000,
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr2p44545877",
  # Create alt labels for the markers
  labels = "44545877",
  # Specify Color for each marker vline
  vline.color = "red",
  # Facet each model
  facet = T
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_04.png", mp, width = 6, height = 6)

facet = F

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot just Chromosome 2
  chrom = 2,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr2p44545877",
  # Create alt labels for the markers
  labels = "44Mbp",
  # Specify Color for each marker vline
  vline.colors = "red",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_05.png", mp, width = 12, height = 4, bg = "white")

Chr 5 Association

facet = T

# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017", 
  # Highlight specific markers 
  markers = "Lcu.1GRN.Chr5p1658484", 
  # Create alt labels for the markers
  labels = "1658484", 
  # Plot just Chromosome 5
  chr = 5, 
  pos1 = 1000000, 
  pos2 = 2000000,
  # Specify Color for each marker vline
  vline.color = "red",
  # Facet each model
  facet = T
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_06.png", mp, width = 6, height = 6)

facet = F

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot just Chromosome 5
  chrom = 5,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr5p1658484",
  # Create alt labels for the markers
  labels = "16Mbp",
  # Specify Color for each marker vline
  vline.colors = "blue",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_07.png", mp, width = 12, height = 4, bg = "white")

H & B P Values

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","blue"),
  # Specify number of legend rows
  legend.rows = 1,
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_08.png", mp, width = 12, height = 4, bg = "white")

Markers

2 strong association peaks can be identified on chromosomes 2 and 5.

Chr 2

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr2p44545877",
  # Specify colors for alleles
  marker.colors = c("darkgoldenrod3","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_01.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr2p44545877 is associated with early flowering in Nepal.

Chr 5

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Nepal_2017", 
  # Select markers to plot
  markers = "Lcu.1GRN.Chr5p1658484",
  # Specify colors for alleles
  marker.colors = c("darkgoldenrod3","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_02.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr5p1658484 is associated with early flowering in Nepal.

Chr 2 + 5

Now lets plot both markers

Box

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3","steelblue","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_03.png", mp, width = 6, height = 4)

Bar

# Plot
mp <- gg_Marker_Bar(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3","steelblue","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_04.png", mp, width = 6, height = 4)

Volcano Plot

# Plot
mp <- gg_Volcano(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877", 
              "Lcu.1GRN.Chr5p1658484"), 
  # Create alt labels for the markers
  labels = c("44545877","1658484") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_03_01.png", mp, width = 12, height = 4)

Minor Allele Frequency

# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Nepal_2017(NYC).csv") %>%
  mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Nepal_2017(NYC).csv") %>%
  mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) + 
  geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)), 
             color = "red", alpha = 0.7) +
  geom_point(color = "darkgreen", alpha = 0.5) +
  facet_grid(. ~ Model) +
  scale_shape_manual(values = 1:7) +
  theme_gwaspr() +
  labs(title = "DTF_Nepal_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Nepal_2017_03_02.png", mp, width = 10, height = 4)

DTF Sask 2017

Saskatchewan has drastically different temperature and photoperiods than Nepal, so we should expect different associations. In addition, we ran this trait twice, with and without the use of a covariate (the b coefficient from a photothermal model^*^)

DTF_Sask_2017 = no covariate DTF_Sask_2017_b = covariate = b coefficient

Manhattan Plots

facet = T

Facet each model

DTF_Sask_2017

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Facet each model
  facet = T 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_01.png", mp, width = 12, height = 8, bg = "white")

DTF_Sask_2017_b

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Facet each model
  facet = T
  )
ggsave("Fig_DTF_Sask_2017_01_02.png", mp, width = 12, height = 8, bg = "white")

facet = F

DTF_Sask_2017

Plot models together (default)

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_03.png", mp, width = 12, height = 4, bg = "white")

DTF_Sask_2017_b

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_04.png", mp, width = 12, height = 4, bg = "white")

Custom

Now lets customize the manhattan plot with some selected markers

DTF_Sask_2017

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "DTF_Sask_2017"
  )

## # A tibble: 49 × 6
##    SNP                       Chr       Pos Traits        Models              Max_LogP
##    <chr>                   <int>     <int> <chr>         <chr>                  <dbl>
##  1 Lcu.1GRN.Chr2p44558948      2  44558948 DTF_Sask_2017 BLINK                   15.2
##  2 Lcu.1GRN.Chr2p45088100      2  45088100 DTF_Sask_2017 BLINK; FarmCPU; GLM     14.5
##  3 Lcu.1GRN.Chr2p45092147      2  45092147 DTF_Sask_2017 FarmCPU; GLM            13.3
##  4 Lcu.1GRN.Chr1p437359352     1 437359352 DTF_Sask_2017 BLINK; GLM              12.1
##  5 Lcu.1GRN.Chr1p437368329     1 437368329 DTF_Sask_2017 GLM                     12.1
##  6 Lcu.1GRN.Chr1p437368794     1 437368794 DTF_Sask_2017 GLM                     12.1
##  7 Lcu.1GRN.Chr6p236151499     6 236151499 DTF_Sask_2017 BLINK                   12.1
##  8 Lcu.1GRN.Chr2p48982831      2  48982831 DTF_Sask_2017 GLM                     11.5
##  9 Lcu.1GRN.Chr4p5017613       4   5017613 DTF_Sask_2017 BLINK                   11.5
## 10 Lcu.1GRN.Chr3p448489765     3 448489765 DTF_Sask_2017 BLINK                   10.5
## # ℹ 39 more rows

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue") 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_05.png", mp, width = 12, height = 4, bg = "white")

DTF_Sask_2017_b

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "DTF_Sask_2017_b",
  # Select only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK")
  )

## # A tibble: 39 × 6
##    SNP                       Chr       Pos Traits          Models                    Max_LogP
##    <chr>                   <int>     <int> <chr>           <chr>                        <dbl>
##  1 Lcu.1GRN.Chr6p3269280       6   3269280 DTF_Sask_2017_b BLINK; FarmCPU; MLM; MLMM     32.9
##  2 Lcu.1GRN.Chr1p437374598     1 437374598 DTF_Sask_2017_b FarmCPU                       30.6
##  3 Lcu.1GRN.Chr1p446411579     1 446411579 DTF_Sask_2017_b FarmCPU                       28.2
##  4 Lcu.1GRN.Chr6p3270522       6   3270522 DTF_Sask_2017_b FarmCPU                       21.8
##  5 Lcu.1GRN.Chr3p448586972     3 448586972 DTF_Sask_2017_b FarmCPU                       20.4
##  6 Lcu.1GRN.Chr3p437386398     3 437386398 DTF_Sask_2017_b FarmCPU                       18.6
##  7 Lcu.1GRN.Chr6p431657465     6 431657465 DTF_Sask_2017_b FarmCPU                       17.8
##  8 Lcu.1GRN.Chr1p437359352     1 437359352 DTF_Sask_2017_b BLINK; FarmCPU                17.5
##  9 Lcu.1GRN.Chr3p239208186     3 239208186 DTF_Sask_2017_b BLINK                         14.0
## 10 Lcu.1GRN.Chr3p437318468     3 437318468 DTF_Sask_2017_b FarmCPU                       13.3
## # ℹ 29 more rows

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue") 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_06.png", mp, width = 12, height = 4, bg = "white")

Multil Trait

# Plot
mp <- gg_Manhattan_Traits(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select traits to plot
  traits = c("DTF_Nepal_2017", "DTF_Sask_2017", "DTF_Sask_2017_b"),
  # Specify a title
  title = "Days to Flower",
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue")
)
# Save
ggsave("Fig_DTF_Sask_2017_01_07.png", mp, width = 12, height = 8, bg = "white")

Chr 6

facet = F

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Plot just Chromosome 6
  chrom = 6,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr6p3269280",
  # Create alt labels for the markers
  labels = "32Mbp",
  # Specify Color for each marker vline
  vline.colors = "blue",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_08.png", mp, width = 12, height = 4, bg = "white")

facet = T

# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b", 
  # Specify a title
  title = "DTF_Sask_2017 | CV = b", 
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr6p3269280", 
  # Create alt labels for the markers
  labels = "3269280", 
  # Specify Color for each marker vline
  vline.color = "blue",
  # Plot just Chromosome 2
  chr = 6,
  pos1 = 0, 
  pos2 = 6000000,
  # Facet each model
  facet = T
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_09.png", mp, width = 6, height = 6)

H & B P values

DTF_Sask_2017

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue"),
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_10.png", mp, width = 12, height = 4, bg = "white")

DTF_Sask_2017_b

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue"),
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_11.png", mp, width = 12, height = 4, bg = "white")

Markers

Chr 6

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr6p3269280"
  )
# Save
ggsave("Fig_DTF_Sask_2017_02_01.png", mp, width = 6, height = 4)

Chr 6 + 2 + 5

Box

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280")
  )
# Save
ggsave("Fig_DTF_Sask_2017_02_02.png", mp, width = 6, height = 4)

Bar

# Plot
mp <- gg_Marker_Bar(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280")
  )
# Save
ggsave("Fig_DTF_Sask_2017_02_03.png", mp, width = 6, height = 4)

Volcano Plot

# Plot
mp <- gg_Volcano(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Highlight specific markers
  markers = "Lcu.2RBY.Chr6p2528817", 
  # Create alt labels for the markers
  labels = "Chr6p2528817"
  )
# Save
ggsave("Fig_DTF_Sask_2017_03_01.png", mp, width = 12, height = 4)

Minor Allele Frequency

# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Sask_2017(NYC).csv") %>%
  mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Sask_2017(NYC).csv") %>%
  mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) + 
  geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)), 
             color = "red", alpha = 0.7) +
  geom_point(color = "darkgreen", alpha = 0.5) +
  facet_grid(. ~ Model) +
  scale_shape_manual(values = 1:7) +
  theme_gwaspr() +
  labs(title = "DTF_Sask_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Sask_2017_03_02.png", mp, width = 10, height = 4)

Cotyledon Color

Since this is a simpler qualitative trait, we can assume there should be 1 large effect QTL.

Manhattan Plots

facet = T

Facet each model

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Facet each model
  facet = T 
  )
# Save
ggsave("Fig_Cotyledon_Color_01_01.png", mp, width = 12, height = 8, bg = "white")

facet = F

Plot models together (default)

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Facet each model
  facet = F # default setting
  )
# Save
ggsave("Fig_Cotyledon_Color_01_02.png", mp, width = 12, height = 4, bg = "white")

Custom

Now lets customize the manhattan plot with some selected markers

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "Cotyledon_Color" 
  )

## # A tibble: 54 × 6
##    SNP                       Chr       Pos Traits          Models                         Max_LogP
##    <chr>                   <int>     <int> <chr>           <chr>                             <dbl>
##  1 Lcu.1GRN.Chr1p365986872     1 365986872 Cotyledon_Color BLINK; FarmCPU; GLM; MLM; MLMM    175. 
##  2 Lcu.1GRN.Chr1p361840399     1 361840399 Cotyledon_Color BLINK; FarmCPU                     83.7
##  3 Lcu.1GRN.Chr1p361856257     1 361856257 Cotyledon_Color MLMM                               39.5
##  4 Lcu.1GRN.Chr1p365318023     1 365318023 Cotyledon_Color GLM; MLM                           34.8
##  5 Lcu.1GRN.Chr1p365318027     1 365318027 Cotyledon_Color GLM; MLM                           34.6
##  6 Lcu.1GRN.Chr1p361407757     1 361407757 Cotyledon_Color FarmCPU                            29.5
##  7 Lcu.1GRN.Chr1p27007485      1  27007485 Cotyledon_Color MLMM                               23.1
##  8 Lcu.1GRN.Chr4p359784721     4 359784721 Cotyledon_Color GLM; MLM                           18.6
##  9 Lcu.1GRN.Chr4p359784737     4 359784737 Cotyledon_Color GLM; MLM                           18.3
## 10 Lcu.1GRN.Chr3p212553749     3 212553749 Cotyledon_Color GLM; MLM                           16.9
## # ℹ 44 more rows

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot only certain GWAS models
  models = c("MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3) 
  )
# Save
ggsave("Fig_Cotyledon_Color_01_03.png", mp, width = 12, height = 4, bg = "white")

Chr 1

full Chromosome

Plot only chromosome 1

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot just Chromosome 1
  chrom = 1,
  # Plot only certain GWAS models
  models = c("MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3),
  # Set a max p value
  pmax = 50
  )
# Save
ggsave("Fig_Cotyledon_Color_01_04.png", mp, width = 12, height = 4, bg = "white")

36-37 Mbp

Create a Zoomed in Plot with gg_Manhattan_Zoom

# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot just Chromosome 1
  chr = 1,
  pos1 = 360000000,
  pos2 = 370000000,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3),
  # Facet each model
  facet = T
  )
# Save
ggsave("Fig_Cotyledon_Color_01_05.png", mp, width = 8, height = 6)

H & B P values

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot only certain GWAS models
  models = c("MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3),
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_Cotyledon_Color_01_06.png", mp, width = 12, height = 4, bg = "white")

Marker Plots

Chr 1 365 Mbp

# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr1p365986872",
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_01.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr1p365986872 is convincing.

Chr 1 361 Mbp

# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr1p361840399",
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_02.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr1p361840399 is likely a false positive.

Both

# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399"),
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_03.png", mp, width = 6, height = 4)

Volcano Plot

# Plot
mp <- gg_Volcano(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"), 
  # Create alt labels for the markers
  labels = c("365Mbp","351Mbp","362Mbp") 
  )
# Save
ggsave("Fig_Cotyledon_Color_03_01.png", mp, width = 12, height = 4)

Minor Allele Frequency

# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.Cotyledon_Color(NYC).csv") %>%
  mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.Cotyledon_Color(NYC).csv") %>%
  mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) + 
  geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)), 
             color = "red", alpha = 0.7) +
  geom_point(color = "darkgreen", alpha = 0.5) +
  facet_grid(. ~ Model) +
  scale_shape_manual(values = 1:7) +
  theme_gwaspr() +
  labs(title = "Cotyledon_Color", y = "-log10(p)")
# Save
ggsave("Fig_Cotyledon_Color_03_02.png", mp, width = 10, height = 4)

LD Decay

library(genetics)
xG <- read.csv("Data/myG_hmp.csv", header = T)
dna <- data.frame(stringsAsFactors = F,
                  Symbol = c("A", "C", "G", "T", "U", 
                             "R", "Y", "S", "W", "K", "M", "N"),
                  Value  = c("A/A","C/C","G/G","T/T","U/U",
                             "A/G","C/T","G/C","A/T","G/T","A/C","N/N") )
xx <- xG[,c(-2,-5,-6,-7,-8,-9,-10,-11)]
for(i in 4:ncol(xx)) {
  xx[xx[,i]=="N", i] <- NA 
  xx[,i] <- plyr::mapvalues(xx[,i], dna$Symbol, dna$Value)
}
#
LD_Decay <- function(x, folder = "LD/", Chr = 1, Num = 200) {
  xc <- x %>% filter(chrom == Chr) 
  xr <- xc %>% column_to_rownames("rs")
  xr <- xr[,c(-1,-2)]
  rr <- round(runif(Num, 1, nrow(xr)))
  while(sum(duplicated(rr))>0) {
    ra <- round(runif(sum(duplicated(rr)), 1, nrow(xr)))
    rr <- rr[!duplicated(rr)]
    rr <- c(rr, ra)
  }
  rr <- rr[order(rr)]
  xr <- xr[rr,]
  #
  xi <- xr %>% t() %>% as.data.frame()
  myLD <- genetics::LD(genetics::makeGenotypes(xi))
  save(myLD, file = paste0(folder, "LD_Chrom_", Chr, "_", Num, ".Rdata"))
}
# Calculate LD per chromosome
LD_Decay(x = xx, Chr = 1, Num = 1000)
LD_Decay(x = xx, Chr = 2, Num = 1000)
LD_Decay(x = xx, Chr = 3, Num = 1000)
LD_Decay(x = xx, Chr = 4, Num = 1000)
LD_Decay(x = xx, Chr = 5, Num = 1000)
LD_Decay(x = xx, Chr = 6, Num = 1000)
LD_Decay(x = xx, Chr = 7, Num = 1000)

# Create function
movingAverage <- function(x, n = 5) {
  stats::filter(x, rep(1 / n, n), sides = 2)
}
# Prep data
xG <- read.csv("Data/myG_hmp.csv", header = T)
xx <- NULL
for(i in 1:7) {
  xc <- xG %>% filter(chrom == i) 
  load(paste0("LD/LD_Chrom_",i,"_1000.Rdata"))
  xi <- myLD$`R^2` %>% as.data.frame() %>% 
    rownames_to_column("SNP1") %>% 
    gather(SNP2, LD, 2:ncol(.)) %>% 
    filter(!is.na(LD)) %>%
    mutate(Chr = i,
           SNP1_d = plyr::mapvalues(SNP1, xc$rs, xc$pos, warn_missing = F),
           SNP2_d = plyr::mapvalues(SNP2, xc$rs, xc$pos, warn_missing = F),
           Distance = as.numeric(SNP2_d) - as.numeric(SNP1_d)) %>%
    arrange(Distance, rev(LD))
  #
  xii <- xi %>% filter(Distance < 1000000)
  myloess <- stats::loess(LD ~ Distance, data = xii, span=0.50)
  #
  xi <- xi %>% 
    mutate(Moving_Avg = movingAverage(LD, n = 100),
           Loess_Chr = ifelse(Distance < 1000000, predict(myloess), NA))
  #
  xx <- bind_rows(xx, xi)
}
#
x1 <- xx %>% group_by(Chr) %>%
  summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- xx %>% filter(Loess_Chr < 0.2) %>% group_by(Chr) %>% 
  summarise(Threshold_0.2 = min(Distance, na.rm = T))
#x3 <- xx %>% left_join(x1, by = "Chr") %>%
#  group_by(Chr) %>% filter(Moving_Avg < Mean_LD) %>%
#  summarise(Threshold_0.1 = min(Distance, na.rm = T))
myChr <-  x1 %>% left_join(x2, by = "Chr") #%>%
  #left_join(x3, by = "Chr")
#
xx <- left_join(xx, myChr, by = "Chr") %>%
  mutate(Chr = as.factor(Chr))
yy <- xx %>% filter(Distance < 1000000) %>%
  arrange(Distance, rev(LD))
myloess <- stats::loess(LD ~ Distance, data = yy, span=0.50)
yy <- yy %>% 
  mutate(Moving_Avg = movingAverage(LD, n = 100),
         Loess_Geno = ifelse(Distance < 1000000, predict(myloess), NA))
#
x1 <- yy %>% summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- yy %>% filter(Loess_Chr < 0.2) %>%  
  summarise(Threshold_0.2 = min(Distance, na.rm = T))
myGeno <-   cross_join(x1, x2)
# Plot first 1 Mbp
mp1 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Geno)) +
  geom_point(aes(y = LD), size = 0.2, pch = 15, alpha = 0.3) +
  #geom_line(aes(y = Moving_Avg), color = "red", lwd = 1) +
  geom_line(aes(y = Moving_Avg), alpha = 0.7) +
  geom_line() +
  geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
  geom_hline(data = myGeno, aes(yintercept = Mean_LD), color = "red", lty = 2) +
  geom_vline(data = myGeno, lty = 2, size = 0.3, alpha = 0.8,
             aes(xintercept = Threshold_0.2/1000)) +
  scale_x_continuous(seq(0, 1000, by = 100), expand = c(0,10)) +
  theme_gwaspr(legend.position = "none",
               axis.title.y = ggtext::element_markdown()) +
  labs(title = "LD of all markers within 1 Mbp", y = "r^2", x = "Kbp")
# Plot first 1 Mbp
mp2 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Chr)) +
  geom_line(aes(y = Moving_Avg), alpha = 0.5) +
  geom_line() +
  geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
  geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
  scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
  facet_wrap(paste("Chr =", Chr) + paste("Threshold =", Threshold_0.2) ~ ., 
             ncol = 7, scales = "free_x") +
  geom_vline(data = myChr, lty = 2, size = 0.3, alpha = 0.8,
             aes(xintercept = Threshold_0.2/1000)) +
  theme_gwaspr(legend.position = "none",
               axis.title.y = ggtext::element_markdown()) +
  labs(title = "By Chromosome", y = "r^2", x = "Kbp")
# Plot full chromsomes
mp3 <- ggplot(xx, aes(x = Distance/1000000, y = Moving_Avg)) +
  geom_line(size = 0.5, alpha = 0.5) +
  geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
  geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
  scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
  facet_wrap(paste("Chr =", Chr) ~ ., ncol = 7, scales = "free_x") +
  theme_gwaspr(legend.position = "none",
               axis.title.y = ggtext::element_markdown()) +
  labs(title = "All Markers", y = "r^2", x = "Mbp")
#yy <- yy %>% filter(Distance < 10000)
# Append
mp <- ggarrange(mp1, mp2, mp3, ncol = 1, nrow = 3)
# Save
ggsave("Fig_06_LD.png", mp , width = 12, height = 10)

Bonus

SNP Density Plot

Package Source: https://github.com/YinLiLin/R-CMplot

library(CMplot)
# Prep data
xx <- myG %>% select(rs, chrom, pos) 
# Plot
CMplot(xx, plot.type="d", bin.size=1e7, 
       chr.den.col=c("darkgreen", "darkgoldenrod3", "darkred"), 
       file="jpg", file.output=T, verbose=F, width=9, height=6,
       file.name = "LDP")

PCA

library(plotly)
library(htmlwidgets)
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA.csv") %>%
  left_join(myY, by = c("taxa"="Name"))

DTF Nepal 2017

https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Nepal_2017.html

# Prep data
myColors <- c("darkred", "darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
              color = ~DTF_Nepal_2017, colors = myColors) %>%
  add_markers() %>%
  layout(scene = list(title = "DTF Nepal 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Nepal_2017.html")
mp

DTF Sask 2017

https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Sask_2017.html

# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
              color = ~DTF_Sask_2017, colors = myColors) %>%
  add_markers() %>%
  layout(scene = list(title = "DTF Sask 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Sask_2017.html")
mp

Cotyledon Color

https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_Cotyledon_Color.html

# Prep data
myColors <- c("darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
              color = ~Cotyledon_Color, colors = myColors) %>%
  add_markers() %>%
  layout(scene = list(title = "Cotyledon_Color"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_Cotyledon_Color.html")
mp

GWAS Tutorial with gwaspr

An R tutorial on running genome-wide association studies (GWAS) with GAPIT and gwaspr