GWAS Tutorial with gwaspr

An R tutorial on running genome-wide association studies (GWAS) with GAPIT and gwaspr


Introduction

Resources

This vignette is a tutorial on running Genome-Wide Association Studies (GWAS) in with the packages GAPIT and plotting the results with gwaspr.


Data

Mandatory

Optional

  • myY.csv: Phenotype file
  • myG_hmp.csv: Genotype file in hapmap format
  • myCV.csv: (Optional) Covariate file
  • myM.csv: (Optional) Metadata file for the genotypes.

All phenotype and genotype data is derived from and available at: https://knowpulse.usask.ca/

Associated papers:

For this tutorial, we will use a Lentil Diversity Panel (LDP) consisting of 324 Lens culinaris cultivars, representative of the global genetic diversity of cultivated lentils.

Origins of the Lentil Diversity Panel (LDP).
Origins of the Lentil Diversity Panel (LDP).

Install GAPIT

install.packages("devtools")
devtools::install_github("jiabowang/GAPIT")


Basic GWAS R Code

Load your Data

# Read in phenotype data
myY <- read.csv("Data/myY.csv")
# Read in genotype data (note: header = F)
myG <- read.csv("Data/myG_hmp.csv", header = F)

Run the GWAS

library(GAPIT)
# Simplest GWAS
myGAPIT <- GAPIT( Y = myY, G = myG )
# Custom GWAS
myGAPIT <- GAPIT( Y = myY,
                  G = myG, 
                  CV = myCV,
                  KI = myKI,
                  PCA.total = 4,
                  model = c("MLM", "BLINK")
                )


GWAS Models

  • GLM General Linear Model: uses population structure (Q) as a cofactor.
  • MLM Mixed Linear Model: adds Kinship (K) as a random effect. AKA a Q+K model.
  • CMLM Compressed Mixed Linear Model: reduces the confounding effect between the testing markers and the individuals genetic effects by replacing them with their corresponding groups based on cluster analysis defined by the user. MLM is the equivelent of each individual being its own group, and GLM is the equivlent of having only one group for all individuals.
  • MLMM Multiple Locus Mixed Linear Model: uses forward-backward stepwise linear mixed-model regression to include associated markers as covariates.
  • SUPER Settlement of MLM Under Progressively Exclusive Relationship: uses a FaST-LMM algorithm to solve a computational burden for large samples. K is derived from a smaller number of associated markers.
  • FarmCPU Fixed and random model Circulating Probability Unification: uses iterations to fit associated markers selected using a maximum likelihood method as cofactors.
  • BLINK BLINK: similar to FarmCPU but eliminates the assumption of causal genes are evenly distributed across the genome. Markers that are in LD with the most significant marker are excluded. Also uses a different method to determine markers for K (BIC).


Data Preparation

The code above will produce errors and not run properly. This is due to the input data not being properly formatted and will be a common occurance. Lets explore why and try to fix these errors.

First install gwaspr package which contains tidyverse and ggplot packages for plotting the GAPIT results later.

devtools::install_github("derekmichaelwright/gwaspr")
library(gwaspr)

Phenotype Data

Our population (The LDP) has been phenotyped for three traits:

  • Cotyledon_Color: a qualitative trait describing cotyledon color (Red = 0, Yellow = 1).
  • DTF_Nepal_2017: a quantitative trait describing days from sowing to flowering in a 2017 Nepal field trial.
  • DTF_Sask_2017: a quantitative trait describing days from sowing to flowering in a 2017 Saskatchewan field trial.
# Read in phenotype data
myY <- read.csv("Data/myY.csv")

First we must convert our qualitative trait Cotyledon_Color from character to numeric format.

Red = 0

Yellow = 1

Green = NA

missing data = NA

# Convert qualitative trait to numeric
myY <- myY %>% 
  mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c("Red", "Yellow"), 0:1),
         Cotyledon_Color = as.numeric(Cotyledon_Color))

Now we can visualize our data.

# Prep data
xx <- myY %>%
  gather(Trait, Value, 2:ncol(.)) %>%
  mutate(Trait = factor(Trait, levels = unique(Trait)))
# Plot
mp <- ggplot(xx, aes(x = Value)) + 
  geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) + 
  facet_wrap(. ~ Trait, ncol = 3, scales = "free") +
  theme_gwaspr() +
  labs(title = "Phenotype Data")
# Save
ggsave("Fig_01_myY_Phenotypes.png", mp, width = 8, height = 3)


Genotype Data

Our population, the LDP, was genotyped using an exome caputure array and filtered with the following criteria to yield a data set of 336,367 SNPs, stored in our object myG.

  • Only Bi-allelic = TRUE
  • Minimum Read Depth = 3
  • Minor Allele Frequency = >5%
  • Maximum Missing Frequency = 25%
  • Heterozygous = <25%
# Read in genotype data (note: header = F)
myG <- read.csv("Data/myG_hmp.csv", header = F)

The gwaspr package is built for genotype data in the hapmap format. Although GAPIT can also be run with numeric formatted genotype data, when using hapmap formatted genotype data, it is important to set header = F. This will make the column names the first row of your myG object, as demonstrated below:

# Map + marker Info
myG[1:10,1:11]
##                      V1      V2    V3     V4     V5       V6     V7       V8        V9   V10    V11
## 1                    rs alleles chrom    pos strand assembly center protLSID assayLSID panel QCcode
## 2  Lcu.1GRN.Chr1p853882     A/G     1 853882   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 3  Lcu.1GRN.Chr1p854117     G/A     1 854117   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 4  Lcu.1GRN.Chr1p854159     A/C     1 854159   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 5  Lcu.1GRN.Chr1p854174     C/G     1 854174   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 6  Lcu.1GRN.Chr1p870836     A/G     1 870836   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 7  Lcu.1GRN.Chr1p870898     C/T     1 870898   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 8  Lcu.1GRN.Chr1p870903     G/A     1 870903   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 9  Lcu.1GRN.Chr1p871108     A/G     1 871108   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
## 10 Lcu.1GRN.Chr1p872492     T/C     1 872492   <NA>     <NA>   <NA>     <NA>      <NA>  <NA>   <NA>
# genotrype calls
myG[1:10,12:17]
##             V12             V13            V14            V15          V16               V17
## 1  X3156.11_AGL CDC_Asterix_AGL CDC_Cherie_AGL CDC_Glamis_AGL CDC_Gold_AGL CDC_Greenstar_AGL
## 2             G               A              G              A            G                 A
## 3             G               G              G              A            G                 G
## 4             C               C              C              C            C                 A
## 5             G               G              G              G            G                 C
## 6             G               G              N              G            G                 A
## 7             T               C              N              C            T                 C
## 8             G               G              N              A            G                 G
## 9             A               A              A              A            A                 A
## 10            T               T              T              T            T                 N

Marker Details

gwaspr has a funtions to output graphs and details about our marker data.

Note: This functions can be slow to run on large marker sets.

# Plot
gg_myG_Details(filename = "Data/myG_hmp.csv", myPrefix = "Fig_02")


Covariates + Kinship

GAPIT will automatically calculate kinship matrices and performs PCA on the genotype data to be used as a covariate to account for population structure. However, custom Kinship matrices and covariates can be used as well. Just make sure they are formatted similar to the output files GAPIT uses for covariates and kinship.

GWAS_Results/GAPIT.Genotype.PCA.csv

GWAS_Results/GAPIT.Genotype.Kin_Zhang.csv

For more information on my CV see the following papers:

# Read in my custom covariate data
myCV <- read.csv("Data/myCV.csv")
# Plot
mp <- ggplot(myCV, aes(x = b_coef * 10000)) + 
  geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) + 
  theme_gwaspr() +
  labs(title = "Covariate Data")
# Save
ggsave("Fig_03_myCV_Covariates.png", mp, width = 5, height = 3)


Common Errors

There are a number of common errors one might encounter when attempting to do GWAS with GAPIT. We will go through a few of these:

Incorect data uploading

myG <- read.csv("Genotype_Data.csv", header = T) when it should be header = F

Notice how the first row of our genotype data is actually the column names.

Numeric Phenotype Data

For our qualitative trait Cotyledon_Color, we needed to convert

  • Red to 0
  • Yellow to 1

Genotype Names

Genotype names not matching between the phenotype file and genotype file is an easy error to miss. R does not like column names that begin with a number. E.g., the genotype 3156-11_AGL should have its named changed to X3156-11_AGL. Another common problem can come from dashses (-) and periods (.), E.g.. CDC_QG.1_AGL or CDC_QG-1_AGL, or spaces () and underscores (_), E.g., CDC Asterix AGL or CDC_Asterix_AGL.

setdiff(myY$Name, t(myG[1,])) # should be character(0)
## [1] "CDC Asterix AGL" "3156-11_AGL"     "CDC_QG-1_AGL"
setdiff(t(myG[1,]), myY$Name) # should show just the first 11 columns of myG
##  [1] "rs"              "alleles"         "chrom"           "pos"             "strand"          "assembly"        "center"          "protLSID"       
##  [9] "assayLSID"       "panel"           "QCcode"          "X3156.11_AGL"    "CDC_Asterix_AGL" "CDC_QG.1_AGL"
# Change names of select individuals
myY <- myY %>% 
  mutate(Name = plyr::mapvalues(Name, 
           c("3156-11_AGL",  "CDC Asterix AGL", "CDC_QG-1_AGL"), 
           c("X3156.11_AGL", "CDC_Asterix_AGL", "CDC_QG.1_AGL")))
setdiff(myY$Name, t(myG[1,])) # should be character(0)
## character(0)
setdiff(t(myG[1,]), myY$Name) # should show just the first 11 columns of myG
##  [1] "rs"        "alleles"   "chrom"     "pos"       "strand"    "assembly"  "center"    "protLSID"  "assayLSID" "panel"     "QCcode"

Run GWAS with GAPIT

Load GAPIT

library(GAPIT)

Run GWAS

For our purposes we will run GWAS with the following models: GLM, MLM, MLMM, FarmCPU, BLINK.

Y = phenotype file

G = genotype file

CV = covariate file

setwd("GWAS_Results/")
# Custom GWAS
GAPIT( Y = myY, 
       G = myG, 
       PCA.total = 4,
       model = c("GLM", "MLM", "MLMM", "FarmCPU", "BLINK"),
       Random.model = F,  # Optional: use if GAPIT returns an error
       Phenotype.View = F # Optional: use if GAPIT returns an error
      )

PCA

Why set PCA.total = 4?

Principle components of the marker data are used to account for population structure. By plotting our eigenvalues by the number of principle components we can use the “kink method” to choose how many PCs to use in our GWAS. From PC4 to PC5, there is little change in the eigenvalues, meaning that additional PCs will not account for more of the variance within our data (i.e., they are unnecessary).

In our case, we are using an output (the PCA of our marker data) from the GWAS to make this decision. If this is your first time running a GWAS, setting PCA.total = 3 is good practice, as your GWAS can be easily rerun if you later determine you need to adjust settings.

# Prep data
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA_eigenvalues.csv") %>% 
  mutate(PC = 1:324, `Eigen Values` = x,
         fill = ifelse(PC == 4, "PC", ""),
         `Percent of Variation` = 100 * `Eigen Values` / sum(`Eigen Values`) ) %>%
  slice(1:10)
# Plot
mp <- ggplot(xx,  aes(x = PC, y = `Percent of Variation`)) + 
  geom_line(linewidth = 1, alpha = 0.7) + 
  geom_point(aes(fill = fill), size = 3, pch = 21) +
  scale_fill_manual(values = c("darkgreen", "darkred")) +
  scale_x_continuous(breaks = 1:10) +
  theme_gwaspr(legend.position = "none") +
  labs(title = "PCA determination")
# Save
ggsave("Fig_04_PCA_Eigen_Values.png", width = 6, height = 4)


User-inputted Kinship and Covariates

We will rerun GWAS on the DTF traits using the b coefficient from a photothermal model derived from Wright et al. (2020). This should act to remove or diminish temperature related associations, and thus emphasize photoperiod related associations.

# Prep data
myY2 <- myY %>% select(Name, DTF_Sask_2017_b=DTF_Sask_2017)
# Run GWAS with covariate
GAPIT( Y = myY2, G = myG, CV = myCV,
       model = c("MLM", "FarmCPU", "BLINK", "MLMM", "GLM"),
       PCA.total = 0, 
       Random.model = F,  # Optional: use if GAPIT returns an error
       Phenotype.View = F # Optional: use if GAPIT returns an error
      )

Post GWAS Preparation

After running a genome-wide association study, there are some quality control and post-GWAS steps we can take to evaluate our results and identify false positives.


Prepare Data

Convert our Cotyledon_Color back to its character values.

myY <- myY %>% 
  mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c(0,1), c("Red","Yellow")))

Reread in our myG file but with header = T.

# Reread in genotype data (Note: header = T)
myG <- read.csv("Data/myG_hmp.csv", header = T)

Check which traits and models have ran with is.ran()

is_ran(folder = "GWAS_Results/")
##             Trait MLM MLMM FarmCPU BLINK GLM CMLM SUPER
## 1 Cotyledon_Color   X    X       X     X   X   NA    NA
## 2  DTF_Nepal_2017   X    X       X     X   X   NA    NA
## 3   DTF_Sask_2017   X    X       X     X   X   NA    NA
## 4 DTF_Sask_2017_b   X    X       X     X   X   NA    NA

Check if the results files are ordered by p.value with is_ordered(). If not this could cause problems

is_ordered(folder = "GWAS_Results/")
##             Trait MLM MLMM FarmCPU BLINK GLM CMLM SUPER
## 1 Cotyledon_Color   X    X       X     X   X   NA    NA
## 2  DTF_Nepal_2017   X    X       X     X   X   NA    NA
## 3   DTF_Sask_2017   X    X       X     X   X   NA    NA
## 4 DTF_Sask_2017_b   X    X       X     X   X   NA    NA

If they are not ordered, they can be with the order_GWAS_Results() function

order_GWAS_Results(folder = "GWAS_Results/")

Create a summary table of all significant associations

# Create summary of GWAS results
xx <- table_GWAS_Results(folder = "GWAS_Results/", threshold = 6.8, sug.threshold = 5.3)
# Save
write.csv(xx, "Supplemtnal_Table_01.csv", row.names = F)

Significance threshold

  • suggestive association = \(p<5*10^{-6}=0.000005\) -> -log10(0.000005) = 5.3
  • bonferroni threshold = \(p<5*10^{-8}=0.00000005\) -> -log10(0.00000005) = 7.3
  • custom threshold = \(p<0.05/n(markers)=0.05/336367\) -> -log10(0.05/336367) = 6.8

Calculate significance threshold

-log10(0.05)
## [1] 1.30103
-log10(0.000005)
## [1] 5.30103
-log10(0.00000005)
## [1] 7.30103
-log10( 0.05 / (nrow(myG)-1) )
## [1] 6.827842
-log10( 0.01 / (nrow(myG)-1) )
## [1] 7.526812

DTF Nepal 2017

Since Days To Flower (DTF) is a quantitative trait, we can expect many smaller effect QTLs.

Manhattan Plots

Plot models together (default)

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017" 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_01.png", mp, width = 12, height = 4, bg = "white")


Facet each model

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Facet each model
  facet = T 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_02.png", mp, width = 12, height = 12, bg = "white")


Now lets customize the manhattan plot with some selected markers

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "DTF_Nepal_2017" 
  )
## # A tibble: 42 × 6
##    SNP                       Chr       Pos Traits         Models                         Max_LogP
##    <chr>                   <int>     <int> <chr>          <chr>                             <dbl>
##  1 Lcu.1GRN.Chr2p44545877      2  44545877 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM     40.1
##  2 Lcu.1GRN.Chr2p44546658      2  44546658 DTF_Nepal_2017 GLM; MLM                           39.8
##  3 Lcu.1GRN.Chr2p44558948      2  44558948 DTF_Nepal_2017 GLM; MLM                           39.5
##  4 Lcu.1GRN.Chr3p389298069     3 389298069 DTF_Nepal_2017 GLM; MLM                           34.8
##  5 Lcu.1GRN.Chr3p389318049     3 389318049 DTF_Nepal_2017 GLM; MLM                           34.8
##  6 Lcu.1GRN.Chr3p389319560     3 389319560 DTF_Nepal_2017 GLM; MLM                           34.8
##  7 Lcu.1GRN.Chr5p1658484       5   1658484 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM     31.7
##  8 Lcu.1GRN.Chr5p73165919      5  73165919 DTF_Nepal_2017 GLM                                26.6
##  9 Lcu.1GRN.Chr5p73166268      5  73166268 DTF_Nepal_2017 GLM                                26.6
## 10 Lcu.1GRN.Chr4p497141127     4 497141127 DTF_Nepal_2017 GLM                                24.7
## # ℹ 32 more rows
# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","blue"),
  # Specify number of legend rows
  legend.rows = 1 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_03.png", mp, width = 12, height = 4, bg = "white")


Chr 2 Association

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot just Chromosome 2
  chrom = 2,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr2p44545877",
  # Create alt labels for the markers
  labels = "44Mbp",
  # Specify Color for each marker vline
  vline.colors = "red" 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_04.png", mp, width = 12, height = 4, bg = "white")


# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot just Chromosome 2
  chr = 2,
  pos1 = 40000000,
  pos2 = 50000000,
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr2p44545877",
  # Create alt labels for the markers
  labels = "44545877",
  # Specify Color for each marker vline
  vline.color = "red",
  #
  facet = T
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_05.png", mp, width = 6, height = 6)


Chr 5 Association

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot just Chromosome 5
  chrom = 5,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr5p1658484",
  # Create alt labels for the markers
  labels = "16Mbp",
  # Specify Color for each marker vline
  vline.colors = "blue" 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_06.png", mp, width = 12, height = 4, bg = "white")


# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Nepal_2017", 
  # Highlight specific markers 
  markers = "Lcu.1GRN.Chr5p1658484", 
  # Create alt labels for the markers
  labels = "1658484", 
  # Plot just Chromosome 5
  chr = 5, 
  pos1 = 1000000, 
  pos2 = 2000000,
  # Specify Color for each marker vline
  vline.color = "red",
  #
  facet = T
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_07.png", mp, width = 6, height = 6)


H & B P Values

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","blue"),
  # Specify number of legend rows
  legend.rows = 1,
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_DTF_Nepal_2017_01_08.png", mp, width = 12, height = 4, bg = "white")


Markers

2 strong association peaks can be identified on chromosomes 2 and 5.

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr2p44545877",
  # Specify colors for alleles
  marker.colors = c("darkgoldenrod3","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_01.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr2p44545877 is associated with early flowering in Nepal.


# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Nepal_2017", 
  # Select markers to plot
  markers = "Lcu.1GRN.Chr5p1658484",
  # Specify colors for alleles
  marker.colors = c("darkgoldenrod3","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_02.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr5p1658484 is associated with early flowering in Nepal.


Now lets plot both markers

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484"),
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3","darkgoldenrod2","darkgreen") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_02_03.png", mp, width = 6, height = 4)


Volcano Plot

# Plot
mp <- gg_Volcano(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Nepal_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877", 
              "Lcu.1GRN.Chr5p1658484"), 
  # Create alt labels for the markers
  labels = c("44545877","1658484") 
  )
# Save
ggsave("Fig_DTF_Nepal_2017_03_01.png", mp, width = 12, height = 4)


Minor Allele Frequency

# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Nepal_2017.csv") %>%
  mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Nepal_2017.csv") %>%
  mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) + 
  geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)), 
             color = "red", alpha = 0.7) +
  geom_point(color = "darkgreen", alpha = 0.5) +
  facet_grid(. ~ Model) +
  theme_gwaspr() +
  labs(title = "DTF_Nepal_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Nepal_2017_03_02.png", mp, width = 10, height = 4)


DTF Sask 2017

Saskatchewan has drastically different temperature and photoperiods than Nepal, so we should expect different associations.In addition, we ran this trait twice, with and without the use of a covariate (the b coefficient from a photothermal model)

Manhattan Plots

Plot models together (default)

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017" 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_01.png", mp, width = 12, height = 4, bg = "white")


# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b" 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_02.png", mp, width = 12, height = 4, bg = "white")


Facet each model

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Facet each model
  facet = T 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_03.png", mp, width = 12, height = 14, bg = "white")


# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Facet each model
  facet = T
  )
ggsave("Fig_DTF_Sask_2017_01_04.png", mp, width = 12, height = 14, bg = "white")


Now lets customize the manhattan plot with some selected markers

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "DTF_Sask_2017"
  )
## # A tibble: 48 × 6
##    SNP                       Chr       Pos Traits        Models              Max_LogP
##    <chr>                   <int>     <int> <chr>         <chr>                  <dbl>
##  1 Lcu.1GRN.Chr2p45088100      2  45088100 DTF_Sask_2017 BLINK; FarmCPU          15.5
##  2 Lcu.1GRN.Chr2p44558948      2  44558948 DTF_Sask_2017 GLM                     14.7
##  3 Lcu.1GRN.Chr3p180388915     3 180388915 DTF_Sask_2017 GLM                     14.0
##  4 Lcu.1GRN.Chr2p44546658      2  44546658 DTF_Sask_2017 GLM                     13.7
##  5 Lcu.1GRN.Chr3p180388113     3 180388113 DTF_Sask_2017 GLM                     13.5
##  6 Lcu.1GRN.Chr3p180387971     3 180387971 DTF_Sask_2017 GLM                     13.4
##  7 Lcu.1GRN.Chr2p86072752      2  86072752 DTF_Sask_2017 BLINK                   13.1
##  8 Lcu.1GRN.Chr2p44545877      2  44545877 DTF_Sask_2017 GLM                     13.0
##  9 Lcu.1GRN.Chr1p437359352     1 437359352 DTF_Sask_2017 BLINK; FarmCPU; GLM     12.7
## 10 Lcu.1GRN.Chr1p437368329     1 437368329 DTF_Sask_2017 GLM                     12.7
## # ℹ 38 more rows
list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "DTF_Sask_2017_b",
  # Select only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK")
  )
## # A tibble: 30 × 6
##    SNP                       Chr       Pos Traits          Models                    Max_LogP
##    <chr>                   <int>     <int> <chr>           <chr>                        <dbl>
##  1 Lcu.1GRN.Chr6p3269280       6   3269280 DTF_Sask_2017_b BLINK; FarmCPU; MLM; MLMM     32.9
##  2 Lcu.1GRN.Chr1p437359352     1 437359352 DTF_Sask_2017_b BLINK; FarmCPU                17.5
##  3 Lcu.1GRN.Chr3p239208186     3 239208186 DTF_Sask_2017_b BLINK                         14.0
##  4 Lcu.1GRN.Chr6p324049462     6 324049462 DTF_Sask_2017_b BLINK                         13.1
##  5 Lcu.1GRN.Chr1p374734708     1 374734708 DTF_Sask_2017_b BLINK                         12.7
##  6 Lcu.1GRN.Chr2p572729078     2 572729078 DTF_Sask_2017_b BLINK                         12.4
##  7 Lcu.1GRN.Chr2p48311792      2  48311792 DTF_Sask_2017_b BLINK                         11.7
##  8 Lcu.1GRN.Chr3p433371307     3 433371307 DTF_Sask_2017_b BLINK; MLMM                   11.6
##  9 Lcu.1GRN.Chr7p186572874     7 186572874 DTF_Sask_2017_b BLINK                         11.5
## 10 Lcu.1GRN.Chr7p23361774      7  23361774 DTF_Sask_2017_b BLINK                         11.0
## # ℹ 20 more rows
# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue") 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_05.png", mp, width = 12, height = 4, bg = "white")


# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue") 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_06.png", mp, width = 12, height = 4, bg = "white")


Chr 6 Association

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Plot just Chromosome 6
  chrom = 6,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr6p3269280",
  # Create alt labels for the markers
  labels = "32Mbp",
  # Specify Color for each marker vline
  vline.colors = "blue" 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_07.png", mp, width = 12, height = 4, bg = "white")


# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b", 
  # Specify a title
  title = "DTF_Sask_2017 | CV = b", 
  # Highlight specific markers
  markers = "Lcu.1GRN.Chr6p3269280", 
  # Create alt labels for the markers
  labels = "3269280", 
  # Specify Color for each marker vline
  vline.color = "blue",
  # Plot just Chromosome 2
  chr = 6,
  pos1 = 0, 
  pos2 = 6000000,
  #
  facet = T 
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_08.png", mp, width = 6, height = 6)


H & B P values

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue"),
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_09.png", mp, width = 12, height = 4, bg = "white")


# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "DTF_Sask_2017_b",
  # Specify a title
  title = "DTF_Sask_2017 | CV = b",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280"),
  # Create alt labels for the markers
  labels = c("44Mbp","16Mbp","32Mbp"),
  # Specify Color for each marker vline
  vline.colors = c("red","red","blue"),
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_DTF_Sask_2017_01_10.png", mp, width = 12, height = 4, bg = "white")


Markers

# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr6p3269280"
  )
# Save
ggsave("Fig_DTF_Sask_2017_02_01.png", mp, width = 6, height = 4)


# Plot
mp <- gg_Marker_Box(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr2p44545877",
              "Lcu.1GRN.Chr5p1658484",
              "Lcu.1GRN.Chr6p3269280")
  )
# Save
ggsave("Fig_DTF_Sask_2017_02_02.png", mp, width = 6, height = 4)


Volcano Plot

# Plot
mp <- gg_Volcano(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "DTF_Sask_2017",
  # Highlight specific markers
  markers = "Lcu.2RBY.Chr6p2528817", 
  # Create alt labels for the markers
  labels = "Chr6p2528817"
  )
# Save
ggsave("Fig_DTF_Sask_2017_03_01.png", mp, width = 12, height = 4)


Minor Allele Frequency

# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Sask_2017.csv") %>%
  mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Sask_2017.csv") %>%
  mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) + 
  geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)), 
             color = "red", alpha = 0.7) +
  geom_point(color = "darkgreen", alpha = 0.5) +
  facet_grid(. ~ Model) +
  theme_gwaspr() +
  labs(title = "DTF_Sask_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Sask_2017_03_02.png", mp, width = 10, height = 4)


Cotyledon Color

Since this is a simpler qualitative trait, we can assume there should be 1 large effect QTL.


Manhattan Plots

Plot models together (default)

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color" 
  )
# Save
ggsave("Fig_Cotyledon_Color_01_01.png", mp, width = 12, height = 4, bg = "white")


Facet each model

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Facet each model
  facet = T 
  )
# Save
ggsave("Fig_Cotyledon_Color_01_02.png", mp, width = 12, height = 12, bg = "white")


Now lets customize the manhattan plot with some selected markers

list_Top_Markers(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  traits = "Cotyledon_Color" 
  )
## # A tibble: 55 × 6
##    SNP                       Chr       Pos Traits          Models                    Max_LogP
##    <chr>                   <int>     <int> <chr>           <chr>                        <dbl>
##  1 Lcu.1GRN.Chr1p365986872     1 365986872 Cotyledon_Color BLINK; FarmCPU; MLM; MLMM    157. 
##  2 Lcu.1GRN.Chr1p361840399     1 361840399 Cotyledon_Color BLINK; FarmCPU; GLM           55.3
##  3 Lcu.1GRN.Chr1p382208787     1 382208787 Cotyledon_Color GLM                           33.8
##  4 Lcu.1GRN.Chr1p361910110     1 361910110 Cotyledon_Color MLMM                          33.1
##  5 Lcu.1GRN.Chr1p361856257     1 361856257 Cotyledon_Color GLM; MLMM                     32.2
##  6 Lcu.1GRN.Chr4p359784721     4 359784721 Cotyledon_Color GLM; MLM                      29.8
##  7 Lcu.1GRN.Chr1p365318023     1 365318023 Cotyledon_Color MLM                           26.6
##  8 Lcu.1GRN.Chr1p365318027     1 365318027 Cotyledon_Color MLM                           26.3
##  9 Lcu.1GRN.Chr6p390278241     6 390278241 Cotyledon_Color GLM                           23.0
## 10 Lcu.1GRN.Chr4p244472485     4 244472485 Cotyledon_Color GLM                           22.7
## # ℹ 45 more rows
# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot only certain GWAS models
  models = c("MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3) 
  )
# Save
ggsave("Fig_Cotyledon_Color_01_03.png", mp, width = 12, height = 4, bg = "white")


Chr 1 Association

Plot only chromosome 1

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot just Chromosome 1
  chrom = 1,
  # Plot only certain GWAS models
  models = c("MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3),
  # Set a max p value
  pmax = 50
  )
# Save
ggsave("Fig_Cotyledon_Color_01_04.png", mp, width = 12, height = 4, bg = "white")


Create a Zoomed in Plot with gg_Manhattan_Zoom

# Plot
mp <- gg_Manhattan_Zoom(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot just Chromosome 1
  chr = 1,
  pos1 = 360000000,
  pos2 = 370000000,
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3),
  #
  facet = T
  )
# Save
ggsave("Fig_Cotyledon_Color_01_05.png", mp, width = 8, height = 6)


H & B P values

# Plot
mp <- gg_Manhattan(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/", 
  # Select a trait to plot
  trait = "Cotyledon_Color", 
  # Plot only certain GWAS models
  models = c("MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Create alt labels for the markers
  labels = c("365Mbp","361Mbp","362Mbp"),
  # Specify Color for each marker vline
  vline.colors = rep("green",3),
  # Plot adjusted p values
  plotHBPvalues = T,
  # Select threshold for adjusted p values
  threshold = 1.3
  )
# Save
ggsave("Fig_Cotyledon_Color_01_06.png", mp, width = 12, height = 4, bg = "white")


Marker Plots

# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr1p365986872",
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_01.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr1p365986872 is convincing.


# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG,
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr1p361840399",
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_02.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr1p361840399 is likely a false positive.


# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY, 
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = "Lcu.1GRN.Chr1p362336814",
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_03.png", mp, width = 6, height = 4)

The marker Lcu.1GRN.Chr1p362336814 is also likely a false positive.


# Plot
mp <- gg_Marker_Pie(
  # Genotype data
  xG = myG, 
  # Phenotype data
  xY = myY,
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Select markers to plot
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"),
  # Specify colors for alleles
  marker.colors = c("darkred","darkgoldenrod3") 
  )
# Save
ggsave("Fig_Cotyledon_Color_02_04.png", mp, width = 6, height = 4)


Volcano Plot

# Plot
mp <- gg_Volcano(
  # Specify a folder with GWAS results
  folder = "GWAS_Results/",
  # Select a trait to plot
  trait = "Cotyledon_Color",
  # Plot only certain GWAS models
  models = c("MLM","MLMM","FarmCPU","BLINK"),
  # Highlight specific markers
  markers = c("Lcu.1GRN.Chr1p365986872",
              "Lcu.1GRN.Chr1p361840399",
              "Lcu.1GRN.Chr1p362336814"), 
  # Create alt labels for the markers
  labels = c("365Mbp","351Mbp","362Mbp") 
  )
# Save
ggsave("Fig_Cotyledon_Color_03_01.png", mp, width = 12, height = 4)


Minor Allele Frequency

# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.Cotyledon_Color.csv") %>%
  mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.Cotyledon_Color.csv") %>%
  mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) + 
  geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)), 
             color = "red", alpha = 0.7) +
  geom_point(color = "darkgreen", alpha = 0.5) +
  facet_grid(. ~ Model) +
  theme_gwaspr() +
  labs(title = "Cotyledon_Color", y = "-log10(p)")
# Save
ggsave("Fig_Cotyledon_Color_03_02.png", mp, width = 10, height = 4)


Bonus

PCA

library(plotly)
library(htmlwidgets)
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA.csv") %>%
  left_join(myY, by = c("taxa"="Name"))

DTF Nepal 2017

https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Nepal_2017.html

# Prep data
myColors <- c("darkred", "darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
              color = ~DTF_Nepal_2017, colors = myColors) %>%
  add_markers() %>%
  layout(scene = list(title = "DTF Nepal 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Nepal_2017.html")
mp

DTF Sask 2017

https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Sask_2017.html

# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
              color = ~DTF_Sask_2017, colors = myColors) %>%
  add_markers() %>%
  layout(scene = list(title = "DTF Sask 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Sask_2017.html")
mp

Cotyledon Color

https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_Cotyledon_Color.html

# Prep data
myColors <- c("darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
              color = ~Cotyledon_Color, colors = myColors) %>%
  add_markers() %>%
  layout(scene = list(title = "Cotyledon_Color"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_Cotyledon_Color.html")
mp

SNP Density Plot

Package Source: https://github.com/YinLiLin/R-CMplot

library(CMplot)
# Prep data
xx <- myG %>% select(rs, chrom, pos) 
# Plot
CMplot(xx, plot.type="d", bin.size=1e7, 
       chr.den.col=c("darkgreen", "darkgoldenrod3", "darkred"), 
       file="jpg", file.output=T, verbose=F, width=9, height=6,
       file.name = "LDP")

LD Decay

library(genetics)
xG <- read.csv("Data/myG_hmp.csv", header = T)
dna <- data.frame(stringsAsFactors = F,
                  Symbol = c("A", "C", "G", "T", "U", 
                             "R", "Y", "S", "W", "K", "M", "N"),
                  Value  = c("A/A","C/C","G/G","T/T","U/U",
                             "A/G","C/T","G/C","A/T","G/T","A/C","N/N") )
xx <- xG[,c(-2,-5,-6,-7,-8,-9,-10,-11)]
for(i in 4:ncol(xx)) {
  xx[xx[,i]=="N", i] <- NA 
  xx[,i] <- plyr::mapvalues(xx[,i], dna$Symbol, dna$Value)
}
#
LD_Decay <- function(x, folder = "LD/", Chr = 1, Num = 200) {
  xc <- x %>% filter(chrom == Chr) 
  xr <- xc %>% column_to_rownames("rs")
  xr <- xr[,c(-1,-2)]
  rr <- round(runif(Num, 1, nrow(xr)))
  while(sum(duplicated(rr))>0) {
    ra <- round(runif(sum(duplicated(rr)), 1, nrow(xr)))
    rr <- rr[!duplicated(rr)]
    rr <- c(rr, ra)
  }
  rr <- rr[order(rr)]
  xr <- xr[rr,]
  #
  xi <- xr %>% t() %>% as.data.frame()
  myLD <- genetics::LD(genetics::makeGenotypes(xi))
  save(myLD, file = paste0(folder, "LD_Chrom_", Chr, "_", Num, ".Rdata"))
}
# Calculate LD per chromosome
LD_Decay(x = xx, Chr = 1, Num = 1000)
LD_Decay(x = xx, Chr = 2, Num = 1000)
LD_Decay(x = xx, Chr = 3, Num = 1000)
LD_Decay(x = xx, Chr = 4, Num = 1000)
LD_Decay(x = xx, Chr = 5, Num = 1000)
LD_Decay(x = xx, Chr = 6, Num = 1000)
LD_Decay(x = xx, Chr = 7, Num = 1000)
# Create function
movingAverage <- function(x, n = 5) {
  stats::filter(x, rep(1 / n, n), sides = 2)
}
# Prep data
xG <- read.csv("Data/myG_hmp.csv", header = T)
xx <- NULL
for(i in 1:7) {
  xc <- xG %>% filter(chrom == i) 
  load(paste0("LD/LD_Chrom_",i,"_1000.Rdata"))
  xi <- myLD$`R^2` %>% as.data.frame() %>% 
    rownames_to_column("SNP1") %>% 
    gather(SNP2, LD, 2:ncol(.)) %>% 
    filter(!is.na(LD)) %>%
    mutate(Chr = i,
           SNP1_d = plyr::mapvalues(SNP1, xc$rs, xc$pos, warn_missing = F),
           SNP2_d = plyr::mapvalues(SNP2, xc$rs, xc$pos, warn_missing = F),
           Distance = as.numeric(SNP2_d) - as.numeric(SNP1_d)) %>%
    arrange(Distance, rev(LD))
  #
  xii <- xi %>% filter(Distance < 1000000)
  myloess <- stats::loess(LD ~ Distance, data = xii, span=0.50)
  #
  xi <- xi %>% 
    mutate(Moving_Avg = movingAverage(LD, n = 100),
           Loess_Chr = ifelse(Distance < 1000000, predict(myloess), NA))
  #
  xx <- bind_rows(xx, xi)
}
#
x1 <- xx %>% group_by(Chr) %>%
  summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- xx %>% filter(Loess_Chr < 0.2) %>% group_by(Chr) %>% 
  summarise(Threshold_0.2 = min(Distance, na.rm = T))
#x3 <- xx %>% left_join(x1, by = "Chr") %>%
#  group_by(Chr) %>% filter(Moving_Avg < Mean_LD) %>%
#  summarise(Threshold_0.1 = min(Distance, na.rm = T))
myChr <-  x1 %>% left_join(x2, by = "Chr") #%>%
  #left_join(x3, by = "Chr")
#
xx <- left_join(xx, myChr, by = "Chr") %>%
  mutate(Chr = as.factor(Chr))
yy <- xx %>% filter(Distance < 1000000) %>%
  arrange(Distance, rev(LD))
myloess <- stats::loess(LD ~ Distance, data = yy, span=0.50)
yy <- yy %>% 
  mutate(Moving_Avg = movingAverage(LD, n = 100),
         Loess_Geno = ifelse(Distance < 1000000, predict(myloess), NA))
#
x1 <- yy %>% summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- yy %>% filter(Loess_Chr < 0.2) %>%  
  summarise(Threshold_0.2 = min(Distance, na.rm = T))
myGeno <-   cross_join(x1, x2)
# Plot first 1 Mbp
mp1 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Geno)) +
  geom_point(aes(y = LD), size = 0.2, pch = 15, alpha = 0.25) +
  #geom_line(aes(y = Moving_Avg), color = "red", lwd = 1) +
  geom_line(aes(y = Moving_Avg), alpha = 0.7) +
  geom_line() +
  geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
  geom_hline(data = myGeno, aes(yintercept = Mean_LD), color = "red", lty = 2) +
  geom_vline(data = myGeno, lty = 2, size = 0.3, alpha = 0.8,
             aes(xintercept = Threshold_0.2/1000)) +
  scale_x_continuous(seq(0, 1000, by = 100), expand = c(0,10)) +
  theme_gwaspr(legend.position = "none",
               axis.title.y = ggtext::element_markdown()) +
  labs(title = "LD of all markers within 1 Mbp", y = "r^2", x = "Kbp")
# Plot first 1 Mbp
mp2 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Chr)) +
  geom_line(aes(y = Moving_Avg), alpha = 0.5) +
  geom_line() +
  geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
  geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
  scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
  facet_wrap(paste("Chr =", Chr) + paste("Threshold =", Threshold_0.2) ~ ., 
             ncol = 7, scales = "free_x") +
  geom_vline(data = myChr, lty = 2, size = 0.3, alpha = 0.8,
             aes(xintercept = Threshold_0.2/1000)) +
  theme_gwaspr(legend.position = "none",
               axis.title.y = ggtext::element_markdown()) +
  labs(title = "By Chromosome", y = "r^2", x = "Kbp")
# Plot full chromsomes
mp3 <- ggplot(xx, aes(x = Distance/1000000, y = Moving_Avg)) +
  geom_line(size = 0.5, alpha = 0.5) +
  geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
  geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
  scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
  facet_wrap(paste("Chr =", Chr) ~ ., ncol = 7, scales = "free_x") +
  theme_gwaspr(legend.position = "none",
               axis.title.y = ggtext::element_markdown()) +
  labs(title = "All Markers", y = "r^2", x = "Mbp")
#yy <- yy %>% filter(Distance < 10000)
# Append
mp <- ggarrange(mp1, mp2, mp3, ncol = 1, nrow = 3)
# Save
ggsave("Fig_06_LD.png", mp , width = 12, height = 10)