GWAS Tutorial with gwaspr
An R tutorial on running genome-wide association studies (GWAS) with GAPIT and gwaspr
Introduction
Resources
This vignette is a tutorial on running Genome-Wide
Association Studies (GWAS) in
with the packages GAPIT
and plotting the results with gwaspr
.
- GAPIT Website: https://www.maizegenetics.net/gapit
- GAPIT Manual: http://www.zzlab.net/GAPIT/gapit_help_document.pdf
- GAPIT github: https://github.com/jiabowang/GAPIT
- GAPIT Issues: https://github.com/jiabowang/GAPIT/issues
- gwaspr Website: https://derekmichaelwright.github.io/gwaspr/
Data
Mandatory
Optional
myY.csv
: Phenotype filemyG_hmp.csv
: Genotype file in hapmap formatmyCV.csv
: (Optional) Covariate filemyM.csv
: (Optional) Metadata file for the genotypes.
All phenotype and genotype data is derived from and available at: https://knowpulse.usask.ca/
Associated papers:
For this tutorial, we will use a Lentil Diversity Panel (LDP) consisting of 324 Lens culinaris cultivars, representative of the global genetic diversity of cultivated lentils.
Install GAPIT
Basic GWAS R Code
Load your Data
# Read in phenotype data
myY <- read.csv("Data/myY.csv")
# Read in genotype data (note: header = F)
myG <- read.csv("Data/myG_hmp.csv", header = F)
Run the GWAS
# Custom GWAS
myGAPIT <- GAPIT( Y = myY,
G = myG,
CV = myCV, # Optional Covariate File
KI = myKI, # Optional Kinship file
PCA.total = 4,
model = c("GLM","MLM","MLMM","FarmCPU","BLINK")
)
GWAS Models
GLM
General Linear Model: uses population structure (Q) as a cofactor.MLM
Mixed Linear Model: adds Kinship (K) as a random effect. AKA a Q+K model.CMLM
Compressed Mixed Linear Model: reduces the confounding effect between the testing markers and the individuals genetic effects by replacing them with their corresponding groups based on cluster analysis defined by the user. MLM is the equivelent of each individual being its own group, and GLM is the equivlent of having only one group for all individuals.MLMM
Multiple Locus Mixed Linear Model: uses forward-backward stepwise linear mixed-model regression to include associated markers as covariates.SUPER
Settlement of MLM Under Progressively Exclusive Relationship: uses a FaST-LMM algorithm to solve a computational burden for large samples. K is derived from a smaller number of associated markers.FarmCPU
Fixed and random model Circulating Probability Unification: uses iterations to fit associated markers selected using a maximum likelihood method as cofactors.BLINK
BLINK: similar to FarmCPU but eliminates the assumption of causal genes are evenly distributed across the genome. Markers that are in LD with the most significant marker are excluded. Also uses a different method to determine markers for K (BIC).
Data Preparation
The code above will produce errors and not run properly. This is due to the input data not being properly formatted and will be a common occurance. Lets explore why and try to fix these errors.
First install gwaspr
package which contains
tidyverse
and ggplot
packages for plotting the
GAPIT results later.
Phenotype Data
Our population (The LDP) has been phenotyped for three traits:
- Cotyledon_Color: a qualitative trait describing cotyledon color (Red = 0, Yellow = 1).
- DTF_Nepal_2017: a quantitative trait describing days from sowing to flowering in a 2017 Nepal field trial.
- DTF_Sask_2017: a quantitative trait describing days from sowing to flowering in a 2017 Saskatchewan field trial.
First we must convert our qualitative trait
Cotyledon_Color from character
to
numeric
format.
Red
= 0
Yellow
= 1
Green
= NA
missing data
= NA
# Convert qualitative trait to numeric
myY <- myY %>%
mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c("Red", "Yellow"), 0:1),
Cotyledon_Color = as.numeric(Cotyledon_Color))
Now we can visualize our data.
# Prep data
xx <- myY %>%
gather(Trait, Value, 2:ncol(.)) %>%
mutate(Trait = factor(Trait, levels = unique(Trait)))
# Plot
mp <- ggplot(xx, aes(x = Value)) +
geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) +
facet_wrap(. ~ Trait, ncol = 3, scales = "free") +
theme_gwaspr() +
labs(title = "Phenotype Data")
# Save
ggsave("Fig_01_myY_Phenotypes.png", mp, width = 8, height = 3)
Genotype Data
Our population, the LDP, was genotyped using an exome caputure
array and filtered with the following criteria to yield a data set
of 336,367 SNPs, stored in our object myG
.
- Only Bi-allelic = TRUE
- Minimum Read Depth = 3
- Minor Allele Frequency = >5%
- Maximum Missing Frequency = 25%
- Heterozygous = <25%
The gwaspr
package is built for genotype data in the
hapmap format. Although GAPIT
can also be
run with numeric formatted genotype data, when using hapmap formatted
genotype data, it is important to set header = F
. This will
make the column names the first row of your myG
object, as
demonstrated below:
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 rs alleles chrom pos strand assembly center protLSID assayLSID panel QCcode
## 2 Lcu.1GRN.Chr1p853882 A/G 1 853882 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Lcu.1GRN.Chr1p854117 G/A 1 854117 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Lcu.1GRN.Chr1p854159 A/C 1 854159 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 Lcu.1GRN.Chr1p854174 C/G 1 854174 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 Lcu.1GRN.Chr1p870836 A/G 1 870836 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 7 Lcu.1GRN.Chr1p870898 C/T 1 870898 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 8 Lcu.1GRN.Chr1p870903 G/A 1 870903 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 9 Lcu.1GRN.Chr1p871108 A/G 1 871108 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 10 Lcu.1GRN.Chr1p872492 T/C 1 872492 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## V12 V13 V14 V15 V16 V17
## 1 X3156.11_AGL CDC_Asterix_AGL CDC_Cherie_AGL CDC_Glamis_AGL CDC_Gold_AGL CDC_Greenstar_AGL
## 2 G A G A G A
## 3 G G G A G G
## 4 C C C C C A
## 5 G G G G G C
## 6 G G N G G A
## 7 T C N C T C
## 8 G G N A G G
## 9 A A A A A A
## 10 T T T T T N
Covariates + Kinship
GAPIT will automatically calculate kinship matrices and performs PCA
on the genotype data to be used as a covariate to account for population
structure. However, custom Kinship matrices and covariates can be used
as well. Just make sure they are formatted similar to the output files
GAPIT
uses for covariates and kinship.
GWAS_Results/GAPIT.Genotype.PCA.csv
GWAS_Results/GAPIT.Genotype.Kin_Zhang.csv
For more information on my CV see the following papers:
# Plot
mp <- ggplot(myCV, aes(x = b_coef * 10000)) +
geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) +
theme_gwaspr() +
labs(title = "Covariate Data")
# Save
ggsave("Fig_03_myCV_Covariates.png", mp, width = 5, height = 3)
Common Errors
There are a number of common errors one might encounter when attempting to do GWAS with GAPIT. We will go through a few of these:
Incorect data uploading
myG <- read.csv("Genotype_Data.csv", header = T)
when
it should be header = F
Notice how the first row of our genotype data is actually the column names.
Numeric Phenotype Data
For our qualitative trait Cotyledon_Color, we needed to convert
Red
to0
Yellow
to1
Genotype Names
Genotype names not matching between the phenotype file and genotype
file is an easy error to miss. R
does not like column names
that begin with a number. E.g., the genotype
3156-11_AGL
should have its named changed to
X3156-11_AGL
. Another common problem can come from dashses
(-
) and periods (.
), E.g..
CDC_QG.1_AGL
or CDC_QG-1_AGL
, or spaces
() and underscores (
_
), E.g.,
CDC Asterix AGL
or CDC_Asterix_AGL
.
## [1] "CDC Asterix AGL" "3156-11_AGL" "CDC_QG-1_AGL"
## [1] "rs" "alleles" "chrom" "pos" "strand" "assembly" "center" "protLSID"
## [9] "assayLSID" "panel" "QCcode" "X3156.11_AGL" "CDC_Asterix_AGL" "CDC_QG.1_AGL"
# Change names of select individuals
myY <- myY %>%
mutate(Name = plyr::mapvalues(Name,
c("3156-11_AGL", "CDC Asterix AGL", "CDC_QG-1_AGL"),
c("X3156.11_AGL", "CDC_Asterix_AGL", "CDC_QG.1_AGL")))
## character(0)
## [1] "rs" "alleles" "chrom" "pos" "strand" "assembly" "center" "protLSID" "assayLSID" "panel" "QCcode"
Run GWAS with GAPIT
Run GWAS
For our purposes we will run GWAS with the following models:
GLM
, MLM
, MLMM
,
FarmCPU
, BLINK
.
Y
= phenotype file
G
= genotype file
CV
= covariate file
setwd("GWAS_Results/")
# Custom GWAS
GAPIT( Y = myY,
G = myG,
PCA.total = 4,
model = c("FarmCPU", "BLINK"),#"GLM", "MLM", "MLMM",
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
GAPIT( Y = myY,
G = myG,
PCA.total = 4,
model = "BLINK",
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
GAPIT( Y = myY[,1:2],
G = myG,
PCA.total = 4,
model = "FarmCPU",
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
GAPIT( Y = myY[,-2],
G = myG,
PCA.total = 4,
model = "MLMM",
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
GAPIT( Y = myY,
G = myG,
PCA.total = 4,
model = "MLM",
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
GAPIT( Y = myY,
G = myG,
PCA.total = 4,
model = "GLM",
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
PCA
Why set PCA.total = 4
?
Principle components of the marker data are used to account for population structure. By plotting our eigenvalues by the number of principle components we can use the “kink method” to choose how many PCs to use in our GWAS. From PC4 to PC5, there is little change in the eigenvalues, meaning that additional PCs will not account for more of the variance within our data (i.e., they are unnecessary).
In our case, we are using an output (the PCA of our marker data) from
the GWAS to make this decision. If this is your first time running a
GWAS, setting PCA.total = 3
is good practice, as your GWAS
can be easily rerun if you later determine you need to adjust
settings.
# Prep data
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA_eigenvalues.csv") %>%
mutate(PC = 1:324, `Eigen Values` = x,
fill = ifelse(PC == 4, "PC", ""),
`Percent of Variation` = 100 * `Eigen Values` / sum(`Eigen Values`) ) %>%
slice(1:10)
# Plot
mp <- ggplot(xx, aes(x = PC, y = `Percent of Variation`)) +
geom_line(linewidth = 1, alpha = 0.7) +
geom_point(aes(fill = fill), size = 3, pch = 21) +
scale_fill_manual(values = c("darkgreen", "darkred")) +
scale_x_continuous(breaks = 1:10) +
theme_gwaspr(legend.position = "none") +
labs(title = "PCA determination")
# Save
ggsave("Fig_04_PCA_Eigen_Values.png", width = 6, height = 4)
User-inputted Kinship and Covariates
We will rerun GWAS on the DTF traits using the b coefficient from a photothermal model derived from Wright et al. (2020). This should act to remove or diminish temperature related associations, and thus emphasize photoperiod related associations.
# Prep data
myY2 <- myY %>% select(Name, DTF_Sask_2017_b=DTF_Sask_2017)
# Run GWAS with covariate
GAPIT( Y = myY2, G = myG, CV = myCV,
model = c("MLM", "FarmCPU", "BLINK", "MLMM", "GLM"),
PCA.total = 0,
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
Post GWAS Preparation
After running a genome-wide association study, there are some quality control and post-GWAS steps we can take to evaluate our results and identify false positives.
Prepare Data
Convert our Cotyledon_Color
back to its character
values.
myY <- myY %>%
mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c(0,1), c("Red","Yellow")))
Reread in our myG
file but with
header = T
.
Check which traits and models have ran with is.ran()
## Trait GLM MLM MLMM FarmCPU BLINK CMLM SUPER
## 1 Cotyledon_Color X X X X X NA NA
## 2 DTF_Nepal_2017 X X X X X NA NA
## 3 DTF_Sask_2017 X X X X X NA NA
## 4 DTF_Sask_2017_b X X X X X NA NA
Check if the results files are ordered by p.value with
is_ordered()
. If not this could cause problems
## Trait GLM MLM MLMM FarmCPU BLINK CMLM SUPER FarmCPU_Kansas BLINK_Kansas
## 1 Cotyledon_Color X X X X X NA NA X X
## 2 DTF_Nepal_2017 X X X X X NA NA X X
## 3 DTF_Sask_2017 X X X X X NA NA X X
## 4 DTF_Sask_2017_b X X X X X NA NA X X
If they are not ordered, they can be with the
order_GWAS_Results()
function
Create a summary table of all significant associations
# Create summary of GWAS results
xx <- table_GWAS_Results(folder = "GWAS_Results/", threshold = 6.8, sug.threshold = 5.3)
# Save
write.csv(xx, "Supplemtnal_Table_01.csv", row.names = F)
Significance threshold
- suggestive association = \(p<5*10^{-6}=0.000005\) -> -log10(0.000005) = 5.3
- bonferroni threshold = \(p<5*10^{-8}=0.00000005\) -> -log10(0.00000005) = 7.3
- custom threshold = \(p<0.05/n(markers)=0.05/336367\) -> -log10(0.05/336367) = 6.8
Calculate significance threshold
## [1] 1.30103
## [1] 5.30103
## [1] 7.30103
## [1] 6.827842
## [1] 7.526812
DTF Nepal 2017
Since Days To Flower (DTF) is a quantitative trait, we can expect many smaller effect QTLs.
Manhattan Plots
Plot models together (default)
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017"
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_01.png", mp, width = 12, height = 4, bg = "white")
Facet each model
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Facet each model
facet = T
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_02.png", mp, width = 12, height = 8, bg = "white")
Now lets customize the manhattan plot with some selected markers
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "DTF_Nepal_2017"
)
## # A tibble: 45 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr5p1658484 5 1658484 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM 34.0
## 2 Lcu.1GRN.Chr2p44546658 2 44546658 DTF_Nepal_2017 FarmCPU; GLM; MLM 33.0
## 3 Lcu.1GRN.Chr2p44545877 2 44545877 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM 22.7
## 4 Lcu.1GRN.Chr2p44558948 2 44558948 DTF_Nepal_2017 GLM; MLM 22.2
## 5 Lcu.1GRN.Chr3p389298069 3 389298069 DTF_Nepal_2017 GLM; MLM 19.0
## 6 Lcu.1GRN.Chr3p389318049 3 389318049 DTF_Nepal_2017 GLM; MLM 19.0
## 7 Lcu.1GRN.Chr3p389319560 3 389319560 DTF_Nepal_2017 GLM; MLM 19.0
## 8 Lcu.1GRN.Chr5p1650591 5 1650591 DTF_Nepal_2017 FarmCPU; GLM; MLM 18.7
## 9 Lcu.1GRN.Chr3p345157947 3 345157947 DTF_Nepal_2017 FarmCPU 15.9
## 10 Lcu.1GRN.Chr5p2101990 5 2101990 DTF_Nepal_2017 FarmCPU 14.8
## # ℹ 35 more rows
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","blue"),
# Specify number of legend rows
legend.rows = 1
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_03.png", mp, width = 12, height = 4, bg = "white")
Chr 2 Association
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot just Chromosome 2
chrom = 2,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = "Lcu.1GRN.Chr2p44545877",
# Create alt labels for the markers
labels = "44Mbp",
# Specify Color for each marker vline
vline.colors = "red"
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_04.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot just Chromosome 2
chr = 2,
pos1 = 40000000,
pos2 = 50000000,
# Highlight specific markers
markers = "Lcu.1GRN.Chr2p44545877",
# Create alt labels for the markers
labels = "44545877",
# Specify Color for each marker vline
vline.color = "red",
#
facet = T
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_05.png", mp, width = 6, height = 6)
Chr 5 Association
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot just Chromosome 5
chrom = 5,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = "Lcu.1GRN.Chr5p1658484",
# Create alt labels for the markers
labels = "16Mbp",
# Specify Color for each marker vline
vline.colors = "blue"
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_06.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Highlight specific markers
markers = "Lcu.1GRN.Chr5p1658484",
# Create alt labels for the markers
labels = "1658484",
# Plot just Chromosome 5
chr = 5,
pos1 = 1000000,
pos2 = 2000000,
# Specify Color for each marker vline
vline.color = "red",
#
facet = T
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_07.png", mp, width = 6, height = 6)
H & B P Values
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","blue"),
# Specify number of legend rows
legend.rows = 1,
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_08.png", mp, width = 12, height = 4, bg = "white")
Markers
2 strong association peaks can be identified on chromosomes 2 and 5.
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = "Lcu.1GRN.Chr2p44545877",
# Specify colors for alleles
marker.colors = c("darkgoldenrod3","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_01.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr2p44545877
is associated with
early flowering in Nepal.
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = "Lcu.1GRN.Chr5p1658484",
# Specify colors for alleles
marker.colors = c("darkgoldenrod3","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_02.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr5p1658484
is associated with
early flowering in Nepal.
Now lets plot both markers
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3","steelblue","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_03.png", mp, width = 6, height = 4)
# Plot
mp <- gg_Marker_Bar(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3","steelblue","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_04.png", mp, width = 6, height = 4)
Volcano Plot
# Plot
mp <- gg_Volcano(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Create alt labels for the markers
labels = c("44545877","1658484")
)
# Save
ggsave("Fig_DTF_Nepal_2017_03_01.png", mp, width = 12, height = 4)
Minor Allele Frequency
# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Nepal_2017(NYC).csv") %>%
mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Nepal_2017(NYC).csv") %>%
mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) +
geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)),
color = "red", alpha = 0.7) +
geom_point(color = "darkgreen", alpha = 0.5) +
facet_grid(. ~ Model) +
theme_gwaspr() +
labs(title = "DTF_Nepal_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Nepal_2017_03_02.png", mp, width = 10, height = 4)
DTF Sask 2017
Saskatchewan has drastically different temperature and photoperiods than Nepal, so we should expect different associations.In addition, we ran this trait twice, with and without the use of a covariate (the b coefficient from a photothermal model)
Manhattan Plots
Plot models together (default)
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017"
)
# Save
ggsave("Fig_DTF_Sask_2017_01_01.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b"
)
# Save
ggsave("Fig_DTF_Sask_2017_01_02.png", mp, width = 12, height = 4, bg = "white")
Facet each model
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Facet each model
facet = T
)
# Save
ggsave("Fig_DTF_Sask_2017_01_03.png", mp, width = 12, height = 8, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Facet each model
facet = T
)
ggsave("Fig_DTF_Sask_2017_01_04.png", mp, width = 12, height = 14, bg = "white")
Now lets customize the manhattan plot with some selected markers
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "DTF_Sask_2017"
)
## # A tibble: 49 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr2p44558948 2 44558948 DTF_Sask_2017 BLINK 15.2
## 2 Lcu.1GRN.Chr2p45088100 2 45088100 DTF_Sask_2017 BLINK; FarmCPU; GLM 14.5
## 3 Lcu.1GRN.Chr2p45092147 2 45092147 DTF_Sask_2017 FarmCPU; GLM 13.3
## 4 Lcu.1GRN.Chr1p437359352 1 437359352 DTF_Sask_2017 BLINK; GLM 12.1
## 5 Lcu.1GRN.Chr1p437368329 1 437368329 DTF_Sask_2017 GLM 12.1
## 6 Lcu.1GRN.Chr1p437368794 1 437368794 DTF_Sask_2017 GLM 12.1
## 7 Lcu.1GRN.Chr6p236151499 6 236151499 DTF_Sask_2017 BLINK 12.1
## 8 Lcu.1GRN.Chr2p48982831 2 48982831 DTF_Sask_2017 GLM 11.5
## 9 Lcu.1GRN.Chr4p5017613 4 5017613 DTF_Sask_2017 BLINK 11.5
## 10 Lcu.1GRN.Chr3p448489765 3 448489765 DTF_Sask_2017 BLINK 10.5
## # ℹ 39 more rows
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "DTF_Sask_2017_b",
# Select only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK")
)
## # A tibble: 39 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr6p3269280 6 3269280 DTF_Sask_2017_b BLINK; FarmCPU; MLM; MLMM 32.9
## 2 Lcu.1GRN.Chr1p437374598 1 437374598 DTF_Sask_2017_b FarmCPU 30.6
## 3 Lcu.1GRN.Chr1p446411579 1 446411579 DTF_Sask_2017_b FarmCPU 28.2
## 4 Lcu.1GRN.Chr6p3270522 6 3270522 DTF_Sask_2017_b FarmCPU 21.8
## 5 Lcu.1GRN.Chr3p448586972 3 448586972 DTF_Sask_2017_b FarmCPU 20.4
## 6 Lcu.1GRN.Chr3p437386398 3 437386398 DTF_Sask_2017_b FarmCPU 18.6
## 7 Lcu.1GRN.Chr6p431657465 6 431657465 DTF_Sask_2017_b FarmCPU 17.8
## 8 Lcu.1GRN.Chr1p437359352 1 437359352 DTF_Sask_2017_b BLINK; FarmCPU 17.5
## 9 Lcu.1GRN.Chr3p239208186 3 239208186 DTF_Sask_2017_b BLINK 14.0
## 10 Lcu.1GRN.Chr3p437318468 3 437318468 DTF_Sask_2017_b FarmCPU 13.3
## # ℹ 29 more rows
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue")
)
# Save
ggsave("Fig_DTF_Sask_2017_01_05.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue")
)
# Save
ggsave("Fig_DTF_Sask_2017_01_06.png", mp, width = 12, height = 4, bg = "white")
Chr 6 Association
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Plot just Chromosome 6
chrom = 6,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = "Lcu.1GRN.Chr6p3269280",
# Create alt labels for the markers
labels = "32Mbp",
# Specify Color for each marker vline
vline.colors = "blue"
)
# Save
ggsave("Fig_DTF_Sask_2017_01_07.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Highlight specific markers
markers = "Lcu.1GRN.Chr6p3269280",
# Create alt labels for the markers
labels = "3269280",
# Specify Color for each marker vline
vline.color = "blue",
# Plot just Chromosome 2
chr = 6,
pos1 = 0,
pos2 = 6000000,
#
facet = T
)
# Save
ggsave("Fig_DTF_Sask_2017_01_08.png", mp, width = 6, height = 6)
H & B P values
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue"),
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_DTF_Sask_2017_01_09.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue"),
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_DTF_Sask_2017_01_10.png", mp, width = 12, height = 4, bg = "white")
Markers
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Sask_2017",
# Select markers to plot
markers = "Lcu.1GRN.Chr6p3269280"
)
# Save
ggsave("Fig_DTF_Sask_2017_02_01.png", mp, width = 6, height = 4)
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Sask_2017",
# Select markers to plot
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280")
)
# Save
ggsave("Fig_DTF_Sask_2017_02_02.png", mp, width = 6, height = 4)
Volcano Plot
# Plot
mp <- gg_Volcano(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Highlight specific markers
markers = "Lcu.2RBY.Chr6p2528817",
# Create alt labels for the markers
labels = "Chr6p2528817"
)
# Save
ggsave("Fig_DTF_Sask_2017_03_01.png", mp, width = 12, height = 4)
Minor Allele Frequency
# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Sask_2017(NYC).csv") %>%
mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Sask_2017(NYC).csv") %>%
mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) +
geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)),
color = "red", alpha = 0.7) +
geom_point(color = "darkgreen", alpha = 0.5) +
facet_grid(. ~ Model) +
theme_gwaspr() +
labs(title = "DTF_Sask_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Sask_2017_03_02.png", mp, width = 10, height = 4)
Cotyledon Color
Since this is a simpler qualitative trait, we can assume there should be 1 large effect QTL.
Manhattan Plots
Plot models together (default)
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color"
)
# Save
ggsave("Fig_Cotyledon_Color_01_01.png", mp, width = 12, height = 4, bg = "white")
Facet each model
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Facet each model
facet = T
)
# Save
ggsave("Fig_Cotyledon_Color_01_02.png", mp, width = 12, height = 8, bg = "white")
Now lets customize the manhattan plot with some selected markers
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "Cotyledon_Color"
)
## # A tibble: 54 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr1p365986872 1 365986872 Cotyledon_Color BLINK; FarmCPU; GLM; MLM; MLMM 175.
## 2 Lcu.1GRN.Chr1p361840399 1 361840399 Cotyledon_Color BLINK; FarmCPU 83.7
## 3 Lcu.1GRN.Chr1p361856257 1 361856257 Cotyledon_Color MLMM 39.5
## 4 Lcu.1GRN.Chr1p365318023 1 365318023 Cotyledon_Color GLM; MLM 34.8
## 5 Lcu.1GRN.Chr1p365318027 1 365318027 Cotyledon_Color GLM; MLM 34.6
## 6 Lcu.1GRN.Chr1p361407757 1 361407757 Cotyledon_Color FarmCPU 29.5
## 7 Lcu.1GRN.Chr1p27007485 1 27007485 Cotyledon_Color MLMM 23.1
## 8 Lcu.1GRN.Chr4p359784721 4 359784721 Cotyledon_Color GLM; MLM 18.6
## 9 Lcu.1GRN.Chr4p359784737 4 359784737 Cotyledon_Color GLM; MLM 18.3
## 10 Lcu.1GRN.Chr3p212553749 3 212553749 Cotyledon_Color GLM; MLM 16.9
## # ℹ 44 more rows
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot only certain GWAS models
models = c("MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3)
)
# Save
ggsave("Fig_Cotyledon_Color_01_03.png", mp, width = 12, height = 4, bg = "white")
Chr 1 Association
Plot only chromosome 1
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot just Chromosome 1
chrom = 1,
# Plot only certain GWAS models
models = c("MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3),
# Set a max p value
pmax = 50
)
# Save
ggsave("Fig_Cotyledon_Color_01_04.png", mp, width = 12, height = 4, bg = "white")
Create a Zoomed in Plot with gg_Manhattan_Zoom
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot just Chromosome 1
chr = 1,
pos1 = 360000000,
pos2 = 370000000,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3),
#
facet = T
)
# Save
ggsave("Fig_Cotyledon_Color_01_05.png", mp, width = 8, height = 6)
H & B P values
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot only certain GWAS models
models = c("MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3),
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_Cotyledon_Color_01_06.png", mp, width = 12, height = 4, bg = "white")
Marker Plots
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = "Lcu.1GRN.Chr1p365986872",
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_01.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr1p365986872
is convincing.
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = "Lcu.1GRN.Chr1p361840399",
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_02.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr1p361840399
is likely a false
positive.
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = "Lcu.1GRN.Chr1p362336814",
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_03.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr1p362336814
is also likely a
false positive.
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_04.png", mp, width = 6, height = 4)
Volcano Plot
# Plot
mp <- gg_Volcano(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","351Mbp","362Mbp")
)
# Save
ggsave("Fig_Cotyledon_Color_03_01.png", mp, width = 12, height = 4)
Minor Allele Frequency
# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.Cotyledon_Color(NYC).csv") %>%
mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.Cotyledon_Color(NYC).csv") %>%
mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) +
geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)),
color = "red", alpha = 0.7) +
geom_point(color = "darkgreen", alpha = 0.5) +
facet_grid(. ~ Model) +
theme_gwaspr() +
labs(title = "Cotyledon_Color", y = "-log10(p)")
# Save
ggsave("Fig_Cotyledon_Color_03_02.png", mp, width = 10, height = 4)
Bonus
PCA
library(plotly)
library(htmlwidgets)
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA.csv") %>%
left_join(myY, by = c("taxa"="Name"))
DTF Nepal 2017
https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Nepal_2017.html
# Prep data
myColors <- c("darkred", "darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
color = ~DTF_Nepal_2017, colors = myColors) %>%
add_markers() %>%
layout(scene = list(title = "DTF Nepal 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Nepal_2017.html")
mp
DTF Sask 2017
https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Sask_2017.html
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
color = ~DTF_Sask_2017, colors = myColors) %>%
add_markers() %>%
layout(scene = list(title = "DTF Sask 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Sask_2017.html")
mp
Cotyledon Color
https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_Cotyledon_Color.html
# Prep data
myColors <- c("darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
color = ~Cotyledon_Color, colors = myColors) %>%
add_markers() %>%
layout(scene = list(title = "Cotyledon_Color"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_Cotyledon_Color.html")
mp
SNP Density Plot
Package Source: https://github.com/YinLiLin/R-CMplot
library(CMplot)
# Prep data
xx <- myG %>% select(rs, chrom, pos)
# Plot
CMplot(xx, plot.type="d", bin.size=1e7,
chr.den.col=c("darkgreen", "darkgoldenrod3", "darkred"),
file="jpg", file.output=T, verbose=F, width=9, height=6,
file.name = "LDP")
LD Decay
library(genetics)
xG <- read.csv("Data/myG_hmp.csv", header = T)
dna <- data.frame(stringsAsFactors = F,
Symbol = c("A", "C", "G", "T", "U",
"R", "Y", "S", "W", "K", "M", "N"),
Value = c("A/A","C/C","G/G","T/T","U/U",
"A/G","C/T","G/C","A/T","G/T","A/C","N/N") )
xx <- xG[,c(-2,-5,-6,-7,-8,-9,-10,-11)]
for(i in 4:ncol(xx)) {
xx[xx[,i]=="N", i] <- NA
xx[,i] <- plyr::mapvalues(xx[,i], dna$Symbol, dna$Value)
}
#
LD_Decay <- function(x, folder = "LD/", Chr = 1, Num = 200) {
xc <- x %>% filter(chrom == Chr)
xr <- xc %>% column_to_rownames("rs")
xr <- xr[,c(-1,-2)]
rr <- round(runif(Num, 1, nrow(xr)))
while(sum(duplicated(rr))>0) {
ra <- round(runif(sum(duplicated(rr)), 1, nrow(xr)))
rr <- rr[!duplicated(rr)]
rr <- c(rr, ra)
}
rr <- rr[order(rr)]
xr <- xr[rr,]
#
xi <- xr %>% t() %>% as.data.frame()
myLD <- genetics::LD(genetics::makeGenotypes(xi))
save(myLD, file = paste0(folder, "LD_Chrom_", Chr, "_", Num, ".Rdata"))
}
# Calculate LD per chromosome
LD_Decay(x = xx, Chr = 1, Num = 1000)
LD_Decay(x = xx, Chr = 2, Num = 1000)
LD_Decay(x = xx, Chr = 3, Num = 1000)
LD_Decay(x = xx, Chr = 4, Num = 1000)
LD_Decay(x = xx, Chr = 5, Num = 1000)
LD_Decay(x = xx, Chr = 6, Num = 1000)
LD_Decay(x = xx, Chr = 7, Num = 1000)
# Create function
movingAverage <- function(x, n = 5) {
stats::filter(x, rep(1 / n, n), sides = 2)
}
# Prep data
xG <- read.csv("Data/myG_hmp.csv", header = T)
xx <- NULL
for(i in 1:7) {
xc <- xG %>% filter(chrom == i)
load(paste0("LD/LD_Chrom_",i,"_1000.Rdata"))
xi <- myLD$`R^2` %>% as.data.frame() %>%
rownames_to_column("SNP1") %>%
gather(SNP2, LD, 2:ncol(.)) %>%
filter(!is.na(LD)) %>%
mutate(Chr = i,
SNP1_d = plyr::mapvalues(SNP1, xc$rs, xc$pos, warn_missing = F),
SNP2_d = plyr::mapvalues(SNP2, xc$rs, xc$pos, warn_missing = F),
Distance = as.numeric(SNP2_d) - as.numeric(SNP1_d)) %>%
arrange(Distance, rev(LD))
#
xii <- xi %>% filter(Distance < 1000000)
myloess <- stats::loess(LD ~ Distance, data = xii, span=0.50)
#
xi <- xi %>%
mutate(Moving_Avg = movingAverage(LD, n = 100),
Loess_Chr = ifelse(Distance < 1000000, predict(myloess), NA))
#
xx <- bind_rows(xx, xi)
}
#
x1 <- xx %>% group_by(Chr) %>%
summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- xx %>% filter(Loess_Chr < 0.2) %>% group_by(Chr) %>%
summarise(Threshold_0.2 = min(Distance, na.rm = T))
#x3 <- xx %>% left_join(x1, by = "Chr") %>%
# group_by(Chr) %>% filter(Moving_Avg < Mean_LD) %>%
# summarise(Threshold_0.1 = min(Distance, na.rm = T))
myChr <- x1 %>% left_join(x2, by = "Chr") #%>%
#left_join(x3, by = "Chr")
#
xx <- left_join(xx, myChr, by = "Chr") %>%
mutate(Chr = as.factor(Chr))
yy <- xx %>% filter(Distance < 1000000) %>%
arrange(Distance, rev(LD))
myloess <- stats::loess(LD ~ Distance, data = yy, span=0.50)
yy <- yy %>%
mutate(Moving_Avg = movingAverage(LD, n = 100),
Loess_Geno = ifelse(Distance < 1000000, predict(myloess), NA))
#
x1 <- yy %>% summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- yy %>% filter(Loess_Chr < 0.2) %>%
summarise(Threshold_0.2 = min(Distance, na.rm = T))
myGeno <- cross_join(x1, x2)
# Plot first 1 Mbp
mp1 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Geno)) +
geom_point(aes(y = LD), size = 0.2, pch = 15, alpha = 0.25) +
#geom_line(aes(y = Moving_Avg), color = "red", lwd = 1) +
geom_line(aes(y = Moving_Avg), alpha = 0.7) +
geom_line() +
geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
geom_hline(data = myGeno, aes(yintercept = Mean_LD), color = "red", lty = 2) +
geom_vline(data = myGeno, lty = 2, size = 0.3, alpha = 0.8,
aes(xintercept = Threshold_0.2/1000)) +
scale_x_continuous(seq(0, 1000, by = 100), expand = c(0,10)) +
theme_gwaspr(legend.position = "none",
axis.title.y = ggtext::element_markdown()) +
labs(title = "LD of all markers within 1 Mbp", y = "r^2", x = "Kbp")
# Plot first 1 Mbp
mp2 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Chr)) +
geom_line(aes(y = Moving_Avg), alpha = 0.5) +
geom_line() +
geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
facet_wrap(paste("Chr =", Chr) + paste("Threshold =", Threshold_0.2) ~ .,
ncol = 7, scales = "free_x") +
geom_vline(data = myChr, lty = 2, size = 0.3, alpha = 0.8,
aes(xintercept = Threshold_0.2/1000)) +
theme_gwaspr(legend.position = "none",
axis.title.y = ggtext::element_markdown()) +
labs(title = "By Chromosome", y = "r^2", x = "Kbp")
# Plot full chromsomes
mp3 <- ggplot(xx, aes(x = Distance/1000000, y = Moving_Avg)) +
geom_line(size = 0.5, alpha = 0.5) +
geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
facet_wrap(paste("Chr =", Chr) ~ ., ncol = 7, scales = "free_x") +
theme_gwaspr(legend.position = "none",
axis.title.y = ggtext::element_markdown()) +
labs(title = "All Markers", y = "r^2", x = "Mbp")
#yy <- yy %>% filter(Distance < 10000)
# Append
mp <- ggarrange(mp1, mp2, mp3, ncol = 1, nrow = 3)
# Save
ggsave("Fig_06_LD.png", mp , width = 12, height = 10)