GWAS Tutorial with gwaspr
An R tutorial on running genome-wide association studies (GWAS) with GAPIT and gwaspr
Introduction
Resources
This vignette is a tutorial on running Genome-Wide
Association Studies (GWAS) in
with the packages GAPIT
and plotting the results with gwaspr
.
- GAPIT Website: https://www.maizegenetics.net/gapit
- GAPIT Manual: http://www.zzlab.net/GAPIT/gapit_help_document.pdf
- GAPIT github: https://github.com/jiabowang/GAPIT
- GAPIT Issues: https://github.com/jiabowang/GAPIT/issues
- gwaspr Website: https://derekmichaelwright.github.io/gwaspr/
Data
Mandatory
Optional
myY.csv
: Phenotype filemyG_hmp.csv
: Genotype file in hapmap formatmyCV.csv
: (Optional) Covariate filemyM.csv
: (Optional) Metadata file for the genotypes.
All phenotype and genotype data is derived from and available at: https://knowpulse.usask.ca/
Associated papers:
For this tutorial, we will use a Lentil Diversity Panel (LDP) consisting of 324 Lens culinaris cultivars, representative of the global genetic diversity of cultivated lentils.
Install GAPIT
Basic GWAS R Code
Load your Data
# Read in phenotype data
myY <- read.csv("Data/myY.csv")
# Read in genotype data (note: header = F)
myG <- read.csv("Data/myG_hmp.csv", header = F)
Run the GWAS
# Custom GWAS
myGAPIT <- GAPIT( Y = myY,
G = myG,
CV = myCV,
KI = myKI,
PCA.total = 4,
model = c("MLM", "BLINK")
)
GWAS Models
GLM
General Linear Model: uses population structure (Q) as a cofactor.MLM
Mixed Linear Model: adds Kinship (K) as a random effect. AKA a Q+K model.CMLM
Compressed Mixed Linear Model: reduces the confounding effect between the testing markers and the individuals genetic effects by replacing them with their corresponding groups based on cluster analysis defined by the user. MLM is the equivelent of each individual being its own group, and GLM is the equivlent of having only one group for all individuals.MLMM
Multiple Locus Mixed Linear Model: uses forward-backward stepwise linear mixed-model regression to include associated markers as covariates.SUPER
Settlement of MLM Under Progressively Exclusive Relationship: uses a FaST-LMM algorithm to solve a computational burden for large samples. K is derived from a smaller number of associated markers.FarmCPU
Fixed and random model Circulating Probability Unification: uses iterations to fit associated markers selected using a maximum likelihood method as cofactors.BLINK
BLINK: similar to FarmCPU but eliminates the assumption of causal genes are evenly distributed across the genome. Markers that are in LD with the most significant marker are excluded. Also uses a different method to determine markers for K (BIC).
Data Preparation
The code above will produce errors and not run properly. This is due to the input data not being properly formatted and will be a common occurance. Lets explore why and try to fix these errors.
First install gwaspr
package which contains
tidyverse
and ggplot
packages for plotting the
GAPIT results later.
Phenotype Data
Our population (The LDP) has been phenotyped for three traits:
- Cotyledon_Color: a qualitative trait describing cotyledon color (Red = 0, Yellow = 1).
- DTF_Nepal_2017: a quantitative trait describing days from sowing to flowering in a 2017 Nepal field trial.
- DTF_Sask_2017: a quantitative trait describing days from sowing to flowering in a 2017 Saskatchewan field trial.
First we must convert our qualitative trait
Cotyledon_Color from character
to
numeric
format.
Red
= 0
Yellow
= 1
Green
= NA
missing data
= NA
# Convert qualitative trait to numeric
myY <- myY %>%
mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c("Red", "Yellow"), 0:1),
Cotyledon_Color = as.numeric(Cotyledon_Color))
Now we can visualize our data.
# Prep data
xx <- myY %>%
gather(Trait, Value, 2:ncol(.)) %>%
mutate(Trait = factor(Trait, levels = unique(Trait)))
# Plot
mp <- ggplot(xx, aes(x = Value)) +
geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) +
facet_wrap(. ~ Trait, ncol = 3, scales = "free") +
theme_gwaspr() +
labs(title = "Phenotype Data")
# Save
ggsave("Fig_01_myY_Phenotypes.png", mp, width = 8, height = 3)
Genotype Data
Our population, the LDP, was genotyped using an exome caputure
array and filtered with the following criteria to yield a data set
of 336,367 SNPs, stored in our object myG
.
- Only Bi-allelic = TRUE
- Minimum Read Depth = 3
- Minor Allele Frequency = >5%
- Maximum Missing Frequency = 25%
- Heterozygous = <25%
The gwaspr
package is built for genotype data in the
hapmap format. Although GAPIT
can also be
run with numeric formatted genotype data, when using hapmap formatted
genotype data, it is important to set header = F
. This will
make the column names the first row of your myG
object, as
demonstrated below:
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 rs alleles chrom pos strand assembly center protLSID assayLSID panel QCcode
## 2 Lcu.1GRN.Chr1p853882 A/G 1 853882 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Lcu.1GRN.Chr1p854117 G/A 1 854117 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Lcu.1GRN.Chr1p854159 A/C 1 854159 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 Lcu.1GRN.Chr1p854174 C/G 1 854174 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 Lcu.1GRN.Chr1p870836 A/G 1 870836 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 7 Lcu.1GRN.Chr1p870898 C/T 1 870898 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 8 Lcu.1GRN.Chr1p870903 G/A 1 870903 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 9 Lcu.1GRN.Chr1p871108 A/G 1 871108 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 10 Lcu.1GRN.Chr1p872492 T/C 1 872492 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## V12 V13 V14 V15 V16 V17
## 1 X3156.11_AGL CDC_Asterix_AGL CDC_Cherie_AGL CDC_Glamis_AGL CDC_Gold_AGL CDC_Greenstar_AGL
## 2 G A G A G A
## 3 G G G A G G
## 4 C C C C C A
## 5 G G G G G C
## 6 G G N G G A
## 7 T C N C T C
## 8 G G N A G G
## 9 A A A A A A
## 10 T T T T T N
Covariates + Kinship
GAPIT will automatically calculate kinship matrices and performs PCA
on the genotype data to be used as a covariate to account for population
structure. However, custom Kinship matrices and covariates can be used
as well. Just make sure they are formatted similar to the output files
GAPIT
uses for covariates and kinship.
GWAS_Results/GAPIT.Genotype.PCA.csv
GWAS_Results/GAPIT.Genotype.Kin_Zhang.csv
For more information on my CV see the following papers:
# Plot
mp <- ggplot(myCV, aes(x = b_coef * 10000)) +
geom_histogram(color = "black", fill = "darkgreen", alpha = 0.7) +
theme_gwaspr() +
labs(title = "Covariate Data")
# Save
ggsave("Fig_03_myCV_Covariates.png", mp, width = 5, height = 3)
Common Errors
There are a number of common errors one might encounter when attempting to do GWAS with GAPIT. We will go through a few of these:
Incorect data uploading
myG <- read.csv("Genotype_Data.csv", header = T)
when
it should be header = F
Notice how the first row of our genotype data is actually the column names.
Numeric Phenotype Data
For our qualitative trait Cotyledon_Color, we needed to convert
Red
to0
Yellow
to1
Genotype Names
Genotype names not matching between the phenotype file and genotype
file is an easy error to miss. R
does not like column names
that begin with a number. E.g., the genotype
3156-11_AGL
should have its named changed to
X3156-11_AGL
. Another common problem can come from dashses
(-
) and periods (.
), E.g..
CDC_QG.1_AGL
or CDC_QG-1_AGL
, or spaces
() and underscores (
_
), E.g.,
CDC Asterix AGL
or CDC_Asterix_AGL
.
## [1] "CDC Asterix AGL" "3156-11_AGL" "CDC_QG-1_AGL"
## [1] "rs" "alleles" "chrom" "pos" "strand" "assembly" "center" "protLSID"
## [9] "assayLSID" "panel" "QCcode" "X3156.11_AGL" "CDC_Asterix_AGL" "CDC_QG.1_AGL"
# Change names of select individuals
myY <- myY %>%
mutate(Name = plyr::mapvalues(Name,
c("3156-11_AGL", "CDC Asterix AGL", "CDC_QG-1_AGL"),
c("X3156.11_AGL", "CDC_Asterix_AGL", "CDC_QG.1_AGL")))
## character(0)
## [1] "rs" "alleles" "chrom" "pos" "strand" "assembly" "center" "protLSID" "assayLSID" "panel" "QCcode"
Run GWAS with GAPIT
Run GWAS
For our purposes we will run GWAS with the following models:
GLM
, MLM
, MLMM
,
FarmCPU
, BLINK
.
Y
= phenotype file
G
= genotype file
CV
= covariate file
setwd("GWAS_Results/")
# Custom GWAS
GAPIT( Y = myY,
G = myG,
PCA.total = 4,
model = c("GLM", "MLM", "MLMM", "FarmCPU", "BLINK"),
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
PCA
Why set PCA.total = 4
?
Principle components of the marker data are used to account for population structure. By plotting our eigenvalues by the number of principle components we can use the “kink method” to choose how many PCs to use in our GWAS. From PC4 to PC5, there is little change in the eigenvalues, meaning that additional PCs will not account for more of the variance within our data (i.e., they are unnecessary).
In our case, we are using an output (the PCA of our marker data) from
the GWAS to make this decision. If this is your first time running a
GWAS, setting PCA.total = 3
is good practice, as your GWAS
can be easily rerun if you later determine you need to adjust
settings.
# Prep data
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA_eigenvalues.csv") %>%
mutate(PC = 1:324, `Eigen Values` = x,
fill = ifelse(PC == 4, "PC", ""),
`Percent of Variation` = 100 * `Eigen Values` / sum(`Eigen Values`) ) %>%
slice(1:10)
# Plot
mp <- ggplot(xx, aes(x = PC, y = `Percent of Variation`)) +
geom_line(linewidth = 1, alpha = 0.7) +
geom_point(aes(fill = fill), size = 3, pch = 21) +
scale_fill_manual(values = c("darkgreen", "darkred")) +
scale_x_continuous(breaks = 1:10) +
theme_gwaspr(legend.position = "none") +
labs(title = "PCA determination")
# Save
ggsave("Fig_04_PCA_Eigen_Values.png", width = 6, height = 4)
User-inputted Kinship and Covariates
We will rerun GWAS on the DTF traits using the b coefficient from a photothermal model derived from Wright et al. (2020). This should act to remove or diminish temperature related associations, and thus emphasize photoperiod related associations.
# Prep data
myY2 <- myY %>% select(Name, DTF_Sask_2017_b=DTF_Sask_2017)
# Run GWAS with covariate
GAPIT( Y = myY2, G = myG, CV = myCV,
model = c("MLM", "FarmCPU", "BLINK", "MLMM", "GLM"),
PCA.total = 0,
Random.model = F, # Optional: use if GAPIT returns an error
Phenotype.View = F # Optional: use if GAPIT returns an error
)
Post GWAS Preparation
After running a genome-wide association study, there are some quality control and post-GWAS steps we can take to evaluate our results and identify false positives.
Prepare Data
Convert our Cotyledon_Color
back to its character
values.
myY <- myY %>%
mutate(Cotyledon_Color = plyr::mapvalues(Cotyledon_Color, c(0,1), c("Red","Yellow")))
Reread in our myG
file but with
header = T
.
Check which traits and models have ran with is.ran()
## Trait MLM MLMM FarmCPU BLINK GLM CMLM SUPER
## 1 Cotyledon_Color X X X X X NA NA
## 2 DTF_Nepal_2017 X X X X X NA NA
## 3 DTF_Sask_2017 X X X X X NA NA
## 4 DTF_Sask_2017_b X X X X X NA NA
Check if the results files are ordered by p.value with
is_ordered()
. If not this could cause problems
## Trait MLM MLMM FarmCPU BLINK GLM CMLM SUPER
## 1 Cotyledon_Color X X X X X NA NA
## 2 DTF_Nepal_2017 X X X X X NA NA
## 3 DTF_Sask_2017 X X X X X NA NA
## 4 DTF_Sask_2017_b X X X X X NA NA
If they are not ordered, they can be with the
order_GWAS_Results()
function
Create a summary table of all significant associations
# Create summary of GWAS results
xx <- table_GWAS_Results(folder = "GWAS_Results/", threshold = 6.8, sug.threshold = 5.3)
# Save
write.csv(xx, "Supplemtnal_Table_01.csv", row.names = F)
Significance threshold
- suggestive association = \(p<5*10^{-6}=0.000005\) -> -log10(0.000005) = 5.3
- bonferroni threshold = \(p<5*10^{-8}=0.00000005\) -> -log10(0.00000005) = 7.3
- custom threshold = \(p<0.05/n(markers)=0.05/336367\) -> -log10(0.05/336367) = 6.8
Calculate significance threshold
## [1] 1.30103
## [1] 5.30103
## [1] 7.30103
## [1] 6.827842
## [1] 7.526812
DTF Nepal 2017
Since Days To Flower (DTF) is a quantitative trait, we can expect many smaller effect QTLs.
Manhattan Plots
Plot models together (default)
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017"
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_01.png", mp, width = 12, height = 4, bg = "white")
Facet each model
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Facet each model
facet = T
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_02.png", mp, width = 12, height = 12, bg = "white")
Now lets customize the manhattan plot with some selected markers
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "DTF_Nepal_2017"
)
## # A tibble: 42 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr2p44545877 2 44545877 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM 40.1
## 2 Lcu.1GRN.Chr2p44546658 2 44546658 DTF_Nepal_2017 GLM; MLM 39.8
## 3 Lcu.1GRN.Chr2p44558948 2 44558948 DTF_Nepal_2017 GLM; MLM 39.5
## 4 Lcu.1GRN.Chr3p389298069 3 389298069 DTF_Nepal_2017 GLM; MLM 34.8
## 5 Lcu.1GRN.Chr3p389318049 3 389318049 DTF_Nepal_2017 GLM; MLM 34.8
## 6 Lcu.1GRN.Chr3p389319560 3 389319560 DTF_Nepal_2017 GLM; MLM 34.8
## 7 Lcu.1GRN.Chr5p1658484 5 1658484 DTF_Nepal_2017 BLINK; FarmCPU; GLM; MLM; MLMM 31.7
## 8 Lcu.1GRN.Chr5p73165919 5 73165919 DTF_Nepal_2017 GLM 26.6
## 9 Lcu.1GRN.Chr5p73166268 5 73166268 DTF_Nepal_2017 GLM 26.6
## 10 Lcu.1GRN.Chr4p497141127 4 497141127 DTF_Nepal_2017 GLM 24.7
## # ℹ 32 more rows
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","blue"),
# Specify number of legend rows
legend.rows = 1
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_03.png", mp, width = 12, height = 4, bg = "white")
Chr 2 Association
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot just Chromosome 2
chrom = 2,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = "Lcu.1GRN.Chr2p44545877",
# Create alt labels for the markers
labels = "44Mbp",
# Specify Color for each marker vline
vline.colors = "red"
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_04.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot just Chromosome 2
chr = 2,
pos1 = 40000000,
pos2 = 50000000,
# Highlight specific markers
markers = "Lcu.1GRN.Chr2p44545877",
# Create alt labels for the markers
labels = "44545877",
# Specify Color for each marker vline
vline.color = "red",
#
facet = T
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_05.png", mp, width = 6, height = 6)
Chr 5 Association
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot just Chromosome 5
chrom = 5,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = "Lcu.1GRN.Chr5p1658484",
# Create alt labels for the markers
labels = "16Mbp",
# Specify Color for each marker vline
vline.colors = "blue"
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_06.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Highlight specific markers
markers = "Lcu.1GRN.Chr5p1658484",
# Create alt labels for the markers
labels = "1658484",
# Plot just Chromosome 5
chr = 5,
pos1 = 1000000,
pos2 = 2000000,
# Specify Color for each marker vline
vline.color = "red",
#
facet = T
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_07.png", mp, width = 6, height = 6)
H & B P Values
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","blue"),
# Specify number of legend rows
legend.rows = 1,
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_DTF_Nepal_2017_01_08.png", mp, width = 12, height = 4, bg = "white")
Markers
2 strong association peaks can be identified on chromosomes 2 and 5.
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = "Lcu.1GRN.Chr2p44545877",
# Specify colors for alleles
marker.colors = c("darkgoldenrod3","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_01.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr2p44545877
is associated with
early flowering in Nepal.
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = "Lcu.1GRN.Chr5p1658484",
# Specify colors for alleles
marker.colors = c("darkgoldenrod3","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_02.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr5p1658484
is associated with
early flowering in Nepal.
Now lets plot both markers
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Select markers to plot
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3","darkgoldenrod2","darkgreen")
)
# Save
ggsave("Fig_DTF_Nepal_2017_02_03.png", mp, width = 6, height = 4)
Volcano Plot
# Plot
mp <- gg_Volcano(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Nepal_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484"),
# Create alt labels for the markers
labels = c("44545877","1658484")
)
# Save
ggsave("Fig_DTF_Nepal_2017_03_01.png", mp, width = 12, height = 4)
Minor Allele Frequency
# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Nepal_2017.csv") %>%
mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Nepal_2017.csv") %>%
mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) +
geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)),
color = "red", alpha = 0.7) +
geom_point(color = "darkgreen", alpha = 0.5) +
facet_grid(. ~ Model) +
theme_gwaspr() +
labs(title = "DTF_Nepal_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Nepal_2017_03_02.png", mp, width = 10, height = 4)
DTF Sask 2017
Saskatchewan has drastically different temperature and photoperiods than Nepal, so we should expect different associations.In addition, we ran this trait twice, with and without the use of a covariate (the b coefficient from a photothermal model)
Manhattan Plots
Plot models together (default)
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017"
)
# Save
ggsave("Fig_DTF_Sask_2017_01_01.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b"
)
# Save
ggsave("Fig_DTF_Sask_2017_01_02.png", mp, width = 12, height = 4, bg = "white")
Facet each model
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Facet each model
facet = T
)
# Save
ggsave("Fig_DTF_Sask_2017_01_03.png", mp, width = 12, height = 14, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Facet each model
facet = T
)
ggsave("Fig_DTF_Sask_2017_01_04.png", mp, width = 12, height = 14, bg = "white")
Now lets customize the manhattan plot with some selected markers
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "DTF_Sask_2017"
)
## # A tibble: 48 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr2p45088100 2 45088100 DTF_Sask_2017 BLINK; FarmCPU 15.5
## 2 Lcu.1GRN.Chr2p44558948 2 44558948 DTF_Sask_2017 GLM 14.7
## 3 Lcu.1GRN.Chr3p180388915 3 180388915 DTF_Sask_2017 GLM 14.0
## 4 Lcu.1GRN.Chr2p44546658 2 44546658 DTF_Sask_2017 GLM 13.7
## 5 Lcu.1GRN.Chr3p180388113 3 180388113 DTF_Sask_2017 GLM 13.5
## 6 Lcu.1GRN.Chr3p180387971 3 180387971 DTF_Sask_2017 GLM 13.4
## 7 Lcu.1GRN.Chr2p86072752 2 86072752 DTF_Sask_2017 BLINK 13.1
## 8 Lcu.1GRN.Chr2p44545877 2 44545877 DTF_Sask_2017 GLM 13.0
## 9 Lcu.1GRN.Chr1p437359352 1 437359352 DTF_Sask_2017 BLINK; FarmCPU; GLM 12.7
## 10 Lcu.1GRN.Chr1p437368329 1 437368329 DTF_Sask_2017 GLM 12.7
## # ℹ 38 more rows
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "DTF_Sask_2017_b",
# Select only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK")
)
## # A tibble: 30 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr6p3269280 6 3269280 DTF_Sask_2017_b BLINK; FarmCPU; MLM; MLMM 32.9
## 2 Lcu.1GRN.Chr1p437359352 1 437359352 DTF_Sask_2017_b BLINK; FarmCPU 17.5
## 3 Lcu.1GRN.Chr3p239208186 3 239208186 DTF_Sask_2017_b BLINK 14.0
## 4 Lcu.1GRN.Chr6p324049462 6 324049462 DTF_Sask_2017_b BLINK 13.1
## 5 Lcu.1GRN.Chr1p374734708 1 374734708 DTF_Sask_2017_b BLINK 12.7
## 6 Lcu.1GRN.Chr2p572729078 2 572729078 DTF_Sask_2017_b BLINK 12.4
## 7 Lcu.1GRN.Chr2p48311792 2 48311792 DTF_Sask_2017_b BLINK 11.7
## 8 Lcu.1GRN.Chr3p433371307 3 433371307 DTF_Sask_2017_b BLINK; MLMM 11.6
## 9 Lcu.1GRN.Chr7p186572874 7 186572874 DTF_Sask_2017_b BLINK 11.5
## 10 Lcu.1GRN.Chr7p23361774 7 23361774 DTF_Sask_2017_b BLINK 11.0
## # ℹ 20 more rows
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue")
)
# Save
ggsave("Fig_DTF_Sask_2017_01_05.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue")
)
# Save
ggsave("Fig_DTF_Sask_2017_01_06.png", mp, width = 12, height = 4, bg = "white")
Chr 6 Association
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Plot just Chromosome 6
chrom = 6,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = "Lcu.1GRN.Chr6p3269280",
# Create alt labels for the markers
labels = "32Mbp",
# Specify Color for each marker vline
vline.colors = "blue"
)
# Save
ggsave("Fig_DTF_Sask_2017_01_07.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Highlight specific markers
markers = "Lcu.1GRN.Chr6p3269280",
# Create alt labels for the markers
labels = "3269280",
# Specify Color for each marker vline
vline.color = "blue",
# Plot just Chromosome 2
chr = 6,
pos1 = 0,
pos2 = 6000000,
#
facet = T
)
# Save
ggsave("Fig_DTF_Sask_2017_01_08.png", mp, width = 6, height = 6)
H & B P values
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue"),
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_DTF_Sask_2017_01_09.png", mp, width = 12, height = 4, bg = "white")
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017_b",
# Specify a title
title = "DTF_Sask_2017 | CV = b",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280"),
# Create alt labels for the markers
labels = c("44Mbp","16Mbp","32Mbp"),
# Specify Color for each marker vline
vline.colors = c("red","red","blue"),
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_DTF_Sask_2017_01_10.png", mp, width = 12, height = 4, bg = "white")
Markers
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Sask_2017",
# Select markers to plot
markers = "Lcu.1GRN.Chr6p3269280"
)
# Save
ggsave("Fig_DTF_Sask_2017_02_01.png", mp, width = 6, height = 4)
# Plot
mp <- gg_Marker_Box(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "DTF_Sask_2017",
# Select markers to plot
markers = c("Lcu.1GRN.Chr2p44545877",
"Lcu.1GRN.Chr5p1658484",
"Lcu.1GRN.Chr6p3269280")
)
# Save
ggsave("Fig_DTF_Sask_2017_02_02.png", mp, width = 6, height = 4)
Volcano Plot
# Plot
mp <- gg_Volcano(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "DTF_Sask_2017",
# Highlight specific markers
markers = "Lcu.2RBY.Chr6p2528817",
# Create alt labels for the markers
labels = "Chr6p2528817"
)
# Save
ggsave("Fig_DTF_Sask_2017_03_01.png", mp, width = 12, height = 4)
Minor Allele Frequency
# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.DTF_Sask_2017.csv") %>%
mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.DTF_Sask_2017.csv") %>%
mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) +
geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)),
color = "red", alpha = 0.7) +
geom_point(color = "darkgreen", alpha = 0.5) +
facet_grid(. ~ Model) +
theme_gwaspr() +
labs(title = "DTF_Sask_2017", y = "-log10(p)")
# Save
ggsave("Fig_DTF_Sask_2017_03_02.png", mp, width = 10, height = 4)
Cotyledon Color
Since this is a simpler qualitative trait, we can assume there should be 1 large effect QTL.
Manhattan Plots
Plot models together (default)
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color"
)
# Save
ggsave("Fig_Cotyledon_Color_01_01.png", mp, width = 12, height = 4, bg = "white")
Facet each model
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Facet each model
facet = T
)
# Save
ggsave("Fig_Cotyledon_Color_01_02.png", mp, width = 12, height = 12, bg = "white")
Now lets customize the manhattan plot with some selected markers
list_Top_Markers(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
traits = "Cotyledon_Color"
)
## # A tibble: 55 × 6
## SNP Chr Pos Traits Models Max_LogP
## <chr> <int> <int> <chr> <chr> <dbl>
## 1 Lcu.1GRN.Chr1p365986872 1 365986872 Cotyledon_Color BLINK; FarmCPU; MLM; MLMM 157.
## 2 Lcu.1GRN.Chr1p361840399 1 361840399 Cotyledon_Color BLINK; FarmCPU; GLM 55.3
## 3 Lcu.1GRN.Chr1p382208787 1 382208787 Cotyledon_Color GLM 33.8
## 4 Lcu.1GRN.Chr1p361910110 1 361910110 Cotyledon_Color MLMM 33.1
## 5 Lcu.1GRN.Chr1p361856257 1 361856257 Cotyledon_Color GLM; MLMM 32.2
## 6 Lcu.1GRN.Chr4p359784721 4 359784721 Cotyledon_Color GLM; MLM 29.8
## 7 Lcu.1GRN.Chr1p365318023 1 365318023 Cotyledon_Color MLM 26.6
## 8 Lcu.1GRN.Chr1p365318027 1 365318027 Cotyledon_Color MLM 26.3
## 9 Lcu.1GRN.Chr6p390278241 6 390278241 Cotyledon_Color GLM 23.0
## 10 Lcu.1GRN.Chr4p244472485 4 244472485 Cotyledon_Color GLM 22.7
## # ℹ 45 more rows
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot only certain GWAS models
models = c("MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3)
)
# Save
ggsave("Fig_Cotyledon_Color_01_03.png", mp, width = 12, height = 4, bg = "white")
Chr 1 Association
Plot only chromosome 1
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot just Chromosome 1
chrom = 1,
# Plot only certain GWAS models
models = c("MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3),
# Set a max p value
pmax = 50
)
# Save
ggsave("Fig_Cotyledon_Color_01_04.png", mp, width = 12, height = 4, bg = "white")
Create a Zoomed in Plot with gg_Manhattan_Zoom
# Plot
mp <- gg_Manhattan_Zoom(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot just Chromosome 1
chr = 1,
pos1 = 360000000,
pos2 = 370000000,
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3),
#
facet = T
)
# Save
ggsave("Fig_Cotyledon_Color_01_05.png", mp, width = 8, height = 6)
H & B P values
# Plot
mp <- gg_Manhattan(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot only certain GWAS models
models = c("MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","361Mbp","362Mbp"),
# Specify Color for each marker vline
vline.colors = rep("green",3),
# Plot adjusted p values
plotHBPvalues = T,
# Select threshold for adjusted p values
threshold = 1.3
)
# Save
ggsave("Fig_Cotyledon_Color_01_06.png", mp, width = 12, height = 4, bg = "white")
Marker Plots
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = "Lcu.1GRN.Chr1p365986872",
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_01.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr1p365986872
is convincing.
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = "Lcu.1GRN.Chr1p361840399",
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_02.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr1p361840399
is likely a false
positive.
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = "Lcu.1GRN.Chr1p362336814",
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_03.png", mp, width = 6, height = 4)
The marker Lcu.1GRN.Chr1p362336814
is also likely a
false positive.
# Plot
mp <- gg_Marker_Pie(
# Genotype data
xG = myG,
# Phenotype data
xY = myY,
# Select a trait to plot
trait = "Cotyledon_Color",
# Select markers to plot
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Specify colors for alleles
marker.colors = c("darkred","darkgoldenrod3")
)
# Save
ggsave("Fig_Cotyledon_Color_02_04.png", mp, width = 6, height = 4)
Volcano Plot
# Plot
mp <- gg_Volcano(
# Specify a folder with GWAS results
folder = "GWAS_Results/",
# Select a trait to plot
trait = "Cotyledon_Color",
# Plot only certain GWAS models
models = c("MLM","MLMM","FarmCPU","BLINK"),
# Highlight specific markers
markers = c("Lcu.1GRN.Chr1p365986872",
"Lcu.1GRN.Chr1p361840399",
"Lcu.1GRN.Chr1p362336814"),
# Create alt labels for the markers
labels = c("365Mbp","351Mbp","362Mbp")
)
# Save
ggsave("Fig_Cotyledon_Color_03_01.png", mp, width = 12, height = 4)
Minor Allele Frequency
# Prep data
x1 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.MLM.Cotyledon_Color.csv") %>%
mutate(Model = "MLM")
x2 <- read.csv("GWAS_Results/GAPIT.Association.GWAS_Results.BLINK.Cotyledon_Color.csv") %>%
mutate(Model = "BLINK")
xx <- bind_rows(x1, x2) %>% mutate(Model = factor(Model, levels = c("MLM","BLINK")))
# Plot
mp <- ggplot(xx, aes(x = MAF, y = -log10(P.value), pch = as.factor(Chr))) +
geom_hline(yintercept = -log10(0.05 / (nrow(myG)-1)),
color = "red", alpha = 0.7) +
geom_point(color = "darkgreen", alpha = 0.5) +
facet_grid(. ~ Model) +
theme_gwaspr() +
labs(title = "Cotyledon_Color", y = "-log10(p)")
# Save
ggsave("Fig_Cotyledon_Color_03_02.png", mp, width = 10, height = 4)
Bonus
PCA
library(plotly)
library(htmlwidgets)
xx <- read.csv("GWAS_Results/GAPIT.Genotype.PCA.csv") %>%
left_join(myY, by = c("taxa"="Name"))
DTF Nepal 2017
https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Nepal_2017.html
# Prep data
myColors <- c("darkred", "darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
color = ~DTF_Nepal_2017, colors = myColors) %>%
add_markers() %>%
layout(scene = list(title = "DTF Nepal 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Nepal_2017.html")
mp
DTF Sask 2017
https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_DTF_Sask_2017.html
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
color = ~DTF_Sask_2017, colors = myColors) %>%
add_markers() %>%
layout(scene = list(title = "DTF Sask 2017"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_DTF_Sask_2017.html")
mp
Cotyledon Color
https://derekmichaelwright.github.io/dblogr/academic/gwas_tutorial/Fig_PCA_Cotyledon_Color.html
# Prep data
myColors <- c("darkgoldenrod3", "darkblue")
# Plot
mp <- plot_ly(xx, x = ~PC1, y = ~PC2, z = ~PC3, text = ~taxa,
color = ~Cotyledon_Color, colors = myColors) %>%
add_markers() %>%
layout(scene = list(title = "Cotyledon_Color"))
# Save
saveWidget(as_widget(mp), "Fig_PCA_Cotyledon_Color.html")
mp
SNP Density Plot
Package Source: https://github.com/YinLiLin/R-CMplot
library(CMplot)
# Prep data
xx <- myG %>% select(rs, chrom, pos)
# Plot
CMplot(xx, plot.type="d", bin.size=1e7,
chr.den.col=c("darkgreen", "darkgoldenrod3", "darkred"),
file="jpg", file.output=T, verbose=F, width=9, height=6,
file.name = "LDP")
LD Decay
library(genetics)
xG <- read.csv("Data/myG_hmp.csv", header = T)
dna <- data.frame(stringsAsFactors = F,
Symbol = c("A", "C", "G", "T", "U",
"R", "Y", "S", "W", "K", "M", "N"),
Value = c("A/A","C/C","G/G","T/T","U/U",
"A/G","C/T","G/C","A/T","G/T","A/C","N/N") )
xx <- xG[,c(-2,-5,-6,-7,-8,-9,-10,-11)]
for(i in 4:ncol(xx)) {
xx[xx[,i]=="N", i] <- NA
xx[,i] <- plyr::mapvalues(xx[,i], dna$Symbol, dna$Value)
}
#
LD_Decay <- function(x, folder = "LD/", Chr = 1, Num = 200) {
xc <- x %>% filter(chrom == Chr)
xr <- xc %>% column_to_rownames("rs")
xr <- xr[,c(-1,-2)]
rr <- round(runif(Num, 1, nrow(xr)))
while(sum(duplicated(rr))>0) {
ra <- round(runif(sum(duplicated(rr)), 1, nrow(xr)))
rr <- rr[!duplicated(rr)]
rr <- c(rr, ra)
}
rr <- rr[order(rr)]
xr <- xr[rr,]
#
xi <- xr %>% t() %>% as.data.frame()
myLD <- genetics::LD(genetics::makeGenotypes(xi))
save(myLD, file = paste0(folder, "LD_Chrom_", Chr, "_", Num, ".Rdata"))
}
# Calculate LD per chromosome
LD_Decay(x = xx, Chr = 1, Num = 1000)
LD_Decay(x = xx, Chr = 2, Num = 1000)
LD_Decay(x = xx, Chr = 3, Num = 1000)
LD_Decay(x = xx, Chr = 4, Num = 1000)
LD_Decay(x = xx, Chr = 5, Num = 1000)
LD_Decay(x = xx, Chr = 6, Num = 1000)
LD_Decay(x = xx, Chr = 7, Num = 1000)
# Create function
movingAverage <- function(x, n = 5) {
stats::filter(x, rep(1 / n, n), sides = 2)
}
# Prep data
xG <- read.csv("Data/myG_hmp.csv", header = T)
xx <- NULL
for(i in 1:7) {
xc <- xG %>% filter(chrom == i)
load(paste0("LD/LD_Chrom_",i,"_1000.Rdata"))
xi <- myLD$`R^2` %>% as.data.frame() %>%
rownames_to_column("SNP1") %>%
gather(SNP2, LD, 2:ncol(.)) %>%
filter(!is.na(LD)) %>%
mutate(Chr = i,
SNP1_d = plyr::mapvalues(SNP1, xc$rs, xc$pos, warn_missing = F),
SNP2_d = plyr::mapvalues(SNP2, xc$rs, xc$pos, warn_missing = F),
Distance = as.numeric(SNP2_d) - as.numeric(SNP1_d)) %>%
arrange(Distance, rev(LD))
#
xii <- xi %>% filter(Distance < 1000000)
myloess <- stats::loess(LD ~ Distance, data = xii, span=0.50)
#
xi <- xi %>%
mutate(Moving_Avg = movingAverage(LD, n = 100),
Loess_Chr = ifelse(Distance < 1000000, predict(myloess), NA))
#
xx <- bind_rows(xx, xi)
}
#
x1 <- xx %>% group_by(Chr) %>%
summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- xx %>% filter(Loess_Chr < 0.2) %>% group_by(Chr) %>%
summarise(Threshold_0.2 = min(Distance, na.rm = T))
#x3 <- xx %>% left_join(x1, by = "Chr") %>%
# group_by(Chr) %>% filter(Moving_Avg < Mean_LD) %>%
# summarise(Threshold_0.1 = min(Distance, na.rm = T))
myChr <- x1 %>% left_join(x2, by = "Chr") #%>%
#left_join(x3, by = "Chr")
#
xx <- left_join(xx, myChr, by = "Chr") %>%
mutate(Chr = as.factor(Chr))
yy <- xx %>% filter(Distance < 1000000) %>%
arrange(Distance, rev(LD))
myloess <- stats::loess(LD ~ Distance, data = yy, span=0.50)
yy <- yy %>%
mutate(Moving_Avg = movingAverage(LD, n = 100),
Loess_Geno = ifelse(Distance < 1000000, predict(myloess), NA))
#
x1 <- yy %>% summarise(Mean_LD = mean(Moving_Avg, na.rm = T))
x2 <- yy %>% filter(Loess_Chr < 0.2) %>%
summarise(Threshold_0.2 = min(Distance, na.rm = T))
myGeno <- cross_join(x1, x2)
# Plot first 1 Mbp
mp1 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Geno)) +
geom_point(aes(y = LD), size = 0.2, pch = 15, alpha = 0.25) +
#geom_line(aes(y = Moving_Avg), color = "red", lwd = 1) +
geom_line(aes(y = Moving_Avg), alpha = 0.7) +
geom_line() +
geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
geom_hline(data = myGeno, aes(yintercept = Mean_LD), color = "red", lty = 2) +
geom_vline(data = myGeno, lty = 2, size = 0.3, alpha = 0.8,
aes(xintercept = Threshold_0.2/1000)) +
scale_x_continuous(seq(0, 1000, by = 100), expand = c(0,10)) +
theme_gwaspr(legend.position = "none",
axis.title.y = ggtext::element_markdown()) +
labs(title = "LD of all markers within 1 Mbp", y = "r^2", x = "Kbp")
# Plot first 1 Mbp
mp2 <- ggplot(yy, aes(x = Distance/1000, y = Loess_Chr)) +
geom_line(aes(y = Moving_Avg), alpha = 0.5) +
geom_line() +
geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
facet_wrap(paste("Chr =", Chr) + paste("Threshold =", Threshold_0.2) ~ .,
ncol = 7, scales = "free_x") +
geom_vline(data = myChr, lty = 2, size = 0.3, alpha = 0.8,
aes(xintercept = Threshold_0.2/1000)) +
theme_gwaspr(legend.position = "none",
axis.title.y = ggtext::element_markdown()) +
labs(title = "By Chromosome", y = "r^2", x = "Kbp")
# Plot full chromsomes
mp3 <- ggplot(xx, aes(x = Distance/1000000, y = Moving_Avg)) +
geom_line(size = 0.5, alpha = 0.5) +
geom_hline(data = myChr, aes(yintercept = Mean_LD), color = "red", lty = 2) +
geom_hline(yintercept = 0.2, color = "blue", lty = 2) +
scale_y_continuous(breaks = seq(0, 0.5, by = 0.1), limits = c(0,0.5)) +
facet_wrap(paste("Chr =", Chr) ~ ., ncol = 7, scales = "free_x") +
theme_gwaspr(legend.position = "none",
axis.title.y = ggtext::element_markdown()) +
labs(title = "All Markers", y = "r^2", x = "Mbp")
#yy <- yy %>% filter(Distance < 10000)
# Append
mp <- ggarrange(mp1, mp2, mp3, ncol = 1, nrow = 3)
# Save
ggsave("Fig_06_LD.png", mp , width = 12, height = 10)