This is a multi-step overview of Hail for genetic association studies. As of March 2023, there are three vignettes (articles), described in the following subsections.
Tutorial overview
The first vignette follows the Hail 0.2 GWAS tutorial.
We’ll see how to use a very small excerpt from the 1000 Genomes study to produce
Along the way, we illustrate population stratification and show how to adjust for it:
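Population stratification matters because a variant whose allele frequency differs between ancestral groups can appear associated with any phenotype that also differs between those groups. Hail's GWAS tutorial handles this by computing principal components (`hl.hwe_normalized_pca`) and passing the leading scores as covariates to `hl.linear_regression_rows`. The plain-Python sketch below is not Hail code. It only illustrates the idea on made-up toy data: a per-variant ordinary least squares fit of phenotype on allele dosage, with a hypothetical `ancestry` column standing in for a principal-component score.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear columns)")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def dosage_effect(dosage: Sequence[float], phenotype: Sequence[float],
                  covariates: Sequence[Sequence[float]] = ()) -> float:
    """OLS coefficient on allele dosage (0/1/2), fitted with an intercept and optional covariates."""
    # Design matrix columns: intercept, dosage, then one column per covariate.
    cols = [[1.0] * len(phenotype), list(dosage)] + [list(c) for c in covariates]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, phenotype)) for ci in cols]
    return solve(xtx, xty)[1]


# Toy data (made up): the phenotype depends only on ancestry, and the
# alternate allele is more common in group 1, so ancestry confounds the test.
ancestry = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical stand-in for a PC score
dosage = [0, 0, 1, 0, 2, 1, 2, 2]
phenotype = [2.0 * a for a in ancestry]

naive = dosage_effect(dosage, phenotype)                 # spurious association
adjusted = dosage_effect(dosage, phenotype, [ancestry])  # ancestry as covariate
print(f"unadjusted beta = {naive:.3f}, adjusted beta = {adjusted:.3f}")
```

On this toy data the unadjusted coefficient is 1.0 even though dosage has no direct effect; once the ancestry column enters the model, the dosage coefficient drops to zero up to floating-point rounding.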
Larger data problem – 1000 Genomes data with T2T reference
We have prepared a serialized set of genotypes on chromosome 17 for 3202 samples from the 1000 Genomes study. Code is provided to explore population stratification with this richer set of genotypes. Exercises investigate sampling of loci, manipulation of annotations, and exploratory GWAS.
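One simple way to sample loci, for example to thin a chromosome-wide set before a quick exploratory analysis, is to keep each locus independently with a fixed probability and a fixed seed so the subset is reproducible. Hail's `MatrixTable.sample_rows(p, seed=...)` works in this spirit; the plain-Python sketch below only illustrates the idea, and the `sample_loci` helper and `(contig, position)` tuples are hypothetical, not part of the vignette.

```python
import random
from typing import List, Tuple

Locus = Tuple[str, int]  # hypothetical representation: (contig, position)


def sample_loci(loci: List[Locus], p: float, seed: int) -> List[Locus]:
    """Keep each locus independently with probability p (Bernoulli thinning), reproducibly."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return [locus for locus in loci if rng.random() < p]


# Illustrative positions only, not real variant sites.
loci = [("chr17", pos) for pos in range(1, 100001, 10)]  # 10,000 loci
subset = sample_loci(loci, p=0.1, seed=42)
print(len(subset), subset[:3])
```

Because each locus is kept independently, the subset size is random around `p * len(loci)`; reusing the same seed returns exactly the same subset, which keeps exercises reproducible.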
Working with UK Biobank summary statistics
Some of the code in this vignette must be run in RStudio to take advantage of Quarto's support for mixing R and Python in one document.
Exercises address building an interface that simplifies querying the available phenotypes and extracting findings according to strength of association.
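A minimal version of such an interface needs two operations: list the phenotypes present in a set of summary statistics, and return the strongest associations for one phenotype. The sketch below assumes a hypothetical in-memory record type (`Assoc` with `phenotype`, `variant`, `beta`, `pval` fields); it does not reflect the actual layout of the UK Biobank summary statistics used in the vignette. The default threshold of 5e-8 is the conventional genome-wide significance level.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Assoc:
    phenotype: str
    variant: str  # hypothetical "chrom:pos:ref:alt" identifier
    beta: float
    pval: float


def available_phenotypes(rows: Sequence[Assoc]) -> List[str]:
    """Sorted, de-duplicated phenotype names present in the records."""
    return sorted({r.phenotype for r in rows})


def top_hits(rows: Sequence[Assoc], phenotype: str,
             max_p: float = 5e-8, limit: int = 10) -> List[Assoc]:
    """Associations for one phenotype with p <= max_p, strongest (smallest p) first."""
    hits = [r for r in rows if r.phenotype == phenotype and r.pval <= max_p]
    hits.sort(key=lambda r: r.pval)
    return hits[:limit]


# Made-up toy records for illustration only; not real findings.
records = [
    Assoc("height", "17:1000:A:G", 0.12, 3e-10),
    Assoc("height", "17:2000:C:T", 0.02, 0.4),
    Assoc("height", "17:3000:G:A", -0.08, 1e-9),
    Assoc("bmi", "17:1000:A:G", 0.05, 2e-8),
]
print(available_phenotypes(records))
for hit in top_hits(records, "height"):
    print(hit.variant, hit.pval)
```

Keeping the threshold and result limit as parameters lets a front end (for example, a small interactive app) expose them as controls without changing the query logic.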