Integrative genetic epidemiology with OpenGWAS, OpenCRAVAT, and Bioconductor

Authors: Vincent Carey
Last modified: 16 Mar, 2021.

Overview

Description

Along with the topic of your workshop, include how students can expect to spend their time. The description may also include information about the type of workshop (e.g., instructor-led live demo, lab, lecture + lab). Instructors are strongly encouraged to provide completely worked examples for lab sessions and a set of stand-alone notes that can be read and understood outside of the workshop.

Pre-requisites

List any workshop prerequisites, for example:

  • Basic knowledge of R syntax
  • Familiarity with human SNPs

Participation

Students will use R to acquire variant data from resources like TCGA, select variant annotation resources available in OpenCRAVAT, and produce reports on variant function and impact.

R / Bioconductor packages used

GenomicRanges

Time outline

Activity                                       Time
Concepts of genetic variation                  10m
MRC Integrative Epidemiology Unit resources    10m
OpenCRAVAT                                     10m
Exercises                                      30m

Workshop goals and objectives

The interpretation of genetic variants is fundamental to all aspects of clinical genetics and genetic epidemiology. MRC OpenGWAS (publication https://doi.org/10.1101/2020.08.10.244293) is a data repository and API suite providing interactive access to statistics and metadata for hundreds of billions of human genetic variants. OpenCRAVAT (web site opencravat.org, publication DOI 10.1200/CCI.19.00132) is a system that amalgamates over 100 variant annotation resources and simplifies the development of rich characterizations of structural and functional contexts of genetic variants. Bioconductor (bioconductor.org) is an ecosystem of data structures and software packages that can be used in many contexts in genome biology and computational biomedicine. The gwaslake workshop adapts Bioconductor programming patterns and flexible containerization to simplify exploration, annotation, and interpretation of OpenGWAS variants assembled on a large collection of cohorts and phenotypes.

Learning goals

In this workshop, we will guide you through the assembly of variants from diverse sources, in diverse formats, for flexible annotation using OpenCRAVAT. Bioconductor data structures and app designs provide high-level conveniences for representing and analyzing cohorts arising in genetic epidemiology and cancer genomics.

Through live demonstrations and interactive small-group exercises, you will learn how to:

  • Import variation data from OpenGWAS and other human cohorts into R
  • Use Bioconductor tools and RStudio apps to configure and execute annotation tasks defined by OpenCRAVAT
  • Retrieve annotation reports generated by OpenCRAVAT and carry the results forward into interactive workflows for epidemiologic and clinical interpretation

Learning objectives

  • Acquire data on human genetic variation for investigation with Bioconductor
  • Select annotation goals and tools from those available in OpenCRAVAT
  • Produce reports on functional and structural impacts of genetic variants using OpenCRAVAT in Bioconductor
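
As a rough illustration of the first objective, the sketch below holds a handful of variants in a GenomicRanges `GRanges` object and flattens them to a tab-delimited file. The coordinates and alleles are invented for illustration and are not real annotated variants. The output column order (chromosome, position, strand, reference allele, alternate allele) is an assumption about a simple text layout that an annotation tool such as OpenCRAVAT might accept; check the OpenCRAVAT documentation for the formats it actually supports.

```r
# Minimal sketch: store a few variants as a GRanges, then flatten to a table.
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Hypothetical variants for illustration only (made-up coordinates and alleles).
variants <- data.frame(
  chrom = c("chr1", "chr7", "chr17"),
  pos   = c(1000000L, 2000000L, 3000000L),
  ref   = c("A", "G", "C"),
  alt   = c("G", "T", "T"),
  stringsAsFactors = FALSE
)

# Single-nucleotide variants occupy one base, so every range has width 1.
# Extra named arguments (ref, alt) become metadata columns on the GRanges.
gr <- GRanges(
  seqnames = variants$chrom,
  ranges   = IRanges(start = variants$pos, width = 1L),
  strand   = rep("+", nrow(variants)),
  ref      = variants$ref,
  alt      = variants$alt
)

# Flatten back to a plain table. The column layout is an assumption,
# not a confirmed OpenCRAVAT specification.
variant_table <- data.frame(
  chrom  = as.character(seqnames(gr)),
  pos    = start(gr),
  strand = as.character(strand(gr)),
  ref    = gr$ref,
  alt    = gr$alt,
  stringsAsFactors = FALSE
)

# Write a headerless tab-delimited file to a temporary location.
out_file <- tempfile(fileext = ".tsv")
write.table(variant_table, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Keeping the variants in a `GRanges` means the usual Bioconductor range operations, such as overlap queries against gene models, stay available before anything is exported for annotation.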

The activities in this workshop require some familiarity with either Jupyter notebooks or RStudio, but no programming per se is needed to take advantage of the workshop material.

Learning Objectives:

  1. Identify the fundamental components of working with human genetic variants in a cloud-native environment, making use of diverse GWAS and PheWAS results

  2. Understand how to define a series of annotation tasks for variant data assembled on individuals or cohorts, and how to use OpenCRAVAT to execute these tasks with a Bioconductor/Shiny app

  3. Use the outputs of OpenCRAVAT annotators to interpret effects of genetic variants on disease risk or other concepts of interest in genetic epidemiology.