Demo doc template for the Bioconductor 3.12 workshops at Harvard

Authors: Vince Carey², Another Author³.
Last modified: 1 Nov, 2020.

Overview

Description

This document is a technical illustration of how workshop documents are authored and rendered. The content of this document will change completely as the workshop content is specified.

Pre-requisites [should be specified for each workshop]

Basic knowledge of R syntax
Familiarity with RStudio
Basic understanding of modern genomics. For example, the distinction between whole genome sequencing and RNA-seq should be clear.
Basic familiarity with the concepts of statistical analysis, such as the t-test for comparing sample means and the interpretation of histograms. An understanding of experimental design is helpful.
Readings:

Participation

Describe how students will be expected to participate in the workshop.

R / Bioconductor packages used

List any R / Bioconductor packages that will be explicitly covered.

Time outline

An example for a 60-minute workshop:

Activity	Time
Brief intro to R/RStudio	10m
Biological context	10m
Packages to be used	10m
Analytical approach to the question	15m
Simple exercises	10m
Review	5m

Workshop goals and objectives

List “big picture” student-centered workshop goals and learning objectives. Learning goals and objectives are related, but not the same thing. These goals and objectives will help some people to decide whether to attend the conference for training purposes, so please make these as precise and accurate as possible.

Learning goals are high-level descriptions of what participants will learn and be able to do after the workshop is over. Learning objectives, on the other hand, describe in specific and measurable terms the skills or knowledge attained. Bloom’s Taxonomy may be a useful framework for defining and describing your goals and objectives, although there are others.

Learning goals

Some examples:

describe how to…
identify methods for…
understand the difference between…

Learning objectives

analyze xyz data to produce…
create xyz plots
evaluate xyz data for artifacts

Workshop Content

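# Load packages quietly: TnT for interactive genome tracks, knitr for kable(),
# tibble for tabular data, biocwk312 for the workshop data and helpers
suppressPackageStartupMessages({
  library(TnT)
  library(knitr)
  library(tibble)
  library(biocwk312)
})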

Thinking about tables

In the following chunk, we retrieve a table produced in Lambert et al. (2018) and render its first five rows.

lamtab = biocwk312::lamb_main_20201101
kable(lamtab[1:5,])

ID	Name	DBD	Is TF?	TF assessment	Binding mode	Motif status	Notes	Comments	Committee notes	MTW Notes	TRH Notes	SL notes	AJ notes	Disagree on Assessment	Disagree on Binding	Author1	Assesment1	Binding1	Comment1	Notes1	Author2	Assesment2	Binding2	Comment2	Notes2	Vaquerizas 2009 TF classification	CisBP considers it as a TF?	TFclass considers it as a TF?	TF-CAT classification	Is a GO TF	PDB
ENSG00000137203	TFAP2A	AP-2	Yes	Known motif	1 Monomer or homomultimer	High-throughput in vitro	NA	NA	NA	NA	NA	NA	NA	NA	NA	Sam Lambert	Has known motif	1 Monomer or homomultimer	NA	NA	Yimeng Yin	Has known motif	1 Monomer or homomultimer	NA	NA	a	Yes	Yes	TF Gene_DNA-Binding: sequence-specific_DNA Binding; Transactivation_PMIDS:11522791;15475956	Yes	NA
ENSG00000008196	TFAP2B	AP-2	Yes	Known motif	1 Monomer or homomultimer	High-throughput in vitro	NA	NA	NA	NA	NA	NA	NA	NA	NA	Matt Weirauch	Has known motif	1 Monomer or homomultimer	NA	NA	Yimeng Yin	Has known motif	1 Monomer or homomultimer	NA	NA	a	Yes	Yes	TF Gene_DNA-Binding: sequence-specific_DNA Binding; Transactivation_PMIDS:7555706	Yes	NA
ENSG00000087510	TFAP2C	AP-2	Yes	Known motif	1 Monomer or homomultimer	High-throughput in vitro	NA	NA	NA	NA	NA	NA	NA	NA	NA	Matt Weirauch	Has known motif	1 Monomer or homomultimer	NA	NA	Yimeng Yin	Has known motif	1 Monomer or homomultimer	NA	NA	a	Yes	Yes	No	Yes	NA
ENSG00000008197	TFAP2D	AP-2	Yes	Known motif	1 Monomer or homomultimer	In vivo/Misc source	Only known motifs are from Transfac or HocoMoco - origin is uncertain	Binds the same GCCTGAGGC sequence as the other AP-2s (PMID: 24789576)	NA	NA	NA	NA	Binds the same GCCTGAGGC sequence as the other AP-2:s based on PMID: 24789576	Disagree	NA	Arttu Jolma	Likely to be sequence specific TF	1 Monomer or homomultimer	Binds GCCTGAGGC sequence based on PMID: 24789576	NA	Sam Lambert	Has known motif	1 Monomer or homomultimer	Source of Hocomoco motif is unclear	NA	a	Yes	Yes	No	Yes	NA
ENSG00000116819	TFAP2E	AP-2	Yes	Known motif	1 Monomer or homomultimer	High-throughput in vitro	NA	NA	NA	NA	NA	NA	NA	NA	NA	Sam Lambert	Has known motif	1 Monomer or homomultimer	NA	NA	Laura Campitelli	Has known motif	1 Monomer or homomultimer	NA	NA	a	Yes	Yes	TF Gene_DNA-Binding: sequence-specific_DNA Binding_PMIDS:14572467	Yes	NA

When the table is rendered in HTML with ‘DT::datatable’, it becomes searchable.

Note that each transcription factor is accompanied by an Ensembl identifier.
We ‘wrap’ each identifier in an HTML hyperlink to its Ensembl gene page, and then render using the ‘escape=FALSE’ setting for ‘datatable’ so that the links are not displayed as literal text.

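# Base URL of the Ensembl gene summary page (US West mirror)
usw_pref = "https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g="
# Wrap each identifier in an HTML anchor pointing to its Ensembl page
wr_ens = function(x, pref) {
 paste0("<A href='", pref, x, "'>", x, "</A>")
}
head(wr_ens(lamtab$ID, usw_pref)) # check the generated links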
#> [1] "<A href='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000137203'>ENSG00000137203</A>"
#> [2] "<A href='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000008196'>ENSG00000008196</A>"
#> [3] "<A href='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000087510'>ENSG00000087510</A>"
#> [4] "<A href='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000008197'>ENSG00000008197</A>"
#> [5] "<A href='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000116819'>ENSG00000116819</A>"
#> [6] "<A href='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000117713'>ENSG00000117713</A>"
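# Copy the table and replace the identifiers with their hyperlinked versions
lamtab2 = lamtab
lamtab2$ID = wr_ens(lamtab2$ID, usw_pref)
# escape=FALSE lets DT render the anchors as HTML links
DT::datatable(lamtab2, escape=FALSE)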
#> Warning in instance$preRenderHook(instance): It seems your data is too big
#> for client-side DataTables. You may consider server-side processing: https://
#> rstudio.github.io/DT/server.html

Users of the TFutils package get all of this from the function ‘browse_lambert_main()’, which also adds links for the PubMed IDs scattered through Lambert’s published Excel table.
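PubMed IDs can be linked in the same spirit as the Ensembl identifiers. The sketch below is a hypothetical helper, ‘wr_pmid’, and not the TFutils implementation; it assumes that PubMed IDs appear in the text as stand-alone runs of 6 to 8 digits (as in "PMID: 24789576" or "PMIDS:11522791;15475956") and that the PubMed URL pattern https://pubmed.ncbi.nlm.nih.gov/<PMID> is acceptable.

```r
# Hypothetical helper (not the TFutils code): wrap stand-alone 6-8 digit
# numbers, assumed to be PubMed IDs, in HTML links to PubMed
wr_pmid = function(x, pref = "https://pubmed.ncbi.nlm.nih.gov/") {
  gsub("\\b(\\d{6,8})\\b",
       paste0("<A href='", pref, "\\1'>\\1</A>"),
       x, perl = TRUE)
}

# Example strings taken from the Comments and TF-CAT columns shown above
notes = c("Binds the same GCCTGAGGC sequence as the other AP-2s (PMID: 24789576)",
          "TF Gene_DNA-Binding: sequence-specific_DNA Binding; Transactivation_PMIDS:11522791;15475956",
          NA)
linked = wr_pmid(notes)
linked
```

Digits embedded in Ensembl identifiers such as ENSG00000137203 are not touched, because the word-boundary anchors require the number to stand alone; NA entries stay NA. Applied to a text column, for example ‘lamtab2$Comments = wr_pmid(lamtab2$Comments)’, the result would be rendered with ‘escape=FALSE’ as before.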

Exercise: How many records mention transcription factor ‘YY1’?
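One way to approach the exercise programmatically, assuming that a "mention" means a whole-word match to ‘YY1’ in any column of a record, is to test every column and combine the results row by row. The helper ‘count_mentions’ and the toy table are hypothetical, for illustration only.

```r
# Hypothetical helper: count rows in which any column matches `pattern`
count_mentions = function(df, pattern) {
  # one logical vector per column; NA cells give FALSE in grepl()
  per_col = lapply(df, function(col) grepl(pattern, as.character(col), perl = TRUE))
  # a row counts if any of its columns matched
  sum(Reduce(`|`, per_col))
}

# Hypothetical toy table: YY1AP1 should not count as a mention of YY1
toy = data.frame(Name  = c("YY1", "YY2", "YY1AP1"),
                 Notes = c(NA, "interacts with YY1", NA))
count_mentions(toy, "\\bYY1\\b")  # returns 2
```

The same call with ‘lamtab’ in place of ‘toy’ gives a count for the Lambert table; dropping the ‘\\b’ anchors would also count names that merely contain YY1 as a substring.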

Thinking about visualization

Visualization of genomic data is closely linked to annotation, and annotation can be unwieldy to interrogate. We have collected some slices of genomic data and annotation to help explore new approaches to interactive visualization in reports.

Reference positions of transcripts. We have exon-level records for a selection of transcripts on chromosome 17 near the gene ORMDL3.

head(biocwk312::txdata_near_ormdl3,3)
#> GRanges object with 3 ranges and 4 metadata columns:
#>                   seqnames            ranges strand |           tx_id
#>                      <Rle>         <IRanges>  <Rle> |     <character>
#>   ENSE00003715979       17 37924458-37924570      + | ENST00000620683
#>   ENSE00003732795       17 37977972-37979038      - | ENST00000618047
#>   ENSE00003734132       17 37978470-37979038      - | ENST00000617125
#>                     gene_name         gene_id     type
#>                   <character>     <character> <factor>
#>   ENSE00003715979     TBC1D3D ENSG00000274419     exon
#>   ENSE00003732795     TBC1D3L ENSG00000274512     exon
#>   ENSE00003734132     TBC1D3C ENSG00000278299     exon
#>   -------
#>   seqinfo: 1 sequence from GRCh38 genome
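Each record printed above is an exon tagged with its transcript (‘tx_id’) and gene. If one interval per transcript is wanted, a minimal sketch with GenomicRanges is to split the exons by ‘tx_id’ and take the range of each group. The toy object reuses only the three records shown; it is not how the workshop data were built.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Toy input: the three exon records printed above (chr17, GRCh38)
exons = GRanges("17",
  IRanges(start = c(37924458, 37977972, 37978470),
          end   = c(37924570, 37979038, 37979038)),
  strand    = c("+", "-", "-"),
  tx_id     = c("ENST00000620683", "ENST00000618047", "ENST00000617125"),
  gene_name = c("TBC1D3D", "TBC1D3L", "TBC1D3C"))

# Group exons by transcript, then take min(start)..max(end) within each group;
# unlist() turns the GRangesList back into a GRanges named by tx_id
tx_spans = unlist(range(split(exons, exons$tx_id)))
tx_spans
```

Here each transcript has a single exon, so the spans equal the exons; applied to ‘biocwk312::txdata_near_ormdl3’, multi-exon transcripts would collapse to one interval each. ‘range()’ is strand-aware, which is harmless here because all exons of a transcript share a strand.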

Positions and annotation of GWAS hits. We used the gwascat package to get a copy of the NHGRI-EBI GWAS Catalog on 1 November 2020, and limited the information to records pertaining to the interval 38–43 Mb on chr17. We also limited the number of metadata fields on the hits.

names(S4Vectors::mcols(biocwk312::hits_near_ormdl3_trunc10))
#> [1] "DISEASE/TRAIT"             "STRONGEST SNP-RISK ALLELE"
#> [3] "SNPS"                      "RISK ALLELE FREQUENCY"    
#> [5] "DATE ADDED TO CATALOG"     "PUBMEDID"                 
#> [7] "REPORTED GENE(S)"          "PVALUE_MLOG"              
#> [9] "value"
head(biocwk312::hits_near_ormdl3_trunc10, 3)
#> GRanges object with 3 ranges and 9 metadata columns:
#>       seqnames    ranges strand |          DISEASE/TRAIT
#>          <Rle> <IRanges>  <Rle> |            <character>
#>   [1]       17  39766006      * | Primary biliary chol..
#>   [2]       17  39965740      * |                 Asthma
#>   [3]       17  39905943      * |                 Asthma
#>       STRONGEST SNP-RISK ALLELE        SNPS RISK ALLELE FREQUENCY
#>                     <character> <character>             <numeric>
#>   [1]                rs907092-A    rs907092                  0.45
#>   [2]               rs3894194-A   rs3894194                  0.45
#>   [3]               rs2305480-G   rs2305480                  0.55
#>       DATE ADDED TO CATALOG  PUBMEDID REPORTED GENE(S) PVALUE_MLOG     value
#>                      <Date> <numeric>      <character>   <numeric> <numeric>
#>   [1]            2009-06-21  19458352            IKZF3     5.09691   5.09691
#>   [2]            2010-10-14  20860503            GSDMA     8.30103   8.30103
#>   [3]            2010-10-14  20860503            GSDMB     7.00000   7.00000
#>   -------
#>   seqinfo: 24 sequences from GRCh38 genome
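The two restrictions described above (an interval on chr17 and a subset of metadata fields) can be sketched with GenomicRanges. The helper ‘restrict_hits’ is hypothetical and the toy input reuses only the three hits displayed above; the workshop object may well have been produced differently from gwascat output.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical helper: keep hits inside chrom:from-to and, optionally,
# only the metadata columns named in `keep`
restrict_hits = function(gr, chrom, from, to, keep = NULL) {
  window = GRanges(chrom, IRanges(from, to))
  out = subsetByOverlaps(gr, window)
  if (!is.null(keep)) mcols(out) = mcols(out)[, keep, drop = FALSE]
  out
}

# Toy input: the three hits displayed above (chr17, GRCh38)
toy_hits = GRanges("17",
  IRanges(start = c(39766006, 39965740, 39905943), width = 1),
  SNPS        = c("rs907092", "rs3894194", "rs2305480"),
  PUBMEDID    = c(19458352, 20860503, 20860503),
  PVALUE_MLOG = c(5.09691, 8.30103, 7.00000))

# The 38-43 Mb interval, keeping two metadata fields
region_hits = restrict_hits(toy_hits, "17", 38000000, 43000000,
                            keep = c("SNPS", "PVALUE_MLOG"))
# A narrower window keeps only the two hits near 39.9 Mb
narrow_hits = restrict_hits(toy_hits, "17", 39900000, 39970000)
```

The same function with a narrow window, e.g. ‘restrict_hits(biocwk312::hits_near_ormdl3_trunc10, "17", 39932000, 39934000)’, offers a way to check the answer to the visualization exercise below.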

An interactive visualization of GWAS hits and nearby genes.

We’ll use the TnT package along with some functions built for the workshop to visualize GWAS hit locations in the context of transcripts.

The display below is interactive. Clicking the mouse or trackpad near the middle of the display zooms in, and a mouse wheel also controls zoom. At greater magnification, annotations are displayed; you can also click on a feature to get its textual annotation. After magnifying, you can drag the display left or right. To restore the initial state, reload the page in the browser.

# First, set up the transcript track from the exon-level GRanges
trxview = TxTrackFromGRanges(biocwk312::txdata_near_ormdl3,
      label = "Transcript\n Structure",
      color = "grey2", height = 300)
# Adjust tooltip, colour and display label with helpers built for the workshop
trxview = reset_tooltip(trxview)
trxview = reset_color(trxview)
trxview = reset_display_label(trxview)
# Second, set up the GWAS hits as a pin track
hitview = TnT::PinTrack(biocwk312::hits_near_ormdl3_trunc10, color = "blue")
# Render both tracks, initially showing chr17:39.5-40.5 Mb
TnTGenome(list(hitview, trxview), view.range = GRanges("17", IRanges(39.5e6, 40.5e6)))

Exercise: There is a single GWAS hit on chr17 between positions 39,932,000 and 39,934,000. Use zoom and drag to find it. What is the rs number for the SNP, what is the disease/trait with which this SNP is associated, and what is the risk allele frequency? These can be determined by clicking on the head of the blue pin plotted in the interval given.

References

Lambert, Samuel A., Arttu Jolma, Laura F. Campitelli, Pratyush K. Das, Yimeng Yin, Mihai Albu, Xiaoting Chen, Jussi Taipale, Timothy R. Hughes, and Matthew T. Weirauch. 2018. “The Human Transcription Factors.” Cell 172 (4): 650–65. https://doi.org/10.1016/j.cell.2018.01.029.

¹ stvjc@channing.harvard.edu
² Channing Division of Network Medicine, Brigham and Women’s Hospital
³ Another Institution

biocwk312 workshops demo

Vince Carey¹
