A3 Dictionaries and metadata on COVID-19 resources

Introduction

Data can be described at varying levels of explicitness. A “variable name” is a short token used in programming to refer to some quantity, outcome, or event of interest.

A data dictionary maps variable names to more explicit definitions of quantities, outcomes, and events, and may include further information on the context of measurement.
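As a minimal illustration of the idea (not drawn from the CDC workbook), a data dictionary can be held as a small data frame keyed by variable name. The variable names, the definitions, and the helper `define` below are all hypothetical, invented for this sketch.

```r
# Hypothetical dictionary: names and definitions are invented for illustration
dd = data.frame(
  variable = c("date", "loc", "admin_daily"),
  definition = c(
    "Date the doses were administered",
    "Jurisdiction (state, territory, or federal entity)",
    "Number of doses administered on the given date"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the explicit definition for a short variable name;
# returns NA when the name is not in the dictionary
define = function(dict, name) {
  hit = dict$definition[dict$variable == name]
  if (length(hit) == 0) NA_character_ else hit
}

define(dd, "admin_daily")
```

Keeping the dictionary as ordinary tabular data means it can be filtered, joined, and displayed with the same tools used for the measurements themselves.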

Example: CDC’s data dictionary on vaccination trends

We have included a version of a CDC data dictionary. It is published as a multisheet Excel workbook.

Code to retrieve the names of the sheets, using excel_sheets:

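library(readxl)
# Locate the workbook shipped with the teachCovidData package
pa = system.file("cdc/VACCDataDictionary_v36_12082022.xlsx", package="teachCovidData")
# List the names of the sheets in the workbook
shn = excel_sheets(pa)
shn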

##  [1] "0. Notes"                        "1. Vaccinations_US_Jurisdiction"
##  [3] "2. Vaccinations_US_Trends"       "3. Vaccinations_US_Demograp"    
##  [5] "4. Vaccination_Age_Sex_Trends"   "5. Vaccinations_US_County"      
##  [7] "6. Vaccination_CaseTrends_AgeGp" "7. Booster Dose Eligibility"    
##  [9] "8. Primary and Booster Chart"    "9. Jurisdiction Abbreviations"

The first sheet is an overview:

p1 = read_xlsx(pa, 1)
knitr::kable(head(as.data.frame(p1)))

| CDC COVID-19 Vaccine Administration and Distribution data | ...2 | ...3 |
|:--|:--|:--|
| Recent as of 11/17/2022 @ 8:00 AM ET | NA | NA |
| Historical data available for download: | NA | Associated CDC COVID Data Tracker Site: |
| COVID-19 Vaccinations in the United States, Jurisdiction | → | Vaccinations in the United States |
| COVID-19 Vaccination Trends in the United States, National and Jurisdictional | → | Vaccination Trends |
| COVID-19 Vaccination Age and Sex Trends in the United States, National and Jurisdictional | → | Vaccination Demographic Trends |
| COVID-19 Vaccinations in the United States, County level | → | Vaccinations by County |

We process all sheets using process_datadict:

library(teachCovidData)
pd = process_datadict(pa)
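The internals of process_datadict are not shown here. As a rough sketch of the general idea of reading every sheet of a workbook into a named list, the following uses only readxl and the example workbook bundled with that package (datasets.xlsx), so it runs without the CDC file. The helper name `read_all_sheets` is hypothetical and is not part of teachCovidData; the actual function may clean or restructure the sheets in ways this sketch does not.

```r
library(readxl)

# Hypothetical helper (not part of teachCovidData): read each sheet of a
# workbook into a list of tibbles, named by sheet
read_all_sheets = function(path) {
  sheets = excel_sheets(path)
  out = lapply(sheets, function(s) read_xlsx(path, sheet = s))
  names(out) = sheets
  out
}

# Example workbook shipped with readxl, used here in place of the CDC file
ex = readxl_example("datasets.xlsx")
all_sheets = read_all_sheets(ex)
names(all_sheets)
```

Naming the list elements by sheet allows access by name, for example `all_sheets[["mtcars"]]`, instead of relying on sheet position.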

An interactive view of the second sheet:

library(DT)
datatable(pd[[2]])

To survey the dictionary interactively in R, run datadict_app(pd) after performing the calculations above, or run example(datadict_app).
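For a non-interactive survey, one option is a plain text search across all columns of a dictionary table. The sketch below is an assumption-laden illustration: the helper `search_dict` and the toy table are hypothetical, and the column layout of the elements of pd may differ from the toy table used here.

```r
# Hypothetical helper: return the rows of a data frame in which any column
# matches a pattern (case-insensitive)
search_dict = function(df, pattern) {
  per_col = lapply(df, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  })
  keep = Reduce(`|`, per_col)  # TRUE where at least one column matches
  df[keep, , drop = FALSE]
}

# Toy dictionary with invented entries, standing in for one processed sheet
toy = data.frame(
  variable = c("date", "loc", "admin_daily"),
  definition = c(
    "Date the doses were administered",
    "Jurisdiction",
    "Number of doses administered on the given date"
  ),
  stringsAsFactors = FALSE
)

search_dict(toy, "dose")
```

Under the assumption that an element of pd is a data frame, the same helper could be applied to it directly, for example `search_dict(pd[[2]], "booster")`.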

Vincent J. Carey, stvjc at channing.harvard.edu

2023-01-07
