Data can be represented at various levels of explicitness. A “variable name” is a short token used in programming to refer to some quantity, outcome, or event of interest.
A data dictionary maps variable names to more explicit definitions of those quantities, outcomes, and events, and may include further information on the context of measurement.
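As a minimal sketch of the idea, a data dictionary can be held as a two-column table and queried by variable name. The variable names and definitions below are hypothetical illustrations, not entries taken from the CDC workbook, and `define` is a helper invented for this example.

```r
# Hypothetical data dictionary: one row per variable name.
dict = data.frame(
  variable = c("Date", "Location", "Administered"),
  definition = c(
    "Date the record was reported",
    "Jurisdiction abbreviation",
    "Total number of doses administered"
  ),
  stringsAsFactors = FALSE
)

# Return the definition for a variable name, or NA if it is not listed.
define = function(name, dictionary = dict) {
  hit = dictionary$definition[dictionary$variable == name]
  if (length(hit) == 0) NA_character_ else hit
}

define("Administered")
```

A real dictionary usually carries additional columns (units, data type, notes on how the measurement was taken), which fit the same tabular layout.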
We have included a version of a CDC data dictionary. It is published as a multisheet Excel workbook.
We can retrieve the names of the sheets using excel_sheets from the readxl package:
library(readxl)
pa = system.file("cdc/VACCDataDictionary_v36_12082022.xlsx", package="teachCovidData")
shn = excel_sheets(pa)
shn
## [1] "0. Notes" "1. Vaccinations_US_Jurisdiction"
## [3] "2. Vaccinations_US_Trends" "3. Vaccinations_US_Demograp"
## [5] "4. Vaccination_Age_Sex_Trends" "5. Vaccinations_US_County"
## [7] "6. Vaccination_CaseTrends_AgeGp" "7. Booster Dose Eligibility"
## [9] "8. Primary and Booster Chart" "9. Jurisdiction Abbreviations"
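With the sheet names in hand, one way to get a quick overview is to read every sheet into a named list and compare their dimensions. This is only a sketch: it assumes readxl and teachCovidData are installed, and it is not necessarily how the package itself processes the workbook. The guard skips the work when either is missing.

```r
pa = system.file("cdc/VACCDataDictionary_v36_12082022.xlsx",
                 package = "teachCovidData")

# system.file() returns "" when the package or file cannot be found.
if (requireNamespace("readxl", quietly = TRUE) && nzchar(pa)) {
  shn = readxl::excel_sheets(pa)
  # Read each sheet by name; the list is keyed by sheet name.
  sheets = lapply(setNames(shn, shn),
                  function(s) readxl::read_xlsx(pa, sheet = s))
  # One row per sheet: number of rows, number of columns.
  print(t(sapply(sheets, dim)))
}
```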
The first sheet is an overview:
p1 = read_xlsx(pa, 1)
CDC COVID-19 Vaccine Administration and Distribution data | …2 | …3 |
Recent as of 11/17/2022 @ 8:00 AM ET | NA | NA |
Historical data available for download: | NA | Associated CDC COVID Data Tracker Site: |
COVID-19 Vaccinations in the United States, Jurisdiction | → | Vaccinations in the United States |
COVID-19 Vaccination Trends in the United States, National and Jurisdictional | → | Vaccination Trends |
COVID-19 Vaccination Age and Sex Trends in the United States, National and Jurisdictional | → | Vaccination Demographic Trends |
COVID-19 Vaccinations in the United States, County level | → | Vaccinations by County |
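The column names in this display arise because read_xlsx treats the first row of a sheet as a header by default: the sheet's title becomes the first column name, and the blank cells beside it receive automatic names. A hedged alternative, assuming the same file is available, is to read the sheet with `col_names = FALSE` so that the title row is kept as data.

```r
pa = system.file("cdc/VACCDataDictionary_v36_12082022.xlsx",
                 package = "teachCovidData")

if (requireNamespace("readxl", quietly = TRUE) && nzchar(pa)) {
  # Do not promote the first row to a header; every column is auto-named.
  p1_raw = readxl::read_xlsx(pa, sheet = 1, col_names = FALSE)
  print(head(p1_raw))
}
```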
We process all sheets using process_datadict:
pd = process_datadict(pa)
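The structure of the processed result is not described here, so a reasonable first step is to inspect it. The sketch below assumes process_datadict is exported by teachCovidData and uses base R's `str` to show only the top level of whatever object is returned.

```r
if (requireNamespace("teachCovidData", quietly = TRUE)) {
  library(teachCovidData)
  pa = system.file("cdc/VACCDataDictionary_v36_12082022.xlsx",
                   package = "teachCovidData")
  pd = process_datadict(pa)
  # Show the top-level components without expanding nested elements.
  str(pd, max.level = 1)
}
```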
An interactive view of the second sheet:
To survey the dictionary in R, run datadict_app(pd) after performing the calculations above, or run example(datadict_app).