vignettes/BiocFHIR.Rmd
FHIR stands for Fast Healthcare Interoperability Resources.
The Wikipedia article is a useful overview. The official website is fhir.org.
This R package addresses very basic tasks of parsing FHIR R4 documents in JSON format. The overall information model of FHIR documents is complex, so the package makes a number of decisions about which fields to extract and annotate, favoring those presumed to have high value. Please submit GitHub issues if important fields are not being propagated.
Install this package using
BiocManager::install("BiocFHIR")
We use jsonlite::fromJSON to import a randomly selected FHIR document from a collection simulated by the MITRE Corporation. See the associated site for details.
We’ll drill down through the hierarchy of elements collected in a FHIR document with some base R commands, after importing the JSON.
library(BiocFHIR)
library(jsonlite)
library(DT)
testf = dir(system.file("json", package="BiocFHIR"), full.names=TRUE)
tt = fromJSON(testf)
names(tt)
## [1] "resourceType" "type" "entry"
tt[1:2]
## $resourceType
## [1] "Bundle"
##
## $type
## [1] "transaction"
tte = tt$entry
class(tte)
## [1] "data.frame"
dim(tte)
## [1] 301 3
names(tte)
## [1] "fullUrl" "resource" "request"
tter = tte$resource
dim(tter)
## [1] 301 72
head(names(tter))
## [1] "resourceType" "id" "text" "extension" "identifier"
## [6] "name"
table(tter$resourceType)
##
##   AllergyIntolerance             CarePlan             CareTeam
##                    8                    3                    3
##                Claim            Condition     DiagnosticReport
##                   46                   15                    3
##            Encounter ExplanationOfBenefit         Immunization
##                   37                   37                   10
##    MedicationRequest          Observation         Organization
##                    9                  114                    3
##              Patient         Practitioner            Procedure
##                    1                    3                    9
It is by filtering the data frame tter that we acquire information that may be useful in data analysis. The data frame is sparse: many fields are not used in many records. Code in this package attempts to produce useful tables from the sparse information.
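The sparsity described above can be illustrated with a small base R sketch. The data frame toy and the helper keep_used are hypothetical stand-ins, not part of BiocFHIR, and the toy uses only atomic columns, whereas the real resource table also contains nested data frames and list columns that would need extra handling.

```r
# Toy stand-in for the resource data frame (hypothetical values, not package data).
toy <- data.frame(
  resourceType = c("Patient", "Condition", "Condition", "Observation"),
  id = c("p1", "c1", "c2", "o1"),
  birthDate = c("1970-01-01", NA, NA, NA),
  onsetDateTime = c(NA, "2001-05-04", "2010-11-30", NA),
  effectiveDateTime = c(NA, NA, NA, "2015-02-01"),
  stringsAsFactors = FALSE
)

# Fraction of missing entries per column, over all records.
na_frac <- colMeans(is.na(toy))
na_frac

# Keep the rows for one resource type, then drop columns that are entirely NA.
keep_used <- function(df, type) {
  sub <- df[df$resourceType == type, , drop = FALSE]
  used <- vapply(sub, function(col) !all(is.na(col)), logical(1))
  sub[, used, drop = FALSE]
}

cond <- keep_used(toy, "Condition")
names(cond)
```

Filtering by resource type first and pruning unused columns afterwards keeps each per-type table narrow, which is the general idea behind producing separate tables per FHIR concept.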
As a prologue to table extraction, we do some basic decomposition of tter using process_fhir_bundle.
bu1 = process_fhir_bundle(testf) # just give file path
bu1
## BiocFHIR FHIR.bundle instance.
## resource types are:
## AllergyIntolerance CarePlan ... Patient Procedure
bu1 is just a list of data.frames, but with considerable nesting of data.frames and lists within the basic data.frames corresponding to the major FHIR concepts. “Flattening” of such structures is not fully automatic.
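To make the nesting concrete, here is a minimal sketch using jsonlite alone. The JSON fragment is hypothetical and only loosely shaped like Condition resources; it is not drawn from the package data. jsonlite::flatten expands nested data.frame columns, but it does not expand list columns (JSON arrays), which is one reason flattening cannot be fully automatic.

```r
library(jsonlite)

# Hypothetical two-record fragment, loosely shaped like Condition resources.
txt <- '[
  {"id": "c1", "code": {"text": "Hypertension"}, "subject": {"reference": "urn:uuid:p1"}},
  {"id": "c2", "code": {"text": "Prediabetes"},  "subject": {"reference": "urn:uuid:p1"}}
]'

nested <- fromJSON(txt)

# 'code' and 'subject' are themselves data.frames stored as columns.
sapply(nested, class)

# flatten() turns nested data.frame columns into "parent.child" columns.
flat <- jsonlite::flatten(nested)
names(flat)
```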
We use process_Condition to extract information from the Condition component of the bundle.
cond1 = process_Condition(bu1$Condition)
datatable(cond1)
We have collected 50 documents from the Synthea resource, obtained by random draws from the 1180 records provided. A temporary folder holding them can be produced as follows:
tset = make_test_json_set()
tset[1]
## [1] "/tmp/Rtmp08RF2S/jsontest/Angel97_Swift555_c072e6ad-b03f-4eee-abe0-2dbc93bbadfe.json"
We import ten documents into a list.
myl = lapply(tset[1:10], process_fhir_bundle)
myl[1:2]
## [[1]]
## BiocFHIR FHIR.bundle instance.
## resource types are:
## AllergyIntolerance CarePlan ... Patient Procedure
##
## [[2]]
## BiocFHIR FHIR.bundle instance.
## resource types are:
## CarePlan Claim ... Patient Procedure
sapply(myl,length)
## [1] 10 9 7 9 9 9 9 9 9 10
The last command shows that documents can differ in the number of components present.
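A rough way to see which resource types vary across documents is to tabulate component names. The sketch below uses toy named lists (b1, b2, b3, all hypothetical) in place of processed bundles. It assumes that names() on a processed bundle returns its resource types, which the list structure described earlier suggests but which is not confirmed here.

```r
# Toy stand-ins for processed bundles: named lists keyed by resource type
# (hypothetical contents; real bundles hold data.frames).
b1 <- list(Patient = 1, Condition = 2, Procedure = 3)
b2 <- list(Patient = 1, Procedure = 3)
b3 <- list(Patient = 1, Condition = 2, Claim = 4)
bundles <- list(b1, b2, b3)

# Number of components per bundle, analogous to sapply(myl, length).
n_components <- sapply(bundles, length)
n_components

# Count how many bundles contain each resource type.
type_counts <- table(unlist(lapply(bundles, names)))
type_counts

# Resource types present in every bundle.
common <- Reduce(intersect, lapply(bundles, names))
common
```

Types that appear in every bundle are natural candidates for tables combined across documents, while rarer types need a check for presence before extraction.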
sessionInfo()
## R version 4.3.1 Patched (2023-07-22 r84743)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /home/stvjc/R-430-dist/lib/R/lib/libRblas.so
## LAPACK: /home/stvjc/R-430-dist/lib/R/lib/libRlapack.so; LAPACK version 3.11.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] jsonlite_1.8.7 DT_0.28 BiocFHIR_1.1.4 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.3.0 sass_0.4.7 utf8_1.2.3
## [4] generics_0.1.3 stringi_1.7.12 digest_0.6.33
## [7] magrittr_2.0.3 evaluate_0.21 bookdown_0.35
## [10] fastmap_1.1.1 rprojroot_2.0.3 graph_1.78.0
## [13] promises_1.2.1 BiocManager_1.30.22 purrr_1.0.2
## [16] fansi_1.0.4 crosstalk_1.2.0 textshaping_0.3.6
## [19] jquerylib_0.1.4 cli_3.6.1 shiny_1.7.5
## [22] rlang_1.1.1 visNetwork_2.1.2 ellipsis_0.3.2
## [25] cachem_1.0.8 yaml_2.3.7 BiocBaseUtils_1.2.0
## [28] tools_4.3.1 memoise_2.0.1 dplyr_1.1.2
## [31] httpuv_1.6.11 BiocGenerics_0.46.0 vctrs_0.6.3
## [34] R6_2.5.1 mime_0.12 stats4_4.3.1
## [37] lifecycle_1.0.3 stringr_1.5.0 htmlwidgets_1.6.2
## [40] fs_1.6.3 ragg_1.2.5 pkgconfig_2.0.3
## [43] desc_1.4.2 pkgdown_2.0.7 pillar_1.9.0
## [46] bslib_0.5.1 later_1.3.1 glue_1.6.2
## [49] Rcpp_1.0.11 systemfonts_1.0.4 xfun_0.40
## [52] tibble_3.2.1 tidyselect_1.2.0 knitr_1.43
## [55] xtable_1.8-4 htmltools_0.5.6 rmarkdown_2.24
## [58] compiler_4.3.1