Use the extract_data facility described in ellmer's documentation to obtain summary information about an HTML document, tailored to Bioconductor vignettes.

Usage

vig2data(
  url = "https://bioconductor.org/packages/release/bioc/html/Voyager.html",
  maxnchar = 30000,
  n_pdf_pages = 10
)

Arguments

url: character(1) URL of an HTML Bioconductor vignette
maxnchar: numeric(1) maximum number of characters to keep; longer text is truncated to a substring of this length
n_pdf_pages: numeric(1) maximum number of pages from which to extract text for PDF vignettes
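
The truncation controlled by maxnchar can be pictured with a minimal base R sketch. The helper name truncate_text is hypothetical and is not part of the package; the sketch assumes truncation keeps the leading characters, which the description suggests but does not state outright.

```r
# Hypothetical helper illustrating the truncation described for maxnchar;
# this is an assumption about the behavior, not the package's actual code.
truncate_text <- function(txt, maxnchar = 30000) {
  stopifnot(is.character(txt), length(txt) == 1L,
            is.numeric(maxnchar), length(maxnchar) == 1L)
  # substr() returns the input unchanged when it is shorter than the limit
  substr(txt, 1L, maxnchar)
}

long_text <- strrep("a", 50000)
nchar(truncate_text(long_text))         # 30000
nchar(truncate_text("short vignette"))  # 14
```

Lowering maxnchar reduces the amount of text sent to the model, at the cost of ignoring later parts of long vignettes.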

Value

A list with components author, topics, focused, coherence, and persuasion.
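
Because the result comes from a language model, it may be worth confirming that all documented components are present before using them. The following is a small sketch in base R; the helper missing_components and the placeholder list are hypothetical, and the placeholder values are invented purely for illustration.

```r
# Component names taken from the Value section.
expected_components <- c("author", "topics", "focused", "coherence", "persuasion")

# Hypothetical helper: returns the names of expected components that are absent.
missing_components <- function(res, expected = expected_components) {
  stopifnot(is.list(res))
  setdiff(expected, names(res))
}

# Placeholder list standing in for a vig2data() result; values are invented.
mock <- list(author = "A. Author", topics = c("topic one", "topic two"),
             focused = NA, coherence = NA, persuasion = NA)

missing_components(mock)                          # character(0)
missing_components(mock[c("author", "topics")])   # "focused" "coherence" "persuasion"
```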

Note

Based on code from https://cran.r-project.org/web/packages/ellmer/vignettes/structured-data.html as of March 15, 2025. Requires that OPENAI_API_KEY is set in the environment.
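
One way to check the key requirement before calling the function is sketched below. The helper has_openai_key is hypothetical; it tests only that the variable is non-empty, not that the key is valid.

```r
# Hypothetical helper: TRUE when OPENAI_API_KEY is set to a non-empty value.
# Sys.getenv() returns "" for an unset variable, so nzchar() covers both cases.
has_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY"))
}

if (!has_openai_key()) {
  message("OPENAI_API_KEY is not set; vig2data() requires it.")
}
```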

Examples

if (interactive()) {
  # OPENAI_API_KEY must be visible to Sys.getenv()
  tst <- vig2data()
  str(tst)
}