
Use the extract_data facility described in ellmer's documentation to obtain summary information about textual content. Originally tailored to Bioconductor vignettes, it has been generalized to handle any PDF, HTML, or plain-text document available at a URL.

Usage

vig2data(
  url = "https://bioconductor.org/packages/release/bioc/html/Voyager.html",
  maxnchar = 30000,
  n_pdf_pages = 10,
  model = "claude-sonnet-4-5",
  provider = "anthropic",
  ...
)

Arguments

url

character(1) URL of the document to summarize (HTML, PDF, or plain text); defaults to an HTML Bioconductor package page

maxnchar

numeric(1) the extracted text is truncated to at most this many characters

n_pdf_pages

numeric(1) maximum number of pages from which to extract text when the URL points to a PDF

model

character(1) model identifier for the selected provider; defaults to "claude-sonnet-4-5" (Anthropic)

provider

character(1) LLM provider; see llm_env_var for supported values and the required environment variable for each. Defaults to "anthropic".

...

passed to the underlying chat_* function via llm_chat
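
The effect of maxnchar can be pictured with base R's substr. This is a minimal sketch of the idea, not the package's implementation; truncate_text is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper (not part of the package): keep at most
# `maxnchar` characters of a single string.
truncate_text <- function(txt, maxnchar = 30000) {
  stopifnot(is.character(txt), length(txt) == 1L)
  substr(txt, 1L, maxnchar)
}

long_txt <- strrep("a", 50000)
nchar(truncate_text(long_txt))       # 30000
nchar(truncate_text("short text"))   # 10 (shorter inputs are returned unchanged)
```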

Value

a list with components author, topics, focused, coherence, and persuasion
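
A quick way to confirm that a result carries the documented components is to compare its names against the list above. The sketch below uses a placeholder list rather than real output, and missing_components is a hypothetical helper; the types of the individual components are not specified here, so the placeholder uses NA throughout.

```r
# Component names taken from the Value description.
expected <- c("author", "topics", "focused", "coherence", "persuasion")

# Hypothetical helper: report which documented components are absent.
missing_components <- function(res) setdiff(expected, names(res))

# Placeholder standing in for a real result; the NA values are not real output.
mock <- list(author = NA, topics = NA, focused = NA,
             coherence = NA, persuasion = NA)

missing_components(mock)                         # character(0)
missing_components(mock[c("author", "topics")])  # "focused" "coherence" "persuasion"
```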

Note

Based on code from https://cran.r-project.org/web/packages/ellmer/vignettes/structured-data.html, as of March 15, 2025. The API key for the chosen provider must be available in the corresponding environment variable (see llm_env_var for the mapping).
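
Before calling vig2data, it can help to check that the key is visible to the R session. The following is a minimal sketch for the default provider, assuming the variable name ANTHROPIC_API_KEY; has_api_key is a hypothetical helper, and other providers use the variable reported by llm_env_var.

```r
# Hypothetical helper: TRUE if the named environment variable is set and non-empty.
has_api_key <- function(var = "ANTHROPIC_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_api_key()) {
  message("ANTHROPIC_API_KEY is not set; set it before calling vig2data() ",
          "with the default provider.")
}
```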

Examples

if (interactive()) {
  # ANTHROPIC_API_KEY must be set for the default provider
  tst <- vig2data()
  str(tst)
}