Uses the extract_data facility described in ellmer's documentation to obtain summary information about textual content. Originally tailored to Bioconductor vignettes, it has been generalized to handle any PDF, HTML, or text document available at a URL.
Source: R/vig2data.R
vig2data.Rd
Usage
vig2data(
  url = "https://bioconductor.org/packages/release/bioc/html/Voyager.html",
  maxnchar = 30000,
  n_pdf_pages = 10,
  model = "claude-sonnet-4-5",
  provider = "anthropic",
  ...
)
Arguments
- url
character(1) URL of the document to summarize (PDF, HTML, or text), for example a Bioconductor vignette
- maxnchar
numeric(1) maximum number of characters to keep; the text is truncated to a substring of this length
- n_pdf_pages
numeric(1) maximum number of pages from which to extract text when the URL points to a PDF
- model
character(1) model identifier for the selected provider; defaults to "claude-sonnet-4-5" (Anthropic)
- provider
character(1) LLM provider; see llm_env_var for supported values and the required environment variable for each. Defaults to "anthropic".
- ...
passed to the underlying chat_* function via llm_chat
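The truncation controlled by maxnchar can be pictured with base R's substr(). This is a minimal sketch of the idea, assuming a single character string as input; truncate_text is a hypothetical helper and is not claimed to be how vig2data does it internally.

```r
# Hypothetical helper illustrating the maxnchar argument; not part of the package.
truncate_text <- function(txt, maxnchar = 30000) {
  stopifnot(is.character(txt), length(txt) == 1L,
            is.numeric(maxnchar), length(maxnchar) == 1L)
  # substr() returns the input unchanged when it is shorter than maxnchar
  substr(txt, 1L, maxnchar)
}

doc <- paste(rep("abcde", 10), collapse = "")  # a 50-character string
short <- truncate_text(doc, maxnchar = 12)
nchar(short)
```

Truncating before the text is sent to the model keeps the request size bounded, at the cost of ignoring anything past the cutoff.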
Note
Based on code from https://cran.r-project.org/web/packages/ellmer/vignettes/structured-data.html (March 15, 2025). The API key for the chosen provider must be available in the corresponding environment variable (see llm_env_var for the mapping).
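Before calling vig2data, it can help to confirm that the key is actually visible to the R session. The sketch below uses only base R; has_api_key is a hypothetical helper, and the variable name ANTHROPIC_API_KEY is the one that applies to the default provider.

```r
# Hypothetical helper: TRUE when the named environment variable is set and non-empty.
has_api_key <- function(var = "ANTHROPIC_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_api_key()) {
  message("ANTHROPIC_API_KEY is not set; the default provider needs it.")
}
```

For another provider, pass the variable name that llm_env_var reports for it.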
Examples
if (interactive()) {
  # ANTHROPIC_API_KEY must be set for the default provider
  tst = vig2data()
  str(tst)
}