{"id":27081073,"url":"https://dylanpieper.github.io/hellmer/","last_synced_at":"2025-04-06T02:06:50.288Z","repository":{"id":276838963,"uuid":"930424859","full_name":"dylanpieper/hellmer","owner":"dylanpieper","description":"Batch Processing for Chat Models","archived":false,"fork":false,"pushed_at":"2025-03-21T16:52:40.000Z","size":5176,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T22:51:13.707Z","etag":null,"topics":["batch","batch-processing","ellmer","llm","package","r"],"latest_commit_sha":null,"homepage":"https://dylanpieper.github.io/hellmer/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dylanpieper.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-10T16:03:08.000Z","updated_at":"2025-03-21T23:44:20.000Z","dependencies_parsed_at":"2025-03-19T14:41:35.407Z","dependency_job_id":null,"html_url":"https://github.com/dylanpieper/hellmer","commit_stats":null,"previous_names":["dylanpieper/hellmer"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fhellmer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fhellmer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fhellmer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fhellmer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/dylanpieper","download_url":"https://codeload.github.com/dylanpieper/hellmer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247423512,"owners_count":20936626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch","batch-processing","ellmer","llm","package","r"],"created_at":"2025-04-06T02:01:44.636Z","updated_at":"2025-04-06T02:06:50.269Z","avatar_url":"https://github.com/dylanpieper.png","language":"R","readme":"# hellmer \u003cimg src=\"man/figures/hellmer-hex.png\" align=\"right\" width=\"140\"/\u003e\n\n[![CRAN status](https://www.r-pkg.org/badges/version/hellmer)](https://CRAN.R-project.org/package=hellmer) [![R-CMD-check](https://github.com/dylanpieper/hellmer/actions/workflows/testthat.yml/badge.svg)](https://github.com/dylanpieper/hellmer/actions/workflows/testthat.yml)\n\nEnable sequential and parallel batch processing for [chat models](https://ellmer.tidyverse.org/reference/index.html#chatbots) supported by `ellmer`.\n\n## Features\n\nProcess multiple chat interactions with:\n\n-   [Tooling](https://ellmer.tidyverse.org/articles/tool-calling.html) and [structured data extraction](https://ellmer.tidyverse.org/articles/structured-data.html)\n-   Judgements (i.e., thinking or reasoning) for structured data refinement\n-   Progress tracking and recovery\n-   Automatic retry with backoff\n-   Sound notifications\n\n## Installation\n\nYou can install the package from CRAN with:\n\n``` r\ninstall.packages(\"hellmer\")\n```\n\n## Set Up API Keys\n\nAPI keys allow access to chat models and are stored 
as environment variables. I recommend the `usethis` package to set up API keys in your `.Renviron`, such as `OPENAI_API_KEY=your-key`.\n\n``` r\nusethis::edit_r_environ(scope = c(\"user\", \"project\"))\n```\n\n## Basic Usage\n\n### Sequential Processing\n\nSequential processing uses the current R process to call one chat at a time and save the data to disk.\n\n``` r\nlibrary(hellmer)\n\nchat \u003c- chat_sequential(chat_openai(system_prompt = \"Reply concisely, one sentence\"))\n\nprompts \u003c- list(\n  \"What is R?\",\n  \"Explain base R versus tidyverse\"\n)\n\nbatch \u003c- chat$batch(prompts)\n```\n\nAccess the batch results:\n\n``` r\nbatch$progress()\n#\u003e $total_prompts\n#\u003e [1] 2\n#\u003e \n#\u003e $completed_prompts\n#\u003e [1] 2\n#\u003e \n#\u003e $completion_percentage\n#\u003e [1] 100\n#\u003e \n#\u003e $remaining_prompts\n#\u003e [1] 0\n#\u003e \n#\u003e $state_path\n#\u003e [1] \"/var/folders/.../chat_c5383b1279ae.rds\"\n\nbatch$texts()\n#\u003e [[1]]\n#\u003e [1] \"R is a programming language and software environment primarily used for \n#\u003e statistical computing and data analysis.\"\n#\u003e \n#\u003e [[2]]\n#\u003e [1] \"Base R refers to the R language's core packages and functionalities, \n#\u003e whereas Tidyverse is a collection of R packages designed for data science \n#\u003e that provides a more intuitive and consistent syntax.\"\n\nbatch$chats()\n#\u003e [[1]]\n#\u003e \u003cChat OpenAI/gpt-4o turns=3 tokens=22/18\u003e\n#\u003e ── system [0] ───────────────────────────────────────────────────────────────\n#\u003e Reply concisely, one sentence\n#\u003e ── user [22] ────────────────────────────────────────────────────────────────\n#\u003e What is R?\n#\u003e ── assistant [18] ───────────────────────────────────────────────────────────\n#\u003e R is a programming language and software environment primarily used for\n#\u003e statistical computing and data analysis.\n\n#\u003e [[2]]\n#\u003e \u003cChat OpenAI/gpt-4o turns=3 
tokens=24/37\u003e\n#\u003e ── system [0] ───────────────────────────────────────────────────────────────\n#\u003e Reply concisely, one sentence\n#\u003e ── user [24] ────────────────────────────────────────────────────────────────\n#\u003e Explain base R versus tidyverse\n#\u003e ── assistant [37] ───────────────────────────────────────────────────────────\n#\u003e Base R refers to the R language's core packages and functionalities, whereas \n#\u003e Tidyverse is a collection of R packages designed for data science \n#\u003e that provides a more intuitive and consistent syntax.\n```\n\n### Parallel Processing\n\nParallel processing spins up multiple R processes, or parallel workers, to chat at the same time.\n\nBy default, the number of `workers` is capped at `parallel::detectCores()`, and the number of prompts processed at a time is `chunk_size = parallel::detectCores() * 5`. Each chat in a chunk is distributed across the available R processes. When a chunk is finished, the data is saved to disk.\n\n``` r\nchat \u003c- chat_future(chat_openai(system_prompt = \"Reply concisely, one sentence\"))\n```\n\nFor maximum performance, set `chunk_size` to the number of prompts, which is \\~4-5x faster. However, progress will not be saved to disk until all chats are processed.\n\n``` r\nbatch \u003c- chat$batch(\n  prompts, \n  chunk_size = length(prompts)\n)\n```\n\n## Features\n\n### Tooling\n\nRegister and use tools/function calling:\n\n``` r\nget_current_time \u003c- function(tz = \"UTC\") {\n  format(Sys.time(), tz = tz, usetz = TRUE)\n}\n\nchat$register_tool(tool(\n  get_current_time,\n  \"Gets the current time in the given time zone.\",\n  tz = type_string(\n    \"The time zone to get the current time in. 
Defaults to `\\\"UTC\\\"`.\",\n    required = FALSE\n  )\n))\n\nprompts \u003c- list(\n  \"What time is it in Chicago?\",\n  \"What time is it in New York?\"\n)\n\nbatch \u003c- chat$batch(prompts)\n\nbatch$texts()\n#\u003e [[1]]\n#\u003e [1] \"The current time in Chicago is 9:29 AM CDT.\"\n#\u003e \n#\u003e [[2]]\n#\u003e [1] \"The current time in New York is 10:29 AM EDT.\"\n```\n\n### Structured Data Extraction\n\nExtract structured data using type specifications:\n\n``` r\ntype_sentiment \u003c- type_object(\n  \"Extract sentiment scores\",\n  positive_score = type_number(\"Positive sentiment score, 0.00 to 1.00\"),\n  negative_score = type_number(\"Negative sentiment score, 0.00 to 1.00\"),\n  neutral_score = type_number(\"Neutral sentiment score, 0.00 to 1.00\")\n)\n\nprompts \u003c- list(\n  \"The R community is really supportive and welcoming.\",\n  \"R has both base functions and tidyverse functions for data manipulation.\",\n  \"R's object-oriented system is confusing, inconsistent, and painful to use.\"\n)\n\nbatch \u003c- chat$batch(prompts, type_spec = type_sentiment)\n\nbatch$texts()\n#\u003e [[1]]\n#\u003e $positive_score\n#\u003e [1] 0.95\n#\u003e \n#\u003e $negative_score\n#\u003e [1] 0.05\n#\u003e \n#\u003e $neutral_score\n#\u003e [1] 0\n#\u003e ...\n```\n\nTo have the chat model evaluate and refine its structured data extractions, add iterative thinking or reasoning turns to the chat with the `judgements` parameter (this increases token use):\n\n``` r\nbatch \u003c- chat$batch(prompts, type_spec = type_sentiment, judgements = 1)\n\nbatch$texts()\n#\u003e [[1]]\n#\u003e [[1]]$positive_score\n#\u003e [1] 0.95\n#\u003e \n#\u003e [[1]]$negative_score\n#\u003e [1] 0\n#\u003e \n#\u003e [[1]]$neutral_score\n#\u003e [1] 0.05\n#\u003e ...\n```\n\n![Console output of the LLM streaming the evaluation and refinement of the structured data extractions using `progress` = `FALSE` and `echo` = `TRUE`.](man/figures/judgements.gif)\n\n### Progress Tracking 
and Recovery\n\nBatch processing automatically saves progress to an `.rds` file on disk and allows you to resume interrupted operations:\n\n``` r\nbatch \u003c- chat$batch(prompts, state_path = \"chat_state.rds\")\nbatch$progress()\n```\n\nIf `state_path` is not defined, a temporary file will be created by default.\n\n### Automatic Retry\n\nAutomatically retry failed requests with exponential backoff, which acts as a wide guardrail against temporary API errors. `ellmer` uses `httr2` to act as a narrow guardrail against specific API errors and limits, with most chat provider functions defaulting to a single retry.\n\nBe aware that this retry is a brute-force approach: as long as all other validation passes, the retry will persist. However, it will stop if it detects an authorization or API key issue.\n\n``` r\nbatch \u003c- chat$batch(\n  prompts = prompts,   # list or vector of prompts\n  max_retries = 3,     # maximum retry attempts\n  initial_delay = 20,  # initial delay in seconds\n  max_delay = 80,      # maximum delay between retries\n  backoff_factor = 2   # multiply delay by this factor after each retry\n)\n```\n\n### Sound Notifications\n\nToggle sound notifications on batch completion, interruption, and error:\n\n``` r\nchat \u003c- chat_sequential(\n  chat_openai,\n  beep = TRUE\n)\n```\n\n### Echoing\n\nBy default, the chat `echo` is set to `FALSE` to show a progress bar. 
However, you can still configure `echo` in the `$batch` call by first setting `progress` to `FALSE`:\n\n``` r\nbatch \u003c- chat$batch(prompts, progress = FALSE, echo = \"all\")\n#\u003e \u003e What is R?\n#\u003e \u003c R is a programming language and software environment used for statistical computing,\n#\u003e \u003c data analysis, and graphical representation.\n#\u003e \u003c \n#\u003e \u003e Explain base R versus tidyverse\n#\u003e \u003c Base R refers to the functions and paradigms built into the R language, while\n#\u003e \u003c tidyverse is a collection of R packages designed for data science, emphasizing \n#\u003e \u003c a more consistent and human-readable syntax for data manipulation.\n#\u003e \u003c \n```\n\n### Methods\n\n-   `progress()`: Returns processing status\n-   `texts()`: Returns response texts in the same format as the input prompts (i.e., a list if prompts were provided as a list, or a character vector if prompts were provided as a vector). When a type specification is provided, it returns structured data instead of plain text.\n-   `chats()`: Returns a list of chat objects\n\n## Further Reading\n\n-   [Using Ellmer Chat Models](https://dylanpieper.github.io/hellmer/articles/using-chat-models.html)\n","funding_links":[],"categories":["Ellmer-Verse"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/dylanpieper.github.io%2Fhellmer%2F","html_url":"https://awesome.ecosyste.ms/projects/dylanpieper.github.io%2Fhellmer%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/dylanpieper.github.io%2Fhellmer%2F/lists"}