{"id":48467042,"url":"https://github.com/justi/ruby_llm-contract","last_synced_at":"2026-04-26T18:01:46.790Z","repository":{"id":346040140,"uuid":"1187564021","full_name":"justi/ruby_llm-contract","owner":"justi","description":"Handle LLM output variance for ruby_llm — retry on malformed JSON or rule violations, escalate to a smarter model, measure variance on datasets, gate CI on regressions.","archived":false,"fork":false,"pushed_at":"2026-04-21T15:12:09.000Z","size":628,"stargazers_count":25,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-21T17:18:59.079Z","etag":null,"topics":["ai","anthropic","cost-tracking","eval","json-schema","llm","model-comparison","openai","prompt-engineering","rails","regression-testing","retry","ruby","ruby-llm","structured-output","validation"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/justi.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-20T21:50:17.000Z","updated_at":"2026-04-21T15:15:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/justi/ruby_llm-contract","commit_stats":null,"previous_names":["justi/ruby_llm-contract"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/justi/ruby_llm-contract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justi%2Fruby_llm-contract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justi%2Fruby_llm-contract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justi%2Fruby_llm-contract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justi%2Fruby_llm-contract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/justi","download_url":"https://codeload.github.com/justi/ruby_llm-contract/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justi%2Fruby_llm-contract/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32305043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T09:34:17.070Z","status":"ssl_error","status_checked_at":"2026-04-26T09:34:00.993Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","anthropic","cost-tracking","eval","json-schema","llm","model-comparison","openai","prompt-engineering","rails","regression-testing","retry","ruby","ruby-llm","structured-output","validation"],"created_at":"2026-04-07T05:00:42.136Z","updated_at":"2026-04-26T18:01:46.784Z","avatar_url":"https://github.com/justi.png","language":"Ruby","readme":"# ruby_llm-contract\n\n**Contracts + Evals for [ruby_llm](https://github.com/crmne/ruby_llm).**\n\nYour eval passed. Prod broke anyway? This gem wraps `RubyLLM::Chat` with input/output contracts, business-rule validation, retry with model escalation on validation failure, pre-flight cost ceilings, and an evaluation framework — so a flaky cheap-model call escalates to a stronger model instead of shipping garbage to your user.\n\n`ruby_llm` handles the HTTP side (rate limits, timeouts, streaming, tool calls, embeddings). This gem handles what the model *returned*: schema validation, business rules, model escalation on failed validation, datasets, regression tests.\n\n## Install\n\n```ruby\ngem \"ruby_llm-contract\"\n```\n\n```ruby\nRubyLLM.configure { |c| c.openai_api_key = ENV[\"OPENAI_API_KEY\"] }\nRubyLLM::Contract.configure { |c| c.default_model = \"gpt-4.1-mini\" }\n```\n\nWorks with any `ruby_llm` provider (OpenAI, Anthropic, Gemini, etc).\n\n## Do I need this?\n\nUse this if LLM output affects production behaviour, money, user trust, or downstream code. You probably don't need it if you have one low-risk prompt, manually inspect every result, or only generate best-effort prose.\n\nAlready using structured outputs from your provider? This gem adds business-rule validation, retry with model fallback, evals, regression gating, and test stubs on top of them — the layer that stops schema-valid-but-wrong output from reaching users. See [Why contracts?](docs/guide/why.md) for the four production failure modes the gem exists for, or run `ruby examples/01_fallback_showcase.rb` to see the fallback loop in 30 seconds (no API key needed).\n\n## Example\n\nA Rails app takes article text extracted from a user-submitted URL and wants to show a summary card: a short TL;DR, 3–5 key takeaways, and a tone label. The output has to fit the UI (TL;DR under 200 chars) and the schema has to be strict enough to render without conditionals.\n\n```ruby\nclass SummarizeArticle \u003c RubyLLM::Contract::Step::Base\n  prompt \u003c\u003c~PROMPT\n    Summarize this article for a UI card. Return a short TL;DR,\n    3 to 5 key takeaways, and a tone label.\n\n    {input}\n  PROMPT\n\n  output_schema do\n    string :tldr\n    array  :takeaways, of: :string, min_items: 3, max_items: 5\n    string :tone, enum: %w[neutral positive negative analytical]\n  end\n\n  validate(\"TL;DR fits the card\")  { |o, _| o[:tldr].length \u003c= 200 }\n  validate(\"takeaways are unique\") { |o, _| o[:takeaways].uniq.size == o[:takeaways].size }\n\n  retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]\nend\n\nresult = SummarizeArticle.run(article_text)\nresult.parsed_output    # =\u003e { tldr: \"...\", takeaways: [...], tone: \"analytical\" }\nresult.trace[:model]    # =\u003e \"gpt-4.1-nano\"  (first model that passed)\nresult.trace[:cost]     # =\u003e 0.000032\n```\n\nThe model returns JSON matching the schema. If the response is malformed, the TL;DR overflows the card, or the takeaway count is off, the gem retries — moving to the next model in `models:` only when the cheaper one can't satisfy the rules. In this setup cheaper models are tried first and the expensive ones are used only when cheaper models fail.\n\nYou could write this loop yourself once. The gem gives you the loop, a trace of every attempt (model, status, cost, latency), fallback policy, evals, baselines, and CI checks as one contract object — tracked per-step so adding a new LLM feature to your app is one class, not one-off scaffolding.\n\n## Most useful next\n\nEverything below is optional — the example above is a complete step. Reach for these when one step isn't enough.\n\n- **[CI regression gates](docs/guide/getting_started.md)** — `define_eval` + `save_baseline!` + `pass_eval(...).without_regressions` blocks CI when accuracy drops on a model update or prompt tweak.\n- **[Find the cheapest viable fallback list](docs/guide/optimizing_retry_policy.md)** — `Step.recommend(\"regression\", candidates: [...], min_score: 0.95)` returns the cheapest list of models that still passes your evals. `production_mode:` measures retry-aware cost.\n- **[A/B test prompts](docs/guide/eval_first.md)** — `SummarizeArticleV2.compare_with(SummarizeArticleV1, eval: \"regression\")` reports whether the new prompt is safe to ship.\n- **[Budget caps](docs/guide/getting_started.md)** — `max_cost`, `max_input`, `max_output` refuse the request before calling the API when a heuristic estimate (~±30% accuracy) exceeds the limit.\n- **[Reasoning effort / thinking config](docs/guide/optimizing_retry_policy.md)** — `thinking effort: :low` (or alias `reasoning_effort :low`) on the Step class; mirrors `RubyLLM::Agent.thinking` and forwards through `Chat#with_thinking`.\n\nAlso supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and `retry_policy attempts: N` for niche cases (we measured this empirically — for `gpt-4.1-nano` / `gpt-5-nano` on tasks with clear correctness criteria, same-model retry rarely helps; `escalate(model_2)` is the strategy that moves the needle, see [optimizing_retry_policy.md](docs/guide/optimizing_retry_policy.md)).\n\n## Relation to `RubyLLM::Agent`\n\n`Step::Base` and `RubyLLM::Agent` (since RubyLLM 1.12) are **siblings** targeting the same niche: reusable, class-based prompts. Both call into `RubyLLM::Chat` directly — Step does not wrap Agent. Step adds the contract layer: `validate` (business invariants), `retry_policy escalate(...)` (model escalation on validation failure), `max_cost` pre-flight refusal, evaluation framework, pipeline composition. **[Full feature mapping →](docs/guide/relation_to_agent.md)**\n\n## Docs\n\n**New here?** Read in order: this README → [Why contracts?](docs/guide/why.md) → [Getting Started](docs/guide/getting_started.md).\n\n| Guide | What it does for your app |\n|-------|---------------------------|\n| [Why contracts?](docs/guide/why.md) | Recognise the four production failures the gem exists for |\n| [Relation to RubyLLM::Agent](docs/guide/relation_to_agent.md) | Sibling abstractions; what each adds; runtime call path; coexistence patterns |\n| [Getting Started](docs/guide/getting_started.md) | Walk the full feature set on one concrete step |\n| [Rails integration](docs/guide/rails_integration.md) | Directory, initializer, jobs, logging, specs, CI gate — 7 FAQs for Rails devs |\n| [Adopt in an existing Rails app](docs/guide/migration.md) | Replace raw `LlmClient.call` with a contract, Before/After |\n| [Prevent silent prompt regressions](docs/guide/eval_first.md) | Evals, baselines, CI gates that block quality drift |\n| [Control retry cost and fallback behaviour](docs/guide/optimizing_retry_policy.md) | Find the cheapest viable fallback list empirically |\n| [Write validate rules that catch real bugs](docs/guide/best_practices.md) | Patterns for cross-input checks and content-quality rules |\n| [Stub LLM calls in tests](docs/guide/testing.md) | Deterministic specs, RSpec + Minitest matchers |\n| [Chain LLM calls into a pipeline](docs/guide/pipeline.md) | Multi-step with fail-fast and per-step models |\n| [Schema DSL reference](docs/guide/output_schema.md) | Every constraint, nested objects, pattern table |\n| [Prompt DSL reference](docs/guide/prompt_ast.md) | `system` / `rule` / `section` / `example` / `user` nodes |\n\n## Roadmap\n\nLatest: **v0.8.0** — tagline + narrative repositioning around \"Contracts + Evals for RubyLLM\", `thinking` / `reasoning_effort` class macro, TokenEstimator labelled as heuristic, CostCalculator repositioned. See [CHANGELOG](CHANGELOG.md) for history.\n\n## License\n\nMIT\n","funding_links":[],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjusti%2Fruby_llm-contract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjusti%2Fruby_llm-contract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjusti%2Fruby_llm-contract/lists"}