{"id":33900990,"url":"https://github.com/ycastorium/lextract","last_synced_at":"2026-01-13T13:18:42.561Z","repository":{"id":321604287,"uuid":"1086478734","full_name":"ycastorium/lextract","owner":"ycastorium","description":"LLM-powered text extraction library for Elixir","archived":false,"fork":false,"pushed_at":"2025-11-04T17:29:11.000Z","size":124,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-01-11T16:09:47.344Z","etag":null,"topics":["elixir","llm","nlp"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ycastorium.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"custom":"ycastor.eth"}},"created_at":"2025-10-30T13:27:50.000Z","updated_at":"2025-11-11T15:59:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"8071fe38-02b2-4c7d-94ea-bbb93a5ea6bd","html_url":"https://github.com/ycastorium/lextract","commit_stats":null,"previous_names":["ygorcastor/lextract","ycastorium/lextract"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ycastorium/lextract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ycastorium%2Flextract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ycastorium%2Flextract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ycastorium%2Flextract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ycastorium%2Flextract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ycastorium","download_url":"https://codeload.github.com/ycastorium/lextract/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ycastorium%2Flextract/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28386084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-13T12:01:30.995Z","status":"ssl_error","status_checked_at":"2026-01-13T12:00:09.625Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","llm","nlp"],"created_at":"2025-12-11T23:37:09.938Z","updated_at":"2026-01-13T13:18:42.555Z","avatar_url":"https://github.com/ycastorium.png","language":"Elixir","funding_links":["ycastor.eth"],"categories":[],"sub_categories":[],"readme":"# LeXtract\n\n[![Hex](https://img.shields.io/hexpm/v/lextract?style=flat-square)](https://hex.pm/packages/lextract) [![Coverage Status](https://coveralls.io/repos/github/YgorCastor/lextract/badge.svg?branch=master)](https://coveralls.io/github/YgorCastor/lextract?branch=master)\n\nLLM-powered text extraction library for Elixir. Based on Google's [LangExtract](https://github.com/google/langextract)\n\nLeXtract enables you to extract structured information from unstructured text using Large Language Models (LLMs). It provides a simple, streaming API with support for multiple LLM providers.\n\n## Features\n\n- **Multi-Provider LLM Support** - Works with OpenAI, Gemini, Anthropic, and other providers through ReqLLM\n- **Streaming API** - Memory-efficient batch processing with lazy streams\n- **Automatic Text Chunking** - Handles long documents with configurable chunk sizes and overlap\n- **Character-Level Alignment** - Precise alignment of extractions to source text positions\n- **Schema Generation** - Automatic schema inference from examples\n- **Template-Based Configuration** - Reusable extraction templates in JSON or YAML\n- **Structured Output Mode** - Enhanced reliability with schema validation\n- **Multi-Pass Extraction** - Improved recall through multiple extraction passes\n- **Flexible Output Formats** - Support for JSON and YAML output formats\n\n## Installation\n\nAdd `lextract` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:lextract, \"~\u003e 0.1.0\"}\n  ]\nend\n```\n\n## Quick Start\n\n### Basic Entity Extraction\n\nExtract named entities from text with inline template options:\n\n```elixir\n{:ok, stream} = LeXtract.extract(\n  \"Dr. Smith prescribed aspirin 100mg to the patient.\",\n  prompt: \"Extract medical entities from the text\",\n  examples: [\n    %{\n      text: \"Patient takes ibuprofen 200mg\",\n      extractions: [\n        %{extraction_class: \"Medication\", name: \"ibuprofen\", dosage: \"200mg\"}\n      ]\n    }\n  ],\n  model: \"gpt-4o-mini\",\n  provider: :openai\n)\n\nannotated_docs = Enum.to_list(stream)\n```\n\n### Using Template Files\n\nCreate a template file (JSON or YAML) for reusable extraction configurations:\n\n```yaml\n# medication_template.yaml\ndescription: Extract medication entities with dosage and frequency\nexamples:\n  - text: \"Patient takes aspirin 100mg twice daily\"\n    extractions:\n      - extraction_class: Medication\n        name: aspirin\n        dosage: 100mg\n        frequency: twice daily\n```\n\nThen extract using the template:\n\n```elixir\n{:ok, stream} = LeXtract.extract(\n  \"Dr. Jones prescribed metformin 500mg once daily.\",\n  template_file: \"medication_template.yaml\",\n  model: \"gpt-4o-mini\",\n  provider: :openai\n)\n```\n\n### Batch Processing with Streams\n\nProcess multiple documents efficiently with streaming:\n\n```elixir\ndocuments = [\n  \"First patient document...\",\n  \"Second patient document...\",\n  \"Third patient document...\"\n]\n\n{:ok, stream} = LeXtract.extract(\n  documents,\n  prompt: \"Extract medical conditions\",\n  examples: [...],\n  model: \"gpt-4o-mini\",\n  provider: :openai,\n  batch_size: 5\n)\n\nstream\n|\u003e Stream.each(fn annotated_doc -\u003e\n  IO.puts(\"Document: #{annotated_doc.document_id}\")\n  IO.puts(\"Extractions: #{length(annotated_doc.extractions)}\")\nend)\n|\u003e Stream.run()\n```\n\n### Structured Output Mode\n\nFor better reliability and schema validation, use structured output mode:\n\n```elixir\n{:ok, stream} = LeXtract.extract(\n  \"Patient has hypertension and diabetes.\",\n  prompt: \"Extract medical conditions\",\n  examples: [\n    %{\n      text: \"Patient diagnosed with asthma\",\n      extractions: [\n        %{extraction_class: \"Condition\", name: \"asthma\", severity: \"mild\"}\n      ]\n    }\n  ],\n  model: \"gpt-4o-mini\",\n  provider: :openai,\n  use_structured_output: true\n)\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fycastorium%2Flextract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fycastorium%2Flextract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fycastorium%2Flextract/lists"}