{"id":24958471,"url":"https://github.com/yaniv-golan/ostruct","last_synced_at":"2025-04-15T13:55:23.253Z","repository":{"id":274753556,"uuid":"921928083","full_name":"yaniv-golan/ostruct","owner":"yaniv-golan","description":"ostruct uses OpenAI Structured Output APIs to process a set of plain text files (data, reports, source code, CSV, etc), input variables, a dynamic prompt template, and a JSON schema specifying the desired output format, and will produce the result in JSON format.","archived":false,"fork":false,"pushed_at":"2025-03-15T20:25:07.000Z","size":1713,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-15T13:55:08.234Z","etag":null,"topics":["json","json-schema","openai","prompt-template","structured-output"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yaniv-golan.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-24T22:14:27.000Z","updated_at":"2025-03-16T15:47:58.000Z","dependencies_parsed_at":"2025-01-29T07:24:52.575Z","dependency_job_id":"cbe59dae-451e-46ef-9658-0118504e20d5","html_url":"https://github.com/yaniv-golan/ostruct","commit_stats":null,"previous_names":["yaniv-golan/ostruct"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaniv-golan%2Fostruct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaniv-golan%2Fostruct/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaniv-golan%2Fostruct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaniv-golan%2Fostruct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yaniv-golan","download_url":"https://codeload.github.com/yaniv-golan/ostruct/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249085482,"owners_count":21210267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","json-schema","openai","prompt-template","structured-output"],"created_at":"2025-02-03T07:11:31.877Z","updated_at":"2025-04-15T13:55:23.231Z","avatar_url":"https://github.com/yaniv-golan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![ostruct](src/assets/ostruct-header.png)\n\n\u003cdiv align=\"center\"\u003e\n\n[![PyPI version](https://badge.fury.io/py/ostruct-cli.svg)](https://badge.fury.io/py/ostruct-cli)\n[![Python Versions](https://img.shields.io/pypi/pyversions/ostruct-cli.svg)](https://pypi.org/project/ostruct-cli)\n[![Documentation Status](https://readthedocs.org/projects/ostruct/badge/?version=latest)](https://ostruct.readthedocs.io/en/latest/?badge=latest)\n[![CI](https://github.com/yaniv-golan/ostruct/actions/workflows/ci.yml/badge.svg)](https://github.com/yaniv-golan/ostruct/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**ostruct** transforms **unstructured** inputs into **structured**, usable 
**JSON** output using **OpenAI APIs** and dynamic **templates**\n\n\u003c/div\u003e\n\n# ostruct-cli\n\nostruct will process a set of plain text files (data, source code, CSV, etc.), input variables, a dynamic prompt template, and a JSON schema specifying the desired output format, and will produce the result in JSON format.\n\n\u003cdiv align=\"center\"\u003e\n\n![How ostruct works](src/assets/ostrict-hl-diagram.png)\n\n\u003c/div\u003e\n\n## Why ostruct?\n\nLLMs are powerful, but getting consistent, structured output from them can be challenging. ostruct solves this problem by providing a streamlined approach to transform unstructured data into reliable JSON structures. The motivation behind creating ostruct was to:\n\n- **Bridge the gap** between freeform LLM capabilities and structured data needs in production systems\n- **Simplify integration** of AI into existing workflows and applications that expect consistent data formats\n- **Ensure reliability** and validate output against a defined schema to avoid unexpected formats or missing data\n- **Reduce development time** by providing a standardized way to interact with OpenAI models for structured outputs\n- **Enable non-developers** to leverage AI capabilities through a simple CLI interface with templates\n\n## Real-World Use Cases\n\nostruct can be used for various scenarios, including:\n\n### Etymology Analysis\n\n```bash\nostruct run prompts/task.j2 schemas/etymology.json -f input examples/scientific.txt --model gpt-4o\n```\n\nBreak down words into their components, showing their origins, meanings, and hierarchical relationships. 
Useful for linguistics, educational tools, and understanding terminology in specialized fields.\n\n### Automated Code Review\n\n```bash\nostruct run prompts/task.j2 schemas/code_review.json -p source \"examples/security/*.py\" --model gpt-4o\n```\n\nAnalyze code for security vulnerabilities, style issues, and performance problems, producing structured reports that can be easily integrated into CI/CD pipelines or developer workflows.\n\n### Security Vulnerability Scanning\n\n```bash\nostruct run prompts/task.j2 schemas/scan_result.json -d examples/intermediate --model gpt-4o\n```\n\nScan codebases for security vulnerabilities, combining static analysis with AI-powered reasoning to identify potential issues, suggest fixes, and provide detailed explanations.\n\n### Configuration Validation \u0026 Analysis\n\n```bash\nostruct run prompts/task.j2 schemas/validation_result.json -f dev examples/basic/dev.yaml -f prod examples/basic/prod.yaml\n```\n\nValidate configuration files across environments, check for inconsistencies, and provide intelligent feedback on potential issues or improvements in infrastructure setups.\n\n## Features\n\n- Generate structured JSON output from natural language using OpenAI models and a JSON schema\n- Rich template system for defining prompts (Jinja2-based)\n- Automatic token counting and context window management\n- Streaming support for real-time output\n- Secure handling of sensitive data\n- Model registry management with support for updating to the latest OpenAI models\n- Non-intrusive registry update checks with user notifications\n\n## Requirements\n\n- Python 3.10 or higher\n\n## Installation\n\n### For Users\n\nTo install the latest stable version from PyPI:\n\n```bash\npip install ostruct-cli\n```\n\n### For Developers\n\nIf you plan to contribute to the project, see the [Development Setup](#development-setup) section below for instructions on setting up the development environment with Poetry.\n\n## Environment 
Variables\n\nostruct-cli respects the following environment variables:\n\n- `OPENAI_API_KEY`: Your OpenAI API key (required unless provided via command line)\n- `OPENAI_API_BASE`: Custom API base URL (optional)\n- `OPENAI_API_VERSION`: API version to use (optional)\n- `OPENAI_API_TYPE`: API type (e.g., \"azure\") (optional)\n- `OSTRUCT_DISABLE_UPDATE_CHECKS`: Set to \"1\", \"true\", or \"yes\" to disable automatic registry update checks\n\n## Shell Completion\n\nostruct-cli supports shell completion for Bash, Zsh, and Fish shells. To enable it:\n\n### Bash\n\nAdd this to your `~/.bashrc`:\n\n```bash\neval \"$(_OSTRUCT_COMPLETE=bash_source ostruct)\"\n```\n\n### Zsh\n\nAdd this to your `~/.zshrc`:\n\n```bash\neval \"$(_OSTRUCT_COMPLETE=zsh_source ostruct)\"\n```\n\n### Fish\n\nAdd this to your `~/.config/fish/completions/ostruct.fish`:\n\n```fish\neval (env _OSTRUCT_COMPLETE=fish_source ostruct)\n```\n\nAfter adding the appropriate line, restart your shell or source the configuration file.\nShell completion will help you with:\n\n- Command options and their arguments\n- File paths for template and schema files\n- Directory paths for `-d` and `--base-dir` options\n- And more!\n\n## Quick Start\n\n1. Set your OpenAI API key:\n\n```bash\nexport OPENAI_API_KEY=your-api-key\n```\n\n### Example 1: Using stdin (Simplest)\n\n1. Create a template file `extract_person.j2`:\n\n```jinja\nExtract information about the person from this text: {{ stdin }}\n```\n\n2. 
Create a schema file `schema.json`:\n\n```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"person\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"name\": {\n          \"type\": \"string\",\n          \"description\": \"The person's full name\"\n        },\n        \"age\": {\n          \"type\": \"integer\",\n          \"description\": \"The person's age\"\n        },\n        \"occupation\": {\n          \"type\": \"string\",\n          \"description\": \"The person's job or profession\"\n        }\n      },\n      \"required\": [\"name\", \"age\", \"occupation\"],\n      \"additionalProperties\": false\n    }\n  },\n  \"required\": [\"person\"],\n  \"additionalProperties\": false\n}\n```\n\n3. Run the CLI:\n\n```bash\n# Basic usage\necho \"John Smith is a 35-year-old software engineer\" | ostruct run extract_person.j2 schema.json\n\n# For longer text using heredoc\ncat \u003c\u003c EOF | ostruct run extract_person.j2 schema.json\nJohn Smith is a 35-year-old software engineer\nworking at Tech Corp. He has been programming\nfor over 10 years.\nEOF\n\n# With advanced options\necho \"John Smith is a 35-year-old software engineer\" | \\\n  ostruct run extract_person.j2 schema.json \\\n  --model gpt-4o \\\n  --sys-prompt \"Extract precise information about the person\" \\\n  --temperature 0.7\n```\n\nThe command will output:\n\n```json\n{\n  \"person\": {\n    \"name\": \"John Smith\",\n    \"age\": 35,\n    \"occupation\": \"software engineer\"\n  }\n}\n```\n\n### Example 2: Processing a Single File\n\n1. Create a template file `extract_from_file.j2`:\n\n```jinja\nExtract information about the person from this text: {{ text.content }}\n```\n\n2. Use the same schema file `schema.json` as above.\n\n3. 
Run the CLI:\n\n```bash\n# Basic usage\nostruct run extract_from_file.j2 schema.json -f text input.txt\n\n# With advanced options\nostruct run extract_from_file.j2 schema.json \\\n  -f text input.txt \\\n  --model gpt-4o \\\n  --max-output-tokens 1000 \\\n  --temperature 0.7\n```\n\nThe command will output:\n\n```json\n{\n  \"person\": {\n    \"name\": \"John Smith\",\n    \"age\": 35,\n    \"occupation\": \"software engineer\"\n  }\n}\n```\n\n## System Prompt Handling\n\nostruct-cli provides three ways to specify a system prompt, with a clear precedence order:\n\n1. Command-line option (`--sys-prompt` or `--sys-file`):\n\n   ```bash\n   # Direct string\n   ostruct run template.j2 schema.json --sys-prompt \"You are an expert analyst\"\n\n   # From file\n   ostruct run template.j2 schema.json --sys-file system_prompt.txt\n   ```\n\n2. Template frontmatter:\n\n   ```jinja\n   ---\n   system_prompt: You are an expert analyst\n   ---\n   Extract information from: {{ text }}\n   ```\n\n3. Default system prompt (built into the CLI)\n\n### Precedence Rules\n\nWhen multiple system prompts are provided, they are resolved in this order:\n\n1. Command-line options take highest precedence:\n   - If both `--sys-prompt` and `--sys-file` are provided, `--sys-prompt` wins\n   - Use `--ignore-task-sysprompt` to ignore template frontmatter\n\n2. Template frontmatter is used if:\n   - No command-line options are provided\n   - `--ignore-task-sysprompt` is not set\n\n3. 
Default system prompt is used only if no other prompts are provided\n\nExample combining multiple sources:\n\n```bash\n# Command-line prompt will override template frontmatter\nostruct run template.j2 schema.json --sys-prompt \"Override prompt\"\n\n# Ignore template frontmatter and use default\nostruct run template.j2 schema.json --ignore-task-sysprompt\n```\n\n## Model Registry Management\n\nostruct-cli maintains a registry of OpenAI models and their capabilities, which includes:\n\n- Context window sizes for each model\n- Maximum output token limits\n- Supported parameters and their constraints\n- Model version information\n\nTo ensure you're using the latest models and features, you can update the registry:\n\n```bash\n# Update from the official repository\nostruct update-registry\n\n# Update from a custom URL\nostruct update-registry --url https://example.com/models.yml\n\n# Force an update even if the registry is current\nostruct update-registry --force\n```\n\nThis is especially useful when:\n\n- New OpenAI models are released\n- Model capabilities or parameters change\n- You need to work with custom model configurations\n\nThe registry file is stored at `~/.openai_structured/config/models.yml` and is automatically referenced when validating model parameters and token limits.\n\nThe update command uses HTTP conditional requests (If-Modified-Since headers) to check if the remote registry has changed before downloading, ensuring efficient updates.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaniv-golan%2Fostruct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyaniv-golan%2Fostruct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaniv-golan%2Fostruct/lists"}