https://github.com/567-labs/instructor
structured outputs for llms
- Host: GitHub
- URL: https://github.com/567-labs/instructor
- Owner: 567-labs
- License: MIT
- Created: 2023-06-14T10:42:23.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-12T22:06:57.000Z (8 months ago)
- Last Synced: 2025-05-12T22:44:05.188Z (8 months ago)
- Topics: openai, openai-function-calli, openai-functions, pydantic-v2, python, validation
- Language: Python
- Homepage: https://python.useinstructor.com/
- Size: 128 MB
- Stars: 10,405
- Watchers: 57
- Forks: 781
- Open Issues: 23
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-ml-python-packages - Instructor
- awesome-ChatGPT-repositories - instructor - structured outputs for llms (NLP)
- awesome-ai-agents - 567-labs/instructor - Instructor is a popular Python library that simplifies working with structured outputs from large language models by providing validation, retries, and streaming support across multiple LLM providers. (Agent Integration & Deployment Tools / LLM Framework Tools)
README
# Instructor: Structured Outputs for LLMs
Get reliable JSON from any LLM. Built on Pydantic for validation, type safety, and IDE support.
```python
import instructor
from pydantic import BaseModel

# Define what you want
class User(BaseModel):
    name: str
    age: int

# Extract it from natural language
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)

print(user)  # User(name='John', age=25)
```
**That's it.** No JSON parsing, no error handling, no retries. Just define a model and get structured data.
[PyPI](https://pypi.org/project/instructor/) · [GitHub](https://github.com/instructor-ai/instructor) · [Discord](https://discord.gg/bD9YE9JArw) · [Twitter](https://twitter.com/jxnlco)
> **Use Instructor for fast extraction; reach for PydanticAI when you need agents.** Instructor keeps schema-first flows simple and cheap. If your app needs richer agent runs, built-in observability, or shareable traces, try [PydanticAI](https://ai.pydantic.dev/). PydanticAI is the official agent runtime from the Pydantic team, adding typed tools, replayable datasets, evals, and production dashboards while using the same Pydantic models. Dive into the [PydanticAI docs](https://ai.pydantic.dev/) to see how it extends Instructor-style workflows.
## Why Instructor?
Getting structured data from LLMs is hard. You need to:
1. Write complex JSON schemas
2. Handle validation errors
3. Retry failed extractions
4. Parse unstructured responses
5. Deal with different provider APIs
**Instructor handles all of this with one simple interface:**
**Without Instructor:**
```python
import json
import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "..."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "extract_user",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                    },
                },
            },
        }
    ],
)

# Parse response
tool_call = response.choices[0].message.tool_calls[0]
user_data = json.loads(tool_call.function.arguments)

# Validate manually
if "name" not in user_data:
    # Handle error...
    pass
```
**With Instructor:**

```python
client = instructor.from_provider("openai/gpt-4")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)

# That's it! user is validated and typed
```
## Install in seconds
```bash
pip install instructor
```
Or with your package manager:
```bash
uv add instructor
poetry add instructor
```
## Works with every major provider
Use the same code with any LLM provider:
```python
# OpenAI
client = instructor.from_provider("openai/gpt-4o")

# Anthropic
client = instructor.from_provider("anthropic/claude-3-5-sonnet")

# Google
client = instructor.from_provider("google/gemini-pro")

# Ollama (local)
client = instructor.from_provider("ollama/llama3.2")

# With API keys directly (no environment variables needed)
client = instructor.from_provider("openai/gpt-4o", api_key="sk-...")
client = instructor.from_provider("anthropic/claude-3-5-sonnet", api_key="sk-ant-...")
client = instructor.from_provider("groq/llama-3.1-8b-instant", api_key="gsk_...")

# All use the same API!
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
```
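If you need non-blocking calls, recent versions of instructor also document an async flavor of `from_provider`. Here is a minimal sketch, assuming the `async_client=True` flag hands back a client whose `create` call is awaitable:

```python
import asyncio

import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Assumption: from_provider(..., async_client=True) returns a client whose
# create() is awaitable, per the async examples in the instructor docs
client = instructor.from_provider("openai/gpt-4o-mini", async_client=True)

async def main() -> None:
    user = await client.chat.completions.create(
        response_model=User,
        messages=[{"role": "user", "content": "John is 25 years old"}],
    )
    print(user)  # User(name='John', age=25)

asyncio.run(main())
```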
## Production-ready features
### Automatic retries
Failed validations are automatically retried with the error message:
```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator('age')
    def validate_age(cls, v):
        if v < 0:
            raise ValueError('Age must be positive')
        return v

# Instructor automatically retries when validation fails
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    max_retries=3,
)
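For finer-grained control than a plain integer, the retry documentation describes passing a tenacity controller as `max_retries`. A sketch assuming that interface:

```python
from tenacity import Retrying, stop_after_attempt, wait_fixed

# Assumption: max_retries also accepts a tenacity Retrying object,
# as shown in the instructor retry documentation
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    max_retries=Retrying(
        stop=stop_after_attempt(3),  # give up after three attempts
        wait=wait_fixed(1),          # wait one second between attempts
    ),
)
```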
### Streaming support
Stream partial objects as they're generated:
```python
from instructor import Partial

for partial_user in client.chat.completions.create(
    response_model=Partial[User],
    messages=[{"role": "user", "content": "..."}],
    stream=True,
):
    print(partial_user)
    # User(name=None, age=None)
    # User(name="John", age=None)
    # User(name="John", age=25)
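Streaming also pairs well with multi-entity extraction: the client exposes a `create_iterable` helper that yields one fully validated object per entity found in the text. A sketch, assuming that method as documented:

```python
# Assumption: the patched client provides create_iterable, which yields
# a validated User for each entity found in the input text
users = client.chat.completions.create_iterable(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25, Mary is 31, Bob is 47"}],
)

for user in users:
    print(user)
# User(name='John', age=25)
# User(name='Mary', age=31)
# User(name='Bob', age=47)
```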
### Nested objects
Extract complex, nested data structures:
```python
from typing import List

class Address(BaseModel):
    street: str
    city: str
    country: str

class User(BaseModel):
    name: str
    age: int
    addresses: List[Address]

# Instructor handles nested objects automatically
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
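For example, a short bio yields a fully typed object tree (the printed output below is illustrative):

```python
user = client.chat.completions.create(
    response_model=User,
    messages=[
        {"role": "user", "content": "Ada is 36. She lives at 12 Main St, London, UK."}
    ],
)

print(user)
# Illustrative output, with every Address validated before you see it:
# User(name='Ada', age=36, addresses=[Address(street='12 Main St', city='London', country='UK')])
```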
## Used in production by
Trusted by over 100,000 developers and companies building AI applications:
- **3M+ monthly downloads**
- **10K+ GitHub stars**
- **1000+ community contributors**
Companies using Instructor include teams at OpenAI, Google, Microsoft, AWS, and many YC startups.
## Get started
### Basic extraction
Extract structured data from any text:
```python
from pydantic import BaseModel
import instructor

client = instructor.from_provider("openai/gpt-4o-mini")

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

product = client.chat.completions.create(
    response_model=Product,
    messages=[{"role": "user", "content": "iPhone 15 Pro, $999, available now"}],
)

print(product)
# Product(name='iPhone 15 Pro', price=999.0, in_stock=True)
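The same pattern scales to several records at once: wrap the item model in a plain Pydantic container and extract the whole list in one call (the receipt text and printed output below are made up for illustration):

```python
from typing import List

class Receipt(BaseModel):
    products: List[Product]
    total: float

receipt = client.chat.completions.create(
    response_model=Receipt,
    messages=[
        {"role": "user", "content": "Order: iPhone 15 Pro $999, AirPods Pro $249. Total: $1248."}
    ],
)

print(receipt.total)          # 1248.0 (illustrative)
print(len(receipt.products))  # 2
```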
### Multiple languages
Instructor's simple API is available in many languages:
- [Python](https://python.useinstructor.com) - The original
- [TypeScript](https://js.useinstructor.com) - Full TypeScript support
- [Ruby](https://ruby.useinstructor.com) - Ruby implementation
- [Go](https://go.useinstructor.com) - Go implementation
- [Elixir](https://hex.pm/packages/instructor) - Elixir implementation
- [Rust](https://rust.useinstructor.com) - Rust implementation
### Learn more
- [Documentation](https://python.useinstructor.com) - Comprehensive guides
- [Examples](https://python.useinstructor.com/examples/) - Copy-paste recipes
- [Blog](https://python.useinstructor.com/blog/) - Tutorials and best practices
- [Discord](https://discord.gg/bD9YE9JArw) - Get help from the community
## Why use Instructor over alternatives?
**vs Raw JSON mode**: Instructor provides automatic validation, retries, streaming, and nested object support. No manual schema writing.
**vs LangChain/LlamaIndex**: Instructor is focused on one thing - structured extraction. It's lighter, faster, and easier to debug.
**vs Custom solutions**: Battle-tested by thousands of developers. Handles edge cases you haven't thought of yet.
## Contributing
We welcome contributions! Check out our [good first issues](https://github.com/instructor-ai/instructor/labels/good%20first%20issue) to get started.
## License
MIT License - see [LICENSE](https://github.com/instructor-ai/instructor/blob/main/LICENSE) for details.
---
Built by the Instructor community. Special thanks to Jason Liu and all contributors.