An open API service indexing awesome lists of open source software.

https://github.com/567-labs/instructor

structured outputs for llms
https://github.com/567-labs/instructor

openai openai-function-calli openai-functions pydantic-v2 python validation

Last synced: 2 days ago
JSON representation

structured outputs for llms

Awesome Lists containing this project

README

          

# Instructor: Structured Outputs for LLMs

Get reliable JSON from any LLM. Built on Pydantic for validation, type safety, and IDE support.

```python
import instructor
from pydantic import BaseModel

# Define what you want
class User(BaseModel):
name: str
age: int

# Extract it from natural language
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
response_model=User,
messages=[{"role": "user", "content": "John is 25 years old"}],
)

print(user) # User(name='John', age=25)
```

**That's it.** No JSON parsing, no error handling, no retries. Just define a model and get structured data.

[![PyPI](https://img.shields.io/pypi/v/instructor?style=flat-square)](https://pypi.org/project/instructor/)
[![Downloads](https://img.shields.io/pypi/dm/instructor?style=flat-square)](https://pypi.org/project/instructor/)
[![GitHub Stars](https://img.shields.io/github/stars/instructor-ai/instructor?style=flat-square)](https://github.com/instructor-ai/instructor)
[![Discord](https://img.shields.io/discord/1192334452110659664?style=flat-square)](https://discord.gg/bD9YE9JArw)
[![Twitter](https://img.shields.io/twitter/follow/jxnlco?style=flat-square)](https://twitter.com/jxnlco)

> **Use Instructor for fast extraction, reach for PydanticAI when you need agents.** Instructor keeps schema-first flows simple and cheap. If your app needs richer agent runs, built-in observability, or shareable traces, try [PydanticAI](https://ai.pydantic.dev/). PydanticAI is the official agent runtime from the Pydantic team, adding typed tools, replayable datasets, evals, and production dashboards while using the same Pydantic models. Dive into the [PydanticAI docs](https://ai.pydantic.dev/) to see how it extends Instructor-style workflows.

## Why Instructor?

Getting structured data from LLMs is hard. You need to:

1. Write complex JSON schemas
2. Handle validation errors
3. Retry failed extractions
4. Parse unstructured responses
5. Deal with different provider APIs

**Instructor handles all of this with one simple interface:**

Without Instructor
With Instructor

```python
response = openai.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "..."}],
tools=[
{
"type": "function",
"function": {
"name": "extract_user",
"parameters": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
},
},
},
}
],
)

# Parse response
tool_call = response.choices[0].message.tool_calls[0]
user_data = json.loads(tool_call.function.arguments)

# Validate manually
if "name" not in user_data:
# Handle error...
pass
```

```python
client = instructor.from_provider("openai/gpt-4")

user = client.chat.completions.create(
response_model=User,
messages=[{"role": "user", "content": "..."}],
)

# That's it! user is validated and typed
```

## Install in seconds

```bash
pip install instructor
```

Or with your package manager:
```bash
uv add instructor
poetry add instructor
```

## Works with every major provider

Use the same code with any LLM provider:

```python
# OpenAI
client = instructor.from_provider("openai/gpt-4o")

# Anthropic
client = instructor.from_provider("anthropic/claude-3-5-sonnet")

# Google
client = instructor.from_provider("google/gemini-pro")

# Ollama (local)
client = instructor.from_provider("ollama/llama3.2")

# With API keys directly (no environment variables needed)
client = instructor.from_provider("openai/gpt-4o", api_key="sk-...")
client = instructor.from_provider("anthropic/claude-3-5-sonnet", api_key="sk-ant-...")
client = instructor.from_provider("groq/llama-3.1-8b-instant", api_key="gsk_...")

# All use the same API!
user = client.chat.completions.create(
response_model=User,
messages=[{"role": "user", "content": "..."}],
)
```

## Production-ready features

### Automatic retries

Failed validations are automatically retried with the error message:

```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
name: str
age: int

@field_validator('age')
def validate_age(cls, v):
if v < 0:
raise ValueError('Age must be positive')
return v

# Instructor automatically retries when validation fails
user = client.chat.completions.create(
response_model=User,
messages=[{"role": "user", "content": "..."}],
max_retries=3,
)
```

### Streaming support

Stream partial objects as they're generated:

```python
from instructor import Partial

for partial_user in client.chat.completions.create(
response_model=Partial[User],
messages=[{"role": "user", "content": "..."}],
stream=True,
):
print(partial_user)
# User(name=None, age=None)
# User(name="John", age=None)
# User(name="John", age=25)
```

### Nested objects

Extract complex, nested data structures:

```python
from typing import List

class Address(BaseModel):
street: str
city: str
country: str

class User(BaseModel):
name: str
age: int
addresses: List[Address]

# Instructor handles nested objects automatically
user = client.chat.completions.create(
response_model=User,
messages=[{"role": "user", "content": "..."}],
)
```

## Used in production by

Trusted by over 100,000 developers and companies building AI applications:

- **3M+ monthly downloads**
- **10K+ GitHub stars**
- **1000+ community contributors**

Companies using Instructor include teams at OpenAI, Google, Microsoft, AWS, and many YC startups.

## Get started

### Basic extraction

Extract structured data from any text:

```python
from pydantic import BaseModel
import instructor

client = instructor.from_provider("openai/gpt-4o-mini")

class Product(BaseModel):
name: str
price: float
in_stock: bool

product = client.chat.completions.create(
response_model=Product,
messages=[{"role": "user", "content": "iPhone 15 Pro, $999, available now"}],
)

print(product)
# Product(name='iPhone 15 Pro', price=999.0, in_stock=True)
```

### Multiple languages

Instructor's simple API is available in many languages:

- [Python](https://python.useinstructor.com) - The original
- [TypeScript](https://js.useinstructor.com) - Full TypeScript support
- [Ruby](https://ruby.useinstructor.com) - Ruby implementation
- [Go](https://go.useinstructor.com) - Go implementation
- [Elixir](https://hex.pm/packages/instructor) - Elixir implementation
- [Rust](https://rust.useinstructor.com) - Rust implementation

### Learn more

- [Documentation](https://python.useinstructor.com) - Comprehensive guides
- [Examples](https://python.useinstructor.com/examples/) - Copy-paste recipes
- [Blog](https://python.useinstructor.com/blog/) - Tutorials and best practices
- [Discord](https://discord.gg/bD9YE9JArw) - Get help from the community

## Why use Instructor over alternatives?

**vs Raw JSON mode**: Instructor provides automatic validation, retries, streaming, and nested object support. No manual schema writing.

**vs LangChain/LlamaIndex**: Instructor is focused on one thing - structured extraction. It's lighter, faster, and easier to debug.

**vs Custom solutions**: Battle-tested by thousands of developers. Handles edge cases you haven't thought of yet.

## Contributing

We welcome contributions! Check out our [good first issues](https://github.com/instructor-ai/instructor/labels/good%20first%20issue) to get started.

## License

MIT License - see [LICENSE](https://github.com/instructor-ai/instructor/blob/main/LICENSE) for details.

---


Built by the Instructor community. Special thanks to Jason Liu and all contributors.