# LLMdantic

Structured Output Is All You Need!
LLMdantic is a powerful and efficient Python library that simplifies the integration of Large Language Models (LLMs) into your projects. Built on top of the incredible [Langchain](https://github.com/hwchase17/langchain) package and leveraging the power of [Pydantic](https://github.com/pydantic/pydantic) models, LLMdantic provides a seamless and structured approach to working with LLMs.
## Features 🚀
- 🌐 Wide range of LLM support through Langchain integrations
- 🛡️ Ensures data integrity with Pydantic models for input and output validation
- 🧩 Modular and extensible design for easy customization
- 💰 Cost tracking and optimization for OpenAI models
- 🚀 Efficient batch processing for handling multiple data points
- 🔄 Robust retry mechanism for a smooth, uninterrupted experience

## Getting Started 🌟
### Requirements
Before using LLMdantic, make sure you have set the required API keys for the LLMs you plan to use. For example, if you're using OpenAI's models, set the `OPENAI_API_KEY` environment variable:
```bash
export OPENAI_API_KEY="your-api-key"
```

If you're using other LLMs, follow the setup instructions for the respective provider in Langchain's documentation.
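The API key can also be set from Python before the client is created. A minimal sketch using only the standard library; the key value is a placeholder:

```python
import os

# Equivalent to the shell export above; replace the placeholder with a real key
# (or load it from a secrets manager rather than hard-coding it).
os.environ["OPENAI_API_KEY"] = "your-api-key"
```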
### Installation
```bash
pip install llmdantic
```

### Usage
#### 1. Define input and output schemas using Pydantic:
- Use Pydantic to define input and output models with custom validation rules.
> [!IMPORTANT]
>
> Add docstrings to your validators: they serve as prompts for the LLM, helping it understand each rule and produce better results.

```python
from pydantic import BaseModel, field_validator


class SummarizeInput(BaseModel):
    text: str


class SummarizeOutput(BaseModel):
    summary: str

    @field_validator("summary")
    @classmethod
    def summary_must_not_be_empty(cls, v: str) -> str:
        """Summary cannot be empty"""  # The docstring doubles as a prompt for the LLM.
        if not v.strip():
            raise ValueError("Summary cannot be empty")
        return v

    @field_validator("summary")
    @classmethod
    def summary_must_be_short(cls, v: str) -> str:
        """Summary must be less than 100 words"""  # The docstring doubles as a prompt for the LLM.
        if len(v.split()) > 100:
            raise ValueError("Summary must be less than 100 words")
        return v
```
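Since these are plain Pydantic models, you can check that the rules fire without calling an LLM at all. A quick sanity check using only the models defined above:

```python
from pydantic import ValidationError

# An empty summary violates summary_must_not_be_empty.
try:
    SummarizeOutput(summary="   ")
except ValidationError as e:
    print(e)

# A short, non-empty summary passes both validators.
print(SummarizeOutput(summary="A concise overview of the article.").summary)
```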
#### 2. Create an LLMdantic client:
- Provide input and output models, objective, and configuration.
> [!TIP]
>
> The `objective` is a prompt that will be used to generate the actual prompt sent to the LLM. It should be a high-level description of the task you want the LLM to perform.
>
> The `inp_schema` and `out_schema` are the input and output models you defined in the previous step.
>
> The `retries` parameter is the number of times LLMdantic will retry the request in case of failure.

```python
from llmdantic import LLMdantic, LLMdanticConfig
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text",
    inp_schema=SummarizeInput,
    out_schema=SummarizeOutput,
    retries=3,
)

llmdantic = LLMdantic(llm=llm, config=config)
```
Here's the prompt template generated from the input and output models:
```text
Objective: Summarize the text

Input 'SummarizeInput':
{input}

Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:

SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS

{format_instructions}
```
#### 3. Generate output using LLMdantic:
> [!TIP]
>
> The `invoke` method is used for single requests, while the `batch` method is used for batch processing.
>
> The `invoke` method returns an instance of `LLMdanticResult`, which contains the generated text, the parsed output, and other useful information such as the cost and usage stats (e.g. the number of input and output tokens). Check out the [LLMdanticResult](#LLMdanticResult) model for more details.
```python
from typing import Optional

from llmdantic import LLMdanticResult

data = SummarizeInput(text="A long article about natural language processing...")
result: LLMdanticResult = llmdantic.invoke(data)

output: Optional[SummarizeOutput] = result.output
if output:
    print(output.summary)
```

Here's the actual prompt sent to the LLM based on the input data:
```text
Objective: Summarize the text

Input 'SummarizeInput':
{'text': 'A long article about natural language processing...'}

Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:

SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
{"properties": {"summary": {"title": "Summary", "type": "string"}}, "required": ["summary"]}
```
- For batch processing, pass a list of input data.
> [!IMPORTANT]
>
> The `batch` method returns a list of `LLMdanticResult` instances, each containing the generated text, the parsed output, and other useful information such as the cost and usage stats (e.g. the number of input and output tokens). Check out the [LLMdanticResult](#LLMdanticResult) model for more details.
>
> The `concurrency` parameter is the number of concurrent requests to be made. Please check the usage limits of the LLM provider before setting this value.
```python
from typing import List

data: List[SummarizeInput] = [
    SummarizeInput(text="A long article about natural language processing..."),
    SummarizeInput(text="A long article about computer vision...")
]

results: List[LLMdanticResult] = llmdantic.batch(data, concurrency=2)

for result in results:
    if result.output:
        print(result.output.summary)
```
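Because each item in a batch yields its own `LLMdanticResult`, inputs that failed validation after all retries (their `output` is `None`) are easy to collect and resubmit. A minimal sketch reusing `data` and `results` from above; the resubmission step is illustrative, not a library feature:

```python
# Pair each input with its result and keep the ones that produced no valid output.
failed: List[SummarizeInput] = [
    inp for inp, res in zip(data, results) if res.output is None
]
if failed:
    retry_results: List[LLMdanticResult] = llmdantic.batch(failed, concurrency=2)
```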
#### 4. Monitor usage and costs:
> [!IMPORTANT]
>
> The cost tracking feature is currently available for OpenAI models only.
>
> The `usage` attribute returns an instance of `LLMdanticUsage`, which contains the number of input and output tokens, successful requests, cost, and successful outputs. Check out the [LLMdanticUsage](#LLMdanticUsage) model for more details.
>
> Please note that usage is tracked for the entire lifetime of the `LLMdantic` instance.

- Use the `cost` attribute of the `LLMdanticResult` to track the cost of a request (currently available for OpenAI models).
- Use the `usage` attribute of the `LLMdantic` to track overall usage stats.
```python
from llmdantic import LLMdanticResult

data: SummarizeInput = SummarizeInput(text="A long article about natural language processing...")
result: LLMdanticResult = llmdantic.invoke(data)

if result.output:
    print(result.output.summary)

# Track the cost of the request (OpenAI models only)
print(f"Cost: {result.cost}")

# Track the overall usage stats
print(f"Usage: {llmdantic.usage}")
```

```bash
Cost: 0.0003665
Overall Usage: LLMdanticUsage(
inp_tokens=219,
out_tokens=19,
total_tokens=238,
successful_requests=1,
cost=0.000367,
successful_outputs=1
)
```
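Since usage accumulates over the lifetime of the `LLMdantic` instance, simple aggregates can be derived from it. A small illustrative calculation, assuming the `LLMdanticUsage` fields shown in the output above:

```python
usage = llmdantic.usage

# Average cost and token count per successful request (guard against zero requests).
if usage.successful_requests:
    print(f"Avg cost per request: {usage.cost / usage.successful_requests:.6f}")
    print(f"Avg tokens per request: {usage.total_tokens / usage.successful_requests:.1f}")
```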
## Advanced Usage 🛠

`LLMdantic` is built on top of the langchain package, which provides a modular and extensible framework for working with LLMs. You can easily switch between different LLMs and customize your experience.
### Switching LLMs
> [!IMPORTANT]
>
> Make sure to set the required API keys for the new LLM you plan to use.
>
> The `llm` parameter of the `LLMdantic` class should be an instance of `BaseLanguageModel` from the langchain package.

> [!TIP]
>
> You can use the `langchain_community` package to access a wide range of LLMs from different providers.
>
> You may need to provide `model_name`, `api_key`, and other parameters depending on the LLM you want to use. Check out the documentation of the respective LLM provider for more details.
```python
from llmdantic import LLMdantic, LLMdanticConfig
from langchain_community.llms import Ollama
from langchain_core.language_models import BaseLanguageModel

llm: BaseLanguageModel = Ollama()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text",
    inp_schema=SummarizeInput,
    out_schema=SummarizeOutput,
    retries=3,
)

llmdantic = LLMdantic(
    llm=llm,
    config=config
)
```
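Usage is then identical to the OpenAI example; for instance, reusing the `SummarizeInput` model from above:

```python
# The same invoke/batch API works regardless of which Langchain model is plugged in.
result = llmdantic.invoke(SummarizeInput(text="A long article about transformers..."))
if result.output:
    print(result.output.summary)
```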
## Contributing 🤝
Contributions are welcome! Whether you're fixing bugs, adding new features, or improving documentation, your help makes **LLMdantic** better for everyone. Feel free to open an issue or submit a pull request.

## License 📄
**LLMdantic** is released under the [MIT License](LICENSE). Feel free to use it, contribute, and spread the word!