https://github.com/dicklesworthstone/guide_to_openai_response_api_and_agents_sdk

**The Agentic Vision and the New Responses API Foundation**

**1. Introduction: OpenAI's Shift Towards Agentic AI**

OpenAI's March 11, 2025 announcement marks a pivotal moment in their platform strategy, explicitly focusing on empowering developers and enterprises to build **agents**. An agent, in OpenAI's definition, is more than just a chatbot; it's a system designed to *independently accomplish tasks* on behalf of a user. This shift acknowledges that while powerful models (like GPT-4o, o1) provide the core intelligence (reasoning, multimodality), translating this potential into reliable, production-ready applications that perform complex, multi-step tasks has been challenging.

* **The Problem:** Developers previously faced hurdles like:
* **Complex Prompt Engineering:** Iterating extensively on prompts to elicit desired multi-step behaviors.
* **Custom Orchestration:** Building significant custom logic to manage conversation state, tool interactions, and workflow steps.
* **Limited Visibility:** Difficulty in debugging and understanding the agent's internal reasoning process when things went wrong.
* **API Fragmentation:** Potentially needing to juggle the Chat Completions API, the Assistants API, and external tools/vendors.

* **The Solution:** OpenAI introduced a suite of integrated tools designed to streamline agent development:
* **Responses API:** A new foundational API combining the best of Chat Completions and Assistants.
* **Built-in Tools:** Powerful capabilities like Web Search, File Search, and Computer Use, directly integrated.
* **Agents SDK:** An open-source library for orchestrating agent workflows (single and multi-agent).
* **Observability Tools:** Integrated tracing to inspect and debug agent execution.

This suite aims to provide the necessary "building blocks" for creating useful and reliable agents, reducing the complexity previously associated with building such systems.

**2. The Responses API: A Unified Core for Agent Interactions**

At the heart of this new agent platform lies the **Responses API** (`/v1/responses`). It's positioned as OpenAI's "most advanced interface for generating model responses" and the recommended starting point for new agentic applications.

* **Design Philosophy:** The Responses API aims to merge the **simplicity** of the stateless Chat Completions API with the powerful **tool-use capabilities** previously primarily associated with the stateful Assistants API. The goal is to provide a flexible primitive that can handle increasingly complex tasks, potentially involving multiple tools and multiple model turns, within a *single API call* from the developer's perspective.

* **Key Features & Improvements (Synthesized from Docs):**
* **Superset of Chat Completions:** It offers everything Chat Completions does (text/image input, text/JSON output) *plus* built-in tool capabilities and more advanced features. Performance is stated to be equivalent.
* **Integrated Built-in Tools:** Natively supports Web Search, File Search, and Computer Use without requiring separate API integrations or external vendor management. Function calling (custom tools) is also supported.
* **Unified Item-Based Design:** A more consistent structure for inputs and outputs, simplifying how developers handle different types of content (text, images, tool calls, etc.).
* **Simpler Polymorphism:** Easier handling of different data types within the API structure.
* **Intuitive Streaming:** Improved server-sent events (SSE) for streaming responses, making it easier to build low-latency user experiences. (Details on specific events are in the API reference section).
* **SDK Helpers:** Convenience functions in the official SDKs (like `response.output_text`) simplify accessing common data like the model's final text output.
* **State Management via `previous_response_id`:** Enables multi-turn conversations by linking a new request to the previous response, allowing the API to manage the conversation history implicitly. This is simpler than manually managing message lists, as is typical with Chat Completions, while remaining lighter-weight than the Thread objects of the Assistants API.
* **Data Storage for Evaluation:** Facilitates storing interaction data on OpenAI (optional, default `store=true`) to leverage platform features like tracing and evaluations, while adhering to the policy of not training on API data by default.

* **Availability and Cost:** The Responses API is available to all developers immediately upon release (March 11, 2025). There is **no separate charge** for using the Responses API itself; costs are based on standard token usage (input/output/reasoning) and any built-in tool usage (e.g., per query for search tools) as per the standard pricing page.

**3. Basic Usage Example (Responses API)**

Creating a simple text response using the Responses API is very similar to Chat Completions, highlighting its design goal of simplicity.

* **Python Example:**

```python
from openai import OpenAI

client = OpenAI()  # Assumes OPENAI_API_KEY is set as an environment variable

try:
    response = client.responses.create(
        model="gpt-4.1",  # Or other suitable models like gpt-4o, o3-mini, etc.
        input="Write a short, encouraging message about learning to code."
        # store=False  # Optional: Set to False to disable 30-day storage
    )

    print(response.output_text)  # Access the combined text output easily

    # For more details, you can inspect the full response object:
    # print(response.model_dump_json(indent=2))

except Exception as e:
    print(f"An error occurred: {e}")

# Example Output:
# Keep going! Every line of code you write, every bug you fix, makes you a stronger developer. Embrace the challenge!
```

* **JavaScript Example:**

```javascript
import OpenAI from "openai";

const client = new OpenAI(); // Assumes OPENAI_API_KEY is set

async function generateResponse() {
  try {
    const response = await client.responses.create({
      model: "gpt-4.1", // Or other suitable models
      input: "Write a short, encouraging message about learning to code.",
      // store: false // Optional: Disable storage
    });

    console.log(response.output_text); // Easy access to text

    // Inspect the full response if needed:
    // console.log(JSON.stringify(response, null, 2));
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

generateResponse();

// Example Output (similar to Python):
// Keep going! Every line of code you write, every bug you fix, makes you a stronger developer. Embrace the challenge!
```

* **cURL Example:**

```bash
curl "https://api.openai.com/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": "Write a short, encouraging message about learning to code."
}'

# Example Response JSON (structure):
# {
# "id": "resp_...",
# "object": "response",
# "created_at": 17414...,
# "status": "completed",
# "model": "gpt-4.1-...",
# "output": [
# {
# "type": "message",
# "id": "msg_...",
# "status": "completed",
# "role": "assistant",
# "content": [
# {
# "type": "output_text",
# "text": "Keep going! Every line of code you write...",
# "annotations": []
# }
# ]
# }
# ],
# "usage": { ... },
# ... other fields ...
# }

```

**Analysis:** These examples show the basic interaction pattern. The `input` field accepts the prompt (or more complex structures for multi-modal input or conversation history). The `model` specifies the engine. The `response.output_text` helper in the SDKs conveniently aggregates the text from the `output` array, simplifying common use cases. The raw response object contains richer details, including usage stats and the structured output array.
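
Multi-turn conversations follow the same pattern: rather than resending the whole history, a follow-up request can reference the prior response via `previous_response_id` (covered in more detail in the API reference later). A minimal Python sketch:

```python
from openai import OpenAI

client = OpenAI()

# Turn 1: a fresh request
first = client.responses.create(
    model="gpt-4.1",
    input="Give me three project ideas for learning Python.",
)
print(first.output_text)

# Turn 2: link to the previous response instead of resending the history.
# This relies on the default store=true so the prior turn is retrievable.
second = client.responses.create(
    model="gpt-4.1",
    previous_response_id=first.id,
    input="Expand on the second idea.",
)
print(second.output_text)
```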

**4. Implications for Existing APIs: A Strategic Shift**

OpenAI provides clear guidance on how the Responses API fits alongside existing APIs:

* **Chat Completions API (`/v1/chat/completions`):**
* **Continued Support:** Remains fully supported and is the most widely adopted API. Developers can confidently continue using it.
* **Use Case:** Ideal for applications that *do not* require OpenAI's built-in tools (Web Search, File Search, Computer Use) or complex, multi-turn logic managed implicitly by the API. Simple chatbot interactions, text generation, classification, etc., are well-suited.
* **Model Updates:** Will continue to receive new models, *provided* those models' capabilities don't depend on features exclusive to the Responses API (such as built-in tools) or on multiple internal model calls managed by that API.
* **Recommendation:** While supported, OpenAI recommends starting *new integrations* with the **Responses API**. Why? Because Responses is a *superset* offering the same performance *plus* access to the growing ecosystem of built-in tools and future agent-focused capabilities. It provides a more future-proof foundation.

* **Assistants API (`/v1/assistants`, `/v1/threads`, etc.):**
* **Background:** Launched as a Beta, the Assistants API introduced concepts like persistent Threads, Assistant objects, and built-in tools like Code Interpreter and Retrieval (the precursor to File Search).
* **Developer Feedback:** Feedback from the beta informed the design of the Responses API, aiming for improved flexibility, speed, and ease of use.
* **Feature Parity Goal:** OpenAI is actively working to bring *all* key Assistants API features into the Responses API. This includes:
* Support for Assistant-like and Thread-like objects (or equivalent concepts) for managing stateful interactions more explicitly if desired.
* Integration of the `code_interpreter` tool.
* **Planned Deprecation:** Once feature parity is achieved, OpenAI plans to **formally announce the deprecation of the Assistants API**.
* **Sunset Target:** The *target* date for shutting down the Assistants API is **mid-2026**. This is a long runway, giving developers ample time to migrate.
* **Migration Path:** OpenAI commits to providing a clear migration guide, allowing developers to preserve their data (Assistants, Threads, Files) and migrate applications to the Responses API framework.
* **Interim Support:** Until the formal deprecation announcement, the Assistants API **will continue to receive new models**. Developers currently using it are not forced to migrate immediately.
* **Clear Future:** The **Responses API represents the future direction** for building agents on the OpenAI platform.

OpenAI is doubling down on agents. The core technical piece enabling this is the **Responses API**. It's not just *another* API endpoint; it's presented as the *successor* architecture, designed to be the primary interface for leveraging OpenAI's increasingly sophisticated models and integrated tools for complex tasks.

It cleverly positions itself as offering the developer experience simplicity of Chat Completions while integrating the advanced capabilities (like tool use and implicit state management via `previous_response_id`) that made the Assistants API powerful, but perhaps complex.

The deprecation plan for the Assistants API is significant. While providing a long timeframe (mid-2026 target), it clearly signals developers to focus new development on the Responses API and plan for eventual migration. The commitment to feature parity and a migration path is crucial for existing Assistants API users.

For Chat Completions users, the message is one of continued support but gentle encouragement towards the Responses API for applications that might benefit from built-in tools or more complex interactions, positioning it as the more capable and forward-looking option.

The introduction of SDK helpers (`output_text`) and improved streaming indicates a focus on developer experience, addressing pain points likely identified during the Assistants API beta.

---

Alright, let's dive into the capabilities OpenAI is providing to allow agents built on the Responses API to interact with the external world and user-specific data.

**Empowering Agents: Integrated Tools within the Responses API**

A core limitation of traditional Large Language Models (LLMs) is their confinement to the data they were trained on and the immediate context of the conversation. To build truly useful agents capable of accomplishing real-world tasks, they need mechanisms to access *current* information, interact with *specific user data*, and even *operate* digital interfaces. OpenAI addresses this directly by integrating powerful **built-in tools** into the Responses API.

This integration is a significant philosophical and practical step. Instead of requiring developers to find, vet, integrate, and manage separate services for web search, document retrieval (Retrieval-Augmented Generation - RAG), or UI automation, OpenAI is providing these as first-party capabilities callable directly within a single `client.responses.create` call. This dramatically lowers the barrier to entry for building sophisticated agents and ensures tighter integration with OpenAI's models and platform features like tracing and evaluation.

The documentation highlights three key built-in tools initially launching with the Responses API: Web Search, File Search, and Computer Use. Let's examine each in detail.

**1. Web Search: Connecting Agents to the Live Internet**

The inability of LLMs to access real-time information has been a major constraint. The built-in Web Search tool directly tackles this.

* **Purpose:** To provide agents with fast, up-to-date answers from the web, complete with citations for verification and further exploration. This is crucial for any task requiring current events, real-time data, or information beyond the model's training cut-off.

* **Integration (Responses API):** Enabling web search is straightforward via the `tools` parameter in the `responses.create` call.

* **Python:**
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",  # Also supports gpt-4o-mini
    tools=[{"type": "web_search_preview"}],
    input="What significant scientific discovery was announced this week?"
)

print(response.output_text)
# You can also inspect response.output for structured tool call/result info if needed
```

* **JavaScript:**
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4o", // Also supports gpt-4o-mini
  tools: [{ type: "web_search_preview" }],
  input: "What significant scientific discovery was announced this week?",
});

console.log(response.output_text);
// console.log(response.output); // For structured details
```
* **cURL:**
```bash
curl "https://api.openai.com/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"tools": [{"type": "web_search_preview"}],
"input": "What significant scientific discovery was announced this week?"
}'
```

* **Key Features & Capabilities:**
* **Timeliness:** Provides access to current information, unlike static model knowledge.
* **Citations:** Responses include links to sources (news articles, blogs), allowing users to verify information and delve deeper. This enhances trust and provides attribution.
* **Performance:** Powered by the same models used for ChatGPT search. Benchmarked on SimpleQA (accuracy on short, factual questions), `gpt-4o-search-preview` scored 90% and `gpt-4o-mini-search-preview` scored 88%, significantly outperforming the non-search-enabled models listed (GPT-4.5 at 63%, GPT-4o at 38%, etc.). Note that the 38% figure for GPT-4o refers to its performance *without* the integrated search tool on this specific benchmark, which is why it looks low relative to the model's general capabilities.
* **Synergy:** Can be used in conjunction with other tools (like File Search or function calling) within the same Responses API call.
* **Publisher Control:** Websites and publishers can control whether their content appears in API search results (via the linked `appear` documentation).

* **Use Cases:**
* **General Knowledge:** Answering questions about current events, latest developments.
* **Research Agents:** Gathering timely information for reports or analysis (e.g., Hebbia uses it for market intelligence in finance and law).
* **Shopping Assistants:** Finding current product availability, prices, reviews.
* **Travel Booking:** Checking flight schedules, hotel availability, travel advisories.

* **Alternative Access (Chat Completions):** For developers who prefer the Chat Completions API or have existing integrations, OpenAI also provides direct access to the fine-tuned search models via specific model IDs: `gpt-4o-search-preview` and `gpt-4o-mini-search-preview`. This offers flexibility but might lack the seamless integration benefits of using the tool within the Responses API.

* **Pricing (Dedicated Models):** Using the *dedicated search models* in Chat Completions has specific pricing: $30/thousand queries for `gpt-4o-search-preview` and $25/thousand queries for `gpt-4o-mini-search-preview`. Using the `web_search_preview` *tool* within the Responses API bills standard token rates plus likely a per-search tool usage fee (the documentation implies standard rates apply to tokens/tools, but check the pricing page for specifics on the tool fee itself).

* **Availability:** The `web_search_preview` tool and the dedicated search models were available in **preview** as of the announcement.

* **Analysis:** Integrating web search directly addresses a fundamental LLM limitation. The inclusion of citations is critical for building trust and allowing verification, moving away from purely generative, potentially hallucinatory responses for factual queries. The strong benchmark performance suggests the underlying search and synthesis capabilities are robust. Offering it both as a tool in Responses and as dedicated models in Chat Completions provides flexibility, though the Responses API integration seems strategically preferred by OpenAI for agentic workflows.
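
For reference, the "Alternative Access" path above is just an ordinary Chat Completions call against one of the dedicated search models; a minimal sketch:

```python
from openai import OpenAI

client = OpenAI()

# The dedicated search model performs the web search itself; no `tools` parameter is needed.
completion = client.chat.completions.create(
    model="gpt-4o-mini-search-preview",
    messages=[
        {"role": "user", "content": "What significant scientific discovery was announced this week?"}
    ],
)
print(completion.choices[0].message.content)
```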

**2. File Search: Enabling Retrieval-Augmented Generation (RAG)**

Agents often need to access specific, private, or domain-specific knowledge contained in documents. The File Search tool provides a powerful, integrated RAG capability.

* **Purpose:** To allow agents to efficiently and accurately retrieve relevant information from large volumes of documents provided by the developer. This enables grounding agent responses in specific knowledge bases.

* **Integration (Responses API & Vector Stores):** File Search relies on OpenAI's **Vector Stores** API. The workflow involves:
1. **Uploading Files:** Using the standard Files API (`/v1/files`).
2. **Creating a Vector Store:** Grouping relevant file IDs into a named vector store (`client.vector_stores.create` in Python, `client.vectorStores.create` in JavaScript). This step likely involves background processing by OpenAI to chunk and embed the file contents.
3. **Invoking File Search:** Using the `file_search` tool type within the `tools` parameter of `client.responses.create`, specifying the `vector_store_ids` to search within.

* **Python Example:**
```python
from openai import OpenAI

client = OpenAI()

# Assume file1, file2, file3 are File objects from client.files.create()
# 1. Create a Vector Store
product_docs_vs = client.vector_stores.create(
    name="Product Documentation",
    file_ids=[file1.id, file2.id, file3.id],
    # expires_after={"anchor": "last_active_at", "days": 7}  # Optional expiration
)
print(f"Vector Store created: {product_docs_vs.id}")

# Wait for files to process (check status via client.vector_stores.files.retrieve)
# ... logic to poll vector_store_file status ...

# 2. Use File Search in Responses API
response = client.responses.create(
    model="gpt-4o-mini",  # Or other suitable models
    tools=[{
        "type": "file_search",
        "vector_store_ids": [product_docs_vs.id],
    }],
    input="Summarize the key features of 'Project Phoenix' based on the docs.",
    # include=["file_search_call.results"]  # Optional: To get search results in the response object
)

print(response.output_text)
```
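
The elided polling step above could look like the following minimal sketch. It assumes the vector store object's `file_counts` field reflects processing progress; polling individual vector store files via `client.vector_stores.files.retrieve` works equally well.

```python
import time

def wait_for_vector_store(client, vector_store_id, timeout_s=120):
    """Poll until all files attached to the vector store finish processing."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        vs = client.vector_stores.retrieve(vector_store_id)
        counts = vs.file_counts  # in_progress / completed / failed / total
        if counts.in_progress == 0:
            return vs
        time.sleep(2)
    raise TimeoutError(f"Vector store {vector_store_id} is still processing")

# wait_for_vector_store(client, product_docs_vs.id)
```
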
* **JavaScript Example:**
```javascript
import OpenAI from "openai";

const client = new OpenAI();

async function useFileSearch() {
  // Assume file1, file2, file3 are File objects from client.files.create()
  // 1. Create Vector Store
  const productDocsVS = await client.vectorStores.create({
    name: "Product Documentation",
    file_ids: [file1.id, file2.id, file3.id],
    // expires_after: { anchor: "last_active_at", days: 7 } // Optional
  });
  console.log(`Vector Store created: ${productDocsVS.id}`);

  // ... logic to poll vector_store_file status ...

  // 2. Use File Search in Responses API
  const response = await client.responses.create({
    model: "gpt-4o-mini", // Or other models
    tools: [{
      type: "file_search",
      vector_store_ids: [productDocsVS.id],
    }],
    input: "Summarize the key features of 'Project Phoenix' based on the docs.",
    // include: ["file_search_call.results"] // Optional
  });

  console.log(response.output_text);
}

useFileSearch();
```

* **Key Features & Capabilities:**
* **Managed RAG:** Simplifies setting up a powerful RAG pipeline without requiring developers to manage chunking, embedding, vector databases, and retrieval logic themselves.
* **Optimization:** Includes built-in query optimization and reranking for better accuracy and relevance, reducing the need for manual tuning.
* **Scalability:** Handles large volumes of documents.
* **Flexibility:** Supports multiple file types, metadata filtering for targeted searches, and custom reranking (details likely in deeper docs).
* **Persistence:** Vector Stores provide a persistent knowledge base, unlike ephemeral file uploads sometimes used with Code Interpreter.
* **Direct Querying:** A new `/v1/vector_stores/{vector_store_id}/search` endpoint allows direct querying of vector stores outside the Responses/Assistants APIs for other use cases.

* **Use Cases:**
* **Customer Support:** Agents accessing FAQs, knowledge base articles, company policies (e.g., Navan uses it for travel policy lookups).
* **Legal Assistance:** Referencing past cases or legal documents.
* **Technical Support:** Querying technical documentation for coding agents.
* **Personalized Information:** Creating vector stores per user or group to tailor responses based on specific account settings or data.

* **Relation to Assistants API:** This tool is an evolution of the "Retrieval" tool in the Assistants API and continues to be available there, providing a bridge for existing users. The underlying Vector Store objects are likely shared.

* **Pricing:** Usage is priced per query ($2.50 per thousand queries) *plus* storage costs for the vector stores ($0.10/GB/day, with the first GB free). This incentivizes efficient querying and potentially managing vector store lifecycles (using `expires_after` or manual deletion).

* **Availability:** Available to **all developers** in the Responses API (and Assistants API).

* **Analysis:** File Search is a cornerstone of making agents knowledgeable about specific, private data. By managing the complexities of RAG internally, OpenAI significantly lowers the barrier for developers. The combination of optimized retrieval, persistence via Vector Stores, and integration with the Responses API makes it a powerful alternative to building and maintaining custom RAG solutions. The per-query pricing encourages developers to design interactions efficiently, while the storage cost reflects the value of persistent, indexed knowledge. The addition of a direct search endpoint further increases the utility of Vector Stores as a platform feature.
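
The direct search endpoint mentioned above can also be exercised without a Responses call at all. A minimal sketch, assuming the Python SDK exposes `/v1/vector_stores/{id}/search` as `client.vector_stores.search` (the result field names are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Query a vector store directly, outside the Responses/Assistants APIs.
results = client.vector_stores.search(
    "vs_123",  # hypothetical vector store ID
    query="Project Phoenix key features",
)

for result in results:
    # Each result is a retrieved chunk with relevance metadata.
    print(result.filename, result.score)
```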

**3. Computer Use: Bridging the Gap to UI Automation**

Perhaps the most ambitious tool, Computer Use aims to give agents the ability to directly interact with graphical user interfaces, starting with web browsers.

* **Purpose:** To automate tasks on a computer by allowing the agent to perceive the screen and execute mouse and keyboard actions. This opens up possibilities for interacting with systems that lack APIs or require UI navigation.

* **Underlying Model:** Powered by the **Computer-Using Agent (CUA)** model, the same one enabling OpenAI's "Operator" concept. This model is specifically trained for UI interaction tasks.

* **Integration (Responses API):** Used via the `computer_use_preview` tool type and requires specifying the `computer-use-preview` model. Developers need to provide environment context (e.g., `browser`) and display dimensions. The API call doesn't *execute* the actions but returns the sequence of intended actions generated by the model. The developer's application is responsible for translating these actions into actual OS/browser commands.

* **Python Example (Conceptual - Runner logic needed separately):**
```python
# NOTE: This only gets the *intended* actions from the API.
# Executing these actions requires a separate component (e.g., using Playwright, Selenium, OS automation libs).
# See the sample application linked in the docs for a full implementation.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",  # Or potentially 'os' in the future
    }],
    # Truncation 'auto' is required for CUA model
    truncation="auto",
    input="Find the top 3 news articles about renewable energy on Bing and summarize them."
)

# response.output would contain the sequence of actions (click, type, scroll etc.)
print(response.output)
# Example conceptual output structure (actual format may vary):
# [
#   {'type': 'computer_action', 'action': {'type': 'type', 'text': 'renewable energy news'}},
#   {'type': 'computer_action', 'action': {'type': 'keypress', 'key': 'Enter'}},
#   {'type': 'computer_action', 'action': {'type': 'wait', 'duration_ms': 1000}},
#   {'type': 'computer_action', 'action': {'type': 'click', 'x': 150, 'y': 250}},
#   ... etc ...
# ]
```
* **JavaScript Example (Conceptual):**
```javascript
// NOTE: Execution logic is separate. See sample application.
import OpenAI from "openai";

const client = new OpenAI();

async function getComputerActions() {
  try {
    const response = await client.responses.create({
      model: "computer-use-preview",
      tools: [{
        type: "computer_use_preview",
        display_width: 1024,
        display_height: 768,
        environment: "browser",
      }],
      truncation: "auto", // Required for CUA
      input: "Find the top 3 news articles about renewable energy on Bing and summarize them.",
    });

    // response.output contains the action sequence
    console.log(response.output);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

getComputerActions();
```

* **Key Features & Capabilities:**
* **UI Understanding:** Leverages the CUA model's ability to interpret visual interfaces.
* **Action Generation:** Outputs concrete mouse/keyboard actions (click, type, scroll, keypress, drag).
* **Benchmark Performance:** Achieved state-of-the-art (SOTA) at the time of release on:
* OSWorld (Full computer use): 38.1% (vs. 22.0% previous SOTA) - *Note: Still far from reliable for general OS tasks.*
* WebArena (Browser use): 58.1% (vs. 36.2% / 57.1% previous SOTAs) - *Significantly better but not perfect.*
* WebVoyager (Web browsing): 87.0% (matching previous SOTA) - *Very capable in web navigation tasks.*
* **Environment Context:** Can be tailored for specific environments like `browser`.

* **Use Cases:**
* **Automating Web Workflows:** QA testing, data scraping from sites without APIs.
* **Legacy System Interaction:** Performing data entry or operations in older applications lacking APIs (e.g., Luminai using it for application processing in legacy systems).
* **Information Gathering:** Accessing data only available through web interfaces (e.g., Unify checking online maps for business real estate changes).

* **Availability:** Released as a **Research Preview** available only to select developers in **usage tiers 3-5**. This restricted access reflects the early stage and potential risks.

* **Pricing:** Priced based on tokens: $3/1M input tokens and $12/1M output tokens. The higher output cost likely reflects the complexity of generating detailed action sequences.

* **Critical Safety Considerations:** OpenAI explicitly details extensive safety work due to the inherent risks of letting AI control a computer:
* **Testing:** Rigorous testing and red teaming for Operator (CUA's origin) and *additional* evaluations specifically for the API risks, including local OS interactions.
* **Risk Areas:** Misuse, model errors (hallucinated actions), and frontier risks (unforeseen consequences of advanced capabilities).
* **Developer Mitigations:**
* Safety checks against prompt injection.
* *Recommendation* for confirmation prompts for sensitive tasks (implemented by the developer).
* Tools/guidance to help developers isolate execution environments (sandboxing).
* Enhanced detection of potential policy violations.
* **Acknowledged Limitations:** Performance on OSWorld (38.1%) clearly indicates the model is **not yet highly reliable for general OS automation**.
* **Human Oversight:** Strongly recommended, especially in non-browser environments.
* **System Card:** Updated system card provides more details on API-specific safety work.

* **Analysis:** The Computer Use tool is arguably the most "agentic" capability announced, moving beyond information retrieval/generation to direct action in digital environments. Its potential is immense, blurring the lines with RPA but offering potentially greater flexibility due to the LLM's understanding. However, the risks are equally significant. OpenAI's cautious rollout (Research Preview, tiered access) and explicit discussion of safety measures and benchmark limitations are crucial. The 38.1% OSWorld score is a stark reminder that reliable, general-purpose OS automation via this tool is still a research challenge. For now, browser-based automation seems the most viable and tested path. Developers using this tool bear significant responsibility for implementing robust safety checks and maintaining human oversight.
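
Because the API only returns *intended* actions, the developer owns execution. The sketch below shows one way to dispatch the conceptual action format from the earlier example using Playwright's sync API; the action schema is an assumption (the docs note the actual format may vary), and a real implementation would add screenshot feedback, safety checks, and confirmation prompts.

```python
from playwright.sync_api import sync_playwright

def execute_actions(page, actions):
    """Map conceptual computer-use actions onto Playwright calls."""
    for item in actions:
        if item.get("type") != "computer_action":
            continue  # ignore message items and other output types
        action = item["action"]
        kind = action["type"]
        if kind == "click":
            page.mouse.click(action["x"], action["y"])
        elif kind == "type":
            page.keyboard.type(action["text"])
        elif kind == "keypress":
            page.keyboard.press(action["key"])
        elif kind == "scroll":
            page.mouse.wheel(action.get("delta_x", 0), action.get("delta_y", 0))
        elif kind == "wait":
            page.wait_for_timeout(action.get("duration_ms", 500))
        # After each action, capture a screenshot and feed it back to the model
        # so it can plan the next step (omitted here).

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page(viewport={"width": 1024, "height": 768})
    page.goto("https://www.bing.com")
    # execute_actions(page, response.output)  # actions from the API call above
    browser.close()
```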

**4. Synthesis: Tools as Enablers of Agentic Behavior**

The integrated Web Search, File Search, and Computer Use tools are not just disparate features; they are fundamental components enabling the vision of agents that can perceive, reason, and *act*.

* **Addressing Core Needs:** They directly tackle the need for agents to access external, timely information (Web Search), leverage specific knowledge bases (File Search), and interact with the digital world beyond APIs (Computer Use).
* **Simplifying Development:** By integrating these complex capabilities directly into the Responses API, OpenAI removes significant engineering overhead from developers, allowing them to focus on the agent's core logic and goals rather than on building and maintaining infrastructure for search, RAG, or UI automation.
* **Synergy within Responses API:** The ability to potentially combine these tools (and custom function calls) within a single, coherent API interaction framework is powerful. An agent could search the web for recent news, cross-reference it with internal documents via File Search, and then use Computer Use to update a record in a legacy system, all potentially orchestrated through Responses API interactions.
* **Platform Cohesion:** These tools leverage other parts of the OpenAI platform (Files API, Vector Stores API), creating a more cohesive ecosystem. Features like tracing further enhance the developer experience by providing visibility into how these tools are used within an agent's workflow.
* **Future Foundation:** These tools lay the groundwork for even more sophisticated agent capabilities. As models improve in planning and reasoning, having robust, integrated tools for interacting with information and environments will be essential.

The built-in tools are a critical pillar of OpenAI's agent strategy. They equip the Responses API with the necessary senses (search) and effectors (computer use) to move beyond simple text generation towards complex task completion, albeit with appropriate caution and controls, especially for capabilities like Computer Use.

---

Now we discuss the orchestration layer provided by OpenAI for building these agentic systems: the **OpenAI Agents SDK**.

**The Orchestration Challenge: Moving Beyond Single Calls**

While the Responses API provides a powerful and unified interface for interacting with models and their built-in tools, building complex agents often requires more than just sequential API calls. Consider these scenarios:

* A customer support system needs to route queries to specialized agents (billing, technical, returns).
* A research agent needs to perform web searches, synthesize findings, search internal documents, and finally generate a structured report.
* A workflow requires input validation before proceeding, or output checks before presenting results to a user.
* An agent needs to call a custom function (e.g., update a CRM), process the result, and then decide its next step based on that outcome.

Handling these multi-step, potentially multi-agent processes manually involves significant custom code for state management, decision logic, error handling, tool result parsing, and managing the flow of control. This is precisely the complexity the **OpenAI Agents SDK** aims to address.

**Introducing the OpenAI Agents SDK: Simplifying Agentic Workflows**

The Agents SDK is presented as an open-source Python library (with Node.js support planned) specifically designed to simplify the development, deployment, and monitoring of agentic applications. It acts as the orchestration layer, sitting on top of the underlying API calls (primarily Responses API, but compatible with Chat Completions).

* **Core Purpose:** To provide developers with a structured yet flexible framework for defining how agents interact with each other, use tools, handle safety checks, and execute complex task sequences.
* **Evolution from Swarm:** It's explicitly mentioned as a production-ready upgrade to OpenAI's previous experimental "Swarm" SDK. This suggests lessons were learned from Swarm's adoption regarding usability, features, and robustness, leading to the design of the Agents SDK.
* **Design Philosophy:**
* **Minimalism:** Focuses on a small set of core, powerful primitives rather than a vast array of complex abstractions. The learning curve is intended to be gentle.
* **Pythonic:** Leverages standard Python features (functions, classes, async/await) for orchestration, making it feel natural for Python developers.
* **Customizable:** Provides sensible defaults and built-in logic (like the agent loop) but allows developers to override or customize behavior where needed.
* **Key Goal:** Reduce the boilerplate and custom logic required for common agentic patterns, allowing developers to focus on the unique aspects of their application.

**Core Primitives: The Building Blocks of the SDK**

The SDK is built around three fundamental concepts:

1. **Agents (`agents.Agent`):**
* **Definition:** This class represents the core intelligent unit within the SDK. It encapsulates an LLM (like `gpt-4o`, `o3-mini`, or even custom models via providers) configured with specific `instructions` (system prompt), a set of `tools` it can use, and potential `handoffs` to other agents.
* **Configuration:**
* `name`: A human-readable identifier for the agent (used in tracing, handoff tool names).
* `instructions`: Static string or a dynamic function `(RunContextWrapper, Agent) -> str` defining the agent's persona, goals, and constraints. Dynamic instructions allow tailoring behavior based on runtime context.
* `model`: Specifies the underlying LLM (e.g., `"gpt-4o"`) or a specific `Model` instance (like `OpenAIResponsesModel`, `OpenAIChatCompletionsModel`, `LitellmModel`).
* `model_settings`: Fine-tunes LLM parameters like `temperature`, `top_p`, `tool_choice`, `max_tokens`.
* `tools`: A list of `Tool` objects (FunctionTool, WebSearchTool, FileSearchTool, ComputerTool) the agent can invoke. Function tools can be easily created from Python functions using the `@function_tool` decorator.
* `handoffs`: A list of other `Agent` instances or `Handoff` configuration objects this agent can delegate tasks to.
* `output_type`: Defines the expected structure of the agent's final output (e.g., a Pydantic model for structured data extraction). If omitted, the output is `str`.
* `input_guardrails`/`output_guardrails`: Lists of guardrail functions to apply.
* `hooks`: An `AgentHooks` implementation for lifecycle callbacks specific to this agent.
* **Context Genericity (`Agent[TContext]`):** Agents can be typed with a specific context class, ensuring that tools, hooks, and dynamic instructions associated with that agent receive the correctly typed context object during execution.
* **Example Definition:**
```python
from agents import Agent, function_tool
from pydantic import BaseModel


@function_tool
def get_order_status(order_id: str) -> str:
    """Fetches the status for a given order ID."""
    # ... actual lookup logic ...
    print(f"[Tool] Fetching status for order {order_id}")
    return f"Order {order_id} is shipped."


class RefundInfo(BaseModel):
    order_id: str
    reason: str
    approved: bool


support_agent = Agent(
    name="General Support",
    instructions="Help the user with their support query. Use tools if necessary. If they ask for a refund, handoff to the Refund Agent.",
    model="gpt-4o-mini",
    tools=[get_order_status],
    # handoffs=[refund_agent],  # Assuming refund_agent is defined elsewhere
    output_type=str,  # Default output is just text
)

# refund_agent could be defined with output_type=RefundInfo
```

2. **Handoffs (`agents.handoff`, `agents.Handoff`):**
* **Purpose:** The primary mechanism for controlled delegation between agents. This enables building modular systems where specialized agents handle specific sub-tasks.
* **Mechanism:** Under the hood, handoffs are presented to the source LLM as special function tools (e.g., `transfer_to_Refund_Agent`). When the LLM decides to call this "tool," the SDK intercepts it and transfers control to the target agent defined in the handoff configuration.
* **Configuration:** Defined in the `handoffs` list of the *source* agent. The `handoff()` helper function simplifies creation:
* `agent`: The target `Agent` instance.
* `tool_name_override`/`tool_description_override`: Customize how the handoff appears as a tool to the source LLM.
* `on_handoff`: An async callback triggered *when the handoff is initiated* (before the target agent runs), useful for logging or pre-fetching data. Can optionally receive arguments from the LLM if `input_type` is set.
* `input_type`: A Pydantic model or similar type defining arguments the source LLM should provide when initiating the handoff (e.g., `reason` for escalation).
* `input_filter`: A crucial function `(HandoffInputData) -> HandoffInputData` that modifies the conversation history seen by the *target* agent. By default, the target agent sees the full history. Filters can be used to shorten context, remove tool calls, or redact information. Common filters are provided in `agents.extensions.handoff_filters` (e.g., `remove_all_tools`).
* **Control Flow:** When a handoff occurs, the target agent essentially takes over the conversation flow, becoming the new "current agent" within the `Runner`'s loop.
* **Example (Triage Flow):**
```python
from agents import Agent, handoff, Runner
from agents.extensions.handoff_filters import remove_all_tools

# Define specialist agents
billing_agent = Agent(name="Billing Specialist", instructions="Handle billing questions.")
tech_agent = Agent(name="Technical Support", instructions="Handle technical issues.")

# Define Triage Agent with handoffs
triage_agent = Agent(
    name="Support Triage",
    instructions="Determine if the user needs Billing or Technical support and handoff appropriately. Ask clarifying questions if needed.",
    model="gpt-4o-mini",  # Cheaper model for triage
    handoffs=[
        handoff(billing_agent, tool_description_override="Transfer to billing for payment issues."),
        handoff(
            agent=tech_agent,
            tool_description_override="Transfer to tech support for product errors.",
            input_filter=remove_all_tools,  # Tech agent doesn't need prior tool calls
        ),
    ],
)

# --- Running the workflow ---
# result = await Runner.run(triage_agent, "My payment failed.")
# # Expect Billing Specialist to handle the final response
# print(result.last_agent.name)  # Should be "Billing Specialist"

# result = await Runner.run(triage_agent, "I'm seeing error code 500.")
# # Expect Technical Support to handle the final response
# print(result.last_agent.name)  # Should be "Technical Support"
```

3. **Guardrails (`@input_guardrail`, `@output_guardrail`, `GuardrailFunctionOutput`):**
* **Purpose:** Enforce safety, relevance, and policy constraints on agent interactions. They run *concurrently* with the main agent logic.
* **Input Guardrails:** Applied to the *initial* input provided to the *first* agent in a run. Ideal for quickly rejecting off-topic, malicious, or policy-violating requests before expensive processing occurs.
* **Output Guardrails:** Applied to the *final* output generated by the *last* agent in a run. Useful for checking response quality, safety, or adherence to specific formatting before presenting it to the user.
* **Mechanism:** Developers define an async function decorated with `@input_guardrail` or `@output_guardrail`. This function receives the `RunContextWrapper`, the `Agent`, and the relevant data (input items for input guardrails, agent output for output guardrails). It must return a `GuardrailFunctionOutput` object.
* **`GuardrailFunctionOutput`:** Contains:
* `output_info`: Any data the guardrail function wants to record (e.g., analysis results, scores). This is stored in the `RunResult`.
* `tripwire_triggered`: A boolean. If `True`, the SDK immediately halts the agent run and raises either `InputGuardrailTripwireTriggered` or `OutputGuardrailTripwireTriggered`.
* **Latency Benefit:** Because guardrails (especially input ones) can run concurrently using potentially faster/cheaper models, they can trigger the tripwire and stop the main agent's execution early, saving significant time and cost if the input is invalid.
* **Example (Input Guardrail):**
```python
from agents import (Agent, InputGuardrail, GuardrailFunctionOutput,
                    RunContextWrapper, Runner, input_guardrail,
                    InputGuardrailTripwireTriggered)
from pydantic import BaseModel


class RelevanceCheck(BaseModel):
    is_relevant: bool
    topic: str


# A simple agent used *by* the guardrail
relevance_checker_agent = Agent(
    name="Relevance Checker",
    instructions="Is the following user query relevant to company products or support? Respond with JSON.",
    model="gpt-4o-mini",  # Fast model
    output_type=RelevanceCheck,
)


@input_guardrail(name="Relevance Guardrail")
async def check_relevance(ctx: RunContextWrapper, agent: Agent, input_data) -> GuardrailFunctionOutput:
    print("[Guardrail] Checking relevance...")
    try:
        # Run the checker agent
        checker_result = await Runner.run(relevance_checker_agent, input_data, context=ctx.context)
        relevance_info = checker_result.final_output_as(RelevanceCheck)
        print(f"[Guardrail] Relevance check result: {relevance_info}")
        # Trip the wire if not relevant
        return GuardrailFunctionOutput(
            output_info=relevance_info,
            tripwire_triggered=(not relevance_info.is_relevant),
        )
    except Exception as e:
        print(f"[Guardrail] Error during relevance check: {e}")
        # Fail open (don't trip wire) if checker fails, or handle differently
        return GuardrailFunctionOutput(output_info={"error": str(e)}, tripwire_triggered=False)


# Main agent - potentially slower/more expensive
main_support_agent = Agent(
    name="Product Support Agent",
    instructions="Provide detailed support about our products.",
    model="gpt-4o",
    input_guardrails=[check_relevance],  # Apply the guardrail
)

# --- Running the workflow ---
# try:
#     result = await Runner.run(main_support_agent, "Tell me about the weather in London.")
#     print("Main agent output:", result.final_output)
# except InputGuardrailTripwireTriggered as e:
#     print(f"Guardrail tripped! Info: {e.guardrail_result.output.output_info}")
# # Expected: Guardrail trips

# try:
#     result = await Runner.run(main_support_agent, "How do I reset my Model X widget?")
#     print("Main agent output:", result.final_output)
# except InputGuardrailTripwireTriggered as e:
#     print(f"Guardrail tripped! Info: {e.guardrail_result.output.output_info}")
# # Expected: Guardrail passes, main agent responds
```
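
An output guardrail mirrors the same pattern, applied to the final output of the last agent. A minimal sketch (the codename check is a stand-in for real policy logic):

```python
from agents import (Agent, GuardrailFunctionOutput, OutputGuardrailTripwireTriggered,
                    RunContextWrapper, output_guardrail)

@output_guardrail(name="No internal codenames")
async def block_codenames(ctx: RunContextWrapper, agent: Agent, output) -> GuardrailFunctionOutput:
    """Trip the wire if the final answer leaks a (hypothetical) internal codename."""
    leaked = "Project Phoenix" in str(output)
    return GuardrailFunctionOutput(
        output_info={"leaked_codename": leaked},
        tripwire_triggered=leaked,
    )

public_agent = Agent(
    name="Public Answers Agent",
    instructions="Answer user questions about our products.",
    model="gpt-4o-mini",
    output_guardrails=[block_codenames],
)

# Runner.run(public_agent, ...) raises OutputGuardrailTripwireTriggered if the wire trips.
```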

**Executing Workflows: The `Runner` Class**

The `agents.Runner` class is the entry point for executing workflows defined using Agents, Tools, and Handoffs.

* **Core Methods:**
* `run(starting_agent, input, *, context=None, max_turns=10, run_config=None)`: Asynchronously executes the workflow starting with `starting_agent` and the initial `input`. Returns a `RunResult` upon completion.
* `run_sync(...)`: A synchronous wrapper for `run()`. Convenient for simple scripts but `run()` is preferred for applications integrating with async frameworks.
* `run_streamed(starting_agent, input, *, context=None, max_turns=10, run_config=None)`: Asynchronously executes the workflow and returns a `RunResultStreaming` object *immediately*. This object allows iterating through `StreamEvent`s as they occur (`async for event in result.stream_events():`).
* **The Agent Loop (Internal Logic):** The `Runner` manages the turn-based execution:
1. Takes the current agent and input items.
2. Invokes the agent's underlying `Model` (e.g., calls the Responses API).
3. Processes the model's output:
* If it's a final text/structured output meeting the `output_type` criteria and has no pending tool calls/handoffs, the run completes, and the `RunResult` is populated.
* If it's a handoff tool call, resolves the target agent, applies input filters, updates the current agent, and loops back to step 1 with the updated state.
* If it's one or more tool calls (function, web search, etc.), executes the tools (calling Python functions, querying Vector Stores, etc.), appends the tool results as new input items, and loops back to step 1.
* If `max_turns` is exceeded, raises `MaxTurnsExceeded`.
* **State Management:** The `Runner` implicitly manages the conversation state (the list of input/output items) between turns within a single `.run()` call. For multi-turn conversations *across* separate `.run()` calls, the developer uses `result.to_input_list()` to pass the history.
* **Context:** The `context` object provided to `run()` is passed down through the `RunContextWrapper` to all relevant components (dynamic instructions, tool functions, hooks, guardrails).
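
Putting the `to_input_list()` pattern together, a user-facing conversation spanning two `Runner.run()` calls might look like this minimal sketch:

```python
import asyncio
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="Answer concisely.")

async def main():
    # Turn 1: a fresh run
    first = await Runner.run(agent, "Recommend one book about compilers.")
    print(first.final_output)

    # Turn 2: replay the prior turn's items, then append the new user message
    follow_up = first.to_input_list() + [
        {"role": "user", "content": "Why did you pick that one?"}
    ]
    second = await Runner.run(agent, follow_up)
    print(second.final_output)

asyncio.run(main())
```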

**Connecting SDK and API**

It's vital to understand how the SDK uses the underlying OpenAI APIs:

* Each time an agent needs to "think" (generate text, decide on a tool/handoff, process tool results), the `Runner` orchestrates a call to the configured `Model` implementation (e.g., `OpenAIResponsesModel`).
* This `Model` implementation then makes the actual HTTP request to the corresponding OpenAI API endpoint (e.g., `/v1/responses` or `/v1/chat/completions`).
* The SDK parses the API response (or stream events) and determines the next step in the orchestration loop (run tool, handoff, finish).
* SDK constructs like `@function_tool` translate Python function signatures into the JSON schema format required by the `tools` parameter of the API.
* Built-in SDK tools like `WebSearchTool` likely map directly to the corresponding built-in tool types (`web_search_preview`) when using the Responses API via `OpenAIResponsesModel`.

**Analysis & Synthesis (Part 3):**

The Agents SDK is the crucial **client-side orchestration framework** that complements the server-side Responses API. While the API provides the core model interaction and built-in tool execution, the SDK provides the structure and logic to *manage complex workflows involving these components*.

Its core primitives (Agent, Handoff, Guardrail) offer a powerful yet conceptually simple way to design multi-step and multi-agent systems. The emphasis on Pythonic design lowers the adoption barrier for developers familiar with the language.

**Handoffs** are particularly noteworthy. They formalize the pattern of specialized agents, enabling cleaner, more maintainable designs compared to monolithic agents trying to do everything. The `input_filter` provides essential control over context propagation between agents.

**Guardrails** address a critical need for safety and control in agentic systems, and their parallel execution model is key for maintaining acceptable latency. The tripwire mechanism allows for efficient rejection of invalid inputs/outputs.

The `Runner` hides much of the complexity of the underlying execution loop, tool invocation, and state management within a single run. The explicit handling of multi-turn conversations via `to_input_list()` maintains a clear separation between turns initiated by the user and internal steps within a single turn managed by the SDK.

The tight integration with OpenAI's APIs (especially Responses) and platform features like Tracing makes the SDK a compelling choice for developers building agents within the OpenAI ecosystem. Its open-source nature and compatibility promise with other model providers (via LiteLLM or custom integrations) offer flexibility.

Essentially, the SDK provides the "control plane" for agentic applications, defining *how* agents interact and execute tasks, while the Responses API provides the "execution plane" where the core intelligence and tool actions happen. Together, they form the foundation of OpenAI's platform for building reliable and sophisticated agents.

---

Okay, let's get into the **Responses API**, examining its structure, parameters, and how it handles inputs, outputs, state, and streaming, forming the bedrock upon which agentic capabilities are built according to the provided documentation.

**The Responses API (`/v1/responses`): Engine for Agentic Interaction**

The documentation positions the Responses API as OpenAI's premier interface for generating model responses, specifically designed with agentic applications in mind. It aims to be a versatile and powerful endpoint that unifies the strengths of previous APIs while introducing new capabilities and improving developer experience. It serves as the direct communication channel between your application and the OpenAI models when utilizing the integrated tool ecosystem and advanced features.

**1. Core Request Structure (`POST /v1/responses`)**

Understanding the request body parameters is crucial to leveraging the API's full potential:

* **`model` (Required, string):** Specifies the AI engine (e.g., `gpt-4o`, `o3-mini`, `gpt-4.1`). The choice of model dictates capabilities (like reasoning, tool support, multimodality) and performance characteristics (latency, cost). The documentation emphasizes that different models, especially newer "o-series" reasoning models, might have varying parameter support.
* **`input` (Required, string or array):** This is the primary payload containing the context for the model. Its flexibility is key:
* **Simple Text:** A single string, similar to a basic prompt in Chat Completions.
```json
"input": "Explain the concept of photosynthesis."
```
* **Conversation History (Array of Messages):** An array of message objects (with `role` and `content`) simulates a conversation turn, akin to the `messages` array in Chat Completions. The `role` can be `user`, `assistant`, or `tool` (for providing tool results).
```json
"input": [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What is its population?"}
]
```
* **Multimodal Content:** The `content` within a message object can itself be an array containing different types, most notably `input_text` and `input_image`.
```json
"input": [
{"role": "user", "content": "Describe this image."},
{
"role": "user",
"content": [
{"type": "input_image", "image_url": "https://.../image.jpg"}
]
}
]
```
* **Tool Results:** When providing the results of a tool call made in a previous turn, you include a message with `role: "tool"` and a `content` array containing `tool_result` objects.
```json
"input": [
// ... previous conversation ...
{"role": "assistant", "content": [{"type": "tool_call", "id": "call_123", "function": {"name": "get_weather", "arguments": "{\"city\": \"London\"}"}}]},
{
"role": "tool",
"content": [
{"type": "tool_result", "tool_call_id": "call_123", "result": "{\"temperature\": 15, \"condition\": \"Cloudy\"}"}
]
}
]
```
* **File Inputs:** The documentation mentions "File inputs" as a possibility here, likely integrating with uploaded files for specific tools or context, although the exact structure isn't detailed in the initial overview sections provided (likely involves referencing File IDs, possibly tied to tools like `file_search` or future Assistants-like features).

* **`tools` (Optional, array):** Defines the tools the model *can* use. This is where you enable built-in tools or define custom functions.
* **Built-in:** `{"type": "web_search_preview"}`, `{"type": "file_search", "vector_store_ids": ["vs_..."]}`, `{"type": "computer_use_preview", ...}`.
* **Function Calling (Custom):** Requires defining the function's schema.
```json
"tools": [
{"type": "web_search_preview"}, // Enable web search
{ // Define a custom function
"type": "function",
"function": {
"name": "update_crm_record",
"description": "Updates a customer record in the CRM",
"parameters": {
"type": "object",
"properties": {
"customer_id": {"type": "string", "description": "The unique ID of the customer"},
"update_data": {"type": "object", "description": "Key-value pairs of fields to update"}
},
"required": ["customer_id", "update_data"]
}
}
}
]
```

* **`tool_choice` (Optional, string or object):** Explicitly controls *how* or *which* tool the model should use.
* `"auto"` (Default if tools are present): Model decides whether to use a tool and which one(s).
* `"none"` (Default if no tools): Model will not use any tool.
* `"required"`: Model *must* call one or more tools.
* `{"type": "function", "function": {"name": "my_function"}}`: Forces the model to call the specified function.
* `{"type": "web_search_preview"}`: Forces the model to use the web search tool.

* **`previous_response_id` (Optional, string or null):** The cornerstone of state management in the Responses API. By providing the `id` of the *immediately preceding* response object in the conversation, you instruct the API to consider the history leading up to and including that response when generating the new one. This allows the API to manage the conversation context window implicitly. If omitted, the conversation starts fresh based only on the current `input`.

* **`instructions` (Optional, string or null):** A system message prepended to the model's context. Crucially, when used with `previous_response_id`, instructions from the *previous* response are *not* automatically carried over. This makes it easy to dynamically change the system prompt/persona between turns without it persisting unintentionally.

* **`max_output_tokens` (Optional, integer or null):** Limits the total tokens generated in the response, including reasoning and visible output. This is distinct from Chat Completions' deprecated `max_tokens` (which targeted only completion length) and the newer `max_completion_tokens`.

* **`stream` (Optional, boolean, default: `false`):** If `true`, the API streams back results using Server-Sent Events (SSE) as they are generated, enabling real-time updates. A short streaming sketch follows this parameter list.

* **`store` (Optional, boolean, default: `true`):** Determines if the generated Response object is saved by OpenAI for 30 days, accessible via the API (`GET /v1/responses/{response_id}`) and dashboard logs. Setting to `false` prevents storage. This is important for compliance and data privacy considerations, but disabling it prevents using stored data for platform features like Evals based on logs.

* **Standard LLM Parameters (`temperature`, `top_p`):** Control the randomness and determinism of the output, similar to Chat Completions. Recommended to adjust one or the other, not both. Defaults are `1.0`.

* **`truncation` (Optional, string, default: `"disabled"`):** How to handle exceeding the context window.
* `"disabled"`: Fails with a 400 error if the context limit is hit.
* `"auto"`: Allows the model to automatically truncate the conversation history (dropping items, likely from the middle) to fit within the context window. Essential for long conversations using `previous_response_id`.

* **`text` (Optional, object):** Configuration for text output, primarily for specifying `format` (e.g., `{ "type": "text" }` for plain text or `{ "type": "json_schema", "json_schema": {...} }` for structured JSON output).

* **`reasoning` (Optional, object, o-series models only):** Configuration specific to reasoning models (e.g., controlling reasoning `effort`).

* **`parallel_tool_calls` (Optional, boolean, default: `true`):** Allows the model to request multiple tool calls simultaneously if appropriate for the task.

* **`include` (Optional, array or null):** Requests additional data in the response object beyond the default, such as `file_search_call.results` to get the actual content chunks found by file search.

* **`metadata` (Optional, map):** Up to 16 key-value pairs for tagging objects, useful for organization and filtering in logs or API queries.

* **`service_tier` (Optional, string or null):** For customers with Scale Tier or Flex Processing options to specify latency/processing tier.

* **`user` (Optional, string):** A unique identifier for the end-user for abuse monitoring purposes.
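
To see how these parameters combine in practice, here is a minimal sketch of a single request, assuming the official `openai` Python client; the specific parameter values and metadata tag are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative request combining several of the parameters described above
response = client.responses.create(
    model="gpt-4o",
    input="Find two recent headlines about renewable energy and summarize them.",
    instructions="You are a concise research assistant.",
    tools=[{"type": "web_search_preview"}],  # enable a built-in tool
    tool_choice="auto",                      # let the model decide whether to search
    max_output_tokens=500,
    temperature=0.7,
    truncation="auto",                       # drop older context automatically if the window fills
    store=True,                              # keep the Response retrievable for 30 days
    metadata={"feature": "news_digest"},     # illustrative tag for log filtering
)

print(response.output_text)
```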

**2. Input Handling and Multimodality**

The `input` parameter is designed for flexibility:

* **Simple Text:** For single prompts, just pass a string.
* **Conversational Turns:** Pass an array of message objects (`role`, `content`) to represent the history. The Responses API, especially when using `previous_response_id`, builds upon this history.
* **Image Input:** Include `{ "type": "input_image", "image_url": "..." }` within the `content` array of a user message. This allows agents to "see" and reason about images alongside text.
```python
# Python Example: Image Input (text and image combined in one user message)
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",  # Must be a vision-capable model
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What landmarks are visible in this picture?"},
                {"type": "input_image", "image_url": "https://.../cityscape.jpg"}
            ]
        }
    ]
)
print(response.output_text)
```
* **Tool Results:** Provide results from previous function calls using the `role: "tool"` message structure, ensuring the model knows the outcome of its requested actions.

**3. Output Structure and Content**

The `output` field in the Response object is an array containing various types of items generated by the model during its turn:

* **`message` Item (`type: "message"`):** Represents the assistant's textual response to the user.
* `role`: Always `"assistant"`.
* `content`: An array, typically containing one item:
* `output_text` (`type: "output_text"`): Contains the actual text generated (`text` field) and any `annotations` (like `file_citation` or `web_search_citation`).
* `file_citation`: Links text spans to specific files used by File Search.
* `web_search_citation` (Implied): Links text spans to web sources used by Web Search.
* `output_refusal` (`type: "output_refusal"`): Indicates the model refused to generate the requested content (details not fully expanded in provided docs, but likely related to safety/policy).
* **Tool Call Items (`type: "tool_call"`):** Represents the model's request to use a tool. The specific structure depends on the tool:
* `function`: Contains `id`, `function: {"name": ..., "arguments": "..."}`. The arguments are a JSON string.
* `web_search`: Contains `id`, `web_search: {"query": "..."}` (structure might be richer).
* `file_search`: Contains `id`, `file_search: {"query": "...", "vector_store_ids": [...]}`.
* `computer_use`: Contains `id`, `computer_use: {"action": {...}}` detailing the specific mouse/keyboard action.
* **`reasoning` Item (`type: "reasoning"`, o-series models only):** Provides insights into the model's thought process (structure includes `summary`, potentially containing `summary_text`).

**`output_text` SDK Helper:** Because the primary text response is nested inside the `output` array (typically at `output[0].content[0].text`), the Python and JS SDKs provide `response.output_text` to directly access the aggregated text from all `output_text` parts in the `output` array, simplifying common usage.
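
As a sketch of what the helper saves you, the manual traversal below aggregates the same text using the item and content-part types described above (field access is assumed to follow those shapes).

```python
# Manually aggregate assistant text from a Response object
text_parts = []
for item in response.output:
    if item.type == "message":                 # assistant message item
        for part in item.content:
            if part.type == "output_text":     # text content part
                text_parts.append(part.text)

manual_text = "".join(text_parts)
assert manual_text == response.output_text     # the SDK helper does this aggregation for you
```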

**4. State Management: `previous_response_id` and `truncation`**

This is a key differentiator from the stateless Chat Completions API.

* **Mechanism:** Instead of the developer sending the entire conversation history in each request's `input` array, you send only the *new* input (e.g., the latest user message or tool result) along with the `id` of the *previous* Response object via `previous_response_id`.
* **Benefit:** The API backend uses this ID to retrieve the necessary context history, simplifying client-side logic. The developer only needs to store the ID of the last response.
* **Context Window Handling:** For long conversations that exceed the model's context window, the `truncation="auto"` strategy is crucial. It tells the API to intelligently drop older messages (likely from the middle, preserving the start and end) to make space for the new input and allow the conversation to continue. Without `"auto"`, the request would fail once the limit is reached.

* **Example Two-Turn Flow:**

**Turn 1: User asks a question**
```python
response1 = client.responses.create(
model="gpt-4o",
input="What is the Responses API?"
# store=True (default)
)
print("Assistant:", response1.output_text)
# Store the ID
last_response_id = response1.id
```

**Turn 2: User asks a follow-up, providing the previous ID**
```python
response2 = client.responses.create(
model="gpt-4o",
input="How does it compare to Chat Completions?", # Only the new message
previous_response_id=last_response_id # Link to the previous turn
)
print("Assistant:", response2.output_text)
# Update the ID for the next turn
last_response_id = response2.id
```
* **Interaction with `instructions`:** If you provide `instructions` in Turn 2 along with `previous_response_id`, those *new* instructions will be used, replacing any instructions implicitly associated with `response1`.

**5. Tool Use Cycle (Developer Responsibility)**

Based on the API reference (which accepts `tool_result` items in the `input` array) and standard function-calling patterns, the cycle works as follows:

1. **Request:** Developer sends input to `/v1/responses`, specifying available `tools`.
2. **Tool Call Response:** If the model decides to use a custom function, the API response (`status: "completed"` or during streaming) will include a `tool_call` item in the `output` array (e.g., `{"type": "tool_call", "id": "call_abc", "function": {"name": "update_crm", "arguments": "{\"id\": 123}"}}`).
3. **Execution:** The developer's application receives this response, parses the `tool_call` item, identifies the function (`update_crm`) and its arguments (`{"id": 123}`), and executes the corresponding Python function/external API call.
4. **Result Submission:** The developer constructs a *new* request to `/v1/responses`.
* Sets `previous_response_id` to the ID of the response received in step 2.
* Includes a new item in the `input` array representing the tool's result:
```json
"input": [
{
"role": "tool",
"content": [
{"type": "tool_result", "tool_call_id": "call_abc", "result": "{\"success\": true, \"record_updated\": true}"}
]
}
]
```
5. **Continuation:** The model receives the tool result and generates the next part of the response (e.g., confirming the CRM update to the user).

*Note:* While the documentation *promises* a single API call can handle multiple tools/turns, this likely refers primarily to *built-in* tools (Web Search, File Search) or multiple internal reasoning steps the model takes *before* requiring developer intervention for a custom tool call. The fundamental loop for *custom* function calls still appears developer-managed, albeit facilitated by the `previous_response_id` mechanism. The Agents SDK further abstracts this loop.
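
The sketch below makes that loop concrete in Python, using the request/response shapes described in this section (exact field names may differ in the live API); `local_functions` is a hypothetical mapping from tool names to local Python implementations.

```python
import json

def run_tool_loop(client, model, user_input, tools, local_functions):
    """Drive one conversational turn, executing any custom function calls the model requests."""
    response = client.responses.create(model=model, input=user_input, tools=tools)

    while True:
        # Collect any custom function calls requested in this round
        tool_calls = [item for item in response.output if item.type == "tool_call"]
        if not tool_calls:
            return response  # No more tool work; the final answer is in response.output_text

        tool_results = []
        for call in tool_calls:
            fn = local_functions[call.function.name]      # Look up the local implementation
            args = json.loads(call.function.arguments)    # Arguments arrive as a JSON string
            result = fn(**args)                           # Execute the real work
            tool_results.append({
                "type": "tool_result",
                "tool_call_id": call.id,
                "result": json.dumps(result),
            })

        # Feed the results back, linking to the previous response for context
        response = client.responses.create(
            model=model,
            previous_response_id=response.id,
            input=[{"role": "tool", "content": tool_results}],
            tools=tools,
        )
```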

**6. Streaming with Server-Sent Events (SSE)**

Setting `stream=true` fundamentally changes the response format from a single JSON object to a stream of SSE events.

* **Purpose:** Enables building responsive UIs by displaying text, tool usage indications, and other events as they happen, rather than waiting for the entire response generation to complete.
* **Mechanism:** The client maintains an open connection, and the server pushes events as chunks of data become available. Each event has an `event:` type name and a `data:` field containing a JSON payload. The stream terminates with `event: done\ndata: [DONE]\n\n`.
* **Key Event Types (Synthesized from API Reference):**
* `response.created`: Signals the start, provides the initial Response object (`status: "in_progress"`).
* `response.in_progress`: Can be emitted periodically while processing (status still `in_progress`).
* `response.output_item.added`: Indicates a new item (message, tool call) is starting in the `output` array.
* `response.content_part.added`: Indicates a new content part (like `output_text`) is starting within an item.
* `response.output_text.delta`: **Crucial for streaming text.** Contains a small chunk (`delta`) of the assistant's message text. Multiple deltas are concatenated by the client.
* `response.output_text.annotation.added`: Signals an annotation (like a citation) has been added to the text.
* `response.function_call_arguments.delta`: Streams the arguments for a function call token by token (as a JSON string).
* Tool-Specific Events: Indicate progress of built-in tools (e.g., `response.web_search_call.searching`, `response.file_search_call.completed`).
* `response.output_item.done`, `response.content_part.done`, `response.output_text.done`: Mark the completion of specific parts of the response structure.
* `response.completed`: Final event indicating successful completion, contains the full Response object (often without large data like image URLs unless requested via `include`).
* `response.failed`, `response.incomplete`: Indicate non-successful terminal states.
* `error`: Signals an error during streaming.

* **Handling Example (Conceptual Python):**
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_response():
    stream = await client.responses.create(
        model="gpt-4o",
        input="Tell me a story about a brave knight.",
        stream=True
    )
    final_text = ""
    async for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)  # Print text as it arrives
            final_text += event.delta
        elif event.type == "response.completed":
            print("\n\n--- Stream Complete ---")
            # Full Response object available as event.response
            # print("Usage:", event.response.usage)
        elif event.type == "error":
            print(f"\nError during stream: {event}")
            break
        # Handle other events (tool calls, annotations) as needed

# asyncio.run(stream_response())
```
* **Analysis:** The streaming implementation appears more granular than Chat Completions streaming (which primarily sends `choices[0].delta.content`). This richer event set (`output_item.added`, `content_part.added`, tool events) provides developers with more detailed insight into the generation process, potentially enabling more sophisticated UI updates during the run.

**7. The Response Object**

The final (or streamed `response.completed`) object contains comprehensive details about the interaction:

* `id`: Unique ID for *this* response, used for `previous_response_id` in the next turn.
* `status`: Terminal state (`completed`, `failed`, `incomplete`).
* `output`: Array of items generated (messages, tool calls).
* `usage`: Detailed token counts (`input_tokens`, `output_tokens`, `total_tokens`, and breakdowns like `reasoning_tokens`, `cached_tokens`). Crucial for cost tracking (see the sketch after this list).
* `model`: The specific model snapshot used (e.g., `gpt-4o-2024-08-06`).
* `error`: Details if `status` is `failed`.
* `incomplete_details`: Reason if `status` is `incomplete` (e.g., `max_tokens`).
* Other request parameters reflected (e.g., `temperature`, `tools`, `instructions`).
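
A quick sketch of pulling the fields most applications need (the response `id` for threading and `usage` for cost tracking), assuming the `client` object from earlier examples:

```python
response = client.responses.create(model="gpt-4o", input="Hello!")

print("id:", response.id)          # pass as previous_response_id on the next turn
print("status:", response.status)  # completed / failed / incomplete

usage = response.usage
print("tokens:", usage.input_tokens, usage.output_tokens, usage.total_tokens)
```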

**8. Synthesis and Analysis (Part 4):**

The Responses API emerges as a carefully designed evolution, specifically tailored for the demands of agentic workflows. It provides:

* **Unified Interface:** Consolidates text generation, image understanding, tool usage (built-in and custom), state management, and streaming under a single endpoint.
* **Simplified State:** The `previous_response_id` mechanism offers a simpler way to manage conversation history compared to manually constructing message arrays for Chat Completions, without the full object model complexity of Assistants API Threads (though similar concepts are planned).
* **Integrated Tools:** Direct support for Web Search, File Search, and Computer Use drastically reduces integration effort for common agent needs.
* **Enhanced Streaming:** More granular SSE events offer better real-time feedback potential compared to basic Chat Completions streaming.
* **Flexibility:** Supports various input types (text, images, history, tool results) and output formats (text, JSON).
* **Observability Foundation:** The optional `store=true` and detailed response/usage objects facilitate integration with tracing and evaluation tools.
* **Clear Evolution Path:** Explicitly positioned as the successor to both Chat Completions (for new integrations) and the Assistants API (long-term).

While the *promise* of a single API call solving complex multi-tool, multi-turn tasks might initially suggest full automation, the current documentation points towards the API facilitating this loop for the developer (especially for custom tools), with the Agents SDK providing the primary client-side abstraction for managing it. The API provides the robust *primitives* and *state-linking mechanism* (`previous_response_id`), and the SDK provides the *orchestration*. This division of labor allows the API to remain relatively clean while the SDK handles the more complex flow control logic.

---

**The Need for Orchestration: Why the SDK Exists**

While the Responses API provides the engine for model interaction and tool use, building sophisticated agents requires managing the *flow* of these interactions. This involves:

1. **Decision Making:** Determining when to call a tool, which tool to call, when to hand off to another agent, or when to simply respond to the user.
2. **State Management:** Keeping track of the conversation history, tool call results, and the current step in a multi-step process.
3. **Tool Execution:** Calling the appropriate functions or services when the model requests a tool use and formatting the results correctly for the model.
4. **Control Flow:** Implementing loops (e.g., model calls tool -> get result -> model processes result -> repeat), conditional logic (e.g., route based on classification), and potentially parallel execution.
5. **Safety and Validation:** Applying checks and balances (guardrails) at appropriate points in the workflow.
6. **Debugging and Monitoring:** Understanding the sequence of events, inputs, and outputs when diagnosing issues or optimizing performance.

Implementing all this manually can become incredibly complex and error-prone. The Agents SDK aims to provide a structured, Pythonic way to handle this orchestration, significantly reducing boilerplate code and promoting best practices.

**Core Design Principles of the Agents SDK**

The SDK's design is driven by principles aimed at developer productivity and flexibility:

1. **Simplicity and Minimal Abstraction:** It focuses on a small set of core concepts (`Agent`, `Handoff`, `Guardrail`, `Runner`) rather than introducing numerous complex layers. The goal is a fast learning curve.
2. **Python-First:** Leverages native Python features like classes, functions, decorators (`@function_tool`, `@input_guardrail`), and async/await for defining agents and control flow. This makes it intuitive for Python developers.
3. **Extensibility and Customization:** While offering sensible defaults and built-in loops, it allows developers to customize behavior through hooks, dynamic functions (for instructions), context objects, and custom tool/model implementations.
4. **Integration with OpenAI Platform:** Designed to work seamlessly with OpenAI APIs (Responses, Chat Completions) and platform features like built-in tools, tracing, evaluations, and fine-tuning.

**Dissecting the SDK Primitives**

Let's delve deeper into the SDK's main components:

**1. `agents.Agent`:** The Intelligent Actor

This class is the central representation of an agent within the SDK. It's more than just a model configuration; it's an entity with instructions, capabilities (tools), and potential delegation paths (handoffs).

* **Key Configuration Parameters (SDK perspective):**
* `name` (str): Essential for identification in logs, traces, and default handoff tool naming.
* `instructions` (str | Callable): The core guidance. Using a callable function `(RunContextWrapper[TContext], Agent[TContext]) -> MaybeAwaitable[str]` allows dynamic system prompts based on runtime context (e.g., user ID, time of day).
```python
from agents import Agent, RunContextWrapper
from dataclasses import dataclass

@dataclass
class ChatContext:
user_name: str
user_tier: str = "free"

def dynamic_prompt(ctx: RunContextWrapper[ChatContext], agent: Agent) -> str:
base = f"You are a helpful assistant chatting with {ctx.context.user_name}. "
if ctx.context.user_tier == "pro":
base += "They are a pro user, prioritize their requests."
else:
base += "Offer concise help."
return base

pro_agent = Agent[ChatContext]( # Specify context type
name="Pro Support",
instructions=dynamic_prompt,
model="gpt-4o"
)
```
* `model` (str | Model | None): Can be a model name string (resolved by the `ModelProvider`), or a concrete `Model` instance (like `OpenAIResponsesModel` or a custom one). This allows mixing different models or even providers across agents in a workflow.
* `model_settings` (ModelSettings): Configures API parameters like `temperature`, `tool_choice`, `truncation`, etc., specific to this agent's calls.
* `tools` (list[Tool]): Defines the agent's capabilities beyond text generation. Includes `FunctionTool` (from `@function_tool` or manual creation), `WebSearchTool`, `FileSearchTool`, `ComputerTool`.
* `handoffs` (list[Agent | Handoff]): Specifies potential agents this agent can delegate to. The SDK automatically creates corresponding `transfer_to_<agent_name>` tools.
* `output_type` (type | AgentOutputSchemaBase | None): Crucial for structured data. Tells the SDK (and underlying model via `response_format`) the expected output shape (Pydantic models, dataclasses, etc.). Enables automatic validation.
```python
from pydantic import BaseModel

class ExtractedInfo(BaseModel):
name: str
email: str | None
company: str | None

extractor_agent = Agent(
name="Info Extractor",
instructions="Extract name, email, and company from the text. Respond ONLY with JSON.",
model="gpt-4o",
output_type=ExtractedInfo # Ensures output is validated against this Pydantic model
)
```
* `input_guardrails` / `output_guardrails` (list[InputGuardrail | OutputGuardrail]): Attaches guardrail checks.
* `hooks` (AgentHooks | None): Agent-specific lifecycle callbacks.
* `context` Type (`Agent[TContext]`): Using generics allows type safety when accessing the `context` object within associated functions (tools, hooks, dynamic instructions).

* **Cloning (`agent.clone(...)`):** Easily create variations of an agent, modifying only specific parameters. Useful for creating slightly different personas or configurations without redefining everything.
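
For example, a sketch of cloning the `pro_agent` from the dynamic-instructions example above into a slightly different persona (the override values are illustrative):

```python
# Create a variation of an existing agent, overriding only what changes
concise_agent = pro_agent.clone(
    name="Concise Pro Support",
    instructions="Answer in at most two sentences.",
)
```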

**2. Handoffs: Structured Delegation**

Handoffs are the SDK's mechanism for controlled agent-to-agent delegation.

* **SDK Implementation:** Represented internally as a specialized `FunctionTool` named `transfer_to_<agent_name>` (customizable). When this tool is invoked by the LLM, the `Runner` intercepts it and transitions control.
* **`handoff()` Helper:** The preferred way to configure handoffs within the `Agent.handoffs` list. It wraps the target `Agent` and allows customization:
* `input_type`: Define arguments the source agent must provide (extracted by the LLM).
* `on_handoff` Callback: Execute logic *during* the handoff transition.
* `input_filter`: Modify the conversation history the target agent sees. This is powerful for managing context length and relevance.
```python
from agents import Agent, Handoff, handoff, HandoffInputData
from agents.extensions.handoff_filters import keep_last_n_items

def filter_context_for_summary(handoff_data: HandoffInputData) -> HandoffInputData:
# Keep original input + only the last 4 generated items for the summarizer
filtered_new = keep_last_n_items(4)(handoff_data.new_items)
# Return a new data object for the target agent
return HandoffInputData(
input_history=handoff_data.input_history,
pre_handoff_items=handoff_data.pre_handoff_items,
new_items=filtered_new
)

research_agent = Agent(name="Researcher", ...)
summarizer_agent = Agent(name="Summarizer", ...)

research_agent.handoffs = [
handoff(
agent=summarizer_agent,
input_filter=filter_context_for_summary # Apply the custom filter
)
]
```

**3. Guardrails: Concurrent Validation and Control**

Guardrails provide a mechanism for running validation checks in parallel with an agent's main work, so unsafe or off-topic requests can be caught early (a minimal sketch follows the bullets below).

* **SDK Implementation:** Use the `@input_guardrail` and `@output_guardrail` decorators on async functions. These functions perform the check (often by running another, simpler/faster `Agent`) and return `GuardrailFunctionOutput`.
* **Tripwire Mechanism:** If `tripwire_triggered` is `True` in the output, the `Runner` raises an exception (`InputGuardrailTripwireTriggered` or `OutputGuardrailTripwireTriggered`), halting the main agent flow. This allows for early exit based on validation failures.
* **Concurrency:** The SDK likely runs guardrail functions concurrently with the main agent's initial processing or final output generation step, optimizing for latency.
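
A minimal sketch of an input guardrail built from these pieces, where a small classifier agent performs the check (the `TopicCheck` model and agent names are illustrative):

```python
from pydantic import BaseModel
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    Runner,
    input_guardrail,
)

class TopicCheck(BaseModel):
    is_off_topic: bool
    reasoning: str

# A small, fast agent used purely for the guardrail check
topic_check_agent = Agent(
    name="Topic Checker",
    instructions="Decide whether the user's message is unrelated to customer support.",
    model="gpt-4o-mini",
    output_type=TopicCheck,
)

@input_guardrail
async def off_topic_guardrail(
    ctx: RunContextWrapper, agent: Agent, user_input
) -> GuardrailFunctionOutput:
    result = await Runner.run(topic_check_agent, user_input, context=ctx.context)
    verdict = result.final_output_as(TopicCheck)
    return GuardrailFunctionOutput(
        output_info=verdict,                      # surfaced in results and traces
        tripwire_triggered=verdict.is_off_topic,  # True halts the main agent run
    )

# support_agent = Agent(
#     name="Support Agent",
#     instructions="Help customers with their support questions.",
#     input_guardrails=[off_topic_guardrail],
# )
```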

**4. Function Tools (`@function_tool`)**

The `@function_tool` decorator is a key DX feature for integrating custom logic.

* **Automatic Schema Generation:** It inspects the decorated Python function's signature (type hints, argument names, default values) and docstring (for descriptions) to automatically generate the required JSON schema for the `tools` parameter in the API call. It uses `inspect`, `griffe`, and `pydantic`.
* **Context Access:** If the *first* argument of the decorated function is type-hinted as `RunContextWrapper[TContext]`, the SDK will automatically inject the current run context, allowing the tool access to shared state or dependencies.
* **Error Handling (`failure_error_function`):** Provides control over how tool execution errors are reported back to the LLM (return a custom message, return a default message, or raise the exception).
```python
from agents import Agent, function_tool, RunContextWrapper, UserError
import requests

class APIContext:
def __init__(self, api_endpoint: str):
self.api_endpoint = api_endpoint

def handle_api_error(ctx: RunContextWrapper[APIContext], error: Exception) -> str:
# Custom error message for the LLM
return f"Failed to call external API: {type(error).__name__}. Please try again later or ask the user for clarification."

@function_tool(failure_error_function=handle_api_error)
def call_external_system(ctx: RunContextWrapper[APIContext], payload: dict) -> dict:
"""Calls our external processing system."""
try:
response = requests.post(f"{ctx.context.api_endpoint}/process", json=payload, timeout=10)
response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx)
return response.json()
except requests.exceptions.RequestException as e:
print(f"API Call failed: {e}")
# Let the custom error handler generate the message for the LLM
raise UserError(f"Network error calling external system: {e}") from e
except Exception as e:
print(f"Unexpected error in tool: {e}")
# Let the custom error handler generate the message for the LLM
raise UserError(f"Unexpected tool error: {e}") from e

# Usage:
# my_context = APIContext(api_endpoint="https://my-api.com")
# api_agent = Agent[APIContext](..., tools=[call_external_system])
# result = await Runner.run(api_agent, "Process data {...}", context=my_context)
```

**5. Hosted Tools (`WebSearchTool`, `FileSearchTool`, `ComputerTool`)**

The SDK provides specific classes to represent OpenAI's built-in tools. Using these classes signals to the `OpenAIResponsesModel` to configure the corresponding tool type in the underlying API call.

```python
from agents import Agent, WebSearchTool, FileSearchTool, ComputerTool
# LocalPlaywrightComputer is a reference `Computer` implementation from the SDK's
# computer-use example code, not an export of the `agents` package itself.

vs_id = "vs_..."  # Assume this Vector Store already exists
# computer_impl = LocalPlaywrightComputer()  # Needs Playwright setup; see the SDK example

agent_with_hosted_tools = Agent(
name="Capable Assistant",
instructions="Use web search for current events, file search for internal docs, and computer use for browser tasks.",
model="gpt-4o", # Requires Responses API compatible model
tools=[
WebSearchTool(user_location={"type": "approximate", "city": "San Francisco"}),
FileSearchTool(vector_store_ids=[vs_id], max_num_results=5),
# ComputerTool(computer=computer_impl) # Requires suitable model & tier
]
)
```

**6. `agents.Runner`: Executing and Managing Runs**

The `Runner` orchestrates the execution flow.

* **Entry Points:** `run`, `run_sync`, `run_streamed`.
* **Internal Loop:** Manages the interaction cycle: Call LLM -> Process Output -> Execute Tool/Handoff -> Update State -> Repeat. It abstracts away the details of constructing API requests, handling `previous_response_id`, parsing tool calls, and invoking tools/handoffs.
* **Turn vs. Conversation:** A single `Runner.run()` call typically represents one *turn* in the conversation from the user's perspective, even if it involves multiple internal agent steps, tool calls, or handoffs. The developer manages the overall conversation flow by feeding the output of one turn (`result.to_input_list()`) plus the new user input into the next `Runner.run()` call (sketched after this list).
* **`RunConfig`:** Allows overriding agent settings globally for a run (e.g., forcing a specific model, disabling tracing, setting shared guardrails).
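
A minimal sketch of chaining two conversational turns with `to_input_list()` (the agent shown is illustrative):

```python
from agents import Agent, Runner

assistant = Agent(
    name="Assistant",
    instructions="Answer questions concisely.",
    model="gpt-4o-mini",
)

async def two_turns():
    # Turn 1
    result1 = await Runner.run(assistant, input="What is the capital of France?")
    print(result1.final_output)

    # Turn 2: previous history plus the new user message
    next_input = result1.to_input_list() + [
        {"role": "user", "content": "And what is its population?"}
    ]
    result2 = await Runner.run(assistant, input=next_input)
    print(result2.final_output)
```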

**7. Results (`RunResult`, `RunResultStreaming`)**

These objects capture the outcome of a run.

* `final_output`: The concluding message or structured object from the last agent. Needs type checking/casting (`final_output_as`) if `output_type` was used.
* `new_items` (list[RunItem]): A structured log of everything that happened *during* the run (messages generated, tools called, outputs received, handoffs). Essential for debugging and understanding the agent's process.
* `to_input_list()`: Helper to format the entire history (original input + new items) for the *next* conversational turn.
* `last_agent`: Reference to the agent that produced the final output.
* `RunResultStreaming` adds the `stream_events()` async iterator for real-time updates.

**8. Streaming in the SDK**

`Runner.run_streamed()` provides two types of events via `stream_events()` (a consumption sketch follows the list):

* `RawResponsesStreamEvent`: Low-level events directly from the underlying API stream (e.g., `response.output_text.delta`). Useful for token-by-token text streaming to the UI.
* `RunItemStreamEvent` / `AgentUpdatedStreamEvent`: Higher-level SDK events indicating semantic milestones (tool called, message completed, handoff occurred). Useful for structured progress updates.
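
A sketch of consuming both kinds of events, assuming the raw event payloads follow the `openai` package's Responses stream types:

```python
from agents import Agent, Runner
from openai.types.responses import ResponseTextDeltaEvent

storyteller = Agent(
    name="Storyteller",
    instructions="Tell very short stories.",
    model="gpt-4o-mini",
)

async def stream_story():
    result = Runner.run_streamed(storyteller, input="Tell me a two-sentence story.")
    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            # Token-by-token text from the underlying API stream
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event":
            # Higher-level milestone: a message finished, a tool was called, etc.
            print(f"\n[item: {event.item.type}]")
        elif event.type == "agent_updated_stream_event":
            print(f"\n[active agent is now {event.new_agent.name}]")
    print(f"\nFinal output: {result.final_output}")
```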

**9. Configuration and Context**

* **Global Config:** Functions like `set_default_openai_key`, `set_default_openai_client`, `set_default_openai_api` configure defaults used by the SDK if not overridden at the Agent or RunConfig level (see the sketch after this list).
* **Run Context (`RunContextWrapper`):** The mechanism for injecting application-specific state and dependencies into tools, hooks, etc., without polluting the LLM's context window.
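
For example, a sketch of setting process-wide defaults (the key value is a placeholder; normally it is read from the environment):

```python
from agents import set_default_openai_api, set_default_openai_key

set_default_openai_key("sk-placeholder")    # placeholder key for illustration only
set_default_openai_api("chat_completions")  # opt out of the default Responses API if needed
```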

**10. Analysis & Synthesis (Part 5):**

The Agents SDK acts as a crucial **application framework** built upon the foundation of the Responses API (or Chat Completions). Its primary value lies in **abstraction and structure**. It takes the powerful but potentially complex primitives of the underlying API (tool calls, state management, streaming) and provides higher-level, Pythonic constructs (`Agent`, `Handoff`, `@function_tool`, `Runner`) that simplify common agentic patterns.

* **Reduces Boilerplate:** Manages the internal loop, API request formatting, response parsing, tool dispatch, and state updates within a single turn.
* **Promotes Modularity:** Encourages breaking down complex tasks using specialized agents and controlled delegation via Handoffs.
* **Enhances Safety:** Integrates Guardrails as a first-class concept for concurrent validation.
* **Improves Developer Experience:** Automatic schema generation for tools, typed context, lifecycle hooks, and integrated tracing streamline development and debugging.
* **Flexibility:** Supports different models, providers (via interfaces/integrations like LiteLLM), and customization points.

While the Responses API provides the *what* (model interaction, built-in tools), the Agents SDK provides the *how* (defining agents, orchestrating their interactions, managing the workflow). It transforms the process from manually stitching together API calls to defining structured components within a cohesive framework, making the development of complex, multi-step, multi-agent applications significantly more manageable and robust. The common agentic patterns outlined in the documentation (deterministic flows, routing, agents-as-tools, LLM-as-judge, parallelization) are all facilitated or directly implemented using the SDK's primitives.

---

Let's now turn to the **Built-in Tools** provided by OpenAI and examine how they function within the **Responses API** and how the **Agents SDK** facilitates their use in building sophisticated agent applications. These tools represent a crucial step in empowering agents to interact dynamically with information and environments beyond their static training data.

**The Imperative for Tools: Grounding Agents in Reality**

LLMs, despite their impressive generative and reasoning abilities, operate within a bubble defined by their training data and the immediate conversation context. To perform meaningful, real-world tasks, agents need "senses" to perceive external information and "hands" to interact with external systems. Tools provide these interfaces. OpenAI's strategy of offering powerful *built-in* tools directly integrated with the Responses API and Agents SDK is significant because it:

1. **Lowers Development Barriers:** Reduces the need for developers to integrate and manage third-party services or build complex custom solutions for common tasks like web search or document retrieval.
2. **Ensures Tight Integration:** Allows for seamless interaction between the model's reasoning capabilities and the tool's functionality, potentially leading to better performance and reliability than loosely coupled external tools.
3. **Leverages Platform Synergies:** Integrates with other OpenAI platform features like Vector Stores, Files API, and observability tools (Tracing).
4. **Provides Curated Capabilities:** Offers optimized, model-aware implementations (e.g., search models fine-tuned for Q&A).

The initial suite focuses on accessing dynamic web information (Web Search), retrieving knowledge from specific documents (File Search), and interacting with graphical interfaces (Computer Use).

**1. Web Search (`web_search_preview`): The Agent's Window to the Live Web**

This tool directly addresses the LLM's limitation of static knowledge by connecting it to the vast, real-time information available on the internet.

* **Core Purpose:** Equip agents with the ability to find current information, answer questions about recent events, and gather data that post-dates their training cut-off. The inclusion of citations is a key feature, promoting transparency and verifiability.
* **Implementation via Responses API:** Enabled by including `{"type": "web_search_preview"}` in the `tools` array of a `POST /v1/responses` request. It's compatible with `gpt-4o` and `gpt-4o-mini`. The API call itself triggers the search and synthesis process.
```json
// Request Body Snippet for Responses API
{
"model": "gpt-4o",
"input": "Summarize the latest advancements in quantum computing reported this month.",
"tools": [
{"type": "web_search_preview"}
]
}
```
* **Implementation via Agents SDK:** The `agents.WebSearchTool` class provides a convenient way to configure and include this tool in an `Agent` definition.
```python
from agents import Agent, WebSearchTool, Runner

# Configure the tool (optional location awareness)
web_search = WebSearchTool(
user_location={"type": "approximate", "city": "London", "country": "UK"}
# search_context_size: "low" | "medium" | "high" = "medium" # Controls depth/breadth
)

research_agent = Agent(
name="Current Events Researcher",
instructions="You are an AI assistant that uses web search to answer questions about recent events.",
model="gpt-4o", # Must use a model compatible with the tool
tools=[web_search]
)

async def run_search():
result = await Runner.run(
research_agent,
input="Are there any major tube strikes planned in London next week?"
)
print(result.final_output)
# Example Output: "Transport for London has confirmed there are no major tube strikes planned for next week, although minor delays on the Central line are possible due to ongoing signal upgrades [Source: tfl.gov.uk]."
```
* **Underlying Mechanism & Performance:** Powered by OpenAI's internal search infrastructure, likely involving query generation, web crawling/indexing, result ranking, snippet extraction, and synthesis by the LLM. The strong SimpleQA benchmark scores (90% for GPT-4o search, 88% for mini) suggest high accuracy for factual queries compared to non-search-enabled models on the same benchmark.
* **Citations and Output:** The Responses API returns the synthesized answer within the `output[].content[].text` field. Citations are embedded as `annotations` within that text object, linking specific parts of the generated text back to source URLs. The Agents SDK's `RunResult.new_items` might contain specific `RunItem` types related to the web search call and its results, providing structured access if needed beyond the synthesized text.
* **Flexibility and Control:**
* **Direct Model Access:** The `gpt-4o-search-preview` and `gpt-4o-mini-search-preview` models in the Chat Completions API offer an alternative integration path.
* **Publisher Opt-Out:** Content owners retain control over whether their sites are included via standard web protocols (likely robots.txt or specific meta tags, detailed in the linked `appear` docs).
* **Use Cases:** Essential for news summaries, checking real-time status (weather, stocks, travel), product research, competitive analysis, fact-checking, and any application where static knowledge is insufficient. Hebbia's use case for real-time market intelligence in finance/legal highlights its enterprise value.
* **Pricing:** Using the *tool* via Responses API incurs standard token costs plus a likely per-search tool fee (verify on the pricing page). Using the *dedicated search models* via Chat Completions has distinct per-query pricing ($30/k for 4o, $25/k for 4o-mini).
* **Analysis:** Web Search integration is a game-changer for agent utility. By handling the complexity internally and providing citations, OpenAI makes reliable, up-to-date information access much simpler for developers. The dual access (Responses tool vs. Chat Completions models) offers flexibility but reinforces the Responses API as the integrated path for agent building. The citation mechanism is crucial for responsible AI deployment.

**2. File Search (`file_search`): Giving Agents Access to Your Knowledge**

This tool empowers agents with Retrieval-Augmented Generation (RAG) capabilities, allowing them to query and synthesize information from specific document sets provided by the developer.

* **Core Purpose:** To ground agent responses in proprietary, domain-specific, or user-specific information contained within files, enabling accurate answers based on provided documents rather than just the model's general knowledge. It provides persistent, searchable knowledge bases.
* **Prerequisites: Files and Vector Stores:**
1. **Upload Files:** Use the standard Files API (`POST /v1/files` with `purpose="assistants"` or potentially a newer purpose like `"vector_store"`) to upload your documents (PDF, DOCX, TXT, etc.).
2. **Create Vector Store:** Use the Vector Stores API (`POST /v1/vector_stores`) to create a logical container and associate uploaded file IDs with it. OpenAI handles the backend processing (chunking, embedding) to make the files searchable. Vector Stores are persistent resources.
```python
from openai import OpenAI
client = OpenAI()

# Assume file_report_q1, file_policy_v2 exist (File objects)
try:
knowledge_vs = client.vector_stores.create(
name="Company Knowledge Base",
file_ids=[file_report_q1.id, file_policy_v2.id],
# Optional: Define how files are chunked
chunking_strategy={
"type": "static",
"static": {"max_chunk_size_tokens": 500, "chunk_overlap_tokens": 100}
},
# Optional: Set expiration
expires_after={"anchor": "last_active_at", "days": 30}
)
print(f"Created Vector Store: {knowledge_vs.id}")
# Need to poll/wait for file processing completion status
# client.vector_stores.files.list(vector_store_id=knowledge_vs.id) -> check status
except Exception as e:
print(f"Error creating Vector Store: {e}")
```
* **Implementation via Responses API:** Enabled via the `tools` array, specifying `type: "file_search"` and the `vector_store_ids` to target.
```json
// Request Body Snippet for Responses API
{
"model": "gpt-4o",
"input": "What was the Q1 revenue mentioned in the report?",
"tools": [
{
"type": "file_search",
"vector_store_ids": ["vs_abc123"] // ID of the Vector Store created above
}
],
// Optional: Request raw search results alongside the synthesized answer
"include": ["file_search_call.results"]
}
```
* **Implementation via Agents SDK:** The `agents.FileSearchTool` class configures the tool within an `Agent`.
```python
from agents import Agent, FileSearchTool, Runner

vs_id = "vs_abc123" # ID of the pre-created Vector Store

# Configure the File Search tool
file_search = FileSearchTool(
vector_store_ids=[vs_id],
max_num_results=5, # Limit number of retrieved chunks
# Optional: Add metadata filters
# filters={"file_type": "pdf", "min_date": "2024-01-01"},
# Optional: Get raw results for inspection/logging
include_search_results=True
)

hr_policy_agent = Agent(
name="HR Policy Assistant",
instructions="Answer employee questions based ONLY on the provided company policy documents.",
model="gpt-4o",
tools=[file_search]
)

async def run_policy_query():
result = await Runner.run(
hr_policy_agent,
input="What is the company policy on remote work?"
)
print(result.final_output)

# Inspect raw results if include_search_results=True
for item in result.new_items:
if item.type == 'tool_call_output_item' and item.raw_item.type == 'file_search':
# Access item.raw_item.file_search.results here (structure defined in API ref)
# Contains list of {file_id, filename, score, attributes, content:[{type:'text', text:'...'}]}
pass
```
* **Underlying Mechanism & Performance:** OpenAI manages the complex RAG pipeline: document parsing, chunking (automatic or static), embedding generation, indexing into a vector database, query embedding, similarity search, and reranking. The promise of built-in optimization removes a significant burden from developers. The ability to directly query the vector store (`POST /v1/vector_stores/{id}/search`) offers additional flexibility.
* **Output & Citations:** Similar to web search, the model synthesizes an answer based on retrieved chunks, presented in `output_text`. `file_citation` annotations link parts of the answer back to the source `file_id` and potentially specific chunks/locations within the file. Setting `include=["file_search_call.results"]` in the API or `include_search_results=True` in the SDK tool provides access to the raw retrieved chunks (text, file source, metadata, score) for logging, custom processing, or display.
* **Use Cases:** Critical for internal knowledge bots (HR, IT support, sales enablement), analyzing reports, summarizing user-specific documents, providing context from technical manuals, legal document review support (e.g., Navan's travel policy agent). The ability to create distinct Vector Stores per user/group enables highly personalized agent experiences.
* **Pricing:** Involves a per-query fee ($2.50/k) plus ongoing storage costs ($0.10/GB/day, 1GB free). This structure encourages efficient querying and mindful management of Vector Store lifecycles, especially for large or numerous stores.
* **Analysis:** File Search democratizes RAG. By abstracting the infrastructure, OpenAI makes it feasible for many developers to build agents grounded in specific data without deep expertise in vector databases or embedding models. The integration within Responses/Agents SDK is seamless. The pricing model reflects both the compute for search and the value of persistent storage. The direct search endpoint adds further value, making Vector Stores a reusable platform component.

**3. Computer Use (`computer_use_preview`): The Agent's Hands for UI Interaction**

This experimental tool represents a leap towards agents that can directly operate graphical interfaces, starting with web browsers.

* **Core Purpose:** To enable agents to automate tasks involving UI navigation, data entry, or information gathering directly from web pages or potentially desktop applications, bypassing the need for traditional APIs.
* **Prerequisites & Setup:**
* **Model:** Requires the specific `computer-use-preview` model.
* **API Parameter:** Must use `truncation="auto"`.
* **SDK Implementation (`ComputerTool`):** This requires the developer to provide an implementation of the `Computer` (sync) or `AsyncComputer` (async) interface. This interface defines methods like `screenshot()`, `click(x, y)`, `type(text)`, `scroll()`, `keypress(keys)`, etc. OpenAI provides a sample `LocalPlaywrightComputer` using the Playwright library for browser automation. **Crucially, the SDK user writes the code that *executes* the actions; the API only *generates* the intended actions.**
```python
from agents import Agent, ComputerTool, ModelSettings, Runner
# Assume LocalPlaywrightComputer is defined as in the SDK examples/docs
# Needs playwright installed: pip install playwright; playwright install

async def run_computer_agent():
async with LocalPlaywrightComputer() as computer: # Manages browser lifecycle
computer_agent = Agent(
name="Web Task Automator",
instructions="Carefully follow the user's request to interact with the browser.",
model="computer-use-preview", # Specific model required
model_settings=ModelSettings(truncation="auto"), # Required
tools=[
ComputerTool(computer=computer) # Pass the implementation
]
)

# Example task
result = await Runner.run(
computer_agent,
input="Go to openai.com, find the latest blog post title, and tell me what it is."
)
# The Runner internally handles the loop:
# 1. Agent asks CUA model what to do based on input & screenshot.
# 2. API returns actions (e.g., type URL, click link, scroll).
# 3. Runner calls the corresponding methods on `LocalPlaywrightComputer` (e.g., computer.type(...), computer.click(...)).
# 4. `LocalPlaywrightComputer` executes these using Playwright.
# 5. Agent potentially asks for another screenshot and continues until task complete.

print(f"Final Answer: {result.final_output}")
```
* **Implementation via Responses API:** Requires specifying the tool type, display dimensions, and environment. The response's `output` array will contain `computer_action` items detailing the sequence of actions the CUA model determined.
```json
// Request Body Snippet for Responses API
{
"model": "computer-use-preview",
"input": "Navigate to example.com and click the 'About' link.",
"truncation": "auto",
"tools": [
{
"type": "computer_use_preview",
"display_width": 1280,
"display_height": 800,
"environment": "browser"
}
]
}
// Response Output Snippet (Conceptual)
// "output": [
// { "type": "computer_action", "action": {"type": "type", "text": "example.com"} },
// { "type": "computer_action", "action": {"type": "keypress", "keys": ["Enter"]} },
// // ... wait, scroll, find link coordinates ...
// { "type": "computer_action", "action": {"type": "click", "x": 200, "y": 150} }
// ]
```
* **Underlying Mechanism & Performance:** Relies on the specialized CUA model trained on UI interaction data. Benchmarks show strong performance on web tasks (WebArena 58.1%, WebVoyager 87.0%) but significantly lower reliability on general OS tasks (OSWorld 38.1%). This highlights that browser automation is the more mature use case currently.
* **Safety and Responsibility:** This is the most critical aspect.
* **Execution Burden:** The developer is responsible for translating the API's action list into actual commands and executing them safely.
* **Sandboxing:** OpenAI strongly recommends executing actions in isolated environments (containers, VMs) to prevent unintended system modifications.
* **Confirmation Prompts:** Developers should implement confirmation steps for potentially destructive or sensitive actions (e.g., deleting files, submitting forms with personal data).
* **Human Oversight:** Essential due to the model's imperfect reliability, especially outside well-defined browser tasks.
* **Preview Status:** Availability limited to higher usage tiers reflects the experimental nature and potential risks.
* **Use Cases:** Automating interactions with websites lacking APIs, QA testing web applications, data entry into legacy web forms, scraping visually structured data. Luminai's and Unify's examples showcase its power in dealing with API-less systems and extracting unique signals.
* **Pricing:** Token-based ($3/M input, $12/M output), reflecting the generative nature of producing the action sequence. The execution cost (e.g., running Playwright) is borne by the developer.
* **Analysis:** Computer Use is a powerful glimpse into the future of agents as active participants in digital workflows. It has the potential to revolutionize automation where APIs are absent. However, its current limitations (reliability, especially on OS) and the significant safety burden placed on the developer make it a tool requiring extreme caution and careful implementation. The SDK's `ComputerTool` requires substantial developer effort to provide the actual execution backend (like `LocalPlaywrightComputer`). It's not a plug-and-play automation solution but rather provides the *intent* for automation, which the developer must then safely realize.

**4. Synthesis: Tools as Integrated Agent Capabilities**

The introduction of Web Search, File Search, and Computer Use as first-party, integrated tools within the Responses API and Agents SDK represents a significant step towards OpenAI's agentic vision.

* **Reduced Friction:** They drastically simplify incorporating crucial external interaction capabilities compared to building or integrating third-party solutions.
* **Enhanced Agent Intelligence:** These tools aren't just callable functions; they are deeply integrated with the model's reasoning. The model can decide *when* and *how* to use them, synthesize their results, and even chain them together (e.g., search the web, then search local files based on web findings).
* **Platform Cohesion:** They leverage and enhance other platform components like Files, Vector Stores, and specialized models (CUA, search models), creating a more unified developer experience.
* **Managed Complexity:** OpenAI handles the intricate backend infrastructure for search indexing/querying, RAG pipelines, and (in the case of Computer Use) the complex CUA model inference, allowing developers to focus on application logic.
* **Orchestration via SDK:** The Agents SDK provides the classes (`WebSearchTool`, `FileSearchTool`, `ComputerTool`) and the `Runner` logic to seamlessly integrate these tools into agent definitions and execution flows. The SDK manages requesting tool use via the API and handling the results or generated actions.

These built-in tools transform the Responses API from a mere text/image generation endpoint into a true foundation for agents that can perceive (search, see screens), access knowledge (files), and act (computer use). While Computer Use remains experimental and requires significant developer caution, Web Search and File Search offer robust, production-ready capabilities for building knowledgeable and grounded agents today.

---

Let's now explore building agentic applications with the OpenAI Agents SDK in detail, focusing on practical patterns, concrete code examples, and analysis of how the SDK simplifies complex orchestration tasks.

**Building Agentic Applications with the OpenAI Agents SDK**

The OpenAI Agents SDK serves as the primary framework for developers looking to move beyond simple request-response interactions and build applications where AI actively pursues goals, uses tools, interacts with data, and collaborates within multi-agent systems. It provides a structured, Pythonic layer on top of OpenAI's powerful models and APIs (primarily the Responses API), managing the complexities of orchestration, state, and control flow.

**1. The Foundation: Single Agent with Tools**

The most basic agentic application involves a single agent equipped with tools to augment its capabilities. The SDK makes defining and running such an agent straightforward.

* **Defining the Agent:** Use the `agents.Agent` class. Key components are `name`, `instructions` (system prompt), `model`, and `tools`.
* **Defining Tools:** Use the `@agents.function_tool` decorator on standard Python functions. The SDK automatically handles:
* Extracting the function name for the tool name.
* Using the docstring for the tool description.
* Parsing type hints and arguments (including default values) to generate the JSON schema required by the API.
* Parsing argument descriptions from standard docstring formats (google, sphinx, numpy).
* **Execution:** Use `agents.Runner` to execute the agent with user input.

* **Example: Currency Conversion Agent**

Let's create an agent that can convert currencies using a (mock) function tool.

```python
import asyncio
from agents import Agent, Runner, function_tool, RunContextWrapper
from pydantic import BaseModel
import random # For mock exchange rate

# --- Tool Definition ---
class ConversionResult(BaseModel):
from_currency: str
to_currency: str
amount_from: float
amount_to: float
rate: float

# Use @function_tool to expose this Python function as a tool to the LLM
@function_tool
def get_conversion_rate(from_currency: str, to_currency: str, amount: float) -> ConversionResult:
"""
Converts an amount from one currency to another using a mock exchange rate.

Args:
from_currency: The currency code to convert from (e.g., 'USD').
to_currency: The currency code to convert to (e.g., 'EUR').
amount: The amount of the 'from_currency' to convert.
"""
print(f"[Tool Executing] get_conversion_rate(from={from_currency}, to={to_currency}, amount={amount})")
# In a real scenario, this would call an external exchange rate API
mock_rates = {
("USD", "EUR"): 0.93,
("EUR", "USD"): 1.08,
("USD", "GBP"): 0.81,
("GBP", "USD"): 1.24,
("EUR", "GBP"): 0.87,
("GBP", "EUR"): 1.15,
}
rate = mock_rates.get((from_currency.upper(), to_currency.upper()), None)

if rate is None:
# Add some randomness for pairs not explicitly defined
if from_currency.upper() == to_currency.upper():
rate = 1.0
else:
rate = random.uniform(0.5, 1.5)
print(f"[Tool Warning] Using random rate {rate} for {from_currency}->{to_currency}")

amount_to = amount * rate
return ConversionResult(
from_currency=from_currency.upper(),
to_currency=to_currency.upper(),
amount_from=amount,
amount_to=round(amount_to, 2),
rate=round(rate, 4)
)

# --- Agent Definition ---
currency_agent = Agent(
name="Currency Converter Agent",
instructions="You are a helpful currency conversion assistant. Use the available tool to perform conversions when requested. Clearly state the original amount, the converted amount, and the rate used.",
model="gpt-4o-mini", # Suitable for tool use
tools=[get_conversion_rate] # Make the tool available
)

# --- Execution ---
async def main():
user_query = "Can you convert 100 USD to EUR?"
print(f"User Query: {user_query}")

# Runner manages the interaction with the agent and its tools
result = await Runner.run(currency_agent, input=user_query)

print("\n--- Agent Run Details ---")
print(f"Final Output:\n{result.final_output}")

print("\nItems generated during run:")
for item in result.new_items:
print(f"- Type: {item.type}, Agent: {item.agent.name}")
# You can inspect item.raw_item for the detailed API object
if item.type == 'tool_call_item':
print(f" Tool Called: {item.raw_item.function.name}")
print(f" Arguments: {item.raw_item.function.arguments}")
elif item.type == 'tool_call_output_item':
print(f" Tool Result: {item.output}") # Access parsed Pydantic model
print(f" Raw Tool Result (for API): {item.raw_item.result}") # String representation

if __name__ == "__main__":
asyncio.run(main())

```

* **Execution Flow Breakdown:**
1. `Runner.run` is called with `currency_agent` and the user query.
2. The `Runner` prepares the input and calls the agent's model (`gpt-4o-mini`) via the Responses API, providing the instructions and the schema for the `get_conversion_rate` tool.
3. The LLM receives the prompt and tool schema. It decides that the user query requires the tool and generates a response containing a `tool_call` item requesting to call `get_conversion_rate` with arguments like `{"from_currency": "USD", "to_currency": "EUR", "amount": 100}`.
4. The `Runner` receives this API response, parses the `tool_call`.
5. It identifies the corresponding Python function (`get_conversion_rate` via the `@function_tool` mapping).
6. It validates the arguments string (`"{\"from_currency\": \"USD\", ...}"`) against the function's signature (automatically converted to a Pydantic model internally) and calls the `get_conversion_rate` Python function with the validated arguments (`from_currency="USD"`, `to_currency="EUR"`, `amount=100.0`).
7. The `get_conversion_rate` function executes (prints the debug message, calculates the result) and returns a `ConversionResult` Pydantic object.
8. The `Runner` takes the returned object, converts it back into a string representation suitable for the API (`{"from_currency": "USD", ...}`), and prepares a new input item (`role: "tool"`, `content: [{"type": "tool_result", ...}]`).
9. The `Runner` calls the LLM again (Responses API), providing the original context *plus* the new tool result input item (using `previous_response_id`).
10. The LLM receives the tool result, understands the conversion was successful, and generates the final textual response for the user based on the result and the agent's instructions.
11. The `Runner` receives this final response, determines it's the concluding output (no more tool calls), packages everything into the `RunResult` object, and returns it.

* **Analysis:** The SDK significantly simplifies this. Without it, the developer would need to manually: format the tool schema for the API, make the first API call, parse the response for tool calls, find and execute the local function, handle potential errors, format the tool result, make the second API call with the result and conversation history, and parse the final response. The `Runner` and `@function_tool` handle most of this orchestration.

**2. Structured Outputs: Getting Predictable Data**

Often, the desired output isn't just text but structured data for programmatic use. The `output_type` parameter on `Agent` enables this.

* **Mechanism:** When `output_type` is set (e.g., to a Pydantic model), the SDK:
1. Generates a JSON schema from the provided type.
2. Instructs the underlying model (via the `text.format.type = "json_schema"` parameter in the Responses API) to output JSON conforming to that schema.
3. Automatically validates the JSON string returned by the LLM against the `output_type`.
4. Populates `RunResult.final_output` with the *parsed and validated* object (e.g., an instance of your Pydantic model), raising `ModelBehaviorError` if validation fails.

* **Example: Extracting Invoice Details**

```python
import asyncio
from agents import Agent, Runner
from pydantic import BaseModel, Field
from typing import List, Optional

# --- Define the Output Structure ---
class InvoiceItem(BaseModel):
description: str = Field(..., description="Description of the line item")
quantity: int = Field(..., description="Quantity of the item")
unit_price: float = Field(..., description="Price per unit")
total_price: float = Field(..., description="Total price for this line item (quantity * unit_price)")

class InvoiceDetails(BaseModel):
invoice_id: Optional[str] = Field(None, description="The invoice number or ID, if present")
vendor_name: Optional[str] = Field(None, description="Name of the vendor or company issuing the invoice")
customer_name: Optional[str] = Field(None, description="Name of the customer or recipient")
invoice_date: Optional[str] = Field(None, description="Date the invoice was issued (YYYY-MM-DD format if possible)")
due_date: Optional[str] = Field(None, description="Date the payment is due")
items: List[InvoiceItem] = Field(..., description="List of line items on the invoice")
subtotal: Optional[float] = Field(None, description="The total amount before taxes or discounts")
tax_amount: Optional[float] = Field(None, description="The amount of tax applied")
total_amount: float = Field(..., description="The final total amount due")

# --- Agent Definition ---
invoice_parser_agent = Agent(
name="Invoice Parser",
# Instructions guide the LLM on *what* to extract and *how* to format it (implicitly via output_type)
instructions="Extract the key details from the provided invoice text into the required JSON structure. Calculate total_price for items if not explicit.",
model="gpt-4o", # Good model for complex extraction
output_type=InvoiceDetails # Crucial: Tell the agent the expected output structure
)

# --- Execution ---
async def main():
invoice_text = """
Invoice #INV-123
From: Tech Solutions Inc.
To: Global Corp
Date: 2025-03-15
Due: 2025-04-14

Items:
- Widget A, Qty: 2, Price: $50.00 each
- Service B, Qty: 1, Price: $200.00
- Widget C, Qty: 5, Price: $10.00 each

Subtotal: $350.00
Tax (10%): $35.00
Total Due: $385.00
"""
print(f"Parsing Invoice:\n{invoice_text}")

result = await Runner.run(invoice_parser_agent, input=invoice_text)

if isinstance(result.final_output, InvoiceDetails):
extracted_data: InvoiceDetails = result.final_output_as(InvoiceDetails) # Safe cast now
print("\n--- Extracted Invoice Data ---")
print(f"Invoice ID: {extracted_data.invoice_id}")
print(f"Vendor: {extracted_data.vendor_name}")
print(f"Customer: {extracted_data.customer_name}")
print(f"Total Amount: ${extracted_data.total_amount:.2f}")
print("Items:")
for item in extracted_data.items:
print(f" - {item.description}: {item.quantity} @ ${item.unit_price:.2f} = ${item.total_price:.2f}")
else:
# Handle cases where the model failed to produce valid JSON
print("\n--- Failed to Extract Structured Data ---")
print(f"Raw Output: {result.final_output}") # Will likely be an error message or malformed string

if __name__ == "__main__":
asyncio.run(main())
```

* **Analysis:** Structured outputs are essential for integrating LLM results into downstream processes. Using `output_type` with Pydantic models provides strong typing, validation, and ease of use. The SDK handles both the conversion to the API's structured-output parameter (`text.format` on the Responses API, `response_format` on Chat Completions) and the validation of the returned JSON, significantly reducing the risk of application errors caused by malformed LLM output. It still relies on the model's ability to follow the formatting instructions implied by the provided schema.
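
For reference, the raw Responses API request that the SDK assembles is roughly equivalent to the sketch below. The nesting of the `text.format` block is an assumption based on the parameter named above; check it against the current API reference before relying on it.

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input="Classify the sentiment of: 'The new release is fantastic!'",
    text={
        "format": {
            "type": "json_schema",
            "name": "sentiment",
            "schema": {
                "type": "object",
                "properties": {
                    "label": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["label", "confidence"],
                "additionalProperties": False,
            },
            "strict": True,
        }
    },
)
print(response.output_text)  # A JSON string conforming to the schema above
```

The SDK layers validation and parsing on top of this, which is what turns the raw JSON string into a typed Pydantic object in `final_output`.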

**3. Orchestration Patterns: Handoffs and Agents-as-Tools**

The SDK enables building complex workflows by coordinating multiple agents.

* **Handoffs (`Agent.handoffs`, `handoff()`): Structured Delegation**
* **Use Case:** Routing tasks to specialized agents (e.g., language-specific support, different stages of a workflow like research -> drafting -> editing).
* **Mechanism:** The source agent's LLM identifies the need to delegate and calls the implicit `transfer_to_<agent_name>` tool. The SDK intercepts this call and switches the active agent within the `Runner` loop.
* **Context Control (`input_filter`):** Essential for managing what the *target* agent sees. Without filters, the target sees the entire preceding history, which might be long or contain irrelevant tool calls. Filters allow pruning or transforming the context for efficiency and relevance.
```python
from agents import Agent, Handoff, handoff, HandoffInputData, Runner
from agents.extensions.handoff_filters import remove_tool_calls # Example filter

spanish_agent = Agent(name="Spanish Support", instructions="Hablas español.")
english_agent = Agent(name="English Support", instructions="Speak English.")

triage_agent = Agent(
name="Language Triage",
instructions="Determine the language of the request (Spanish or English) and handoff to the appropriate agent.",
model="gpt-4o-mini",
handoffs=[
handoff(spanish_agent, input_filter=remove_tool_calls), # Spanish agent doesn't need triage tool history
handoff(english_agent, input_filter=remove_tool_calls) # English agent doesn't need triage tool history
]
)

# result_es = await Runner.run(triage_agent, "Necesito ayuda con mi cuenta.")
# print(result_es.last_agent.name) # Expected: Spanish Support
# print(result_es.final_output)

# result_en = await Runner.run(triage_agent, "I need help with my account.")
# print(result_en.last_agent.name) # Expected: English Support
# print(result_en.final_output)
```

* **Agents as Tools (`agent.as_tool()`): Querying Sub-Agents**
* **Use Case:** When an orchestrating agent needs information or a sub-task completed by another agent *without* giving up control of the main flow. Useful for parallel queries or incorporating specialized agent outputs into a larger synthesis.
* **Mechanism:** Wraps a target agent (`spanish_agent`) so it appears as a standard `FunctionTool` to the orchestrator agent. When the orchestrator calls this tool, the `Runner` executes the target agent (e.g., `spanish_agent`) with the provided input, captures its `final_output`, and returns it as the tool result to the orchestrator.
* **Example:**
```python
from agents import Agent, Runner
import asyncio

# Define specialist "tool" agents
code_generator = Agent(name="Code Generator", instructions="Generate Python code for the given task.")
code_reviewer = Agent(name="Code Reviewer", instructions="Review the provided Python code for bugs and style issues.")

# Define orchestrator
developer_assistant = Agent(
name="Developer Assistant",
instructions="Generate Python code using the 'generate_code' tool, then get it reviewed using the 'review_code' tool. Finally, present the reviewed code.",
model="gpt-4o",
tools=[
# Wrap agents as tools
code_generator.as_tool(
tool_name="generate_code",
tool_description="Generates Python code based on a description."
),
code_reviewer.as_tool(
tool_name="review_code",
tool_description="Reviews Python code and suggests improvements."
)
]
)

# async def main():
# result = await Runner.run(developer_assistant, "Generate a Python function to calculate factorial.")
# print(result.final_output)
# # Expected output would be the final reviewed code, potentially incorporating suggestions.
# if __name__ == "__main__":
# asyncio.run(main())

```
* **Comparison:** Handoffs change *who* is driving the conversation. Agents-as-tools keep the original agent in control, using others as callable resources.

**4. Managing Conversation History (`result.to_input_list()`)**

The SDK simplifies maintaining context across multiple user turns.

1. **Initial Turn:** `result1 = await Runner.run(agent, user_input_1, context=...)`
2. **Prepare Next Input:** Get the complete history from the first run (original input + all items generated during the run) and append the new user message.
`input_for_turn_2 = result1.to_input_list() + [{"role": "user", "content": user_input_2}]`
3. **Subsequent Turn:** `result2 = await Runner.run(agent, input_for_turn_2, context=...)` (Use `result1.last_agent` if stateful agent routing is desired).

This pattern ensures the agent receives the necessary history for contextually relevant responses, relying on the underlying API's `previous_response_id` (if using Responses API) or the full message list (if using Chat Completions) managed by the `Model` implementation within the SDK. The `truncation="auto"` setting in `ModelSettings` becomes crucial for long conversations when using the Responses API.
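
Concretely, a two-turn session might look like the following sketch. It uses only the SDK pieces discussed above (`Runner.run`, `to_input_list()`, `last_agent`); the agent passed in is assumed to be defined as in the earlier handoff examples.

```python
from agents import Agent, Runner

async def two_turn_session(triage_agent: Agent):
    # Turn 1: start with the triage/entry-point agent
    result = await Runner.run(triage_agent, "Necesito ayuda con mi cuenta.")
    print(result.final_output)

    # Turn 2: replay the full history plus the new user message,
    # and keep talking to whichever agent handled the previous turn
    next_input = result.to_input_list() + [
        {"role": "user", "content": "Gracias. ¿Pueden restablecer mi contraseña?"}
    ]
    result = await Runner.run(result.last_agent, next_input)
    print(result.final_output)
    return result
```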

**5. Analysis & Synthesis (Part 7):**

The Agents SDK provides a practical and powerful toolkit for implementing various agentic architectures.

* **Foundation:** Simple agents augmented with function tools and structured outputs form the base.
* **Modularity:** Handoffs enable building complex systems from smaller, specialized, more manageable agents. `input_filter` is key to controlling context flow effectively.
* **Flexibility:** Agents-as-tools offer an alternative orchestration pattern where a central agent maintains control while leveraging specialized sub-agents.
* **State Management:** The `Runner` handles intra-turn state, while `result.to_input_list()` provides the mechanism for developers to manage inter-turn conversation history easily.
* **Developer Experience:** Decorators (`@function_tool`, `@input_guardrail`), automatic schema generation, typed context (`Agent[TContext]`), and structured results (`RunResult`) significantly enhance productivity and reduce boilerplate compared to raw API interactions.

By combining these SDK primitives, developers can construct workflows ranging from simple tool-augmented chatbots to sophisticated multi-agent systems capable of research, planning, execution, and validation, all within a coherent and relatively easy-to-learn Python framework. The choice between handoffs and agents-as-tools depends on whether delegation or controlled querying is the desired interaction pattern for the specific workflow step.

---

Okay, let's explore the advanced features and surrounding ecosystem components that elevate the OpenAI Agents SDK from a basic execution framework to a more comprehensive platform for developing, managing, and improving agentic systems. These include Tracing, the role of Evaluations, integration with the Model Context Protocol (MCP), and lifecycle hooks.

**1. Tracing: Illuminating the Agent's Path**

As agent workflows become more complex, involving multiple steps, tool calls, handoffs, and guardrails, understanding *what* happened during a run, *why* it happened, and *where* things went wrong becomes critical. The Agents SDK addresses this with built-in **Tracing**.

* **Purpose:** To provide developers with a detailed, structured record of an agent run's execution path. This is invaluable for:
* **Debugging:** Pinpointing errors, unexpected behavior, or infinite loops by examining the sequence of LLM calls, tool inputs/outputs, and agent transitions.
* **Monitoring:** Observing agent behavior in production, tracking performance metrics (latency, token usage per step), and identifying patterns.
* **Visualization:** Using dashboards (like the OpenAI Traces dashboard mentioned) to graphically represent the flow of execution, making complex interactions easier to grasp.
* **Analysis & Improvement:** Understanding which tools are used most often, where agents struggle, or which prompts lead to better outcomes, feeding into evaluation and fine-tuning processes.

* **Core Concepts:** Tracing in the SDK adopts standard observability concepts:
* **Trace:** Represents a single, complete end-to-end execution of a workflow. It groups together all related operations. A trace might correspond to a single `Runner.run()` call or encompass multiple `Runner.run()` calls if wrapped in a higher-level `trace()` context manager. Key properties include `workflow_name`, `trace_id`, `group_id` (for linking related traces, e.g., within a single user conversation), and `metadata`.
* **Span:** Represents a specific operation within a trace that has a start and end time. Spans are nested, creating a tree structure that reflects the execution hierarchy. A `generation_span` might occur within an `agent_span`, which itself might be within the main `trace`. Key properties include `trace_id`, `parent_id` (linking to the parent span), timestamps, and `span_data` containing operation-specific details (e.g., agent name, tool arguments, LLM response).

* **Automatic Tracing:** The SDK instruments key operations by default:
* **Run Level:** Each call to `Runner.run()`, `run_sync()`, or `run_streamed()` is automatically wrapped in a top-level trace span unless nested within an explicit `trace()`.
* **Agent Execution:** Each time an agent's logic is invoked (before calling the LLM), an `agent_span` is created.
* **LLM Generation:** Every call to the underlying `Model` (e.g., Responses API request) is captured in a `generation_span`.
* **Tool Calls:** `function_span` for custom Python tools, and likely specific spans for built-in tools like web search or file search.
* **Handoffs:** Captured within a `handoff_span`.
* **Guardrails:** Execution of input/output guardrails is recorded in `guardrail_span`.
* **Audio (If Used):** `transcription_span` (STT), `speech_span` (TTS), potentially grouped under `speech_group_span`.
* **MCP:** Interactions with MCP servers (`list_tools`, `call_tool`) are also traced.

* **Custom Tracing with `trace()`:** For workflows spanning multiple `Runner.run` calls (e.g., a user conversation), you can create a single overarching trace using the `trace()` context manager. This ensures all related activities are grouped logically.
```python
import asyncio
from agents import Agent, Runner, trace

chat_agent = Agent(name="Concise Assistant", instructions="Be very brief.")
thread_id = "user_session_123" # Example identifier

async def handle_conversation():
history = []
# Use trace() to group all interactions for this thread_id
with trace(workflow_name="User Chat", group_id=thread_id):
user_q1 = "What is the capital of Japan?"
print(f"User: {user_q1}")
input1 = user_q1
result1 = await Runner.run(chat_agent, input1)
print(f"Agent: {result1.final_output}")
history = result1.to_input_list() # Get history including agent's response

user_q2 = "What language do they speak there?"
print(f"\nUser: {user_q2}")
input2 = history + [{"role": "user", "content": user_q2}]
result2 = await Runner.run(chat_agent, input2)
print(f"Agent: {result2.final_output}")
history = result2.to_input_list()

# ... further turns ...

if __name__ == "__main__":
asyncio.run(handle_conversation())
# In the OpenAI Traces dashboard, you would see a single Trace for "User Chat"
# with group_id "user_session_123", containing spans for both Runner.run calls
# and their internal agent/generation spans.
```

* **Custom Spans:** While most operations are traced automatically, `custom_span()` allows developers to add their own timed operations to the trace for better application-specific visibility.
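
A minimal sketch of wrapping an application-specific step in its own span, assuming `custom_span` is exported from the top-level `agents` package and usable as a context manager (the `load_orders_from_db` helper is hypothetical):

```python
from agents import Agent, Runner, custom_span, trace

def load_orders_from_db(user_query: str) -> str:
    # Stand-in for real application logic, e.g. a database query
    return "Order #1001: delivered 2025-03-01"

async def run_with_custom_span(agent: Agent, user_query: str):
    with trace(workflow_name="Order Lookup"):
        # Application work recorded as its own span alongside the agent/generation spans
        with custom_span("load_order_history"):
            order_history = load_orders_from_db(user_query)

        return await Runner.run(agent, f"{user_query}\n\nOrder history:\n{order_history}")
```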

* **Sensitive Data Control:** Recognizing that LLM inputs/outputs and tool data can be sensitive, the SDK provides controls:
* `RunConfig(trace_include_sensitive_data=False)`: Prevents LLM/tool inputs and outputs from being stored in the `GenerationSpanData`/`FunctionSpanData`. The spans are still created, but the potentially sensitive payload is omitted.
* (For voice agents) `VoicePipelineConfig(trace_include_sensitive_audio_data=False)`: Prevents raw audio data from being stored in audio-related spans.
* This allows capturing the workflow structure and timings without necessarily logging the raw content.

* **Disabling Tracing:**
* Globally: Set environment variable `OPENAI_AGENTS_DISABLE_TRACING=1`.
* Per-Run: Pass `RunConfig(tracing_disabled=True)` to `Runner.run()`.
* ZDR Limitation: Tracing is inherently incompatible with Zero Data Retention policies as it involves sending data to OpenAI's backend for storage and visualization.
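
A brief sketch of applying these per-run controls, assuming `RunConfig` is exported from the top-level package and passed to `Runner.run` via a `run_config` parameter:

```python
from agents import Agent, Runner, RunConfig

async def run_with_tracing_controls(agent: Agent, user_input: str):
    # Keep the trace structure and timings, but omit LLM/tool payloads
    redacted = RunConfig(trace_include_sensitive_data=False)
    await Runner.run(agent, user_input, run_config=redacted)

    # Or skip tracing entirely for this run
    untraced = RunConfig(tracing_disabled=True)
    return await Runner.run(agent, user_input, run_config=untraced)
```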

* **Backend and Custom Processors:**
* **Default Backend:** Traces are sent to OpenAI's backend via a `BackendSpanExporter`. This requires an OpenAI API key (either the default one or one set specifically via `set_tracing_export_api_key`).
* **Extensibility:** The SDK allows adding (`add_trace_processor`) or replacing (`set_trace_processors`) the default trace processing pipeline. This enables:
* Sending traces to multiple destinations (e.g., OpenAI backend *and* an internal logging system or a third-party observability platform like Langfuse, W&B, etc.).
* Completely replacing OpenAI's backend if desired (e.g., for compliance reasons or to use a different visualization tool).

* **Analysis:** Integrated tracing is a cornerstone feature for building production-grade agentic systems. The complexity of multi-step, multi-tool, multi-agent workflows makes traditional logging insufficient. Visual, hierarchical traces provide critical insight for debugging and optimization. The automatic instrumentation covers most common SDK operations, while `trace()` and `custom_span()` offer flexibility. Control over sensitive data is essential for adoption. The extensible processor architecture allows integration with diverse observability stacks, acknowledging that OpenAI's backend might not be the sole destination for all users.

**2. The Role of Evaluations (Evals API)**

While the Agents SDK documentation focuses primarily on *building* and *running* agents, the broader context provided (including the API reference for `/evals`) highlights the importance of **evaluation** in the agent development lifecycle.

* **Purpose:** To systematically measure the performance, quality, and safety of agentic applications against defined criteria. This is crucial for:
* **Quality Assurance:** Ensuring the agent meets functional requirements and produces accurate/helpful outputs.
* **Regression Testing:** Detecting performance degradation when models, prompts, or tools are updated.
* **Comparison:** Objectively comparing different agent versions, prompts, models, or configurations (e.g., comparing `gpt-4o` vs `o3-mini` on a specific task).
* **Improvement:** Identifying specific weaknesses or failure modes to guide prompt engineering, fine-tuning, or tool refinement.

* **Integration Points with SDK/Responses API:**
* **Data Sources:** The Evals API (`POST /v1/evals`) can use various data sources (`data_source_config`), including:
* `stored_completions`: Leverages Chat Completion requests stored via `store=true`.
* `stored_responses` (Implied/Future): Logically, evaluation should also work with Responses API interactions stored via `store=true`, providing a rich dataset of agent runs including tool usage.
* `custom`: Allows providing data directly (e.g., from a file or list).
* **Tracing Data:** Although not a direct input format for the Evals API itself, the detailed information captured in traces (inputs, outputs, tool calls, intermediate steps) is invaluable for *analyzing* evaluation results and understanding *why* an agent failed or succeeded on a particular test case. The trace provides the necessary context behind the evaluation score.
* **Testing Criteria (`testing_criteria`):** The Evals API defines various "graders":
* `label_model`: Uses an LLM (often a cheaper one like `o3-mini`) with specific instructions to classify or grade the agent's output (e.g., check sentiment, relevance, correctness based on ground truth).
* `string_check`: Performs simple string comparisons (equality, contains, etc.) against expected outputs or references.
* Other custom graders can likely be defined.

* **The MLOps Loop:** Evaluations close the loop in the agent development cycle:
1. **Build:** Define agents, tools, handoffs using the Agents SDK.
2. **Deploy/Run:** Execute agent workflows using `Runner`. Ensure relevant interactions are stored (`store=true` in Responses API) or logged.
3. **Monitor:** Use Tracing to observe behavior and debug issues in development or production.
4. **Evaluate:** Use the Evals API, potentially feeding it stored interactions or custom datasets, to measure performance against defined criteria.
5. **Improve:** Analyze eval results (often aided by traces) to refine prompts, tools, agent logic, or even fine-tune models based on performance data.

* **Analysis:** While the Agents SDK itself doesn't contain the Eval execution logic (that resides in the separate Evals API), it's a critical part of the *ecosystem*. The SDK generates the traceable, potentially storable interactions that form the basis for evaluation. Effective agent development requires this iterative loop, and OpenAI is providing integrated tools for tracing (monitoring) and evaluation (measurement) to support it.

**3. Model Context Protocol (MCP): Standardizing Tool Integration**

MCP represents an effort to standardize how agents discover and interact with external tools and data sources, moving beyond bespoke function definitions.

* **Purpose:** To create an open, interoperable protocol allowing AI models/agents to seamlessly connect with diverse "context providers" (tools, databases, APIs) without requiring custom integration code for each one. The USB-C analogy highlights the goal of plug-and-play context.
* **SDK Support:** The Agents SDK provides classes to interact with MCP servers:
* `MCPServerStdio`: Connects to MCP servers running as local subprocesses (communicating via standard input/output). Requires `command` and `args` to launch the server process.
* `MCPServerSse`: Connects to remote MCP servers via HTTP Server-Sent Events (SSE). Requires the server `url`.
* **Integration with `Agent`:** MCP servers are added to the `Agent.mcp_servers` list.
* **Interaction Flow:**
1. **Tool Discovery:** When an `Agent` with MCP servers runs, the SDK calls `server.list_tools()` on each configured server before invoking the LLM.
2. **Tool Schema Inclusion:** The schemas of the tools discovered via `list_tools()` are included in the `tools` parameter sent to the LLM API, alongside any native function tools defined directly on the agent.
3. **Tool Invocation:** If the LLM decides to call a tool provided by an MCP server, the SDK intercepts the `tool_call`.
4. **MCP Call:** The SDK calls the appropriate `server.call_tool(tool_name, arguments)` method on the corresponding MCP server.
5. **Result Handling:** The MCP server executes the tool and returns the result to the SDK, which then formats it and sends it back to the LLM for the next step.
* **Example (Conceptual using Filesystem Server):**
```python
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerStdio # Assuming MCP extension is available

async def run_mcp_agent():
# Define connection parameters for the local filesystem server
# (Requires installing the MCP server: npm install @modelcontextprotocol/server-filesystem)
fs_server_params = {
"command": "npx", # Assuming npx is in PATH
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/accessible/directory"],
# Ensure the specified directory exists and is accessible
}

# Connect to the MCP server using a context manager for cleanup
async with MCPServerStdio(params=fs_server_params, name="Local Filesystem", cache_tools_list=True) as fs_server:
# Define the agent, passing the connected MCP server
file_agent = Agent(
name="File Manager",
instructions="Use the available filesystem tools to manage files as requested by the user.",
model="gpt-4o",
mcp_servers=[fs_server] # Add the MCP server here
# Note: No tools explicitly defined here; they come from MCP.
)

# Run the agent
user_request = "Can you list the files in the root directory provided to the server?"
# The LLM should see tools like 'fs_list_directory' from the MCP server
result = await Runner.run(file_agent, input=user_request)
print(result.final_output)

if __name__ == "__main__":
asyncio.run(run_mcp_agent())
```
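
For a remote MCP server, the SSE variant is analogous. This is a sketch under the assumption that `MCPServerSse` accepts a `params` mapping with a `url` field; the endpoint shown is hypothetical.

```python
from agents import Agent, Runner
from agents.mcp import MCPServerSse

async def run_remote_mcp_agent(user_request: str):
    async with MCPServerSse(
        params={"url": "https://tools.example.com/sse"},  # hypothetical endpoint
        name="Remote Tools",
        cache_tools_list=True,
    ) as remote_server:
        agent = Agent(
            name="Remote Tool User",
            instructions="Use the available remote tools to fulfil the request.",
            model="gpt-4o",
            mcp_servers=[remote_server],
        )
        result = await Runner.run(agent, input=user_request)
        return result.final_output
```
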
* **Caching (`cache_tools_list=True`):** Crucial for performance, especially with remote SSE servers, as it avoids repeated `list_tools` calls if the toolset is static. `invalidate_tools_cache()` allows manual refresh.
* **Tracing:** MCP interactions (`list_tools`, `call_tool`) are automatically included in traces.
* **Analysis:** MCP integration significantly broadens the potential tool ecosystem for agents built with the SDK. It enables using standardized, potentially third-party tools without writing custom `FunctionTool` wrappers for each one. This promotes reusability and separation of concerns (agent logic vs. tool implementation). While still potentially early in adoption, MCP support positions the Agents SDK within a potentially larger, interoperable ecosystem of AI context providers.

**4. Lifecycle Hooks (`RunHooks`, `AgentHooks`): Observing and Intervening**

Hooks provide callbacks into the agent execution lifecycle, allowing developers to monitor events or trigger custom logic at specific points.

* **Purpose:** Enable fine-grained observation, custom logging, state synchronization with external systems, or triggering side-effects based on agent progress.
* **Types:**
* **`RunHooks`:** Global hooks passed to `Runner.run()`. Their methods (`on_agent_start`, `on_tool_end`, etc.) are called for *all* agents and tools within that specific run.
* **`AgentHooks`:** Agent-specific hooks set on the `Agent.hooks` parameter. Their methods are called *only* when that particular agent is involved (e.g., `on_start` when it becomes the active agent, `on_handoff` when control is handed *to* it).
* **Key Callback Methods:**
* `on_agent_start(ctx, agent)`: Before an agent's LLM call.
* `on_agent_end(ctx, agent, output)`: After an agent produces its final output for the run.
* `on_tool_start(ctx, agent, tool)`: Before a tool (function, MCP, built-in) is executed.
* `on_tool_end(ctx, agent, tool, result)`: After a tool finishes execution.
* `on_handoff(ctx, from_agent, to_agent)` (`RunHooks`) / `on_handoff(ctx, agent, source)` (`AgentHooks`): When a handoff occurs.
* **Context Access:** All hook methods receive the `RunContextWrapper`, providing access to the shared context object (`wrapper.context`) and current token usage (`wrapper.usage`).
* **Example:**
```python
import asyncio
from agents import (Agent, Runner, function_tool, RunHooks, AgentHooks,
RunContextWrapper, Tool, Usage)
from typing import Any

# --- Global Run Hooks ---
class MyGlobalHooks(RunHooks):
def __init__(self):
self.turn_count = 0

async def on_agent_start(self, context: RunContextWrapper, agent: Agent) -> None:
self.turn_count += 1
print(f"[Global Hook] Starting Turn {self.turn_count} with Agent: {agent.name}")

async def on_agent_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None:
print(f"[Global Hook] Agent {agent.name} finished run. Final Output: {output}")
print(f"[Global Hook] Final Usage: {context.usage.total_tokens} tokens")

# --- Agent-Specific Hooks ---
class MathAgentHooks(AgentHooks):
async def on_start(self, context: RunContextWrapper, agent: Agent) -> None:
print(f"[Math Hook] '{agent.name}' taking over calculation.")

async def on_tool_end(self, ctx: RunContextWrapper, agent: Agent, tool: Tool, result: str) -> None:
print(f"[Math Hook] Tool '{tool.name}' produced result: {result}")

# --- Tools and Agents ---
@function_tool
def add(a: int, b: int) -> int:
"""Adds two numbers."""
return a + b

calculator_agent = Agent(
name="Calculator",
instructions="Use the add tool to perform addition.",
tools=[add],
hooks=MathAgentHooks() # Assign agent-specific hooks
)

# --- Execution ---
async def main():
global_hooks = MyGlobalHooks()
result = await Runner.run(
calculator_agent,
input="What is 5 + 7?",
hooks=global_hooks # Assign global run hooks
)

if __name__ == "__main__":
asyncio.run(main())

# Expected Output Order (Conceptual):
# [Global Hook] Starting Turn 1 with Agent: Calculator
# [Math Hook] 'Calculator' taking over calculation.
# [Math Hook] Tool 'add' produced result: 12
# [Global Hook] Agent Calculator finished run. Final Output: 12
# [Global Hook] Final Usage: ... tokens
```

* **Analysis:** Hooks provide essential observability and control points within the agent execution flow. `RunHooks` are useful for global logging or metrics aggregation across an entire run, while `AgentHooks` allow for agent-specific logic or monitoring. They bridge the gap between the SDK's managed execution loop and the developer's need to react to specific events or inject custom behavior.

**5. Synthesis: An Ecosystem for Advanced Agent Development**

Tracing, Evaluations, MCP, and Hooks are not isolated features but interconnected components of a broader platform vision for agent development:

* **Visibility and Debugging:** Tracing provides the fine-grained visibility needed to understand complex runs involving handoffs, MCP tools, and custom logic triggered by hooks.
* **Performance Measurement:** Evaluations rely on the ability to capture and store agent interactions (facilitated by the Responses API's `store` option and potentially detailed in traces) to systematically measure quality and identify areas for improvement.
* **Extensibility:** MCP dramatically expands the potential capabilities of agents by providing a standardized way to connect to external tools and data sources, moving beyond just built-in tools and basic function calling.
* **Control and Monitoring:** Hooks offer developers specific points to observe and potentially influence the agent's execution path, enabling custom logging, external state synchronization, or real-time feedback mechanisms.

Together, these features aim to provide a robust environment not just for *building* simple agents, but for *developing, deploying, monitoring, evaluating, and iteratively improving* sophisticated, reliable agentic applications capable of handling complex, real-world tasks. They address the practical challenges encountered when moving agents from prototypes to production systems.

---

Okay, let's get into the advanced architectural patterns and considerations involved in building sophisticated agentic systems using the OpenAI Agents SDK and the underlying Responses API. Moving beyond single-agent interactions requires thoughtful design of how agents collaborate, how control flows, and how safety and efficiency are maintained.

**Architecting Agentic Systems: Patterns and Practices**

The Agents SDK, with its core primitives (`Agent`, `Tool`, `Handoff`, `Guardrail`, `Runner`), provides the foundational elements. However, constructing robust applications often involves combining these primitives in specific patterns tailored to the task complexity and desired behavior. The documentation highlights several key patterns, which we will explore in depth, analyzing their mechanisms, use cases, and trade-offs.

**1. Deterministic Agent Flows: Predictable Pipelines**

While the core strength of LLMs lies in their flexible reasoning, some tasks benefit from a more structured, predictable sequence of operations. Deterministic flows achieve this by breaking down a complex task into a predefined series of steps, where each step is handled by a dedicated agent, and the output of one agent serves as the input for the next.

* **Concept:** Instead of a single agent trying to manage a multi-stage process (like research -> outline -> draft -> edit), you define separate `Agent` instances for each stage. Your application code, *outside* the SDK's internal loop, explicitly calls `Runner.run` for each agent sequentially.
* **Mechanism:**
1. Define Agent A (e.g., Outliner), Agent B (e.g., Drafter), Agent C (e.g., Editor).
2. Call `resultA = await Runner.run(Agent A, initial_input)`.
3. Extract the relevant output from `resultA.final_output`.
4. Format this output as the input for the next agent.
5. Call `resultB = await Runner.run(Agent B, input_from_A)`.
6. Repeat for subsequent agents (Agent C using input from B).

* **Example: Content Generation Pipeline**

```python
import asyncio
from agents import Agent, Runner
from pydantic import BaseModel
from typing import List

# --- Define Agent Output Structures (for clarity) ---
class StoryOutline(BaseModel):
title: str
plot_points: List[str]
characters: List[str]

class StoryDraft(BaseModel):
title: str
content: str

class FinalStory(BaseModel):
title: str
final_content: str

# --- Define Agents for Each Stage ---
outline_agent = Agent(
name="Outliner",
instructions="Generate a brief story outline with title, key plot points, and main characters based on the user's theme. Output JSON.",
model="gpt-4o-mini",
output_type=StoryOutline
)

draft_agent = Agent(
name="Drafter",
instructions="Write a short story draft based on the provided outline (title, plot, characters). Focus on narrative flow.",
model="gpt-4o", # Use a more capable model for drafting
output_type=StoryDraft # Output the draft content
)

edit_agent = Agent(
name="Editor",
instructions="Review the provided story draft. Improve clarity, pacing, and grammar. Ensure it aligns with the original title.",
model="gpt-4o-mini",
output_type=FinalStory # Output the final edited content
)

# --- Orchestrate the Deterministic Flow ---
async def generate_story_pipeline(theme: str):
print(f"--- Starting Story Pipeline for theme: {theme} ---")

# Stage 1: Outline
print("Stage 1: Generating Outline...")
outline_result = await Runner.run(outline_agent, input=f"Theme: {theme}")
if not isinstance(outline_result.final_output, StoryOutline):
print("Error: Failed to generate outline.")
return
story_outline: StoryOutline = outline_result.final_output_as(StoryOutline)
print(f"Outline Generated: {story_outline.title}")

# Prepare input for Stage 2
draft_input = f"Title: {story_outline.title}\nPlot Points: {'; '.join(story_outline.plot_points)}\nCharacters: {', '.join(story_outline.characters)}"

# Stage 2: Draft
print("\nStage 2: Generating Draft...")
draft_result = await Runner.run(draft_agent, input=draft_input)
if not isinstance(draft_result.final_output, StoryDraft):
print("Error: Failed to generate draft.")
return
story_draft: StoryDraft = draft_result.final_output_as(StoryDraft)
print(f"Draft Generated for: {story_draft.title}")

# Prepare input for Stage 3
edit_input = f"Title: {story_draft.title}\nDraft Content:\n{story_draft.content}"

# Stage 3: Edit
print("\nStage 3: Editing Draft...")
edit_result = await Runner.run(edit_agent, input=edit_input)
if not isinstance(edit_result.final_output, FinalStory):
print("Error: Failed to edit story.")
return
final_story: FinalStory = edit_result.final_output_as(FinalStory)
print(f"Editing Complete for: {final_story.title}")

print("\n--- Final Story ---")
print(f"Title: {final_story.title}")
print(final_story.final_content)

# if __name__ == "__main__":
# asyncio.run(generate_story_pipeline(theme="a robot discovers gardening"))
```

* **Use Cases:** Content pipelines (writing, translation, summarization), structured data processing (extract -> transform -> load), code generation and review sequences, report generation from data analysis.
* **Trade-offs:**
* **Pros:** High predictability, easier debugging (isolate issues to specific stages), fine-grained control over models and prompts at each step, potentially better cost management by using cheaper models for simpler stages.
* **Cons:** Less flexible; cannot easily adapt to unexpected user inputs or deviations from the planned flow. Requires more application-level code to manage the sequence and data handoff between stages. Might be slower overall due to multiple sequential `Runner.run` calls.

**2. Handoffs and Routing: Intelligent Delegation**

This pattern leverages the LLM's ability to decide when to delegate a task to a more specialized agent. It's core to building modular and scalable agent systems.

* **Concept:** A primary agent (often a "triage" or "orchestrator" agent) receives the initial request. Based on its instructions and the available `handoffs` (presented as tools), it determines whether delegation is needed and invokes the appropriate `transfer_to_<agent_name>` tool. Control then shifts to the specialized agent.
* **Mechanism Deep Dive:**
* **Tool Presentation:** The SDK automatically creates function tool schemas for each handoff defined in `Agent.handoffs`. The `name` defaults to `transfer_to_<agent_name>` (derived via `agent.name.replace(" ", "_")`) and the `description` uses `agent.handoff_description` (falling back to `agent.instructions` or a generic description if needed). These are crucial for the source LLM to understand *when* and *why* to use a particular handoff. Providing clear, distinct `handoff_description`s is key for accurate routing.
* **LLM Decision:** The source agent's LLM analyzes the input and its instructions. If the instructions indicate delegation based on certain conditions, and the input matches those conditions, it calls the corresponding `transfer_to_...` tool. If `input_type` was specified for the handoff, the LLM also generates the required arguments.
* **SDK Interception:** The `Runner` intercepts the call to the special handoff tool.
* **Callback (`on_handoff`):** If an `on_handoff` callback was configured (via `handoff(...)`), it's executed *now*. This allows triggering actions (like logging or pre-fetching data for the target agent) based on the *intent* to handoff, potentially receiving LLM-generated arguments if `input_type` was used.
* **Input Filtering (`input_filter`):** The configured `input_filter` function (if any) is applied to the current conversation history (`HandoffInputData`). This is the critical point to manage the context passed to the target agent. Common filters (`remove_tool_calls`, `keep_last_n_items`, `remove_system_prompts`) help prevent context bloat and tailor the information for the specialist.
* **Agent Transition:** The `Runner` updates its internal state: the `current_agent` becomes the target agent returned by the handoff configuration, and the `input` for the next loop iteration becomes the (potentially filtered) conversation history. The loop continues with the new agent.

* **Example: Advanced Support Routing with Context Filtering**

```python
import asyncio
from agents import Agent, Runner, handoff, HandoffInputData, ItemHelpers
from agents.extensions.handoff_filters import remove_tool_calls
from typing import List

# --- Context Filter ---
def keep_only_user_messages(handoff_data: HandoffInputData) -> HandoffInputData:
""" Keep the original input history + only user messages from the new items. """
# Filter new_items generated during the triage turn
filtered_new_items = [
item for item in handoff_data.new_items
if item.type == 'message_output_item' and item.raw_item.role == 'user'
]
# Reconstruct the input list for the target agent
# Note: This is a simplified example; real filter might need more sophisticated history reconstruction
# depending on how `input_history` and `pre_handoff_items` are structured.
# The goal is to return a HandoffInputData object.
print(f"[Input Filter] Keeping {len(filtered_new_items)} user messages for target agent.")
# For simplicity here, we just pass the filtered new items. A real implementation
# would combine appropriately with input_history / pre_handoff_items.
# This simplified version assumes the target agent only needs the very last user message.
last_user_message = filtered_new_items[-1].to_input_item() if filtered_new_items else []
return HandoffInputData(
input_history=handoff_data.input_history, # Pass original history if needed
pre_handoff_items=(), # Often cleared for specialist
new_items=(last_user_message,) if last_user_message else ()
)

# --- Specialist Agents ---
refund_agent = Agent(
name="Refund Processor",
instructions="Process refund requests. You only need the user's request details.",
model="gpt-4o-mini"
)
account_agent = Agent(
name="Account Manager",
instructions="Handle account management tasks like password resets or profile updates.",
model="gpt-4o-mini"
)

# --- Triage Agent ---
triage_agent = Agent(
name="Support Triage",
instructions="Analyze the user's request. If it's about refunds, handoff to the Refund Processor. If it's about account issues, handoff to the Account Manager. Otherwise, try to answer directly.",
model="gpt-4o-mini",
handoffs=[
handoff(
refund_agent,
handoff_description="Handles all refund and return inquiries.",
input_filter=keep_only_user_messages # Refund agent only needs user request
),
handoff(
account_agent,
handoff_description="Handles password resets, profile updates, and login issues.",
input_filter=keep_only_user_messages # Account agent only needs user request
)
]
)

# --- Execution ---
# async def main():
# result_refund = await Runner.run(triage_agent, "I'd like to return my order #12345.")
# print(f"Handled by: {result_refund.last_agent.name}")
# print(f"Final Response: {result_refund.final_output}")

# result_account = await Runner.run(triage_agent, "I forgot my password, can you help?")
# print(f"\nHandled by: {result_account.last_agent.name}")
# print(f"Final Response: {result_account.final_output}")

# result_general = await Runner.run(triage_agent, "What are your opening hours?")
# print(f"\nHandled by: {result_general.last_agent.name}") # Should be Triage Agent
# print(f"Final Response: {result_general.final_output}")

# if __name__ == "__main__":
# asyncio.run(main())
```
* **Trade-offs:**
* **Pros:** Enables modularity, specialization, potentially using different models for different tasks. Allows LLM intelligence to drive routing. `input_filter` offers context control.
* **Cons:** Control flow is less predictable than deterministic pipelines. Relies on good instructions and `handoff_description`s for accurate routing. Can increase latency if multiple handoffs occur. Context management via filters requires careful design.

**3. Agents as Tools: Controlled Sub-Processing**

This pattern allows an orchestrator agent to invoke other agents as if they were standard function tools, receiving their final output as a result without relinquishing control of the main workflow.

* **Concept:** An orchestrator agent needs specific information or a sub-task performed, which another agent is specialized in. Instead of handing off, it *calls* the specialist agent as a tool.
* **Mechanism (`agent.as_tool()`):**
1. The `as_tool()` method is called on the target agent (e.g., `summarizer_agent.as_tool(...)`).
2. This creates a `FunctionTool` object. The `tool_name` and `tool_description` must be provided. The tool's input schema is implicitly defined to take a string (the input for the target agent).
3. This `FunctionTool` is added to the orchestrator agent's `tools` list.
4. When the orchestrator's LLM decides to call this tool, the `Runner` intercepts it.
5. The `Runner` internally executes a *separate, nested run* of the target agent (`summarizer_agent`) using the arguments provided by the orchestrator as the input.
6. The `final_output` of this nested run is captured. By default, it is converted to a string. A `custom_output_extractor` function can be provided to `as_tool` to customize how the result is extracted and formatted from the nested `RunResult` (see the sketch after this list).
7. This extracted output string is returned as the result of the tool call to the orchestrator agent.
8. The orchestrator agent receives the tool result and continues its own execution flow.
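
As referenced in step 6, a `custom_output_extractor` controls exactly what string is handed back to the orchestrator. A hedged sketch, assuming the extractor is an async callable that receives the nested run's `RunResult` (exported from the top-level package, as elsewhere in this guide):

```python
from agents import Agent, RunResult

summarizer_agent = Agent(
    name="Summarizer",
    instructions="Summarize the provided text as bullet points starting with '-', then add commentary.",
)

async def bullets_only(run_result: RunResult) -> str:
    # Keep only the bullet lines from the nested agent's final output (illustrative logic)
    text = str(run_result.final_output)
    return "\n".join(line for line in text.splitlines() if line.strip().startswith("-"))

summarize_tool = summarizer_agent.as_tool(
    tool_name="summarize_text",
    tool_description="Summarize a block of text into bullet points.",
    custom_output_extractor=bullets_only,
)
# summarize_tool can now be added to an orchestrator's `tools` list as usual.
```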

* **Example: Parallel Translation**

```python
import asyncio
from agents import Agent, Runner

# Define translator agents
spanish_translator = Agent(name="Spanish Translator", instructions="Translate the input text to Spanish.")
french_translator = Agent(name="French Translator", instructions="Translate the input text to French.")

# Define orchestrator agent
polyglot_agent = Agent(
name="Polyglot Orchestrator",
instructions="Use the available tools to translate the user's input text into both Spanish and French. Provide both translations.",
model="gpt-4o",
tools=[
spanish_translator.as_tool(
tool_name="translate_to_spanish",
tool_description="Translate the input text to Spanish."
),
french_translator.as_tool(
tool_name="translate_to_french",
tool_description="Translate the input text to French."
)
]
# parallel_tool_calls=True (default in Responses API) is useful here
)

# --- Execution ---
async def main():
text_to_translate = "Hello, world!"
print(f"Orchestrator Goal: Translate '{text_to_translate}' to Spanish and French.")

result = await Runner.run(polyglot_agent, input=text_to_translate)

print("\n--- Orchestrator Run Details ---")
print(f"Final Output:\n{result.final_output}")

print("\nItems generated during run:")
for item in result.new_items:
print(f"- Type: {item.type}, Agent: {item.agent.name}")
if item.type == 'tool_call_item':
# Should show calls to translate_to_spanish and translate_to_french
print(f" Tool Called: {item.raw_item.function.name}")
elif item.type == 'tool_call_output_item':
# Should show the results from the nested agent runs
print(f" Tool Result ({item.raw_item.tool_call_id}): {item.raw_item.result}")

# if __name__ == "__main__":
# asyncio.run(main())

# Expected Final Output: "Okay, here are the translations:
# Spanish: ¡Hola, mundo!
# French: Bonjour, le monde!"
```
*(Note: The parallel execution happens if the LLM generates both tool calls simultaneously, which `gpt-4o` with `parallel_tool_calls=True` is capable of. The SDK's `Runner` would then likely use `asyncio.gather` or similar to execute the nested runs concurrently.)*

* **Trade-offs:**
* **Pros:** Orchestrator retains full control; enables parallel execution of sub-tasks; allows complex synthesis of results from multiple specialist agents.
* **Cons:** Can lead to very complex instructions for the orchestrator agent; context window for the orchestrator grows with each tool call and result; potentially higher token usage if sub-agent outputs are large and need to be fully processed by the orchestrator. The implementation requires careful management of nested `Runner.run` calls if done manually (though `as_tool` abstracts this).

**4. LLM-as-a-Judge / Self-Correction Loops: Iterative Refinement**

This pattern uses LLMs not just to generate content but also to critique and refine it, leading to higher quality outputs through iteration.

* **Concept:** One agent (Generator) produces an output. Another agent (Judge/Critic) evaluates this output against specific criteria. The critique is then fed back to the Generator (or a dedicated Refiner agent) to produce an improved version. This loop can repeat until the Judge is satisfied or a maximum number of iterations is reached.
* **Mechanism:** This requires application-level code to manage the loop.
1. Define Generator Agent and Judge Agent (often with structured output for the critique, e.g., `Critique(score: int, suggestions: List[str])`).
2. Call `Runner.run(Generator, input)` to get initial output.
3. Loop:
a. Prepare input for Judge, including the Generator's output and evaluation criteria.
b. Call `Runner.run(Judge, judge_input)`.
c. Parse the critique from `Judge`'s `final_output`.
d. If critique indicates sufficient quality (e.g., high score) or max iterations reached, break the loop.
e. Prepare input for Generator (or Refiner), including the original goal *and* the critique.
f. Call `Runner.run(Generator/Refiner, refinement_input)` to get the improved output. Use this improved output in the next iteration (step 3a).

* **Example: Iterative Code Refinement**

```python
import asyncio
from agents import Agent, Runner
from pydantic import BaseModel, Field
from typing import List

# --- Structures ---
class CodeCritique(BaseModel):
score: int = Field(..., description="Score from 1-5 (5 is best)")
suggestions: List[str] = Field(..., description="Specific suggestions for improvement")
passes: bool = Field(..., description="True if the code meets quality standards")

# --- Agents ---
code_generator = Agent(
name="Python Coder",
instructions="Generate Python code based on the user request.",
model="gpt-4o"
)

code_critic = Agent(
name="Python Critic",
instructions="Critically evaluate the provided Python code based on correctness, efficiency, and style. Provide a score (1-5), specific suggestions, and a boolean 'passes' flag in JSON format.",
model="gpt-4o-mini", # Cheaper model for critique
output_type=CodeCritique
)

# --- Orchestration Loop ---
async def generate_and_critique_code(request: str, max_iterations: int = 3):
print(f"--- Generating code for: {request} ---")
current_code = ""
critique = None

for i in range(max_iterations):
print(f"\n--- Iteration {i+1} ---")

# Generate or Refine Code
generator_input = request
if critique: # If refining, include the critique
generator_input += f"\nPlease improve the previous version based on this critique:\nScore: {critique.score}\nSuggestions: {'; '.join(critique.suggestions)}"
print("Generating/Refining code...")
generator_result = await Runner.run(code_generator, generator_input)
current_code = generator_result.final_output
print(f"Generated Code:\n```python\n{current_code}\n```")

# Critique Code
print("Critiquing code...")
critic_input = f"Please critique this Python code:\n```python\n{current_code}\n```"
critic_result = await Runner.run(code_critic, critic_input)

if not isinstance(critic_result.final_output, CodeCritique):
print("Error: Critic failed to provide valid critique.")
break
critique = critic_result.final_output_as(CodeCritique)
print(f"Critique: Score={critique.score}, Passes={critique.passes}, Suggestions={critique.suggestions}")

# Check if done
if critique.passes:
print("\nCode passed critique!")
break
elif i == max_iterations - 1:
print("\nMax iterations reached, using last generated code.")
break

print("\n--- Final Code ---")
print(f"```python\n{current_code}\n```")
return current_code

# if __name__ == "__main__":
# asyncio.run(generate_and_critique_code("Write a function to find prime numbers up to n"))
```

* **Trade-offs:**
* **Pros:** Can dramatically improve output quality for subjective or complex tasks. Allows leveraging different model strengths (e.g., fast generator, thorough critic).
* **Cons:** Significantly increases latency and cost due to multiple LLM calls per iteration. Requires careful prompt engineering for both generator and critic. Loop control logic resides in the application code.

**5. Synthesis: Building Sophisticated Systems**

These patterns are not mutually exclusive. Real-world agentic applications frequently combine them:

* A complex research task might use a **deterministic flow** for initial setup, then an **orchestrator agent** using **agents-as-tools** for parallel web/database searches, followed by a **handoff** to a specialized report-writing agent, which internally uses an **LLM-as-a-judge loop** for self-refinement before producing the final output, all potentially protected by input/output **guardrails**.

The OpenAI Agents SDK provides the necessary primitives (`Agent`, `Tool`, `Handoff`, `Guardrail`, `Runner`) and DX features (decorators, context, structured outputs, tracing) to implement these diverse patterns effectively. Building successful agents becomes less about single-shot prompting and more about thoughtful architectural design: decomposing problems, defining specialized agent roles, managing control flow (via code, handoffs, or tools), controlling context, ensuring safety, and establishing loops for refinement and evaluation.

---

Okay, let's synthesize the extensive information provided, focusing on the overarching vision, the interplay between the different components, and the implications for developers building the next generation of AI applications on OpenAI's platform.

**The Converging Platform: Enabling the Agentic Future**

OpenAI's announcements and documentation paint a clear picture: the future of their platform is centered around enabling sophisticated **AI agents** – systems capable of understanding complex goals, reasoning through multi-step plans, interacting with diverse information sources and tools, and ultimately taking action autonomously on behalf of users. This represents a significant evolution from earlier focuses primarily on text generation or simpler chatbot interactions.

The comprehensive suite of tools – the foundational Responses API, integrated Built-in Tools, the orchestrating Agents SDK, and supporting features like Vector Stores, Tracing, and Evaluations – are not disparate releases but tightly integrated components designed to work together, addressing the key challenges developers faced in building reliable agentic systems previously.

**1. The Responses API: The Unified, Future-Proof Core**

The Responses API (`/v1/responses`) stands as the central pillar of this new architecture. It's meticulously designed to be the primary interface for developers building new agentic applications.

* **Synthesis of Strengths:** It successfully merges the stateless simplicity preferred by many developers (like Chat Completions) with the powerful tool-using and state-management capabilities pioneered by the Assistants API. The `previous_response_id` mechanism offers an elegant way to handle conversation history without forcing developers to manage ever-growing message lists client-side or adopt the full object model of Assistants.
* **Designed for Complexity:** The API anticipates agents needing multiple "turns" of reasoning or tool use *within* what the developer perceives as a single logical request. While custom function calls still require a developer-managed loop (call -> get result -> call again), the API's internal handling of built-in tools and potential future multi-step reasoning aims to abstract away some of this complexity over time.
* **Foundation for Tools:** It acts as the natural integration point for OpenAI's expanding suite of built-in tools (Web Search, File Search, Computer Use). Providing these as first-party capabilities callable via a unified `tools` parameter significantly reduces integration friction.
* **Strategic Direction:** OpenAI's clear recommendation to use Responses for new projects and the planned (though distant) deprecation of the Assistants API solidify Responses as the target platform endpoint. Its design as a superset of Chat Completions ensures a smooth adoption path while offering significantly more power for agentic use cases.
* **Developer Experience Focus:** Features like the improved streaming event granularity, the `output_text` SDK helper, dynamic `instructions` handling, and the `"auto"` truncation strategy demonstrate a focus on addressing pain points and simplifying common development patterns observed with previous APIs.
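
To make these points concrete, here is a minimal two-turn sketch using the conveniences named above (`store`, `previous_response_id`, `truncation="auto"`, and the `output_text` helper); treat the exact parameter spellings as assumptions to confirm against the current API reference.

```python
from openai import OpenAI

client = OpenAI()

# Turn 1: store the response so the follow-up can reference it server-side
first = client.responses.create(
    model="gpt-4o-mini",
    input="Give me a one-sentence summary of the Agents SDK.",
    store=True,
)
print(first.output_text)  # SDK helper that concatenates the text output items

# Turn 2: continue the conversation without resending the full history
follow_up = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,
    input="Now expand that into three bullet points.",
    truncation="auto",  # let the API trim middle context if the window overflows
)
print(follow_up.output_text)
```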

**2. Integrated Tools: The Agent's Senses and Hands**

The built-in tools are not mere add-ons; they are fundamental capabilities granting agents the necessary faculties to interact with the world beyond their training data.

* **Web Search:** Provides the crucial ability to access real-time, dynamic information from the internet, grounded with verifiable citations. Essential for current events, research, and fact-checking.
* **File Search (RAG):** Enables agents to leverage specific knowledge contained within user-provided documents via Vector Stores. This allows for grounding responses in proprietary data, company policies, user manuals, etc., making agents vastly more useful in specific domains. OpenAI manages the underlying RAG complexity (chunking, embedding, retrieval).
* **Computer Use:** The most "agentic" tool, allowing models (specifically CUA) to perceive UI elements and generate sequences of mouse/keyboard actions to automate tasks in browsers (initially) and potentially operating systems (future, with extreme caution). This bridges the gap to interacting with systems lacking APIs but requires significant developer responsibility for safe execution.
* **Synergy:** The true power emerges when these tools are potentially used in combination within a single agent turn or orchestrated sequence (e.g., search the web, refine query based on results, search internal files, then potentially use computer use to act on findings). The Responses API provides the framework for the model to request these tools, and the Agents SDK helps orchestrate their execution and the subsequent reasoning.

**3. The Agents SDK: Orchestrating Complexity**

If the Responses API is the engine, the Agents SDK is the chassis, steering, and control system. It provides the client-side framework for structuring, coordinating, and managing agent behavior.

* **Abstraction Layer:** It manages the interaction loop with the API (calling the model, handling tool calls/results, managing handoffs) based on the defined `Agent` configurations.
* **Modularity:** The `Agent` primitive encourages breaking down complex tasks into smaller, specialized units.
* **Control Flow Mechanisms:**
* **Handoffs:** Enable intelligent, LLM-driven delegation between specialized agents, crucial for routing and complex workflow management. `input_filter` provides vital context control.
* **Agents-as-Tools:** Allows an orchestrator agent to query sub-agents without relinquishing control, useful for parallel processing or information gathering.
* **Code-Driven Orchestration:** Developers retain full control to sequence agent runs deterministically using standard Python async patterns, calling `Runner.run` multiple times and managing data flow explicitly.
* **Safety and Validation:** `Guardrails` integrate safety checks (relevance, policy adherence) concurrently, allowing for early termination via tripwires.
* **Developer Experience:** `@function_tool` automation, typed context, lifecycle hooks (`RunHooks`, `AgentHooks`), and structured results (`RunResult`) significantly streamline development.
* **Open Source and Extensibility:** Its open-source nature invites community contribution and adaptation. Support for custom `ModelProvider`s (including the LiteLLM integration) allows using models beyond OpenAI's, enhancing flexibility.
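
A minimal orchestration sketch, assuming the `openai-agents` package is installed and `OPENAI_API_KEY` is set; the `lookup_order` tool is hypothetical and stubbed:

```python
import asyncio

from agents import Agent, Runner, function_tool


@function_tool
def lookup_order(order_id: str) -> str:
    """Return the shipping status for an order (stubbed backend)."""
    return f"Order {order_id} is out for delivery."


support_agent = Agent(
    name="Support agent",
    instructions="Help customers with order questions. Use tools when needed.",
    tools=[lookup_order],
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Hand order-related questions to the support agent; answer anything else yourself.",
    handoffs=[support_agent],
)


async def main() -> None:
    result = await Runner.run(triage_agent, "Where is order 42?")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```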

**4. The Development Lifecycle: An Integrated Ecosystem**

OpenAI is building not just individual components but an integrated platform supporting the end-to-end lifecycle of agent development.

* **Build & Deploy:** The Responses API and Agents SDK provide the core tools for initial creation and orchestration.
* **Monitor & Debug:** **Tracing** is the critical component here. It automatically captures detailed execution flows, providing invaluable insight into complex agent interactions, tool usage, and handoffs, making debugging feasible. The ability to visualize traces on a dashboard is key.
* **Evaluate & Improve:** The **Evals API** allows systematic testing of agent performance against defined criteria, using data potentially sourced from stored Responses API interactions (enabled by `store=true`). This enables objective comparison and regression testing. **Fine-tuning** provides a path to further specialize models based on evaluation results or specific task data. **Vector Stores** allow for continuous knowledge updates for RAG-based agents.

This integrated loop (Build -> Run/Store -> Trace/Monitor -> Evaluate -> Fine-tune/Refine) is essential for moving agents from prototypes to reliable production systems.

**5. Advanced Capabilities and Future Directions**

* **Multimodality:** The foundation is laid (Responses API handling image input, models like GPT-4o). Agents can potentially reason about visual information alongside text and tool outputs.
* **Realtime & Voice:** While the core documentation focused on text/tool-based agents, the mention of the Realtime API and voice support in the Agents SDK points towards enabling conversational voice agents with similar orchestration capabilities.
* **Reasoning Models (o-series):** Newer models explicitly designed for better planning, reasoning, and handling complex tasks are central to enabling more autonomous agent behavior. Configuration options like `reasoning_effort` provide some control over this process.
* **Platform Evolution:** OpenAI explicitly states plans to release additional tools and capabilities to further simplify agent building, suggesting ongoing investment in this area. The Responses API is designed to accommodate these future enhancements.

**6. Challenges and Considerations for Developers**

Despite the advancements, building and deploying sophisticated agents presents ongoing challenges:

* **Reliability & Predictability:** Ensuring agents consistently achieve complex goals, handle edge cases gracefully, and avoid harmful or nonsensical actions remains a significant challenge, especially for open-ended tasks. The CUA benchmarks clearly show that full UI automation is far from solved.
* **Safety & Alignment:** As agents become more autonomous and capable of taking actions (especially via Computer Use), ensuring they operate within safe boundaries, respect policies, and avoid misuse becomes paramount. Robust guardrails, sandboxing, human oversight, and careful prompt engineering are non-negotiable.
* **Cost Management:** Multi-step workflows, potentially involving multiple LLM calls per user turn (especially with critique loops or complex tool interactions), use of specialized models (CUA), per-query tool fees (search), and data storage (Vector Stores) can lead to significant operational costs. Careful design and monitoring are required.
* **Debugging Complexity:** While tracing helps immensely, debugging the emergent behavior of multiple interacting LLMs and tools can still be intricate. Understanding *why* an agent made a particular decision or failed requires deep analysis of traces and potentially intermediate states.
* **Evaluation Complexity:** Defining effective evaluation metrics for complex, multi-turn, goal-oriented agent tasks is harder than evaluating simple classification or generation. It requires defining success criteria carefully and potentially involving human judgment alongside automated checks.
* **Context Management:** Even with `previous_response_id` and `truncation="auto"`, effectively managing the context window for very long interactions or complex state transitions requires careful consideration, potentially involving summarization techniques or more sophisticated state tracking outside the core API/SDK mechanisms.

**7. The Road Ahead: Implications for Development**

This platform shift signifies a move towards:

* **System-Level Thinking:** Developers need to think beyond single prompts and design entire workflows, considering agent roles, interaction patterns, data flow, and safety mechanisms.
* **Orchestration as a Core Skill:** Proficiency with frameworks like the Agents SDK or similar orchestration tools becomes essential for managing complexity.
* **Leveraging Integrated Tools:** Effectively utilizing built-in tools like Search and File Search can provide significant advantages over building custom solutions.
* **Emphasis on Evaluation and Iteration:** The development process becomes more iterative, relying heavily on tracing, evaluation, and refinement loops to improve agent performance and reliability.
* **Increased Responsibility:** With agents capable of taking more significant actions (especially Computer Use), developers bear a greater responsibility for ensuring safe and ethical deployment.

---

**Supporting Infrastructure: Vector Stores, Files, and Uploads**

While the Responses API and Agents SDK provide the core interaction and orchestration layers, agents often rely on external data. OpenAI provides managed infrastructure for handling files and enabling efficient retrieval, primarily through the Vector Stores and Files APIs.

**1. Vector Stores API (`/v1/vector_stores`): The Agent's Persistent Knowledge Base**

Vector Stores are more than just a backend for the File Search tool; they are a distinct platform capability for managing and searching embedded document collections.

* **Core Functionality:** Create, manage, and query collections of files optimized for semantic search. They underpin the `file_search` tool but can also be queried directly.
* **Lifecycle Management:**
* **Creation (`POST /v1/vector_stores`):** Creates an empty store or initializes it with `file_ids`. Crucially allows specifying `chunking_strategy` (auto, static with defined token sizes/overlap) and `expires_after` policy (anchor to creation or last activity, duration in days) for automatic cleanup and cost management.
```python
# Example: creating a Vector Store with custom chunking and expiration
from openai import OpenAI

client = OpenAI()

# Assume file_ids = [file1.id, file2.id] already exists from prior uploads
vs = client.vector_stores.create(
    name="Project Codex - 7 Day Expiry",
    file_ids=file_ids,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 600, "chunk_overlap_tokens": 150},
    },
    expires_after={"anchor": "last_active_at", "days": 7},
)
print(f"Created Vector Store {vs.id} with status {vs.status}")
# Status will likely be 'in_progress' initially while files are chunked and embedded
```
* **File Association (`POST /v1/vector_stores/{vs_id}/files`):** Adds individual files to an existing store. Also supports setting file-specific `attributes` (metadata key-value pairs) and overriding the store's default `chunking_strategy` for that specific file.
* **Batch File Association (`POST /v1/vector_stores/{vs_id}/file_batches`):** Efficiently adds multiple `file_ids` in one operation. Returns a batch object (`vector_store.file_batch`) whose status can be monitored (`GET /v1/vector_stores/{vs_id}/file_batches/{batch_id}`) and potentially cancelled (`POST .../cancel`). Files within the batch can be listed (`GET .../files`).
* **Listing/Retrieving/Modifying/Deleting:** Standard CRUD operations are available for Vector Stores (`GET`, `POST /{id}`, `DELETE /{id}`) and files within them (`GET /files`, `GET /files/{file_id}`, `DELETE /files/{file_id}`). Modification mainly covers `name`, `expires_after`, and `metadata`. Deleting a file from a store removes its chunks but doesn't delete the underlying File object.
* **Status Tracking:** Both Vector Stores and files within them have a `status` field (`in_progress`, `completed`, `failed`, `cancelled`, `expired`). File processing (chunking/embedding) happens asynchronously after creation/association. Agents should ideally wait for `completed` status before relying on file search for those files. The `file_counts` object on the Vector Store provides a summary.
* **Direct Semantic Search (`POST /v1/vector_stores/{vs_id}/search`):** This powerful endpoint allows querying a Vector Store *outside* the context of the Responses or Assistants APIs.
* **Inputs:** Takes a `query` string, `max_num_results`, optional `filters` based on file attributes, and optional `ranking_options`. A `rewrite_query` flag suggests potential internal query expansion/rewriting for better results.
* **Output:** Returns a paginated list (`vector_store.search_results.page`) of relevant chunks, including the `file_id`, `filename`, `score`, source file `attributes`, and the actual chunk `content` (usually text).
* **Use Cases:** Powering custom RAG implementations, semantic search features in applications, data exploration tools, feeding context to *other* APIs (like Chat Completions) if needed.
* **Attributes and Filtering:** Attaching metadata (`attributes`) to files when adding them to a Vector Store (`POST .../files` or `POST .../file_batches`) enables powerful filtering during search (`POST .../search` using the `filters` parameter). This allows searching only within specific subsets of documents (e.g., files tagged with a certain `author` or `date` range).
* **Analysis:** The Vector Stores API provides a robust, managed solution for document storage, chunking, embedding, and semantic search. Its integration with File Search is key for agents, but the standalone `/search` endpoint and attribute filtering make it a versatile tool for broader RAG and search applications on the platform. The batch operations and status tracking are essential for managing large document sets efficiently. Expiration policies are a welcome addition for cost control.
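
A hedged sketch of the standalone search endpoint described above, assuming a fully processed store (hypothetical ID `vs_abc123`) whose files were added with an `author` attribute, and an SDK version that exposes `client.vector_stores.search`:

```python
from openai import OpenAI

client = OpenAI()

results = client.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the refund window for annual plans?",
    max_num_results=5,
    filters={"type": "eq", "key": "author", "value": "legal-team"},
)

for hit in results:
    print(hit.filename, hit.score)
    for part in hit.content:  # chunk content parts (usually text)
        print(part.text[:200])
```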

**2. Files API (`/v1/files`) & Uploads API (`/v1/uploads`): Handling the Raw Data**

These APIs are the foundation for getting document data onto the OpenAI platform for use with Vector Stores, Fine-tuning, Batch API, Assistants, etc.

* **Files API (`/v1/files`):**
* **Purpose:** Manages individual file objects within an organization's storage space (limit 100 GB total, 512 MB per file initially).
* **Upload (`POST /v1/files`):** The standard way to upload smaller files (<512 MB). Requires specifying the `file` data and the `purpose` (e.g., `assistants`, `fine-tune`, `batch`, `vision`, `user_data`, `evals`). The `purpose` dictates validation checks and how the file can be used. Returns a `File` object with an `id`.
* **Operations:** List files (with filtering by `purpose`), retrieve file metadata (`GET /{file_id}`), delete a file (`DELETE /{file_id}`), and retrieve the raw file content (`GET /{file_id}/content`).
* **Uploads API (`/v1/uploads`): For Large Files**
* **Purpose:** Handles uploading files larger than the standard API limit, up to 8 GB, by breaking them into parts. Essential for large datasets or documents.
* **Workflow:**
1. **Create Upload (`POST /v1/uploads`):** Initiates the process. Requires specifying the total expected `bytes`, `filename`, `mime_type` (which must match allowed types for the `purpose`), and `purpose`. Returns an `Upload` object (`status: "pending"`) with an `upload_id`. Uploads expire after 1 hour if not completed.
2. **Add Parts (`POST /v1/uploads/{upload_id}/parts`):** Upload individual chunks (parts) of the file (up to 64 MB each). Can be done in parallel. Each successful part upload returns a `Part` object with a `part_id`.
3. **Complete Upload (`POST /v1/uploads/{upload_id}/complete`):** Finalizes the upload. Requires providing the `part_ids` in the *correct order*. The total bytes of the completed parts must match the `bytes` specified during creation. Optionally include an `md5` checksum for verification. Returns the `Upload` object (`status: "completed"`) which now contains a nested `File` object (with its own file ID) that is ready for use like any other file on the platform.
4. **Cancel Upload (`POST /v1/uploads/{upload_id}/cancel`):** Aborts an in-progress upload.
* **Analysis:** The Files API is the standard gateway for bringing data in. The Uploads API provides a necessary solution for large files, breaking down a potentially unreliable single large transfer into manageable, parallelizable chunks with a final assembly step. Understanding the `purpose` parameter is critical as it governs file validation and usability across different OpenAI features.
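
A sketch of the chunked upload flow, assuming `large_dataset.jsonl` is a hypothetical local file and that `text/jsonl` is an accepted MIME type for the `fine-tune` purpose:

```python
import os

from openai import OpenAI

client = OpenAI()

PATH = "large_dataset.jsonl"   # hypothetical large file
PART_SIZE = 64 * 1024 * 1024   # 64 MB, the documented per-part maximum

# 1. Create the Upload container with the expected total size and purpose.
upload = client.uploads.create(
    bytes=os.path.getsize(PATH),
    filename=os.path.basename(PATH),
    mime_type="text/jsonl",
    purpose="fine-tune",
)

# 2. Upload parts (sequential here; parts may also be sent in parallel).
part_ids = []
with open(PATH, "rb") as f:
    while chunk := f.read(PART_SIZE):
        part = client.uploads.parts.create(upload_id=upload.id, data=chunk)
        part_ids.append(part.id)

# 3. Complete, supplying part IDs in order; the nested File is then usable platform-wide.
completed = client.uploads.complete(upload_id=upload.id, part_ids=part_ids)
print(completed.status, completed.file.id)
```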

**Context on Deprecating Assistants API Objects**

While the focus is on the Responses API and Agents SDK, understanding the structure of the outgoing Assistants API objects is helpful for developers migrating or comparing approaches.

* **`Assistant` (`/v1/assistants`):** Represents the configured agent (model, instructions, tools, resources). Conceptually similar to `agents.Agent` in the SDK, but managed via API state rather than just client-side code. Contained `tool_resources` to link files/vector stores directly.
* **`Thread` (`/v1/threads`):** A persistent API object serving as the conversation container. Analogous to the history implicitly managed by `previous_response_id` in the Responses API, or explicitly managed by the developer across `Runner.run` calls in the SDK. The SDK has no direct `Thread` object; history is carried as input lists.
* **`Message` (`/v1/threads/{thread_id}/messages`):** Individual user or assistant messages within a Thread. Included `content` (text/image), `role`, `attachments`. Similar to message items in the Responses API input/output and SDK `RunItem`s.
* **`Run` (`/v1/threads/{thread_id}/runs`):** Represents an execution of an Assistant on a Thread. Triggered by `POST`, its `status` progressed through states like `queued`, `in_progress`, `requires_action`, `completed`, `failed`, etc. Contained detailed status, error info, usage, and linked to generated messages/steps. Conceptually similar to the execution managed by `agents.Runner`, but represented as a persistent API object. The `requires_action` status (specifically `submit_tool_outputs`) was the mechanism for handling function calls, requiring a separate API call (`POST .../submit_tool_outputs`) from the developer.
* **`Run Step` (`/v1/threads/{thread_id}/runs/{run_id}/steps`):** Detailed breakdown of a Run's execution, showing `message_creation` steps and `tool_calls` steps (including Code Interpreter execution details or File Search queries). Analogous to the information captured in SDK Tracing spans (`generation_span`, `function_span`, tool-specific spans).
* **Code Interpreter:** Notably, the Code Interpreter tool (allowing the Assistant to write and run Python code in a sandboxed environment, including generating files/images) was a key feature of the Assistants API. The documentation states OpenAI is working to bring this tool to the Responses API, implying it wasn't available there at the time of the announcement (March 2025) but is planned for feature parity before the Assistant API's deprecation (mid-2026 target).
* **Migration Context:** Developers migrating from Assistants will need to adapt to managing conversation state via `previous_response_id` or client-side history lists (`result.to_input_list()`), using the Agents SDK for orchestration instead of polling Run statuses, and eventually using the Code Interpreter tool via the Responses API/SDK once available. The underlying concepts of agents, tools, and conversation flow remain, but the implementation shifts from API-managed state objects (Thread, Run) to a more stateless API (Responses) orchestrated by a client-side SDK (Agents).

**Other Supporting APIs in the Agentic Ecosystem**

While not the primary focus of the agent-building announcement, other OpenAI APIs play supporting roles:

* **Embeddings API (`/v1/embeddings`):** Essential for developers building *custom* RAG or memory systems *outside* of OpenAI's managed Vector Stores. Allows converting text into vector representations for similarity search in external vector databases. The Agents SDK could potentially interact with such custom stores via function tools.
* **Moderations API (`/v1/moderations`):** A crucial safety tool. Can be used alongside or within Agents SDK Guardrails to classify input or output text/images against OpenAI's safety policies (hate, harassment, violence, etc.). Provides category scores for granular analysis.
* **Fine-tuning API (`/v1/fine_tuning/jobs`):** Allows specializing models (like `gpt-4o-mini`) on custom datasets. In an agentic context, data collected from agent interactions (captured via `store=true` and potentially analyzed via Tracing/Evals) could be used to fine-tune models for better performance on specific tasks, tool usage patterns, or desired response styles. Checkpoint management and sharing via permissions (`/v1/fine_tuning/checkpoints/.../permissions`) allow controlled deployment within an organization.
* **Batch API (`/v1/batches`):** Offers asynchronous processing of large numbers of API requests (including to `/v1/responses`, `/v1/chat/completions`, `/v1/embeddings`) at a discount, with results returned within 24 hours. Useful for large-scale evaluations, data processing, or non-interactive agent tasks.
* **Audio APIs (`/v1/audio/...`):**
* **Speech (TTS):** Generates audio from text (`/v1/audio/speech`), enabling agents to respond with voice. Supports various voices and speed control.
* **Transcription (STT):** Converts audio to text (`/v1/audio/transcriptions`), allowing agents to understand spoken input. Supports language detection and different output formats.
* **Translation:** Translates audio directly into English text (`/v1/audio/translations`).
* **Realtime API (`/v1/realtime/...`):** Specifically designed for low-latency, bidirectional audio/text communication (via WebRTC/WebSockets) with models like `gpt-4o-realtime-preview`. Essential for building conversational voice agents. Involves managing sessions, audio buffers, and handling specific client/server events for turn detection, transcription, and audio streaming. The Agents SDK documentation mentions voice support, likely integrating with this API or similar capabilities.
* **Images API (`/v1/images/...`):** Generates or manipulates images. While less central to typical agent orchestration, an agent could potentially use function calling to invoke this API for creative tasks or image editing based on user requests or its own reasoning.
* **Completions API (Legacy) (`/v1/completions`):** The older text-completion endpoint. While still functional, OpenAI strongly recommends using Chat Completions or Responses API for newer models and features. Its relevance diminishes significantly in the context of the new agent-focused platform.
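
As one example of these supporting roles, the sketch below wires the Moderations API into an Agents SDK input guardrail. The agent instructions are illustrative, and the guardrail signature is assumed from the SDK's documented primitives:

```python
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    input_guardrail,
)
from openai import AsyncOpenAI

oai = AsyncOpenAI()


@input_guardrail
async def moderation_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, user_input
) -> GuardrailFunctionOutput:
    """Screen user input with the Moderations API before the agent runs."""
    text = user_input if isinstance(user_input, str) else str(user_input)
    result = await oai.moderations.create(model="omni-moderation-latest", input=text)
    flagged = result.results[0].flagged
    return GuardrailFunctionOutput(output_info=result.results[0], tripwire_triggered=flagged)


assistant = Agent(
    name="Assistant",
    instructions="Answer user questions helpfully.",
    input_guardrails=[moderation_guardrail],
)
# If the tripwire fires, Runner.run raises InputGuardrailTripwireTriggered.
```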

**SDK Configuration and Utilities**

* **API Setup:** `set_default_openai_key`, `set_default_openai_client`, `set_default_openai_api` provide programmatic control over the default API key, client instance (useful for custom base URLs or providers with OpenAI-compatible APIs), and target API endpoint (Responses vs. Chat Completions) used by the SDK, overriding environment variables if needed.
* **Tracing Configuration:** `set_tracing_export_api_key` allows using a separate key for uploading traces. `set_tracing_disabled` globally turns tracing off. `add_trace_processor`/`set_trace_processors` enable integration with custom/third-party observability backends.
* **Logging:** `enable_verbose_stdout_logging()` provides easy debug output. Standard Python logging configuration (`logging.getLogger("openai.agents")`) allows fine-grained control. Environment variables (`OPENAI_AGENTS_DONT_LOG_MODEL_DATA`, `OPENAI_AGENTS_DONT_LOG_TOOL_DATA`) help manage sensitive data in logs.
* **Visualization (`draw_graph`):** The `agents.extensions.visualization.draw_graph` function (requires `openai-agents[viz]`) generates Graphviz diagrams of agent structures, showing agents, tools, and handoff relationships. Useful for understanding and documenting complex agent architectures.
* **Computer Interface (`AsyncComputer`):** Defines the contract (`screenshot`, `click`, `type`, etc.) that developers must implement to connect the `ComputerTool` to an actual automation backend (like Playwright, Selenium, or OS-level libraries). The SDK provides the `LocalPlaywrightComputer` as a reference implementation for browser automation.
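
A short configuration sketch using these utilities (the key values are placeholders):

```python
import logging

from agents import (
    enable_verbose_stdout_logging,
    set_default_openai_key,
    set_tracing_export_api_key,
)

set_default_openai_key("sk-proj-...")             # placeholder; overrides OPENAI_API_KEY
set_tracing_export_api_key("sk-proj-traces-...")  # placeholder; separate key for trace upload

# Quick debug output during development.
enable_verbose_stdout_logging()

# Fine-grained control is also possible via the standard logging module,
# e.g. quieting the SDK logger back down for production runs:
logging.getLogger("openai.agents").setLevel(logging.WARNING)
```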

**API Operational Details**

* **Authentication:** Standard `Authorization: Bearer OPENAI_API_KEY`. Organization/Project context can be specified via headers (`OpenAI-Organization`, `OpenAI-Project`). Admin APIs require special Admin API Keys obtainable only by Org Owners.
* **Debugging:** `x-request-id` header in responses is crucial for troubleshooting with OpenAI support. SDKs typically expose this ID on response objects.
* **Rate Limiting:** Responses include `x-ratelimit-*` headers detailing request/token limits, remaining quotas, and reset times. Applications need robust handling for rate limit errors (typically HTTP 429). Project-level rate limits can be managed via the Admin API.
* **Backward Compatibility:** OpenAI commits to stability for the REST API (v1), SDKs (semantic versioning), and model families (`gpt-4o`), but *not* for specific model snapshot behavior or prompting nuances between versions. Pinned model versions and evaluations are recommended for ensuring consistent application behavior. Backwards-compatible changes (new endpoints, optional parameters, added response properties) are expected, while breaking changes are rare and announced via changelogs.
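
A hedged sketch for capturing these headers with the Python SDK's raw-response wrapper, assuming the installed SDK version exposes `with_raw_response` on the Responses resource:

```python
from openai import OpenAI

client = OpenAI()

raw = client.responses.with_raw_response.create(model="gpt-4o", input="ping")

print("request id:", raw.headers.get("x-request-id"))
print("requests remaining:", raw.headers.get("x-ratelimit-remaining-requests"))

response = raw.parse()  # the regular Response object
print(response.output_text)
```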

**Final Synthesis: A Maturing Platform for Intelligent Automation**

The collection of APIs, the Agents SDK, built-in tools, and supporting infrastructure detailed in the documentation represent a significant maturation of OpenAI's platform towards enabling sophisticated, goal-oriented AI agents.

* **Layered Architecture:** There's a clear layering: Foundational Models & APIs (Responses, Vector Stores, Files, etc.) providing capabilities -> Agents SDK providing orchestration, structure, and DX -> Developer Application implementing specific logic, UI, and potentially custom tool backends/context.
* **Reduced Undifferentiated Heavy Lifting:** OpenAI is absorbing complexity related to RAG (Vector Stores/File Search), web interaction (Web Search), UI automation intent (Computer Use), and basic orchestration loops (Agents SDK Runner), allowing developers to focus higher up the stack.
* **Emphasis on Lifecycle Management:** The inclusion of integrated Tracing, the Evals API, Fine-tuning capabilities, and configuration management APIs points towards supporting the full MLOps lifecycle for agentic systems, not just initial development.
* **Flexibility and Control:** While providing managed components, the platform retains flexibility through custom function tools, dynamic instructions/hooks, context injection, configurable handoff filters, custom model providers, and extensible tracing.
* **Safety as a Key Concern:** Explicit discussion of risks (especially with Computer Use), provision of safety tools (Moderations API, Guardrails), and controls over data storage/tracing indicate an awareness of the responsibilities accompanying more capable agents.

Building agents on this platform requires understanding the interplay between the stateless power of the Responses API, the client-side orchestration provided by the Agents SDK, the capabilities granted by built-in and custom tools, and the surrounding ecosystem for data management, evaluation, and monitoring. It's a powerful, albeit complex, toolkit poised to redefine how developers integrate AI to automate tasks and create intelligent applications.

---

**Platform Management and Operational Considerations**

Building and deploying agents involves more than just writing code; it requires managing resources, controlling access, monitoring usage, and ensuring security. OpenAI provides administrative APIs and features to support these operational aspects.

**1. Organization and Project Administration (Admin API)**

A key aspect, especially for enterprise use, is the ability to manage users, projects, API keys, and permissions programmatically. This is handled via endpoints under `/v1/organization/...` and requires a special **Admin API Key**, obtainable only by Organization Owners from the platform settings.

* **Hierarchical Structure:** OpenAI uses a hierarchy: Organization -> Project -> User/Service Account/API Key.
* **Organization:** The top-level entity. Settings like global rate limits, billing, and user roles (Owner, Reader) apply here.
* **Project:** A container within an organization used to group resources, manage access, and track usage/costs separately. Every organization has a default project. Projects can be created, named, modified, and *archived* (but not deleted, except potentially the default project). Archived projects become inactive.
* **User Management (`/v1/organization/users`, `/v1/organization/invites`):**
* Org Owners can invite users via email (`POST /v1/organization/invites`), assigning an organization-level role (`owner` or `reader`). Invites expire and can be revoked (`DELETE /v1/organization/invites/{invite_id}`).
* Org Owners can list users (`GET /v1/organization/users`), retrieve user details (`GET /{user_id}`), modify their org role (`POST /{user_id}`), and remove them (`DELETE /{user_id}`).
* Users must accept an invite to become part of the organization.
* **Project Membership (`/v1/organization/projects/{project_id}/users`):**
* Users must first be members of the organization before being added to a project.
* Project roles are distinct from org roles: `owner` (full control within the project) or `member` (can use project resources, typically manage their own keys).
* Project Owners (or Org Owners) can add existing org users to a project (`POST .../users`), list members (`GET .../users`), modify project roles (`POST .../{user_id}`), and remove users from the project (`DELETE .../{user_id}`).
* Invites can optionally grant project membership upon acceptance.
* **Service Accounts (`/v1/organization/projects/{project_id}/service_accounts`):**
* **Purpose:** Represent non-human actors (e.g., backend applications, CI/CD pipelines). Crucially, their API keys are *not* tied to a specific user account and remain valid even if the creating user leaves the organization. This is essential for production applications.
* **Management:** Created within a specific project (`POST .../service_accounts`). Returns the service account details *and* an unredacted API key (`value` field) - **this key must be stored securely immediately as it won't be shown again**. Service accounts can be listed, retrieved, and deleted (`DELETE .../{service_account_id}`). They also have project roles (`owner` or `member`).
* **API Key Management:**
* **Org-Level Listing (`GET /v1/organization/admin_api_keys`):** Lists *all* Admin API keys and Project API keys across the organization (requires Admin key).
* **Project-Level Keys (`/v1/organization/projects/{project_id}/api_keys`):** Lists keys *within* a specific project. Can retrieve details (`GET .../{key_id}`) showing the redacted key (`sk-...abc`), name, creation/last used times, and owner (which can be a User or a Service Account). Crucially, Admin API can *delete* (`DELETE .../{key_id}`) keys owned by *users*, but **cannot create user keys** (users must generate their own keys via the platform UI). It *can* manage the lifecycle of keys associated with Service Accounts (as creation returns the key, and deletion removes the account and implicitly its key).
* **Rate Limits (`/v1/organization/projects/{project_id}/rate_limits`):**
* Admins can view (`GET`) and modify (`POST .../{rate_limit_id}`) rate limits (Requests Per Minute, Tokens Per Minute, etc.) on a per-model, per-project basis.
* Project limits must be less than or equal to the overall organization limits. Allows fine-grained control over resource allocation and preventing runaway costs in specific projects.
* **Audit Logs (`/v1/organization/audit_logs`):**
* Provides an immutable log of significant actions within the organization (project creation/archival, user invites/role changes, API key creation/deletion, certificate management, rate limit changes, etc.). Requires opt-in via settings.
* Crucial for security monitoring, compliance, and incident investigation.
* Logs include timestamp (`effective_at`), event `type`, details specific to the event type (e.g., `project.created` includes the project ID), and detailed `actor` information (user ID/email, API key ID, or session details including IP address/user agent for UI actions). Supports filtering by actor, event type, resource ID, project, and time range.
* **Analysis:** The Admin API provides essential governance capabilities for organizations using OpenAI at scale. The distinction between user-owned keys and service accounts is critical for production stability. Audit logs are fundamental for security. Project-level controls for membership and rate limits allow for better resource management and cost attribution within larger teams or companies.
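
A sketch of one common Admin API task, creating a project service account, using plain HTTP against the documented endpoint. The project ID is hypothetical, and the response shape (an `api_key.value` field shown exactly once) follows the description above:

```python
import os

import requests

ADMIN_KEY = os.environ["OPENAI_ADMIN_KEY"]  # Admin API key, obtainable by Org Owners only
PROJECT_ID = "proj_abc123"                  # hypothetical project ID

resp = requests.post(
    f"https://api.openai.com/v1/organization/projects/{PROJECT_ID}/service_accounts",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    json={"name": "billing-pipeline"},
    timeout=30,
)
resp.raise_for_status()
service_account = resp.json()

# The unredacted key appears only in this response; store it in a secret manager immediately.
print(service_account["id"], service_account["api_key"]["value"])
```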

**2. Fine-tuning: Specializing Agents and Models**

Fine-tuning allows adapting base models (like `gpt-4o-mini`) to specific tasks, data formats, or desired response styles, potentially improving agent performance and reliability.

* **Process Overview:**
1. **Prepare Data:** Create training (and optional validation) datasets formatted as JSONL files. The exact format depends on the tuning method and base model type (Chat vs. legacy Completions).
2. **Upload Files:** Use the Files API (`POST /v1/files`) with `purpose="fine-tune"`.
3. **Create Job (`POST /v1/fine_tuning/jobs`):** Specify the `training_file` ID, the base `model` ID, and optionally `validation_file`, `method` (including hyperparameters), `suffix` for the new model name, `integrations` (like Weights & Biases), and `metadata`.
4. **Monitor Job:** Track progress via `GET /v1/fine_tuning/jobs/{job_id}` (shows `status`: `validating_files`, `queued`, `running`, `succeeded`, `failed`, `cancelled`) and detailed events via `GET .../events`.
5. **Use Model:** Once `status` is `succeeded`, the `fine_tuned_model` field contains the ID of the new custom model (e.g., `ft:gpt-4o-mini:my-org:custom-suffix:abc123`). This ID can then be used in the `model` parameter of API calls (Responses, Chat Completions) or within `agents.Agent` definitions.
* **Training Data Formats (JSONL):**
* **Supervised Chat (Default):** Each line is a JSON object containing `{"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}` representing a conversation turn. Can include `tool_calls` in the assistant message and corresponding `tools` definitions at the top level of the JSON object for teaching tool usage.
```json
{"messages": [{"role": "system", "content": "You are a helpful SQL assistant."}, {"role": "user", "content": "Show me active users."}, {"role": "assistant", "content": "SELECT * FROM users WHERE status = 'active';"}]}
{"messages": [{"role": "user", "content": "How many users are inactive?"}], "assistant": {"content": "SELECT COUNT(*) FROM users WHERE status = 'inactive';"}}
```
* **Preference/DPO Chat (`method={"type": "preference"}`):** Each line contains `{"input": {"messages": [...]}, "preferred_completion": [{"role": "assistant", ...}], "non_preferred_completion": [{"role": "assistant", ...}]}`. Used for methods like Direct Preference Optimization (DPO) to teach the model preferences between two possible responses.
* **Legacy Completions:** Each line is `{"prompt": "...", "completion": "..."}`. Used for older base models like `gpt-3.5-turbo-instruct`.
* **Hyperparameters & Method:**
* The `method` object in the job creation request defines the tuning strategy.
* `"supervised"` (default): Standard fine-tuning. Hyperparameters like `n_epochs`, `batch_size`, `learning_rate_multiplier` can be set (often `"auto"` is sufficient).
* `"preference"`: For DPO-style tuning. Specific hyperparameters for this method apply.
* The older top-level `hyperparameters` field is deprecated in favor of nesting them under the `method` object.
* **Checkpoints (`GET .../checkpoints`):** Fine-tuning jobs periodically save checkpoints (`fine_tuning.job.checkpoint` objects). Each checkpoint represents a usable model snapshot at a specific training step (`ft:...:ckpt-step-XXXX`) and includes validation metrics (`train_loss`, `valid_loss`, `valid_mean_token_accuracy`, etc.) if a validation file was provided. This allows selecting the best-performing checkpoint instead of just the final one.
* **Checkpoint Permissions (`/v1/fine_tuning/checkpoints/.../permissions`):** Requires Admin API Key. Org Owners can grant specific *projects* within their organization access to use a fine-tuned model checkpoint created in another project. This enables sharing specialized models across teams without duplicating training efforts. Permissions are managed via `POST` (create) and `DELETE` (revoke).
* **Integrations:** Can enable integrations like Weights & Biases (`wandb`) for more detailed logging and visualization of the fine-tuning process.
* **Analysis:** Fine-tuning is a powerful tool for specializing agents. Being able to fine-tune on conversational data, including tool usage patterns, can significantly improve an agent's ability to follow instructions, use tools correctly, and adopt a specific persona or response style. Checkpoints provide valuable control over model selection based on performance metrics. Cross-project checkpoint permissions are crucial for efficient model sharing in larger organizations.
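
A minimal sketch of the supervised fine-tuning flow, assuming `train.jsonl` exists locally and that the chosen base snapshot is available for tuning on the account:

```python
from openai import OpenAI

client = OpenAI()

# 1. Upload the JSONL training data with the fine-tune purpose.
training = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# 2. Create the job; hyperparameters are nested under the method object.
job = client.fine_tuning.jobs.create(
    training_file=training.id,
    model="gpt-4o-mini-2024-07-18",  # assumed tunable snapshot
    suffix="sql-helper",
    method={"type": "supervised", "supervised": {"hyperparameters": {"n_epochs": "auto"}}},
)

# 3. Poll until succeeded, then use job.fine_tuned_model as the `model` in API calls.
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status, job.fine_tuned_model)
```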

**3. Usage and Cost Monitoring APIs**

Understanding API consumption is vital for managing costs and identifying usage patterns. OpenAI provides granular usage data via `/v1/organization/usage/...` endpoints and aggregated cost data via `/v1/organization/costs`.

* **Usage APIs (`/v1/organization/usage/...`):**
* **Endpoints:** Separate endpoints exist for different usage types: `completions` (covers Responses, Chat Completions, legacy Completions), `embeddings`, `moderations`, `images`, `audio_speeches` (TTS), `audio_transcriptions` (STT), `vector_stores`, `code_interpreter_sessions`.
* **Structure:** Responses are paginated lists of time `bucket` objects. Each bucket represents a time window (`start_time`, `end_time`) determined by the `bucket_width` parameter (`1m`, `1h`, or `1d`).
* **Results within Buckets:** Each bucket contains a `results` array. Each item in `results` represents aggregated usage metrics (e.g., `input_tokens`, `output_tokens`, `num_model_requests`, `images`, `seconds`, `characters`, `usage_bytes`, `num_sessions`) for a specific grouping within that time bucket.
* **`group_by` Parameter:** This is key for slicing data. You can group by `project_id`, `user_id`, `api_key_id`, `model`, or other dimensions specific to the endpoint (like `batch` for completions, `size`/`source` for images). If `group_by` is used, the corresponding fields in the result object (e.g., `result.project_id`, `result.model`) will be populated; otherwise, they are `null`, representing the aggregate across all possibilities for that dimension.
* **Filtering:** Can filter by time range (`start_time`, `end_time`), specific `project_ids`, `user_ids`, `api_key_ids`, `models`, etc.
* **Costs API (`/v1/organization/costs`):**
* **Purpose:** Provides aggregated cost information, designed to align closely with billing invoices. Recommended for financial tracking.
* **Structure:** Similar bucketed structure, currently supporting only `bucket_width="1d"`.
* **`group_by`:** Supports grouping by `project_id` and/or `line_item` (corresponding to invoice line items like "GPT-4o", "Image models", "Fine-tuning", "Vector Store Storage").
* **Output:** Each result shows the `amount` (with `value` and `currency`) for the specified group within the time bucket.
* **Reconciliation:** The documentation notes that granular Usage API data might not *perfectly* reconcile with the Costs API or invoices due to minor timing/recording differences. For billing purposes, Costs API/Dashboard are preferred.
* **Analysis:** These APIs provide essential visibility into resource consumption. The `group_by` functionality is powerful for attributing usage and costs to specific projects, users, keys, or models, enabling chargebacks, identifying high-usage areas, and optimizing spending. The distinction between Usage (detailed metrics) and Costs (billing-aligned value) is important to understand.
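
A hedged sketch of querying daily completions usage grouped by model over raw HTTP; an Admin key is assumed, and the JSON field names follow the bucket/results structure described above:

```python
import os
import time

import requests

ADMIN_KEY = os.environ["OPENAI_ADMIN_KEY"]

resp = requests.get(
    "https://api.openai.com/v1/organization/usage/completions",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    params={
        "start_time": int(time.time()) - 7 * 24 * 3600,  # last 7 days
        "bucket_width": "1d",
        "group_by": "model",
    },
    timeout=30,
)
resp.raise_for_status()

for bucket in resp.json()["data"]:
    for result in bucket["results"]:
        print(result.get("model"), result.get("input_tokens"), result.get("output_tokens"))
```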

**4. Advanced SDK Utilities and Considerations**

Beyond the core primitives, the SDK includes helpers and requires consideration of certain aspects:

* **`ItemHelpers`:** A utility class providing static methods for common tasks related to processing `RunItem`s or API response/input items. Examples:
* `extract_last_text`/`extract_last_content`: Get text from a message item.
* `text_message_output`/`text_message_outputs`: Aggregate text across content parts or multiple message items.
* `tool_call_output_item`: Helper to construct the `FunctionCallOutput` item needed when submitting tool results.
* `input_to_new_input_list`: Standardizes input (string or list) into the list format needed internally.
* **`AsyncComputer` Interface:** When using `ComputerTool`, the developer *must* provide a class implementing this interface. The SDK doesn't perform the UI automation itself; it relies on the developer's implementation (like the provided `LocalPlaywrightComputer` example) to translate the CUA model's actions into actual browser/OS commands. This is a significant implementation detail requiring external libraries (Playwright, Selenium, etc.) and careful handling of the execution environment.
* **Error Handling:** The SDK defines specific exceptions (`MaxTurnsExceeded`, `ModelBehaviorError`, `UserError`, `*GuardrailTripwireTriggered`). Robust applications need `try...except` blocks to catch these and handle them appropriately (e.g., inform the user, retry, log). The `failure_error_function` for function tools provides fine-grained control over how tool errors are presented back to the LLM.
* **Context Management (`RunContextWrapper`):** While simple to use, developers need to ensure the context object passed to `Runner.run` is accessible and contains the necessary data/dependencies for all tools, hooks, and dynamic instructions that might be invoked during that run. The generic typing (`Agent[TContext]`) aids type safety.
* **Beta/Preview Features:** Developers using features marked Beta (Assistants API components - though being superseded, Realtime API, Fine-tuning Checkpoint Permissions) or Preview (Web Search tool, Computer Use tool) should be aware that APIs and functionality might change, and access might be limited. Thorough testing and monitoring are especially important for these features.
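
A brief error-handling sketch around `Runner.run`, assuming the exception classes are importable from `agents.exceptions` as documented:

```python
from agents import Agent, Runner
from agents.exceptions import (
    InputGuardrailTripwireTriggered,
    MaxTurnsExceeded,
    ModelBehaviorError,
)

agent = Agent(name="Assistant", instructions="Be concise.")


async def safe_run(user_input: str) -> str:
    try:
        result = await Runner.run(agent, user_input, max_turns=5)
        return result.final_output
    except InputGuardrailTripwireTriggered:
        return "Sorry, I can't help with that request."
    except MaxTurnsExceeded:
        return "This request took too many steps; please simplify it."
    except ModelBehaviorError:
        # E.g. malformed tool calls or unparsable structured output from the model.
        return "Something went wrong on my side; please try again."
```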

**5. Security, Privacy, and Data Handling**

* **API Keys:** Treat all API keys (regular, Admin, Service Account) as secrets. Do not embed them in client-side code. Use environment variables or secure secret management systems. Rotate keys regularly. Use service accounts for production applications instead of user keys.
* **Data Storage (`store=true/false`):** Be mindful of the `store` parameter (default `true` for Responses API, `false` for Chat Completions). Storing data enables platform features like logging, tracing, and evals based on history, but might conflict with data privacy requirements. OpenAI states they don't train on API data by default, but storage itself might be a concern. Set `store=false` if data residency or minimal logging is required, understanding the trade-off in platform feature usability.
* **Tracing Data:** Traces sent to OpenAI's backend contain execution details. Use `RunConfig(trace_include_sensitive_data=False)` or environment variables to prevent logging potentially sensitive LLM/tool inputs/outputs if needed.
* **User Input (`user` parameter):** Providing a unique end-user identifier helps OpenAI monitor for abuse but requires handling user data appropriately.
* **Computer Use:** Extreme caution is required. Execute actions in sandboxed environments and implement confirmation flows for sensitive operations.

**Final Synthesis: Operationalizing Agentic AI**

This final exploration highlights the critical supporting elements and operational considerations surrounding the core agent-building tools. Effective management of API keys, users, and projects via the Admin API is crucial for secure and organized deployment, especially in team or enterprise settings. Fine-tuning offers a path to significantly enhance agent performance on specific tasks, with checkpoints and permissions enabling controlled rollout. Usage and Costs APIs provide essential visibility for optimization and financial tracking. The nuances of the SDK's helpers, error handling, and the developer's responsibility (especially with `AsyncComputer`) are vital practical details. Lastly, a constant awareness of the beta/preview status of features and careful consideration of security and data privacy implications (`store`, tracing sensitivity) are non-negotiable aspects of building responsible and robust agentic systems on the OpenAI platform. Mastering these operational details is as important as mastering the core agent orchestration logic for successful real-world deployment.

---

The final pieces of the picture are the supporting APIs, specific object structures, advanced SDK features, and operational details that complete the view of the OpenAI platform as tailored for agentic systems.

**1. Realtime Communication: Enabling Voice Agents**

While much of the focus has been on text and tool interactions, the **Realtime API** (Beta) is specifically designed for low-latency, bidirectional communication, primarily enabling conversational voice agents using models like `gpt-4o-realtime-preview`.

* **Core Concept:** Uses WebRTC or WebSockets for near-instantaneous streaming of audio input from the user and audio/text output from the model. This avoids the higher latency inherent in standard request-response HTTP cycles.
* **Session Management:**
* **Ephemeral Tokens (`POST /v1/realtime/sessions`, `/v1/realtime/transcription_sessions`):** Backend applications first create a short-lived session token (`client_secret`) using these REST endpoints. This token is then passed to the client-side application (e.g., browser) to authenticate its WebSocket/WebRTC connection directly to OpenAI's Realtime servers. This avoids exposing long-lived API keys on the client.
* **Session Configuration:** The token creation endpoints allow pre-configuring session parameters (model, voice, audio formats, turn detection, tools, instructions, etc.). These can be updated during the session via client events.
* **Key Configuration Parameters:**
* `model`: Specifies the Realtime-capable model (e.g., `gpt-4o-realtime-preview`).
* `voice`: Selects the TTS voice (alloy, echo, fable, onyx, nova, shimmer, etc.). Cannot be changed mid-session after audio is first generated.
* `input_audio_format`/`output_audio_format`: Specifies audio encoding (pcm16, g711_ulaw, g711_alaw). PCM16 is typically 24kHz, 16-bit mono, little-endian.
* `turn_detection`: Crucial for voice conversations.
* `null`: Manual mode; client explicitly triggers model responses.
* `server_vad`: Uses Voice Activity Detection (VAD) based on audio energy/silence duration (`threshold`, `prefix_padding_ms`, `silence_duration_ms`). Responds after user silence.
* `semantic_vad`: More advanced; uses a model to estimate if the user has *semantically* finished speaking, adjusting silence timeouts dynamically (potentially higher latency but more natural).
* `input_audio_transcription`: Configures asynchronous transcription (using Whisper or `gpt-4o-transcribe`) providing a text representation of user audio (primarily for developer guidance/logging, as the model ingests audio directly).
* `input_audio_noise_reduction`: Applies filtering to incoming audio to improve VAD and model perception.
* `modalities`: Can be set to `["text"]` to disable audio output if only text responses are desired.
* **Client-Server Interaction (Events):** Communication over the WebSocket/WebRTC connection relies on a specific event protocol:
* **Client -> Server:** `session.update`, `input_audio_buffer.append` (send audio chunks), `input_audio_buffer.commit` (manual turn end), `input_audio_buffer.clear`, `conversation.item.create` (inject history/messages), `conversation.item.retrieve`, `conversation.item.truncate` (stop playing assistant audio if interrupted), `conversation.item.delete`, `response.create` (manual trigger), `response.cancel`, `transcription_session.update`, `output_audio_buffer.clear` (WebRTC interrupt).
* **Server -> Client:** `session.created`/`updated`, `conversation.created`/`item.created`/`retrieved`/`truncated`/`deleted`, `input_audio_buffer.committed`/`cleared`/`speech_started`/`speech_stopped` (VAD events), `response.created`/`done`/`output_item.added`/`done`/`content_part.added`/`done`, `response.text.delta`/`done`, `response.audio.delta`/`done`, `response.audio_transcript.delta`/`done`, `response.function_call_arguments.delta`/`done`, `transcription_session.updated`, `rate_limits.updated`, `error`, WebRTC specific `output_audio_buffer.*` events.
* **Analysis:** The Realtime API provides the specialized infrastructure needed for fluid voice interactions. The event-driven protocol, session management via ephemeral tokens, and configurable turn detection are key features. Building voice agents requires handling this complex asynchronous communication flow, likely abstracted by future enhancements or specific patterns within the Agents SDK's mentioned "voice" support.
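
A backend-side sketch for minting an ephemeral Realtime session token over HTTP; the configuration values are illustrative, and the `client_secret` field names follow the description above:

```python
import os

import requests

API_KEY = os.environ["OPENAI_API_KEY"]

resp = requests.post(
    "https://api.openai.com/v1/realtime/sessions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "model": "gpt-4o-realtime-preview",
        "voice": "alloy",
        "turn_detection": {"type": "server_vad", "silence_duration_ms": 500},
    },
    timeout=30,
)
resp.raise_for_status()
session = resp.json()

# Hand only the short-lived secret to the browser; never the long-lived API key.
print(session["client_secret"]["value"], session["client_secret"]["expires_at"])
```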

**2. Peripheral APIs: Audio, Images, Embeddings, Moderation**

While not the core focus of the "agent" announcement, these APIs provide essential supporting capabilities that agents can leverage, typically via function tools.

* **Audio API (`/v1/audio/...`):**
* **Speech (TTS):** `POST /v1/audio/speech` turns agent text responses into audible speech using various models (`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`) and voices. Allows controlling `speed` and format (`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`). `gpt-4o-mini-tts` supports additional `instructions` for voice control.
* **Transcription (STT):** `POST /v1/audio/transcriptions` converts spoken input (various formats) into text using models like `whisper-1` or `gpt-4o-transcribe`. Supports language hints (`language`), context (`prompt`), timestamp granularities (`word`, `segment` - requires `verbose_json` format), and streaming for newer models. Provides `logprobs` for confidence scores.
* **Translation:** `POST /v1/audio/translations` translates audio from other languages *directly* into English text using `whisper-1`.
* **Images API (`/v1/images/...`):**
* **Generation (`/generations`):** Creates images from text prompts using DALL-E 2, DALL-E 3, or `gpt-image-1`. Parameters vary by model (e.g., `size`, `quality`, `style`, `n`). `gpt-image-1` adds controls like `background` transparency, `output_format` (png, jpeg, webp), `output_compression`, and `moderation` level.
* **Edits (`/edits`):** Modifies existing images based on a prompt and an optional mask (DALL-E 2, `gpt-image-1`). `gpt-image-1` can edit multiple input images together.
* **Variations (`/variations`):** Creates variations of an input image (DALL-E 2 only).
* **Embeddings API (`/v1/embeddings`):** Creates vector representations of text using models like `text-embedding-ada-002` or newer `text-embedding-3` models. Key for custom RAG/memory systems. Supports specifying output `dimensions` (for newer models) and `encoding_format` (`float` or `base64`).
* **Moderations API (`/v1/moderations`):** Classifies text and/or images against safety categories using models like `text-moderation-latest` or `omni-moderation-latest`. Returns boolean flags (`flagged`, categories like `harassment`, `violence`), confidence scores (`category_scores`), and which input types triggered flags (`category_applied_input_types`). Essential for content safety filtering.
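
A few illustrative calls that an agent's function tools might wrap; model names and file paths are assumptions, and output handling is kept minimal:

```python
from openai import OpenAI

client = OpenAI()

# Text-to-speech: stream synthesized audio to disk.
with client.audio.speech.with_streaming_response.create(
    model="tts-1", voice="alloy", input="Your order has shipped."
) as speech:
    speech.stream_to_file("reply.mp3")

# Speech-to-text.
transcript = client.audio.transcriptions.create(model="whisper-1", file=open("reply.mp3", "rb"))
print(transcript.text)

# Embeddings for a custom RAG/memory store.
emb = client.embeddings.create(model="text-embedding-3-small", input="refund policy")
print(len(emb.data[0].embedding))

# Safety classification.
mod = client.moderations.create(model="omni-moderation-latest", input="some user text")
print(mod.results[0].flagged)
```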

**3. Advanced SDK Features and Nuances**

Digging deeper into the Agents SDK documentation reveals more granular controls and utilities:

* **Tool Behavior Control (`Agent.tool_use_behavior`):** Provides finer control over what happens after a function tool is called:
* `"run_llm_again"` (Default): Execute tool, send result back to LLM for synthesis/next step.
* `"stop_on_first_tool"`: Execute tool, use its direct output as the `final_output` of the agent run *without* sending the result back to the LLM. Useful for direct action tools where LLM synthesis isn't needed.
* `StopAtTools(stop_at_tool_names=["tool_a", "tool_b"])`: Similar to `stop_on_first_tool`, but only stops if one of the specified tools is called.
* `ToolsToFinalOutputFunction`: A custom function `(RunContextWrapper, List[FunctionToolResult]) -> ToolsToFinalOutputResult` that receives all tool results from a turn and programmatically decides if they constitute the `final_output` or if the LLM needs to run again. Offers maximum control.
* **Tool Choice Reset (`Agent.reset_tool_choice`):** Defaults to `True`. After a tool call, it automatically resets the `ModelSettings.tool_choice` back to `auto` (or the agent's default) for the *next* LLM call within the same run. This prevents infinite loops where `tool_choice="required"` or a specific tool forces repeated calls. Setting to `False` requires careful handling to avoid such loops.
* **Handoff Filters (`agents.extensions.handoff_filters`):** Provides pre-built functions for common context filtering needs during handoffs (e.g., `remove_all_tools`, `remove_system_prompts`, `keep_last_n_items`).
* **Recommended Handoff Prompts (`agents.extensions.handoff_prompt`):** Utilities like `RECOMMENDED_PROMPT_PREFIX` or `prompt_with_handoff_instructions` help ensure source agents' instructions clearly explain how and when to use the available handoffs (which appear as tools to the LLM).
* **Model Interfaces (`Model`, `ModelProvider`):** Defines the abstract base classes for integrating custom LLM backends or lookup logic into the SDK. Implementations like `OpenAIResponsesModel`, `OpenAIChatCompletionsModel`, and `LitellmModel` leverage these.
* **Output Schemas (`AgentOutputSchemaBase`, `AgentOutputSchema`):** Defines how the SDK handles structured outputs, including schema generation and validation logic. Allows customization beyond simple type hints.
* **Function Schema Generation (`FuncSchema`, `function_schema`):** Details the internal representation and generation process for converting Python functions into tool schemas using `inspect`, `griffe`, and `pydantic`. Handles type hints, docstrings (argument descriptions, overall description), and context injection.
* **Run Items (`RunItem` variants):** The structured representation of events within `RunResult.new_items`. Understanding these types (`MessageOutputItem`, `ToolCallItem`, `HandoffOutputItem`, etc.) is key to parsing the detailed execution history programmatically.
* **Visualization (`agents.extensions.visualization`):** The `draw_graph` utility uses Graphviz (`pip install "openai-agents[viz]"`) to generate visual diagrams of agent architectures, showing agent nodes, tool nodes, and handoff edges. Useful for documentation and understanding complex flows.
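
A short sketch combining two of these controls, `tool_use_behavior="stop_on_first_tool"` and the recommended handoff prompt helper; the ticketing backend is hypothetical:

```python
from agents import Agent, function_tool
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def create_ticket(summary: str) -> str:
    """Open a support ticket and return its ID (stubbed backend)."""
    return "TICKET-1234"


# The tool's return value becomes the run's final_output directly,
# skipping the extra LLM synthesis turn.
ticketing_agent = Agent(
    name="Ticketing agent",
    instructions="Create a ticket for the user's issue.",
    tools=[create_ticket],
    tool_use_behavior="stop_on_first_tool",
)

triage_agent = Agent(
    name="Triage agent",
    instructions=prompt_with_handoff_instructions(
        "Decide whether the user needs a ticket; if so, hand off to the ticketing agent."
    ),
    handoffs=[ticketing_agent],
)
```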

**4. Operational Details Revisited**

* **Backward Compatibility:** OpenAI aims for stability in the v1 REST API, official SDKs, and model families (like `gpt-4o`). However, *prompting behavior and specific outputs can change between model snapshots* (e.g., `gpt-4o-2024-05-13` vs `gpt-4o-2024-08-06`). Using pinned model versions (when available) and robust evaluations are the best defense against unexpected behavioral shifts.
* **Debugging Headers:** Logging the `x-request-id` response header is highly recommended for efficient troubleshooting with OpenAI support.
* **Admin Keys:** Reiteration that these powerful keys are required for org/project management APIs and should be protected rigorously.

**5. Comparing Responses API and Chat Completions API Features**

While Responses API is recommended for new agentic work, understanding the differences with the widely used Chat Completions API is important:

| Feature | Responses API (`/v1/responses`) | Chat Completions API (`/v1/chat/completions`) | Notes |
| :---------------------- | :---------------------------------------------- | :-------------------------------------------- | :-------------------------------------------------------------------- |
| **Primary Goal** | Agentic tasks, integrated tools, state | Conversational interaction, text/image gen | Responses is a superset for agentic features. |
| **State Management** | `previous_response_id` for implicit history | Developer manages full `messages` array | Responses simplifies client-side history management. |
| **Built-in Tools** | Yes (Web Search, File Search, Computer Use) | No (Web Search via dedicated models only) | Major differentiator for agents needing these capabilities. |
| **Custom Tools** | Yes (Function Calling via `tools`) | Yes (Function Calling via `tools`) | Both support custom functions. |
| **Streaming** | Rich SSE events (`response.*`, `item.*`, etc.) | Simpler SSE chunks (`choices[].delta.content`) | Responses streaming offers more granularity. |
| **Output Structure** | `output` array (messages, tool calls, etc.) | `choices[].message` (single message/tool_calls) | Responses output is structured as a sequence of items. |
| **Truncation** | `truncation="auto"` recommended | Implicit (older messages dropped) / Error | Responses offers explicit control over context overflow handling. |
| **Data Storage** | `store=true` (default) | `store=false` (default) | Different defaults reflect use cases (agent logs vs chat). |
| **Structured Output** | `text.format = {type:"json_schema", ...}` | `response_format = {type:"json_object" / ...}` | Both support JSON, Responses prefers schema-based. |
| **Reasoning Controls** | `reasoning.effort` (o-series) | `reasoning_effort` (o-series) | Specific parameters for newer reasoning models. |
| **Audio Output** | Not directly mentioned (likely via Realtime) | Yes (`modalities=["audio"]`, specific models) | Chat Completions has direct async audio generation mode. |
| **Max Tokens** | `max_output_tokens` (incl. reasoning) | `max_completion_tokens` (incl. reasoning) | Similar concepts, different parameter names. (`max_tokens` is legacy) |
| **Future Direction** | Recommended for new agent integrations | Fully supported, but Responses preferred | Clear strategic direction towards Responses API for agents. |

**Final Synthesis: A Holistic View**

The OpenAI platform for agents is a complex but powerful ecosystem. The **Responses API** acts as the modern, versatile core engine, handling complex interactions with models and integrated **Tools** (Web Search, File Search, Computer Use). The **Agents SDK** provides the essential client-side framework for **Orchestration**, defining agent behaviors (`Agent`), managing delegation (`Handoff`), ensuring safety (`Guardrail`), integrating custom logic (`@function_tool`), and executing workflows (`Runner`).

Supporting this core are critical infrastructure components like the **Files/Uploads APIs** for data ingestion and **Vector Stores** for persistent, searchable knowledge. Specialized APIs for **Audio**, **Images**, **Embeddings**, and **Moderation** provide further capabilities often leveraged via function tools. The **Realtime API** caters specifically to low-latency voice agents.

Effective development requires not only understanding the core API/SDK but also the operational aspects managed via the **Admin API** (users, projects, keys, rate limits, audit logs), the model specialization offered by **Fine-tuning** (including checkpoints and permissions), and the feedback loop enabled by **Tracing** and **Evals**.

Developers must navigate trade-offs between LLM-driven flexibility (using handoffs, complex prompts) and code-driven predictability (deterministic flows, structured outputs). They must also prioritize safety and cost-effectiveness, utilizing guardrails, sandboxing (for Computer Use), controlling data storage/tracing, and selecting appropriate models for each task. While components like the Assistants API are being phased out in favor of the Responses API + Agents SDK paradigm, understanding their concepts aids migration and highlights the evolution towards a more unified, powerful, and developer-friendly platform for building the future of agentic AI.

---


## Detailed Addendum: API Specifications and SDK Nuances

This section provides supplementary, granular details on specific API endpoints, object schemas, SDK classes, parameters, and operational aspects referenced in the main synthesis. It serves as a more technical reference based on the comprehensive documentation provided.

**I. Responses API (`/v1/responses`) - Specifics**

* **Endpoint:** `POST /v1/responses`
* **Key Request Body Parameters (Recap & Detail):**
* `model` (string, **Required**): E.g., "gpt-4o", "o3-mini", "computer-use-preview".
* `input` (string | array[ResponseInputItemParam], **Required**): Prompt or conversation history. See `ResponseInputItemParam` below.
* `tools` (array[ToolParam], Optional): Define available tools. See `ToolParam` below.
* `previous_response_id` (string | null, Optional): ID of the previous response for context continuation.
* `instructions` (string | null, Optional): System prompt (not inherited with `previous_response_id`).
* `max_output_tokens` (integer | null, Optional): Limit on *total* generated tokens (output + reasoning).
* `stream` (boolean, Optional, Default: `false`): Enable SSE streaming.
* `store` (boolean | null, Optional, Default: `true`): Enable 30-day storage.
* `truncation` (string | null, Optional, Default: `"disabled"`): Strategy ("auto" or "disabled").
* `tool_choice` (string | object, Optional): Control tool usage ("auto", "none", "required", specific tool).
* `text` (object, Optional): Output format control (`format: {"type": "text" | "json_schema", "json_schema": {...}}`).
* `reasoning` (object | null, Optional): o-series model reasoning controls (`effort`, `summary` config).
* `parallel_tool_calls` (boolean | null, Optional, Default: `true`).
* `include` (array[string] | null, Optional): E.g., `["file_search_call.results"]`.
* `metadata` (map[string, string], Optional): Max 16 pairs, key<=64 chars, val<=512 chars.
* `temperature`, `top_p`, `user`, `service_tier`.
* **`ResponseInputItemParam` Structure:** An array element for the `input` parameter. Common structures:
* `{"role": "user" | "assistant" | "tool", "content": string | array[ContentPart]}`
* **`ContentPart` Types:** `{"type": "input_text", "text": string}`, `{"type": "input_image", "image_url": string, "detail": "auto" | "low" | "high"}`, `{"type": "tool_call", "id": string, "function": {...}}`, `{"type": "tool_result", "tool_call_id": string, "result": string, "is_error": boolean}`.
* **`ToolParam` Structure:** An array element for the `tools` parameter.
* `{"type": "function", "function": FunctionDefinition}`
* `{"type": "web_search_preview", ...}` (specific config omitted for brevity)
* `{"type": "file_search", "vector_store_ids": list[string], ...}` (specific config omitted)
* `{"type": "computer_use_preview", "display_width": int, "display_height": int, "environment": string, ...}`

* **Helper Endpoints:**
* `GET /v1/responses/{response_id}`: Retrieves a stored `Response` object. Query param: `include` (array[string]).
* `DELETE /v1/responses/{response_id}`: Deletes a stored `Response`. Returns deletion status object.
* `GET /v1/responses/{response_id}/input_items`: Lists input items for a stored response. Query params: `limit`, `order`, `after`, `before`, `include`. Returns a list object containing `InputItem` objects.
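
For orientation, these helpers map onto the Python SDK roughly as follows (a minimal sketch, assuming a recent `openai` client where they are exposed as `client.responses.retrieve`, `client.responses.input_items.list`, and `client.responses.delete`; `resp_...` is a placeholder ID):

```python
from openai import OpenAI

client = OpenAI()
resp_id = "resp_..."  # ID of a previously stored response

# Retrieve the stored Response object
resp = client.responses.retrieve(resp_id)
print(resp.status, resp.output_text)

# List the input items recorded for that response
items = client.responses.input_items.list(resp_id, limit=20, order="asc")
for item in items.data:
    print(item.type)

# Delete the stored response once it is no longer needed
client.responses.delete(resp_id)
```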

* **`Response` Object Schema (Key Fields):**
* `id` (string): The response ID (used for `previous_response_id`).
* `object` (string, "response")
* `created_at` (integer)
* `status` (string: "completed", "failed", "in_progress", "incomplete")
* `error` (object | null): `{ "code": string, "message": string }` if status is "failed".
* `incomplete_details` (object | null): `{ "reason": string }` (e.g., "max_tokens") if status is "incomplete".
* `model` (string): Specific model snapshot used.
* `output` (array[ResponseOutputItem]): Sequence of generated items.
* **`ResponseOutputItem` Types:** `message`, `tool_call`, `reasoning`.
* **`message` Structure:** `{"type": "message", "id": string, "status": string, "role": "assistant", "content": array[ContentPart]}`
* **`ContentPart` Types:** `output_text`, `output_refusal`.
* **`output_text`:** `{"type": "output_text", "text": string, "annotations": array[Annotation]}`
* **`Annotation` Types:** `file_citation` (`{"type": "file_citation", "index": int, "file_id": string, "filename": string, "quote": string}`), `web_search_citation` (structure likely similar, TBD).
* **`output_refusal`:** `{"type": "output_refusal", "refusal": string}`
* **`tool_call` Structure:** `{"type": "tool_call", "id": string, "function | web_search | file_search | computer_use": {...}}` (Specific tool payload depends on type).
* **`reasoning` Structure:** `{"type": "reasoning", "id": string, "summary": array[SummaryPart]}`
* **`SummaryPart`:** `{"type": "summary_text", "text": string}`.
* `usage` (object | null): `{ "input_tokens": int, "input_tokens_details": {"cached_tokens": int}, "output_tokens": int, "output_tokens_details": {"reasoning_tokens": int}, "total_tokens": int }`. Null if `status` is not terminal.
* Other fields reflect request parameters: `instructions`, `max_output_tokens`, `parallel_tool_calls`, `previous_response_id`, `reasoning` (request config), `store`, `temperature`, `text` (request config), `tool_choice`, `tools` (request config), `top_p`, `truncation`, `user`, `metadata`.

* **Streaming Events (SSE `event: type\ndata: json\n\n`):**
* **Lifecycle:** `response.created`, `response.in_progress`, `response.completed`, `response.failed`, `response.incomplete`, `error`.
* **Structure:** `response.output_item.added`, `response.output_item.done`, `response.content_part.added`, `response.content_part.done`.
* **Content Deltas:** `response.output_text.delta`, `response.refusal.delta`, `response.function_call_arguments.delta`, `response.reasoning_summary_text.delta`.
* **Content Finalization:** `response.output_text.done`, `response.refusal.done`, `response.function_call_arguments.done`, `response.reasoning_summary_text.done`.
* **Annotations:** `response.output_text.annotation.added`.
* **Built-in Tools:** `response.file_search_call.[in_progress | searching | completed]`, `response.web_search_call.[in_progress | searching | completed]`.
* **Termination:** `event: done\ndata: [DONE]\n\n`.
* *(Payload `data` for each event contains relevant partial or full objects)*.

**II. Built-in Tools - Specific Details**

* **File Search (`file_search`):**
* **SDK Class (`FileSearchTool`):** Params: `vector_store_ids` (list[str]), `max_num_results` (int|None), `include_search_results` (bool, default False), `ranking_options` (object|None), `filters` (object|None).
* **API Tool Param:** `{"type": "file_search", "vector_store_ids": [...], "max_num_results": ..., "ranking_options": {...}, "filters": {...}}`.
* **Search Results Structure (in `include` or `include_search_results`):** Array within response (`file_search_call.results` or tool output item) containing objects like: `{"file_id": string, "filename": string, "score": float, "attributes": map, "content": [{"type": "text", "text": string}]}`.
* **Direct Search (`POST /v1/vector_stores/{id}/search`):** Request body: `query`, `max_num_results`, `filters`, `ranking_options`, `rewrite_query`. Response: `vector_store.search_results.page` object.
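
As an illustrative sketch of the direct-search endpoint (assuming the Python SDK exposes it as `client.vector_stores.search`; method names and filter syntax may differ across SDK versions, and `vs_...` is a placeholder ID):

```python
from openai import OpenAI

client = OpenAI()

results = client.vector_stores.search(
    "vs_...",                           # vector store ID
    query="What is the return policy?",
    max_num_results=5,
)
for hit in results.data:
    print(hit.filename, hit.score)
    for part in hit.content:
        print(part.text[:200])          # first 200 chars of each matched chunk
```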

* **Web Search (`web_search_preview`):**
* **SDK Class (`WebSearchTool`):** Params: `user_location` (`{"type": "approximate", "city": ..., "country": ...}` | None), `search_context_size` (Literal["low", "medium", "high"], default "medium").
* **API Tool Param:** `{"type": "web_search_preview", "user_location": {...}, "search_context_size": ...}`.
* **Annotations:** `web_search_citation` (structure TBD, links text to URLs).
* **Dedicated Models:** `gpt-4o-search-preview` ($30/k queries), `gpt-4o-mini-search-preview` ($25/k queries) via Chat Completions API.

* **Computer Use (`computer_use_preview`):**
* **SDK Class (`ComputerTool`):** Requires `computer: Computer | AsyncComputer` implementation.
* **API Tool Param:** `{"type": "computer_use_preview", "display_width": int, "display_height": int, "environment": string}`.
* **`AsyncComputer` Interface Methods (Required Implementation):** `screenshot() -> str`, `click(x, y, button)`, `double_click(x, y)`, `scroll(x, y, scroll_x, scroll_y)`, `type(text)`, `wait()`, `move(x, y)`, `keypress(keys)`, `drag(path)`. Properties: `environment`, `dimensions`.
* **Output Actions (`computer_action` item in Response `output`):** Actions include `click`, `type`, `scroll`, `keypress`, `drag`, `move`, `wait`. Each has specific parameters (e.g., coordinates, text, keys, duration).
* **Requirements:** `model="computer-use-preview"`, `truncation="auto"`, Tier 3-5 access.
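
To make the `AsyncComputer` contract above concrete, here is a minimal, non-functional skeleton (the method list follows the interface described above, but the exact import path and signatures may vary by SDK version; a real implementation would drive a sandboxed browser or VM, e.g. via Playwright, rather than printing):

```python
import base64
from agents import AsyncComputer  # assumed top-level export

class LoggingComputer(AsyncComputer):
    """Stub that logs actions instead of driving a real environment."""

    @property
    def environment(self) -> str:
        return "browser"

    @property
    def dimensions(self) -> tuple[int, int]:
        return (1024, 768)

    async def screenshot(self) -> str:
        # Return a base64-encoded screenshot; placeholder bytes here.
        return base64.b64encode(b"fake-png-bytes").decode()

    async def click(self, x: int, y: int, button: str = "left") -> None:
        print(f"click ({x}, {y}) button={button}")

    async def double_click(self, x: int, y: int) -> None:
        print(f"double_click ({x}, {y})")

    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
        print(f"scroll ({scroll_x}, {scroll_y}) at ({x}, {y})")

    async def type(self, text: str) -> None:
        print(f"type {text!r}")

    async def wait(self) -> None:
        print("wait")

    async def move(self, x: int, y: int) -> None:
        print(f"move ({x}, {y})")

    async def keypress(self, keys: list[str]) -> None:
        print(f"keypress {keys}")

    async def drag(self, path: list[tuple[int, int]]) -> None:
        print(f"drag {path}")
```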

**III. Agents SDK - Specific Details**

* **`Agent` Class:** See full param list above. Note `tool_use_behavior` options (literal strings, `StopAtTools` object, or custom function signature `ToolsToFinalOutputFunction`). `mcp_config` is `TypedDict` with optional `convert_schemas_to_strict` (bool).
* **`Runner` Class & `RunConfig`:** See full `RunConfig` param list above. Exceptions: `AgentsException`, `MaxTurnsExceeded`, `ModelBehaviorError`, `UserError`, `InputGuardrailTripwireTriggered`, `OutputGuardrailTripwireTriggered`.
* **`Handoff` & `handoff()`:** See detailed structure and parameter descriptions above. `HandoffInputData` contains `input_history`, `pre_handoff_items`, `new_items`. Filters (`remove_all_tools`, etc.) available in `agents.extensions.handoff_filters`. Prompt helpers in `agents.extensions.handoff_prompt`.
* **Tools (`@function_tool`, `FunctionTool`):** Decorator takes `name_override`, `description_override`, `docstring_style`, `use_docstring_info`, `failure_error_function`, `strict_mode`. Manual `FunctionTool` requires `name`, `description`, `params_json_schema`, `on_invoke_tool(ctx, args_str) -> awaitable[Any]`. `default_tool_error_function` exists.
* **Results (`RunResultBase`, `RunResult`, `RunResultStreaming`):** See detailed structure above. `RunItem` types wrap `raw_item` from API. `ItemHelpers` class has static methods: `extract_last_content`, `extract_last_text`, `input_to_new_input_list`, `text_message_outputs`, `text_message_output`, `tool_call_output_item`.
* **Streaming:** `StreamEvent` union type. `RawResponsesStreamEvent` wraps `TResponseStreamEvent`. `RunItemStreamEvent` wraps `RunItem` with `name` (e.g., "tool_called"). `AgentUpdatedStreamEvent` contains `new_agent`.
* **Context (`RunContextWrapper`, `Usage`):** Wrapper holds `context: TContext` and `usage: Usage`. `Usage` has `requests`, `input_tokens`, `output_tokens`, `total_tokens`.
* **Models:** `Model` (ABC) defines `get_response`, `stream_response`. `ModelProvider` (ABC) defines `get_model`. SDK provides `OpenAIResponsesModel`, `OpenAIChatCompletionsModel`. `LitellmModel` available via optional dependency. `ModelSettings` dataclass holds temp, top_p, penalties, tool_choice, parallel_tool_calls, truncation, max_tokens, reasoning, metadata, store, include_usage, extra_query/body/headers. `resolve()` method merges settings.
* **MCP:** `MCPServer` (ABC). `MCPServerStdio` uses `MCPServerStdioParams`. `MCPServerSse` uses `MCPServerSseParams`. Both take `cache_tools_list` (bool). `MCPUtil` helps convert/invoke MCP tools. `MCPConfig` on Agent has `convert_schemas_to_strict`.
* **Configuration:** `set_default_openai_key(key, use_for_tracing=True)`, `set_default_openai_client(client, use_for_tracing=True)`, `set_default_openai_api("chat_completions"|"responses")`. Tracing: `set_tracing_export_api_key(key)`, `set_tracing_disabled(bool)`, `set/add_trace_processors`. Logging: `enable_verbose_stdout_logging()`, env vars `OPENAI_AGENTS_DONT_LOG_MODEL_DATA`/`TOOL_DATA`.
* **Visualization:** `draw_graph(agent, filename=None)` requires `openai-agents[viz]`. `.view()` shows graph.
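
As a brief illustration of the MCP integration bullet above, the following sketch attaches a stdio MCP server to an agent (the `agents.mcp` import path and the `npx` filesystem server are assumptions; adjust both for your environment):

```python
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerStdio  # assumed import path

async def main():
    # Launch a filesystem MCP server over stdio and cache its (static) tool list
    async with MCPServerStdio(
        params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "./docs"]},
        cache_tools_list=True,
    ) as fs_server:
        agent = Agent(
            name="Docs Agent",
            instructions="Answer questions using files exposed by the MCP server.",
            mcp_servers=[fs_server],
        )
        result = await Runner.run(agent, "List the documents you can see.")
        print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```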

**IV. Supporting Infrastructure - Specifics**

* **Vector Stores API:** See full object schemas (`VectorStore`, `VectorStoreFile`, `VectorStoreFileBatch`) detailed above. All CRUD endpoints (`POST`, `GET`, `DELETE`, `POST /{id}` for modify) exist for stores, files within stores, and batches. `/search` endpoint detailed under File Search tool. Key params: `chunking_strategy`, `expires_after`, `attributes`, `filters`.
* **Files & Uploads API:** See full object schemas (`File`, `Upload`, `UploadPart`) detailed above. `POST /v1/files` requires `file`, `purpose`. `POST /v1/uploads` requires `bytes`, `filename`, `mime_type`, `purpose`. `POST /v1/uploads/{id}/parts` requires `data`. `POST /v1/uploads/{id}/complete` requires `part_ids` (ordered list), optional `md5`.

**V. Peripheral APIs - Specifics**

* **Realtime API:** Session creation (`POST /v1/realtime/sessions`) takes many config params (model, voice, audio formats, turn detection, etc.) and returns a `client_secret`. An extensive client/server event protocol (`session.*`, `input_audio_buffer.*`, `conversation.*`, `response.*`, etc.) manages the interaction. Too detailed to fully replicate here; refer to the dedicated Realtime API docs.
* **Audio API:** See parameters for `/speech`, `/transcriptions`, `/translations` detailed above. Note `verbose_json` format for timestamp granularities in transcriptions. Streaming available for non-whisper-1 transcription models.
* **Images API:** See parameters for `/generations`, `/edits`, `/variations` detailed above. Note model-specific options (DALL-E 2/3 vs `gpt-image-1`) for quality, size, style, background, output format.
* **Embeddings API:** `/embeddings` takes `input` (string/array), `model`, `dimensions` (int, newer models), `encoding_format` ("float"|"base64"), `user`. Returns list of `Embedding` objects (`embedding` list[float], `index`).
* **Fine-tuning API:** See parameters for `/fine_tuning/jobs` creation detailed above. Note `method` object (`{"type": "supervised" | "preference", ...}`) containing `hyperparameters`. See object schemas for Job, Event, Checkpoint, Permission detailed above. Endpoints exist for list/retrieve/cancel jobs, list events, list/retrieve checkpoints, manage checkpoint permissions (Admin API).
* **Batch API:** `/batches` creation takes `input_file_id`, `endpoint` ("/v1/responses", "/v1/chat/completions", "/v1/embeddings"), `completion_window` ("24h"), `metadata`. See `Batch` object schema above. Input/output file format: JSONL lines with `custom_id`, `method`, `url`, `body` (input) and `custom_id`, `response`, `error` (output). Endpoints for retrieve, cancel, list.
* **Moderations API:** `/moderations` takes `input` (string/array/multimodal), `model` ("text-moderation-latest", "omni-moderation-latest"). Returns `Moderation` object with `results` array (flagged, categories, category_scores, category_applied_input_types).

**VI. Certificates API (`/v1/organization/certificates`, `/v1/organization/projects/...`)**

* **Endpoints:** `POST /org/certs` (upload), `GET /org/certs/{id}`, `POST /org/certs/{id}` (modify name), `DELETE /org/certs/{id}`, `GET /org/certs`, `GET /proj/{id}/certs`, `POST /org/certs/activate`, `POST /org/certs/deactivate`, `POST /proj/{id}/certs/activate`, `POST /proj/{id}/certs/deactivate`. (Requires Admin Key).
* **Certificate Object:** See schema details above.

**VII. Admin API (`/v1/organization/...`) - Specifics**

* Requires Admin API Key.
* **Users:** List, retrieve, modify role (`owner`/`reader`), delete.
* **Invites:** List, create (with email, org role, optional project memberships), retrieve, delete (if pending).
* **Projects:** List (incl. archived), create, retrieve, modify name, archive.
* **Project Users:** List, create (add existing org user with project role `owner`/`member`), retrieve, modify role, delete.
* **Project Service Accounts:** List, create (returns **unredacted API key**), retrieve, delete.
* **Project API Keys:** List (shows redacted user/SA keys), retrieve (redacted), delete (**user keys only**).
* **Project Rate Limits:** List, modify (set limits per model).
* **Audit Logs:** List (with extensive filtering: actor, type, time, project, resource). Detailed `AuditLog` object schema includes `actor` (session, user, API key details) and event-specific payloads (e.g., `project.created: {id: ..., data: {name: ...}}`).

**VIII. Usage & Costs API (`/v1/organization/...`) - Specifics**

* Requires Admin API Key.
* **Usage Endpoints (`/usage/...`):** Separate endpoints for `completions`, `embeddings`, `moderations`, `images`, `audio_speeches`, `audio_transcriptions`, `vector_stores`, `code_interpreter_sessions`.
* **Usage Query Params:** `start_time`, `end_time`, `bucket_width` ("1m", "1h", "1d"), `group_by` (array of fields like "project_id", "model"), `limit`, `page`, specific filters (`project_ids`, `model`, etc.).
* **Usage Response:** Paginated list of `bucket` objects (`start_time`, `end_time`, `results` array). Result objects contain metrics (tokens, requests, images, seconds, bytes, sessions) and populated group_by fields.
* **Costs Endpoint (`/costs`):** `start_time`, `end_time`, `bucket_width` ("1d" only), `group_by` ("project_id", "line_item"), `limit`, `page`, `project_ids`. Returns buckets with results containing `amount` (`{value, currency}`), `line_item`, `project_id`.
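
A hedged sketch of querying these endpoints over plain HTTP (using the `requests` library; parameter and field names follow the description above, the exact query-string encoding of array parameters may differ, and an Admin key is assumed in `OPENAI_ADMIN_KEY`):

```python
import os
import time
import requests

headers = {"Authorization": f"Bearer {os.environ['OPENAI_ADMIN_KEY']}"}
day_ago = int(time.time()) - 24 * 3600

# Completions usage over the last day, hourly buckets, grouped by model
usage = requests.get(
    "https://api.openai.com/v1/organization/usage/completions",
    headers=headers,
    params={"start_time": day_ago, "bucket_width": "1h", "group_by": ["model"], "limit": 24},
).json()
for bucket in usage.get("data", []):
    for result in bucket.get("results", []):
        print(bucket["start_time"], result.get("model"), result.get("input_tokens"))

# Daily costs grouped by project
costs = requests.get(
    "https://api.openai.com/v1/organization/costs",
    headers=headers,
    params={"start_time": day_ago, "bucket_width": "1d", "group_by": ["project_id"]},
).json()
for bucket in costs.get("data", []):
    for result in bucket.get("results", []):
        amount = result.get("amount", {})
        print(result.get("project_id"), amount.get("value"), amount.get("currency"))
```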

**IX. Operational Details**

* **Headers:** `Authorization: Bearer $KEY`, `Content-Type: application/json`, `OpenAI-Organization`, `OpenAI-Project`. `OpenAI-Beta: assistants=v2` was needed for some older Assistants/Vector Store v2 endpoints (check if still required post-deprecation announcement).
* **Request ID:** `x-request-id` in response headers for debugging.
* **Rate Limit Headers:** `x-ratelimit-limit-*`, `x-ratelimit-remaining-*`, `x-ratelimit-reset-*` for requests and tokens.
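
These headers can be read from Python through the SDK's raw-response wrapper (a sketch; `.with_raw_response` is the generic escape hatch in the `openai` Python client, and the header names follow the list above):

```python
from openai import OpenAI

client = OpenAI()

raw = client.responses.with_raw_response.create(
    model="gpt-4o-mini",
    input="Ping",
)
print("request id:", raw.headers.get("x-request-id"))
print("remaining requests:", raw.headers.get("x-ratelimit-remaining-requests"))
print("remaining tokens:", raw.headers.get("x-ratelimit-remaining-tokens"))

response = raw.parse()  # the usual Response object
print(response.output_text)
```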

This addendum provides a much deeper dive into the specifics omitted from the main synthesis. For exhaustive field-by-field descriptions or edge-case behaviors, consulting the original, full OpenAI API Reference and Agents SDK documentation remains necessary.

---

## Trickiest Parts and Tips for Building with OpenAI Agents

While the new Responses API and Agents SDK significantly streamline agent development, building robust, reliable, and efficient agentic systems still presents challenges. Here are some key areas to pay attention to and tips for navigating them:

**1. State Management and Context Windows:**

* **The Challenge:** The `previous_response_id` mechanism in the Responses API simplifies state handling compared to manually managing message lists, but it's different from the explicit `Thread` objects of the Assistants API. For very long conversations, hitting the context window limit is inevitable.
* **Tips:**
* **Embrace `truncation="auto"`:** When using `previous_response_id` for multi-turn conversations with the Responses API, setting `truncation="auto"` is almost essential to prevent errors when the context window overflows. Understand that this means the model might lose access to information from the middle of the conversation.
* **SDK Context vs. LLM Context:** Clearly distinguish between the `RunContextWrapper` (for passing *local* application state/dependencies to your Python code like tools and hooks) and the conversation history/input provided *to the LLM*. Don't expect data in the SDK context to be automatically seen by the model unless explicitly added via instructions, input messages, or tool results.
* **Strategic History Management:** For very long interactions where `truncation="auto"` might lose critical early information, consider implementing summarization strategies or explicitly managing which parts of the history are passed using `result.to_input_list()` in the SDK application layer, especially when *not* using `previous_response_id`.
* **Handoff Context (`input_filter`):** Pay close attention to the `input_filter` parameter when configuring handoffs in the Agents SDK. By default, the target agent sees the entire history. Use filters (such as the provided `remove_all_tools`, or a custom function) to prune the context, making it more relevant for the specialist agent and keeping token counts down; see the sketch after this list.
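
To make the `input_filter` tip concrete, here is a minimal sketch using the SDK's built-in filter (import paths follow the addendum's notes on `agents.extensions.handoff_filters`; the agent names and instructions are illustrative):

```python
from agents import Agent, handoff
from agents.extensions import handoff_filters

billing_agent = Agent(
    name="Billing Agent",
    instructions="Resolve billing questions using the conversation so far.",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route billing questions to the Billing Agent.",
    handoffs=[
        handoff(
            agent=billing_agent,
            # Strip prior tool calls/results so the specialist sees a leaner history
            input_filter=handoff_filters.remove_all_tools,
        )
    ],
)
```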

**2. Tool Orchestration and Execution:**

* **The Challenge:** While the SDK abstracts much of the loop, understanding the flow for custom function calls vs. built-in tools vs. handoffs is important. Custom function calls still involve a round trip: LLM requests tool -> SDK calls your Python function -> SDK sends result back to LLM -> LLM processes result. Built-in tools might be handled more seamlessly internally by OpenAI. Handoffs are a special type of tool call intercepted by the SDK.
* **Tips:**
* **Custom Tool Loop:** Be aware that using custom `@function_tool`s still involves multiple LLM interactions if the model needs to process the tool's output. Design tools to return concise, useful information.
* **`tool_use_behavior`:** Carefully consider `Agent.tool_use_behavior`. If a tool's direct output *is* the desired final answer, use `"stop_on_first_tool"` or `StopAtTools` to save an LLM turn and reduce latency/cost.
* **`tool_choice` & `reset_tool_choice`:** Forcing tool use (`tool_choice="required"` or a specific tool) can be powerful but risks infinite loops if `reset_tool_choice=False` (default is `True`, which resets to 'auto' after a tool call, mitigating this). Understand this reset behavior.
* **Error Handling in Tools:** Implement robust error handling within your `@function_tool` functions. Use the `failure_error_function` parameter (or `try...except` blocks if creating `FunctionTool` manually) to provide informative error messages back to the LLM or raise exceptions appropriately for the SDK/application layer to handle. Don't let tool failures crash the entire agent run silently.
* **Handoffs vs. Agents-as-Tools:** Choose deliberately. Use **Handoffs** for true delegation where the specialist agent takes over. Use **Agents-as-Tools** when an orchestrator needs specific outputs from sub-agents while retaining control of the main flow (e.g., for parallel queries or synthesis). Be mindful that agents-as-tools involve nested `Runner.run` calls internally.
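
As a sketch of the `tool_use_behavior` and tool error-handling tips above (parameter names follow the SDK notes in the addendum; the order-lookup function itself is hypothetical):

```python
from agents import Agent, RunContextWrapper, function_tool

def order_lookup_error(ctx: RunContextWrapper, error: Exception) -> str:
    # The message the LLM sees when the tool raises, instead of the run crashing
    return f"Order lookup failed ({error}). Ask the user to double-check the order number."

@function_tool(failure_error_function=order_lookup_error)
def lookup_order(order_id: str) -> str:
    """Return the shipping status for an order ID."""
    if not order_id.startswith("ord_"):
        raise ValueError("unknown order id format")
    return f"Order {order_id}: shipped, arriving Tuesday."

agent = Agent(
    name="Order Status Agent",
    instructions="Look up the order and report its status.",
    tools=[lookup_order],
    # The tool's return value is already the final answer: skip the extra LLM turn
    tool_use_behavior="stop_on_first_tool",
)
```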

**3. Safety, Reliability, and Computer Use:**

* **The Challenge:** Agents, especially those using Computer Use, can perform unintended or harmful actions. Reliability, particularly outside well-defined tasks or with less capable models, can be inconsistent. Hallucinated tool calls or arguments can occur.
* **Tips:**
* **Computer Use - Extreme Caution:** This tool is experimental and has known reliability limitations (especially for OS tasks).
* **Sandboxing is Mandatory:** *Never* run Computer Use actions directly on a production system or a machine with sensitive data access. Use heavily isolated environments (Docker containers, VMs).
* **Implement Confirmation Flows:** Your application layer *must* intercept potentially sensitive actions (file deletions, form submissions, purchases) generated by the `computer_use_preview` tool and require explicit human confirmation before execution.
* **Human Oversight:** Monitor CUA agent actions closely. Start with browser automation, which is more reliable than OS automation currently.
* **Guardrails:** Use SDK Guardrails (`@input_guardrail`, `@output_guardrail`) proactively to check for safety, relevance, policy violations, or nonsensical outputs *before* executing expensive steps or showing results to users. Use fast/cheap models for guardrail agents for efficiency.
* **Prompting for Safety:** Instruct agents explicitly about boundaries, ethical considerations, and what actions they *should not* take.
* **Structured Outputs for Validation:** Use `output_type` to enforce structure, reducing the chance of free-form problematic text, and allowing easier programmatic validation of the agent's output.
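
One way to realize the confirmation-flow tip is a simple gate in the application layer that sits between the model's `computer_action` items and your sandboxed executor; this is plain application logic (the action shape follows the addendum, while `execute_action` and the sensitivity heuristics are illustrative):

```python
# Hedged sketch: require human sign-off before sensitive computer-use actions.
SENSITIVE_ACTION_TYPES = {"type", "keypress", "drag"}             # illustrative policy
SENSITIVE_TEXT_MARKERS = ("password", "buy", "checkout", "delete")

def requires_confirmation(action: dict) -> bool:
    if action.get("type") not in SENSITIVE_ACTION_TYPES:
        return False
    text = str(action.get("text", "")).lower()
    return any(marker in text for marker in SENSITIVE_TEXT_MARKERS)

def run_action_with_gate(action: dict, execute_action) -> None:
    """execute_action is your own callable that drives the sandboxed browser/VM."""
    if requires_confirmation(action):
        answer = input(f"Agent wants to perform {action!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            print("Action skipped by operator.")
            return
    execute_action(action)
```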

**4. Debugging and Observability:**

* **The Challenge:** Following the execution path of multi-step, multi-agent workflows with intermediate tool calls and LLM reasoning steps can be very difficult with traditional logging.
* **Tips:**
* **Leverage Tracing:** Make full use of the built-in tracing. Examine traces in the OpenAI dashboard (or your custom backend if configured). Pay attention to span hierarchy, timings, inputs/outputs (if enabled), and metadata. This is the *primary* tool for debugging complex agent flows.
* **Use Descriptive Names:** Give your Agents, Tools, and Handoffs clear, descriptive names, as these appear in traces.
* **Log `RunResult.new_items`:** This provides a structured log of messages, tool calls, tool outputs, and handoffs that occurred during a run, complementing the trace view.
* **Enable Verbose SDK Logging:** Use `enable_verbose_stdout_logging()` during development for detailed SDK-level logs (be mindful of sensitive data).
* **Custom Spans/Logs:** Add `custom_span()` calls or standard logging within your tool functions and hooks for application-specific context.
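
For instance, the custom-span and verbose-logging tips might look like the following sketch (`custom_span` and `enable_verbose_stdout_logging` are named in the SDK configuration notes above, though import paths can vary by version; the profile lookup is illustrative):

```python
import asyncio
from agents import Agent, Runner, custom_span, enable_verbose_stdout_logging, function_tool

# Verbose SDK logs during development (avoid in production: may include model data)
enable_verbose_stdout_logging()

@function_tool
def load_profile(user_id: str) -> str:
    """Fetch a user profile (illustrative)."""
    # Wrap application-specific work in a custom span so it shows up in traces
    with custom_span("profile-db-lookup"):
        return f"Profile for {user_id}: premium tier, EU region."

agent = Agent(
    name="Account Helper",
    instructions="Use load_profile before answering account questions.",
    tools=[load_profile],
)

async def main():
    result = await Runner.run(agent, "What tier is user u_123 on?")
    # new_items is a structured log of messages/tool calls/handoffs for this run
    for item in result.new_items:
        print(item.type)
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```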

**5. Cost and Performance Optimization:**

* **The Challenge:** Agentic workflows can consume significant tokens and incur tool usage fees (Search, VS queries/storage, CUA tokens). Latency can increase with multiple LLM calls or slow tool executions.
* **Tips:**
* **Monitor Usage/Costs:** Regularly check the Usage and Costs APIs/dashboard to understand consumption patterns. Use `group_by` to attribute costs.
* **Model Selection:** Use smaller, faster models (like `gpt-4o-mini`) for simpler tasks like triage, routing, guardrails, or basic data extraction. Reserve larger, more capable models (`gpt-4o`, `o1`) for complex reasoning, generation, or critique steps.
* **Prompt Efficiency:** Optimize instructions and prompts for clarity and conciseness.
* **Tool Output Size:** Ensure custom tools return only necessary information to minimize tokens sent back to the LLM.
* **Optimize RAG:** Fine-tune File Search queries, use metadata filters effectively, and manage Vector Store size/expiration (`expires_after`) to control storage costs and potentially improve retrieval speed/relevance.
* **Control Iteration:** Be cautious with LLM-as-a-judge or self-correction loops; set clear exit criteria or iteration limits to prevent runaway usage.
* **Caching:** Use `cache_tools_list=True` for MCP servers with static tool lists. Implement application-level caching for frequently called custom tools or external APIs where appropriate.
* **Streaming:** Use `Runner.run_streamed()` to improve *perceived* latency for the end-user by showing partial results quickly.

**6. Data Management and Preparation:**

* **The Challenge:** The quality and format of data used for File Search (Vector Stores) and Fine-tuning heavily impact agent performance. Vector Store file processing is asynchronous. Batch API requires strict JSONL formatting.
* **Tips:**
* **Vector Store Readiness:** After adding files to a Vector Store (individually or via batch), *poll* the status of the `VectorStoreFile` objects (`GET /v1/vector_stores/{vs_id}/files/{file_id}`) or the `VectorStoreFileBatch` object until they reach `"completed"` before relying on them in File Search.
* **Fine-tuning Data Quality:** Garbage-in, garbage-out. Ensure your fine-tuning examples are high-quality, diverse, and accurately reflect the desired behavior, including correct tool usage patterns if applicable. Validate data formats rigorously. Use a validation set to monitor overfitting and select the best checkpoint.
* **Batch API Input Validation:** Programmatically validate your JSONL input file for the Batch API *before* uploading to catch formatting errors early. Ensure each line is valid JSON and conforms to the expected request structure (`custom_id`, `method`, `url`, `body`).
* **File Purpose:** Always upload files using the correct `purpose` via the Files API, as this affects validation and usability.
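
A hedged sketch of the vector-store readiness tip (polling until ingestion finishes; method names assume a recent `openai` Python SDK and may differ slightly between versions):

```python
import time
from openai import OpenAI

client = OpenAI()

vs = client.vector_stores.create(name="Support Docs")
f = client.files.create(file=open("faq.pdf", "rb"), purpose="assistants")
vs_file = client.vector_stores.files.create(vector_store_id=vs.id, file_id=f.id)

# Poll until the file has been chunked and embedded before relying on File Search
while True:
    vs_file = client.vector_stores.files.retrieve(vs_file.id, vector_store_id=vs.id)
    if vs_file.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(2)

print("File status:", vs_file.status)
```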

**7. API/SDK Evolution and Versioning:**

* **The Challenge:** The platform is evolving (Assistants API deprecation, Responses API as the future, Beta/Preview features changing). Model behavior can shift slightly between snapshots.
* **Tips:**
* **Adopt Responses API/Agents SDK:** For new projects aiming for agentic capabilities, prioritize the recommended Responses API and Agents SDK. Start planning migration if heavily invested in the Assistants API.
* **Pin Model Versions:** For production stability, use specific model snapshot versions (e.g., `gpt-4o-2024-08-06`) instead of floating tags (`gpt-4o`) when possible, to avoid unexpected behavior changes.
* **Use Evals for Regression:** Implement evaluation suites to test key agent behaviors and run them regularly, especially after upgrading SDK versions or changing model versions, to catch regressions.
* **Monitor Beta/Preview Features:** Use features marked Beta or Preview with the understanding they might change or have limitations. Build in flexibility or fallback mechanisms if relying heavily on them in production.
* **Stay Informed:** Follow OpenAI announcements and documentation changelogs.

By anticipating these tricky areas and applying these tips, developers can navigate the complexities and build more robust, reliable, and effective AI agents using OpenAI's powerful new platform.

---

## Detailed Examples Addendum

This section provides concrete examples extracted from the OpenAI documentation, illustrating the usage of the Responses API, Agents SDK, built-in tools, supporting infrastructure, and other relevant APIs. These examples aim to offer practical implementation details beyond the conceptual overview.

**I. Responses API (`/v1/responses`) Examples**

**1. Basic Text Generation**

* **Python:**
```python
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
model="gpt-4.1", # Or gpt-4o, o3-mini etc.
input="Write a one-sentence bedtime story about a unicorn."
)
print(response.output_text)
# Example Output: In a peaceful grove beneath a silver moon, a unicorn named Lumina discovered a hidden pool that reflected the stars.
```
* **JavaScript:**
```javascript
import OpenAI from "openai";
const client = new OpenAI();

const response = await client.responses.create({
model: "gpt-4.1",
input: "Write a one-sentence bedtime story about a unicorn."
});
console.log(response.output_text);
```
* **cURL:**
```bash
curl "https://api.openai.com/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": "Write a one-sentence bedtime story about a unicorn."
}'

# Example Response (Structure):
# {
# "id": "resp_...", "object": "response", "created_at": ..., "status": "completed",
# "model": "gpt-4.1-...",
# "output": [ { "type": "message", "id": "msg_...", "status": "completed", "role": "assistant",
# "content": [ { "type": "output_text", "text": "...", "annotations": [] } ] } ],
# "usage": { ... }, ...
# }
```

**2. Image Input (Multimodal)**

* **Python:**
```python
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
model="gpt-4.1", # Or gpt-4o
input=[
{"role": "user", "content": "What two teams are playing in this photo?"},
{
"role": "user",
"content": [
{
"type": "input_image",
"image_url": "https://upload.wikimedia.org/wikipedia/commons/3/3b/LeBron_James_Layup_%28Cleveland_vs_Brooklyn_2018%29.jpg",
# "detail": "auto" # Optional: auto, low, high
}
]
},
],
)
print(response.output_text)
# Example Output: The Cleveland Cavaliers and the Brooklyn Nets are playing in this photo.
```
* **JavaScript:**
```javascript
import OpenAI from "openai";
const client = new OpenAI();

const response = await client.responses.create({
model: "gpt-4.1", // Or gpt-4o
input: [
{ role: "user", content: "What two teams are playing in this photo?" },
{
role: "user",
content: [
{
type: "input_image",
image_url: "https://upload.wikimedia.org/wikipedia/commons/3/3b/LeBron_James_Layup_%28Cleveland_vs_Brooklyn_2018%29.jpg",
}
],
},
],
});
console.log(response.output_text);
```
* **cURL:**
```bash
curl "https://api.openai.com/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": [
{ "role": "user", "content": "What two teams are playing in this photo?" },
{ "role": "user", "content": [
{ "type": "input_image", "image_url": "https://...jpg" }
]
}
]
}'
```

**3. Using Built-in Tools**

* **Web Search:**
* Python:
```python
response = client.responses.create(
model="gpt-4o", # Or gpt-4o-mini
tools=[{"type": "web_search_preview"}],
input="What was a positive news story that happened today?",
)
print(response.output_text) # Synthesized answer with potential citations
# To see raw search results if requested via 'include':
# for item in response.output:
# if item.type == 'tool_call' and item.web_search: # or tool_result if processed
# print(item.web_search)
```
* JavaScript:
```javascript
const response = await client.responses.create({
model: "gpt-4o",
tools: [ { type: "web_search_preview" } ],
input: "What was a positive news story that happened today?",
});
console.log(response.output_text);
```
* cURL:
```bash
curl "https://api.openai.com/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"tools": [{"type": "web_search_preview"}],
"input": "What was a positive news story from today?"
}'
```
* **File Search (Requires pre-existing Vector Store):**
* Python (Setup + Usage):
```python
# --- Setup Vector Store (One time or less frequently) ---
# Assume file1, file2 are uploaded File objects
# productDocs = client.vector_stores.create(
# name="Product Documentation",
# file_ids=[file1.id, file2.id]
# )
# vs_id = productDocs.id
# --- Poll for file processing completion ---

# --- Use in Responses API ---
vs_id = "vs_..." # Replace with your actual Vector Store ID
response = client.responses.create(
model="gpt-4o-mini",
tools=[{
"type": "file_search",
"vector_store_ids": [vs_id],
# "max_num_results": 5 # Optional
}],
input="What is deep research by OpenAI?",
# include=["file_search_call.results"] # Optional: Get raw results
)
print(response.output_text) # Synthesized answer with potential citations
```
* JavaScript (Setup + Usage):
```javascript
// --- Setup Vector Store (One time or less frequently) ---
// Assume file1, file2 are uploaded File objects
// const productDocs = await client.vector_stores.create({
// name: "Product Documentation",
// file_ids: [file1.id, file2.id],
// });
// const vs_id = productDocs.id;
// --- Poll for file processing completion ---

// --- Use in Responses API ---
const vs_id = "vs_..."; // Replace with your actual Vector Store ID
const response = await client.responses.create({
model: "gpt-4o-mini",
tools: [{
type: "file_search",
vector_store_ids: [vs_id],
}],
input: "What is deep research by OpenAI?",
// include: ["file_search_call.results"] // Optional
});
console.log(response.output_text);
```
* **Computer Use (Conceptual API Call - Execution logic is separate):**
* Python:
```python
response = client.responses.create(
model="computer-use-preview", # Requires specific model & Tier 3+ access
tools=[{
"type": "computer_use_preview",
"display_width": 1024,
"display_height": 768,
"environment": "browser",
}],
truncation="auto", # Required
input="I'm looking for a new camera. Help me find the best one.",
)
print(response.output) # Prints the list of computer_action items
```
* JavaScript:
```javascript
const response = await client.responses.create({
model: "computer-use-preview",
tools: [{
type: "computer_use_preview",
display_width: 1024,
display_height: 768,
environment: "browser",
}],
truncation: "auto",
input: "I'm looking for a new camera. Help me find the best one.",
});
console.log(response.output); // Prints the list of computer_action items
```

**4. Using Custom Function Tools**

* **Request Structure (cURL Example):**
```bash
curl https://api.openai.com/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"input": "What is the weather like in Boston?",
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto" # Or force with {"type": "function", "function": {"name": "get_current_weather"}}
}'

# Expected Response (will contain a tool_call):
# { ..., "output": [ { "type": "tool_call", "id": "call_...",
# "function": { "name": "get_current_weather", "arguments": "{\"location\": \"Boston, MA\"}" } } ], ... }
```
* **Submitting Tool Results (Conceptual Python):**
```python
# Assume previous_response contained the tool_call above, previous_id = previous_response.id
tool_call_id = "call_..."
tool_output_json_string = "{\"temperature\": \"22\", \"unit\": \"celsius\", \"description\": \"Sunny\"}"

response = client.responses.create(
model="gpt-4o",
previous_response_id=previous_id,
input=[ # Input is ONLY the tool result for this turn
{
"role": "tool",
"content": [
{
"type": "tool_result",
"tool_call_id": tool_call_id,
"result": tool_output_json_string
}
]
}
]
)
print(response.output_text) # e.g., "Currently it is 22°C and Sunny in Boston."
```

**5. Handling Conversation State (`previous_response_id`)**

* **Python Example (Two Turns):**
```python
# Turn 1
response1 = client.responses.create(
model="gpt-4o-mini", input="What is the capital of Spain?", store=True)
print(f"Assistant (Turn 1): {response1.output_text}")
response1_id = response1.id

# Turn 2
response2 = client.responses.create(
model="gpt-4o-mini",
input="What language do they speak there?", # Only new input
previous_response_id=response1_id # Link to previous response
)
print(f"Assistant (Turn 2): {response2.output_text}")
response2_id = response2.id # Store for next potential turn
```

**6. Streaming Responses (SSE)**

* **Python:**
```python
stream = client.responses.create(
model="gpt-4.1",
input=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
stream=True,
)
for event in stream:  # use `async for` with AsyncOpenAI
    print(event)  # Prints the raw streaming event objects
    # To print just text deltas:
    # if event.type == "response.output_text.delta":
    #     print(event.delta, end="", flush=True)
```
* **JavaScript:**
```javascript
const stream = await client.responses.create({
model: "gpt-4.1",
input: [{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
stream: true,
});
for await (const event of stream) {
console.log(event); // Prints raw SSE event objects
// if (event.type === 'response.output_text.delta') {
//   process.stdout.write(event.delta);
// }
}
```
* **Example SSE Chunk Objects (Conceptual):**
```
event: response.created
data: {"type":"response.created","response":{"id":"resp_...", "status":"in_progress", ...}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"id":"msg_...", "type":"message", ...}}

event: response.content_part.added
data: {"type":"response.content_part.added", "item_id":"msg_...", "output_index":0, "content_index":0, "part":{"type":"output_text", ...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...", "output_index":0,"content_index":0,"delta":"Double"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...", "output_index":0,"content_index":0,"delta":" bubble"}

... many more deltas ...

event: response.output_text.done
data: {"type":"response.output_text.done", "item_id":"msg_...", ..., "text":"Double bubble bath..."}

event: response.content_part.done
data: {"type":"response.content_part.done", "item_id":"msg_...", ...}

event: response.output_item.done
data: {"type":"response.output_item.done", "output_index":0, "item":{... complete message ...}}

event: response.completed
data: {"type":"response.completed","response":{..."id":"resp_...", "status":"completed", "usage":{...}, ...}}

event: done
data: [DONE]
```

**7. Structured Output (JSON Schema)**

* **Request Structure (cURL Example):**
```bash
curl https://api.openai.com/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"input": "Extract the name and city from: John Doe lives in London.",
"text": {
"format": {
"type": "json_schema",
"json_schema": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "Person'\''s full name"},
"city": {"type": "string", "description": "City name"}
},
"required": ["name", "city"]
}
}
}
}'

# Expected Response 'output_text' content:
# { "name": "John Doe", "city": "London" }
```

**8. Reasoning Example (Conceptual)**

* **Request Structure:**
```json
{
"model": "o3-mini", // Or other o-series model
"input": "Plan a 3-day trip to Paris focusing on museums.",
"reasoning": {
"effort": "medium" // Optional: low, medium, high
},
"include": ["reasoning.summary"] // Optional: request summary
}
```
* **Response Structure (Conceptual):** The `output` array might contain a `reasoning` item: `{"type": "reasoning", "summary": [{"type": "summary_text", "text": "**Planning Paris Museum Trip**\nUser wants a 3-day itinerary... Day 1: Louvre..."}]}` along with the final `message` output.

**II. Agents SDK Examples**

**1. Basic Agent Definition and Run**

```python
import asyncio
from agents import Agent, Runner

# Define Agent
agent = Agent(
    name="Haiku Assistant",
    instructions="You only respond in haikus.",
    # model defaults to gpt-4o if not specified
)

async def main():
    # Run Agent
    result = await Runner.run(agent, "Write a haiku about a sunny day.")
    print(result.final_output)
    # Example Output:
    # Golden light descends,
    # Warm breeze whispers through the trees,
    # Summer day delights.

if __name__ == "__main__":
    asyncio.run(main())
```

**2. Agent with Function Tools** (See Currency Converter example in Section III.1)

**3. Agent with Built-in Tools**

* **Web Search:**
```python
import asyncio
from agents import Agent, Runner, WebSearchTool

agent = Agent(
    name="Web Searcher",
    instructions="You are a helpful agent that uses web search.",
    tools=[WebSearchTool(user_location={"type": "approximate", "city": "New York"})]
)

async def main():
    result = await Runner.run(agent, "Local sports news?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```
* **File Search:** (Requires `vs_id` from a pre-configured Vector Store)
```python
import asyncio
from agents import Agent, Runner, FileSearchTool

vs_id = "vs_..."  # Replace with your Vector Store ID
agent = Agent(
    name="Doc Searcher",
    instructions="Answer based on the provided documents.",
    tools=[FileSearchTool(vector_store_ids=[vs_id], max_num_results=3)]
)

async def main():
    result = await Runner.run(agent, "What is the return policy?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```
* **Computer Use:** (Requires `AsyncComputer` implementation like `LocalPlaywrightComputer`)
```python
import asyncio
from agents import Agent, Runner, ComputerTool, ModelSettings
# Assume LocalPlaywrightComputer is defined and playwright installed

async def main():
    async with LocalPlaywrightComputer() as computer:
        agent = Agent(
            name="Browser Agent",
            instructions="Use the browser to find SF sports news.",
            model="computer-use-preview",
            model_settings=ModelSettings(truncation="auto"),
            tools=[ComputerTool(computer)]
        )
        result = await Runner.run(agent, "Search for SF sports news and summarize.")
        print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```

**4. Handoffs and Routing** (See Language Triage example in Section III.2)

**5. Agents as Tools** (See Developer Assistant example in Section III.3)

**6. Guardrails** (See Relevance Guardrail example in Section III.3)

**7. Dynamic Instructions and Context**

```python
import asyncio
import random
from agents import Agent, RunContextWrapper, Runner
from typing import Literal

class CustomContext:
    def __init__(self, style: Literal["haiku", "pirate", "robot"]):
        self.style = style

# Dynamic instructions function
def custom_instructions(ctx: RunContextWrapper[CustomContext], agent: Agent) -> str:
    style = ctx.context.style
    if style == "haiku":
        return "Only respond in haikus."
    if style == "pirate":
        return "Respond as a pirate. Arrr!"
    return "Respond as a robot. Beep boop."

# Agent typed with the context
agent = Agent[CustomContext](
    name="Style Agent",
    instructions=custom_instructions
)

async def main():
    style_choice: Literal["haiku", "pirate", "robot"] = random.choice(["haiku", "pirate", "robot"])
    # Create the context object
    run_context = CustomContext(style=style_choice)
    print(f"Using style: {style_choice}\n")

    user_message = "Tell me a joke."
    print(f"User: {user_message}")
    # Pass context to the runner
    result = await Runner.run(agent, user_message, context=run_context)
    print(f"Assistant: {result.final_output}")

if __name__ == "__main__":
    asyncio.run(main())
```

**8. Lifecycle Hooks** (See `lifecycle_example.py` simulation in Section III.4)

**9. Streaming with the SDK**

* **Raw SSE Events:**
```python
# (See Responses API Streaming Python example - identical SDK usage)
# Iterate through result.stream_events() and check event.type == 'raw_response_event'
# Then access event.data (which is a ResponseStreamEvent object)
```
* **SDK Semantic Events:**
```python
import asyncio
import random
from agents import Agent, Runner, ItemHelpers, function_tool

@function_tool
def how_many_jokes() -> int:
    return random.randint(1, 3)

agent = Agent(
    name="Joker",
    instructions="Call `how_many_jokes`, then tell that many jokes.",
    tools=[how_many_jokes]
)

async def main():
    result = Runner.run_streamed(agent, input="Hello")
    print("=== Run starting ===")
    async for event in result.stream_events():
        if event.type == "raw_response_event":
            continue
        elif event.type == "agent_updated_stream_event":
            print(f"Agent updated: {event.new_agent.name}")
        elif event.type == "run_item_stream_event":
            item = event.item
            if item.type == "tool_call_item":
                print("-- Tool was called")
            elif item.type == "tool_call_output_item":
                print(f"-- Tool output: {item.output}")
            elif item.type == "message_output_item":
                print(f"-- Message output:\n {ItemHelpers.text_message_output(item)}")
    print("=== Run complete ===")

if __name__ == "__main__":
    asyncio.run(main())
```

**10. Using Other Model Providers (LiteLLM Example)**

```python
# Requires: pip install "openai-agents[litellm]"
# Needs an API key for the chosen provider (e.g., ANTHROPIC_API_KEY)
import asyncio
from agents import Agent, Runner, set_tracing_disabled
from agents.extensions.models.litellm_model import LitellmModel

# Disable tracing if no OpenAI key is available/desired for tracing
set_tracing_disabled(True)

async def main(model_name: str, api_key: str):
    agent = Agent(
        name="Claude Assistant",
        instructions="Respond concisely.",
        # Use LitellmModel, passing model name and API key
        model=LitellmModel(model=model_name, api_key=api_key)
    )
    result = await Runner.run(agent, "Explain quantum entanglement simply.")
    print(result.final_output)

# Example invocation (replace model/key)
# if __name__ == "__main__":
#     asyncio.run(main(model_name="anthropic/claude-3-haiku-20240307", api_key="YOUR_ANTHROPIC_KEY"))
```

**11. Visualization**

```python
from agents import Agent, function_tool, handoff
from agents.extensions.visualization import draw_graph

@function_tool
def get_weather(city: str) -> str: return f"Weather in {city} is nice."

spanish_agent = Agent(name="Spanish agent", instructions="Hablas español.")
english_agent = Agent(name="English agent", instructions="Speak English.")

triage_agent = Agent(
name="Triage agent",
instructions="Handoff based on language.",
handoffs=[spanish_agent, english_agent],
tools=[get_weather]
)

# Generate and save/view the graph
graph = draw_graph(triage_agent, filename="triage_graph") # Saves triage_graph.png
# graph.view() # Opens in default viewer if filename is None
```

**III. Other API Examples**

**1. Fine-tuning Data Formats (JSONL Lines)**

* **Supervised Chat:**
```json
{"messages": [{"role": "system", "content": "Marv is a factual chatbot."}, {"role": "user", "content": "Who invented the lightbulb?"}, {"role": "assistant", "content": "Thomas Edison."}]}
```
* **Supervised Chat with Tools:**
```json
{"tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}}}], "messages": [{"role": "user", "content": "Weather in SF?"}, {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"San Francisco\"}"}}]}]}
```
* **Preference (DPO) Chat:**
```json
{"input": {"messages": [{"role": "user", "content": "Write a poem about the sea."}]}, "preferred_completion": [{"role": "assistant", "content": "Vast blue expanse, waves crash..."}], "non_preferred_completion": [{"role": "assistant", "content": "The sea is big and blue."}]}
```
* **Legacy Completions:**
```json
{"prompt": "Company: OpenAI\nProduct: GPT-4\nDescription:", "completion": " A large multimodal model..."}
```

**2. Batch API File Formats (JSONL Lines)**

* **Input File Line (`input_file_id`):**
```json
{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Translate 'hello'"}]}}
{"custom_id": "req-002", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "Sample text"}}
```
* **Output File Line (`output_file_id`):**
```json
{"id": "batch_req_...", "custom_id": "req-001", "response": {"status_code": 200, "request_id": "req_...", "body": {"id": "chatcmpl-...", "choices": [...]}}, "error": null}
```
* **Error File Line (`error_file_id`):**
```json
{"id": "batch_req_...", "custom_id": "req-003", "response": {"status_code": 429, "request_id": "req_...", "body": {"error": {...rate limit error...}}}, "error": null}
{"id": "batch_req_...", "custom_id": "req-004", "response": null, "error": {"code": "invalid_request", "message": "Invalid model specified."}}
```

**3. Audio API Examples (cURL)**

* **Text-to-Speech (TTS):**
```bash
curl https://api.openai.com/v1/audio/speech \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1", # or tts-1-hd, gpt-4o-mini-tts
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy" # alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
# "response_format": "mp3", # Optional: mp3, opus, aac, flac, wav, pcm
# "speed": 1.0 # Optional: 0.25 to 4.0
}' \
--output speech.mp3
```
* **Speech-to-Text (STT):**
```bash
curl https://api.openai.com/v1/audio/transcriptions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/audio.mp3" \
-F model="gpt-4o-transcribe" # or whisper-1
# -F language="en" # Optional hint
# -F response_format="json" # Optional: json, text, srt, verbose_json, vtt (json only for gpt-4o-transcribe)
# -F include[]="logprobs" # Optional, requires json format, not whisper-1
# -F timestamp_granularities[]="word" # Optional, requires verbose_json

# Response (JSON format): {"text": "Transcribed text..."}
```
* **Audio Translation (to English):**
```bash
curl https://api.openai.com/v1/audio/translations \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/german.m4a" \
-F model="whisper-1"

# Response: {"text": "Translated English text..."}
```

**4. Images API Examples (cURL)**

* **Generate Image:**
```bash
curl https://api.openai.com/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "dall-e-3", # or dall-e-2, gpt-image-1
"prompt": "A photorealistic image of an astronaut playing chess with a cat on the moon",
"n": 1,
"size": "1024x1024", # Varies by model
"quality": "standard", # standard or hd (DALL-E 3) / standard (DALL-E 2) / auto, low, medium, high (gpt-image-1)
"style": "vivid" # vivid or natural (DALL-E 3 only)
# "response_format": "url" # url or b64_json (not for gpt-image-1)
}'

# Response: {"created": ..., "data": [{"url": "..." or "b64_json": "..."}]}
```
* **Edit Image (Conceptual - Requires Files):**
```bash
curl https://api.openai.com/v1/images/edits \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F model="dall-e-2" # or gpt-image-1
-F image="@image.png" \
-F mask="@mask.png" \ # Optional
-F prompt="Add sunglasses to the subject" \
-F n=1 \
-F size="1024x1024"
```
* **Create Variation (DALL-E 2 Only - Requires File):**
```bash
curl https://api.openai.com/v1/images/variations \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F image="@otter.png" \
-F n=1 \
-F size="1024x1024"
```

**5. Embeddings API Example (cURL)**

```bash
curl https://api.openai.com/v1/embeddings \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "Your text string goes here",
"model": "text-embedding-3-small", # e.g., or text-embedding-ada-002
"encoding_format": "float", # or base64
"dimensions": 256 # Optional, for newer models
}'

# Response: {"object": "list", "data": [{"object": "embedding", "embedding": [0.0023,...], "index": 0}], "model": "...", "usage": {...}}
```

**6. Moderations API Example (cURL)**

```bash
curl https://api.openai.com/v1/moderations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"input": "Sample text to moderate.",
"model": "text-moderation-latest" # or omni-moderation-latest for text+image
}'

# Response: {"id": "modr-...", "model": "...", "results": [{"flagged": false, "categories": {...}, "category_scores": {...}}]}
```

**7. Administration API Examples (Conceptual cURL - Requires Admin Key)**

* **Create Project:**
```bash
curl -X POST https://api.openai.com/v1/organization/projects \
-H "Authorization: Bearer $OPENAI_ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "My New Agent Project"}'
```
* **Invite User:**
```bash
curl -X POST https://api.openai.com/v1/organization/invites \
-H "Authorization: Bearer $OPENAI_ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{
"email": "new.developer@example.com",
"role": "reader",
"projects": [{"id": "proj_...", "role": "member"}] # Optional project assignment
}'
```
* **Create Service Account (Returns API Key!):**
```bash
curl -X POST https://api.openai.com/v1/organization/projects/proj_.../service_accounts \
-H "Authorization: Bearer $OPENAI_ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "Agent Backend Service"}'
# Response includes {"api_key": {"value": "sk-...", ...}} -- Store this key!
```
* **List Audit Logs:**
```bash
curl "https://api.openai.com/v1/organization/audit_logs?limit=10&event_types[]=project.created" \
-H "Authorization: Bearer $OPENAI_ADMIN_KEY"
```

---

This addendum provides a significantly more detailed set of examples covering many core functionalities discussed in the documentation. Remember that these are illustrative; specific parameters, error handling, and application logic will vary based on your exact needs. Always refer to the official, up-to-date OpenAI API Reference and SDK documentation for the most precise specifications.