https://github.com/sukethrp/agentos
"The Operating System for AI Agents. Build, Test, Deploy, Monitor, Govern."
agent-framework agent-monitoring agent-testing agentos ai-agent-platform ai-agents anthropic fastapi governance llm ollama openai python rag safety
- Host: GitHub
- URL: https://github.com/sukethrp/agentos
- Owner: sukethrp
- Created: 2026-02-06T05:05:50.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-21T21:30:36.000Z (3 months ago)
- Last Synced: 2026-03-22T07:42:41.590Z (3 months ago)
- Topics: agent-framework, agent-monitoring, agent-testing, agentos, ai-agent-platform, ai-agents, anthropic, fastapi, governance, llm, ollama, openai, python, rag, safety
- Language: Python
- Homepage: https://agentos-mocha.vercel.app
- Size: 825 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Code of conduct: CODE_OF_CONDUCT.md
README
🤖 AgentOS
The Operating System for AI Agents
Build, Test, Deploy, Monitor, and Govern AI agents, from prototype to production.
Live Demo · Quick Start · Issues
> **For teams who need to deploy AI agents with testing, governance, and monitoring built in, not bolted on.**
## 3 Differentiators
- 🧪 **Test**: Run scenario-based simulation before deploy, with quality and cost scoring.
- 🛡️ **Govern**: Enforce budgets, permissions, and kill-switch policies with auditability.
- **Monitor**: Observe live agent runs, tool usage, latency, and spend in one dashboard.
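To make the "Govern" idea concrete, here is a minimal sketch of a budget cap combined with a kill switch, written in plain Python. `BudgetGuard`, `BudgetExceeded`, and `KillSwitchEngaged` are hypothetical names used only for illustration; they are not part of the AgentOS API, and the real Governance Engine may work differently.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the configured cap."""


class KillSwitchEngaged(RuntimeError):
    """Raised when an operator has halted the agent."""


@dataclass
class BudgetGuard:
    # Hypothetical helper for illustration; not an AgentOS class.
    max_usd: float
    spent_usd: float = 0.0
    killed: bool = False

    def kill(self) -> None:
        """Engage the kill switch; every later charge is refused."""
        self.killed = True

    def charge(self, cost_usd: float) -> float:
        """Record a cost if policy allows it and return the remaining budget."""
        if self.killed:
            raise KillSwitchEngaged("agent halted by kill switch")
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceeded(f"charge of ${cost_usd:.4f} exceeds remaining budget")
        self.spent_usd += cost_usd
        return self.max_usd - self.spent_usd


guard = BudgetGuard(max_usd=0.05)
remaining = guard.charge(0.02)      # within budget, so it is recorded
try:
    guard.charge(0.04)              # 0.02 + 0.04 would exceed the 0.05 cap
except BudgetExceeded as exc:
    blocked_reason = str(exc)
guard.kill()                        # from here on, every charge is refused
```

The check runs before the cost is recorded, so a refused call never changes the running total.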
## Quick Start
```bash
pip install agentos-platform
```
10-line example:
```python
from agentos.governed_agent import GovernedAgent
from agentos.core.tool import tool


# Register a plain function as a tool the agent can call.
@tool(description="Add two numbers")
def add(a: float, b: float) -> float:
    return a + b


# Create an agent with the tool attached and run a prompt.
agent = GovernedAgent(name="demo", model="gpt-4o-mini", tools=[add])
print(agent.run("What is 12.5 + 7.5?"))
```
Demo mode:
```bash
AGENTOS_DEMO_MODE=true python examples/run_web_builder.py
```
## Features
### MCP server with stdio/SSE transport (Claude Desktop + Cursor)
Install the MCP extra:
```bash
pip install 'agentos-platform[mcp]'
```
### 1) Start the MCP server
Expose built-in AgentOS tools (stdio transport is the safest choice for MCP clients like Claude Desktop and Cursor):
```bash
agentos mcp serve --transport stdio
```
Expose tools from a specific agent module (example `./my_agent/agent.py`):
```bash
agentos mcp serve --transport stdio --agent ./my_agent
```
Optional: run the HTTP SSE transport for clients that support it:
```bash
agentos mcp serve --transport sse --host 127.0.0.1 --port 8080
```
### 2) Configure Claude Desktop
Add the following snippet to your `claude_desktop_config.json` (restart Claude Desktop after editing):
```json
{
  "mcpServers": {
    "agentos": {
      "command": "agentos",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}
```
If you want a specific agent module:
```json
{
  "mcpServers": {
    "agentos": {
      "command": "agentos",
      "args": ["mcp", "serve", "--transport", "stdio", "--agent", "/absolute/path/to/agent.py"]
    }
  }
}
```
### 3) Configure Cursor
Add to Cursor `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "agentos": {
      "command": "agentos",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}
```
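If the config file already lists other MCP servers, hand-editing can clobber them. The sketch below merges the `agentos` entry shown above into an existing JSON config while leaving other entries intact. `add_agentos_server` is a hypothetical helper written for this README, not something AgentOS ships, and it assumes the file (if present) contains valid JSON.

```python
import json
from pathlib import Path


def add_agentos_server(config_path: Path) -> dict:
    """Merge the AgentOS stdio entry into an MCP client config file."""
    config = {}
    if config_path.exists():
        # Assumes the existing file is valid, non-empty JSON.
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["agentos"] = {
        "command": "agentos",
        "args": ["mcp", "serve", "--transport", "stdio"],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For Cursor, this could be called as `add_agentos_server(Path(".cursor/mcp.json"))` from the project root.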
### Agent delegation (delegate tool + SharedContext + chaining)
AgentOS includes a structured delegation system that lets a “parent” agent offload subtasks to “child” agents while propagating rich context through a shared, in-memory key/value store.
Key pieces:
- `delegate_subtask` tool: LLM-facing tool that accepts structured fields like `task`, `context_json`, `constraints_json`, `expected_output_schema_json`, and `timeout`.
- `SharedContext`: a key/value store child agents can read/write during the delegation chain (avoids lossy prompt compression).
- Delegation chaining: if a child agent delegates again, the same shared context key is reused automatically.
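As an illustration of the structured fields listed above, the sketch below assembles one set of `delegate_subtask` arguments. It assumes that the `*_json` fields carry JSON-encoded strings (suggested by their names) and that `timeout` is in seconds; neither is confirmed here. `build_delegate_args` is a hypothetical helper, not an AgentOS function.

```python
import json


def build_delegate_args(task, context, constraints, output_schema, timeout):
    """Assemble arguments using the field names listed for `delegate_subtask`."""
    return {
        "task": task,
        # Assumption: the *_json fields are JSON-encoded strings.
        "context_json": json.dumps(context),
        "constraints_json": json.dumps(constraints),
        "expected_output_schema_json": json.dumps(output_schema),
        # Assumption: timeout is expressed in seconds.
        "timeout": timeout,
    }


args = build_delegate_args(
    task="Summarize the open support tickets",
    context={"ticket_ids": [101, 102]},
    constraints={"max_words": 120},
    output_schema={"type": "object", "properties": {"summary": {"type": "string"}}},
    timeout=30,
)
```

Keeping the context and schema as structured values until the last step avoids hand-written JSON strings in prompts.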
Minimal wiring example:
```python
from agentos.core.agent import Agent
from agentos.core.delegation import DelegationManager

# Define your child agents however you like.
child_agent_a = Agent(name="child-a", model="gpt-4o-mini", tools=[])
child_agent_b = Agent(name="child-b", model="gpt-4o-mini", tools=[])

manager = DelegationManager()
manager.register_agent("child-a", child_agent_a)
manager.register_agent("child-b", child_agent_b)

# Create your parent agent and attach the delegate tool.
parent = Agent(name="parent", model="gpt-4o-mini", tools=[])
manager.attach_delegate_tool(parent)  # adds `delegate_subtask` to the toolset

# Now the parent agent can call `delegate_subtask`.
parent.run("Delegate a subtask and use shared context for details.")
```
SharedContext tools available to delegated agents:
- `shared_context_key()`
- `shared_context_get(key)`
- `shared_context_set(key, value_json)`
- `shared_context_dump()`
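The four tools above behave like a small key/value API. The following is a dict-backed sketch of that idea, useful for reasoning about delegation chains. `SharedContextSketch` is a hypothetical stand-in, not the AgentOS `SharedContext` class. Its return types, and the choice to store values as JSON strings, are assumptions.

```python
import json
import uuid


class SharedContextSketch:
    """Dict-backed stand-in for a delegation-scoped key/value store."""

    def __init__(self, context_key=None):
        # One key identifies the store for the whole delegation chain.
        self.context_key = context_key or uuid.uuid4().hex
        self._data = {}

    def shared_context_key(self):
        return self.context_key

    def shared_context_get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def shared_context_set(self, key, value_json):
        json.loads(value_json)  # reject values that are not valid JSON
        self._data[key] = value_json

    def shared_context_dump(self):
        return {k: json.loads(v) for k, v in self._data.items()}


ctx = SharedContextSketch()
ctx.shared_context_set("customer", json.dumps({"id": 42, "tier": "pro"}))

# A child further down the chain is handed the same store, so its writes
# are visible to the parent without passing through a prompt.
child_view = ctx
child_view.shared_context_set("summary", json.dumps("needs refund"))
snapshot = ctx.shared_context_dump()
```

Because every agent in the chain reads and writes the same store, details do not have to be restated (and possibly lost) in each delegated prompt.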
## Core Modules
| Module | What it does |
|--------|---------------|
| Agent SDK | Define agents and tools with provider-agnostic model routing |
| Simulation Sandbox | Test scenarios with LLM-as-judge quality and pass/fail scoring |
| Governance Engine | Budget controls, permissions, kill switch, and audit logging |
| Live Dashboard | Real-time traces for prompts, tool calls, latency, and spend |
| RAG Pipeline | Ingest, chunk, embed, and retrieve knowledge sources |
| Workflow Engine | Compose repeatable multi-step agent workflows |
Full 15-module list:
| Module | Description |
|--------|-------------|
| Agent SDK | Core governed agent runtime and tool-calling loop |
| WebSocket Streaming | Token streaming and low-latency interactive sessions |
| RAG Pipeline | Ingestion, chunking, embeddings, retrieval, and reranking |
| Simulation Sandbox | Scenario simulation, scoring, and comparison reports |
| Live Dashboard | Event stream, usage analytics, and operational visibility |
| Governance Engine | Guardrails, budget caps, permission checks, and audits |
| Agent Scheduler | Interval and cron scheduling with execution history |
| Event Bus | Trigger-driven orchestration via internal and external events |
| Plugin System | Runtime-extensible tools, providers, and adapters |
| Authentication | API key auth, org and user usage tracking, and middleware |
| A/B Testing | Side-by-side evaluation for variants and prompt changes |
| Workflow Engine | DAG-based execution with retries and branching |
| Multimodal | Vision and document flows for image and file-aware agents |
| Marketplace | Template registry for reusable agents and workflows |
| Embed SDK | Embeddable widget and integration surface for web apps |
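The Workflow Engine row mentions DAG-based execution with retries. As a generic illustration of that technique (not the AgentOS implementation), the sketch below runs steps in dependency order with the standard library's `graphlib.TopologicalSorter` and retries a failing step a fixed number of times. `run_workflow` and the step names are hypothetical, and branching is not covered.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, depends_on, max_retries=2):
    """Run callables in dependency order, retrying a step that raises."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(max_retries + 1):
            try:
                # Each step receives the results of the steps run so far.
                results[name] = steps[name](results)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted, so surface the failure
    return results


attempts = {"sort": 0}


def flaky_sort(results):
    """Fail on the first call to exercise the retry path."""
    attempts["sort"] += 1
    if attempts["sort"] == 1:
        raise RuntimeError("transient failure")
    return sorted(results["fetch"])


steps = {
    "fetch": lambda results: [3, 1, 2],
    "sort": flaky_sort,
    "report": lambda results: f"max={results['sort'][-1]}",
}
# Each key maps a step to the set of steps it depends on.
depends_on = {"fetch": set(), "sort": {"fetch"}, "report": {"sort"}}

results = run_workflow(steps, depends_on)
```

`static_order()` raises `graphlib.CycleError` if the dependencies contain a cycle, which gives a cheap validity check before any step runs.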
## Honest Comparison
| Capability | AgentOS | LangChain | CrewAI | AutoGen |
|------------|---------|-----------|--------|---------|
| Built-in testing sandbox | ✅ Native | ❌ External setup | ❌ External setup | ❌ External setup |
| Governance (budget/kill switch) | ✅ Native | ⚠️ Custom code | ⚠️ Custom code | ⚠️ Custom code |
| Real-time ops dashboard | ✅ Native | ⚠️ LangSmith add-on | ❌ | ❌ |
| Batteries-included platform | ✅ Yes | ⚠️ Framework-first | ⚠️ Orchestration-first | ⚠️ Research-first |
| Ecosystem maturity | 🌱 Growing | ✅ Very mature | ✅ Mature | ✅ Mature |
## Benchmarks
See [full benchmark results](docs/benchmarks.md). Key findings:
- Our weighted evaluation ensemble correlates 0.91 with human judgment
- Local embeddings achieve 95% of OpenAI quality at zero cost
- Governance adds <5ms overhead to any query
## Architecture
See the architecture diagram above and `docs/` for component-level details and ADRs.
## Project Structure
```text
agentos/
├── src/agentos/   # Core platform modules
├── frontend/      # React frontend
├── dashboard/     # Web dashboard UI
├── deploy/helm/   # Helm charts
├── examples/      # Runnable examples
├── tests/         # Unit and integration tests
└── docs/          # Docs and ADRs
```
## Contributing
Contributions are welcome: [CONTRIBUTING.md](CONTRIBUTING.md)
## Roadmap
Roadmap and upcoming work are tracked in [GitHub Issues](https://github.com/sukethrp/agentos/issues).
- [ ] Agent-to-Agent mesh protocol
- [x] MCP server with stdio/SSE transport
- [x] Agent-to-agent delegation with shared context