https://github.com/sauravbhattacharya001/agentlens
AgentLens: Observability and Explainability for AI Agents
- Host: GitHub
- URL: https://github.com/sauravbhattacharya001/agentlens
- Owner: sauravbhattacharya001
- License: mit
- Created: 2026-02-14T00:18:13.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2026-04-10T13:35:44.000Z (2 months ago)
- Last Synced: 2026-04-10T15:25:47.474Z (2 months ago)
- Topics: agent-framework, ai-agents, ai-observability, ai-safety, cost-tracking, dashboard, debugging, devtools, explainability, langchain, llm, llm-tools, monitoring, observability, openai, prompt-engineering, python-sdk, token-tracking, tracing
- Language: Python
- Size: 1.52 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
README
# AgentLens
**Observability and Explainability for AI Agents**
*Datadog meets Chain-of-Thought, for autonomous agents*
[Getting Started](#getting-started) · [Features](#features) · [SDK Reference](#sdk-reference) · [Dashboard](#dashboard) · [Architecture](#architecture) · [Contributing](#contributing) · [Full Docs](https://sauravbhattacharya001.github.io/agentlens/) · [Live Demo](https://sauravbhattacharya001.github.io/agentlens/demo/)
---
## What is AgentLens?
AgentLens gives you full visibility into what your AI agents are doing, why they're doing it, and how much it costs. As AI agents become more autonomous (making decisions, calling tools, chaining actions), you need to **see inside the black box**.
AgentLens provides:
- **Session-level tracing** for every agent run
- **Token and cost tracking** across models and calls
- **Decision traces** capturing *why* an agent made each choice
- **Human-readable explanations** of agent behavior
- **A real-time dashboard** to monitor everything visually
## Why AgentLens?
| | LangSmith | Helicone | Weights & Biases | **AgentLens** |
|---|:---:|:---:|:---:|:---:|
| Self-hosted | ❌ | ❌ | ❌ | ✅ |
| Zero external dependencies | ❌ | ❌ | ❌ | ✅ |
| Decision-level explainability | ❌ | ❌ | ❌ | ✅ |
| Built-in anomaly detection | ❌ | ❌ | ❌ | ✅ |
| Session comparison & diff | ❌ | ❌ | ❌ | ✅ |
| Cost forecasting | ❌ | Partial | ❌ | ✅ |
| No vendor lock-in | ❌ | ❌ | ❌ | ✅ |
| Free & open source | ❌ | Partial | ❌ | ✅ |
AgentLens runs entirely on your infrastructure: SQLite for storage, no cloud dependencies, and no data leaving your network.
## Features
| Feature | Description |
|---------|-------------|
| **Session Tracking** | Group agent actions into sessions with full execution traces |
| **Tool Call Capture** | Record every tool invocation with inputs, outputs, and duration |
| **Token Usage** | Track token consumption and costs across models |
| **Decision Traces** | Capture the reasoning behind each agent decision |
| **Visual Timeline** | Interactive timeline view of agent actions in the dashboard |
| **Explainability** | Generate human-readable summaries of agent behavior |
| **Decorators** | Zero-config instrumentation with Python decorators |
| **Analytics Dashboard** | Aggregate stats, model usage, hourly activity heatmap, sessions over time |
| **Session Comparison** | Compare two sessions side by side: token deltas, event breakdowns, tool usage diffs |
| **Cost Estimation** | Configurable model pricing, per-session and per-event cost tracking, cost breakdown dashboard |
| **Alert Rules** | Configurable alert rules with metric thresholds and event triggers |
| **Session Tags** | Tag sessions for filtering, organization, and retention exemption |
| **Annotations** | Timestamped notes on sessions and events for auditing |
| **Data Retention** | Configurable retention policies with auto-purge and exempt tags |
| **Event Search** | Rich filtering across sessions by type, model, tokens, and duration |
| **Anomaly Detection** | Z-score statistical analysis to detect latency spikes, token surges, and error bursts |
| **Health Scoring** | Grade sessions A-F based on error rates, latency, and tool failures |
| **Cost Budgets** | Per-agent and global spending limits with real-time tracking, warnings, and overage detection |
| **Session Narratives** | Auto-generate human-readable summaries of agent session behavior |
| **Agent Scorecards** | Per-agent performance grading with composite scores and letter grades |
| **Cost Forecasting** | Budget projections with a what-if simulator and model breakdown |
| **Token Heatmap** | Calendar-style visualization of token consumption patterns |
| **Trace Waterfall** | Interactive Gantt-style event visualization for session traces |
| **Session Diff** | Side-by-side visual comparison of two agent sessions |
| **Error Analytics** | Error grouping by type, agent, and model with trend analysis |
| **Command Center** | Unified activity feed aggregating alerts, anomalies, budget warnings, and health signals |
| **SLA Compliance** | Track SLA targets with compliance rings, violation alerts, and history |
## Architecture
```
┌──────────────┐    HTTP POST    ┌──────────────────┐    SQLite    ┌──────────┐
│  Your Agent  │ ──────────────► │  AgentLens API   │ ───────────► │    DB    │
│    + SDK     │     /events     │   (Express.js)   │              └──────────┘
└──────────────┘                 └────────┬─────────┘
                                          │ REST API
                                 ┌────────┴─────────┐
                                 │    Dashboard     │
                                 │  (HTML/CSS/JS)   │
                                 └──────────────────┘
```
| Component | Directory | Tech Stack |
|-----------|-----------|------------|
| **Python SDK** | `sdk/` | Python 3.9+, Pydantic, httpx |
| **Backend API** | `backend/` | Node.js, Express, better-sqlite3 |
| **Dashboard** | `dashboard/` | Vanilla HTML/CSS/JS (no build step) |
## Getting Started
### Prerequisites
- **Python 3.9+** (for the SDK)
- **Node.js 18+** (for the backend)
- **npm** (comes with Node.js)
### 1. Clone the repo
```bash
git clone https://github.com/sauravbhattacharya001/agentlens.git
cd agentlens
```
### 2. Start the Backend
```bash
cd backend
npm install
node seed.js # Load demo data (optional)
node server.js # Starts on http://localhost:3000
```
The dashboard is served automatically at [http://localhost:3000](http://localhost:3000).
### 3. Install the Python SDK
```bash
pip install agentlens
```
Or install from source for development:
```bash
cd sdk
pip install -e .
```
### 4. Use the CLI
After installing the SDK, you get the `agentlens` command:
```bash
# Check backend connectivity
agentlens status
# List recent sessions
agentlens sessions --limit 10
# View cost breakdown for a session
agentlens costs
# Search events by type or model
agentlens events --type llm_call --model gpt-4
# Export a session to JSON or CSV
agentlens export --format csv -o report.csv
# Health score for a session (A-F grading)
agentlens health
# Compare two sessions side-by-side
agentlens compare
# View aggregate analytics
agentlens analytics
# List recent alerts
agentlens alerts
# Generate incident postmortem for a session
agentlens postmortem
# List sessions eligible for postmortem analysis
agentlens postmortem --candidates --min-errors 3
# Live session leaderboard
agentlens top
# Live-follow session events
agentlens tail
# Generate time-range summary report
agentlens report --from 2024-01-01 --to 2024-01-31
# Generate interactive HTML flamegraph for a session
agentlens flamegraph -o profile.html --open
# Print flamegraph statistics without generating HTML
agentlens flamegraph --stats
# Generate self-contained HTML dashboard with interactive charts
agentlens dashboard --limit 200 -o dashboard.html --open
# Evaluate sessions against SLA policies
agentlens sla --policy production --limit 100
# Custom SLA targets with verbose output
agentlens sla --latency 2000 --error-rate 5 --token-budget 8000 --slo 95 --verbose
# SLA compliance as JSON for CI/CD pipelines
agentlens sla --policy production --json
```
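The `sla` flags combine per-session targets (latency, error rate, token budget) with an overall objective (`--slo`). The sketch below shows one plausible reading of those semantics, and it is an assumption rather than the CLI's documented behavior: a session complies only if it meets every target, and the policy passes when the compliant share reaches the SLO percentage. `Session` and `evaluate_sla` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical per-session summary; field names are illustrative only.
    latency_ms: float
    error_rate_pct: float
    total_tokens: int


def evaluate_sla(sessions, latency=2000, error_rate=5.0, token_budget=8000, slo=95.0):
    """Return (compliance_pct, passed) under the assumed semantics."""
    if not sessions:
        return 100.0, True  # design choice: nothing to violate
    compliant = sum(
        1
        for s in sessions
        if s.latency_ms <= latency
        and s.error_rate_pct <= error_rate
        and s.total_tokens <= token_budget
    )
    compliance = 100.0 * compliant / len(sessions)
    return compliance, compliance >= slo


sessions = [
    Session(latency_ms=1200, error_rate_pct=0.0, total_tokens=3000),
    Session(latency_ms=2500, error_rate_pct=1.0, total_tokens=4000),  # too slow
    Session(latency_ms=900, error_rate_pct=2.0, total_tokens=7000),
    Session(latency_ms=1500, error_rate_pct=0.0, total_tokens=9000),  # over token budget
]
print(evaluate_sla(sessions))  # (50.0, False)
```

With two of four sessions compliant, compliance is 50%, well below a 95% SLO, which is the kind of result a CI/CD gate on `--json` output would fail on.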
Configure via environment variables:
```bash
export AGENTLENS_ENDPOINT=http://localhost:3000
export AGENTLENS_API_KEY=your-key
```
Or pass `--endpoint` and `--api-key` flags to any command.
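How the flags and environment variables interact is not spelled out above. A common convention, assumed here rather than confirmed for AgentLens, is that an explicit flag wins over the environment, which in turn wins over a built-in default. `resolve_config` is a hypothetical helper, and `http://localhost:3000` as the fallback is an assumption based on the backend's default port.

```python
import argparse
import os

DEFAULT_ENDPOINT = "http://localhost:3000"  # assumed fallback: the backend's default address


def resolve_config(argv=None, environ=None):
    """Resolve endpoint and API key: flag > environment variable > default (assumed order)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="agentlens")
    parser.add_argument("--endpoint")
    parser.add_argument("--api-key")
    args, _unknown = parser.parse_known_args(argv)
    endpoint = args.endpoint or environ.get("AGENTLENS_ENDPOINT", DEFAULT_ENDPOINT)
    api_key = args.api_key or environ.get("AGENTLENS_API_KEY")
    return endpoint, api_key


env = {"AGENTLENS_ENDPOINT": "http://agentlens.internal:3000", "AGENTLENS_API_KEY": "env-key"}
print(resolve_config([], env))                         # ('http://agentlens.internal:3000', 'env-key')
print(resolve_config(["--api-key", "flag-key"], env))  # ('http://agentlens.internal:3000', 'flag-key')
```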
### 5. Instrument Your Agent
```python
import agentlens
# Initialize the SDK
agentlens.init(api_key="your-key", endpoint="http://localhost:3000")
# Start a tracking session
session = agentlens.start_session(agent_name="my-agent")
# Track events manually
agentlens.track(
    event_type="llm_call",
    input_data={"prompt": "What is 2+2?"},
    output_data={"response": "4"},
    model="gpt-4",
    tokens_in=12,
    tokens_out=3,
    reasoning="Simple arithmetic question, answered directly",
)
# Get a human-readable explanation
print(agentlens.explain())
# End the session
agentlens.end_session()
```
### 6. Run the Demo
```bash
cd sdk/examples
python mock_agent.py
# Then open http://localhost:3000 to see the results
```
## SDK Reference
### Initialization
```python
import agentlens
# Connect to your AgentLens backend
tracker = agentlens.init(
    api_key="your-key",               # API key for authentication
    endpoint="http://localhost:3000"  # Backend URL
)
```
### Session Management
```python
# Start a session
session = agentlens.start_session(
    agent_name="my-agent",    # Name of the agent
    metadata={"env": "prod"}  # Optional metadata
)
# End the session (flushes all pending events)
agentlens.end_session()
```
### Manual Event Tracking
```python
event = agentlens.track(
    event_type="llm_call",          # Event type: llm_call, tool_call, generic
    input_data={"prompt": "..."},   # Input to the operation
    output_data={"text": "..."},    # Output from the operation
    model="gpt-4",                  # Model used (if applicable)
    tokens_in=100,                  # Input tokens
    tokens_out=50,                  # Output tokens
    reasoning="...",                # Why the agent made this decision
    tool_name="search",             # Tool name (for tool calls)
    tool_input={"query": "..."},    # Tool input
    tool_output={"results": []},    # Tool output
    duration_ms=1500.0,             # Execution duration in ms
)
```
### Decorators (Zero-Config)
```python
from agentlens import track_agent, track_tool_call

# call_llm and do_search stand in for your own LLM and search functions.

@track_agent(model="gpt-4")
def my_agent(prompt):
    """Automatically tracked: captures input, output, and timing."""
    return call_llm(prompt)

@track_tool_call(tool_name="web_search")
def search(query):
    """Automatically tracked: captures tool input and output."""
    return do_search(query)
```
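To build intuition for what a zero-config decorator can record, here is a minimal standard-library sketch that wraps a function and captures its input, output, and wall-clock duration. It is not the AgentLens implementation: `track_call` and the in-memory `EVENTS` list are hypothetical stand-ins for the SDK's decorators and event buffer.

```python
import functools
import time

EVENTS = []  # hypothetical in-memory buffer standing in for the SDK's event queue


def track_call(event_type, **fields):
    """Decorator factory: record args, result, and duration of each call."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            duration_ms = (time.perf_counter() - start) * 1000.0
            EVENTS.append({
                "event_type": event_type,
                "name": func.__name__,
                "input_data": {"args": args, "kwargs": kwargs},
                "output_data": result,
                "duration_ms": duration_ms,
                **fields,  # extra static fields such as tool_name or model
            })
            return result
        return wrapper
    return decorator


@track_call("tool_call", tool_name="web_search")
def search(query):
    return [f"result for {query}"]


search("agent observability")
print(EVENTS[0]["tool_name"], EVENTS[0]["output_data"])
# web_search ['result for agent observability']
```

This sketch only records successful calls; a fuller version would also catch exceptions, record them as error events, and re-raise.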
### Explainability
```python
# Get a human-readable explanation of agent behavior
explanation = agentlens.explain()
print(explanation)
# Output: "The agent received a question about arithmetic.
# It called GPT-4 which responded with '4'.
# Total tokens used: 15 (12 in, 3 out)."
```
### Session Comparison
```python
# Compare two sessions side-by-side
result = agentlens.compare_sessions(
    session_a="abc123",
    session_b="def456",
)
# Result includes metrics, deltas, and shared breakdowns
print(f"Token delta: {result['deltas']['total_tokens']['percent']}%")
print(f"Session A events: {result['session_a']['event_count']}")
print(f"Session B events: {result['session_b']['event_count']}")
print(f"Shared tools: {result['shared']['tools']}")
```
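Each entry under `deltas` reports how session B differs from session A. The sketch below assumes the percentage is measured relative to session A and rounded to one decimal place; the backend's exact formula and rounding may differ, and `delta` is a hypothetical helper.

```python
def delta(a, b):
    """Absolute and percent change from session A to session B (assumed definition)."""
    diff = b - a
    percent = round(100.0 * diff / a, 1) if a else None  # undefined when A is zero
    return {"a": a, "b": b, "diff": diff, "percent": percent}


print(delta(1200, 1500))  # {'a': 1200, 'b': 1500, 'diff': 300, 'percent': 25.0}
print(delta(0, 40))       # {'a': 0, 'b': 40, 'diff': 40, 'percent': None}
```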
### Cost Estimation
```python
# Get cost breakdown for the current session
costs = agentlens.get_costs()
print(f"Total cost: ${costs['total_cost']:.4f}")
print(f"Input cost: ${costs['total_input_cost']:.4f}")
print(f"Output cost: ${costs['total_output_cost']:.4f}")
# Per-model breakdown
for model, mc in costs['model_costs'].items():
    print(f"  {model}: ${mc['total_cost']:.4f} ({mc['calls']} calls)")
# View/update model pricing (per 1M tokens, USD)
pricing = agentlens.get_pricing()
print(pricing['pricing']) # Current pricing config
# Set custom pricing
agentlens.set_pricing({
    "my-custom-model": {
        "input_cost_per_1m": 5.00,
        "output_cost_per_1m": 15.00,
    }
})
```
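Because pricing is expressed in USD per 1M tokens, an event's cost is `tokens / 1_000_000 * rate`, computed separately for input and output. The sketch applies that arithmetic to the `my-custom-model` pricing from the example above; `event_cost` is a hypothetical helper, not an SDK function.

```python
PRICING = {
    # USD per 1M tokens, mirroring the set_pricing() example
    "my-custom-model": {"input_cost_per_1m": 5.00, "output_cost_per_1m": 15.00},
}


def event_cost(model, tokens_in, tokens_out, pricing=PRICING):
    """Return (input_cost, output_cost, total_cost) in USD for one event."""
    rates = pricing[model]
    input_cost = tokens_in / 1_000_000 * rates["input_cost_per_1m"]
    output_cost = tokens_out / 1_000_000 * rates["output_cost_per_1m"]
    return input_cost, output_cost, input_cost + output_cost


inp, out, total = event_cost("my-custom-model", tokens_in=100, tokens_out=50)
print(f"${inp:.6f} + ${out:.6f} = ${total:.6f}")  # $0.000500 + $0.000750 = $0.001250
```

For 100 input and 50 output tokens this gives $0.0005 + $0.00075 = $0.00125; a session total is the sum over its events.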
### Event Search
```python
# Search events with rich filtering
results = tracker.search_events(
    q="error",                     # Full-text search
    event_type="tool_call",        # Filter by type
    model="gpt-4",                 # Filter by model
    min_tokens=100,                # Minimum token count
    has_tools=True,                # Only events with tool calls
    after="2024-01-01T00:00:00Z",  # Only events after this timestamp
    limit=50,                      # Max results
)
for event in results["events"]:
    print(f"{event['event_type']}: {event.get('model', 'N/A')}")
```
### Session Tags
```python
# Add tags to the current session
tracker.add_tags(["production", "v2.0", "critical"])
# Remove specific tags
tracker.remove_tags(["v2.0"])
# Get tags for a session
tags = tracker.get_tags()
# List all tags across sessions
all_tags = tracker.list_all_tags()
# Find sessions by tag
sessions = tracker.list_sessions_by_tag("production")
```
### Annotations
```python
# Annotate a session with timestamped notes
tracker.annotate(
    "Latency spike detected at step 5",
    annotation_type="warning",
    author="monitoring-bot",
)
tracker.annotate(
    "Reached goal state",
    annotation_type="milestone",
)
# Retrieve annotations
annotations = tracker.get_annotations(annotation_type="warning")
for ann in annotations["annotations"]:
    print(f"[{ann['type']}] {ann['text']}")
# Update or delete annotations
tracker.update_annotation("ann-id-123", text="Updated note")
tracker.delete_annotation("ann-id-456")
```
### Alert Rules
```python
# Create an alert rule
tracker.create_alert_rule(
    name="High Error Rate",
    metric="error_rate",
    condition="gt",
    threshold=0.1,
    description="Fires when error rate exceeds 10%",
)
# List and evaluate rules
rules = tracker.list_alert_rules()
alerts = tracker.evaluate_alerts() # Check all rules against recent data
alert_events = tracker.get_alert_events(limit=20)
```
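A rule such as `metric="error_rate", condition="gt", threshold=0.1` can be pictured as: compute the metric over recent events, then compare it to the threshold with the rule's operator. This is a sketch under assumptions: the condition names other than `gt`, the `is_error` event field, and `evaluate_rule` are hypothetical, not the backend's evaluation code.

```python
import operator

# Assumed condition names; only "gt" appears in the example rule.
CONDITIONS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def error_rate(events):
    """Fraction of events flagged as errors (hypothetical `is_error` field)."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.get("is_error")) / len(events)


METRICS = {"error_rate": error_rate}  # metric name -> function over recent events


def evaluate_rule(rule, events):
    """Return True if the rule fires for the given recent events."""
    value = METRICS[rule["metric"]](events)
    return CONDITIONS[rule["condition"]](value, rule["threshold"])


rule = {"name": "High Error Rate", "metric": "error_rate", "condition": "gt", "threshold": 0.1}
recent = [{"is_error": False}] * 8 + [{"is_error": True}] * 2
print(evaluate_rule(rule, recent))  # True, since 2/10 = 0.2 > 0.1
```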
### Anomaly Detection
```python
from agentlens import AnomalyDetector, AnomalyDetectorConfig
config = AnomalyDetectorConfig(
    warning_threshold=2.0,   # 2σ = warning
    critical_threshold=3.0,  # 3σ = critical
)
detector = AnomalyDetector(config)
# Analyze a session for anomalies
report = detector.analyze(session_events)
print(f"Found {len(report.anomalies)} anomalies")
for anomaly in report.anomalies:
    print(f"  [{anomaly.severity.value}] {anomaly.kind.value}: {anomaly.description}")
```
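The z-score approach named in the Features table measures how many standard deviations a value sits from a baseline mean. The sketch applies the 2σ/3σ thresholds from the config above to latencies using only the standard library. Scoring new values against a separate baseline keeps an outlier from inflating its own standard deviation. `classify` and the sample numbers are hypothetical, and the real detector's baseline and metrics may differ.

```python
import statistics


def classify(values, baseline, warning=2.0, critical=3.0):
    """Label each value by its z-score against a baseline sample."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation; needs >= 2 points
    results = []
    for v in values:
        z = (v - mean) / stdev if stdev else 0.0  # flat baseline: z undefined, treated as normal
        if abs(z) >= critical:
            severity = "critical"
        elif abs(z) >= warning:
            severity = "warning"
        else:
            severity = "normal"
        results.append((v, round(z, 2), severity))
    return results


baseline_ms = [100, 110, 90, 105, 95]  # e.g. latencies from earlier, healthy sessions
new_ms = [102, 118, 130]
print([severity for _, _, severity in classify(new_ms, baseline_ms)])
# ['normal', 'warning', 'critical']
```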
### Health Scoring
```python
from agentlens import HealthScorer, HealthThresholds
scorer = HealthScorer()
report = scorer.score(session_events)
print(f"Overall: {report.overall_grade.value} ({report.overall_score:.0f}/100)")
for metric in report.metrics:
    print(f"  {metric.name}: {metric.grade.value} ({metric.score:.0f}/100)")
```
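Letter grades are derived from numeric 0-100 scores. The README does not state AgentLens's grade boundaries or metric weights, so the 90/80/70/60 cutoffs and the weights below are assumptions chosen only to illustrate the idea; `to_grade` and `overall_score` are hypothetical helpers, not the `HealthScorer` internals.

```python
# Hypothetical cutoffs: a score at or above the cutoff earns the grade (A-F, no E).
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def to_grade(score):
    for cutoff, grade in CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def overall_score(metric_scores, weights):
    """Weighted average of 0-100 metric scores (weights are illustrative)."""
    total_weight = sum(weights[name] for name in metric_scores)
    return sum(metric_scores[n] * weights[n] for n in metric_scores) / total_weight


scores = {"error_rate": 95.0, "latency": 72.0, "tool_failures": 88.0}
weights = {"error_rate": 0.5, "latency": 0.3, "tool_failures": 0.2}
overall = overall_score(scores, weights)
print(f"Overall: {to_grade(overall)} ({overall:.0f}/100)")  # Overall: B (87/100)
```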
### Data Retention
```python
# Configure retention policy
tracker.set_retention_config(
    max_age_days=30,             # Delete sessions older than 30 days
    max_sessions=10000,          # Keep at most 10k sessions
    exempt_tags=["production"],  # Never delete production sessions
    auto_purge=True,             # Enable automatic cleanup
)
# Preview what would be purged
preview = tracker.purge(dry_run=True)
print(preview["message"])
# Actually purge
result = tracker.purge()
print(f"Purged {result['purged_sessions']} sessions")
```
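The retention settings combine an age limit, a session-count cap, and exempt tags. The sketch below shows one plausible selection policy, which is an assumption and not the backend's actual purge query: expired non-exempt sessions go first, then the oldest remaining non-exempt sessions are trimmed until the total fits under `max_sessions`, with exempt sessions never deleted but still counted toward the cap. `select_for_purge` and the session dict shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_for_purge(sessions, now, max_age_days=30, max_sessions=10000,
                     exempt_tags=("production",)):
    """Return ids of sessions a purge would delete, oldest first (assumed policy)."""
    exempt = set(exempt_tags)
    # Exempt sessions are never candidates, but they still count toward max_sessions.
    candidates = sorted(
        (s for s in sessions if not exempt & set(s["tags"])),
        key=lambda s: s["started_at"],
    )
    cutoff = now - timedelta(days=max_age_days)
    expired = [s for s in candidates if s["started_at"] < cutoff]
    recent = [s for s in candidates if s["started_at"] >= cutoff]
    # If the survivors still exceed the cap, trim the oldest non-exempt ones.
    excess = len(sessions) - len(expired) - max_sessions
    doomed = expired + recent[:max(excess, 0)]
    return [s["id"] for s in doomed]


def day(month, dom):
    return datetime(2024, month, dom, tzinfo=timezone.utc)


sessions = [
    {"id": "s1", "started_at": day(1, 1), "tags": []},
    {"id": "s2", "started_at": day(1, 5), "tags": ["production"]},
    {"id": "s3", "started_at": day(2, 20), "tags": []},
    {"id": "s4", "started_at": day(2, 25), "tags": ["debug"]},
]
now = day(3, 1)
print(select_for_purge(sessions, now))                  # ['s1']
print(select_for_purge(sessions, now, max_sessions=2))  # ['s1', 's3']
```

A dry run, like `purge(dry_run=True)` above, would report this list without deleting anything.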
### Data Models
| Model | Description |
|-------|-------------|
| `AgentEvent` | A single observable event (LLM call, tool use, decision) |
| `ToolCall` | A tool/function invocation with input and output |
| `DecisionTrace` | The reasoning behind an agent's decision |
| `Session` | A collection of events for one agent run |
| `AlertRule` | A configurable alert rule with metric and threshold |
| `Anomaly` | A detected statistical anomaly in session metrics |
| `HealthReport` | Graded health assessment of a session (A-F) |
## Dashboard
The dashboard provides a real-time view of your agent sessions:
- **Sessions List**: Filter by status (active, completed, error)
- **Session Comparison**: Select two sessions and compare them side by side with visual diffs
- **Analytics Overview**: Click **Analytics** to see aggregate stats, model usage, hourly activity, and top agents
- **Timeline View**: Interactive timeline of every event in a session
- **Token Charts**: Per-event and cumulative token usage visualization
- **Explain Tab**: Human-readable behavior summaries
- **Costs Tab**: Per-event and per-model cost breakdowns, cumulative cost chart, configurable model pricing
- **Cost Forecast**: Budget projections with a what-if simulator and model breakdown
- **Agent Scorecards**: Per-agent performance grading with composite scores, letter grades, and sparkline trends
- **Token Heatmap**: Calendar-style visualization of daily token consumption
- **Trace Waterfall**: Gantt-style visualization of event timing within a session
- **Session Diff Viewer**: Side-by-side comparison of two sessions with event-level diffs
- **Error Analytics**: Error grouping by type, agent, and model with trends
- **SLA Compliance**: Compliance rings, violation alerts, and history charts
The dashboard is a lightweight HTML/CSS/JS app served directly by the backend โ no build step required.
## API Endpoints
The backend exposes a REST API with **80+ endpoints** across 18 route groups:
| Route Group | Endpoints | Description |
|-------------|-----------|-------------|
| **Sessions** | 8 | CRUD, search, explain, export, compare |
| **Events** | 1 | Batch event ingestion (up to 500/call) |
| **Analytics** | 4 | Aggregate stats, performance, heatmaps, cache |
| **Pricing & Costs** | 4 | Model pricing config, per-session cost calculation |
| **Alerts** | 8 | Alert rules CRUD, evaluation, acknowledgment |
| **Webhooks** | 6 | Webhook CRUD, test delivery, delivery history |
| **Correlations** | 10 | Correlation rules, groups, event correlations |
| **Correlation Scheduler** | 6 | SSE stream, schedule management, scheduler control |
| **Tags** | 5 | Session tagging, tag-based filtering |
| **Bookmarks** | 4 | Session bookmarking |
| **Annotations** | 5 | Timestamped notes on sessions and events |
| **Baselines** | 5 | Agent performance baselines and drift detection |
| **Error Analysis** | 5 | Error grouping by type, agent, model with trends |
| **Dependencies** | 5 | Service dependency graph, co-occurrence, critical paths |
| **Leaderboard** | 1 | Agent performance ranking |
| **Postmortem** | 2 | Incident report generation and candidate listing |
| **Retention** | 4 | Retention config, stats, manual purge |
| **Health** | 1 | Health check |
> **Full API reference with request/response examples:** [docs/API.md](docs/API.md)
## Tech Stack
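The Events route ingests batches of up to 500 events per call, so a client holding more than that has to split them. Below is a minimal sketch of that chunking; `batches` is hypothetical, and the `{"events": [...]}` request body is an assumed shape, not taken from docs/API.md.

```python
import json


def batches(events, size=500):
    """Yield consecutive slices of at most `size` events."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(events), size):
        yield events[start:start + size]


events = [{"event_type": "llm_call", "tokens_in": i} for i in range(1234)]
payloads = [json.dumps({"events": chunk}) for chunk in batches(events)]  # assumed body shape
print([len(chunk) for chunk in batches(events)])  # [500, 500, 234]
```

Each payload would then go in its own `POST /events` request; slicing in order preserves event ordering across batches.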
- **Python SDK**: Pydantic for data validation, httpx for async HTTP
- **Backend**: Express.js with better-sqlite3 for zero-config persistence
- **Dashboard**: Vanilla JS with Canvas-based charts (no framework dependencies)
- **Database**: SQLite (embedded, no external DB setup needed)
## Contributing
Contributions are welcome! Here's how to get started:
1. Fork the repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests: `cd sdk && pytest`
5. Commit (`git commit -m 'Add amazing feature'`)
6. Push (`git push origin feature/amazing-feature`)
7. Open a Pull Request
### Development Setup
```bash
# Backend (runs in the foreground; start from the repo root)
cd backend && npm install && node server.js
# SDK, in a second terminal from the repo root (editable install with dev deps)
cd sdk && pip install -e ".[dev]"
# Run SDK tests (from the sdk/ directory)
pytest
```
## License
MIT. See [LICENSE](LICENSE) for details.
---
**Built by [Saurav Bhattacharya](https://github.com/sauravbhattacharya001)**
*Because if you can't see what your agents are doing, you can't trust them.*