https://github.com/us-all/datadog-mcp-server
Datadog MCP server — 159 tools for metrics, monitors, logs, APM, RUM, synthetics, incidents, fleet, status pages, and more. Read-only by default.
apm claude claude-code datadog mcp model-context-protocol monitoring observability rum
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/us-all/datadog-mcp-server
- Owner: us-all
- License: mit
- Created: 2026-02-26T08:41:19.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-05-03T01:46:53.000Z (about 2 months ago)
- Last Synced: 2026-05-03T03:36:38.023Z (about 2 months ago)
- Topics: apm, claude, claude-code, datadog, mcp, model-context-protocol, monitoring, observability, rum
- Language: TypeScript
- Homepage: https://www.npmjs.com/package/@us-all/datadog-mcp
- Size: 370 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-mcp-servers - **us-all/datadog-mcp-server** - Datadog observability MCP server with 165 tools across metrics, monitors, logs, APM, RUM, incidents, status pages, and fleet automation. 4 workflow Prompts and incident-triage-snapshot aggregation tool. `http` `git` `github` `metrics` (📊 Monitoring)
README
# Datadog MCP Server
> **The Datadog MCP that answers _"why is this happening?"_ — not just _"what's the value?"_**
>
> Aggregation tools that fold 5–7 sequential API calls into one structured response. Full SLO CRUD. Fleet automation. The widest Datadog API coverage in any MCP — **165 tools** built on the [@us-all MCP standard](https://github.com/us-all/mcp-toolkit/blob/main/STANDARD.md).
## What it does that others don't
- **Aggregation tools** — `analyze-monitor-state` and `slo-compliance-snapshot` collapse 5–7 sequential API calls into one structured response with a `caveats` array for partial failures. No other Datadog MCP ships this pattern.
- **Full SLO CRUD** — create, update, delete SLOs (and their corrections). The official Bits AI MCP and community alternatives are read-only on SLOs.
- **Fleet Automation** — 17 tools across deployments, schedules, and instrumented pods. Only this server.
- **Status Pages** — 21 tools for full status-page lifecycle (components, degradations, maintenances). Only this server.
- **Token-efficient by design** — `extractFields` projection, `DD_TOOLS`/`DD_DISABLE` 16-category toggles, and a `search-tools` meta-tool keep LLM context low across 165 tools.
- **Apps SDK card** — `slo-compliance-snapshot` renders as a visual card on ChatGPT clients via `_meta["openai/outputTemplate"]`. Claude clients receive the same JSON content (non-breaking).
- **stdio + Streamable HTTP** — defaults to stdio (Claude Desktop / Code). Set `MCP_TRANSPORT=http` for ChatGPT Apps SDK or remote clients (Bearer auth via `MCP_HTTP_TOKEN`).
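The aggregation pattern described above (several calls folded into one response, with a `caveats` array for partial failures) can be sketched as follows. This is a minimal illustration, not the server's code: `aggregateSketch` and the fetcher names in the usage note are hypothetical, and the real tools call the Datadog API rather than stub functions.

```typescript
// Hypothetical sketch of a fan-out aggregation with partial-failure caveats.
type Fetcher = () => Promise<unknown>;

interface AggregateResult {
  data: Record<string, unknown>; // one entry per fetcher that succeeded
  caveats: string[];             // one entry per fetcher that failed
}

// Run every fetcher concurrently. A failure never rejects the whole call;
// it is recorded as a caveat so the caller still gets the partial result.
async function aggregateSketch(
  fetchers: Record<string, Fetcher>,
): Promise<AggregateResult> {
  const data: Record<string, unknown> = {};
  const caveats: string[] = [];
  await Promise.all(
    Object.entries(fetchers).map(async ([name, run]) => {
      try {
        data[name] = await run();
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        caveats.push(`${name}: ${reason}`);
      }
    }),
  );
  return { data, caveats };
}
```

Calling `aggregateSketch({ monitor: fetchMonitor, events: fetchEvents })` with two hypothetical fetchers returns whatever succeeded under `data` and a line such as `events: timeout` under `caveats` for anything that failed.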
## Try this — 5 prompts
Connect the server to Claude Desktop or Claude Code, then paste any of these:
1. **SLO health** — *"List my SLOs and their error budget remaining this month. Group by status: compliant, at-risk, breached."*
2. **Incident triage** — *"There's an active incident on `checkout-service`. Pull the linked monitors, the recent error spikes from APM, and which deployments touched the service in the last 24h."*
3. **Monitor noise audit** — *"Find monitors that alerted more than 10 times in the last 7 days but had MTTR under 5 minutes — these are probably flapping."*
4. **RUM error spike** — *"RUM error rate jumped on the checkout funnel between 14:00 and 14:30 today. Show me the top error groups, affected sessions, and the user actions before the errors."*
5. **Fleet rollout** — *"Schedule the `datadog-agent` 7.55.0 rollout to the `staging` cluster, weekends only, starting next Saturday."*
## When to use this vs Datadog's official MCP
Datadog's official MCP (Bits AI MCP, GA 2026-03-09) is **complementary**, not a replacement:
| | Official Datadog MCP | `@us-all/datadog-mcp` (this) |
|--|----------------------|------------------------------|
| Tool count | 16+ core toolsets | **165 tools** across full API surface |
| Deployment | Remote (managed by Datadog) | **Self-host** stdio (npx / Docker / npm) |
| Auth | Datadog SSO | API + APP key |
| Sites | Public Datadog sites | **Any site, incl. internal/sovereign**; US5 default |
| SLO writes | ❌ | ✅ create/update/delete SLOs + corrections |
| Fleet automation | ❌ | ✅ 17 tools |
| Status pages | ❌ | ✅ 21 tools |
| Aggregation tools | ❌ | ✅ `analyze-monitor-state`, `slo-compliance-snapshot` |
| MCP Prompts | ❌ | ✅ 4 (`triage-incident`, `audit-monitor-noise`, `analyze-rum-error-spike`, `investigate-slow-trace`) |
| MCP Resources | ❌ | ✅ `dd://service/{serviceName}`, `dd://team/{teamId}`, `dd://synthetics/{testId}`, etc. |
Use the official Bits AI MCP for fast managed onboarding and SSO. Use this when you need full API coverage, SLO/fleet/status-page write parity, or self-hosting (internal sites, isolated networks, dev/CI sandboxes).
## Install
### Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS path; on Windows the file is `%APPDATA%\Claude\claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "datadog": {
      "command": "npx",
      "args": ["-y", "@us-all/datadog-mcp"],
      "env": {
        "DD_API_KEY": "",
        "DD_APP_KEY": "",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}
```
### Claude Code
```bash
claude mcp add datadog -s user \
-e DD_API_KEY= -e DD_APP_KEY= -e DD_SITE=datadoghq.com \
-- npx -y @us-all/datadog-mcp
```
### Docker
```bash
# -i keeps stdin open, which the default stdio transport needs
docker run -i --rm -e DD_API_KEY=... -e DD_APP_KEY=... -e DD_SITE=datadoghq.com \
  ghcr.io/us-all/datadog-mcp-server:latest
```
### Build from source
```bash
git clone https://github.com/us-all/datadog-mcp-server.git
cd datadog-mcp-server && pnpm install && pnpm build
node dist/index.js
```
## Configuration
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `DD_API_KEY` | ✅ | — | Datadog API key |
| `DD_APP_KEY` | ✅ | — | Datadog Application key |
| `DD_SITE` | ❌ | `us5.datadoghq.com` | Datadog site (see table below) |
| `DD_ALLOW_WRITE` | ❌ | `false` | Set `true` to enable mutations (create/update/delete) |
| `DD_TOOLS` | ❌ | — | Comma-sep allowlist of categories. Only these load — biggest token saver. |
| `DD_DISABLE` | ❌ | — | Comma-sep denylist. Ignored when `DD_TOOLS` is set. |
| `MCP_TRANSPORT` | ❌ | `stdio` | `http` to enable Streamable HTTP transport |
| `MCP_HTTP_TOKEN` | conditional | — | Bearer token. Required when `MCP_TRANSPORT=http` |
| `MCP_HTTP_PORT` | ❌ | `3000` | HTTP listen port |
| `MCP_HTTP_HOST` | ❌ | `127.0.0.1` | HTTP bind host (DNS rebinding protection auto-enabled for localhost) |
| `MCP_HTTP_SKIP_AUTH` | ❌ | `false` | Skip Bearer auth — e.g. behind a reverse proxy that handles it |
**Categories** (16): `metrics`, `monitors`, `dashboards`, `logs`, `apm`, `rum`, `incidents`, `security`, `synthetics`, `ci`, `infra`, `fleet`, `status-pages`, `oncall`, `teams`, `account`.
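The interaction between `DD_TOOLS` and `DD_DISABLE` can be sketched as a small resolver. This is an illustration under stated assumptions, not the server's implementation: `resolveCategories` is a hypothetical name, and trimming whitespace, lowercasing, and silently ignoring unknown category names are guesses about behaviour the table does not specify.

```typescript
// Hypothetical sketch: which categories load for a given environment.
const ALL_CATEGORIES = [
  "metrics", "monitors", "dashboards", "logs", "apm", "rum", "incidents",
  "security", "synthetics", "ci", "infra", "fleet", "status-pages",
  "oncall", "teams", "account",
] as const;

// Split a comma-separated variable into normalized names (assumed behaviour).
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
}

function resolveCategories(env: Record<string, string | undefined>): string[] {
  const allow = parseList(env["DD_TOOLS"]);
  if (allow.length > 0) {
    // Allowlist set: only these load, and DD_DISABLE is ignored.
    return ALL_CATEGORIES.filter((c) => allow.includes(c));
  }
  // No allowlist: load everything except the denylisted categories.
  const deny = parseList(env["DD_DISABLE"]);
  return ALL_CATEGORIES.filter((c) => !deny.includes(c));
}
```

With `DD_TOOLS=metrics,monitors` and `DD_DISABLE=metrics` both set, the sketch still loads `metrics`, matching the rule that the denylist is ignored when an allowlist is present.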
When `MCP_TRANSPORT=http`: `POST /mcp` (Bearer-auth JSON-RPC) + `GET /health` (public liveness).
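For a remote client, a request to `POST /mcp` might be prepared as in the sketch below. `buildMcpPost` is a hypothetical helper, not part of this package. The header set follows the MCP Streamable HTTP transport (JSON body, `Accept` listing both JSON and event-stream); whether this server requires a prior `initialize` handshake or a session header is not stated here, so treat the flow as an assumption.

```typescript
// Hypothetical sketch: build (but do not send) a Bearer-authenticated
// JSON-RPC request for the /mcp endpoint.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMcpPost(
  baseUrl: string,
  token: string, // the value configured as MCP_HTTP_TOKEN
  rpcMethod: string,
  params: Record<string, unknown> = {},
  id = 1,
): PreparedRequest {
  return {
    // Strip trailing slashes so the path is always exactly /mcp.
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
      "Accept": "application/json, text/event-stream",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id, method: rpcMethod, params }),
  };
}
```

The result can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. `GET /health` needs no `Authorization` header, since it is the public liveness route.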
**Sites**:
| Site | Value | Region |
|------|-------|--------|
| US1 | `datadoghq.com` | US (Virginia) |
| US3 | `us3.datadoghq.com` | US (Virginia) |
| US5 | `us5.datadoghq.com` | US (Oregon) |
| EU1 | `datadoghq.eu` | EU (Frankfurt) |
| AP1 | `ap1.datadoghq.com` | Asia-Pacific (Tokyo) |
### Token efficiency
A naive setup loads ~25K tokens of tool schema before any conversation starts. Three knobs reduce that cost:
| Scenario | Tools | Schema tokens | vs default |
|----------|------:|--------------:|-----------:|
| default (all categories) | 165 | 25,200 | — |
| typical (`DD_TOOLS=metrics,monitors,logs,apm,dashboards`) | 55 | 9,300 | −63% |
| narrow (`DD_TOOLS=metrics,monitors`) | 24 | **3,800** | **−85%** |
1. **Category toggles** — `DD_TOOLS=metrics,monitors,logs,apm` (biggest win).
2. **`extractFields` response projection** — `get-dashboard { dashboardId: "abc", extractFields: "id,title,widgets.*.definition.type" }`.
3. **`search-tools` meta-tool** — always enabled; lets the LLM discover tools at runtime instead of preloading all schemas.
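To make the `extractFields` idea concrete, here is a minimal sketch of comma-separated, dotted-path projection with a `*` wildcard. It is not the toolkit's implementation: the function names are hypothetical, and the semantics (wildcards fan out over arrays only, unmatched paths are dropped) are assumptions inferred from the example path above.

```typescript
// Hypothetical sketch of dotted-path projection; not the toolkit's actual code.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Keep only what `segments` selects; undefined means "nothing matched".
function projectPath(value: Json | undefined, segments: string[]): Json | undefined {
  const head = segments[0];
  if (value === undefined || head === undefined) return value;
  const rest = segments.slice(1);
  if (head === "*") {
    // Assumption: "*" fans out over array elements only.
    return Array.isArray(value)
      ? value.map((item) => projectPath(item, rest) ?? null)
      : undefined;
  }
  if (!isObject(value) || !Object.prototype.hasOwnProperty.call(value, head)) {
    return undefined;
  }
  const child = projectPath(value[head], rest);
  return child === undefined ? undefined : { [head]: child };
}

// Merge two projections of the same source so several paths can be combined.
function deepMerge(a: Json | undefined, b: Json | undefined): Json | undefined {
  if (a === undefined) return b;
  if (b === undefined) return a;
  if (a === null) return b;
  if (b === null) return a;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.map((item, i) => deepMerge(item, b[i]) ?? null);
  }
  if (isObject(a) && isObject(b)) {
    const out: JsonObject = { ...a };
    for (const key of Object.keys(b)) {
      const merged = deepMerge(out[key], b[key]);
      if (merged !== undefined) out[key] = merged;
    }
    return out;
  }
  return b;
}

function extractFieldsSketch(value: Json, fields: string): Json {
  let result: Json | undefined = undefined;
  for (const path of fields.split(",").map((p) => p.trim()).filter((p) => p.length > 0)) {
    result = deepMerge(result, projectPath(value, path.split(".")));
  }
  return result ?? {};
}

// Example payload (made up for illustration).
const dashboard: Json = {
  id: "abc",
  title: "Checkout",
  layout_type: "ordered",
  widgets: [
    { id: 1, definition: { type: "timeseries", requests: [] } },
    { id: 2, definition: { type: "toplist", requests: [] } },
  ],
};
const projected = extractFieldsSketch(dashboard, "id,title,widgets.*.definition.type");
```

Here `projected` keeps `id`, `title`, and each widget's `definition.type`, and drops `layout_type`, widget ids, and the `requests` arrays, which is where the token savings come from.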
### Read-only mode
By default, all writes are blocked to prevent accidental mutations by AI agents. The following require `DD_ALLOW_WRITE=true`:
`create-monitor`, `update-monitor`, `delete-monitor`, `mute-monitor`, `create-dashboard`, `update-dashboard`, `delete-dashboard`, `send-logs`, `post-event`, `trigger-synthetics`, `create-synthetics-test`, `update-synthetics-test`, `delete-synthetics-test`, `create-downtime`, `cancel-downtime`, `create-case`, `update-case-status`, `send-dora-deployment`, `send-dora-incident`, `create-slo`, `update-slo`, `delete-slo`, plus all fleet/status-page/security writes.
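A write guard of this kind could look like the sketch below. It is illustrative only: `assertWriteAllowed` is a hypothetical name, the set lists just a few of the tools named above, and the assumption that only the exact string `true` enables writes is not confirmed by the configuration table.

```typescript
// Hypothetical sketch of a read-only guard; the tool set is a partial sample.
const WRITE_TOOLS = new Set<string>([
  "create-monitor", "update-monitor", "delete-monitor", "mute-monitor",
  "create-slo", "update-slo", "delete-slo", "send-logs", "post-event",
]);

function isWriteEnabled(env: Record<string, string | undefined>): boolean {
  // Assumption: only the literal string "true" turns writes on.
  return env["DD_ALLOW_WRITE"] === "true";
}

// Throw before any API call is made if a mutating tool is invoked in read-only mode.
function assertWriteAllowed(
  toolName: string,
  env: Record<string, string | undefined>,
): void {
  if (WRITE_TOOLS.has(toolName) && !isWriteEnabled(env)) {
    throw new Error(
      `${toolName} is a write operation; set DD_ALLOW_WRITE=true to enable it`,
    );
  }
}
```

Read tools such as `get-monitors` pass through unchanged, so the default configuration stays fully usable for investigation.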
## MCP Prompts (4)
Workflow templates the model can invoke directly:
- `triage-incident`: given an incident ID, walks linked monitors, recent error spikes, and recent deploys.
- `audit-monitor-noise`: flags flapping monitors by combining alert frequency with MTTR.
- `analyze-rum-error-spike`: diffs RUM error rates across two windows and attributes the change to the top error groups.
- `investigate-slow-trace`: given a slow trace ID, traverses the span tree and surfaces bottleneck spans.
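The noise-audit heuristic (many alerts, each resolved quickly) can be expressed as a small filter. This is a sketch under assumptions: the `MonitorAlertStats` shape is hypothetical, and the thresholds (more than 10 alerts, MTTR under 5 minutes) are taken from the example prompt earlier in this README rather than from the prompt's actual definition.

```typescript
// Hypothetical input shape: per-monitor alert stats over some window.
interface MonitorAlertStats {
  monitorId: number;
  name: string;
  alertCount: number;           // alerts fired in the window
  meanTimeToResolveSec: number; // average seconds from alert to recovery
}

// A monitor is treated as "flapping" when it fires often and recovers fast.
function findFlappingMonitors(
  stats: MonitorAlertStats[],
  minAlerts = 10,   // assumed threshold: more than 10 alerts
  maxMttrSec = 300, // assumed threshold: MTTR under 5 minutes
): MonitorAlertStats[] {
  return stats
    .filter((s) => s.alertCount > minAlerts && s.meanTimeToResolveSec < maxMttrSec)
    .sort((a, b) => b.alertCount - a.alertCount); // noisiest first
}
```

A monitor with 40 alerts and a 90-second MTTR is flagged; one with 40 alerts and a 30-minute MTTR is not, since slow recovery suggests real incidents rather than noise.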
## MCP Resources
Read-only entities by URI: `dd://monitor/{id}`, `dd://dashboard/{id}`, `dd://slo/{id}`, `dd://incident/{id}`, `dd://service/{serviceName}`, `dd://team/{teamId}` (team + members), `dd://synthetics/{testId}`, `dd://host/{name}`.
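A client that wants to validate these URIs before requesting them could parse them as below. This is a hypothetical helper, not part of the package: it accepts only the eight kinds listed above and leaves the identifier undecoded, which is an assumption about how names with special characters are handled.

```typescript
// Hypothetical sketch: parse a dd:// resource URI into kind + identifier.
const RESOURCE_KINDS = [
  "monitor", "dashboard", "slo", "incident",
  "service", "team", "synthetics", "host",
] as const;
type ResourceKind = (typeof RESOURCE_KINDS)[number];

interface ResourceRef {
  kind: ResourceKind;
  id: string; // left exactly as written in the URI (no percent-decoding)
}

function parseResourceUri(uri: string): ResourceRef | null {
  const match = /^dd:\/\/([a-z]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const kind = RESOURCE_KINDS.find((k) => k === match[1]);
  const id = match[2];
  if (kind === undefined || id === undefined) return null;
  return { kind, id };
}
```

For example, `parseResourceUri("dd://service/checkout-service")` yields kind `service` and id `checkout-service`, while an unknown kind or a non-`dd://` scheme yields `null`.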
## Tool reference
165 tools across 16 categories. Use the `search-tools` meta-tool to discover them at runtime; the full list follows the summary table.
| Domain | Tools |
|--------|------:|
| Status Pages | 21 |
| RUM (events + apps + metrics + retention) | 27 |
| Metrics, Hosts, SLOs, Downtimes, Containers, Processes | 19 |
| Fleet Automation | 17 |
| Synthetics, Logs/Spans Metrics, SLO Corrections | 16 |
| Monitors, Dashboards, Notebooks, Events | 16 |
| Incidents, Cases, Error Tracking, Audit | 13 |
| OnCall, Teams, Users, Services, Bots | 11 |
| Security signals + rules + suppressions | 9 |
| APM, CI Visibility, DORA, Network Devices | 9 |
| **+ aggregations** | `analyze-monitor-state`, `slo-compliance-snapshot` |
| **+ meta** | `search-tools` |
Full tool list:
### Metrics (5)
`query-metrics`, `get-metrics`, `get-metric-metadata`, `list-active-metrics`, `list-metric-tags`
### Monitors (7)
`get-monitors`, `get-monitor`, `create-monitor`, `update-monitor`, `delete-monitor`, `mute-monitor`, `validate-monitor`, `analyze-monitor-state` *(aggregation)*
### Dashboards (5)
`get-dashboards`, `get-dashboard`, `create-dashboard`, `update-dashboard`, `delete-dashboard`
### Logs (3)
`search-logs`, `aggregate-logs`, `send-logs`
### Events (2)
`get-events`, `post-event`
### Incidents (6)
`get-incidents`, `get-incident`, `search-incidents`, `create-incident`, `update-incident`, `delete-incident`
### APM (1)
`search-spans`
### RUM (17)
`search-rum-events`, `aggregate-rum`, `list-rum-applications`, `get-rum-application`, `create-rum-application`, `update-rum-application`, `delete-rum-application`, `list-rum-metrics`, `get-rum-metric`, `create-rum-metric`, `update-rum-metric`, `delete-rum-metric`, `list-rum-retention-filters`, `get-rum-retention-filter`, `create-rum-retention-filter`, `update-rum-retention-filter`, `delete-rum-retention-filter`
### SLOs (6)
`list-slos`, `get-slo`, `get-slo-history`, `create-slo`, `update-slo`, `delete-slo`, `slo-compliance-snapshot` *(aggregation)*, plus 5 SLO-correction tools
### Synthetics (6)
`list-synthetics`, `get-synthetics-result`, `trigger-synthetics`, `create-synthetics-test`, `update-synthetics-test`, `delete-synthetics-test`
### Hosts / Containers / Processes (4)
`list-hosts`, `get-host-totals`, `list-containers`, `list-processes`
### Downtimes (3)
`list-downtimes`, `create-downtime`, `cancel-downtime`
### Security (9)
`search-security-signals`, `get-security-signal`, `list-security-rules`, `get-security-rule`, `delete-security-rule`, `list-security-suppressions`, `get-security-suppression`, `create-security-suppression`, `delete-security-suppression`
### CI Visibility (4)
`search-ci-pipelines`, `aggregate-ci-pipelines`, `search-ci-tests`, `aggregate-ci-tests`
### Cases (4)
`list-cases`, `get-case`, `create-case`, `update-case-status`
### Error Tracking (2)
`list-error-tracking-issues`, `get-error-tracking-issue`
### DORA (2)
`send-dora-deployment`, `send-dora-incident`
### Network Devices (2)
`list-network-devices`, `get-network-device`
### Notebooks (2)
`list-notebooks`, `get-notebook`
### OnCall (2)
`get-team-oncall`, `get-oncall-schedule`
### Services & Software Catalog (2)
`list-services`, `get-service-definition`
### Teams (6)
`list-teams`, `get-team`, `create-team`, `update-team`, `delete-team`, `get-team-members`
### Account & Users (2)
`get-usage-summary`, `list-users`
### Logs Metrics, Spans Metrics, APM Retention Filters (15)
5 each for `logs-metrics`, `spans-metrics`, `apm-retention-filters` (list/get/create/update/delete)
### Status Pages (21)
Full lifecycle: pages, components, degradations, maintenances. See `src/tools/status-pages.ts`.
### Fleet Automation (17)
Agents, deployments, schedules, instrumented pods. See `src/tools/fleet.ts`.
### Audit (1)
`search-audit-logs`
### Meta (1)
`search-tools` — query other tools by keyword; always enabled regardless of `DD_TOOLS`.
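Keyword discovery over a tool registry can be sketched as a simple scored substring match. The real `search-tools` ranking is not documented here, so everything below is an assumption: the `ToolEntry` shape, the scoring rule, and the helper name are hypothetical.

```typescript
// Hypothetical registry entry; descriptions and categories are illustrative.
interface ToolEntry {
  name: string;
  category: string;
  description: string;
}

// Score each tool by how many query terms appear in its name, category,
// or description, then return the best matches first.
function searchToolsSketch(
  registry: ToolEntry[],
  query: string,
  limit = 10,
): ToolEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return [];
  return registry
    .map((tool) => {
      const haystack = `${tool.name} ${tool.category} ${tool.description}`.toLowerCase();
      const score = terms.filter((t) => haystack.includes(t)).length;
      return { tool, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.tool);
}
```

Only the matching entries are returned to the model, so a client that loads a narrow `DD_TOOLS` set can still find out which other tools exist before enabling their categories.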
## Architecture
```
Claude → MCP stdio → index.ts → tools/*.ts → @datadog/datadog-api-client → Datadog API
```
Built on [`@us-all/mcp-toolkit`](https://github.com/us-all/mcp-toolkit):
- `extractFields` — token-efficient response projections
- `aggregate(fetchers, caveats)` — fan-out helper for aggregation tools
- `createWrapToolHandler` — domain-specific redaction (DD_API_KEY/DD_APP_KEY) + Datadog `ApiException` error extraction
- `search-tools` meta-tool
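The credential redaction mentioned for `createWrapToolHandler` can be illustrated with a few lines. This is a sketch of the general idea, not the toolkit's code: `redactSecrets` and the `[REDACTED]` placeholder are hypothetical.

```typescript
// Hypothetical sketch: strip known secret values from a message before it
// is returned to the model or written to logs.
function redactSecrets(
  message: string,
  secrets: Array<string | undefined>,
): string {
  let out = message;
  for (const secret of secrets) {
    // Skip unset or empty values; splitting on "" would corrupt the text.
    if (secret !== undefined && secret.length > 0) {
      // split/join replaces every occurrence without regex escaping issues.
      out = out.split(secret).join("[REDACTED]");
    }
  }
  return out;
}
```

A handler wrapper would typically call this with the values of `DD_API_KEY` and `DD_APP_KEY` on any error text before surfacing it.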
## Tech stack
Node.js 22+ • TypeScript strict ESM • pnpm • `@modelcontextprotocol/sdk` • `@datadog/datadog-api-client` (official) • zod • dotenv • vitest + dd-trace.
## Contributing
See [CONTRIBUTING.md](./CONTRIBUTING.md). New shared patterns belong in [`@us-all/mcp-toolkit`](https://github.com/us-all/mcp-toolkit) — single source of truth for the 7-server suite.
## License
[MIT](./LICENSE)