
# 01Agents: Kubernetes Alert Remediation System

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![React Router v7](https://img.shields.io/badge/React_Router-v7-green.svg)](https://reactrouter.com/)
[![LangGraph](https://img.shields.io/badge/LangGraph-1.0.5-green.svg)](https://langchain.com/langgraph)
[![MCP](https://img.shields.io/badge/MCP-1.24.0-orange.svg)](https://modelcontextprotocol.io/)

A Kubernetes alert remediation system with a modern web interface and intelligent remediation agent. This platform analyzes monitoring alerts, fetches live cluster context, and generates executable remediation scripts through LLM‑powered workflows.

## 🏗️ Architecture Overview

The system follows a two‑tier hierarchical architecture:

```
Monitoring Alerts → L0 Client (UI) → L1 Remediation Agent → Remediation Scripts
```

### **L0 Client: React Router v7 Web Interface**
- **Role**: Modern web interface for alert management and visualization.
- **Port**: Default `3000`.

### **L1: Kubernetes Alert Remediation Agent**
- **Role**: LangGraph‑powered remediation specialist using specialized subagents.
- **Port**: Default `10001`.

---

## 🚀 Quick Start

### Clone the Repository

```bash
git clone git@github.com:BerryBytes/01agent.git
cd 01agent
```

### Choose Your Deployment Approach

- **[Approach 1: Using Pre-built Images](#approach-1-using-pre-built-images)** (Recommended for most users)
- **[Approach 2: Building Custom Images](#approach-2-building-custom-images)** (For developers modifying source code)

---

## Approach 1: Using Pre-built Images

Deploy using ready-to-use images from the `01community` registry. Perfect for quick setup and production use.

### Prerequisites

- **Kubernetes cluster** (Kind, Minikube, or Cloud)
- **Helm 3+**
- **LLM API Key** (OpenRouter, DeepSeek, Google, OpenAI, or Anthropic)

### Step 1: Install MCP Server for Kubernetes

The MCP (Model Context Protocol) Server provides Kubernetes tools for the agents.

#### Clone the MCP Server repository

```bash
git clone https://github.com/Flux159/mcp-server-kubernetes.git
```

#### Update the schema (Required)

```bash
python3 -c "
import json

path = 'mcp-server-kubernetes/helm-chart/values.schema.json'

# Load the chart's values schema.
with open(path) as f:
    schema = json.load(f)

# Allow an 'observability' object with arbitrary keys.
schema['properties']['observability'] = {
    'type': 'object',
    'additionalProperties': True,
}

# Write the patched schema back in place.
with open(path, 'w') as f:
    json.dump(schema, f, indent=2)
print('Schema updated successfully')
"
```

#### Install the MCP Server

```bash
helm install mcp-server ./mcp-server-kubernetes/helm-chart \
  --set kubeconfig.provider=serviceaccount \
  --set transport.mode=http \
  --set transport.service.type=ClusterIP \
  --set security.allowOnlyNonDestructive=false \
  --create-namespace \
  --namespace mcp-system
```
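
The release name and namespace chosen here determine the `MCP_SERVER_URL` value used in the agent settings. As a rough sketch, assuming the chart names its Service `<release>-<chart>` (a common Helm convention that matches the URL in this guide but is not verified against the chart) and serves on port `3001` at path `/mcp`, the in-cluster URL could be derived as follows. The helper name `mcp_service_url` is hypothetical.

```python
def mcp_service_url(
    release: str = "mcp-server",
    chart: str = "mcp-server-kubernetes",
    namespace: str = "mcp-system",
    port: int = 3001,
    path: str = "/mcp",
) -> str:
    """Build the in-cluster URL of the MCP Server Service.

    Assumes the Service is named "<release>-<chart>"; this matches the URL
    used in the agent configuration but is an assumption about the chart.
    """
    service = f"{release}-{chart}"
    host = f"{service}.{namespace}.svc.cluster.local"
    return f"http://{host}:{port}{path}"


print(mcp_service_url())
```

If you install under a different release name or namespace, `MCP_SERVER_URL` likely needs the matching change.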

### Step 2: Install PostgreSQL Operator

The agents use PostgreSQL for long-term memory. We use the CrunchyData Operator to manage it.

#### Install OLM (Operator Lifecycle Manager)

```bash
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.40.0/install.sh | bash -s v0.40.0
```

> [!IMPORTANT]
> OLM installation can take a few minutes. Please ensure all OLM pods in the `olm` namespace are in the **Running** state before proceeding.
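
One way to script that check is sketched below. It parses the JSON from `kubectl get pods -n olm -o json` and lists pods that are not yet up. The helper names are hypothetical, and counting the `Succeeded` phase as acceptable (for completed jobs) is an assumption.

```python
import json
import subprocess
from typing import List


def pods_not_running(pod_list: dict) -> List[str]:
    """Return names of pods whose phase is neither Running nor Succeeded."""
    ok_phases = {"Running", "Succeeded"}
    return [
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") not in ok_phases
    ]


def fetch_pods(namespace: str) -> dict:
    """Run `kubectl get pods -n <namespace> -o json` and parse its output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

With `kubectl` pointed at your cluster, `pods_not_running(fetch_pods("olm"))` returns an empty list once every OLM pod is up. The same helper works for any other namespace.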

#### Install the PostgreSQL Operator

```bash
kubectl create -f https://operatorhub.io/install/postgresql.yaml
```

### Step 3: Configure Agent Settings

Edit `helm-chart/values.yaml` to configure your LLM provider and API key.
You must set `MODEL_PROVIDER` and `MODEL_NAME`, and provide the corresponding API key in the `secret` section.

```yaml
agents:
  - name: l0
    enabled: true
    image: 01community/agent-l0:v1

  - name: l1
    enabled: true
    image: 01community/agent-l1:v1
    env:
      MODEL_PROVIDER: deepseek # options: gemini, openai, openrouter, anthropic, deepseek
      MODEL_NAME: deepseek-chat # examples: gemini-2.0-flash, gpt-4o, claude-3-5-sonnet
      MCP_SERVER_URL: http://mcp-server-mcp-server-kubernetes.mcp-system.svc.cluster.local:3001/mcp
      ENABLE_K8S_TOOLS: "true"
      STM_ENABLE_POSTGRES: "true"
    usePostgresql: true
    secret:
      DEEPSEEK_API_KEY: "your-api-key-here"
      # GOOGLE_API_KEY: "your-api-key"
      # OPENAI_API_KEY: "your-api-key"
      # OPENROUTER_API_KEY: "your-api-key"
      # ANTHROPIC_API_KEY: "your-api-key"
```
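
A mismatch between `MODEL_PROVIDER` and the key supplied under `secret` is easy to miss. The sketch below checks one parsed agent entry for that. It assumes `env` and `secret` are nested under the agent entry, and the provider-to-key mapping is inferred from the key names in the example (pairing `gemini` with `GOOGLE_API_KEY` in particular is an assumption). `missing_api_key` is a hypothetical helper, not part of the chart.

```python
from typing import Optional

# Assumed mapping, inferred from the key names in the example values.
PROVIDER_KEYS = {
    "deepseek": "DEEPSEEK_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_api_key(agent: dict) -> Optional[str]:
    """Return the secret key the agent still needs, or None if it is set.

    `agent` is one entry of the `agents` list, already parsed into a dict.
    """
    provider = agent.get("env", {}).get("MODEL_PROVIDER", "")
    key_name = PROVIDER_KEYS.get(provider)
    if key_name is None:
        raise ValueError(f"unknown MODEL_PROVIDER: {provider!r}")
    value = agent.get("secret", {}).get(key_name, "")
    # Treat empty strings and the placeholder text as "not configured".
    if not value or value.startswith("your-api-key"):
        return key_name
    return None


l1 = {
    "name": "l1",
    "env": {"MODEL_PROVIDER": "deepseek", "MODEL_NAME": "deepseek-chat"},
    "secret": {"DEEPSEEK_API_KEY": "your-api-key-here"},
}
print(missing_api_key(l1))
```

Running a check like this before deploying can catch a placeholder key that was never replaced.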

### Step 4: Deploy 01Agents

```bash
helm upgrade --install 01agent ./helm-chart -n 01cloud --create-namespace
```
> [!IMPORTANT]
> Ensure all pods in the `01cloud` namespace have initialized successfully and are running before proceeding.

### Step 5: Access the Application

Port-forward the UI service:

```bash
kubectl port-forward svc/agent-l0 -n 01cloud 3000:3000
```

Open your browser and navigate to `http://localhost:3000`
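
If the page does not load, it can help to confirm that the port-forward is actually listening. The following minimal sketch only tests TCP reachability and says nothing about whether the UI itself is healthy. `port_open` is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


print(port_open("localhost", 3000))
```

A `False` result usually means the `kubectl port-forward` command is not running or the local port differs from `3000`.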

### Step 6: Monitoring & Observability (Optional)

For advanced monitoring with Grafana, Loki, Tempo, and OpenTelemetry, see [OTEL-setup.md](./OTEL-setup.md). These features are **disabled by default** and should only be enabled if you have the observability stack configured.

---

## Approach 2: Building Custom Images

Build and deploy your own modified images from source code. Ideal for developers customizing the system.

### Prerequisites

- **Kubernetes cluster** (Kind, Minikube, or Cloud)
- **Helm 3+**
- **Docker** (for building images)
- **Node.js 25+** & **npm** (for L0 Client modifications)
- **Python 3.11+** (for L1 Agent modifications)
- **LLM API Key** (OpenRouter, DeepSeek, Google, OpenAI, or Anthropic)

### Step 1: Install MCP Server for Kubernetes

The MCP (Model Context Protocol) Server provides Kubernetes tools for the agents.

#### Clone the MCP Server repository

```bash
git clone https://github.com/Flux159/mcp-server-kubernetes.git
```

#### Update the schema (Required)

```bash
python3 -c "
import json

path = 'mcp-server-kubernetes/helm-chart/values.schema.json'

# Load the chart's values schema.
with open(path) as f:
    schema = json.load(f)

# Allow an 'observability' object with arbitrary keys.
schema['properties']['observability'] = {
    'type': 'object',
    'additionalProperties': True,
}

# Write the patched schema back in place.
with open(path, 'w') as f:
    json.dump(schema, f, indent=2)
print('Schema updated successfully')
"
```

#### Install the MCP Server

```bash
helm install mcp-server ./mcp-server-kubernetes/helm-chart \
  --set kubeconfig.provider=serviceaccount \
  --set transport.mode=http \
  --set transport.service.type=ClusterIP \
  --set security.allowOnlyNonDestructive=false \
  --create-namespace \
  --namespace mcp-system
```

### Step 2: Install PostgreSQL Operator

The agents use PostgreSQL for long-term memory. We use the CrunchyData Operator to manage it.

#### Install OLM (Operator Lifecycle Manager)

```bash
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.40.0/install.sh | bash -s v0.40.0
```

> [!IMPORTANT]
> OLM installation can take a few minutes. Please ensure all OLM pods in the `olm` namespace are in the **Running** state before proceeding.

#### Install the PostgreSQL Operator

```bash
kubectl create -f https://operatorhub.io/install/postgresql.yaml
```

### Step 3: Build Custom Images

Navigate to the repository root and build your images:

#### Build L0 Frontend

```bash
docker build -t your-registry/agent-l0:latest ./k8s-agent/level-0-agent
```

#### Build L1 Backend Agent

```bash
docker build -t your-registry/agent-l1:latest ./k8s-agent/level-1-agent
```

### Step 4: Push or Load Images

**For remote clusters** - Push to your registry:

```bash
docker push your-registry/agent-l0:latest
docker push your-registry/agent-l1:latest
```

**For local Kind clusters** - Load images directly:

```bash
kind load docker-image your-registry/agent-l0:latest --name 01cloud-cluster
kind load docker-image your-registry/agent-l1:latest --name 01cloud-cluster
```

### Step 5: Configure Agent Settings

Edit `helm-chart/values.yaml` to use your custom images and configure your LLM provider.
You must set `MODEL_PROVIDER`, `MODEL_NAME`, and `image`, and provide the corresponding API key in the `secret` section.

```yaml
agents:
  - name: l0
    enabled: true
    image: your-registry/agent-l0:latest # Your custom image

  - name: l1
    enabled: true
    image: your-registry/agent-l1:latest # Your custom image
    env:
      MODEL_PROVIDER: deepseek # options: gemini, openai, openrouter, anthropic, deepseek
      MODEL_NAME: deepseek-chat # examples: gemini-2.0-flash, gpt-4o, claude-3-5-sonnet
      MCP_SERVER_URL: http://mcp-server-mcp-server-kubernetes.mcp-system.svc.cluster.local:3001/mcp
      ENABLE_K8S_TOOLS: "true"
      STM_ENABLE_POSTGRES: "true"
    usePostgresql: true
    secret:
      DEEPSEEK_API_KEY: "your-api-key-here"
      # GOOGLE_API_KEY: "your-api-key"
      # OPENAI_API_KEY: "your-api-key"
      # OPENROUTER_API_KEY: "your-api-key"
      # ANTHROPIC_API_KEY: "your-api-key"
```

### Step 6: Deploy 01Agents

```bash
helm upgrade --install 01agent ./helm-chart -n 01cloud --create-namespace
```
> [!IMPORTANT]
> Ensure all pods in the `01cloud` namespace have initialized successfully and are running before proceeding.

### Step 7: Access the Application

Port-forward the UI service:

```bash
kubectl port-forward svc/agent-l0 -n 01cloud 3000:3000
```

Open your browser and navigate to `http://localhost:3000`

### Step 8: Monitoring & Observability (Optional)

For advanced monitoring with Grafana, Loki, Tempo, and OpenTelemetry, see [OTEL-setup.md](./OTEL-setup.md).

---

## 🔌 Local Development Tools

Access these services via port-forwarding for development and debugging:

```bash
# PostgreSQL Database
kubectl port-forward svc/agents-primary -n 01cloud 5432:5432

# MCP Server
kubectl port-forward svc/mcp-server-mcp-server-kubernetes -n mcp-system 3001:3001
```
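
With the MCP Server port-forward active, a hand-built request can help when debugging connectivity. The sketch below constructs (but does not send) a JSON-RPC `initialize` request of the kind the Model Context Protocol uses over HTTP. The protocol version string, the client name, and the assumption that this server accepts the request at `/mcp` in this form are all unverified. `build_initialize_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_initialize_request(
    url: str = "http://localhost:3001/mcp",
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC `initialize` request for an MCP server."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed version; use whichever your server supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "debug-client", "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


req = build_initialize_request()
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen(req, timeout=5)` sends it. Whether the reply arrives as plain JSON or as an event stream depends on the server.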

---

## 📁 Repository Structure

- `helm-chart/`: Core Helm chart for deployment
- `k8s-agent/`: Source code for L0 and L1 agents
- `OTEL-setup.md`: Optional guide for monitoring (Prometheus, Grafana, Loki)

---

## 🤝 Contributing

We welcome contributions from the community! Whether you are reporting a bug, suggesting a feature, or submitting a pull request, your help is appreciated.

- **Found a bug?** [Open an issue](https://github.com/BerryBytes/01agent/issues/new?template=bug_report.md)
- **Have a feature idea?** [Suggest it here](https://github.com/BerryBytes/01agent/issues/new?template=feature_request.md)
- **Want to contribute code?** Check out our [Contributing Guidelines](./CONTRIBUTING.md)

This project is maintained by [Bishal Singh (@bsalsingh)](https://github.com/bsalsingh) and the 01Cloud community.

---

## 📜 Code of Conduct

To ensure a welcoming and inclusive community, please review and follow our [Code of Conduct](./CODE_OF_CONDUCT.md).

---

## ⚖️ License

This project is licensed under the [MIT License](./LICENSE.md).

---

## 📚 Component Documentation

- [L0 Client (Frontend)](k8s-agent/level-0-agent/README.md)
- [L1 Agent (Backend)](k8s-agent/level-1-agent/README.md)