https://github.com/ebwi11/agentsmith-hub
Enterprise Security Data Pipeline Platform (SDPP) with Integrated Real-Time Threat Detection Engine
cybersecurity detection-engine rules-engine sdpp security-data-pipeline-platform
- Host: GitHub
- URL: https://github.com/ebwi11/agentsmith-hub
- Owner: EBWi11
- License: apache-2.0
- Created: 2025-05-14T14:25:32.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-21T08:21:35.000Z (8 months ago)
- Last Synced: 2025-08-21T10:31:12.854Z (8 months ago)
- Topics: cybersecurity, detection-engine, rules-engine, sdpp, security-data-pipeline-platform
- Language: Go
- Homepage:
- Size: 76.4 MB
- Stars: 49
- Watchers: 2
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AgentSmith-HUB
[Releases](https://github.com/EBWi11/AgentSmith-HUB/releases)
[License](./LICENSE)
**A high-performance security data pipeline with a real-time rules engine and deeply integrated LLM agents — built for modern SOC and detection engineering teams.**
Process, enrich, detect, and respond at scale — with simple XML-based rules, CEP, rich plugins, and AI-powered analysis wired directly into the stream.

---
## Why AgentSmith-HUB?
If you work in security operations, you probably deal with massive volumes of raw logs and alerts every day. You need to normalize, enrich, correlate, and route them — and ideally detect threats in real time, not in batch jobs. AgentSmith-HUB is built to handle all of this in a single, opinionated platform:
- **High-signal detections, not dashboards** — Design real-time detections and data transformations with simple, readable XML rules instead of ad‑hoc scripts
- **Blazing fast at scale** — 3.90M messages/sec on just 2 vCPUs ([benchmark](docs/performance-testing-report.md)); built to sit directly in front of your SIEM / lake
- **All-in-one pipeline** — Input, normalization, enrichment, correlation, and output in one flow; no more glue scripts between Kafka, ES, ClickHouse, and “rule engines”
- **First-class CEP**: Detect ordered event sequences, absence patterns, and multi-source correlations over time using dedicated rule elements (listed in the [Complete Guide](docs/agentsmith-hub-guide.md))
- **LLM agents in the stream** — Drop LLM-powered agents into the same pipeline for alert triage, enrichment, rule authoring, and auto-whitelisting
- **Comment-to-memory learning loop** — Convert reviewer comments from Agent Tools Logs into durable `memory_notes`, auto-commit updates, and continuously improve agent behavior
- **Skills system** — Attach knowledge bases and operational tools to agents via Skills, with progressive disclosure so prompts stay small and fast
- **Rich plugin ecosystem** — Threat intel (VirusTotal, ThreatBook, Shodan), GeoIP, encoding, regex, time/window helpers, LLM calls, and more
- **Production features out of the box** — Cluster mode, health checks, daily stats, sample data, Push Changes / review workflow, and a modern Web UI for rule and project orchestration
### Who is this for?
- **SOC / CERT / CSIRT teams** that want an opinionated place to run detections, triage alerts, and reduce false positives without building their own engine from scratch.
- **Detection engineers / threat hunters** who care about CEP, thresholds, and precise control over when an alert fires (and when it must not).
- **Security platform / data teams** who already own Kafka / ES / ClickHouse and want a thin, fast, open platform to orchestrate security data flows and LLM-powered analysis.
## How It Works
AgentSmith-HUB uses a straightforward pipeline model:
```
INPUT (Kafka / SLS / ...) → RULESET / AGENT → RULESET / AGENT → OUTPUT (Kafka / ES / ClickHouse / SLS / ...)
```
Rulesets and agents can be freely chained within a **Project**, giving you full control over data flow and allowing you to mix “hard” rules with “soft” LLM judgement in the same stream.
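
To picture the chaining model, here is a minimal Go sketch of stages composed in order, where any stage may drop an event before it reaches the output. It is not AgentSmith-HUB's internal code: `Event`, `Stage`, `Chain`, `onlySSH`, and `tagSeverity` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a loosely typed log record, as it might look after JSON parsing.
type Event map[string]any

// Stage is one node in the chain (a ruleset or an agent in the README's terms).
// It returns the possibly modified event and whether to forward it downstream.
type Stage func(Event) (Event, bool)

// Chain runs stages in order and stops at the first one that drops the event.
func Chain(stages ...Stage) Stage {
	return func(e Event) (Event, bool) {
		for _, s := range stages {
			var ok bool
			if e, ok = s(e); !ok {
				return nil, false
			}
		}
		return e, true
	}
}

// onlySSH is a hypothetical "hard" rule: keep only events about sshd.
func onlySSH(e Event) (Event, bool) {
	proc, _ := e["process"].(string)
	return e, strings.Contains(proc, "sshd")
}

// tagSeverity is a hypothetical enrichment step standing in for an agent.
func tagSeverity(e Event) (Event, bool) {
	e["severity"] = "medium"
	return e, true
}

func main() {
	pipeline := Chain(onlySSH, tagSeverity)
	input := []Event{
		{"process": "sshd", "src_ip": "203.0.113.7"},
		{"process": "cron"},
	}
	for _, e := range input {
		if out, ok := pipeline(e); ok {
			fmt.Println(out) // stands in for OUTPUT; only the sshd event is printed
		}
	}
}
```

Because a stage is just a function, the same shape covers both deterministic rules and slower LLM-backed steps; only the cost per event differs.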

### Core Components at a Glance
- **INPUT**: Connects to streaming sources like **Kafka**, Aliyun **SLS**, and cloud-managed Kafka variants; supports Grok parsing and JSON so you normalize once and reuse everywhere.
- **RULESET**: XML-based real-time rules engine with checks, checklists, thresholds (count / SUM / CLASSIFY), CEP sequences, iterators, and data append/modify/del — all executed strictly in the order you write them.
- **AGENT**: LLM-powered node that runs in the same pipeline as rulesets; for each event it can call an LLM (with tools and skills) to score, enrich, or auto-generate rules/whitelists, then forward the enriched event downstream.
- **OUTPUT**: Sends processed data to **Kafka**, **Elasticsearch** (v7/v8/v9), **ClickHouse**, or simple print, with batching, time-based flush, TLS/auth, and idempotent Kafka producers for safe delivery.
- **SKILL**: Reusable capability module for agents. Knowledge skills provide on‑demand reference content, while builtin skills expose Go-implemented tools like `hub_ruleset_editor` for ruleset CRUD.
- **PLUGIN**: Extensible function system powering checks, enrichment, and actions: GeoIP, URL parsing, encoding, time window helpers, threat intelligence lookups, single-shot LLM calls, and more — all composable directly in rules.
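
The OUTPUT bullet mentions batching with time-based flush. As a rough illustration of that flush decision (not the project's actual implementation), the Go sketch below hands back a batch when it is full or when the oldest buffered item has waited long enough. `Batcher`, its fields, and the use of Unix-second timestamps are assumptions made for the example.

```go
package main

import "fmt"

// Batcher buffers items and hands back a batch when it is full or when the
// oldest buffered item has waited at least maxAge seconds.
type Batcher struct {
	maxSize int
	maxAge  int64 // seconds
	buf     []string
	first   int64 // arrival time (Unix seconds) of the oldest buffered item
}

// Add appends item at time now and returns a batch to send, or nil.
func (b *Batcher) Add(item string, now int64) []string {
	if len(b.buf) == 0 {
		b.first = now
	}
	b.buf = append(b.buf, item)
	if len(b.buf) >= b.maxSize || now-b.first >= b.maxAge {
		return b.Flush()
	}
	return nil
}

// Flush returns whatever is buffered and resets the buffer. A real output
// would also call this from a timer so that a quiet stream still drains.
func (b *Batcher) Flush() []string {
	out := b.buf
	b.buf = nil
	return out
}

func main() {
	b := &Batcher{maxSize: 3, maxAge: 5}
	fmt.Println(b.Add("e1", 0))  // buffered, nothing to send
	fmt.Println(b.Add("e2", 1))  // buffered
	fmt.Println(b.Add("e3", 2))  // size limit reached: [e1 e2 e3]
	fmt.Println(b.Add("e4", 10)) // buffered
	fmt.Println(b.Add("e5", 16)) // age limit reached: [e4 e5]
}
```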
### Web UI & API Highlights
- **Visual rule and project editing**: Rich browser UI for editing rulesets with syntax help, validation, and GIF-level feedback; drag-style project orchestration to define `INPUT → RULESET / AGENT → OUTPUT` flows.
- **One-click testing everywhere**: Built-in test runners for **Output**, **Ruleset**, **Plugin**, **Agent**, and **Project** components (including sample data capture), so you can validate changes before they hit real outputs.
- **Operations, errors, and cluster view**: Dedicated views for error logs, operations history (project start/stop/restart, config changes, agent tool calls), and basic cluster status so you can see what is running where.
- **Safe change management**: All edits go through temporary configs, diff & review, and then **Push Changes** to apply — the platform automatically figures out affected projects and restarts them safely.
- **HTTP API for automation**: JSON APIs mirror the UI capabilities (component CRUD, project lifecycle, testing), so you can integrate AgentSmith-HUB into CI/CD, internal portals, or automation scripts.
### Rules Engine in 60 Seconds
At the heart of AgentSmith-HUB is a streaming rules engine designed for security detections:
- **Checks & checklists**: Match on strings, numbers, regex, and plugins; combine conditions with AND/OR/NOT using logical expressions.
- **Thresholds & windows**: Detect frequency, sums, or distinct counts over sliding time windows (e.g. brute-force, spray, exfil).
- **CEP sequences**: Express ordered multi-event patterns and absence (e.g. `login -> !mfa`, `recon -> exploit -> exfil`) directly in XML rules.
- **Data shaping**: Enrich, modify, or delete fields in place, and call plugins to pull in external context or compute derived fields.
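
To make the "Thresholds & windows" bullet concrete, the following Go sketch counts events per key inside a sliding window and reports when a limit is reached. It is an illustration of the idea only, not the engine's code; `WindowCounter`, `Observe`, and the brute-force numbers are hypothetical.

```go
package main

import "fmt"

// WindowCounter counts events per key inside a sliding time window.
// Timestamps are Unix seconds taken from the events themselves.
type WindowCounter struct {
	window int64              // window length in seconds
	limit  int                // fire when the count inside the window reaches this
	seen   map[string][]int64 // per-key timestamps still inside the window
}

func NewWindowCounter(windowSec int64, limit int) *WindowCounter {
	return &WindowCounter{window: windowSec, limit: limit, seen: make(map[string][]int64)}
}

// Observe records one event for key at time ts and reports whether the
// threshold is met. It assumes non-decreasing ts values per key.
func (w *WindowCounter) Observe(key string, ts int64) bool {
	times := w.seen[key]
	// Drop timestamps that have slid out of the window.
	cut := 0
	for cut < len(times) && times[cut] <= ts-w.window {
		cut++
	}
	times = append(times[cut:], ts)
	w.seen[key] = times
	return len(times) >= w.limit
}

func main() {
	// Hypothetical brute-force rule: 3 failed logins from one IP within 60s.
	wc := NewWindowCounter(60, 3)
	for _, ts := range []int64{0, 10, 20, 200} {
		fmt.Println(ts, wc.Observe("203.0.113.7", ts))
	}
	// Prints false, false, true, false: the event at 200s starts a new window.
}
```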
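A minimal example that enriches with threat intel and then detects on the enriched field (the element, attribute, and field names shown are illustrative assumptions):
```xml
<!-- Illustrative only: check names against the Complete Guide before use. -->
<rule id="ti_enrich_detect">
    <append type="PLUGIN" field="threat_level">threatbook(src_ip)</append>
    <check type="EQU" field="threat_level">high</check>
    <append field="severity">critical</append>
</rule>
```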
For the full syntax (all operations, modes, and best practices), see the [Complete Guide](docs/agentsmith-hub-guide.md).
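
The absence pattern `login -> !mfa` mentioned above can be pictured with a small Go sketch: remember each login, clear it when the matching MFA event arrives, and raise an alert once the window passes without one. This is a simplified model under stated assumptions, not the engine's CEP implementation; `AbsenceDetector`, the event kinds, and the `tick` event are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// AbsenceDetector flags a "login" that is not followed by an "mfa" event for
// the same user within window seconds (the login -> !mfa pattern).
type AbsenceDetector struct {
	window  int64
	pending map[string]int64 // user -> timestamp of the unanswered login
}

func NewAbsenceDetector(windowSec int64) *AbsenceDetector {
	return &AbsenceDetector{window: windowSec, pending: make(map[string]int64)}
}

// Observe feeds one event and returns the users whose window expired without
// MFA. Event time drives expiry, so a quiet stream needs a periodic tick.
func (d *AbsenceDetector) Observe(kind, user string, ts int64) []string {
	var alerts []string
	for u, start := range d.pending {
		if ts-start > d.window {
			alerts = append(alerts, u)
			delete(d.pending, u) // deleting during range is safe in Go
		}
	}
	sort.Strings(alerts) // map iteration order is random; sort for stable output
	switch kind {
	case "login":
		d.pending[user] = ts // a repeated login restarts the window
	case "mfa":
		delete(d.pending, user)
	}
	return alerts
}

func main() {
	d := NewAbsenceDetector(30)
	fmt.Println(d.Observe("login", "alice", 0)) // []
	fmt.Println(d.Observe("mfa", "alice", 5))   // [] (alice completed MFA)
	fmt.Println(d.Observe("login", "bob", 10))  // []
	fmt.Println(d.Observe("tick", "", 50))      // [bob] (no MFA within 30s)
}
```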
### LLM Agents & Skills
Agents are LLM-powered components that sit in the pipeline alongside rulesets. They process events independently, call an LLM with tool-use support, and forward enriched results downstream.
```yaml
# Agent: AI-powered alert triage
model: gpt-4o-mini
system_prompt: |
  For each alert, add llm_confidence (0-1) and llm_analysis fields.
skills:
  - hub_ruleset_expert   # Knowledge skill: rules engine reference
tools: all               # Expose all plugins as LLM tools
max_rounds: 3
timeout: 30s
# Optional long-term memory (recommended as YAML sequence)
memory_notes:
  - Keep output JSON compact and stable.
  - Treat routine CI scanner traffic as lower priority unless other signals exist.
```
**Skills** provide modular capabilities to agents:
- **Knowledge skills** — Reference docs loaded on-demand (progressive disclosure)
- **Builtin skills** — Go-implemented tools (e.g., `hub_ruleset_editor` for reading/writing rulesets)
Quick production tips:
- Prefer `tools: []` by default and allowlist only needed plugin tools.
- Use `tools: all` only for broad assistant agents (rule-authoring / deep triage).
- In cluster mode, memory write/generate actions must go to the **leader** node.
Use agents in your project like any other component:
```yaml
content: |
  INPUT.kafka_alerts -> AGENT.alert_reviewer
  AGENT.alert_reviewer -> OUTPUT.enriched_alerts
```
For full agent details (fields like `reasoning_mode`, `reasoning_budget_tokens`, `memory_notes`, and memory workflow in UI/API), see the [Complete Guide](docs/agentsmith-hub-guide.md#14-agent-syntax-description).
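
Downstream of an agent like the one above, a consumer may want to act on the `llm_confidence` and `llm_analysis` fields named in the example prompt. The Go sketch below shows one cautious way to do that, treating a missing or out-of-range score as "needs review" rather than trusting it. The `Triage` function, the routing labels, and the 0.8 cut-off are assumptions for illustration, not part of AgentSmith-HUB.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EnrichedAlert picks out the two fields the example agent prompt asks for.
// A pointer distinguishes "score missing" from a genuine score of 0.
type EnrichedAlert struct {
	LLMConfidence *float64 `json:"llm_confidence"`
	LLMAnalysis   string   `json:"llm_analysis"`
}

// Triage returns a routing decision for one enriched alert.
// The 0.8 cut-off is arbitrary and should be tuned per environment.
func Triage(raw []byte) (string, error) {
	var a EnrichedAlert
	if err := json.Unmarshal(raw, &a); err != nil {
		return "", fmt.Errorf("decode alert: %w", err)
	}
	switch {
	case a.LLMConfidence == nil:
		return "needs_review", nil // the agent did not score this event
	case *a.LLMConfidence < 0 || *a.LLMConfidence > 1:
		return "needs_review", nil // score outside the requested 0-1 range
	case *a.LLMConfidence >= 0.8:
		return "escalate", nil
	default:
		return "queue", nil
	}
}

func main() {
	raw := []byte(`{"rule":"ssh_bruteforce","llm_confidence":0.93,"llm_analysis":"Likely credential stuffing"}`)
	route, err := Triage(raw)
	fmt.Println(route, err) // escalate <nil>
}
```

Validating the score instead of assuming it is present keeps a malformed LLM response from silently suppressing or escalating an alert.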
## Built-in Detection Rulesets
AgentSmith-HUB ships with production-ready detection rulesets that you can deploy immediately — no rule-writing required. All rules are mapped to [MITRE ATT&CK](https://attack.mitre.org/) for seamless integration with your security workflows.
### Built-in K8s Ruleset Files
AgentSmith-HUB includes Kubernetes security rulesets out of the box. You can use them directly without writing custom XML first:
- `config/ruleset/k8s_security/k8s_audit_baseline.xml`
- `config/ruleset/k8s_security/k8s_audit_intrusion.xml`
Recommended onboarding flow:
1. Import both built-in rulesets.
2. Route Kubernetes audit logs to these rulesets in your Project.
3. Verify detections in test mode with real sample events.
4. Tune thresholds (if needed) for your cluster's normal behavior.
### Sysmon Endpoint Security (Windows)
Two Sysmon detection rulesets and a strict allowlist template are provided for medium/high-confidence endpoint detection use cases:
- `config/ruleset/sysmon_security/sysmon_baseline.xml`
- `config/ruleset/sysmon_security/sysmon_intrusion.xml`
- `config/ruleset/sysmon_security/sysmon_exclude.xml` (strict allowlist template)
Recommended onboarding flow for Sysmon:
1. Ensure your input normalizes core Sysmon fields used by rulesets.
2. Import `sysmon_baseline.xml` first and validate behavior in test mode.
3. Import `sysmon_intrusion.xml` and tune based on your endpoint baseline.
4. Add environment-specific allowlists with a separate EXCLUDE ruleset if needed.
More built-in rulesets for additional data sources are on the roadmap. Contributions are welcome!
## Features at a Glance
- **Rule Editing**
- **Rule Testing**
- **Project Orchestration**
- **Plugin Testing**
- **Input Connection Check**
- **Search**
- **Error Logs & Operations History**
- **Comment-to-memory learning loop**
## Deployment
1. Download and extract the release archive to `/opt/agentsmith-hub`
2. Copy the config folder: `cp -r /opt/agentsmith-hub/config /opt/hub_config`
3. Configure Redis in `/opt/hub_config/config.yaml`
4. Start the service:
```bash
# Leader mode (default)
./start.sh
# Follower mode (uses the same Redis as leader)
./start.sh --follower
# See all options
./start.sh --help
```
5. The access token is generated at `/etc/hub/.token` on first run
6. Install and configure Nginx:
```bash
sudo cp /opt/agentsmith-hub/nginx/nginx.conf /etc/nginx/
sudo nginx -s reload
```
7. Open `http://your-host` in your browser (port 80)
## Documentation
- [Complete Guide](docs/agentsmith-hub-guide.md) | [Guide (Chinese)](docs/agentsmith-hub-guide-zh.md)
- [Performance Testing Report](docs/performance-testing-report.md)
## License
AgentSmith-HUB is licensed under the [Apache License 2.0](./LICENSE) with the Commons Clause restriction.
You are free to use, modify, and deploy this software — the restriction only prevents selling the software itself as a commercial product or service. Internal enterprise use is fully permitted.