https://github.com/boshu2/12-factor-agentops
DevOps + SRE principles for operating LLM applications reliably at scale. Complementary to 12-Factor Agents, which covers building them.
- Host: GitHub
- URL: https://github.com/boshu2/12-factor-agentops
- Owner: boshu2
- License: other
- Created: 2025-11-04T17:49:10.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-11-04T18:02:08.000Z (2 months ago)
- Last Synced: 2025-11-04T20:10:15.095Z (2 months ago)
- Topics: 12-factor, agent-orchestration, agentops, agents, ai-agents, ai-agents-framework, ai-operations, argocd, context-engineering, devops, flux, gitops, infrastructure-as-code, kubernetes, kyverno, llm, openshift, platform-engineering, production-operations, sre
- Language: Makefile
- Homepage:
- Size: 143 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# 12-Factor AgentOps
---
> [!IMPORTANT]
> **Status: Alpha** - Patterns proven at production scale in federal infrastructure. Now validating generalization across domains.
>
> **Looking for Context Engineering?** See [12-Factor Agents - Factor 3](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-03-own-your-context-window.md) by [@dexhorthy](https://github.com/dexhorthy)
---
## The Intersection
**I build GPU/HPC platforms that enable AI workloads.**
**I use AI agents to automate infrastructure operations.**
**I operate both at production scale in federal, security-hardened environments.**
This framework documents operational patterns from both sides of the AI equation.
---
## The Problem
Everyone's building AI agents. Nobody's figured out how to operate them reliably.
- **Week 1:** "This is amazing!"
- **Week 4:** Errors piling up
- **Week 8:** Back to manual work
Sound familiar? **It's 2015 microservices chaos all over again.**
We know how to build reliable infrastructure. We know how to build reliable software.
**But operating AI agents in production? We're still figuring that out.**
---
## What This Is
I'm a platform engineer with 10+ years of climbing the IT stack: systems, networking, storage, security, platforms, and automation.
**Current work:**
- Building GPU/HPC infrastructure for AI inference/training workloads (20+ production clusters)
- Using AI agents to automate platform operations (GitOps validation, runbooks, policy)
- Operating in mission-critical, multi-tenant, federal environments
**12-Factor AgentOps = Meta-patterns extracted from real production work at this intersection.**
---
## Why This Perspective Matters
Most people have **ONE** of these:
- Infrastructure ops (no AI exposure)
- AI/ML engineering (no infrastructure ops)
- AI agent users (no production operations)
**This framework comes from having ALL THREE:**
1. Building platforms **FOR** AI workloads
2. Using AI **TO BUILD** platforms
3. Operating both at **PRODUCTION SCALE** in **HIGH-STAKES** environments
---
## The Approach
```
Production Operations → Extract Patterns → Document   → Validate  → Refine
         ↓                      ↓              ↓            ↓          ↓
   (What works?)         (Why it works?)   (Share it)   (Test it)  (Improve it)
```
1. **Document patterns** proven at production scale
2. **Extract meta-patterns** that generalize across contexts
3. **Share early**, validate with community
4. **Refine** based on diverse implementations
**Not theory. Production.**
---
## The Invitation
If you're at a similar intersection:
- Operating AI/ML infrastructure at scale
- Using AI agents for DevOps/SRE work
- Building platforms in constrained environments
**Try these patterns. Share what works in your context. Help prove whether operational discipline transfers.**
---
## Framework: The Factors
The 12 factors are being published as each one is validated for generalization. The first six are listed below with their current status.
### Coming Soon
| Factor | Focus | Status |
|--------|-------|--------|
| **I: Git as Knowledge OS** | Commits = memory, branches = isolation, logs = audit trail | Documenting |
| **II: Context Engineering** | JIT loading, 40% rule, progressive disclosure | Documenting |
| **III: Small Specialized Agents** | Single responsibility, composable workflows | Documenting |
| **IV: Validation Gates** | Test before deploy, fail fast, rollback easy | Planned |
| **V: Observability** | Metrics, logs, traces for agent operations | Planned |
| **VI: Session Continuity** | Pause/resume, state preservation, recovery | Planned |
> [!TIP]
> **Subscribe to releases** to get notified when factors are published
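
Until Factor IV is published, here is a minimal sketch of what "test before deploy, fail fast, rollback easy" could look like in code. It is an illustration under stated assumptions, not this framework's implementation: the `Gate` shape, the `run_gates` and `deploy` helpers, the two example gates, and the in-memory `state` dict are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A gate is a named predicate over a proposed change (hypothetical shape).
Gate = Tuple[str, Callable[[Dict], bool]]


@dataclass
class GateResult:
    passed: bool
    failed_gate: Optional[str] = None
    ran: List[str] = field(default_factory=list)


def run_gates(change: Dict, gates: List[Gate]) -> GateResult:
    """Run gates in order and stop at the first failure (fail fast)."""
    ran: List[str] = []
    for name, check in gates:
        ran.append(name)
        if not check(change):
            return GateResult(passed=False, failed_gate=name, ran=ran)
    return GateResult(passed=True, ran=ran)


def deploy(change: Dict, gates: List[Gate], state: Dict) -> GateResult:
    """Apply a change only if every gate passes; restore state if applying fails."""
    result = run_gates(change, gates)
    if not result.passed:
        return result  # nothing was applied, so there is nothing to roll back
    snapshot = dict(state)  # cheap rollback point for this toy in-memory state
    try:
        state.update(change["values"])
    except Exception:
        state.clear()
        state.update(snapshot)
        raise
    return result


# Hypothetical gates for an agent-proposed config change.
gates: List[Gate] = [
    ("has_owner", lambda c: bool(c.get("owner"))),
    ("replicas_in_range", lambda c: 1 <= c["values"].get("replicas", 1) <= 10),
]

state = {"replicas": 2}
bad = deploy({"owner": "", "values": {"replicas": 50}}, gates, state)
good = deploy({"owner": "platform-team", "values": {"replicas": 3}}, gates, state)
print(bad.failed_gate, bad.ran, good.passed, state)
```

The rejected change stops at the first failing gate and never touches `state`; the accepted one is applied only after every gate has passed. In a real GitOps setup the gates would more likely be CI checks or admission policies, and rollback would be a Git revert rather than a dict snapshot.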
---
## Background
**Platform Engineer**
- 10+ years: Systems → Networks → Security → Platforms → Automation → AI
- 20+ production Kubernetes clusters in federal/DoD environments
- GPU/HPC infrastructure for AI inference/training
- AI-assisted infrastructure operations (GitOps, observability, compliance)
**Unfair advantage:** Deep ops + automation + AI fluency + cultural translation
---
## Contributing
Early-stage documentation of production patterns.
**Want to help?**
- ✅ Implement patterns in your context
- ✅ Share results (successes AND failures)
- ✅ Suggest adaptations for your domain
- ✅ Challenge assumptions constructively
See [CLAUDE.md](CLAUDE.md) for AgentOps principles and contribution guidelines.
---
## Attribution & Inspiration
This framework builds on foundational work from:
### [12-Factor Apps](https://12factor.net) (Heroku)
The original methodology for building software-as-a-service apps. Established principles for:
- Configuration management
- Dependency isolation
- Stateless processes
- Environment parity
**Their insight:** Operational discipline makes applications reliable and portable.
### [12-Factor Agents](https://github.com/humanlayer/12-factor-agents) (Dex Horthy, HumanLayer)
Framework for building reliable LLM applications. Pioneered:
- Context engineering principles
- Human-in-the-loop patterns
- Agent reliability practices
- Production-grade AI systems
**Their insight:** AI agents need the same rigor as traditional software.
### This Project's Focus
**12-Factor AgentOps** extends these foundations to **operations**:
- Not just building reliable agents (12-Factor Agents covers this)
- Not just building reliable apps (12-Factor Apps covers this)
- **Operating AI agents and infrastructure at production scale**
We document patterns from the intersection: infrastructure FOR AI + AI FOR infrastructure.
---
## Related Work
**If you're building AI agents, read these first:**
- [12-Factor Agents](https://github.com/humanlayer/12-factor-agents) by [@dexhorthy](https://github.com/dexhorthy) - Building reliable LLM applications
- [Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) by Anthropic - Agent design patterns
- [The Outer Loop](https://theouterloop.substack.com) by Dex Horthy - AI agent development insights
**If you're operating infrastructure, you know these:**
- [12-Factor Apps](https://12factor.net) - SaaS application methodology
- [Site Reliability Engineering](https://sre.google/books/) - Google's SRE practices
- [DevOps Handbook](https://itrevolution.com/product/the-devops-handbook-second-edition/) - DevOps principles
**This framework sits at the intersection.**
---
## License
Code: [Apache 2.0 License](LICENSE) (permissive, use freely)
Documentation: [CC BY-SA 4.0 License](LICENSE) (share alike, attribute)
Full license text: [LICENSE](LICENSE)
---
**Let's make AI agents as reliable as the infrastructure they run on.**
*Patterns proven at production scale in federal infrastructure. Validating generalization across domains.*