https://github.com/nirdiamant/moltbook-agent-guard
Real-time security for AI agents on Moltbook
https://github.com/nirdiamant/moltbook-agent-guard
ai-agents llm moltbook prompt-injection security
Last synced: 5 months ago
JSON representation
Real-time security for AI agents on Moltbook
- Host: GitHub
- URL: https://github.com/nirdiamant/moltbook-agent-guard
- Owner: NirDiamant
- License: apache-2.0
- Created: 2026-02-04T15:02:13.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-02-04T15:05:48.000Z (5 months ago)
- Last Synced: 2026-02-10T00:37:24.101Z (5 months ago)
- Topics: ai-agents, llm, moltbook, prompt-injection, security
- Language: Python
- Size: 10.7 MB
- Stars: 42
- Watchers: 0
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
README
Moltbook Agent Guard
Real-time security for AI agents on Moltbook
---
## 📫 Stay Updated
Join 50,000+ AI enthusiasts for cutting-edge insights and tutorials
[](https://diamantai.substack.com/?r=336pe4&utm_campaign=pub-share-checklist)
---
## Why This Toolkit?
[Moltbook](https://moltbook.com) is the world's largest social network for AI agents (770K+ agents). Research shows **2.6% of posts contain prompt injection attacks** targeting vulnerable agents.
This toolkit protects your agent from hijacking, credential theft, and manipulation.
Real threats detected and blocked on Moltbook
### Attacks Blocked
| Attack Type | Risk |
|-------------|------|
| Jailbreak attempts | 🔴 High |
| Credential extraction | 🔴 High |
| Data exfiltration | 🔴 High |
| System prompt extraction | 🔴 High |
| Role hijacking | 🟡 Medium |
| Encoded payloads | 🟡 Medium |
### How It Works
When your agent runs, the security scanner protects it **in real-time**:
```python
# Inside the agent runtime (tools/agent/runtime.py)
def _process_post(self, post):
is_safe, scan_result = self._scan_content(post.content) # Every post is scanned
if not is_safe:
return None # Malicious content never reaches your LLM
# ... process safe content
```
**Without this toolkit:** Your agent processes malicious posts and risks leaking API keys or getting hijacked.
**With this toolkit:** Threats are detected and blocked before they ever reach your LLM.
---
## Quick Start
```bash
git clone https://github.com/NirDiamant/moltbook-agent-toolkit.git
cd moltbook-agent-toolkit
pip install -r requirements.txt
# Setup (interactive wizard)
export MOLTBOOK_API_KEY="your_key"
export ANTHROPIC_API_KEY="your_key"
./moltbook init
# Deploy
./moltbook deploy --direct
```
---
## Security Dashboard
Deploy your own dashboard to track threats in real-time.
**Streamlit Cloud (Free, 2 min):**
1. Fork this repo
2. Go to [share.streamlit.io](https://share.streamlit.io)
3. Set app path: `dashboard/streamlit_app.py`
4. Add secret: `MOLTBOOK_API_KEY = "your_key"`
5. Deploy
**Local:**
```bash
MOLTBOOK_API_KEY="your_key" streamlit run dashboard/streamlit_app.py
```
---
## CLI Commands
```bash
./moltbook init # Setup wizard
./moltbook deploy # Deploy agent
./moltbook security # View security incidents
./moltbook security --scan # Scan for threats
./moltbook security --html report.html # Export report
```
---
## Security Modules
24 modules across 6 protection layers:
- **Critical**: Output scanner, error sanitizer, log redactor
- **AI Firewall**: Llama Guard + LLM Guard + pattern matching
- **Platform**: Memory sanitizer, egress firewall, credential monitor
- **Social**: Social engineering detection, reputation protection
- **Data**: Exfiltration prevention, financial safety
- **Infrastructure**: Docker isolation (cap_drop ALL, read-only fs)
```python
from tools.security import SecurityManager
security = SecurityManager(level="standard")
result = security.scan_input(user_content)
if result.blocked:
print(f"Blocked: {result.reason}")
```
---
## Related Projects
- **[Agents Towards Production](https://github.com/NirDiamant/agents-towards-production)** — Production-grade GenAI agent tutorials
- **[GenAI Agents](https://github.com/NirDiamant/GenAI_Agents)** — AI agent implementations from simple to complex
- **[RAG Techniques](https://github.com/NirDiamant/RAG_Techniques)** — Comprehensive RAG guide
- **[Prompt Engineering](https://github.com/NirDiamant/Prompt_Engineering)** — Prompting strategies collection
---
## License
Apache 2.0 — see [LICENSE](LICENSE)
---
## Disclaimer
This toolkit is built in good faith with a genuine desire to help developers secure their AI agents. However, security is an ongoing battle — every lock has someone trying to pick it.
**We cannot guarantee this will stop all attacks.** Attackers evolve, new techniques emerge, and no security solution is bulletproof. This toolkit raises the bar significantly, but determined adversaries may still find ways through.
Use this as one layer in your security strategy, not your only defense. Stay vigilant, keep your dependencies updated, and monitor your agents in production.
By using this software, you accept that the authors are not liable for any security incidents, damages, or losses that may occur.