https://github.com/sunitj/biocurator
Memory-augmented multi-agent system for scientific literature curation and analysis.
cagent curation docker literature-review multi-agent-systems
- Host: GitHub
- URL: https://github.com/sunitj/biocurator
- Owner: sunitj
- Created: 2025-09-20T17:48:50.000Z (17 days ago)
- Default Branch: main
- Last Pushed: 2025-09-22T19:55:36.000Z (15 days ago)
- Last Synced: 2025-09-22T21:21:30.386Z (15 days ago)
- Topics: cagent, curation, docker, literature-review, multi-agent-systems
- Language: Python
- Homepage:
- Size: 90.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
README
# BioCurator
Memory-augmented multi-agent system for scientific literature curation and analysis.
## Overview
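BioCurator demonstrates how AI agents can develop domain expertise through collaborative literature analysis. It pairs a multi-modal memory architecture (knowledge graph, vector store, episodic memory, and working memory) with a safety-first development approach built on circuit breakers, rate limiting, and cost tracking.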
## Quick Start
### Development Mode (Local Models - Zero Cost)
```bash
# Set up development environment with UV
export UV_LINK_MODE=copy
./scripts/setup_venv.sh
source .venv/bin/activate
export APP_MODE=development

# Configure environment (optional - uses defaults if not set)
cp .env.example .env
# Edit .env to set JUPYTER_TOKEN and other configurations

# Run with local models (Ollama) - services start in dependency order
docker-compose -f docker-compose.yml -f docker-compose.development.yml up -d

# Wait for all services to be healthy (takes ~30-60 seconds)
docker-compose ps  # Check status

# Access services (replace localhost with your server IP if remote):
# - BioCurator API: http://localhost:8080/
# - Health Status: http://localhost:8080/health/
# - Neo4j Browser: http://localhost:7474/ (user: neo4j, password: dev_password)
# - Jupyter Lab: http://localhost:8888/ (token: biocurator-dev or JUPYTER_TOKEN)
# - Ollama API: http://localhost:11434/

# Verify system health
curl -s http://localhost:8080/health/ | python -m json.tool
```

### Production Mode (Cloud Models)
```bash
# Set up production environment with UV
./scripts/setup_venv.sh
source .venv/bin/activate
export APP_MODE=production

# Run with cloud models
docker-compose -f docker-compose.yml -f docker-compose.production.yml up
```

## Architecture
```text
┌─────────────────────────────────────────────────────────┐
│                     Agent Orchestra                     │
├─────────────────────────────────────────────────────────┤
│  Research   Literature Deep       Domain     Knowledge  │
│  Director   Scout      Reader     Specialist Weaver     │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                     Safety Controls                     │
│    Circuit Breakers │ Rate Limiting │ Cost Tracking     │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                     Memory Systems                      │
│      Neo4j │ Qdrant │ PostgreSQL │ Redis │ SQLite       │
└─────────────────────────────────────────────────────────┘
```

## Key Features
- **Multi-Agent Coordination**: Specialized agents for literature discovery, analysis, and synthesis
  - Research Director for workflow orchestration
  - Literature Scout, Deep Reader, Domain Specialist, Knowledge Weaver (future PRs)
  - Async message passing with request/response patterns
  - Persistent task queue with dependency management and retry logic
- **Multi-Modal Memory**: Knowledge graph, vector embeddings, episodic memory, and procedural patterns
  - Neo4j knowledge graph with concept relationships
  - Qdrant vector store for semantic search
  - PostgreSQL episodic memory for interaction histories
  - Redis working memory for active contexts
  - InfluxDB time-series metrics (optional)
- **Safety-First Design**: Circuit breakers, rate limiting, cost tracking, and anomaly detection
  - Per-agent circuit breakers with configurable thresholds
  - Rate limiting with token bucket algorithm
  - Real-time cost tracking and budget enforcement
  - Behavior monitoring with anomaly detection
  - Comprehensive safety event logging
- **Development Mode**: Free local model operation with Ollama (DeepSeek-R1, Llama 3.1, Qwen 2.5)
  - Zero cost budget enforcement
  - Hard guard against cloud model access
  - Local model optimization with quality bridging
- **Production Ready**: Cloud model integration with comprehensive monitoring and observability
  - Claude Sonnet 4 and GPT-4o model support
  - Prometheus metrics integration
  - Health monitoring with agent status reporting
  - Auto-scaling and load balancing capabilities

## Development
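
For contributors new to the safety controls: the Key Features list names a token bucket algorithm for rate limiting. The sketch below illustrates that general technique only. It is not taken from the BioCurator codebase, and `TokenBucket`, `rate`, `capacity`, and `allow` are hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at `rate` per second.

    Illustrative only; names and defaults are assumptions, not BioCurator's API.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so an initial burst is allowed
        self.clock = clock       # injectable clock keeps the class easy to test
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens and return True if available, else return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would create one bucket per agent or per model endpoint, for example `TokenBucket(rate=2.0, capacity=5.0)` for roughly two requests per second with bursts of up to five, and skip or delay a request whenever `allow()` returns `False`.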
### Requirements
- Python 3.11+
- [UV package manager](https://docs.astral.sh/uv/) (installed automatically by setup script)
- Docker and Docker Compose

### Setup
```bash
# Automated setup with UV
./scripts/setup_venv.sh
source .venv/bin/activate

# Manual setup alternative
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev]"
```

### Common Commands
```bash
# Run tests
make test

# Run linting
make lint

# Format code
make format

# Build containers
make build

# View metrics
curl http://localhost:9090/metrics

# Check health (includes agent status)
curl http://localhost:8080/health

# Run agent workflow examples
python examples/basic_workflow.py  # Basic multi-agent workflow
python examples/safety_demo.py     # Safety controls demonstration

# Agent system health
curl http://localhost:8080/health | jq '.components[] | select(.name | startswith("agent"))'
```

## Documentation
- [Architecture Decision Records](docs/adr/)
- [API Documentation](docs/api/)
- [Development Guide](docs/development.md)
- [Safety Controls](docs/safety.md)

## Testing
The project maintains:
- >=70% overall test coverage
- >=85% coverage for safety-critical modules
- Comprehensive integration tests
- Performance benchmarks

## Troubleshooting
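
Before starting the stack, it can help to confirm that nothing else is already listening on the ports BioCurator needs (the list appears under Common Issues). The sketch below is not part of the repository. `port_in_use` and `busy_ports` are hypothetical helper names, and the script assumes the services are published on the local host at their default ports.

```python
import socket

# Ports named under Common Issues in this README.
REQUIRED_PORTS = [8080, 7474, 7687, 6333, 5432, 6379, 8086]


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=REQUIRED_PORTS, host: str = "127.0.0.1") -> list[int]:
    """Return the subset of `ports` that are already taken on `host`."""
    return [p for p in ports if port_in_use(p, host)]


if __name__ == "__main__":
    taken = busy_ports()
    print("Ports already in use:", taken if taken else "none")
```

Run before `docker-compose up`, any port it reports points to a likely conflict. Run after the stack is up, the same ports are expected to show as in use.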
### Common Issues
1. **Services fail to start or restart continuously**
   - Check Docker logs: `docker logs <container>`
   - Neo4j memory settings require a specific format in Docker Compose
   - Ensure all required ports are available: 8080, 7474, 7687, 6333, 5432, 6379, 8086
2. **Application can't connect to databases**
   - Verify environment variables are set in docker-compose files
   - Services must use container names (e.g., `redis`, `postgres`), not `localhost`
   - Check that all services are healthy: `docker-compose ps`
3. **Health endpoint shows "unhealthy" but system works**
   - This is expected if optional backends (like InfluxDB) aren't initialized
   - Check individual component status in the health response
   - Only required backends (Redis, PostgreSQL, Neo4j, Qdrant) need to be healthy
4. **Cannot access endpoints from browser (EC2/Remote)**
   - Ensure security groups allow inbound traffic on required ports
   - Use the server's public IP instead of localhost
   - Consider SSH tunneling for secure development access
5. **Fresh start after issues**
```bash
docker-compose down
docker volume rm $(docker volume ls -q | grep biocurator) # Removes all data
docker-compose build --no-cache app
docker-compose -f docker-compose.yml -f docker-compose.development.yml up -d
```

## License
Apache 2.0 - See LICENSE file for details