{"id":29225634,"url":"https://github.com/nexmonyx/health-controller","last_synced_at":"2025-08-21T19:29:05.437Z","repository":{"id":302579365,"uuid":"1012923322","full_name":"nexmonyx/health-controller","owner":"nexmonyx","description":"Health check aggregation and monitoring service","archived":false,"fork":false,"pushed_at":"2025-07-03T05:21:15.000Z","size":84,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-03T06:26:21.376Z","etag":null,"topics":["controller","go","health","kubernetes","microservice","nexmonyx"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nexmonyx.png","metadata":{"files":{"readme":"README-original.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-03T05:16:35.000Z","updated_at":"2025-07-03T05:21:18.000Z","dependencies_parsed_at":"2025-07-03T06:26:47.058Z","dependency_job_id":"bea4219d-4969-4a05-b07b-5d03f5515fc6","html_url":"https://github.com/nexmonyx/health-controller","commit_stats":null,"previous_names":["nexmonyx/health-controller"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nexmonyx/health-controller","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexmonyx%2Fhealth-controller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexmonyx%2Fhealth-controller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexmonyx%2Fhealth-controller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexmonyx%2Fhealth-controller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nexmonyx","download_url":"https://codeload.github.com/nexmonyx/health-controller/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexmonyx%2Fhealth-controller/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263279305,"owners_count":23441683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["controller","go","health","kubernetes","microservice","nexmonyx"],"created_at":"2025-07-03T07:11:13.456Z","updated_at":"2025-07-03T07:11:14.263Z","avatar_url":"https://github.com/nexmonyx.png","language":"Go","readme":"# Nexmonyx Health Controller\n\nThe Nexmonyx Health Controller is a standalone microservice responsible for monitoring the health of servers, services, APIs, databases, and external dependencies in the Nexmonyx platform.\n\n## Features\n\n### Health Monitoring Capabilities\n- **Server Health**: Heartbeat monitoring, resource utilization\n- **Service Health**: Systemd service status monitoring\n- **API Health**: HTTP endpoint availability and response time monitoring\n- **Database Health**: Database connectivity and query performance\n- **External Health**: Third-party service monitoring (AWS, Stripe, Auth0, etc.)\n- **Custom Health**: User-defined health checks with custom scripts\n\n### Advanced Features\n- **Health Scoring**: 0-100 health score calculation with configurable weights\n- **Predictive Analysis**: Anomaly detection and predictive failure detection\n- **Historical Tracking**: 30-day health history retention\n- **Incident Management**: Automatic incident creation and resolution\n- **Maintenance Windows**: Health check suspension during maintenance\n- **Alerting Integration**: Integration with alert controller for notifications\n\n### Performance \u0026 Scalability\n- **Concurrent Execution**: Configurable worker pool for parallel health checks\n- **Batch Processing**: Efficient batch processing of health checks\n- **Rate Limiting**: Built-in rate limiting to prevent API overload\n- **Caching**: Local caching for improved performance\n- **High Availability**: Leader election support for multi-instance deployments\n\n## Configuration\n\nThe controller uses environment variables for configuration. Copy `.env.example` to `.env` and update the values as needed.\n\n### Server Configuration\n```bash\nHEALTH_SERVER_HOST=0.0.0.0\nHEALTH_SERVER_PORT=8080\nHEALTH_SERVER_READ_TIMEOUT=30s\nHEALTH_SERVER_WRITE_TIMEOUT=30s\nHEALTH_SERVER_SHUTDOWN_TIMEOUT=10s\n```\n\n### Health Monitoring Configuration\n```bash\nHEALTH_CHECK_INTERVAL=30s                      # How often to schedule health checks\nHEALTH_HEARTBEAT_THRESHOLD=2m                  # Threshold for heartbeat checks\nHEALTH_WARNING_THRESHOLD=2m                    # Warning threshold\nHEALTH_CRITICAL_THRESHOLD=5m                   # Critical threshold\nHEALTH_MAX_CONCURRENT_CHECKS=100               # Max parallel health checks\nHEALTH_HISTORY_RETENTION_DAYS=30               # Health history retention\nHEALTH_SUMMARY_UPDATE_INTERVAL=1m              # How often to update summaries\nHEALTH_ANOMALY_DETECTION_ENABLED=true          # Enable anomaly detection\nHEALTH_PREDICTIVE_ANALYSIS_ENABLED=true        # Enable predictive analysis\nHEALTH_ALERTING_ENABLED=true                   # Enable alerting\nHEALTH_CHECK_BATCH_SIZE=50                     # Batch size for health checks\n```\n\n### Health Score Weights\n```bash\nHEALTH_SCORE_CRITICAL_WEIGHT=0      # Score for critical status\nHEALTH_SCORE_WARNING_WEIGHT=60      # Score for warning status\nHEALTH_SCORE_HEALTHY_WEIGHT=100     # Score for healthy status\nHEALTH_SCORE_UNKNOWN_WEIGHT=25      # Score for unknown status\n```\n\n### Resource Thresholds\n```bash\nHEALTH_CPU_WARNING_PERCENT=80.0\nHEALTH_CPU_CRITICAL_PERCENT=95.0\nHEALTH_MEMORY_WARNING_PERCENT=85.0\nHEALTH_MEMORY_CRITICAL_PERCENT=95.0\nHEALTH_DISK_WARNING_PERCENT=80.0\nHEALTH_DISK_CRITICAL_PERCENT=90.0\nHEALTH_NETWORK_WARNING_LATENCY_MS=100\nHEALTH_NETWORK_CRITICAL_LATENCY_MS=500\nHEALTH_NETWORK_WARNING_LOSS_PERCENT=1.0\nHEALTH_NETWORK_CRITICAL_LOSS_PERCENT=5.0\n```\n\n### Nexmonyx API Configuration\n```bash\nNEXMONYX_BASE_URL=https://api.nexmonyx.com\nNEXMONYX_ACCESS_KEY=your_access_key_here\nNEXMONYX_ACCESS_SECRET=your_access_secret_here\nNEXMONYX_TIMEOUT=30s\nNEXMONYX_RETRY_COUNT=3\nNEXMONYX_RETRY_DELAY=1s\nNEXMONYX_RATE_LIMIT_RPS=100\n```\n\n### Leader Election Configuration\n```bash\nHEALTH_LEADER_ELECTION_ENABLED=true\nHEALTH_LEADER_ELECTION_LOCK_NAME=nexmonyx-health-controller\nHEALTH_LEADER_ELECTION_LOCK_NAMESPACE=nexmonyx-system\nHEALTH_LEADER_ELECTION_LEASE_DURATION=15s\nHEALTH_LEADER_ELECTION_RENEW_DEADLINE=10s\nHEALTH_LEADER_ELECTION_RETRY_PERIOD=2s\n```\n\n## Health Check Types\n\n### 1. Heartbeat Checks\nMonitor server heartbeat and last seen time:\n```json\n{\n  \"check_type\": \"heartbeat\",\n  \"threshold\": {\n    \"warning_minutes\": 2,\n    \"critical_minutes\": 5\n  }\n}\n```\n\n### 2. Service Checks\nMonitor systemd service status:\n```json\n{\n  \"check_type\": \"service\",\n  \"config\": {\n    \"service_name\": \"nginx\"\n  }\n}\n```\n\n### 3. Resource Checks\nMonitor CPU, memory, disk usage:\n```json\n{\n  \"check_type\": \"resource\",\n  \"config\": {\n    \"resource_type\": \"cpu\"\n  },\n  \"threshold\": {\n    \"warning_percent\": 80,\n    \"critical_percent\": 95\n  }\n}\n```\n\n### 4. API Checks\nMonitor HTTP endpoint availability:\n```json\n{\n  \"check_type\": \"api\",\n  \"config\": {\n    \"url\": \"https://api.example.com/health\",\n    \"method\": \"GET\",\n    \"expected_status\": 200,\n    \"headers\": {\n      \"Authorization\": \"Bearer token\"\n    }\n  }\n}\n```\n\n### 5. Database Checks\nMonitor database connectivity:\n```json\n{\n  \"check_type\": \"database\",\n  \"config\": {\n    \"db_type\": \"postgresql\",\n    \"host\": \"db.example.com\",\n    \"port\": 5432,\n    \"database\": \"myapp\",\n    \"username\": \"monitor\"\n  }\n}\n```\n\n### 6. External Service Checks\nMonitor third-party services:\n```json\n{\n  \"check_type\": \"external\",\n  \"config\": {\n    \"service_type\": \"aws\",\n    \"region\": \"us-east-1\",\n    \"service\": \"ec2\"\n  }\n}\n```\n\n### 7. Custom Checks\nExecute custom scripts:\n```json\n{\n  \"check_type\": \"custom\",\n  \"config\": {\n    \"script\": \"#!/bin/bash\\necho 'Health check passed'\\nexit 0\",\n    \"interpreter\": \"bash\"\n  }\n}\n```\n\n## API Endpoints\n\n### Health and Status\n- `GET /health` - Controller health status\n- `GET /ready` - Readiness check with statistics\n- `GET /metrics` - Prometheus metrics\n- `GET /stats` - Detailed statistics\n\n### Controller Management\n- `GET /api/v1/status` - Controller status and statistics\n\n## Deployment\n\n### Docker\n```bash\n# Build the image\ndocker build -t nexmonyx-health-controller .\n\n# Run the container\ndocker run -d \\\n  --name health-controller \\\n  -p 8080:8080 \\\n  -p 9090:9090 \\\n  -e NEXMONYX_ACCESS_KEY=your-access-key \\\n  -e NEXMONYX_ACCESS_SECRET=your-access-secret \\\n  nexmonyx-health-controller\n```\n\n### Kubernetes\n```yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: nexmonyx-health-controller\nspec:\n  replicas: 2\n  selector:\n    matchLabels:\n      app: nexmonyx-health-controller\n  template:\n    metadata:\n      labels:\n        app: nexmonyx-health-controller\n    spec:\n      containers:\n      - name: health-controller\n        image: nexmonyx-health-controller:latest\n        ports:\n        - containerPort: 8080\n        - containerPort: 9090\n        env:\n        - name: NEXMONYX_ACCESS_KEY\n          valueFrom:\n            secretKeyRef:\n              name: nexmonyx-secrets\n              key: access-key\n        - name: NEXMONYX_ACCESS_SECRET\n          valueFrom:\n            secretKeyRef:\n              name: nexmonyx-secrets\n              key: access-secret\n        livenessProbe:\n          httpGet:\n            path: /health\n            port: 8080\n          initialDelaySeconds: 30\n          periodSeconds: 30\n        readinessProbe:\n          httpGet:\n            path: /ready\n            port: 8080\n          initialDelaySeconds: 5\n          periodSeconds: 10\n```\n\n## Monitoring and Observability\n\n### Metrics\nThe controller exposes Prometheus metrics at `/metrics`:\n- `health_checks_total` - Total health checks executed\n- `health_checks_successful_total` - Successful health checks\n- `health_checks_failed_total` - Failed health checks\n- `health_check_average_duration_ms` - Average check duration\n- `health_workers_active` - Active worker count\n- `health_checks_queued` - Queued health checks\n\n### Logging\nStructured JSON logging with configurable levels:\n- `trace` - Detailed execution flow\n- `debug` - Debug information\n- `info` - General information\n- `warn` - Warning conditions\n- `error` - Error conditions\n\n### Health Checks\n- Liveness probe: `GET /health`\n- Readiness probe: `GET /ready`\n\n## Architecture\n\n### Components\n1. **Health Service** - Core health monitoring logic\n2. **Worker Pool** - Concurrent health check execution\n3. **Check Executors** - Type-specific health check implementations\n4. **Configuration** - Environment-based configuration management\n5. **HTTP Server** - REST API and metrics endpoints\n\n### Data Flow\n1. Health Service schedules health checks based on intervals\n2. Due checks are submitted to the Worker Pool\n3. Workers execute health checks using appropriate executors\n4. Results are stored via the Nexmonyx API\n5. Health summaries are updated periodically\n6. Incidents are automatically created/resolved based on health status\n\n## Development\n\n### Prerequisites\n- Go 1.24+\n- Docker\n- Access to Nexmonyx API\n\n### Building\n```bash\n# Install dependencies\ngo mod download\n\n# Build the binary\ngo build -o health-controller .\n\n# Run locally with environment file\ncp .env.example .env\n# Edit .env with your configuration\n./health-controller\n```\n\n### Testing\n```bash\n# Run tests\ngo test ./...\n\n# Run with coverage\ngo test -cover ./...\n```\n\n## Integration\n\n### Nexmonyx SDK\nThe controller uses the official Nexmonyx Go SDK for all API operations:\n- Health check CRUD operations\n- Server information retrieval\n- Health history management\n- Integration with alerting system\n\n### Alert Controller\nAutomatic integration with the alert controller for:\n- Health status change notifications\n- Incident creation and updates\n- Escalation policies\n- Communication channels\n\n## Performance Considerations\n\n### Scalability\n- Supports monitoring 100,000+ servers\n- 1M+ health checks per minute\n- Horizontal scaling with leader election\n- Efficient resource utilization\n\n### Optimization\n- Batch processing for reduced API calls\n- Local caching for improved performance\n- Configurable concurrency limits\n- Rate limiting to prevent API overload\n\n### Resource Usage\n- Memory: \u003c 200MB under normal load\n- CPU: Scales with concurrent check count\n- Network: Configurable rate limiting\n- Storage: Local SQLite for caching\n\n## Security\n\n### Authentication\n- Access key/secret authentication with Nexmonyx API\n- Secure credential management via environment variables\n- RBAC integration through API permissions\n\n### Network Security\n- HTTPS communication with Nexmonyx API\n- Configurable TLS settings\n- Network isolation support\n\n### Runtime Security\n- Non-root container execution\n- Read-only filesystem where possible\n- Resource limits and quotas\n- Secure secret handling","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexmonyx%2Fhealth-controller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnexmonyx%2Fhealth-controller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexmonyx%2Fhealth-controller/lists"}