https://github.com/poacosta/service-health-monitor
Basic Service Health Monitor to notify outages using Slack
https://github.com/poacosta/service-health-monitor
aws aws-lambda infrastructure-as-code python slack terraform
Last synced: 3 months ago
JSON representation
Basic Service Health Monitor to notify outages using Slack
- Host: GitHub
- URL: https://github.com/poacosta/service-health-monitor
- Owner: poacosta
- License: mit
- Created: 2024-12-19T20:55:29.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-02-04T21:00:38.000Z (3 months ago)
- Last Synced: 2025-02-04T22:18:21.218Z (3 months ago)
- Topics: aws, aws-lambda, infrastructure-as-code, python, slack, terraform
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Service Health Monitor
Ever had that moment when your production services decided to take an unannounced vacation?
Yeah, me too.
That's why I
built this automated health monitoring system that keeps tabs on your services and sends Slack notifications when things
go sideways.
Think of it as your infrastructure's personal health assistant.## 🎯 Prerequisites
Before diving in, make sure you have all the necessary components set up.
Check out [PREREQUISITES.md](PREREQUISITES.MD) for a detailed setup guide.### Quick Sanity Check ✅
Before proceeding, verify:
- [ ] AWS CLI configured (`aws sts get-caller-identity`)
- [ ] Terraform installed (`terraform -v`)
- [ ] Python 3.9 available (`python3.9 --version`)
- [ ] Slack webhook URL obtained
- [ ] Virtual environment activated
- [ ] `dist/` directory with all necessary filesIf any of these are missing, check the detailed sections above. Trust me, it's worth getting these, right from the
start!## Features
- **Async Health Checks**: Because waiting is so 2010
- **Slack Integration**: Get notifications that actually look good (and are useful!)
- **AWS Lambda Ready**: Serverless, because who wants to manage servers for monitoring servers?
- **Infrastructure as Code**: Everything in Terraform, because we're professionals here
- **Configurable Monitoring**: Customize everything from timeouts to headers
- **Multi-Service Support**: Monitor both frontend and backend services in one go## 🚀 Quick Start
1. Clone this repo:
```bash
git clone https://github.com/poacosta/service-health-monitor
cd service-health-monitor
```2. Set up your Python environment:
```bash
python -m venv .venv
source .venv/bin/activate # or `.venv\Scripts\activate` on Windows
pip install -r requirements.txt
```3. Create your `terraform.tfvars`:
```hcl
project_name = "my-awesome-project"
environment = "production"
slack_webhook_url = "https://hooks.slack.com/services/your/webhook/url"
services_config = [
{
name = "Backend API"
url = "https://api.example.com/health"
type = "backend"
timeout = 30
expected_status = 200
custom_headers = {
"Authorization" = "Bearer your-token-if-needed"
}
},
{
name = "Frontend App"
url = "https://app.example.com"
type = "frontend"
timeout = 30
expected_status = [200, 429, 403]
}
]
```4. Deploy to AWS:
```bash
cd terraform
terraform init -upgrade
terraform plan
terraform apply
```## 🎯 Use Cases
- **Microservices Monitoring**: Keep track of your distributed services
- **Frontend Health**: Monitor your user-facing applications
- **API Availability**: Ensure your APIs are responding correctly
- **Custom Health Checks**: Add custom headers for authenticated endpoints## 🔧 Configuration
### Service Configuration
Each service in your `terraform.tfvars` can have:
- `name`: Service identifier
- `url`: Health check endpoint
- `type`: "backend" or "frontend"
- `timeout`: Request timeout in seconds (default: 30)
- `expected_status`: Expected HTTP statuses (default: 200)
- `custom_headers`: Additional HTTP headers### Schedule Configuration
Modify the check frequency in `terraform.tfvars`:
```hcl
schedule_expression = "rate(5 minutes)" # Default
# OR
schedule_expression = "cron(0/15 * * * ? *)" # Every 15 minutes
```### Config Example
📓 [terraform.tfvars.example](terraform/terraform.tfvars.example)
## 🏗 Architecture
```
┌─────────────┐ ┌──────────┐ ┌────────────┐
│ EventBridge │ ──▶ │ Lambda │ ──▶ │ Services │
└─────────────┘ └──────────┘ └────────────┘
│
▼
┌─────────┐
│ Slack │
└─────────┘
```## 📈 Future Improvements
- [ ] Add metrics export to CloudWatch
- [ ] Implement retry mechanisms with exponential backoff
- [ ] Add support for custom health check logic
- [ ] Create a dashboard for historical uptime data
- [ ] Add support for multiple notification channels## 🤝 Contributing
Feel free to dive in! [Open an issue](https://github.com/poacosta/service-health-monitor/issues/new) or submit PRs.
### Development Setup
1. Fork the Repository
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request## 📝 License
This project is licensed under the MIT License — see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- The async Python community for making non-blocking requests a breeze
- Terraform for making infrastructure manageable
- Coffee ☕ for making everything possible## 🔐 Security
Please ensure you never commit sensitive information like tokens or webhook URLs. Use environment variables or AWS
Secrets Manager for production deployments.## ✨ About
Built with love for DevOps engineers who want to sleep better at night.
Because your services should notify you before your users do.