https://github.com/isometry/platform-health
Lightweight & extensible platform health monitoring
https://github.com/isometry/platform-health
Last synced: 2 months ago
JSON representation
Lightweight & extensible platform health monitoring
- Host: GitHub
- URL: https://github.com/isometry/platform-health
- Owner: isometry
- License: other
- Created: 2024-03-30T16:00:58.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-10-07T05:39:48.000Z (7 months ago)
- Last Synced: 2025-10-07T07:22:06.993Z (7 months ago)
- Language: Go
- Size: 503 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Platform Health
Lightweight & extensible platform health monitoring.
## Overview
Platform Health is a simple client/server system for lightweight health monitoring of platform components and systems.
The Platform Health client (`ph client`) sends a gRPC health check request to a Platform Health server which is configured to probe a set of network services. Probes run asynchronously on the server (subject to configurable timeouts), with the accumulated response returned to the client.
## Providers
Probes use a compile-time [provider plugin system](pkg/provider) that supports extension to monitoring of arbitrary services. Integrated providers include:
* [`system`](pkg/provider/system): Hierarchical grouping of related health checks with status aggregation
* [`satellite`](pkg/provider/satellite): A separate satellite instance of the Platform Health server
* [`ssh`](pkg/provider/ssh): SSH protocol handshake with host key verification
* [`tcp`](pkg/provider/tcp): TCP connectivity checks
* [`tls`](pkg/provider/tls): TLS handshake and certificate verification
* [`http`](pkg/provider/http): HTTP(S) health checks with CEL-based response validation, full REST/GraphQL API support, and TLS details
* [`grpc`](pkg/provider/grpc): gRPC Health v1 service status checks
* [`kubernetes`](pkg/provider/kubernetes): Kubernetes resource existence and readiness
* [`helm`](pkg/provider/helm): Helm release existence and deployment status
* [`vault`](pkg/provider/vault): [Vault](https://www.vaultproject.io/) cluster initialization and seal status
Each provider implements the `Instance` interface, with the health of each instance obtained asynchronously, and contributing to the overall response.
## Installation
### macOS/Linux
```bash
brew install isometry/tap/platform-health
```
```console
$ ph server -l & sleep 1 && ph client && kill %1
{"status":"HEALTHY", "duration":"0.000004833s"}
```
### Kubernetes
#### Install via `helm` chart
```console
helm upgrade \
--install platform-health \
-n platform-health --create-namespace \
oci://ghcr.io/isometry/charts/platform-health
```
#### Install via `kubectl`
```bash
kubectl create configmap platform-health --from-file=platform-health.yaml=/dev/stdin <<-EOF
components:
ssh@localhost:
type: tcp
spec:
host: localhost
port: 22
gmail:
type: tls
spec:
host: smtp.gmail.com
port: 465
google:
type: http
spec:
url: https://google.com
EOF
kubectl create deployment platform-health --image ghcr.io/isometry/platform-health:latest --port=8080
kubectl patch deployment platform-health --patch-file=/dev/stdin <<-EOF
spec:
template:
spec:
volumes:
- name: config
configMap:
name: platform-health
containers:
- name: platform-health
args:
- -vv
volumeMounts:
- name: config
mountPath: /config
EOF
kubectl create service loadbalancer platform-health --tcp=8080:8080
```
## Usage
### Client
```bash
# Check all components
ph client
# Check specific components
ph client -c google -c github
# Check with hierarchical path (system/component)
ph client -c fluxcd/source-controller
# Connect to remote server
ph client prod:8080 -c google
```
### One-Shot Mode
Run health checks once and exit without starting a server:
```bash
ph check
# Check specific components only
ph check -c google -c fluxcd/source-controller
```
This is useful for:
- Validating configuration files
- Local health check verification
- CI/CD pipeline integration
- Testing specific components
### Ad-hoc Checks
Create and run health checks without a configuration file:
```bash
# TCP connectivity check
ph check tcp --host example.com --port 443
# HTTP health check
ph check http --url https://api.example.com/health
# HTTP check with CEL expression
ph check http --url https://api.example.com/health \
--check 'response.status == 200'
# TLS certificate check
ph check tls --host example.com --port 443
```
### Context Inspection
Inspect the CEL evaluation context for debugging expressions:
```bash
# View context for a configured component
ph context my-app
# View context for nested system components
ph context fluxcd/source-controller
# View context for ad-hoc provider
ph context http --url https://api.example.com/health
```
## Configuration
The Platform Health server reads configuration from a YAML file. By default, it searches for `platform-health.yaml` in standard config paths (`.` and `/config`).
You can customize this with:
- `--config-path`: Override config file search paths (can be specified multiple times)
- `--config-name`: Change the config file name (without extension)
```bash
# Use custom config file
ph server --config-name myconfig
# Add search paths
ph server --config-path /custom/path --config-path ./local
```
### Configuration Structure
All health check components are defined under the `components` key:
```yaml
components:
:
type: # required
spec: # provider-specific configuration
:
checks: [...] # optional CEL expressions
timeout: # optional per-instance timeout
components: {...} # optional nested children (system provider)
```
Component names can contain any characters valid in YAML keys, but should avoid `/` which is used for path-filtered queries. The `type` field specifies which provider to use, and provider-specific configuration goes under `spec`.
### Example
The following configuration will monitor that /something/ is listening on `tcp/22` of localhost; validate connectivity and TLS handshake to the Gmail SSL mail-submission port; and validate that Google is accessible and returning a 200 status code:
```yaml
components:
ssh@localhost:
type: tcp
spec:
host: localhost
port: 22
gmail:
type: tls
spec:
host: smtp.gmail.com
port: 465
google:
type: http
spec:
url: https://google.com
api-health:
type: http
spec:
url: https://api.example.com/health
method: GET
checks:
- check: 'response.status == 200'
message: "Expected HTTP 200"
- check: 'response.json.status == "healthy"'
message: "Service unhealthy"
```
### Hierarchical Grouping
Use the `system` provider to group related checks:
```yaml
components:
fluxcd:
type: system
components:
source-controller:
type: kubernetes
spec:
kind: Deployment
namespace: flux-system
name: source-controller
kustomize-controller:
type: kubernetes
spec:
kind: Deployment
namespace: flux-system
name: kustomize-controller
```
The system is reported "healthy" only if all sub-components are healthy.
### CEL Expressions
Several providers support CEL (Common Expression Language) expressions for custom health check validation:
- [`http`](pkg/provider/http): HTTP request and response details with JSON parsing for REST/GraphQL API validation
- [`tls`](pkg/provider/tls): TLS connection and certificate details
- [`ssh`](pkg/provider/ssh): SSH host key and connection details
- [`kubernetes`](pkg/provider/kubernetes): Full resource(s), including metadata, spec, status, etc.
- [`helm`](pkg/provider/helm): Release info, chart metadata, values and manifests
Use `ph context` to inspect the evaluation context available to your expressions. See [pkg/checks/README.md](pkg/checks/README.md) for CEL syntax examples and patterns.