Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/flanksource/canary-checker
Kubernetes Native Health Check Platform
https://github.com/flanksource/canary-checker
canary continuous-testing gitops kubernetes kubernetes-operator monitoring prometheus prometheus-exporter synthethic-testing
Last synced: about 2 months ago
JSON representation
Kubernetes Native Health Check Platform
- Host: GitHub
- URL: https://github.com/flanksource/canary-checker
- Owner: flanksource
- License: apache-2.0
- Created: 2019-12-11T18:40:30.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-10-29T11:06:50.000Z (3 months ago)
- Last Synced: 2024-10-29T11:51:55.158Z (3 months ago)
- Topics: canary, continuous-testing, gitops, kubernetes, kubernetes-operator, monitoring, prometheus, prometheus-exporter, synthethic-testing
- Language: Go
- Homepage: https://canarychecker.io
- Size: 12.1 MB
- Stars: 192
- Watchers: 8
- Forks: 35
- Open Issues: 166
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-k8s-resources - Canary Checker - Canary Checker is a kubernetes-native health check platform with 30+ built-in health check types. (Tools and Libraries / Monitoring, Alerts, and Visualization)
README
---
Canary checker is a kubernetes-native platform for monitoring health across application and infrastructure using both passive and active (synthetic) mechanisms.## Features
* **Batteries Included** - 35+ built-in check types
* **Kubernetes Native** - Health checks (or canaries) are CRD's that reflect health via the `status` field, making them compatible with GitOps, [Flux Health Checks](https://fluxcd.io/flux/components/kustomize/kustomization/#health-checks), Argo, Helm, etc..
* **Secret Management** - Leverage K8S secrets and configmaps for authentication and connection details
* **Prometheus** - Prometheus compatible metrics are exposed at `/metrics`. A Grafana Dashboard is also available.
* **Dependency Free** - Runs an embedded postgres instance by default, can also be configured to use an external database.
* **JUnit Export (CI/CD)** - Export health check results to JUnit format for integration into CI/CD pipelines
* **JUnit Import (k6/newman/puppeter/etc)** - Use any container that creates JUnit test results
* **Scriptable** - Go templates, Javascript and [CEL](https://canarychecker.io/scripting/cel) can be used to:
* Evaluate whether a check is passing and severity to use when failing
* Extract a user friendly error message
* Transform and filter check responses into individual check results
* Extract custom metrics
* **Multi-Modal** - While designed as a Kubernetes Operator, canary checker can also run as a CLI and a server without K8s## Getting Started
1. Install canary checker with Helm
```shell
helm repo add flanksource https://flanksource.github.io/charts
helm repo updatehelm install \
canary-checker \
flanksource/canary-checker \
-n canary-checker \
--create-namespace
--wait
```2. Create a new check
```yaml title="canary.yaml"
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: http-check
spec:
interval: 30
http:
- name: basic-check
url: https://httpbin.demo.aws.flanksource.com/status/200
- name: failing-check
url: https://httpbin.demo.aws.flanksource.com/status/500
```2a. Run the check locally (Optional)
```shell
wget https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_linux_amd64 \
-O canary-checker && chmod +x canary-checker
./canary-checker run canary.yaml
```[![asciicast](https://asciinema.org/a/cYS6hlmX516JQeECHH7za3IDG.svg)](https://asciinema.org/a/cYS6hlmX516JQeECHH7za3IDG)
3. Apply the check
```shell
kubectl apply -f canary.yaml
```4. Check the health status
```shell
kubectl get canary
`````` title="sample output"
NAME INTERVAL STATUS LAST CHECK UPTIME 1H LATENCY 1H LAST TRANSITIONED
http-check. 30 Passed 13s 18/18 (100.0%) 480ms 13s
```See [fixtures](https://github.com/flanksource/canary-checker/tree/master/fixtures) for more examples and [docs](https://canarychecker.io/getting-started) for more comprehensive documentation.
## Use Cases
### Synthetic Testing
Run simple HTTP/DNS/ICMP probes or more advanced full test suites using JMeter, K6, Playright, Postman.
```yaml
# Run a container that executes a playwright test, and then collect the
# JUnit formatted test results from the /tmp folder
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: playwright-junit
spec:
interval: 120
junit:
- testResults: "/tmp/"
name: playwright-junit
spec:
containers:
- name: playwright
image: ghcr.io/flanksource/canary-playwright:latest
```### Infrastructure Testing
Verify that infrastructure is fully operational by [deploying new pods](https://canarychecker.io/reference/pod), spinning up new EC2 instances and pushing/pulling from docker and helm repositories.
```yaml
# Schedule a new pod with an ingress and then time how long it takes to schedule, be ready, respond to an http request and finally be cleaned up.
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: pod-check
spec:
interval: 30
pod:
- name: golang
spec: |
apiVersion: v1
kind: Pod
metadata:
name: hello-world-golang
namespace: default
labels:
app: hello-world-golang
spec:
containers:
- name: hello
image: quay.io/toni0/hello-webserver-golang:latest
port: 8080
path: /foo/bar
scheduleTimeout: 20000
readyTimeout: 10000
httpTimeout: 7000
deleteTimeout: 12000
ingressTimeout: 10000
deadline: 60000
httpRetryInterval: 200
expectedContent: bar
expectedHttpStatuses: [200, 201, 202]
```### Backup Checks / Batch File Monitoring
Check that batch file processes are functioning correctly by checking the age and size of files in local file systems, SFTP, SMB, S3 and GCS.
```yaml
# Checks that a recent DB backup has been uploaded
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: folder-check
spec:
schedule: 0 22 * * *
folder:
- path: s3://database-backups/prod
name: prod-backup
maxAge: 1d
minSize: 10gb
```### Alert Aggregation
Aggregate alerts and recommendations from Prometheus, AWS Cloudwatch, Dynatrace, etc.
```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: alertmanager-check
spec:
schedule: "*/5 * * * *"
alertmanager:
- url: alertmanager.monitoring.svc
alerts:
- .*
ignore:
- KubeScheduler.*
- Watchdog
transform:
# for each alert, transform it into a new check
javascript: |
var out = _.map(results, function(r) {
return {
name: r.name,
labels: r.labels,
icon: 'alert',
message: r.message,
description: r.message,
}
})
JSON.stringify(out);
```### Prometheus Exporter Replacement
Export [custom metrics](https://canarychecker.io/concepts/metrics-exporter) from the result of any check, making it possible to replace various other promethus exporters that collect metrics via HTTP, SQL, etc..
```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: exchange-rates
spec:
schedule: "every 1 @hour"
http:
- name: exchange-rates
url: https://api.frankfurter.app/latest?from=USD&to=GBP,EUR,ILS
metrics:
- name: exchange_rate
type: gauge
value: result.json.rates.GBP
labels:
- name: "from"
value: "USD"
- name: to
value: GBP
```## Platform Ready
Canary checker is ideal for building platforms, developers can include health checks for their applications in whatever tooling they prefer, with secret management that uses native Kubernetes constructs.
```yaml
apiVersion: v1
kind: Secret
metadata:
name: basic-auth
stringData:
user: john
pass: doe
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: http-basic-auth-configmap
spec:
http:
- url: https://httpbin.demo.aws.flanksource.com/basic-auth/john/doe
username:
valueFrom:
secretKeyRef:
name: basic-auth
key: user
password:
valueFrom:
secretKeyRef:
name: basic-auth
key: pass
```## Dashboard
Canary checker comes with a built-in dashboard by default
![](https://canarychecker.io/img/canary-ui.png)
There is also a [grafana](https://canarychecker.io/concepts/grafana) dashboard, or build your own using the metrics exposed.
## Getting Help
If you have any questions about canary checker:
* Read the [docs](https://canarychecker.io)
* Invite yourself to the [CNCF community slack](https://slack.cncf.io/) and join the [#canary-checker](https://cloud-native.slack.com/messages/canary-checker/) channel.
* Check out the [Youtube Playlist](https://www.youtube.com/playlist?list=PLz4F_KggvA58D6krlw433TNr8qMbu1aIU).
* File an [issue](https://github.com/flanksource/canary-checker/issues/new) - (We do provide user support via Github Issues, so don't worry if your issue is a bug or not)Your feedback is always welcome!
## Check Types
| Protocol | Status | Checks |
| ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ |
| [HTTP(s)](https://canarychecker.io/reference/http) | GA | Response body, headers and duration |
| [DNS](https://canarychecker.io/reference/dns) | GA | Response and duration |
| [Ping/ICMP](https://canarychecker.io/reference/icmp) | GA | Duration and packet loss |
| [TCP](https://canarychecker.io/reference/tcp) | GA | Port is open and connectable |
| **Data Sources** | | |
| [SQL](https://canarychecker.io/reference/sql) (MySQL, Postgres, SQL Server) | GA | Ability to login, results, duration, health exposed via stored procedures |
| [LDAP](https://canarychecker.io/reference/ldap) | GA | Ability to login, response time |
| [ElasticSearch / Opensearch](https://canarychecker.io/reference/elasticsearch) | GA | Ability to login, response time, size of search results |
| [Mongo](https://canarychecker.io/reference/mongo) | Beta | Ability to login, results, duration, |
| [Redis](https://canarychecker.io/reference/redis) | GA | Ability to login, results, duration, |
| [Prometheus](https://canarychecker.io/reference/prometheus) | GA | Ability to login, results, duration, |
| **Alerts** | | Prometheus |
| [Prometheus Alert Manager](https://canarychecker.io/reference/alert-manager) | GA | Pending and firing alerts |
| [AWS Cloudwatch Alarms](https://canarychecker.io/reference/aws-cloudwatch) | GA | Pending and firing alarms |
| [Dynatrace Problems](./fixtures/external/dynatrace.yaml) | Beta | Problems deteced |
| **DevOps** | | |
| [Git](https://canarychecker.io/reference/git) | GA | Query Git and Github repositories via SQL |
| [Azure Devops](https://canarychecker.io/reference/azure-devops) | Beta | |
| **Integration Testing** | | |
| [JMeter](https://canarychecker.io/reference/jmeter) | Beta | Runs and checks the result of a JMeter test |
| [JUnit / BYO](https://canarychecker.io/reference/junit) | Beta | Run a pod that saves Junit test results |
| [K6](https://canarychecker.io/examples/k6) | Beta | Runs K6 tests that export JUnit via a container |
| [Newman](https://canarychecker.io/examples/newman) | Beta | Runs Newman / Postman tests that export JUnit via a container |
| [Playwright](https://canarychecker.io/examples/Playwright) | Beta | Runs Playwright tests that export JUnit via a container |
| **File Systems / Batch** | | |
| [Local Disk / NFS](https://canarychecker.io/reference/folder) | GA | Check folders for files that are: too few/many, too old/new, too small/large |
| [S3](https://canarychecker.io/reference/folder#s3) | GA | Check contents of AWS S3 Buckets |
| [GCS](https://canarychecker.io/reference/folder#gcs) | GA | Check contents of Google Cloud Storage Buckets |
| [SFTP](https://canarychecker.io/reference/folder#sftp) | GA | Check contents of folders over SFTP |
| [SMB / CIFS](https://canarychecker.io/reference/folder#smb) | GA | Check contents of folders over SMB/CIFS |
| **Config** | | |
| [AWS Config](https://canarychecker.io/reference/aws-config) | GA | Query AWS config using SQL |
| [AWS Config Rule](https://canarychecker.io/reference/aws-config-rule) | GA | AWS Config Rules that are firing, Custom AWS Config queries |
| [Config DB](https://canarychecker.io/reference/catalog) | GA | Custom config queries for Mission Control Config D |
| [Kubernetes Resources](https://canarychecker.io/reference/kubernetes) | GA | Kubernetes resources that are missing or are in a non-ready state |
| **Backups** | | |
| [GCP Databases](https://canarychecker.io/reference/gcs-database-backup#gcpdatabase) | GA | Backup freshness |
| [Restic](https://canarychecker.io/reference/restic) | Beta | Backup freshness and integrity |
| **Infrastructure** | | |
| [EC2](https://canarychecker.io/reference/ec2) | GA | Ability to launch new EC2 instances |
| [Kubernetes Ingress](https://canarychecker.io/reference/pod) | GA | Ability to schedule and then route traffic via an ingress to a pod |
| [Docker/Containerd](https://canarychecker.io/reference/containerd) | Deprecated | Ability to push and pull containers via docker/containerd |
| [Helm](https://canarychecker.io/reference/helm) | Deprecated | Ability to push and pull helm charts |
| [S3 Protocol](https://canarychecker.io/reference/s3-protocol) | GA | Ability to read/write/list objects on an S3 compatible object store |## Contributing
See [CONTRIBUTING.md](./CONTRIBUTING.md)
Thank you to all our contributors !
## License
Canary Checker core (the code in this repository) is licensed under [Apache 2.0](https://raw.githubusercontent.com/flanksource/canary-checker/main/LICENSE) and accepts contributions via GitHub pull requests after signing a CLA.
The UI (Dashboard) is free to use with canary checker under a license exception of [Flanksource UI](https://github.com/flanksource/flanksource-ui/blob/main/LICENSE#L7)