Prometheus Exporter for Litmus Chaos Metrics
https://github.com/litmuschaos/chaos-exporter

chaos-engineering chaos-exporter hacktoberfest kubernetes metrics prometheus prometheus-exporter


# Litmus Chaos Exporter

[![Slack Channel](https://img.shields.io/badge/Slack-Join-purple)](https://slack.litmuschaos.io)
![GitHub Workflow](https://github.com/litmuschaos/chaos-exporter/actions/workflows/push.yml/badge.svg?branch=master)
[![Docker Pulls](https://img.shields.io/docker/pulls/litmuschaos/chaos-exporter.svg)](https://hub.docker.com/r/litmuschaos/chaos-exporter)
[![GitHub issues](https://img.shields.io/github/issues/litmuschaos/chaos-exporter)](https://github.com/litmuschaos/chaos-exporter/issues)
[![Twitter Follow](https://img.shields.io/twitter/follow/litmuschaos?style=social)](https://twitter.com/LitmusChaos)
[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/5296/badge)](https://bestpractices.coreinfrastructure.org/projects/5296)
[![Go Report Card](https://goreportcard.com/badge/github.com/litmuschaos/chaos-exporter)](https://goreportcard.com/report/github.com/litmuschaos/chaos-exporter)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Fchaos-exporter.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Fchaos-exporter?ref=badge_shield)
[![YouTube Channel](https://img.shields.io/badge/YouTube-Subscribe-red)](https://www.youtube.com/channel/UCa57PMqmz_j0wnteRa9nCaw)


- This is a custom Prometheus and CloudWatch exporter that exposes Litmus Chaos metrics.
  To learn more about Litmus Chaos experiments and the Litmus Chaos Operator,
  see the [Litmus Docs](https://docs.litmuschaos.io/).

- It is typically deployed alongside the chaos-operator deployment, which
  in turn is associated with all ChaosResults in the cluster.

- Two types of metrics are exposed:

  - AggregateMetrics: These metrics are derived from all the ChaosResults present inside `WATCH_NAMESPACE`. If `WATCH_NAMESPACE` is not defined, the metrics are derived from all namespaces. It exposes the total_passed_experiment, total_failed_experiment, total_awaited_experiment, experiment_run_count, and experiment_installed_count metrics.

  - ExperimentScoped: The status of individual experiment runs. It exposes the passed_experiment, failed_experiment, awaited_experiment, result_verdict, probe_success_percentage, startTime, endTime, totalDuration, and chaosInjectTime metrics.

### ExperimentScoped Metrics

| Metrics Name | `litmuschaos_passed_experiments` |
|---|---|
| Description | Total number of passed experiments |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_passed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1` |
| Notes | `litmuschaos_passed_experiments` holds the cumulative sum of passed runs for the given ChaosResult. |

| Metrics Name | `litmuschaos_failed_experiments` |
|---|---|
| Description | Total number of failed experiments |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_failed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0` |
| Notes | `litmuschaos_failed_experiments` holds the cumulative sum of failed runs for the given ChaosResult. |

| Metrics Name | `litmuschaos_awaited_experiments` |
|---|---|
| Description | Total number of awaited experiments |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_awaited_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1` |
| Notes | `litmuschaos_awaited_experiments` denotes the queued experiments for each ChaosResult. Its value is 1 if the ChaosResult's verdict is Awaited; otherwise it is 0. |

| Metrics Name | `litmuschaos_probe_success_percentage` |
|---|---|
| Description | ProbeSuccessPercentage for the experiment |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_probe_success_percentage{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 100` |
| Notes | `litmuschaos_probe_success_percentage` is the percentage of passed probes out of the total probes defined inside the ChaosEngine. |
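
The arithmetic behind this percentage can be sketched as below. This is only an illustration: the function name and its inputs are hypothetical, rejecting a zero probe count is a choice made for the sketch, and per the Source row the exporter reads the value from the ChaosResult.

```python
def probe_success_percentage(passed_probes: int, total_probes: int) -> float:
    """Percentage of passed probes out of all probes defined in the ChaosEngine.

    Hypothetical helper for illustration only; not part of chaos-exporter.
    """
    if total_probes <= 0:
        # Assumption of this sketch: a run without probes has no percentage.
        raise ValueError("total_probes must be positive")
    if not 0 <= passed_probes <= total_probes:
        raise ValueError("passed_probes must be between 0 and total_probes")
    return 100.0 * passed_probes / total_probes


print(probe_success_percentage(2, 2))  # 100.0
print(probe_success_percentage(1, 4))  # 25.0
```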

| Metrics Name | `litmuschaos_experiment_start_time` |
|---|---|
| Description | Start time of the experiment |
| Source | ExperimentDependencyCheck event inside the ChaosEngine |
| Sample Metrics | `litmuschaos_experiment_start_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425155e+09` |
| Notes | `litmuschaos_experiment_start_time` denotes the start time of the experiment, which is calculated from the ExperimentDependencyCheck event (created by the chaos-runner just before launching the experiment pod). |
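
The sample value looks like a Unix timestamp in seconds. Assuming that reading is right (the README does not state the unit), it can be converted to a readable UTC time as sketched here; `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc(metric_value: float) -> datetime:
    """Interpret a metric value as Unix epoch seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(metric_value, tz=timezone.utc)


# Value taken from the sample metric above.
start = to_utc(1.618425155e+09)
print(start.isoformat())  # 2021-04-14T18:32:35+00:00
```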

| Metrics Name | `litmuschaos_experiment_end_time` |
|---|---|
| Description | End time of the experiment |
| Source | Summary event inside the ChaosEngine |
| Sample Metrics | `litmuschaos_experiment_end_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425219e+09` |
| Notes | `litmuschaos_experiment_end_time` denotes the end time of the experiment, which is calculated from the Summary event (created by the experiment pod at the end of the experiment). |

| Metrics Name | `litmuschaos_experiment_chaos_injected_time` |
|---|---|
| Description | Chaos injection time of the experiment |
| Source | ChaosInject event inside the ChaosEngine |
| Sample Metrics | `litmuschaos_experiment_chaos_injected_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425199e+09` |
| Notes | `litmuschaos_experiment_chaos_injected_time` denotes the time at which chaos is actually injected, which is calculated from the ChaosInject event (created by the experiment/helper pod just before chaos injection). |

| Metrics Name | `litmuschaos_experiment_total_duration` |
|---|---|
| Description | Total chaos duration of the experiment |
| Source | Time difference between startTime and endTime |
| Sample Metrics | `litmuschaos_experiment_total_duration{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 64` |
| Notes | `litmuschaos_experiment_total_duration` is the total chaos duration of the experiment, that is, the time interval between the start time and the end time. |
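
As a quick check, the sample values shown for the start, chaos-injected, and end time metrics reproduce the sample duration of 64. The helper below is a hypothetical sketch of that subtraction, not the exporter's code.

```python
def total_duration(start_time: float, end_time: float) -> float:
    """Interval between the experiment start time and end time."""
    if end_time < start_time:
        raise ValueError("end_time must not be earlier than start_time")
    return end_time - start_time


# Sample values from the start, chaos-injected, and end time metrics above.
start, injected, end = 1.618425155e+09, 1.618425199e+09, 1.618425219e+09

print(total_duration(start, end))  # 64.0
print(injected - start)            # 44.0, time elapsed before chaos injection
```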

| Metrics Name | `litmuschaos_experiment_verdict` |
|---|---|
| Description | Experiment verdict details |
| Source | ChaosResult |
| Sample Metrics | `litmuschaos_experiment_verdict{app_kind="deployment",app_label="run=nginx",app_namespace="nginx",chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus",chaosresult_verdict="Pass",probe_success_percentage="100.000000"} 1` |
| Notes | `litmuschaos_experiment_verdict` is set according to the ChaosResult verdict. It is always 0 for an Awaited verdict and 1 for any other verdict. However, if the same verdict is repeated for longer than `TSDB_SCRAPE_INTERVAL` (passed as an ENV), it is set to 0 until the verdict changes to a different value. |
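
One possible reading of this note is sketched below. It assumes that "repeated more than TSDB_SCRAPE_INTERVAL" means the same verdict has been in place for longer than that interval; the function and parameter names are hypothetical, and the exporter's actual bookkeeping may differ.

```python
def verdict_gauge(verdict: str,
                  seconds_with_same_verdict: float,
                  tsdb_scrape_interval: float) -> int:
    """Value of the verdict gauge under the interpretation described above."""
    if verdict == "Awaited":
        # An Awaited verdict always maps to 0.
        return 0
    if seconds_with_same_verdict > tsdb_scrape_interval:
        # Assumed behaviour: a verdict that has stayed unchanged for longer
        # than the scrape interval is reset to 0 until the verdict changes.
        return 0
    return 1


print(verdict_gauge("Awaited", 0, 10))  # 0
print(verdict_gauge("Pass", 5, 10))     # 1
print(verdict_gauge("Pass", 30, 10))    # 0
```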


### NamespacedScoped Metrics

| Metrics Name | `litmuschaos_namespace_scoped_passed_experiments` |
|---|---|
| Description | Total passed experiments count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the `litmuschaos_passed_experiments` metrics derived from the ChaosResults present inside the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_passed_experiments 2` |
| Notes | `litmuschaos_namespace_scoped_passed_experiments` is the total number of passed experiments in the WATCH_NAMESPACE. It is the sum of the `litmuschaos_passed_experiments` metrics for every ChaosResult present inside the WATCH_NAMESPACE. |

| Metrics Name | `litmuschaos_namespace_scoped_failed_experiments` |
|---|---|
| Description | Total failed experiments count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the `litmuschaos_failed_experiments` metrics derived from the ChaosResults present inside the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_failed_experiments 0` |
| Notes | `litmuschaos_namespace_scoped_failed_experiments` is the total number of failed experiments in the WATCH_NAMESPACE. It is the sum of the `litmuschaos_failed_experiments` metrics for every ChaosResult present inside the WATCH_NAMESPACE. |

| Metrics Name | `litmuschaos_namespace_scoped_awaited_experiments` |
|---|---|
| Description | Total awaited experiments count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the `litmuschaos_awaited_experiments` metrics derived from the ChaosResults present inside the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_awaited_experiments 0` |
| Notes | `litmuschaos_namespace_scoped_awaited_experiments` is the total number of awaited/queued experiments in the WATCH_NAMESPACE. It is the sum of the `litmuschaos_awaited_experiments` metrics for every ChaosResult present inside the WATCH_NAMESPACE. |

| Metrics Name | `litmuschaos_namespace_scoped_experiments_run_count` |
|---|---|
| Description | Total experiments run count in the WATCH_NAMESPACE |
| Source | Aggregated sum of all the experiment runs in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_experiments_run_count 2` |
| Notes | `litmuschaos_namespace_scoped_experiments_run_count` is the total number of experiment runs in the WATCH_NAMESPACE. It is the sum of `litmuschaos_passed_experiments` + `litmuschaos_failed_experiments` + `litmuschaos_awaited_experiments` for every ChaosResult present inside the WATCH_NAMESPACE. |

| Metrics Name | `litmuschaos_namespace_scoped_experiments_installed_count` |
|---|---|
| Description | Total unique experiments installed/run in the WATCH_NAMESPACE |
| Source | Total unique experiments count in the WATCH_NAMESPACE |
| Sample Metrics | `litmuschaos_namespace_scoped_experiments_installed_count 1` |
| Notes | `litmuschaos_namespace_scoped_experiments_installed_count` is the total number of unique experiments installed/run in the WATCH_NAMESPACE. It is equal to the total number of ChaosResults present inside the WATCH_NAMESPACE. |


### ClusterScoped Metrics

| Metrics Name | `litmuschaos_cluster_scoped_passed_experiments` |
|---|---|
| Description | Total passed experiments count in all the namespaces |
| Source | Aggregated sum of all the `litmuschaos_passed_experiments` metrics derived from the ChaosResults present inside all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_passed_experiments 2` |
| Notes | `litmuschaos_cluster_scoped_passed_experiments` is the total number of passed experiments across the cluster. It is the sum of the `litmuschaos_passed_experiments` metrics for every ChaosResult in all the namespaces. |

| Metrics Name | `litmuschaos_cluster_scoped_failed_experiments` |
|---|---|
| Description | Total failed experiments count in all the namespaces |
| Source | Aggregated sum of all the `litmuschaos_failed_experiments` metrics derived from the ChaosResults present inside all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_failed_experiments 0` |
| Notes | `litmuschaos_cluster_scoped_failed_experiments` is the total number of failed experiments across the cluster. It is the sum of the `litmuschaos_failed_experiments` metrics for every ChaosResult in all the namespaces. |

| Metrics Name | `litmuschaos_cluster_scoped_awaited_experiments` |
|---|---|
| Description | Total awaited experiments count in all the namespaces |
| Source | Aggregated sum of all the `litmuschaos_awaited_experiments` metrics derived from the ChaosResults present inside all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_awaited_experiments 0` |
| Notes | `litmuschaos_cluster_scoped_awaited_experiments` is the total number of awaited/queued experiments across the cluster. It is the sum of the `litmuschaos_awaited_experiments` metrics for every ChaosResult in all the namespaces. |

| Metrics Name | `litmuschaos_cluster_scoped_experiments_run_count` |
|---|---|
| Description | Total experiments run count in all the namespaces |
| Source | Aggregated sum of all the experiment runs in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_experiments_run_count 2` |
| Notes | `litmuschaos_cluster_scoped_experiments_run_count` is the total number of experiment runs across the cluster. It is the sum of `litmuschaos_passed_experiments` + `litmuschaos_failed_experiments` + `litmuschaos_awaited_experiments` for every ChaosResult present inside all the namespaces. |

| Metrics Name | `litmuschaos_cluster_scoped_experiments_installed_count` |
|---|---|
| Description | Total unique experiments installed/run in all the namespaces |
| Source | Total unique experiments count in all the namespaces |
| Sample Metrics | `litmuschaos_cluster_scoped_experiments_installed_count 1` |
| Notes | `litmuschaos_cluster_scoped_experiments_installed_count` is the total number of unique experiments installed/run across the cluster. It is equal to the total number of ChaosResults present inside all the namespaces. |
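
The namespace-scoped and cluster-scoped aggregates follow the same arithmetic and differ only in which ChaosResults are included. The sketch below illustrates that arithmetic under the descriptions above; `ResultCounts`, `aggregate`, and the second sample result are hypothetical and do not come from the exporter's source.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ResultCounts:
    """Per-ChaosResult values, mirroring the ExperimentScoped metrics."""
    namespace: str
    name: str
    passed: int
    failed: int
    awaited: int


def aggregate(results: Iterable[ResultCounts],
              watch_namespace: Optional[str] = None) -> Dict[str, int]:
    """Sum per-ChaosResult counts for one namespace, or for all if None."""
    selected = [r for r in results
                if watch_namespace is None or r.namespace == watch_namespace]
    passed = sum(r.passed for r in selected)
    failed = sum(r.failed for r in selected)
    awaited = sum(r.awaited for r in selected)
    return {
        "passed_experiments": passed,
        "failed_experiments": failed,
        "awaited_experiments": awaited,
        # Run count is passed + failed + awaited over the selected results.
        "experiments_run_count": passed + failed + awaited,
        # Installed count equals the number of ChaosResults selected.
        "experiments_installed_count": len(selected),
    }


results = [
    ResultCounts("litmus", "helloservice-pod-delete-pod-delete", 2, 0, 0),
    ResultCounts("staging", "cart-pod-delete-pod-delete", 1, 1, 1),  # made up
]

print(aggregate(results, watch_namespace="litmus"))  # namespace-scoped view
print(aggregate(results))                            # cluster-scoped view
```

With the first result alone, the namespace-scoped view matches the sample values above: 2 passed, 0 failed, 0 awaited, a run count of 2, and an installed count of 1.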

## Steps to build & deploy:

### Running Litmus Chaos Experiments in order to generate metrics

- Follow the steps described [here](https://v1-docs.litmuschaos.io/docs/getstarted/) to run Litmus chaos experiments, which store their outcomes as chaos results. The exporter uses the chaos custom resources (ChaosResult and ChaosEngine) to generate metrics.

### Running Chaos Exporter on the local Machine

- Run the exporter container (`litmuschaos/chaos-exporter:ci`) on the host network. You need to mount the kubeconfig
and override the entrypoint with `./exporter -kubeconfig `

- Execute `curl 127.0.0.1:8080/metrics` to view metrics
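
The same check can be scripted. The sketch below assumes the exporter is reachable at the address used in the `curl` command above; `fetch_metrics` and `litmus_samples` are hypothetical helper names, and only the Python standard library is used.

```python
from typing import List
from urllib.request import urlopen


def fetch_metrics(url: str = "http://127.0.0.1:8080/metrics",
                  timeout: float = 5.0) -> str:
    """Download the raw metrics text from the exporter endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def litmus_samples(exposition_text: str) -> List[str]:
    """Keep litmuschaos_* sample lines, dropping # HELP and # TYPE comments."""
    return [line for line in exposition_text.splitlines()
            if line.startswith("litmuschaos_")]


# Offline demonstration on a fragment of exporter output.
sample_text = """\
# HELP litmuschaos_cluster_scoped_passed_experiments Total number of passed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_passed_experiments gauge
litmuschaos_cluster_scoped_passed_experiments 2
"""
print(litmus_samples(sample_text))
```

Against a running exporter, `litmus_samples(fetch_metrics())` would return the live sample lines.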

### Running Chaos Exporter as a deployment on the Kubernetes Cluster

- Install the RBAC (serviceaccount, role, rolebinding) as per deploy/rbac.md

- Deploy the chaos-exporter.yaml

- From a cluster node, execute `curl :8080/metrics`

### Example Metrics

```
# HELP litmuschaos_awaited_experiments Total number of awaited experiments
# TYPE litmuschaos_awaited_experiments gauge
litmuschaos_awaited_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
# HELP litmuschaos_cluster_scoped_awaited_experiments Total number of awaited experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_awaited_experiments gauge
litmuschaos_cluster_scoped_awaited_experiments 0
# HELP litmuschaos_cluster_scoped_experiments_installed_count Total number of experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_experiments_installed_count gauge
litmuschaos_cluster_scoped_experiments_installed_count 1
# HELP litmuschaos_cluster_scoped_experiments_run_count Total experiments run in all namespaces
# TYPE litmuschaos_cluster_scoped_experiments_run_count gauge
litmuschaos_cluster_scoped_experiments_run_count 2
# HELP litmuschaos_cluster_scoped_failed_experiments Total number of failed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_failed_experiments gauge
litmuschaos_cluster_scoped_failed_experiments 0
# HELP litmuschaos_cluster_scoped_passed_experiments Total number of passed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_passed_experiments gauge
litmuschaos_cluster_scoped_passed_experiments 2
# HELP litmuschaos_experiment_chaos_injected_time chaos injected time of the experiments
# TYPE litmuschaos_experiment_chaos_injected_time gauge
litmuschaos_experiment_chaos_injected_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426086e+09
# HELP litmuschaos_experiment_end_time end time of the experiments
# TYPE litmuschaos_experiment_end_time gauge
litmuschaos_experiment_end_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426108e+09
# HELP litmuschaos_experiment_start_time start time of the experiments
# TYPE litmuschaos_experiment_start_time gauge
litmuschaos_experiment_start_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426056e+09
# HELP litmuschaos_failed_experiments Total number of failed experiments
# TYPE litmuschaos_failed_experiments gauge
litmuschaos_failed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
# HELP litmuschaos_passed_experiments Total number of passed experiments
# TYPE litmuschaos_passed_experiments gauge
litmuschaos_passed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 2
# HELP litmuschaos_probe_success_percentage ProbeSuccesPercentage for the experiments
# TYPE litmuschaos_probe_success_percentage gauge
litmuschaos_probe_success_percentage{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 100
```
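
Each sample line above has the shape `name{labels} value`. A simplified parser for that shape is sketched below to show how the labels can be pulled apart. It is a hypothetical helper that ignores escape sequences inside label values and optional timestamps, so a full parser such as the one shipped with the `prometheus_client` package is a better fit for real use.

```python
import re
from typing import Dict, Tuple

# name, optional {label="value",...} block, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float]:
    """Split one sample line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


line = (
    'litmuschaos_passed_experiments{chaosengine_context="test",'
    'chaosengine_name="helloservice-pod-delete",'
    'chaosresult_name="helloservice-pod-delete-pod-delete",'
    'chaosresult_namespace="litmus"} 2'
)
name, labels, value = parse_sample(line)
print(name, labels["chaosresult_namespace"], value)
```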

## How do I contribute?

You can contribute by raising issues, improving the documentation, contributing to the core framework and tooling, etc.

Head over to the [Contribution guide](CONTRIBUTING.md)

## License
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Fchaos-exporter.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Fchaos-exporter?ref=badge_large)