https://github.com/statcan/terraform-kubernetes-kube-prometheus-stack

Terraform module for Kube-Prometheus Stack

# Terraform Kubernetes Kube-Prometheus Stack

## Introduction

This module deploys and configures the Kube-Prometheus Stack inside a Kubernetes Cluster.

## Requirements

| Name | Version |
|------|---------|
| [terraform](#requirement\_terraform) | >= 1.0 |
| [helm](#requirement\_helm) | >= 2.0.0 |

## Providers

| Name | Version |
|------|---------|
| [helm](#provider\_helm) | >= 2.0.0 |
| [kubernetes](#provider\_kubernetes) | n/a |
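
The constraints in the two tables above could be pinned in the calling configuration roughly as follows. This is a minimal sketch: the `hashicorp/helm` and `hashicorp/kubernetes` source addresses are assumptions (they are the usual registry addresses, but the tables do not state them), and the Kubernetes provider is left unpinned because the table lists no version for it.

```terraform
terraform {
  # Matches the "terraform >= 1.0" requirement.
  required_version = ">= 1.0"

  required_providers {
    helm = {
      source  = "hashicorp/helm" # assumed registry address
      version = ">= 2.0.0"
    }
    kubernetes = {
      source = "hashicorp/kubernetes" # assumed registry address; no version constraint listed
    }
  }
}
```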

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| [chart\_version](#input\_chart\_version) | Version of the Helm chart | `any` | n/a | yes |
| [helm\_namespace](#input\_helm\_namespace) | The namespace Helm will install the chart under | `any` | n/a | yes |
| [cluster\_domain](#input\_cluster\_domain) | Cluster domain for DestinationRules | `string` | `"cluster.local"` | no |
| [destinationrules\_labels](#input\_destinationrules\_labels) | Labels applied to DestinationRules | `map(string)` | `{}` | no |
| [destinationrules\_mode](#input\_destinationrules\_mode) | DestinationRule TLS mode | `string` | `"DISABLE"` | no |
| [enable\_destinationrules](#input\_enable\_destinationrules) | Creates DestinationRules for Prometheus, Alertmanager, Grafana, and Node Exporters | `bool` | `false` | no |
| [enable\_prometheusrules](#input\_enable\_prometheusrules) | Adds PrometheusRules for alerts | `bool` | `true` | no |
| [helm\_release](#input\_helm\_release) | The name of the Helm release | `string` | `"kube-prometheus-stack"` | no |
| [helm\_repository](#input\_helm\_repository) | The repository where the Helm chart is stored | `string` | `"https://prometheus-community.github.io/helm-charts"` | no |
| [helm\_repository\_password](#input\_helm\_repository\_password) | The password of the repository where the Helm chart is stored | `string` | `""` | no |
| [helm\_repository\_username](#input\_helm\_repository\_username) | The username of the repository where the Helm chart is stored | `string` | `""` | no |
| [prometheus\_pvc\_name](#input\_prometheus\_pvc\_name) | Used for storage alert. Set if using non-default helm\_release | `string` | `"prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0"` | no |
| [values](#input\_values) | Values to be passed to the Helm chart | `string` | `""` | no |
| [alertmanager\_replicas](#input\_alertmanager\_replicas) | Number of replicas for Alertmanager | `number` | `1` | no |
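
As an illustration of how the optional inputs combine, the sketch below overrides a few defaults from the table above. The namespace, release name, label, and PVC name are hypothetical placeholders, and `"ISTIO_MUTUAL"` assumes the value is passed through unchanged as the Istio DestinationRule TLS mode.

```terraform
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  # Required inputs.
  chart_version  = "43.3.0"
  helm_namespace = "monitoring" # hypothetical namespace name

  # Optional inputs; anything omitted keeps its default.
  helm_release          = "kps" # hypothetical non-default release name
  alertmanager_replicas = 2

  enable_destinationrules = true
  destinationrules_mode   = "ISTIO_MUTUAL" # assumption: forwarded as the Istio TLS mode
  destinationrules_labels = {
    team = "platform" # hypothetical label
  }

  # With a non-default release name, the storage alert needs the real PVC name.
  # Placeholder only: look up the actual name in the cluster before applying.
  prometheus_pvc_name = "REPLACE-WITH-ACTUAL-PVC-NAME"
}
```

The PVC name is deliberately left as a placeholder, because the name the chart generates depends on the release name and is easiest to confirm by listing the PersistentVolumeClaims in `helm_namespace`.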

## Outputs

| Name | Description |
|------|-------------|
| [helm\_namespace](#output\_helm\_namespace) | n/a |
| [helm\_release](#output\_helm\_release) | The name of the Helm release. For use by external ServiceMonitors |
| [status](#output\_status) | n/a |
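
One possible use of the `helm_release` output is labelling an external ServiceMonitor so that Prometheus selects it. The sketch below assumes the chart's default behaviour of selecting ServiceMonitors by a `release` label, and uses the Kubernetes provider's `kubernetes_manifest` resource. The application name, namespace, and port are hypothetical.

```terraform
resource "kubernetes_manifest" "example_servicemonitor" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "ServiceMonitor"
    metadata = {
      name      = "example-app" # hypothetical
      namespace = "example"     # hypothetical
      labels = {
        # Assumption: Prometheus selects ServiceMonitors labelled with the release name.
        release = module.helm_kube_prometheus_stack.helm_release
      }
    }
    spec = {
      selector = {
        matchLabels = {
          app = "example-app" # hypothetical Service label
        }
      }
      endpoints = [
        {
          port = "metrics" # hypothetical named port on the Service
        }
      ]
    }
  }
}
```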

## Usage

```terraform
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  chart_version = "43.3.0"
  depends_on = [
    module.namespace_monitoring,
  ]

  helm_namespace  = module.namespace_monitoring.name
  helm_release    = "kube-prometheus-stack"
  helm_repository = "https://prometheus-community.github.io/helm-charts"

  enable_destinationrules = true

  values = <<EOF
# ... Helm chart values ...
EOF
}
```

## History

| Date | Release | Change |
|------|---------|--------|
| ... | ... | [...] true. Removes variables: `prometheusrules_labels`, `cluster_rules_name`, `namespace_rules_name`, `cert_manager_rules_name` |
| 2023-01-09 | v3.1.0 | Add runbook links to Prometheus rules |
| 2023-01-11 | v3.1.1 | Fix ManyContainerRestarts alert to account for multiple metrics sources |
| 2023-02-01 | v3.2.0 | Node clock alerts and README update |
| 2023-02-03 | v3.2.1 | Specify sensitive variables |
| 2023-02-08 | v3.3.0 | Add ability to add DestinationRule for Alertmanager replicas |
| 2023-02-16 | v3.4.0 | Add rules for CoreDNS alerts |
| 2023-03-10 | v3.4.1 | Fix syntax error in CoreDNS alert rules |
| 2023-03-14 | v3.5.0 | Add rule for ContainerImagePullProblem, refactor container alert unit tests |
| 2023-03-15 | v3.6.0 | Add DestinationRule for Thanos Sidecar |
| 2023-03-28 | v3.7.0 | Add generic PVC alerts |
| 2023-04-05 | v3.8.0 | Add "cluster" in prometheus rule aggregations to make compatible with Thanos. Add Prometheus heartbeat recording rule |
| 2023-04-19 | v3.8.1 | Fix CoreDNSDown alert |
| 2023-04-21 | v3.8.2 | Ensure prometheus heartbeat recording rule is evaluated by Prometheus |
| 2023-05-04 | v3.8.3 | Fix ContainerImagePullProblem flapping |
| 2023-06-08 | v3.9.0 | Ignore terminated pods in pod capacity alerts |
| 2023-06-19 | v3.9.1 | Fix PersistentVolume status alerts |
| 2023-12-07 | v3.9.2 | Adjust node alerts for clock synchronization |
| 2024-02-29 | v3.9.3 | Adjust Node and PVC storage alerts |
| 2024-04-15 | v3.9.4 | Adjust Node alerts, report agentpool, standardize node label |
| 2024-05-31 | v3.9.5 | Update container alerts |
| 2024-09-09 | v3.9.6 | Debounce ContainerCrashLooping |
| 2024-12-03 | v3.9.7 | Add NodeDiskFull and fix/refactor some node alerts |

## Upgrading

### From v1.x to v2.x
1. Note that in [Usage](#usage) the `dependencies` array has been replaced by the `depends_on` array.

2. If **`enable_destinationrules`** was `true` in **v1.x**, locate the DestinationRules that were created in `helm_namespace`. There should be 4, corresponding to Prometheus, Alertmanager, Grafana, and the Prometheus Node Exporter. Delete them prior to the upgrade. If `enable_destinationrules` remains `true`, they will be recreated with minimal downtime.

3. If **`enable_prometheusrules`** was `true` in **v1.x**, locate the PrometheusRule definitions that were created in `helm_namespace`. There should be 2: `general-platform-alerts` and `general-project-alerts`. Delete them prior to the upgrade. If `enable_prometheusrules` remains true, they will be recreated. This may resolve any presently firing alerts. If it does, they will fire again once their conditions are met.

- The default names for these PrometheusRule resources are now `general-cluster-alerts` and `general-namespace-alerts`. The scopes have changed from `platform` to `cluster` and from `project` to `namespace`. Adjust Alertmanager routing criteria accordingly.
- The severities for these rules have been adjusted from `minor/major/urgent` to `debug/minor/major`. Adjust Alertmanager routing criteria accordingly.
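
The upgrade notes above might translate into a module call along these lines. This is a sketch under several assumptions: the v1.x `dependencies` shape shown in the comment is a guess, the alert label names `severity` and `scope` are assumed, the old top severity `urgent` is assumed to correspond to the new `major`, and the receiver names are hypothetical.

```terraform
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  chart_version  = "43.3.0"
  helm_namespace = module.namespace_monitoring.name

  # v1.x (assumed shape): dependencies = [module.namespace_monitoring.depended_on]
  # v2.x and later rely on Terraform's built-in meta-argument instead:
  depends_on = [
    module.namespace_monitoring,
  ]

  values = <<EOF
alertmanager:
  config:
    route:
      receiver: "default"
      routes:
        # Under v1.x naming this route would have matched
        # severity="urgent" and scope="platform" (assumed label names).
        - receiver: "oncall"
          matchers:
            - severity="major"
            - scope="cluster"
    receivers:
      - name: "default"
      - name: "oncall"
EOF
}
```

Any existing routes that match on the old scope or severity values need the same substitution, otherwise alerts fall through to the default receiver.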

## Previous Module

This module replaces [terraform-kubernetes-prometheus](https://github.com/StatCan/terraform-kubernetes-prometheus). The previous module used the custom chart [prometheus-operator](https://github.com/StatCan/charts/tree/master/stable/prometheus-operator), which used the now-deprecated upstream chart [prometheus-operator](https://github.com/helm/charts/tree/master/stable/prometheus-operator) as a sub-chart and added DestinationRules.

This new module uses the new upstream chart [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) directly. DestinationRules, as well as a set of general alerts, can be added through the module.

To migrate from the old custom chart to the new upstream chart, the following changes should be made to Helm values:

1. Remove the top-level `prometheus-operator:` and realign indentation, as you are no longer applying values to a subchart.
2. Remove any `destinationRule:` specification and its contents, as this is now handled by [Terraform variables](#variables-values).
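
The two steps above can be illustrated with a before-and-after pair of values. This is only a sketch: the `retention` setting is an arbitrary example, and the position of `destinationRule:` under `prometheus:` in the old custom chart is an assumption.

```terraform
locals {
  # Before: values nested under the subchart key of the old custom chart.
  old_values = <<EOF
prometheus-operator:
  prometheus:
    prometheusSpec:
      retention: 30d
    destinationRule:
      # ... settings specific to the old chart (assumed location) ...
EOF

  # After: the top-level key is removed, the indentation is realigned,
  # and the destinationRule block is dropped in favour of module inputs.
  new_values = <<EOF
prometheus:
  prometheusSpec:
    retention: 30d
EOF
}
```

The migrated values would then be passed to the module with `values = local.new_values`, with DestinationRules controlled by `enable_destinationrules` and the related inputs.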

The upstream `prometheus-operator` chart was renamed to `kube-prometheus-stack` to reflect that additional components beyond the Prometheus Operator are installed.