![Module Structure](./static/banner.png)

# [terraform-aws-arc-observability-stack](https://github.com/sourcefuse/terraform-aws-arc-observability-stack)

![Terraform](https://img.shields.io/badge/terraform-%235835CC.svg?style=for-the-badge&logo=terraform&logoColor=white) ![GitHub Actions](https://img.shields.io/badge/github%20actions-%232671E5.svg?style=for-the-badge&logo=githubactions&logoColor=white)

[![Quality gate](https://sonarcloud.io/api/project_badges/quality_gate?project=sourcefuse_terraform-aws-arc-observability-stack&token=bf80d04b87395dadd9473538860728fd9d32cfc6)](https://sonarcloud.io/summary/new_code?id=sourcefuse_terraform-aws-arc-observability-stack)

[![Known Vulnerabilities](https://github.com/sourcefuse/terraform-aws-arc-observability-stack/actions/workflows/snyk.yaml/badge.svg)](https://github.com/sourcefuse/terraform-aws-arc-observability-stack/actions/workflows/snyk.yaml)
## Introduction

The Observability Terraform Module is a comprehensive solution designed to simplify the deployment of a full-stack observability ecosystem in Kubernetes environments. This module enables organizations to monitor and troubleshoot their infrastructure and applications effectively, offering the flexibility to choose between various open-source tools.

### Key Features:
1. EFK Stack for Log Management:

- Deploy either Fluentd or Fluent Bit as the log collector, providing lightweight and efficient options for log aggregation.
- Seamlessly integrate with either Elasticsearch or OpenSearch for scalable and reliable log storage.

2. Prometheus Stack for Metrics Monitoring:

- Includes Prometheus for metrics collection and Alertmanager for alerting.
- Integrated support for Grafana, offering rich dashboards to visualize metrics effectively.
- Enables monitoring of HTTP endpoints using the Blackbox Exporter.

3. Flexibility and Customization:

- Fully customizable configurations for each component, allowing fine-grained control over deployment and resources.
- Supports multiple log collectors and storage backends, giving users the freedom to choose based on their requirements.

4. Streamlined Deployment:

- Automates the deployment of the entire observability stack, reducing complexity and ensuring consistency.
- Includes preconfigured dashboards and alert rules for quick setup and immediate insights.

5. Signoz Community Edition Support:
- Adds native support for Signoz CE, an all-in-one observability platform.
- Enables logs, metrics, and traces to be collected and correlated in one unified interface.
- Simplifies tracing setup with OpenTelemetry Collector and works out of the box with distributed applications.
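SigNoz support is driven by the `signoz_config` input (full schema in the Inputs section below). A minimal, illustrative sketch — the cluster name, domain, and the `tracing_stack` value are placeholders, since the accepted values are not spelled out in this README (see the usage guide):

```hcl
# Illustrative sketch of enabling SigNoz CE via this module.
module "signoz" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  tracing_stack = "signoz" # assumption: check the usage guide for accepted values

  signoz_config = {
    cluster_name = "my-eks-cluster" # placeholder
    k8s_namespace = {
      name   = "signoz"
      create = true
    }
    signoz_bin = {
      domain          = "signoz.example.com" # placeholder
      ingress_enabled = false
    }
  }
}
```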

For more information about this repository and its usage, please see [Terraform AWS ARC Observability Module Usage Guide](docs/module-usage-guide/README.md).

This module deploys the following components into an existing EKS cluster:

* Elasticsearch or OpenSearch with Kibana for log storage and exploration
* Fluentd or Fluent Bit for log collection
* Prometheus, Grafana, Alertmanager, and Blackbox Exporter for metrics and alerting
* SigNoz Community Edition and Jaeger for distributed tracing

### Prerequisites
Before using this module, ensure you have the following:

- An existing Amazon EKS cluster.
- AWS credentials configured.
- Terraform installed.
- A working knowledge of Terraform.
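The module installs its components with the Helm provider (see Requirements below), so your root configuration needs a Helm provider wired to the target EKS cluster. A minimal sketch — the cluster name is illustrative:

```hcl
# Assumes an existing EKS cluster; the name is illustrative.
data "aws_eks_cluster" "this" {
  name = "my-eks-cluster"
}

data "aws_eks_cluster_auth" "this" {
  name = "my-eks-cluster"
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```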

## Usage
See the `examples` folder for a complete example.

### EFK Stack
```hcl
module "efk" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "elasticsearch"
  log_aggregator = "fluentd"

  elasticsearch_config = {
    name = "elasticsearch-master"
    k8s_namespace = {
      name   = "logging"
      create = true
    }

    tls_self_signed_cert_data = {
      organisation          = "ARC"
      validity_period_hours = 26280 # 3 years validity
      early_renewal_hours   = 168   # 1 week early renewal
    }

    cluster_config = {
      port           = "9200"
      transport_port = "9300"
      user           = "elastic"
      log_level      = "INFO"
      cpu_limit      = "2000m"
      memory_limit   = "4Gi"
      cpu_request    = "1000m"
      memory_request = "2Gi"
      storage_class  = "gp2"
      storage        = "40Gi"
    }

    kibana_config = {
      log_level      = "info"
      cpu_limit      = "500m"
      memory_limit   = "1Gi"
      cpu_request    = "250m"
      memory_request = "500Mi"

      ingress_enabled     = true
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      ingress_host        = "kibana.xx-xx.xx"
    }
  }

  fluentd_config = {
    k8s_namespace = {
      name   = "logging"
      create = false
    }
    name                = "fluentd"
    search_engine       = "elasticsearch"
    cpu_limit           = "100m"
    memory_limit        = "512Mi"
    cpu_request         = "100m"
    memory_request      = "128Mi"
    logstash_dateformat = "%Y.%m.%d"
    log_level           = "info"
  }
}
```
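To use Fluent Bit with OpenSearch instead, switch `log_aggregator` and `search_engine` and supply `fluentbit_config` (attributes follow the `fluentbit_config` input documented below). A sketch — the region, role ARN, and the exact accepted values for `search_engine`/`log_aggregator` are assumptions; check the usage guide:

```hcl
# Illustrative sketch: Fluent Bit shipping logs to an AWS OpenSearch domain.
module "efk" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "opensearch" # assumption: check accepted values in the usage guide
  log_aggregator = "fluentbit"

  fluentbit_config = {
    k8s_namespace = {
      name   = "logging"
      create = true
    }
    name         = "fluent-bit"
    log_level    = "info"
    aws_region   = "us-east-1"                                 # illustrative
    aws_role_arn = "arn:aws:iam::111111111111:role/fluent-bit" # illustrative
  }
}
```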
### Prometheus

```hcl
module "prometheus" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  metrics_monitoring_system = "prometheus"

  prometheus_config = {
    k8s_namespace = {
      name   = "metrics"
      create = true
    }
    log_level                 = "info"
    cpu_limit                 = "100m"
    memory_limit              = "512Mi"
    cpu_request               = "100m"
    memory_request            = "128Mi"
    replica_count             = 1
    storage                   = "8Gi"
    enable_kube_state_metrics = true
    enable_node_exporter      = true
    retention_period          = "30d"

    grafana_config = {
      replica_count       = 1
      ingress_enabled     = true
      ingress_host        = "grafana.arc-xx.xx"
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      lb_visibility       = "internet-facing"
      dashboard_list = [
        {
          name = "node-metrics"
          json = templatefile("${path.module}/grafana-dashbord.json", {})
        }
      ]
    }

    blackbox_exporter_config = {
      name = "blackbox-exporter"
      monitoring_targets = [{
        name                     = "google"
        url                      = "https://google.com"
        scrape_interval          = "60s"
        status_code_pattern_list = "[http_2xx]" # Note: this is a string, not a list
      }]
    }

    alertmanager_config = {
      name          = "alertmanager"
      replica_count = 1
      custom_alerts = ""
    }
  }
}
```
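Custom alerting rules can be supplied to Alertmanager as a YAML string via the `custom_alerts` attribute of `alertmanager_config` (see the `prometheus_config` input below). A hedged sketch using a heredoc — the rule name, expression, and threshold are illustrative:

```hcl
# Illustrative: supplying Prometheus alerting rules as a YAML string.
alertmanager_config = {
  name          = "alertmanager"
  replica_count = 1
  custom_alerts = <<-EOT
    groups:
      - name: node-alerts
        rules:
          - alert: HighNodeCPU # illustrative rule; expression assumes node-exporter metrics
            expr: 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Cluster CPU above 90% for 10 minutes"
  EOT
}
```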

## Requirements

| Name | Version |
|------|---------|
| [terraform](#requirement\_terraform) | >= 1.4, < 2.0.0 |
| [aws](#requirement\_aws) | >= 4.0, < 6.0 |
| [helm](#requirement\_helm) | 2.17.0 |
| [random](#requirement\_random) | ~> 3.6.0 |
| [tls](#requirement\_tls) | ~> 4.0.6 |

## Providers

No providers.

## Modules

| Name | Source | Version |
|------|--------|---------|
| [elasticsearch](#module\_elasticsearch) | ./modules/elasticsearch | n/a |
| [fluentbit](#module\_fluentbit) | ./modules/fluent-bit | n/a |
| [fluentd](#module\_fluentd) | ./modules/fluentd | n/a |
| [jaeger](#module\_jaeger) | ./modules/jaeger | n/a |
| [prometheus](#module\_prometheus) | ./modules/prometheus | n/a |
| [signoz](#module\_signoz) | ./modules/signoz | n/a |
| [signoz\_metrics\_logs](#module\_signoz\_metrics\_logs) | ./modules/signoz-infra | n/a |

## Resources

No resources.

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| [elasticsearch\_config](#input\_elasticsearch\_config) | Configuration settings for deploying Elasticsearch |

object({
name = optional(string, "elasticsearch-master") # Name of the Elasticsearch cluster
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, true)
})

tls_self_signed_cert_data = optional(object({ # Self-signed TLS certificate data
organisation = optional(string, null) # Organisation name for certificate
validity_period_hours = optional(number, 26280) # 3 years validity
early_renewal_hours = optional(number, 168) # 1 week early renewal
}))

cluster_config = object({
port = optional(string, "9200") # Elasticsearch HTTP port
transport_port = optional(string, "9300") # Elasticsearch transport port
user = optional(string, "elastic") # Elasticsearch username
log_level = optional(string, "INFO") # Log level (DEBUG, INFO, WARN, ERROR)
cpu_limit = optional(string, "2000m") # CPU limit for the Elasticsearch container
memory_limit = optional(string, "4Gi") # Memory limit for the Elasticsearch container
cpu_request = optional(string, "1000m") # CPU request for the Elasticsearch container
memory_request = optional(string, "2Gi") # Memory request for the Elasticsearch container
storage_class = optional(string, "gp2")
storage = optional(string, "30Gi") # Persistent volume storage for Elasticsearch
replica_count = optional(string, 3)
})

kibana_config = object({
name = optional(string, "kibana")
replica_count = optional(string, 3)
http_port = optional(string, "5601")
user = optional(string, "elastic")
log_level = optional(string, "info") // Options: all, fatal, error, warn, info, debug, trace, off
cpu_limit = optional(string, "500m")
memory_limit = optional(string, "1Gi")
cpu_request = optional(string, "250m")
memory_request = optional(string, "500Mi")
ingress_enabled = optional(bool, false)
ingress_host = optional(string, "")
aws_certificate_arn = optional(string, "")
lb_visibility = optional(string, "internet-facing")
})
})
|
{
"cluster_config": {
"cpu_limit": "2000m",
"cpu_request": "1000m",
"log_level": "INFO",
"memory_limit": "4Gi",
"memory_request": "2Gi",
"port": "9200",
"replica_count": 3,
"storage": "30Gi",
"transport_port": "9300",
"user": "elastic"
},
"k8s_namespace": {
"create": true,
"name": "logging"
},
"kibana_config": {
"cpu_limit": "500m",
"cpu_request": "250m",
"elasticsearch_url": "https://elasticsearch-master:9200",
"http_port": "5601",
"ingress_enabled": false,
"ingress_host": "",
"log_level": "info",
"memory_limit": "1Gi",
"memory_request": "500Mi",
"name": "kibana",
"user": "elastic"
},
"name": "elasticsearch-master",
"tls_self_signed_cert_data": {
"early_renewal_hours": 168,
"organisation": null,
"validity_period_hours": 26280
}
}
| no |
| [environment](#input\_environment) | Environment name | `string` | n/a | yes |
| [fluentbit\_config](#input\_fluentbit\_config) | Configuration for Fluentbit |
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluent-bit")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
time_format = optional(string, "%Y-%m-%dT%H:%M:%S.%L")
log_level = optional(string, "info") # Default log level
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
|
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluent-bit",
"search_engine": "elasticsearch"
}
| no |
| [fluentd\_config](#input\_fluentd\_config) | Configuration for Fluentd |
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluentd")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
log_level = optional(string, "info") # Default log level
opensearch_url = optional(string, "")
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
|
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluentd",
"search_engine": "elasticsearch"
}
| no |
| [log\_aggregator](#input\_log\_aggregator) | (optional) Log aggregator to choose | `string` | `null` | no |
| [metrics\_monitoring\_system](#input\_metrics\_monitoring\_system) | Monitoring system for metrics | `string` | `null` | no |
| [namespace](#input\_namespace) | Namespace for the resources. | `string` | n/a | yes |
| [prometheus\_config](#input\_prometheus\_config) | Configuration settings for deploying Prometheus |
object({
name = optional(string, "prometheus")
k8s_namespace = object({
name = optional(string, "metrics")
create = optional(bool, true)
})
log_level = optional(string, "info")
replica_count = optional(number, 1)
storage = optional(string, "8Gi")
storage_class = optional(string, "gp2")
enable_kube_state_metrics = optional(bool, true)
enable_node_exporter = optional(bool, true)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
retention_period = optional(string, "15d")

grafana_config = object({
name = optional(string, "grafana")
replica_count = optional(number, 1)
ingress_enabled = optional(bool, false)
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
aws_certificate_arn = optional(string, "")
ingress_host = optional(string, "")
admin_user = optional(string, "admin")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
dashboard_list = optional(list(object({
name = string
json = string
})), [])
})

blackbox_exporter_config = object({
name = optional(string, "blackbox-exporter")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "500Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "50Mi")
monitoring_targets = list(object({
name = string # Target name (e.g., google)
url = string # URL to monitor (e.g., https://google.com)
scrape_interval = optional(string, "60s") # Scrape interval (e.g., 60s)
scrape_timeout = optional(string, "60s") # Scrape timeout (e.g., 60s)
status_code_pattern_list = optional(string, "[http_2xx]") # Blackbox module to use (e.g., http_2xx)
}))
})

alertmanager_config = object({
name = optional(string, "alertmanager")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "10m")
memory_request = optional(string, "32Mi")
custom_alerts = optional(string, "")
alert_notification_settings = optional(string, "")
})
})
|
{
"alertmanager_config": {
"name": "alertmanager"
},
"blackbox_exporter_config": {
"monitoring_targets": [],
"name": "blackbox-exporter"
},
"enable_kube_state_metrics": true,
"enable_node_exporter": true,
"grafana_config": {
"admin_user": "admin",
"ingress_enabled": false,
"lb_visibility": "internet-facing",
"prometheus_endpoint": "prometheus"
},
"k8s_namespace": {
"create": true,
"name": "metrics"
},
"log_level": "info",
"replica_count": 1,
"resources": {
"cpu_limit": "100m",
"cpu_request": "100m",
"memory_limit": "512Mi",
"memory_request": "128Mi"
},
"retention_period": "15d",
"storage": "8Gi"
}
| no |
| [search\_engine](#input\_search\_engine) | (optional) Search engine for logs | `string` | `null` | no |
| [signoz\_config](#input\_signoz\_config) | Configuration for observability components in the monitoring stack. This variable encapsulates
settings for the following components:

- ClickHouse:
Used as the backend storage engine for observability data (like traces and metrics).
Includes credentials and resource limits/requests for tuning performance.

- SigNoz:
Provides the UI and analytics for monitoring and tracing applications.
Includes ingress setup and compute resource configuration.

- Alertmanager:
Handles alerting rules and notifications for monitoring data.
Includes configuration for storage, scaling, and ingress settings.

- OTEL Collector:
Collects telemetry data (logs, metrics, traces) from the applications and
routes it to appropriate backends.
Includes resource definitions and optional ingress configuration.

This structure enables centralized management of observability stack deployment in Kubernetes
via Terraform. |
object({
k8s_namespace = object({
name = optional(string, "signoz")
create = optional(bool, false)
})
name = optional(string, "signoz")
storage_class = optional(string, "gp3")
cluster_name = string
clickhouse = optional(object({
user = optional(string, "admin")
cpu_limit = optional(string, "2000m")
memory_limit = optional(string, "4Gi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "20Gi")
}))

signoz_bin = optional(object({
replica_count = optional(number, 1)
cpu_limit = optional(string, "750m")
memory_limit = optional(string, "1000Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
ingress_enabled = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = string
root_domain = optional(string, null) // if root domain is provided, it creates DNS record
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
}))

alertmanager = optional(object({
enable = optional(bool, false)
replica_count = optional(number, 1)
cpu_limit = optional(string, "750m")
memory_limit = optional(string, "1000Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "100Mi")
enable_ingress = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = optional(string, "signoz.example.com")
}))

otel_collector = optional(object({
cpu_limit = optional(string, "1")
memory_limit = optional(string, "2Gi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "100Mi")
enable_ingress = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = optional(string, "signoz.example.com")
}))
})
|
{
"cluster_name": null,
"k8s_namespace": {
"create": true,
"name": "signoz"
},
"name": null
}
| no |
| [signoz\_infra\_monitor\_config](#input\_signoz\_infra\_monitor\_config) | Configuration object for deploying SigNoz infrastructure monitoring components.

Attributes:
- name: A name identifier for the monitoring deployment (used in naming resources).
- storage\_class: (Optional) The Kubernetes storage class to be used for persistent volumes. Defaults to "gp3".
- cluster\_name: The name of the Kubernetes cluster where SigNoz is being deployed.
- otel\_collector\_endpoint: The endpoint URL for the OpenTelemetry Collector to which metrics, logs, and traces will be exported.
- metric\_collection\_interval: (Optional) The interval at which metrics are collected. Defaults to "30s".
- If either of enable\_log\_collection or enable\_metrics\_collection is true, the Helm chart is installed.

This variable is used to centralize configuration related to monitoring infrastructure via SigNoz. |
object({
k8s_namespace = optional(object({
name = optional(string, "signoz")
create = optional(bool, false)
}))
name = string
storage_class = optional(string, "gp3")
cluster_name = string
enable_log_collection = optional(bool, false)
enable_metrics_collection = optional(bool, false)
otel_collector_endpoint = optional(string, null)
metric_collection_interval = optional(string, "30s")
})
|
{
"cluster_name": null,
"name": null
}
| no |
| [tags](#input\_tags) | (optional) Tags for AWS resources | `map(string)` | `{}` | no |
| [tracing\_stack](#input\_tracing\_stack) | (optional) Distributed tracing stack | `string` | `null` | no |

## Outputs

| Name | Description |
|------|-------------|
| [grafana\_lb\_dns](#output\_grafana\_lb\_dns) | Grafana ingress loadbalancer DNS |
| [kibana\_lb\_dns](#output\_kibana\_lb\_dns) | Kibana ingress loadbalancer DNS |
| [otel\_collector\_endpoint](#output\_otel\_collector\_endpoint) | OTEL collector endpoint |
| [signoz\_lb\_dns](#output\_signoz\_lb\_dns) | Signoz ingress loadbalancer DNS |
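
The outputs can be wired into the rest of your root module, for example to surface the Grafana endpoint (this assumes a module instance named `prometheus`, as in the Usage example above):

```hcl
# Illustrative: re-exporting a module output from your root module.
output "grafana_url" {
  description = "Grafana ingress load balancer DNS"
  value       = module.prometheus.grafana_lb_dns
}
```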

## Development

### Prerequisites
- [terraform](https://learn.hashicorp.com/terraform/getting-started/install#installing-terraform)
- [terraform-docs](https://github.com/segmentio/terraform-docs)
- [pre-commit](https://pre-commit.com/#install)
- [golang](https://golang.org/doc/install#install)
- [golint](https://github.com/golang/lint#installation)

### Configurations
- Configure pre-commit hooks
```sh
pre-commit install
```
- Configure golang deps for tests
```sh
go get github.com/gruntwork-io/terratest/modules/terraform
go get github.com/stretchr/testify/assert
```
### Git commits

When contributing, indicate the type of version bump in your commit message: `#major`, `#minor`, or `#patch`.

For example:

```sh
git commit -m "your commit message #major"
```

The tag determines how the release version is bumped. If you omit it, the bump defaults to patch.

### Tests
- Tests are available in `test` directory
- In the test directory, run the below command
```sh
go test -timeout 1800s
```

## Authors
This project is authored by:
- SourceFuse