{"id":25015057,"url":"https://github.com/daviaraujocc/lgtm-stack","last_synced_at":"2025-04-24T03:03:32.538Z","repository":{"id":277927960,"uuid":"929095472","full_name":"daviaraujocc/lgtm-stack","owner":"daviaraujocc","description":"A step-by-step to install LGTM Stack + Opentelemetry Collector","archived":false,"fork":false,"pushed_at":"2025-03-12T17:44:22.000Z","size":350,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-24T03:03:20.915Z","etag":null,"topics":["gcp","grafana","helm","kubernetes","lgtm","mimir","opentelemetry","tempo"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daviaraujocc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-07T19:53:38.000Z","updated_at":"2025-04-23T09:13:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"bdf42552-5e3e-4012-a0b8-b2b3ac2db6dc","html_url":"https://github.com/daviaraujocc/lgtm-stack","commit_stats":null,"previous_names":["daviaraujocc/lgtm-stack"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daviaraujocc%2Flgtm-stack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daviaraujocc%2Flgtm-stack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daviaraujocc%2Flgtm-stack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daviaraujocc%2Flgtm-stack/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daviaraujocc","download_url":"https://codeload.github.com/daviaraujocc/lgtm-stack/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250552073,"owners_count":21449164,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gcp","grafana","helm","kubernetes","lgtm","mimir","opentelemetry","tempo"],"created_at":"2025-02-05T08:17:00.064Z","updated_at":"2025-04-24T03:03:32.518Z","avatar_url":"https://github.com/daviaraujocc.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cbr\u003e\r\n\r\n\u003cdiv align=\"center\"\u003e\r\n    \u003ca href=\"README.md\"\u003e🇺🇸 English\u003c/a\u003e | \u003ca href=\"README.pt-br.md\"\u003e🇧🇷 Português (Brasil)\u003c/a\u003e\r\n\u003c/div\u003e\r\n\u003cbr\u003e\r\n\r\n# 🔍 LGTM Stack for Kubernetes\r\n\r\n## Introduction\r\n\r\nThe LGTM stack, by Grafana Labs, combines best-in-class open-source tools to provide comprehensive system visibility, consisting of:\r\n\r\n- **Loki**: Log aggregation system https://grafana.com/oss/loki/\r\n- **Grafana**: Interface \u0026 Dashboards https://grafana.com/oss/grafana/\r\n- **Tempo**: Distributed tracing storage and management https://grafana.com/oss/tempo/\r\n- **Mimir**: Long-term metrics storage for Prometheus https://grafana.com/oss/mimir/\r\n\r\n\r\nWith this stack, we have a complete observability solution that covers logs, metrics, and traces, with support for high availability and scalability, plus all data will be present in a 
single location (Grafana), making it easier to analyze and correlate events. Because it uses object storage as a backend, the solution is also much more economical than alternatives that require dedicated databases or persistent disks.\r\n\r\n## Table of Contents\r\n\u003cdetails\u003e\r\n\u003csummary\u003eClick to expand\u003c/summary\u003e\r\n\r\n- [Architecture](#architecture)\r\n  - [Hardware Requirements](#hardware-requirements)\r\n- [Getting Started](#-getting-started)\r\n  - [Prerequisites](#-prerequisites)\r\n  - [Installation](#installation)\r\n    - [Option 1: Makefile](#option-1-makefile)\r\n    - [Option 2: Manual Installation](#option-2-manual-installation)\r\n      - [Setup](#setup)\r\n      - [Choose Your Environment](#choose-your-environment)\r\n        - [Local](#local-k3s-minikube)\r\n        - [GCP Production Setup](#gcp-production-setup)\r\n- [Install Dependencies](#install-dependencies)\r\n- [Testing](#testing)\r\n  - [Access Grafana](#access-grafana)\r\n  - [Sending Data](#sending-data)\r\n    - [Loki (Logs)](#loki-logs)\r\n    - [Tempo (Traces)](#tempo-traces)\r\n    - [Mimir (Metrics)](#mimir-metrics)\r\n- [OpenTelemetry](#opentelemetry)\r\n  - [OpenTelemetry Collector](#opentelemetry-collector)\r\n  - [Flask App Integration](#flask-app-integration)\r\n  - [Testing the Integration](#testing-the-integration)\r\n  - [Extra Configuration](#extra-configuration)\r\n    - [Loki Labels Customization](#loki-labels-customization)\r\n- [Uninstall](#uninstall)\r\n\u003c/details\u003e\r\n\r\n## Architecture\r\n\r\n![LGTM Architecture](./assets/images/lgtm.jpg)\r\n\r\nThe architecture of the LGTM stack in a Kubernetes environment follows a well-defined flow of data collection, processing, and visualization:\r\n\r\n1. Applications send telemetry data to an agent, in this case, the OpenTelemetry Collector.\r\n\r\n2. 
OpenTelemetry Collector acts as a central hub, routing each type of data to its specific backend:\r\n* Loki: for log processing\r\n* Mimir: for metrics storage\r\n* Tempo: for trace analysis\r\n3. Data is stored in object storage, with dedicated buckets for each tool.\r\n\r\n4. Grafana is the interface where all data is queried, allowing for unified dashboards and alerts.\r\n\r\nThe architecture also includes three optional components:\r\n- Prometheus: collects custom metrics from apps and the cluster and sends them to Mimir\r\n- Kube-state-metrics: collects metrics (CPU/Memory) of services/apps through the API server and exposes them to Prometheus\r\n- Promtail: agent that captures container logs and sends them to Loki\r\n\r\n### Hardware Requirements\r\n\r\nLocal:\r\n- 2-4 CPUs\r\n- 8 GB RAM\r\n\r\nProduction setup:\r\n- Requirements vary widely with the amount of data and traffic, so start with a small setup and scale as needed. For small to mid-sized environments, the following minimum is recommended:\r\n  - 8 CPUs\r\n  - 24 GB RAM\r\n  - 100 GB disk space (SSD, not counting storage backends)\r\n\r\n\r\n## 🚀 Getting Started\r\n\r\n### ✨ Prerequisites\r\n- [Helm v3+](https://helm.sh/docs/intro/install/)\r\n- [kubectl](https://kubernetes.io/docs/tasks/tools/)\r\n  - For local testing: [k3s](https://k3s.io/) or [minikube](https://minikube.sigs.k8s.io/docs/start/) Kubernetes cluster configured\r\n- For GCP: [gcloud CLI](https://cloud.google.com/sdk/docs/install)\r\n\r\n\u003e **Note**: This guide uses the official [lgtm-distributed](https://artifacthub.io/packages/helm/grafana/lgtm-distributed) Helm chart from Grafana Labs for deployment.\r\n\r\n### Installation\r\n\r\n### Option 1: Makefile\r\n\r\nTo simplify the installation process, you can use the Makefile commands:\r\n\r\n```bash\r\n# Clone repository\r\ngit clone git@github.com:daviaraujocc/lgtm-stack.git\r\ncd lgtm-stack\r\nmake install-local # For local testing; to use GCP cloud storage, run make install-gcp 
and set PROJECT_ID\r\n```\r\n\r\nThis installs the LGTM stack with the default local configuration, along with its dependencies (Promtail, dashboards, Prometheus, MinIO). To customize the installation, edit the `helm/values-lgtm.local.yaml` file.\r\n\r\n### Option 2: Manual Installation\r\n\r\n### Setup\r\n```bash\r\n# Clone repository\r\ngit clone git@github.com:daviaraujocc/lgtm-stack.git\r\ncd lgtm-stack\r\n\r\n# Add repositories \u0026 create namespace\r\nhelm repo add prometheus-community https://prometheus-community.github.io/helm-charts\r\nhelm repo add grafana https://grafana.github.io/helm-charts\r\nhelm repo update\r\nkubectl create ns monitoring\r\n\r\n# Install prometheus operator for metrics collection and CRDs\r\nhelm install prometheus-operator --version 66.3.1 -n monitoring \\\r\n  prometheus-community/kube-prometheus-stack -f helm/values-prometheus.yaml\r\n```\r\n\r\n### Choose Your Environment\r\n\r\n#### Local (k3s, minikube)\r\n\r\nFor local testing scenarios. Uses local storage via MinIO.\r\n\r\n```bash\r\nhelm install lgtm --version 2.1.0 -n monitoring \\\r\n  grafana/lgtm-distributed -f helm/values-lgtm.local.yaml\r\n```\r\n\r\n#### GCP Production Setup\r\n\r\nFor production environments, using GCP resources for storage and monitoring.\r\n\r\n1. 
Set up GCP resources:\r\n\r\n```bash\r\n# Set your project ID\r\nexport PROJECT_ID=your-project-id\r\n\r\n# Create buckets with random suffix\r\nexport BUCKET_SUFFIX=$(openssl rand -hex 4 | tr -d \"\\n\")\r\nfor bucket in logs traces metrics metrics-admin; do\r\n  gsutil mb -p ${PROJECT_ID} -c standard -l us-east1 gs://lgtm-${bucket}-${BUCKET_SUFFIX}\r\ndone\r\n\r\n# Update bucket names in config\r\nsed -i -E \"s/(bucket_name:\\s*lgtm-[^[:space:]]+)/\\1-${BUCKET_SUFFIX}/g\" helm/values-lgtm.gcp.yaml\r\n\r\n# Create and configure service account\r\ngcloud iam service-accounts create lgtm-monitoring \\\r\n    --display-name \"LGTM Monitoring\" \\\r\n    --project ${PROJECT_ID}\r\n\r\n# Set permissions\r\nfor bucket in logs traces metrics metrics-admin; do \r\n  gsutil iam ch serviceAccount:lgtm-monitoring@${PROJECT_ID}.iam.gserviceaccount.com:admin \\\r\n    gs://lgtm-${bucket}-${BUCKET_SUFFIX}\r\ndone\r\n\r\n# Create service account key and secret\r\ngcloud iam service-accounts keys create key.json \\\r\n    --iam-account lgtm-monitoring@${PROJECT_ID}.iam.gserviceaccount.com\r\nkubectl create secret generic lgtm-sa --from-file=key.json -n monitoring\r\n```\r\n\r\n2. 
Install LGTM stack:\r\n\r\n\r\nYou can change the values in `helm/values-lgtm.gcp.yaml` to fit your environment, for example to configure an ingress for Grafana.\r\n\r\n```bash\r\nhelm install lgtm --version 2.1.0 -n monitoring \\\r\n  grafana/lgtm-distributed -f helm/values-lgtm.gcp.yaml\r\n```\r\n\r\n## Install Dependencies\r\n\r\n\r\n```bash\r\n# Install Promtail for collecting container logs\r\n# Check if you are using Docker or CRI-O runtime\r\n## Docker runtime\r\nkubectl apply -f manifests/promtail.docker.yaml\r\n## CRI-O runtime\r\n## kubectl apply -f manifests/promtail.cri.yaml\r\n```\r\n\r\n\r\n## Testing\r\n\r\nAfter installation, check the components by running:\r\n\r\n```bash\r\n# Check if all pods are running\r\nkubectl get pods -n monitoring\r\n\r\n# To check logs\r\n\r\n# Loki\r\nkubectl logs -l app.kubernetes.io/name=loki -n monitoring\r\n\r\n# Tempo\r\nkubectl logs -l app.kubernetes.io/name=tempo -n monitoring\r\n\r\n# Mimir\r\nkubectl logs -l app.kubernetes.io/name=mimir -n monitoring\r\n```\r\n\r\nFollow the steps below to test each component:\r\n\r\n### Access Grafana\r\n```bash\r\n# Access dashboard (port-forward blocks, so keep it running in its own terminal)\r\nkubectl port-forward svc/lgtm-grafana 3000:80 -n monitoring\r\n\r\n# Get the admin password\r\nkubectl get secret --namespace monitoring lgtm-grafana -o jsonpath=\"{.data.admin-password}\" | base64 --decode\r\n```\r\n- Default username: `admin`\r\n- Access URL: http://localhost:3000\r\n- Check the default Grafana dashboards and the Explore tab\r\n\r\n### Sending Data\r\n\r\nAfter installation, verify each component is working correctly:\r\n\r\n#### Loki (Logs)\r\nTest log ingestion and querying:\r\n\r\n```bash\r\n# Forward Loki port (run in a separate terminal)\r\nkubectl port-forward svc/lgtm-loki-distributor 3100:3100 -n monitoring\r\n\r\n# Send test log with timestamp and labels\r\ncurl -XPOST http://localhost:3100/loki/api/v1/push -H \"Content-Type: application/json\" -d '{\r\n  \"streams\": [{\r\n    \"stream\": { \"app\": \"test\", \"level\": \"info\" },\r\n    \"values\": [[ 
\"'$(date +%s)000000000'\", \"Test log message\" ]]\r\n  }]\r\n}'\r\n```\r\n\r\nTo verify:\r\n1. Open Grafana (http://localhost:3000)\r\n2. Go to Explore \u003e Select Loki datasource\r\n3. Query using labels: `{app=\"test\", level=\"info\"}`\r\n4. You should see your test message in the results\r\n\r\n\r\nIf you have installed promtail you can check the container logs also on Explore tab.\r\n\r\n#### Tempo (Traces)\r\n\r\nSince Tempo is compatible with the OpenTelemetry OTLP protocol, we will use the Jaeger Trace Generator, a tool that generates example traces and sends the data using OTLP.\r\n\r\n```bash\r\n# Forward Tempo port\r\nkubectl port-forward svc/lgtm-tempo-distributor 4318:4318 -n monitoring\r\n\r\n# Generate sample traces with service name 'test'\r\ndocker run --add-host=host.docker.internal:host-gateway --env=OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318 jaegertracing/jaeger-tracegen -service test -traces 10\r\n```\r\n\r\nTo verify:\r\n1. Go to Explore \u003e Select Tempo datasource\r\n2. Search by Service Name: 'test'\r\n3. You should see 10 traces with different spans\r\n\r\n#### Mimir (Metrics)\r\n\r\nSince we have a Prometheus instance running inside the cluster sending basic metrics (CPU/Memory) to Mimir, you can already check the metrics in Grafana:\r\n\r\n1. Access Grafana\r\n2. Go to Explore \u003e Select Mimir datasource\r\n3. Try these example queries:\r\n   - `rate(container_cpu_usage_seconds_total[5m])` - CPU usage\r\n   - `container_memory_usage_bytes` - Container memory usage\r\n\r\nYou can also push custom metrics to Mimir using Prometheus Pushgateway, to endpoint `http://lgtm-mimir-nginx.monitoring:80/api/v1/push`.\r\n\r\n\r\n## OpenTelemetry\r\n\r\nOpenTelemetry is a set of APIs, libraries, agents, and instrumentation to provide observability for cloud-native software. 
It consists of three main components:\r\n\r\n- **OpenTelemetry SDK**: Libraries for instrumenting applications to collect telemetry data (traces, metrics, logs).\r\n- **OpenTelemetry Collector**: A vendor-agnostic agent that collects, processes, and exports telemetry data to backends.\r\n- **OpenTelemetry Protocol (OTLP)**: A standard for telemetry data exchange between applications and backends.\r\n\r\nIn this setup, we will use the OpenTelemetry Collector to route telemetry data to the appropriate backends (Loki, Tempo, Mimir).\r\n\r\n### OpenTelemetry Collector\r\n\r\nTo install the OpenTelemetry Collector:\r\n\r\n```bash\r\n# Install OpenTelemetry Collector\r\nkubectl apply -f manifests/otel-collector.yaml\r\n```\r\n\r\nCheck if the collector is up and running:\r\n\r\n```bash\r\nkubectl get pods -l app=otel-collector -n monitoring\r\nkubectl logs -l app=otel-collector -n monitoring\r\n```\r\n\r\n### Flask App Integration\r\n\r\nWe'll use a pre-instrumented Flask application (source code at `flask-app/`) that generates traces, metrics, and logs using OpenTelemetry.\r\n\r\nThe application exposes an endpoint `/random` that returns random numbers and generates telemetry data. By default, telemetry data is sent to `http://otel-collector:4318`.\r\n\r\n\r\n1. Deploy the sample application:\r\n```bash\r\n# Deploy sample app\r\nkubectl apply -f manifests/app/flask-app.yaml\r\n```\r\n\r\n2. Verify application deployment:\r\n```bash\r\nkubectl get pods -l app=flask-app -n monitoring\r\nkubectl get svc flask-app-service -n monitoring\r\n```\r\n\r\n3. Apply PodMonitor for metrics scraping:\r\n```bash\r\nkubectl apply -f manifests/app/podmonitor.yaml\r\n```\r\n\r\n### Testing the Integration\r\n\r\n1. 
Generate traffic to the application:\r\n```bash\r\n# Port-forward the application (run in a separate terminal, since the command blocks)\r\nkubectl port-forward svc/flask-app 8000:8000 -n monitoring\r\n\r\n# Send requests to generate telemetry data\r\nfor i in {1..50}; do\r\n  curl http://localhost:8000/random\r\n  sleep 0.5\r\ndone\r\n```\r\n\r\n2. Check the generated telemetry data in Grafana:\r\n\r\n**Traces (Tempo):**\r\n\r\n1. Go to Explore \u003e Select Tempo datasource\r\n\r\n2. Search for Service Name: flask-app\r\n\r\n3. You should see traces with GET /random operations\r\n\r\n**Metrics (Mimir):**\r\n\r\n1. Go to Explore \u003e Select Mimir datasource\r\n\r\n2. Try this query:\r\n```promql\r\n# Per-second request rate, averaged over the last 5 minutes\r\nrate(request_count_total[5m])\r\n```\r\n\r\n**Logs (Loki):**\r\n\r\n1. Go to Explore \u003e Select Loki datasource\r\n\r\n2. Query using labels:\r\n\r\n```logql\r\n{job=\"flask-app\"}\r\n```\r\nYou should see structured logs from the application.\r\n\r\n#### Extra Configuration\r\n\r\n##### Loki Labels Customization\r\n\r\nTo add new labels to logs in Loki through the OpenTelemetry Collector, apply the following configuration:\r\n\r\n1. Edit the ConfigMap `otel-collector-config`\r\n2. Locate the `processors.attributes/loki` section\r\n3. 
Add your custom labels to the `loki.attribute.labels` list:\r\n\r\n```yaml\r\nprocessors:\r\n  attributes/loki:\r\n    actions:\r\n      - action: insert\r\n        key: loki.format\r\n        value: raw\r\n      - action: insert\r\n        key: loki.attribute.labels\r\n        value: facility, level, source, host, app, namespace, pod, container, job, your_label\r\n```\r\n\r\n\u003e After modifying the ConfigMap, restart the collector pods to apply the changes:\r\n\u003e ```bash\r\n\u003e kubectl rollout restart daemonset/otel-collector -n monitoring\r\n\u003e ```\r\n\r\n## Uninstall\r\n\r\n```bash\r\n# Using Makefile\r\nmake uninstall\r\n\r\n# Or manually:\r\n\r\n# Remove LGTM stack\r\nhelm uninstall lgtm -n monitoring\r\n\r\n# Remove prometheus operator\r\nhelm uninstall prometheus-operator -n monitoring\r\n\r\n# Remove promtail \u0026 otel-collector (before deleting the namespace they live in)\r\n# Use the same Promtail manifest you applied: promtail.docker.yaml or promtail.cri.yaml\r\nkubectl delete -f manifests/promtail.docker.yaml\r\nkubectl delete -f manifests/otel-collector.yaml\r\n\r\n# Remove namespace\r\nkubectl delete ns monitoring\r\n\r\n# For GCP setup, clean up (PROJECT_ID and BUCKET_SUFFIX must still be set):\r\nfor bucket in logs traces metrics metrics-admin; do\r\n  gsutil rm -r gs://lgtm-${bucket}-${BUCKET_SUFFIX}\r\ndone\r\n\r\ngcloud iam service-accounts delete lgtm-monitoring@${PROJECT_ID}.iam.gserviceaccount.com\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaviaraujocc%2Flgtm-stack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaviaraujocc%2Flgtm-stack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaviaraujocc%2Flgtm-stack/lists"}