{"id":44029568,"url":"https://github.com/ikramhasan/lgtm-stack","last_synced_at":"2026-02-07T18:39:00.874Z","repository":{"id":329728518,"uuid":"1120268792","full_name":"ikramhasan/lgtm-stack","owner":"ikramhasan","description":"An observability stack based on OpenTelemetry (OTel) and the Grafana \"LGTM\" suite, with an example app provided to demonstrate orchastration.","archived":false,"fork":false,"pushed_at":"2025-12-21T12:21:11.000Z","size":42,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-23T03:43:17.572Z","etag":null,"topics":["fastapi","grafana","grafana-dashboard","lgtm","lgtm-stack","loki","mimir","minio","observability","python","tempo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ikramhasan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-20T20:56:03.000Z","updated_at":"2025-12-21T12:00:51.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ikramhasan/lgtm-stack","commit_stats":null,"previous_names":["ikramhasan/lgtm-stack"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ikramhasan/lgtm-stack","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikramhasan%2Flgtm-stack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikramhasan%2Flgtm-stack/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikramhasan%2Flgtm-stack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikramhasan%2Flgtm-stack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ikramhasan","download_url":"https://codeload.github.com/ikramhasan/lgtm-stack/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikramhasan%2Flgtm-stack/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29203792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T17:44:10.191Z","status":"ssl_error","status_checked_at":"2026-02-07T17:44:07.936Z","response_time":63,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","grafana","grafana-dashboard","lgtm","lgtm-stack","loki","mimir","minio","observability","python","tempo"],"created_at":"2026-02-07T18:39:00.119Z","updated_at":"2026-02-07T18:39:00.868Z","avatar_url":"https://github.com/ikramhasan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LGTM Stack POC (Loki, Grafana, Tempo, Mimir)\n\nAn observability stack based on OpenTelemetry (OTel) and the Grafana \"LGTM\" suite.\n\n## Architecture\n\n- **Loki**: Log aggregation.\n- **Grafana**: Visualization and dashboards.\n- 
**Tempo**: Distributed tracing.\n- **Mimir**: Scalable long-term storage for Prometheus metrics.\n- **OTel Collector**: Central gateway for receiving and routing telemetry data.\n- **MinIO/S3**: Object storage backend for long-term data retention.\n\n## Quick Start\n\n### 1. Prerequisites\n- Docker and Docker Compose.\n- [uv](https://github.com/astral-sh/uv) (for running the example app).\n\n### 2. Environment Setup\nCopy the example environment file and adjust if necessary:\n```bash\ncp .env.example .env\n```\n\n### 2.1 MinIO Setup (Optional)\n\nFollow the MinIO setup instructions below if you want to use MinIO for local development.\n\n### 3. Start the Stack\n```bash\ndocker-compose up -d\n```\nThis starts Loki, Tempo, Mimir, Grafana, the OTel Collector, and a local MinIO instance.\n\n### 4. Run the Example Application\n```bash\ncd example/fastapi-app\nuv sync\nuv run python main.py\n```\nGenerate some telemetry by visiting `http://localhost:8000/process`.\n\n## Configuration: MinIO vs. AWS S3\n\nThe stack is currently configured to use **MinIO** for local development.\n\n### Using Local MinIO (Default)\n\nIn your `.env` file:\n```env\nS3_ENDPOINT=host.docker.internal:9000\nS3_INSECURE=true\nS3_FORCE_PATH_STYLE=true\nAWS_ACCESS_KEY_ID=minioadmin\nAWS_SECRET_ACCESS_KEY=minioadmin\n```\n\nRun `docker compose up -d` from `example/minio` to start the MinIO instance.\n\nThe MinIO instance then runs at `http://localhost:9000` with the default credentials `minioadmin/minioadmin`.\n\nGo to the MinIO dashboard and create the buckets `loki-logs`, `tempo-traces`, and `mimir-metrics`.\n\n### Using AWS S3\nTo switch to production AWS S3:\n1. Update `.env`:\n   - `S3_ENDPOINT`: `s3.us-east-1.amazonaws.com` (or your region's endpoint).\n   - `S3_INSECURE`: `false`.\n   - `S3_FORCE_PATH_STYLE`: `false`.\n   - `AWS_ACCESS_KEY_ID` \u0026 `AWS_SECRET_ACCESS_KEY`: Your AWS credentials.\n2. 
Ensure the buckets (`loki-logs`, `tempo-traces`, `mimir-metrics`) exist in your AWS account or update the bucket name variables in `.env`.\n\n### Mimir Metrics \u0026 Dashboards\n\nUse the following table to set up your primary observability dashboard. These metrics are exported by the FastAPI application.\n\n| Panel Name | Visualization | Query (PromQL) | Description |\n| :--- | :--- | :--- | :--- |\n| **Total Request Rate** | Time series | `sum(rate(http_requests_total[$__rate_interval])) by (http_target)` | Real-time traffic per endpoint (Requests/sec). |\n| **Error Rate (%)** | Stat | `100 * sum(rate(http_errors_total[$__range])) / sum(rate(http_requests_total[$__range]))` | Percentage of requests resulting in 4xx/5xx errors over the selected time range. |\n| **P95 Latency** | Time series | `histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[$__rate_interval])) by (le))` | 95th percentile response time across all endpoints. |\n| **Active Requests** | Gauge | `sum(http_server_active_requests)` | Number of concurrent requests being processed. |\n| **Errors by Endpoint** | Bar chart | `sum(increase(http_errors_total[$__range])) by (http_target)` | Total errors grouped by path over the selected time range. |\n| **Top 5 Slowest Paths** | Table | `topk(5, sum(rate(http_request_duration_seconds_sum[$__range])) by (http_target) / sum(rate(http_request_duration_seconds_count[$__range])) by (http_target))` | The five endpoints with the highest average latency. |\n\n#### How to Add a Panel\n1. Click **+ Add** in the top right of your dashboard -\u003e **Visualization**.\n2. Select **Mimir** as the data source.\n3. Paste the **Query** from the table above.\n4. Set the **Title** to the Panel Name.\n5. Select the **Visualization** type from the right sidebar.\n6. Click **Save** or **Apply**.\n\n### Loki Logs \u0026 Analysis\n\nLoki allows you to query logs using **LogQL**. 
The stack is configured to automatically label logs with metadata like `service_name` and `deployment_environment`.\n\n#### Key Queries\n| Panel Name | Visualization | Query (LogQL) | Description |\n| :--- | :--- | :--- | :--- |\n| **Application Logs** | Logs | `{service_name=\"fastapi-service\"}` | Live stream of all logs from the FastAPI app. |\n| **Error Log Stream** | Logs | `{service_name=\"fastapi-service\"} \\|= \"error\"` | Filtered stream showing only lines containing \"error\" (the `\\|=` filter is case-sensitive). |\n| **Log Volume** | Time series | `count_over_time({service_name=\"fastapi-service\"}[$__interval])` | Number of log lines produced per interval. |\n| **Severity Distribution** | Pie chart | `sum by (level) (count_over_time({service_name=\"fastapi-service\"}[$__range]))` | Breakdown of log levels (INFO, ERROR, WARN) for the selected time range. |\n| **Error Frequency** | Time series | `count_over_time({service_name=\"fastapi-service\"} \\|= \"error\" [$__interval])` | Number of log lines containing \"error\" per interval. |\n\n#### How to Add a Log Panel\n1. Click **+ Add** -\u003e **Visualization**.\n2. Select **Loki** as the data source.\n3. Paste one of the **Queries** above.\n4. Select the matching **Visualization** type from the right sidebar.\n\n#### Trace Correlation (Loki -\u003e Tempo)\nWhen viewing logs in the **Explore** tab or a **Logs** panel:\n1. Click on a log line to expand it.\n2. Look for the `trace_id` field.\n3. Click the **Tempo** button next to the ID to open the full distributed trace for that log entry.\n\n### Advanced Observability Patterns\n\nBeyond basic metrics, the LGTM stack supports these more advanced patterns:\n\n| Pattern / Metric | Visualization | Query | Description |\n| :--- | :--- | :--- | :--- |\n| **RED: Rate** | Time series | `sum(rate(http_requests_total[$__rate_interval]))` | Request rate per second. 
|\n| **RED: Errors** | Time series | `sum(rate(http_errors_total[$__rate_interval]))` | Error rate per second. |\n| **RED: Duration** | Time series | `histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[$__rate_interval])) by (le))` | 90th percentile response time. |\n| **Latency Heatmap** | Heatmap | `sum(rate(http_request_duration_seconds_bucket[$__rate_interval])) by (le)` | Visual distribution of latency buckets. |\n| **Log Severity** | Time series / Bar gauge | `sum by (level) (count_over_time({service_name=\"fastapi-service\"} [$__range]))` | Monitor log health by severity over time. |\n| **Apdex Score** | Stat | `(sum(rate(http_request_duration_seconds_bucket{le=\"0.5\"}[$__range])) + sum(rate(http_request_duration_seconds_bucket{le=\"1.0\"}[$__range]))) / 2 / sum(rate(http_request_duration_seconds_count[$__range]))` | Single score (0-1) for user satisfaction, with 0.5 s as the satisfied threshold and 1.0 s as the tolerated threshold. |\n| **Resource Grouping** | Time series | `sum(rate(http_requests_total[$__rate_interval])) by (service_version, deployment_environment)` | Compare performance across versions/environments. |\n\n\u003e [!TIP]\n\u003e Update the `fastapi-service` service name to your application name.\n\u003e - **Dynamic Time Ranges**: Instead of hardcoding `[5m]`, use Grafana global variables:\n\u003e   - **`[$__range]`**: Adjusts to the exact time period selected in the dashboard picker (e.g., Last 1 hour). Use this for total counts (with `increase()`) or \"Stat\" panels.\n\u003e   - **`[$__rate_interval]`**: Automatically calculates the best interval for `rate()` based on the graph's time range and resolution. 
Use this for Time series graphs.\n\n## Debugging Tips\n\n- **Unhealthy Ring**: If Mimir/Loki report ring issues, ensure `replication_factor` is set to `1` in the YAML configs for single-node setups.\n- **Log Ingestion**: Check the OTel Collector logs (`docker logs otel-collector`) to see if data is being received and exported correctly.\n- **S3 Connectivity**: Ensure the S3 endpoint is reachable from *within* the Docker containers. On macOS, `host.docker.internal` is used to reach the host's port 9000.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fikramhasan%2Flgtm-stack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fikramhasan%2Flgtm-stack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fikramhasan%2Flgtm-stack/lists"}