An open API service indexing awesome lists of open source software.

https://github.com/clickhouse/pg_stat_ch


https://github.com/clickhouse/pg_stat_ch

clickhouse metrics monitoring postgresql postgresql-extension

Last synced: about 2 months ago
JSON representation

Awesome Lists containing this project

README

          

pg_stat_ch logo
pg_stat_ch logo

# pg_stat_ch: PostgreSQL Query Telemetry Exporter to ClickHouse

A PostgreSQL extension that captures per-query execution telemetry and exports it to ClickHouse in real-time. Unlike pg_stat_statements which aggregates statistics in PostgreSQL, pg_stat_ch exports **raw events** to ClickHouse where aggregation happens via ClickHouse's powerful analytical engine.

## Quickstart

Run local PostgreSQL + ClickHouse with schema preloaded:

```bash
./scripts/quickstart.sh up
./scripts/quickstart.sh check
```

Stop:

```bash
./scripts/quickstart.sh down
```

See [docker/quickstart/README.md](docker/quickstart/README.md) for endpoints and stack details.

## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Features](#features)
- [Supported Versions](#supported-versions)
- [Building from Source](#building-from-source)
- [Installation](#installation)
- [Recommended PostgreSQL Settings](#recommended-postgresql-settings)
- [Configuration](#configuration)
- [SQL API](#sql-api)
- [Example Queries](#example-queries)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [License](#license)

## Overview

Dashboard showing query telemetry: expensive queries with percentiles, cache miss analysis, QPS over time, latency, shared buffer and WAL metrics

pg_stat_ch captures detailed telemetry for every query executed in PostgreSQL and exports it to ClickHouse via a single data pipeline:

```
PostgreSQL Hooks (foreground) → Shared Memory Queue → Background Worker → ClickHouse
```

**Key design principles:**
- **Zero network I/O on query path** - Events are queued in shared memory, not sent synchronously
- **Raw events, not aggregates** - All aggregation (p50/p95/p99, top queries, errors) happens in ClickHouse
- **Bounded memory** - Fixed-size ring buffer with overflow counters; dropped events don't block queries
- **Minimal overhead** - ~5μs p99 overhead per captured statement

## Architecture

### Data Flow

1. **PostgreSQL Hooks** capture query start/end with full instrumentation
2. **Shared Memory Ring Buffer** stores events (MPSC: multi-producer, single-consumer)
3. **Background Worker** dequeues batches and exports to ClickHouse
4. **ClickHouse** stores raw events in `events_raw` table; views/MVs provide aggregates

## Features

- **Full Query Telemetry**: Timing, row counts, buffer usage, WAL usage, CPU time
- **All Statement Types**: DML (SELECT/INSERT/UPDATE/DELETE/MERGE), DDL, utility statements
- **Error Capture**: SQLSTATE codes and error levels via emit_log_hook
- **JIT Instrumentation** (PG15+): Function count, generation/inlining/optimization/emission time
- **Parallel Worker Stats** (PG18+): Planned vs launched workers
- **Client Context**: Application name, client IP address
- **Query Text**: Captured with truncation (2KB max)
- **Graceful Degradation**: Queue overflow drops events with counters; ClickHouse unavailability doesn't block PostgreSQL

## Supported Versions

PostgreSQL 16, 17, and 18 are fully supported. See [docs/version-compatibility.md](docs/version-compatibility.md) for the feature matrix and version-specific fields.

## Building from Source

Prerequisites:

- CMake 3.16+, C++17 compiler (GCC 9+, Clang 10+)
- PostgreSQL 16+ development headers
- [mise](https://mise.jdx.dev/) (recommended) or manual PostgreSQL installation

[clickhouse-cpp](https://github.com/ClickHouse/clickhouse-cpp) is vendored as a submodule and statically linked.

```bash
git submodule update --init --recursive

# Using mise (recommended)
mise run build # Debug build (use build:16/17/18 for specific versions)
mise run build:release # Release build
mise run install # Install the extension

# Or manually
cmake -B build -G Ninja -DPG_CONFIG=/path/to/pg_config
cmake --build build && cmake --install build
```

## Installation

### 1. Configure PostgreSQL

Add to `postgresql.conf`:

```ini
shared_preload_libraries = 'pg_stat_ch'
track_io_timing = on # Enables I/O timing columns for captured events

# ClickHouse connection (change for your setup)
pg_stat_ch.clickhouse_host = 'localhost'
pg_stat_ch.clickhouse_port = 9000
pg_stat_ch.clickhouse_database = 'pg_stat_ch'

# TLS (recommended for production)
pg_stat_ch.clickhouse_use_tls = on
pg_stat_ch.clickhouse_skip_tls_verify = off
```

### 2. Restart PostgreSQL

```bash
# Restart PostgreSQL using your service manager (systemd, brew services, Docker, etc.)
```

### 3. Set Up Schema on ClickHouse

Quickstart path: run `./scripts/quickstart.sh up` and schema setup is handled automatically.

Manual / existing ClickHouse paths:

```bash
clickhouse-client < docker/init/00-schema.sql
```

Creates `events_raw` plus materialized views. See [docs/clickhouse.md](docs/clickhouse.md) for full schema details.

### 4. Create the Extension

```sql
CREATE EXTENSION pg_stat_ch;
```

### 5. Verify

```sql
SELECT pg_stat_ch_version();
SELECT * FROM pg_stat_ch_stats();
```

## Configuration

### GUC Variables

| Parameter | Type | Default | Reload | Description |
|-----------|------|---------|--------|-------------|
| `pg_stat_ch.enabled` | bool | `on` | SIGHUP | Enable/disable telemetry collection |
| `pg_stat_ch.clickhouse_host` | string | `localhost` | Restart | ClickHouse server hostname |
| `pg_stat_ch.clickhouse_port` | int | `9000` | Restart | ClickHouse native protocol port |
| `pg_stat_ch.clickhouse_user` | string | `default` | Restart | ClickHouse username |
| `pg_stat_ch.clickhouse_password` | string | `""` | Restart | ClickHouse password |
| `pg_stat_ch.clickhouse_database` | string | `pg_stat_ch` | Restart | ClickHouse database name |
| `pg_stat_ch.queue_capacity` | int | `65536` | Restart | Ring buffer size (must be power of 2) |
| `pg_stat_ch.flush_interval_ms` | int | `1000` | SIGHUP | Export batch interval in milliseconds |
| `pg_stat_ch.batch_max` | int | `10000` | SIGHUP | Maximum events per ClickHouse insert |
| `pg_stat_ch.clickhouse_use_tls` | bool | `off` | Restart | Enable TLS for ClickHouse connections |
| `pg_stat_ch.clickhouse_skip_tls_verify` | bool | `off` | Restart | Skip TLS certificate verification (insecure) |
| `pg_stat_ch.log_min_elevel` | enum | `warning` | Superuser | Minimum error level to capture (debug5..panic) |

See [Error Level Values](docs/version-compatibility.md#error-fields) for the complete list of error levels and their numeric values in ClickHouse.

## SQL API

| Function | Description |
|----------|-------------|
| `pg_stat_ch_version()` | Returns extension version string |
| `pg_stat_ch_stats()` | Queue and exporter statistics ([column details](docs/troubleshooting.md#sql-api-reference)) |
| `pg_stat_ch_reset()` | Reset all queue counters to zero |
| `pg_stat_ch_flush()` | Trigger immediate flush of queued events to ClickHouse |

## Example Queries

Raw per-execution events enable percentiles, time-series, and per-app drill-downs:

```sql
-- Slowest queries for a specific application, with percentiles
SELECT query_id,
count() AS calls,
quantile(0.95)(duration_us) / 1000 AS p95_ms,
quantile(0.99)(duration_us) / 1000 AS p99_ms
FROM pg_stat_ch.events_raw
WHERE app = 'myapp'
AND ts_start > now() - INTERVAL 1 HOUR
GROUP BY query_id
ORDER BY p99_ms DESC
LIMIT 10;
```

See [docs/clickhouse.md](docs/clickhouse.md) for materialized view definitions and more query examples.

## Testing

```bash
mise run test:all # Run all tests
mise run test:regress # SQL regression tests only
./scripts/run-tests.sh 18 all # Specific PG version
./scripts/run-tests.sh ../postgres/install_tap tap # TAP tests with local PG build
```

See [docs/testing.md](docs/testing.md) for test types, TAP test setup, and a full listing of test files.

## Troubleshooting

Common issues: extension not loading (check `shared_preload_libraries`), events not appearing (check `pg_stat_ch_stats()` for errors), high queue usage or dropped events (tune `pg_stat_ch.queue_capacity`, `pg_stat_ch.flush_interval_ms`, `pg_stat_ch.batch_max`).

See [docs/troubleshooting.md](docs/troubleshooting.md) for detailed solutions.

## License

This project is licensed under the Apache License, Version 2.0. See [LICENSE.md](LICENSE.md) for the full license text.