An open API service indexing awesome lists of open source software.

https://github.com/ivanrododendro/aiqo-report

AI Query Optimizer Report
https://github.com/ivanrododendro/aiqo-report

ai sql sql-optimization sql-tuning

Last synced: 19 days ago
JSON representation

AI Query Optimizer Report

Awesome Lists containing this project

README

          

# AIQO PostgreSQL AI Report Generator

The AIQO PostgreSQL AI Report Generator is a powerful command-line tool designed to analyze PostgreSQL `auto_explain` logs using Artificial Intelligence. It provides query tracking over time and actionable insights and optimization suggestions to improve database performance, presenting its findings in a comprehensive HTML report.

It is well suited to scenarios where complexity, data volume and fragmentation make it difficult or impossible to reproduce the execution plan outside the production environment.

Download the [sample HTML report](https://github.com/ivanrododendro/aiqo-report/releases/download/1.4.2/sample_report.zip) or the [target query report for query 421865](https://github.com/ivanrododendro/aiqo-report/releases/download/1.4.2/target_query_report_421865.zip) for a randomized dataset created using explain.dalibo.com sample plans. AI hint is at day may 4, real reports are more meaningful due to detailed context you can provide to AI analisys.

![](docs/infographic.webp)

## Features

* **Comprehensive HTML Reports**: Generates detailed, easy-to-read HTML reports summarizing performance metrics, queries timelines, AI findings and optimization opportunities.
* **Query Code Tracking**: Generates a unique "query code" (hash) for each normalized SQL query, enabling consistent tracking and application of query-specific optimizations across different log executions.
* **AI-Powered Analysis with Customizable AI Models**: Leverages large language models (LLMs) to analyze `EXPLAIN` plans from PostgreSQL logs and identify performance bottlenecks. Provides concrete recommendations for query, server, and infrastructure optimizations based on AI analysis. By default, AI analysis is executed only once for each query code. Subsequent occurrences are kept in the report but marked as already analyzed instead of triggering a new AI call. Supports various AI providers and models (e.g. ChatGPT, Gemini, DeepSeek, ...) via `litellm`. Supports provider-side prompt caching for compatible LLM providers, with an option to disable it when required.
* **Exceution plan compare**: The report allows to compare execution plans for two occurencies of a same query to analyze the evolution over time.
* **Flexible Contextualization**: Allows users to provide DDL, server configuration, infrastructure details, and custom prompts to enhance AI analysis accuracy.
* **Native Builds Available**: Standalone native executables are available for Linux, macOS (Apple Silicon), and Windows.
* **Multilingual Output**: Supports generating reports in different languages.
* **Reproducible Outputs**: When using OpenAI models, analyses can be reproduced by providing the same input and context, ensuring consistent results across runs.
* **Privacy-First Anonymization**: Before any data is sent to an AI provider, all PostgreSQL object names (tables, schemas, indexes, columns, custom functions) are replaced with deterministic aliases (e.g. `tbl_3a1f9c`). The AI response is de-anonymized before it appears in the report, so the HTML output always shows the real names. Anonymization is enabled by default and can be disabled with `--no-anonymize`.

## API Keys and LiteLLM Configuration

This tool uses `litellm` to interface with various AI providers. You must configure your API keys as environment variables.

For example, for OpenAI models:

```bash
set OPENAI_API_KEY="your_openai_api_key_here"
```

For Gemini models:

```bash
set GEMINI_API_KEY="your_gemini_api_key_here"
```

You can also use a generic `LITELLM_API_KEY` if you are using a provider that `litellm` supports via this generic key. Refer to the [LiteLLM documentation](https://litellm.ai/docs/providers) for specific provider configurations.

To route analysis through a LiteLLM Proxy Server, use the `litellm_proxy/` model prefix and configure the proxy endpoint:

```bash
set LITELLM_API_BASE="http://localhost:4000"
set LITELLM_API_KEY="your_litellm_proxy_key_here"

./pg_aiqo_report /path/to/your/postgresql.log --model "litellm_proxy/pg-report-model"
```

The proxy must expose a model or alias matching the value after `litellm_proxy/`. You can also pass the proxy settings explicitly with `--litellm-api-base` and `--litellm-api-key`.

## Basic Usage

To analyze a PostgreSQL log file with default settings:

```bash
./pg_aiqo_report /path/to/your/postgresql.log
```

This will generate an HTML report in the current working directory (or `output/` if it exists), named similarly to `pg-ai-report_.html`.

By default, the AI analysis is executed only on the first occurrence of each query code. Subsequent occurrences are reported with a message indicating that the same query was already analyzed earlier. Use `--analyze-all-queries` if you want an independent AI analysis for every matching log entry.

## PostgreSQL Configuration

To generate the necessary log files for analysis, your PostgreSQL instance must be configured to use `auto_explain`. Here's a typical configuration to add to your `postgresql.conf`:

```ini
# auto_explain settings
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 60000 # Log queries that exceed duration target
auto_explain.log_analyze = on # Include EXPLAIN ANALYZE output
auto_explain.log_buffers = on # Include buffer usage
auto_explain.log_timing = off # Exclude unecessary detailed timing information
auto_explain.log_nested_pages = on # For nested queries
auto_explain.log_verbose = on # For verbose output
auto_explain.log_format = text # Or json
```

> **_NOTE:_** when auto_explain.log_timing parameter is on, per-plan-node timing occurs for all statements executed, whether or not they run long enough to actually get logged. This can have an extremely negative impact on performance. Turning off auto_explain.log_timing ameliorates the performance cost, at the price of obtaining less information. See https://www.postgresql.org/docs/current/auto-explain.html

After modifying `postgresql.conf`, restart your PostgreSQL server for the changes to take effect. The tool expects standard text-based PostgreSQL log files.

## Context Files & Location

The tool supports two distinct context sources:

1. **Built-in AI instruction prompts** loaded from `src/aiqo_pg_ai_report/prompts/`:
- `SYSTEM.txt`
- `FORMAT.txt`
- `GENERAL_HINTS_SYNTHESIS.txt`
- `TARGET_QUERY_SYSTEM.txt`
- `TARGET_QUERY_FORMAT.txt`

These files are required by the application and are always loaded from the package prompts directory.

2. **Application contexts and optimization history** loaded from a context folder:
- `DDL.txt`
- `CONFIG.txt`
- `PROJECT.txt`
- `SERVER.txt`
- `EVENTS.txt`
- `QUERIES/.txt`

Where `` is the first 6 characters of the normalized query hash.

### Default context folder behavior

If `--context-folder` is **not** provided, the default application context folder is:

- **Single log file mode**: `CONTEXT/` under the directory containing the log file.
- **Directory mode** (when a directory is analyzed): `CONTEXT/` inside that analyzed directory.

Examples:

- Log file: `/var/log/postgresql/postgresql.log` -> default context folder: `/var/log/postgresql/CONTEXT/`
- Log directory: `/var/log/postgresql/` -> default context folder: `/var/log/postgresql/CONTEXT/`

If `--context-folder` is provided, that path is used as-is and overrides the default behavior.

To use custom contexts, create a folder (e.g., `my_custom_contexts/`) and place your context files within it according to the following structure:

```
my_custom_contexts/
├── DDL.txt
├── CONFIG.txt
├── PROJECT.txt
├── SERVER.txt
├── EVENTS.txt
└── QUERIES/
├── .txt
├── .txt
└── ...
```

To specify your custom context folder, use the `--context-folder` argument. For example:

```bash
./pg_aiqo_report \
--context-folder "./my_custom_contexts" \
/path/to/your/postgresql.log
```

**Default Model**:
The default model is `gemini-3.1-flash-lite-preview`. You can specify a different model using the `--model` CLI argument.

## Advanced Usage

You can customize the analysis using various command-line arguments:

```bash
./pg_aiqo_report \
--model "gpt-4o-mini" \
--language "en" \
--format "json" \
--context-folder "./my_custom_contexts" \
--custom-prompt "Focus on index usage." \
--only-seq-scan-ai-analysis \
--filter "SELECT * FROM users" \
--limit-ai-calls 5 \
--ai-call-timeout 120 \
--analyze-all-queries \
--disable-provider-cache \
/path/to/your/postgresql.log
```

### Full CLI Parameters

* **`log_file_path`** (positional argument):
* The full path to the PostgreSQL `auto_explain` log file to be analyzed.
* Example: `/var/log/postgresql/postgresql-2023-01-01.log`

* **`--model `** (`-m`):
* Specify the AI model to use for analysis (e.g., `gpt-4o`, `gemini-1.5-pro`, `o1-mini`, `litellm_proxy/pg-report-model`).
* Default: `gemini-3.1-flash-lite-preview`

* **`--format `** (`-fmt`):
* Specify the log format to parse: `text` (default), `json` (supported), `yaml` (currently unsupported).
* Default: `text`

* **`--limit-ai-calls `** (`-l`):
* Limits the maximum number of AI calls made during the analysis. Use `-1` for unlimited calls.
* In target query mode (`--target-query`), limits the number of matching query occurrences included in the aggregated analysis instead of limiting AI calls.
* Default: `-1` (unlimited)

* **`--analyze-all-queries`** (`-aaa`):
* Forces an independent AI analysis for every query occurrence found in the logs.
* By default, the tool analyzes only the first occurrence of each query code and marks the following occurrences as already analyzed.
* This is a flag, no value needed.
* Default: `False`

* **`--ai-call-timeout `**:
* Sets the timeout duration for each individual AI call in seconds.
* Default: `90` seconds

* **`--disable-provider-cache`**:
* Disable provider-side prompt caching for all supported LLM providers.
* When enabled, OpenAI requests are sent without `prompt_cache_key`, and Gemini/Claude requests are sent without provider cache markers.
* This is a flag, no value needed.
* Default: `False`

* **`--litellm-api-base `**:
* Sets the LiteLLM Proxy Server API base URL for models using the `litellm_proxy/` prefix.
* Falls back to the `LITELLM_API_BASE` environment variable.

* **`--litellm-api-key `**:
* Sets the LiteLLM Proxy Server API key for models using the `litellm_proxy/` prefix.
* Falls back to the `LITELLM_API_KEY` environment variable.
* Optional for proxy deployments that do not require authentication.

* **`--language `**:
* Set the output language for the generated report and AI analysis.
* Default: `fr` (French)
* Example: `--language en` for English output.

* **`--skip-ai-analysis`** (`-s`):
* If set, the AI analysis step will be skipped entirely. A report will still be generated, but without AI-driven insights.
* This is a flag, no value needed.
* Default: `False`

* **`--only-seq-scan-ai-analysis`** (`-o`):
* If set, the AI analysis will only be performed on queries that are identified as performing sequential scans. This helps focus the AI's efforts on specific performance issues.
* This is a flag, no value needed.
* Default: `False`

* **`--filter `** (`-f`):
* Filter log entries. Only log entries containing the specified string in the query name, job name, SQL text, or query code will be included in the report and processed for AI analysis. Can be specified multiple times. Case-insensitive.
* Example: `--filter "public.users"` or `--filter "2a3b4c"` (to filter by a query code)
* Default: `None` (no filter)

* **`--custom-prompt `** (`-c`):
* Provide an additional custom prompt or instruction to the AI for its analysis. This prompt will be appended to the standard prompts.
* Example: `--custom-prompt "Pay special attention to JOIN operations."`

* **`--version`** (`-v`):
* Show the current tool version (from git tags) and the detected `litellm` version, then exit.

* **`--supported-models`** (`-sm`):
* Show the models reported by `litellm.models()` with provider information, then exit.
* Default: `None`

* **`--report-filename `** (`-r`):
* Override the HTML report filename.
* Example: `--report-filename my_custom_report.html`
* Default: Automatically generated based on log filename

* **`--context-folder `** (`-cf`):
* Path to a directory containing application context files and optimization history (`DDL.txt`, `CONFIG.txt`, `PROJECT.txt`, `SERVER.txt`, `EVENTS.txt`, and `QUERIES/*.txt`).
* Example: `--context-folder /home/user/my_db_contexts`
* Default: `CONTEXT/` under the analyzed path (for a log file: the log file's parent directory; for directory mode: the analyzed directory itself)

* **`--no-anonymize`**:
* Disable the anonymization layer. DB object names (tables, schemas, indexes, columns, custom functions) will be sent to the AI provider as-is.
* By default, all names are replaced with deterministic aliases before each AI call and restored in the response, so the HTML report always shows real names.
* This is a flag, no value needed.
* Default: `False` (anonymization is ON)

* **`--debug`** (`-d`):
* Enable debug logging.
* This is a flag, no value needed.
* Default: `False`

## Output Report

The tool generates a single, self-contained HTML report. This report typically includes:

* **Summary Dashboard**: Overview of analyzed queries, AI calls, and total costs.
* **Daily Statistics**: Breakdown of query activity and AI analysis per day.
* **Query Statistics**: Aggregated information about normalized queries, including frequency and average execution times.
* **Detailed Query Analysis**: For each significant query (especially those identified with issues like sequential scans or based on AI analysis), a dedicated section will provide:
* The original SQL query.
* The `EXPLAIN ANALYZE` plan.
* The AI's summary of the plan's performance characteristics.
* AI-generated optimization recommendations specific to that query.
* PEV2 Query visualizer
* Graph tracking qeury statistics over time

## Build Dependencies & Requirements

* Python >= 3.11 <= 3.13
* The project uses `poetry` for dependency management.
* Key Python packages: `litellm`, `sqlparse`, `Jinja2`.

To install the required dependencies, navigate to the project root and run:

```bash
poetry install
```

## Native Binaries

Standalone binaries are produced with the same base name on every platform:

- Linux: `dist/linux/pg_aiqo_report`
- macOS (Apple Silicon): `dist/macos-silicon/pg_aiqo_report`
- Windows: `dist/windows/pg_aiqo_report.exe`

Build them with:

```bash
poetry install --with dev
./scripts/build_nuitka.sh
```

The GitHub Actions release workflow persists Nuitka caches between runs by restoring `${GITHUB_WORKSPACE}/.github-cache/nuitka/` through `actions/cache`. When `ccache` is available, the build script points Nuitka to that cached compiler store automatically.