Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ankane/dexter

The automatic indexer for Postgres
https://github.com/ankane/dexter

Last synced: 19 days ago
JSON representation

The automatic indexer for Postgres

Awesome Lists containing this project

README

        

# Dexter

The automatic indexer for Postgres

[Read about how it works](https://ankane.org/introducing-dexter) or [watch the talk](https://www.youtube.com/watch?v=Mni_1yTaNbE)

[![Build Status](https://github.com/ankane/dexter/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/dexter/actions)

## Installation

First, install [HypoPG](https://github.com/HypoPG/hypopg) on your database server. This doesn’t require a restart.

```sh
cd /tmp
curl -L https://github.com/HypoPG/hypopg/archive/1.4.1.tar.gz | tar xz
cd hypopg-1.4.1
make
make install # may need sudo
```

And enable it in databases where you want to use Dexter:

```sql
CREATE EXTENSION hypopg;
```

See the [installation notes](#hypopg-installation-notes) if you run into issues.

Then install the command line tool with:

```sh
gem install pgdexter
```

The command line tool is also available with [Docker](#docker), [Homebrew](#homebrew), or as a [Linux package](guides/Linux.md).

## How to Use

Dexter needs a connection to your database and a source of queries (like [pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html)) to process.

```sh
dexter -d dbname --pg-stat-statements
```

This finds slow queries and generates output like:

```
Started
Processing 189 new query fingerprints
Index found: public.genres_movies (genre_id)
Index found: public.genres_movies (movie_id)
Index found: public.movies (title)
Index found: public.ratings (movie_id)
Index found: public.ratings (rating)
Index found: public.ratings (user_id)
```

To be safe, Dexter will not create indexes unless you pass the `--create` flag. In this case, you’ll see:

```
Index found: public.ratings (user_id)
Creating index: CREATE INDEX CONCURRENTLY ON "public"."ratings" ("user_id")
Index created: 15243 ms
```

## Connection Options

Dexter supports the same connection options as psql.

```
-h host -U user -p 5432 -d dbname
```

This includes URIs:

```
postgresql://user:pass@host:5432/dbname
```

and connection strings:

```
host=localhost port=5432 dbname=mydb
```

Always make sure your [connection is secure](https://ankane.org/postgres-sslmode-explained) when connecting to a database over a network you don’t fully trust.

## Collecting Queries

Dexter can collect queries from a number of sources.

- [Query stats](#query-stats)
- [Live queries](#live-queries)
- [Log files](#log-file)
- [SQL files](#sql-files)

### Query Stats

Enable [pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html) in your database.

```psql
CREATE EXTENSION pg_stat_statements;
```

And use:

```sh
dexter --pg-stat-statements
```

### Live Queries

Get live queries from the [pg_stat_activity](https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW) view with:

```sh
dexter --pg-stat-activity
```

### Log Files

Enable logging for slow queries in your Postgres config file.

```ini
log_min_duration_statement = 10 # ms
```

And use:

```sh
dexter postgresql.log
```

Supports `stderr`, `csvlog`, and `jsonlog` formats.

For real-time indexing, pipe your logfile:

```sh
tail -F -n +1 postgresql.log | dexter --stdin
```

And pass `--input-format csvlog` or `--input-format jsonlog` if needed.

### SQL Files

Pass a SQL file with:

```sh
dexter queries.sql
```

Pass a single query with:

```sh
dexter -s "SELECT * FROM ..."
```

## Collection Options

To prevent one-off queries from being indexed, specify a minimum number of calls before a query is considered for indexing

```sh
dexter --min-calls 100
```

You can do the same for total time a query has run

```sh
dexter --min-time 10 # minutes
```

When streaming logs, specify the time to wait between processing queries

```sh
dexter --interval 60 # seconds
```

## Analyze

For best results, make sure your tables have been recently analyzed so statistics are up-to-date. You can ask Dexter to analyze tables it comes across that haven’t been analyzed in the past hour with:

```sh
dexter --analyze
```

## Tables

You can exclude large or write-heavy tables from indexing with:

```sh
dexter --exclude table1,table2
```

Alternatively, you can specify which tables to index with:

```sh
dexter --include table3,table4
```

## Debugging

See how Dexter is processing queries with:

```sh
dexter --log-sql --log-level debug2
```

## Hosted Postgres

The `hypopg` extension, which Dexter needs to run, is available on [these providers](https://github.com/ankane/dexter/issues/44).

For other providers, see [this guide](guides/Hosted-Postgres.md). To request a new extension:

- Google Cloud SQL - vote or comment on [this page](https://issuetracker.google.com/issues/69250435)
- DigitalOcean Managed Databases - vote or comment on [this page](https://ideas.digitalocean.com/managed-database/p/support-hypopg-for-postgres)

## HypoPG Installation Notes

### Postgres Location

If your machine has multiple Postgres installations, specify the path to [pg_config](https://www.postgresql.org/docs/current/app-pgconfig.html) with:

```sh
export PG_CONFIG=/Applications/Postgres.app/Contents/Versions/latest/bin/pg_config
```

Then re-run the installation instructions (run `make clean` before `make` if needed)

### Missing Header

If compilation fails with `fatal error: postgres.h: No such file or directory`, make sure Postgres development files are installed on the server.

For Ubuntu and Debian, use:

```sh
sudo apt-get install postgresql-server-dev-15
```

Note: Replace `15` with your Postgres server version

## Additional Installation Methods

### Docker

Get the [Docker image](https://hub.docker.com/r/ankane/dexter) with:

```sh
docker pull ankane/dexter
```

And run it with:

```sh
docker run -ti ankane/dexter
```

For databases on the host machine, use `host.docker.internal` as the hostname (on Linux, this requires Docker 20.04+ and `--add-host=host.docker.internal:host-gateway`).

### Homebrew

With Homebrew, you can use:

```sh
brew install dexter
```

## Future Work

[Here are some ideas](https://github.com/ankane/dexter/issues/45)

## Upgrading

Run:

```sh
gem install pgdexter
```

To use master, run:

```sh
gem install specific_install
gem specific_install https://github.com/ankane/dexter.git
```

## Upgrade Notes

### 0.5.0

The `--stdin` option is now required to read queries from stdin.

```sh
tail -F -n +1 postgresql.log | dexter --stdin
```

## Thanks

This software wouldn’t be possible without [HypoPG](https://github.com/HypoPG/hypopg), which allows you to create hypothetical indexes, and [pg_query](https://github.com/lfittl/pg_query), which allows you to parse and fingerprint queries. A big thanks to Dalibo and Lukas Fittl respectively. Also, thanks to YugabyteDB for [this article](https://dev.to/yugabyte/explain-from-pgstatstatements-normalized-queries-how-to-always-get-the-generic-plan-in--5cfi) on how to explain normalized queries.

## Research

This is known as the Index Selection Problem (ISP).

## History

View the [changelog](https://github.com/ankane/dexter/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/dexter/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/dexter/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development, run:

```sh
git clone https://github.com/ankane/dexter.git
cd dexter
bundle install
bundle exec rake install
```

To run tests, use:

```sh
createdb dexter_test
bundle exec rake test
```