https://github.com/postgres-ai/nancy

Fully automated database experiments. THIS IS A MIRROR OF https://gitlab.com/postgres.ai/nancy

Host: GitHub
URL: https://github.com/postgres-ai/nancy
Owner: postgres-ai
License: bsd-3-clause
Archived: true
Created: 2018-04-24T17:59:33.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2020-07-31T19:06:29.000Z (almost 6 years ago)
Last Synced: 2024-11-24T09:34:00.421Z (over 1 year ago)
Topics: aws-ec2, database-experiment, docker, postgres, sql-queries, workload
Language: Shell
Homepage: https://postgres.ai
Size: 1.19 MB
Stars: 104
Watchers: 17
Forks: 7
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

[![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/postgres-ai/nancy.svg)](https://github.com/postgres-ai/nancy)

[![CircleCI](https://circleci.com/gh/postgres-ai/nancy.svg?style=svg)](https://circleci.com/gh/postgres-ai/nancy)

:warning: The Nancy CLI project is currently on hold. See details: https://gitlab.com/postgres-ai/nancy/-/issues/228

# About

Nancy helps to conduct automated database experiments.

The Nancy Command Line Interface is a unified way to manage automated
database experiments either in clouds or on-premise.

### What is a Database Experiment?

A database experiment is a set of actions performed to test:
* (a) specified SQL queries ("workload")
* (b) on specified machine / OS / Postgres version ("environment")
* (c) against specified database ("object")
* (d) with an optional change – some DDL or config change ("target" or "delta").
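
As a rough illustration, the parts of an experiment map onto `nancy run` options that appear later in this README: the object goes to `--db-dump`, the workload to `--workload-custom-sql`, and the delta to `--delta-sql-do` / `--delta-sql-undo`. The sketch below only prints the assembled command, one argument per line, instead of running it. The helper name `print_nancy_cmd` is hypothetical, and the environment is left at its defaults.

```shell
# Dry-run sketch: print the `nancy run` arguments for an experiment.
# print_nancy_cmd is a hypothetical helper, not part of Nancy itself.
# Usage: print_nancy_cmd OBJECT WORKLOAD [DELTA_DO DELTA_UNDO]
print_nancy_cmd() {
  obj="$1"; wl="$2"; delta_do="${3:-}"; delta_undo="${4:-}"
  # Rebuild the positional parameters as the command line (local to the function).
  set -- nancy run --db-dump "$obj" --workload-custom-sql "$wl"
  if [ -n "$delta_do" ]; then
    set -- "$@" --delta-sql-do "$delta_do" --delta-sql-undo "$delta_undo"
  fi
  printf '%s\n' "$@"   # one argument per line, so arguments with spaces stay intact
}

print_nancy_cmd \
  "file://$(pwd)/sample.dump.bz2" \
  "select i from hello_world where i between 10 and 20;" \
  "create index i_hello_world_i on hello_world(i);" \
  "drop index i_hello_world_i;"
```

Printing rather than executing makes it easy to review an experiment definition before any container is started.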

Two main goals for any database experiment:
* (1) validation – check that the specified workload is valid,
* (2) benchmark – perform deep SQL query analysis.

Database experiments are needed when you:
- add or remove indexes;
- want to validate a new DB schema change and estimate migration time;
- want to verify some query optimization ideas;
- tune database configuration parameters;
- do capacity planning and want to stress-test your DB in some environment;
- plan to upgrade your DBMS to a new major version;
- want to train an ML model related to DB optimization.

# Currently Supported Features

* Works anywhere Docker can run (checked: Linux Ubuntu/Debian, macOS)
* Experiments are conducted in a Docker container with extended Postgres setup
* Supported Postgres versions: 12 (default), 11, 10, 9.6
* Postgres config specified via options, may be partial
* Supported locations for experimental runs:
  * Any machine with Docker installed
  * AWS EC2:
    * Run on AWS EC2 Spot Instances (using Docker Machine)
    * Allow specifying the EC2 instance type
    * Auto-detect and use current lowest EC2 Spot Instance prices
    * Support i3 instances (with NVMe SSD drives)
    * Support arbitrary-size EBS volumes
* Support local or remote (S3) files – config, dump, etc
* The object (database) can be specified in various ways:
  * Plain text
  * Synthetic database generated by [pgbench](https://www.postgresql.org/docs/current/static/pgbench.html)
  * Dump file (.sql, .gz, .bz2)
* What to test (a.k.a. "target" or "delta"):
  * Test Postgres parameters change
  * Test DDL change (specified as "do" and "undo" SQL to return state)
* Supported types of workload:
  * Any custom SQL
  * Synthetic workload generated by [pgbench](https://www.postgresql.org/docs/current/static/pgbench.html)
  * "Real workload" based on Postgres logs (using [pgreplay](https://github.com/laurenz/pgreplay))
* For "real workload", allow replaying it with increased speed
* Allow keeping the container alive for a specified time after all steps are done
* Collected artifacts:
  * `pg_stat_statements` snapshot
  * `pg_stat_database`, ...
  * Workload SQL logs
  * Deep SQL query analysis report

# Requirements

1) To use the Nancy CLI, you need Linux or macOS with Docker installed.

2) To run on AWS EC2 instances, you also need:
* AWS CLI https://aws.amazon.com/en/cli/
* Docker Machine https://docs.docker.com/machine/
* jq https://stedolan.github.io/jq/
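
A quick way to verify these requirements is to check that each tool is on `$PATH`. The following is a minimal sketch, assuming the usual executable names (`docker`, `aws`, `docker-machine`, `jq`); `missing_tools` is a hypothetical helper, not something shipped with Nancy.

```shell
# Sketch: report which of the given command-line tools are missing.
# Prints one missing name per line and returns non-zero if any is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Local runs need Docker; AWS EC2 runs also need the AWS CLI, Docker Machine, and jq.
missing_tools docker || echo "Install Docker before running Nancy locally." >&2
missing_tools aws docker-machine jq || echo "Install the tools above for AWS EC2 runs." >&2
```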

# Installation

In the minimal configuration, only a few steps are needed:

NOTICE: The [Additional notes](#additional-notes) section contains
instructions useful in case of docker-related errors during `nancy run` calls.
Alternatively, see Docker's official [post-installation instructions for Linux](https://docs.docker.com/install/linux/linux-postinstall/).

1) Install Docker

Ubuntu/Debian:
```shell
# The Docker engine package on Ubuntu/Debian is named docker.io
sudo apt-get -y install docker.io
sudo systemctl enable docker
sudo systemctl start docker
```

RHEL7:
```shell
sudo yum -y install docker
sudo systemctl enable docker
sudo systemctl start docker
```

macOS (assuming that [Homebrew](https://brew.sh/) is installed):
```shell
brew install docker
```
See also: https://docs.docker.com/docker-for-mac/install/

2) Clone this repo and adjust `$PATH`:
```shell
git clone https://gitlab.com/postgres.ai/nancy.git
echo "export PATH=\$PATH:$(pwd)/nancy" >> ~/.bashrc
source ~/.bashrc
```
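
Running the `echo ... >> ~/.bashrc` line more than once appends a duplicate entry each time. One possible way to make this step repeatable is to append the line only when it is not already present. This is a sketch under that assumption; `append_line_once` is a hypothetical helper name.

```shell
# Sketch: append a line to a file only if that exact line is not there yet.
append_line_once() {
  line="$1"; file="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: quiet
  grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (run from the directory that contains the cloned `nancy` folder):
# append_line_once "export PATH=\$PATH:$(pwd)/nancy" ~/.bashrc
```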

3) Install jq
- Ubuntu/Debian: `sudo apt-get -y install jq`
- CentOS/RHEL: `sudo yum install jq`
- macOS: `brew install jq`

Additionally, to allow use of AWS EC2 instances:

4) Install AWS CLI https://docs.aws.amazon.com/cli/latest/userguide/installing.html

5) Install Docker Machine tools https://docs.docker.com/machine/install-machine/

# Getting started

Start with these commands:
```shell
nancy help
nancy run help
```

# "Hello World!"

Locally, on any Linux or macOS machine:
```shell
echo "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
  | bzip2 > ./sample.dump.bz2

# "Clean run": without an index
# (seqscan is expected, total time ~150ms, depending on resources)
nancy run \
  --db-dump "file://$(pwd)/sample.dump.bz2" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"

# Now check how a regular btree index affects performance
# (expected total time: ~0.05ms)
nancy run \
  --db-dump "file://$(pwd)/sample.dump.bz2" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;" \
  --delta-sql-do "create index i_hello_world_i on hello_world(i);" \
  --delta-sql-undo "drop index i_hello_world_i;"
```
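
To see how the gap between the two runs changes with table size, the sample dump can be generated for a different number of rows. The sketch below only parameterizes the SQL statement from the example above; `hello_world_sql` is a hypothetical helper name, and the row counts are arbitrary.

```shell
# Sketch: print the sample table definition for a given number of rows.
# The default (1000000) matches the 10^6 rows used in the example above.
hello_world_sql() {
  rows="${1:-1000000}"
  case "$rows" in
    *[!0-9]*) echo "rows must be a whole number" >&2; return 1 ;;
  esac
  printf 'create table hello_world as select i from generate_series(1, %s) _(i);\n' "$rows"
}

# Example: a 10-million-row dump for the same two `nancy run` calls.
# hello_world_sql 10000000 | bzip2 > ./sample.dump.bz2
```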

AWS EC2:
```shell
nancy run \
  --run-on aws \
  --aws-ec2-type "i3.large" \
  --aws-keypair-name awskey \
  --aws-ssh-key-path "file://$HOME/.ssh/awskey.pem" \
  --db-dump "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"
```
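
Before starting an EC2 run, it can help to confirm that the SSH key file passed to `--aws-ssh-key-path` actually exists. The following is a small sketch of such a pre-flight check; `key_file_url` is a hypothetical helper, and it assumes the key is given as an absolute path.

```shell
# Sketch: print a file:// URL for an SSH key, failing early if it is not readable.
key_file_url() {
  key="$1"
  if [ ! -r "$key" ]; then
    echo "SSH key not found or not readable: $key" >&2
    return 1
  fi
  printf 'file://%s\n' "$key"
}

# Example (other options elided):
# nancy run --run-on aws ... --aws-ssh-key-path "$(key_file_url "$HOME/.ssh/awskey.pem")"
```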

# Additional notes

On Linux, if you run into issues when running `nancy run` locally inside `screen` or
`tmux`, double-check that Docker is running and add your user to the `docker`
group, as described below. See also: https://docs.docker.com/install/linux/linux-postinstall/.

Ubuntu/Debian:
```shell
sudo usermod -aG docker "${USER}"
newgrp docker
```

CentOS/RHEL:
```shell
sudo usermod -aG dockerroot ${USER}
newgrp dockerroot
```
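
To check whether the group change has taken effect in the current session, the group list reported by `id -nG` can be inspected. This is a minimal sketch; `in_group` is a hypothetical helper name, and it assumes group names contain no whitespace.

```shell
# Sketch: succeed if a user belongs to the given group.
# With no user argument, `id -nG` reports the groups of the current session.
in_group() {
  group="$1"
  for g in $(id -nG ${2:+"$2"}); do
    if [ "$g" = "$group" ]; then
      return 0
    fi
  done
  return 1
}

if in_group docker || in_group dockerroot; then
  echo "Current session is in the docker (or dockerroot) group."
else
  echo "Current session is not in the docker (or dockerroot) group yet."
fi
```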

On macOS, it is recommended to specify `--tmp-path` explicitly, similar to this:
```shell
mkdir -p ./tmp
nancy run ... --tmp-path "$(pwd)/tmp"
```
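
If the working directory changes between runs, resolving the temporary directory to an absolute path once avoids surprises. A possible sketch follows; `abs_tmp_path` is a hypothetical helper name.

```shell
# Sketch: create a directory if needed and print its absolute path.
abs_tmp_path() {
  dir="${1:-./tmp}"
  mkdir -p "$dir" && (cd "$dir" && pwd)
}

# Example (other options elided):
# nancy run ... --tmp-path "$(abs_tmp_path ./tmp)"
```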
