https://github.com/postgres-ai/nancy
Fully automated database experiments. THIS IS A MIRROR OF https://gitlab.com/postgres.ai/nancy
- Host: GitHub
- URL: https://github.com/postgres-ai/nancy
- Owner: postgres-ai
- License: bsd-3-clause
- Archived: true
- Created: 2018-04-24T17:59:33.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2020-07-31T19:06:29.000Z (over 5 years ago)
- Last Synced: 2024-11-24T09:34:00.421Z (over 1 year ago)
- Topics: aws-ec2, database-experiment, docker, postgres, sql-queries, workload
- Language: Shell
- Homepage: https://postgres.ai
- Size: 1.19 MB
- Stars: 104
- Watchers: 17
- Forks: 7
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
:warning: The Nancy CLI project is currently on hold. See details: https://gitlab.com/postgres-ai/nancy/-/issues/228
# About

Nancy helps you conduct automated database experiments.
The Nancy Command Line Interface is a unified way to manage automated
database experiments, either in the cloud or on-premises.
### What is a Database Experiment?
A database experiment is a set of actions performed to test:
* (a) specified SQL queries ("workload")
* (b) on specified machine / OS / Postgres version ("environment")
* (c) against specified database ("object")
* (d) with an optional change – some DDL or config change ("target" or "delta").
Any database experiment has two main goals:
* (1) validation – check that the specified workload is valid,
* (2) benchmark – perform deep SQL query analysis.
Database experiments are needed when you:
- add or remove indexes;
- want to validate a new DB schema change and estimate migration time;
- want to verify some query optimization ideas;
- tune database configuration parameters;
- do capacity planning and want to stress-test your DB in some environment;
- plan to upgrade your DBMS to a new major version;
- want to train an ML model related to DB optimization.
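To make the terminology concrete, here is a minimal sketch of how the four parts of an experiment could be laid out as shell variables and passed to `nancy run`. It uses only options that appear in the "Hello World!" section of this README. The variable names and the `run_experiment` helper are made up for illustration, and the environment part is assumed to be left at its defaults (local Docker).

```shell
#!/bin/sh
# Illustrative sketch: variable and function names are hypothetical.

# (c) object: the database to test against
object_dump="file://$(pwd)/sample.dump.bz2"
# (a) workload: the SQL queries to run
workload_sql="select i from hello_world where i between 10 and 20;"
# (d) delta: the change under test, plus the SQL that reverts it
delta_do="create index i_hello_world_i on hello_world(i);"
delta_undo="drop index i_hello_world_i;"
# (b) environment: defaults are assumed here (local Docker)

# Run the experiment with the given runner ("nancy" unless overridden).
run_experiment() {
  runner="${1:-nancy}"
  "$runner" run \
    --db-dump "$object_dump" \
    --workload-custom-sql "$workload_sql" \
    --delta-sql-do "$delta_do" \
    --delta-sql-undo "$delta_undo"
}

# Dry run: print the arguments instead of starting an experiment.
run_experiment echo
```

Calling `run_experiment` without an argument would invoke `nancy` itself; passing `echo` only prints the assembled arguments, which is a cheap way to review them first.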
# Currently Supported Features
* Works anywhere Docker can run (checked: Linux Ubuntu/Debian, macOS)
* Experiments are conducted in a Docker container with an extended Postgres setup
* Supported Postgres versions: 12 (default), 11, 10, 9.6
* Postgres config is specified via options and may be partial
* Supported locations for experimental runs:
    * Any machine with Docker installed
    * AWS EC2:
        * Runs on AWS EC2 Spot Instances (using Docker Machine)
        * Allows specifying the EC2 instance type
        * Auto-detects and uses the current lowest EC2 Spot Instance prices
        * Supports i3 instances (with NVMe SSD drives)
        * Supports arbitrary-size EBS volumes
* Supports local or remote (S3) files – config, dump, etc.
* The object (database) can be specified in various ways:
    * Plain text
    * Synthetic database generated by [pgbench](https://www.postgresql.org/docs/current/static/pgbench.html)
    * Dump file (.sql, .gz, .bz2)
* What to test (a.k.a. "target" or "delta"):
    * A change of Postgres parameters
    * A DDL change (specified as "do" and "undo" SQL, so that the original state can be restored)
* Supported types of workload:
    * Any custom SQL
    * Synthetic workload generated by [pgbench](https://www.postgresql.org/docs/current/static/pgbench.html)
    * "Real workload" based on Postgres logs (using [pgreplay](https://github.com/laurenz/pgreplay))
* For "real workload", allows replaying it at increased speed
* Allows keeping the container alive for a specified time after all steps are done
* Collected artifacts:
    * `pg_stat_statements` snapshot
    * `pg_stat_database`, ...
    * Workload SQL logs
    * Deep SQL query analysis report
# Requirements
1) To use Nancy CLI, you need Linux or macOS with Docker installed.
2) To run on AWS EC2 instances, you also need:
* AWS CLI https://aws.amazon.com/en/cli/
* Docker Machine https://docs.docker.com/machine/
* jq https://stedolan.github.io/jq/
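A quick way to see which of these tools are missing is to look them up on `PATH`. The sketch below is not part of Nancy; the `missing_tools` helper is hypothetical, and it assumes the usual executable names (`docker`, `aws`, `docker-machine`, `jq`).

```shell
#!/bin/sh
# Print the name of every given tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Needed for any run.
missing_local=$(missing_tools docker)
# Additionally needed for runs on AWS EC2.
missing_aws=$(missing_tools aws docker-machine jq)

if [ -n "$missing_local" ]; then
  echo "Missing for local runs:" $missing_local
fi
if [ -n "$missing_aws" ]; then
  echo "Missing for AWS EC2 runs:" $missing_aws
fi
```

The check only confirms that an executable with that name exists; it does not verify versions or whether the Docker daemon is running.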
# Installation
NOTICE: The [Additional notes](#additional-notes) section contains
instructions that are useful in case of Docker-related errors during `nancy run` calls.
Alternatively, see Docker's official [post-installation instructions for Linux](https://docs.docker.com/install/linux/linux-postinstall/).

In the minimal configuration, only a few steps are needed:
1) Install Docker
Ubuntu/Debian:
```shell
# On Ubuntu/Debian, the Docker engine is packaged as docker.io
sudo apt-get -y install docker.io
sudo systemctl enable docker
sudo systemctl start docker
```
RHEL7:
```shell
sudo yum -y install docker
sudo systemctl enable docker
sudo systemctl start docker
```
macOS (assuming that [Homebrew](https://brew.sh/) is installed):
```shell
brew install docker
```
See also: https://docs.docker.com/docker-for-mac/install/
2) Clone this repo and adjust `$PATH`:
```shell
git clone https://gitlab.com/postgres.ai/nancy.git
# $(pwd) is expanded now; \$PATH is written literally and expanded at login
echo "export PATH=\$PATH:$(pwd)/nancy" >> ~/.bashrc
source ~/.bashrc
```
3) Install jq
- Ubuntu/Debian: `sudo apt-get -y install jq`
- CentOS/RHEL: `sudo yum install jq`
- macOS: `brew install jq`
Additionally, to allow use of AWS EC2 instances:
4) Install AWS CLI https://docs.aws.amazon.com/cli/latest/userguide/installing.html
5) Install Docker Machine tools https://docs.docker.com/machine/install-machine/
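After step 2, it can be worth confirming that the cloned directory actually ended up on `PATH` in the current shell. The following is a minimal sketch; the `dir_on_path` helper is hypothetical, and it assumes the repository was cloned into the current directory, as in the commands above.

```shell
#!/bin/sh
# Succeed if directory $1 is listed in the colon-separated list $2
# (defaults to the current $PATH).
dir_on_path() {
  case ":${2-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: the repo was cloned into the current directory.
nancy_dir="$(pwd)/nancy"

if dir_on_path "$nancy_dir"; then
  echo "OK: $nancy_dir is on PATH"
else
  echo "Not on PATH yet: $nancy_dir (open a new shell or re-source ~/.bashrc)"
fi
```

Wrapping the list in colons on both sides lets the pattern match whole entries only, so `/usr` does not match an entry such as `/usr/bin`.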
# Getting started
Start with these commands:
```shell
nancy help
nancy run help
```
# "Hello World!"
Locally, on any Linux or macOS machine:
```shell
echo "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
| bzip2 > ./sample.dump.bz2

# "Clean run": without an index
# (a sequential scan is expected; total time ~150 ms, depending on resources)
nancy run \
  --db-dump "file://$(pwd)/sample.dump.bz2" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"

# Now check how a regular btree index affects performance
# (expected total time: ~0.05 ms)
nancy run \
  --db-dump "file://$(pwd)/sample.dump.bz2" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;" \
  --delta-sql-do "create index i_hello_world_i on hello_world(i);" \
  --delta-sql-undo "drop index i_hello_world_i;"
```
AWS EC2:
```shell
nancy run \
--run-on aws \
--aws-ec2-type "i3.large" \
--aws-keypair-name awskey \
--aws-ssh-key-path "file://$HOME/.ssh/awskey.pem" \
--db-dump "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
--workload-custom-sql "select i from hello_world where i between 10 and 20;"
```
# Additional notes
On Linux, if `nancy run` fails when run locally inside `screen` or
`tmux`, double-check that Docker is running and add your user to the `docker`
group, as described below. See also: https://docs.docker.com/install/linux/linux-postinstall/.
Ubuntu/Debian:
```shell
sudo usermod -aG docker "${USER}"
newgrp docker
```
CentOS/RHEL:
```shell
sudo usermod -aG dockerroot ${USER}
newgrp dockerroot
```
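To check whether these changes have taken effect in the current session, something like the sketch below may help. It is only an illustration: the `in_group` helper is hypothetical, and the group names are the ones used in the commands above.

```shell
#!/bin/sh
# Succeed if the current session's user belongs to group $1.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Is the Docker daemon installed, running, and reachable by this user?
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  echo "Docker daemon is reachable"
else
  echo "Docker is not installed, not running, or not accessible to this user"
fi

# "docker" on Ubuntu/Debian, "dockerroot" on CentOS/RHEL.
for group in docker dockerroot; do
  if in_group "$group"; then
    echo "Current user is in group: $group"
  fi
done
```

Because `id -nG` is called without a user name, it reports the groups of the running session, so a membership added with `usermod` shows up only after `newgrp` or a fresh login.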
On macOS, it is recommended to specify `--tmp-path` explicitly, for example:
```shell
# -p keeps this from failing if ./tmp already exists
mkdir -p ./tmp
nancy run ... --tmp-path "$(pwd)/tmp"
```