# QPPNet in PyTorch [![DOI](https://zenodo.org/badge/267330400.svg)](https://zenodo.org/badge/latestdoi/267330400)
This repository contains a sample implementation of [Plan-Structured Deep Neural Network Models for Query Performance Prediction](https://arxiv.org/pdf/1902.00132.pdf), presented at VLDB 2019, along with code for training and testing on:
- TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with Postgres
The TPC-H data are generated with `./dbgen -s <scale factor>`.
The TPC-H queries are generated using `./qgen -r <seed>`, where varying the random seed produces different queries from each template.
Tables in PostgreSQL are created using [dss.ddl](https://github.com/gregrahn/tpch-kit/blob/master/dbgen/dss.ddl) and then indexed with [tpch-postgres-index-ddl.sql](https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/ddls/tpch-postgres-index-ddl.sql).
The query plan structure and query performance metrics are retrieved as JSON objects by running the generated queries with `EXPLAIN (ANALYZE, FORMAT JSON, VERBOSE)` prepended (a sketch follows this list).
- TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with [NoisePage](https://github.com/cmu-db/terrier)
- TPC-C queries and the smallbank dataset generated using [OLTP-Bench](https://github.com/oltpbenchmark/oltpbench) and benchmarked with [NoisePage](https://github.com/cmu-db/terrier)
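For concreteness, here is a minimal sketch of that plan-collection step, assuming a local PostgreSQL instance already loaded with the TPC-H tables. The `psycopg2` connection settings, example query, and output path are illustrative, not part of this repo.

```python
import json

import psycopg2  # assumed driver; any PostgreSQL client works

# Assumed connection settings for a local TPC-H database.
conn = psycopg2.connect(dbname="tpch")

def collect_plan(query_sql, out_path):
    """Run one generated TPC-H query under EXPLAIN and save the JSON plan."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN (ANALYZE, FORMAT JSON, VERBOSE) " + query_sql)
        # EXPLAIN ... FORMAT JSON returns a single row holding the plan as JSON.
        plan = cur.fetchone()[0]
    with open(out_path, "w") as f:
        json.dump(plan, f)

collect_plan("SELECT count(*) FROM lineitem;", "q_example.json")
```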
## Prerequisites
- Linux or macOS
- Python 3

## Getting Started
### Installation
- Clone the repo:
```
git clone https://github.com/rabbit721/QPPNet.git
cd QPPNet
```
- Install the required Python packages:
- For pip: `pip install -r requirements.txt`
- For conda: `conda env create -f environment.yml`

### Getting the Datasets
- TPC-H benchmarked with PostgreSQL:
```
# Under directory dataset/postgres_tpch_dataset
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch0p1g.zip && unzip psqltpch0p1g.zip
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch1g.zip && unzip psqltpch1g.zip
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch10g.zip && unzip psqltpch10g.zip
```
- TPC-H benchmarked with NoisePage:
Data files are already located under the directory [`dataset/terrier_tpch_dataset`](https://github.com/rabbit721/QPPNet/tree/master/dataset/terrier_tpch_dataset) as `execution_0p1G.csv`, `execution_1G.csv`, and `execution_10G.csv`.
- TPC-C and smallbank benchmarked with NoisePage:
Data files are already located under the directory [`dataset/oltp_dataset`](https://github.com/rabbit721/QPPNet/tree/master/dataset/oltp_dataset) as `tpcc_pipeline.csv` and `sb_pipeline.csv`.
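A quick way to sanity-check these bundled CSVs before training is to load one with pandas; the column layout is not documented here, so this sketch just prints whatever the file provides.

```python
import pandas as pd

# Inspect one of the NoisePage TPC-H execution traces.
df = pd.read_csv("dataset/terrier_tpch_dataset/execution_1G.csv")
print(df.shape)              # rows = recorded executions, cols = features/metrics
print(df.columns.tolist())   # column names (pass header=None if the file has no header row)
print(df.head())
```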
### Examples for Training a model
- On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with PostgreSQL
```
python3 main.py --dataset PSQLTPCH -s 0 -t 250000 --batch_size 128 -epoch_freq 1000 --lr 2e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/postgres_tpch_dataset/tpch1g/900-exp_res_by_temp/
```
- On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with NoisePage
```
python3 main.py --dataset TerrierTPCH -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 1e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/terrier_tpch_dataset/execution_1G.csv
```
- On TPC-C dataset generated using OLTP-Bench with SF=1 and benchmarked with NoisePage
```
python3 main.py --dataset OLTP -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 5e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/oltp_dataset/tpcc_pipeline.csv
```
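In PyTorch terms, the `--SGD`, `--lr`, `--step_size`, and `--scheduler` flags map onto a plain SGD optimizer with a step-decay learning-rate schedule. The sketch below shows that combination in isolation, with a placeholder model standing in for QPPNet's per-operator neural units; it is not the repo's actual training loop.

```python
import torch

model = torch.nn.Linear(32, 1)  # placeholder for an operator-level neural unit
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3)                # --SGD, --lr 2e-3
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000)  # --scheduler, --step_size 1000

for epoch in range(250000):  # -t 250000, assumed to be the total epoch count
    optimizer.zero_grad()
    batch = torch.randn(128, 32)       # --batch_size 128
    loss = model(batch).pow(2).mean()  # stand-in loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # decays lr by the default factor of 0.1 every 1000 epochs
```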
### Using a pre-trained model
- Getting a model trained for 4000 epochs on the TPC-H SF=1 dataset benchmarked with PostgreSQL:
```
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/psqltpch_epoch4000.zip
```
- Getting a model trained for 20000 epochs on the TPC-H SF=1 dataset benchmarked with NoisePage:
```
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/terriertpch_epoch20000.zip
```
- Getting a model trained for 10000 epochs on the TPC-C dataset generated with OLTP-Bench and benchmarked with NoisePage:
```
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/tpcc_epoch10000.zip
```
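The checkpoints above need to end up where the testing commands expect them. Here is a hedged, stdlib-only helper for fetching and unpacking one into `./saved_model`; the archive's internal layout is an assumption.

```python
import os
import urllib.request
import zipfile

URL = ("http://www.andrew.cmu.edu/user/jiejiao/data/qpp/"
       "trained_models/psqltpch_epoch4000.zip")

os.makedirs("saved_model", exist_ok=True)
zip_path, _ = urllib.request.urlretrieve(URL, "psqltpch_epoch4000.zip")
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("saved_model")  # assumes the zip unpacks the checkpoint files directly
```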
### Examples for Testing a trained model
When testing, make sure that trained models are saved in `./saved_model` and that the mean and range values of the training data are provided via the `--mean_range_dict` flag so that the input can be normalized.
The `-s` flag is interpreted as an integer and specifies the epoch number of the saved model to test.
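A sketch of how that normalization plausibly works, assuming `mean_range_dict.pickle` maps each feature name to its training-set mean and range (the keys and structure here are assumptions, not the repo's documented format):

```python
import pickle

with open("mean_range_dict.pickle", "rb") as f:
    mean_range = pickle.load(f)  # assumed: {feature_name: (mean, range), ...}

def normalize(feature_name, value):
    """Scale a raw input feature by the training data's mean and range."""
    mean, rng = mean_range[feature_name]
    return (value - mean) / rng if rng else 0.0
```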
- Testing a model trained for 4000 epochs on the TPC-H SF=1 dataset benchmarked with PostgreSQL, against the TPC-H SF=10 dataset benchmarked with PostgreSQL:
```
python3 main.py --test_time --dataset PSQLTPCH -s 4000 --mean_range_dict mean_range_dict.pickle --data_dir ./dataset/postgres_tpch_dataset/tpch10G/900-exp_res_by_temp/
```
- Testing a model trained for 10000 epochs on the TPC-C benchmark, against the smallbank dataset.
Please make sure that the model is saved in `./saved_model`:
```
python3 main.py --test_time --dataset OLTP -s 10000 --data_dir ./dataset/oltp_dataset/sb_pipeline.csv
```