{"id":13582109,"url":"https://github.com/timescale/benchmark-postgres","last_synced_at":"2025-05-07T04:58:46.704Z","repository":{"id":44158957,"uuid":"99824257","full_name":"timescale/benchmark-postgres","owner":"timescale","description":"Tools for benchmarking TimescaleDB vs PostgreSQL","archived":false,"fork":false,"pushed_at":"2018-03-12T19:37:33.000Z","size":11,"stargazers_count":40,"open_issues_count":1,"forks_count":9,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-05-07T04:58:41.451Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timescale.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-09T15:32:56.000Z","updated_at":"2025-03-16T16:03:58.000Z","dependencies_parsed_at":"2022-07-30T10:08:06.368Z","dependency_job_id":null,"html_url":"https://github.com/timescale/benchmark-postgres","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timescale%2Fbenchmark-postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timescale%2Fbenchmark-postgres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timescale%2Fbenchmark-postgres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timescale%2Fbenchmark-postgres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timescale","download_url":"https://codeload.github.com/timescale/benchmark-postgres/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252816948,"owners_count":21808704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:02:25.954Z","updated_at":"2025-05-07T04:58:46.684Z","avatar_url":"https://github.com/timescale.png","language":"Go","readme":"# TimescaleDB vs PostgreSQL Benchmark\n\n## Introduction\nThis repository contains a collection of Go programs that can be used to\nbenchmark [TimescaleDB][timescaledb] against PostgreSQL on insert,\nquery, and deletion (data retention) performance. Additionally, we\nprovide a [data set and queries][dataset] to allow you to measure\nperformance [on the same data we have measured][blog].\n\n## Getting Started\n You will need the Go runtime (1.6+) installed on the machine\n you wish to benchmark. You can access this repo via `go get`:\n ```bash\n go get github.com/timescale/benchmark-postgres\n ```\n\nThere are three programs available for installation under the `cmd`\ndirectory, each of which can be installed with `go install`:\n```bash\n# Change to program directory\ncd $GOPATH/src/github.com/timescale/benchmark-postgres/cmd/timescaledb-benchmark-query\ngo get .\ngo install\n\n# Repeat for other programs\n```\n\n## Our Dataset\n\nIn a [discussion of how TimescaleDB compares to PostgreSQL][blog], we\nused two datasets: one with 100M rows of CPU metrics and one with 1B\nrows. 
We have made the [100M row dataset available][dataset]\n(the link downloads a 7GB archive) and use\nit throughout this README as an example.\nIn addition to the data in CSV format, the archive\nalso contains a file to create the table schema we used and a selection\nof queries we tested.\n\nThe rows represent CPU metrics for 4000 hosts over the course of 3 days,\n`2016-01-01` through `2016-01-03`. Each row consists of a timestamp, a\nhost identifier, and 10 CPU metrics. All hosts have a row every 10\nseconds for the duration of the 3 days, leading to just over 100M rows\nof data. The CSV is **20GB** and, when imported, the database\nis **~30GB**.\n\nTo unpack the archive:\n```bash\ntar -vxjf benchmark_postgres.tar.bz2\n```\n\nThis will unpack the following files into your current directory:\n\n* `cpu-data.csv`\n* `benchmark-setup-timescaledb.sql`\n* `benchmark-setup-postgresql.sql`\n* `queries-1-host-12-hr.sql`\n* `queries-8-host-1-hr.sql`\n* `queries-groupby-orderby-limit.sql`\n* `queries-groupby.sql`\n\n## Usage\n\n### Benchmark: Inserts (timescaledb-parallel-copy)\nIn the `cmd` folder is a Git submodule to our [parallel copy][] program\nthat is generally available. This program can also double as a way\nto benchmark insert performance in either TimescaleDB or PostgreSQL.\nMake sure it is installed (see above) and you are ready to go.\n\nUsing our 100M dataset, first you need to set up the database and tables.\nCreate a database in PostgreSQL called, e.g., `benchmark`. Then, set up\nthe tables using our provided schema:\n```bash\n# To set up a TimescaleDB hypertable\npsql -d benchmark \u003c benchmark-setup-timescaledb.sql\n# To set up a plain PostgreSQL table\npsql -d benchmark \u003c benchmark-setup-postgresql.sql\n```\n\nNote that you can set up both tables in the same database for easy\ncomparisons. 
To measure insert performance, run\n`timescaledb-parallel-copy` with the `--verbose` flag, and optionally\na `--reporting-period` to get in-progress results:\n```bash\n# For TimescaleDB, report every 30s\ntimescaledb-parallel-copy --db-name=benchmark --table=cpu_ts \\\n    --verbose --reporting-period=30s --file=cpu-data.csv\n\n# For PostgreSQL, report every 30s\ntimescaledb-parallel-copy --db-name=benchmark --table=cpu_pg \\\n    --verbose --reporting-period=30s --file=cpu-data.csv\n```\n\nOnce the copy is finished, you'll be given an average number of rows\nper second over the whole insertion process. If you included a\n`--reporting-period`, you can also see how the performance changes over\ntime.\n\n```bash\nat 20s, row rate 137950.621224/sec (period), row rate 137950.621224/sec (overall), 2.760000E+06 total rows\nat 40s, row rate 106634.481501/sec (period), row rate 122305.233000/sec (overall), 4.890000E+06 total rows\n...\nat 14m40s, row rate 97544.843755/sec (period), row rate 112289.740923/sec (overall), 9.877000E+07 total rows\nat 15m0s, row rate 134560.614457/sec (period), row rate 112784.651475/sec (overall), 1.014600E+08 total rows\nCOPY 103680000, took 15m18.895179s with 8 worker(s) (mean rate 112831.150209/sec)\n```\n\n### Benchmark: Queries (timescaledb-benchmark-query)\n\nTo benchmark query latency, we provide `timescaledb-benchmark-query`,\nwhich takes a file of queries (one per line) and runs them in\nparallel. 
Each query is run twice, to generate 'cold' and 'warm'\nmeasurements for each, and the averages of both are printed after\nall the queries are run:\n```bash\ntimescaledb-benchmark-query --db-name=benchmark --table=cpu_ts \\\n    --query-file=queries-1-host-12-hr.sql --workers=1\n\navg of 'cold' 10 queries (ms):    19.90\navg of 'warm' 10 queries (ms):    14.50\n```\n\nYou can compare these numbers to PostgreSQL by changing the `--table`\nflag to point to a plain PostgreSQL table.\n\nIncluded in our 100M row dataset are queries of 4 types:\n\n* `1-host-12-hr`: Returns the max CPU usage per minute for 12 hours on one host\n* `8-host-1-hr`: Same as above except for 8 hosts over 1 hour\n* `groupby`: Returns the avg CPU usage per host per hour over 24 hours\n* `groupby-orderby-limit`: Returns the last 5 max CPU usage per minute across all devices with a random time range end point\n\n_Note: The queries are missing the table name (replaced with %s), which\nis filled in later by the `--table` flag to `timescaledb-benchmark-query`._\n\n### Benchmark: Data Retention (timescaledb-benchmark-delete)\n\nOur final benchmark deals with measuring the cost of removing data after\nit falls outside of a retention period. TimescaleDB introduces a\nfunction called `drop_chunks()` to easily remove data older than a\ncertain date. Combined with the way TimescaleDB organizes and stores\ndata, this is much more efficient for removing data.\n\nTo measure this, we provide `timescaledb-benchmark-delete`, which can be\nused to delete data using `drop_chunks()` or SQL's `DELETE` command.\nNote that this program **does actually delete the data**, so if you\nare using it, make sure the data loss is okay (i.e. **DO NOT USE** on\nproduction data).\n\nThe program requires a start date from which to delete data, an amount\nto delete -- which should be equal to a chunk size for TimescaleDB,\nand the number of times to delete that amount of data. 
Our 100M row\ndataset begins on `2016-01-01` at midnight, so that is the start date\nwe'll use along with 12-hour chunks. So, to delete the first 3 chunks\nwe run:\n```bash\ntimescaledb-benchmark-delete --db-name=benchmark --table=cpu_ts \\\n    --start=\"2016-01-01T00:00:00Z\" --amount=\"12h\" --limit=3\n```\nThis will print out the command used and time it took to execute in\nmilliseconds:\n```bash\nSELECT drop_chunks('2016-01-01T12:00:00Z'::TIMESTAMPTZ, 'cpu_ts')\n66ms\n\nSELECT drop_chunks('2016-01-02T00:00:00Z'::TIMESTAMPTZ, 'cpu_ts')\n19ms\n\nSELECT drop_chunks('2016-01-02T12:00:00Z'::TIMESTAMPTZ, 'cpu_ts')\n19ms\n```\nTo benchmark the equivalent scenario on PostgreSQL requires you to\ndisable the use of `drop_chunks()`:\n```bash\ntimescaledb-benchmark-delete --db-name=benchmark --table=cpu_pg \\\n    --start=\"2016-01-01T00:00:00Z\" --amount=\"12h\" --limit=3 \\\n    --use-drop-chunks=false\n```\n\n[timescaledb]: https://github.com/timescale/timescaledb\n[dataset]: https://timescaledata.blob.core.windows.net/datasets/benchmark_postgres.tar.bz2\n[blog]: https://blog.timescale.com/timescaledb-vs-6a696248104e\n[parallel copy]: https://github.com/timescale/timescaledb-parallel-copy\n","funding_links":[],"categories":["Go"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimescale%2Fbenchmark-postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimescale%2Fbenchmark-postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimescale%2Fbenchmark-postgres/lists"}