{"id":19200407,"url":"https://github.com/redisearch/ftsb","last_synced_at":"2026-04-22T12:11:29.230Z","repository":{"id":40292752,"uuid":"197765029","full_name":"RediSearch/ftsb","owner":"RediSearch","description":"Full Text Search Benchmark, a tool for comparing and evaluating full-text search engines.","archived":false,"fork":false,"pushed_at":"2025-04-09T16:37:22.000Z","size":51666,"stargazers_count":21,"open_issues_count":28,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-20T12:41:15.063Z","etag":null,"topics":["benchmark","redis","redisearch","search-engines"],"latest_commit_sha":null,"homepage":"https://redisearch.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RediSearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-07-19T11:58:39.000Z","updated_at":"2025-04-09T16:36:39.000Z","dependencies_parsed_at":"2023-12-07T23:24:26.962Z","dependency_job_id":"0fbf758e-f7bb-4383-9d76-9c651f9f9d8e","html_url":"https://github.com/RediSearch/ftsb","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RediSearch%2Fftsb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RediSearch%2Fftsb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RediSearch%2Fftsb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RediSearch%2Fftsb/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RediSearch","download_url":"https://codeload.github.com/RediSearch/ftsb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253180865,"owners_count":21866987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","redis","redisearch","search-engines"],"created_at":"2024-11-09T12:32:51.921Z","updated_at":"2026-04-22T12:11:29.199Z","avatar_url":"https://github.com/RediSearch.png","language":"Python","readme":"[![license](https://img.shields.io/github/license/RediSearch/ftsb.svg)](https://github.com/RediSearch/ftsb)\n[![CircleCI](https://circleci.com/gh/RediSearch/ftsb/tree/master.svg?style=svg)](https://circleci.com/gh/RediSearch/ftsb/tree/master)\n[![GitHub issues](https://img.shields.io/github/release/RediSearch/ftsb.svg)](https://github.com/RediSearch/ftsb/releases/latest)\n[![Codecov](https://codecov.io/gh/RediSearch/ftsb/branch/master/graph/badge.svg)](https://codecov.io/gh/RediSearch/ftsb)\n[![Go Report Card](https://goreportcard.com/badge/github.com/RediSearch/ftsb)](https://goreportcard.com/report/github.com/RediSearch/ftsb)\n[![GoDoc](https://godoc.org/github.com/RediSearch/ftsb?status.svg)](https://godoc.org/github.com/RediSearch/ftsb)\n\n# Full-Text Search Benchmark (FTSB)\n [![Forum](https://img.shields.io/badge/Forum-RediSearch-blue)](https://forum.redislabs.com/c/modules/redisearch/) \n[![Discord](https://img.shields.io/discord/697882427875393627?style=flat-square)](https://discord.gg/xTbqgTB)\n\nThis repo contains code for benchmarking full text 
search databases,\nincluding RediSearch.\nThis code is based on a fork of work initially made public by TSBS\nat https://github.com/timescale/tsbs.\n\n\n\n## Overview\nThe Full-Text Search Benchmark (FTSB) is a collection of Python and Go programs that are used to generate datasets (Python) and then benchmark (Go) read and write performance of various databases. The intent is to make the FTSB extensible so that a variety of use cases (e.g., ecommerce, jsondata, logs, etc.), query types, and databases can be included and benchmarked.\nTo this end, we hope to help SAs and prospective database administrators find the best database for their needs and their workloads.\n\n\n## What the FTSB tests\n\nFTSB is used to benchmark bulk load performance and query execution performance. To accomplish this in a fair way, the data to be inserted and the queries to run are always pre-generated and native Go clients are used wherever possible to connect to each database.\n\n## Current databases supported\n\n+ RediSearch\n\n### Current use cases\n\nCurrently, FTSB supports four use cases: \n- **nyc_taxis** [[details here](docs/nyc_taxis-benchmark/description.md)]. This benchmark focuses on write performance, using TLC Trip Record Data that contains the rides performed in yellow cab taxis in New York in 2015. The benchmark loads over 12M documents.\n\n- **enwiki-abstract** [[details here](docs/enwiki-abstract-benchmark/description.md)], from English-language [Wikipedia:Database](https://en.wikipedia.org/wiki/Wikipedia:Database_download) page abstracts. This use case generates 3 `TEXT` fields per document, and focuses on full-text query performance.\n\n- **enwiki-pages** [[details here](docs/enwiki-pages-benchmark/description.md)], from English-language [Wikipedia:Database](https://en.wikipedia.org/wiki/Wikipedia:Database_download) last page revisions, containing processed metadata extracted from the full Wikipedia XML dump. 
This use case generates 3 `TEXT` fields per document, and focuses on full-text query performance.\n\n- **ecommerce-inventory** [[details here](docs/ecommerce-inventory-benchmark/description.md)], from a base dataset of [10K fashion products on Amazon.com](https://data.world/promptcloud/fashion-products-on-amazon-com/workspace/file?filename=amazon_co-ecommerce_sample.csv) which are then multiplexed by categories, sellers, and countries to produce larger datasets (\u003e 1M documents). This benchmark focuses on update and aggregate performance, splitting the performance numbers into Reads (FT.AGGREGATE), Cursor Reads (FT.CURSOR), and Updates (FT.ADD). \nThe use case generates an index with 10 `TAG` fields (3 sortable and 1 non-indexed), and 16 `NUMERIC` sortable non-indexed fields per document.\nThe aggregate queries are designed to be extremely costly in both computation and network TX, given that each query aggregates and filters a large portion of the dataset while additionally loading 21 fields. 
Both the update and read rates can be adjusted.\n\n\n\n### Installation\n\n#### Download Standalone binaries ( no Golang needed )\n\nIf you don't have Go on your machine and just want to use the produced binaries, you can download the following prebuilt binaries:\n\nhttps://github.com/RediSearch/ftsb/releases/latest\n\n| OS | Arch | Link |\n| :---         |     :---:      |          ---: |\n| Linux   | amd64  (64-bit X86)     | [ftsb_redisearch_linux_amd64](https://github.com/RediSearch/ftsb/releases/latest/download/ftsb_redisearch_linux_amd64.tar.gz)    |\n| Linux   | arm64 (64-bit ARM)     | [ftsb_redisearch_linux_arm64](https://github.com/RediSearch/ftsb/releases/latest/download/ftsb_redisearch_linux_arm64.tar.gz)    |\n| Darwin   | amd64  (64-bit X86)     | [ftsb_redisearch_darwin_amd64](https://github.com/RediSearch/ftsb/releases/latest/download/ftsb_redisearch_darwin_amd64.tar.gz)    |\n| Darwin   | arm64 (64-bit ARM)     | [ftsb_redisearch_darwin_arm64](https://github.com/RediSearch/ftsb/releases/latest/download/ftsb_redisearch_darwin_arm64.tar.gz)    |\n\nHere's a bash script to download and try it:\n\n```bash\nwget -c https://github.com/RediSearch/ftsb/releases/latest/download/ftsb_redisearch-$(uname -mrs | awk '{ print tolower($1) }')-$(dpkg --print-architecture).tar.gz -O - | tar -xz\n\n# give it a try\n./ftsb_redisearch --help\n```\n\n\n#### Installation in a Golang env\n\nTo install the benchmark utility in a Go environment, do as follows:\n\n```bash\n# Fetch FTSB and its dependencies\ngo get github.com/RediSearch/ftsb\ncd $GOPATH/src/github.com/RediSearch/ftsb\n\n# Install desired binaries. 
At a minimum this includes ftsb_redisearch binary:\nmake\n\n# give it a try\n./bin/ftsb_redisearch --help\n```\n\n\n\n## How to use it?\n\nUsing FTSB for benchmarking involves 2 phases: data and query generation, and query execution.\n\n\n### Data and query generation ( single time step )\n\nSo that benchmarking results are not affected by generating data or queries on-the-fly, with FTSB you generate the data and queries you want to benchmark first, and then you can (re-)use them as input to the benchmarking phase. You can either use one of the pre-baked benchmark suites or develop one of your own. The requirement is that all generated benchmark input files respect the following:\n\n- CSV format, with one command per line. \n\n- On each line, the first three columns are the query type (READ, WRITE, UPDATE, DELETE, SETUP_WRITE), the query group ( any unique identifier you like, for example Q1 ), and the key position. \n\n- The fourth column onward holds the command and the command arguments themselves, with one column per command argument. \n\nHere is an example of a CSV line:\n```\nWRITE,U1,2,FT.ADD,idx,doc1,1.0,FIELDS,title,hello world\n```\nwhich will translate to the following command being issued:\n```\nFT.ADD idx doc1 1.0 FIELDS title \"hello world\"\n```\nThe following links deep dive on:\n\n- Generating inputs from pre-baked benchmark suites (ecommerce-inventory, enwiki-abstract, enwiki-pages) \n\n- Generating your own use cases \n\nApart from the CSV files, there is an optional benchmark suite specification that lets you describe the benchmark in detail, what key metrics it provides, and how to automatically run more complex suites (with several steps, etc…). For a simple benchmark, you just need to feed the CSV file as input. 
\n\n\n### Query execution ( benchmarking )\n\nSo that benchmarking results are not affected by generating data or queries on-the-fly, you are always required to feed an input file to the benchmark runner that respects the previous specification format. The overall idea is that the benchmark runner only concerns itself with executing the queries as fast as possible while enabling client runtime variations that influence performance ( and are not related to the use case itself ), such as command pipelining ( auto pipelining based on time or number of commands ), cluster support, number of concurrent clients, rate limiting ( to find sustainable throughputs ), etc… \n\nRunning a benchmark is as simple as feeding an input file to the DB benchmark runner ( in this case ftsb_redisearch ):\n\n```bash\n\nftsb_redisearch --file ecommerce-inventory.redisearch.commands.BENCH.csv\n```\n\n\nThe resulting stdout output will look similar to this:\n\n```bash\n$ ftsb_redisearch --file ecommerce-inventory.redisearch.commands.BENCH.csv \n    setup writes/sec          writes/sec         updates/sec           reads/sec    cursor reads/sec         deletes/sec     current ops/sec           total ops             TX BW/sRX BW/s\n          0 (0.000)           0 (0.000)        1571 (2.623)         288 (7.451)           0 (0.000)           0 (0.000)        1859 (3.713)                1860             3.1KB/s  1.4MB/s\n          0 (0.000)           0 (0.000)        1692 (2.627)         287 (7.071)           0 (0.000)           0 (0.000)        1979 (3.597)                3839             3.3KB/s  1.4MB/s\n          0 (0.000)           0 (0.000)        1571 (2.761)         293 (7.087)           0 (0.000)           0 (0.000)        1864 (3.679)                5703             3.1KB/s  1.4MB/s\n          0 (0.000)           0 (0.000)        1541 (2.983)         280 (7.087)           0 (0.000)           0 (0.000)        1821 (3.739)                7524             3.1KB/s  1.4MB/s\n          0 (0.000) 
          0 (0.000)        1441 (2.989)         255 (7.375)           0 (0.000)           0 (0.000)        1696 (3.773)                9220             2.8KB/s  1.3MB/s\n\nSummary:\nIssued 9885 Commands in 5.455sec with 8 workers\n        Overall stats:\n        - Total 1812 ops/sec                    q50 lat 3.819 ms\n        - Setup Writes 0 ops/sec                q50 lat 0.000 ms\n        - Writes 0 ops/sec                      q50 lat 0.000 ms\n        - Reads 276 ops/sec                     q50 lat 7.531 ms\n        - Cursor Reads 0 ops/sec                q50 lat 0.000 ms\n        - Updates 1536 ops/sec                  q50 lat 3.117 ms\n        - Deletes 0 ops/sec                     q50 lat 0.000 ms\n        Overall TX Byte Rate: 3KB/sec\n        Overall RX Byte Rate: 1.4MB/sec\n```\n\n\nApart from the input file, you should also always specify the name of JSON output file to output benchmark results, in order to do more complex analysis or store the results. Here is the full list of supported options:\n\n```bash\n$ ./ftsb_redisearch --help\nUsage of ./bin/ftsb_redisearch:\n  -a string\n        Password for Redis Auth.\n  -cluster-mode\n        If set to true, it will run the client in cluster mode.\n  -continue-on-error\n        If set to true, it will continue the benchmark and print the error message to stderr.\n  -debug int\n        Debug printing (choices: 0, 1, 2). (default 0)\n  -do-benchmark\n        Whether to write databuild. Set this flag to false to check input read speed. (default true)\n  -host string\n        The host:port for Redis connection (default \"localhost:6379\")\n  -input string\n        File name to read databuild from\n  -json-out-file string\n        Name of json output file to output benchmark results. If not set, will not print to json.\n  -max-rps uint\n        enable limiting the rate of queries per second, 0 = no limit. By default no limit is specified and the binaries will stress the DB up to the maximum. 
A normal \"modus operandi\" would be to initially stress the system ( no limit on RPS) and afterwards that we know the limit vary with lower rps configurations.\n  -metadata-string string\n        Metadata string to add to json-out-file. If -json-out-file is not set, will not use this option.\n  -pipeline int\n        Pipeline \u003cnumreq\u003e requests. Default 1 (no pipeline). (default 1)\n  -reporting-period duration\n        Period to report write stats (default 1s)\n  -requests uint\n        Number of total requests to issue (0 = all of the present in input file).\n  -workers uint\n        Number of parallel clients inserting (default 8)\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredisearch%2Fftsb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fredisearch%2Fftsb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredisearch%2Fftsb/lists"}