{"id":43352062,"url":"https://github.com/hchargois/esdump","last_synced_at":"2026-02-02T02:30:44.137Z","repository":{"id":225735453,"uuid":"766709380","full_name":"hchargois/esdump","owner":"hchargois","description":"Dump an Elasticsearch (v7 or v8) index via scrolling. Achieve speeds up to 1 million doc/s on a single node","archived":false,"fork":false,"pushed_at":"2026-01-31T02:30:44.000Z","size":41,"stargazers_count":8,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-31T16:54:19.175Z","etag":null,"topics":["backup","cli","elasticsearch","export","json"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hchargois.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-04T01:04:38.000Z","updated_at":"2026-01-31T02:30:47.000Z","dependencies_parsed_at":"2024-06-21T13:04:19.689Z","dependency_job_id":null,"html_url":"https://github.com/hchargois/esdump","commit_stats":null,"previous_names":["hchargois/esdump"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/hchargois/esdump","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchargois%2Fesdump","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchargois%2Fesdump/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchargois%2Fesdump/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchargois%2Fesdump/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/hchargois","download_url":"https://codeload.github.com/hchargois/esdump/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchargois%2Fesdump/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29001654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-02T01:32:03.847Z","status":"online","status_checked_at":"2026-02-02T02:00:07.448Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backup","cli","elasticsearch","export","json"],"created_at":"2026-02-02T02:30:44.059Z","updated_at":"2026-02-02T02:30:44.129Z","avatar_url":"https://github.com/hchargois.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# esdump\n\n**esdump** is a _simple_ and _efficient_ CLI tool to dump (retrieve) the documents contained in an Elasticsearch index via scrolling.\n\nIt outputs to standard output, in JSON lines (a.k.a. 
JSONL or NDJSON) format.\n\nIt works with Elasticsearch versions 7.x and 8.x.\n\n# Features\n\n- Correctly clears the scroll contexts before exiting, to free Elasticsearch resources\n- Automatically uses slicing for best performance\n- You can specify which document fields to output\n- Options to scroll in a random order and only a given number of docs, to easily get samples\n- You can specify a query in full Elasticsearch format or in the `query_string` (a.k.a. Lucene) syntax\n- Adaptive throttling\n- Works with security (username/password \u0026 HTTPS with custom certificate)\n- Option to turn HTTP gzip compression off (uses less CPU if network is not a bottleneck)\n- ... and more!\n\n# Install\n\nRequires Go \u003e= 1.18\n\n    go install github.com/hchargois/esdump@latest\n\n# Examples\n\nUsage is detailed in sections below, but here are a few simple examples:\n\n    # Dump an index to a file using default settings\n    esdump http://localhost myindex \u003e out.jsonl\n\n    # Dump multiple indexes using multi-target notation\n    esdump http://localhost myindex1,myindex2*\n\n    # Select the document fields to dump\n    esdump http://localhost myindex --fields id,date,description\n\n    # Specify a query in \"query_string\" format\n    esdump http://localhost myindex --query \"rabbit OR bunny\"\n\n    # Specify a search query on standard input\n    echo '{\"query\": {\"term\": {\"animal\": \"rabbit\"}}}' | esdump http://localhost myindex\n\n    # Dump a random sample of 1000 documents\n    esdump http://localhost myindex --random --count 1000\n\n    # Access an Elasticsearch server secured with TLS (with a custom cert) and username/password\n    esdump https://user:pass@localhost myindex --verify=cacert.pem\n\n# Usage\n\n    esdump base-url index-target [flags]\n\n    Arguments:\n\n      base-url      The base URL of the Elasticsearch server (e.g. 
http://localhost)\n                    If the port is not specified, 9200 is assumed\n      index-target  The name of the index you want to dump. Multi-target syntax is\n                    also supported (e.g. myindex1,myindex2 or myindex*)\n    \n    Flags:\n    \n      -f, --fields string             comma-separated list of fields to include in the output, or if starting with ^ to exclude\n      -q, --query string              filter the documents with a \"query_string\" query\n      -t, --throttle float32          delay factor for adaptive throttling, set 0 to disable throttling (default 4)\n      -n, --count uint                output that many documents maximum (default unlimited)\n      -s, --scroll-size int           number of hits per scroll request (default 1000)\n      -m, --metadata                  include hit metadata (_index, _id, _source...), if not set only outputs the contents of _source\n      -M, --metadata-only             only include hit metadata (_index, _id...), no _source\n      -r, --random                    dump the documents in a random order\n      -z, --no-compression            disable HTTP gzip compression\n          --verify string             certificate file to verify the server's certificate, or \"no\" to skip all TLS verification\n          --slices int                max number of slices per index (default 10)\n          --scroll-timeout duration   scroll timeout (default 1m0s)\n          --http-timeout duration     HTTP client timeout (default 1m0s)\n\n# How to...\n\n## Filter the documents to dump\n\nThere are two options to selectively dump a subset of the index documents.\n\nThe simple one is using the `-q` flag with a \"query string\" query, described here https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html\n\nThe more advanced one is supplying a full Elasticsearch query to standard input. 
The format to use in this case is the format that is suitable for the `/_search` Elasticsearch endpoint, i.e. a JSON object containing a `\"query\"` key. You can also use this to set a custom sort order.\n\n## Choose what to dump\n\nBy default, esdump dumps only the documents, i.e. the contents of the `\"_source\"` in the Elasticsearch hits:\n\n    {\"title\": \"lorem ipsum\", \"price\": 1.23}\n    {\"title\": \"dolor sit amet\", \"price\": 4.56}\n    {\"title\": \"consectetur adipiscing elit\", \"price\": 7.89}\n\nWith the `-m`/`--metadata` flag, the documents are wrapped with their metadata:\n\n    {\"_index\": \"items\", \"_id\": \"1\", \"_source\": {\"title\": \"lorem ipsum\", \"price\": 1.23}}\n    {\"_index\": \"items\", \"_id\": \"2\", \"_source\": {\"title\": \"dolor sit amet\", \"price\": 4.56}}\n    {\"_index\": \"items\", \"_id\": \"3\", \"_source\": {\"title\": \"consectetur adipiscing elit\", \"price\": 7.89}}\n\nWith the `-M`/`--metadata-only` flag, only the metadata is output, which is the fastest way to retrieve only the `_id`:\n\n    {\"_index\": \"items\", \"_id\": \"1\"}\n    {\"_index\": \"items\", \"_id\": \"2\"}\n    {\"_index\": \"items\", \"_id\": \"3\"}\n\nThe `-f`/`--fields` flag can be used to specify a set of fields to include or exclude:\n\n* `-f a,b,c` will only output the fields a, b and c\n* `-f ^a,b,c` will output all the fields except a, b and c\n\n## Adjust the load on the server with adaptive throttling\n\nesdump uses a very simple but effective throttling algorithm that automatically adapts to the capabilities and current load of the Elasticsearch cluster.\n\nInstead of using the usual token bucket with a fixed rate, or a fixed delay between requests, you set a _relative_ throttle factor: the delay before each request is that factor multiplied by the time taken by the previous request.\n\nFor example, if you set a throttle factor of 10 (`-t10`), and the first scroll request takes 10 ms, then esdump will sleep for 100 ms before sending the next scroll request. 
If the next request takes only 5 ms, esdump will then sleep for 50 ms. This ensures that the scroll requests take a constant proportion of the cluster's load, even if it becomes more or less loaded for other reasons.\n\nBy default, a throttle factor of 4 is used.\n\nTo completely disable throttling, set a 0 throttle factor (`-t0`).\n\n## Go fast\n\n* Disable throttling with `-t0`\n* Select only the required fields with `-f`...\n* ... or if you only need the document `_id`s, use `-M` to only retrieve the metadata\n* If the network is fast, try disabling the gzip compression with `-z` (automatically done if the server is on a loopback address)\n* Increase the maximum number of slices with `--slices`; but this will only have an effect if your index has at least as many shards\n* Do not use random scrolling (no `-r`); do not specify a custom `sort` order in a query supplied on stdin\n\n## Work with a secured Elasticsearch cluster\n\nIf the cluster uses TLS, make sure to use the `HTTPS` scheme in the URL:\n\n    https://localhost\n\nBy default, your host's CA bundle is used to verify the server certificate. To specify a custom trusted certificate, use `--verify`:\n\n    esdump https://localhost myindex --verify cert.pem\n\n... 
or turn off certificate validation with `--verify no`\n\nTo use a username and password, simply include them in the URL:\n\n    https://username:password@localhost\n\n# Benchmark\n\nAgainst what seems to be the most popular alternative, [elasticsearch-dump/elasticsearch-dump](https://github.com/elasticsearch-dump/elasticsearch-dump) (7k+ stars!)\n\nDumping an index that has 2 million documents of around 500 bytes each, in 10 shards, on a single node (a desktop computer) on localhost.\n\n| command | time | speed |\n| ------- | ---- | ----- |\n| `elasticdump --input=http://localhost:9200/testindex1 --output=out.json` | 5 hours 33 min | 100 docs/s |\n| `esdump http://localhost testindex1 \u003e out.json` (default, 4x throttling) | 7.28 s | 274 774 docs/s |\n| `esdump http://localhost testindex1 -t0 \u003e out.json` (no throttling) | 1.67 s | 1 199 154 docs/s |\n\nNo, there is no error. `esdump` really is _ten thousand times_ faster than `elasticdump`.\nThe (main) reason is that `elasticdump` has an insane throttling of 5 requests every 5 seconds.\nAnd the worst part of it? It is hardcoded, so it can't be configured, and it's also undocumented.\nYep. It's that bad.\n\n# Alternatives\n\nYou may ask, _surely_ there must already exist such a tool, right? Well, I searched for a few hours and couldn't find something that \"just worked\". 
So I had to make my own...\n\n* elasticsearch-dump/elasticsearch-dump: apart from the throttling issue described above, it also doesn't use slicing, and doesn't clear scroll contexts...\n* [miku/esdump](https://github.com/miku/esdump): does not produce JSONL, cannot specify an Elasticsearch query (only a Lucene one), no slicing...\n* [wubin1989/esdump](https://github.com/wubin1989/esdump): seems that it cannot actually dump but only reindex into another Elasticsearch index?\n* [shinexia/elasticdump](https://github.com/shinexia/elasticdump): cannot dump to stdout, doesn't use slicing...\n\nThere are also a few that are simply too old and don't work with recent (\u003e=7.x) Elasticsearch versions:\n\n* [wricardo/esdump](https://github.com/wricardo/esdump)\n* [berglh/escroll](https://github.com/berglh/escroll)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhchargois%2Fesdump","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhchargois%2Fesdump","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhchargois%2Fesdump/lists"}