{"id":13523093,"url":"https://github.com/medcl/esm","last_synced_at":"2025-12-15T20:34:56.525Z","repository":{"id":34981467,"uuid":"39061988","full_name":"medcl/esm","owner":"medcl","description":"An simple Elasticsearch migration tool.","archived":false,"fork":true,"pushed_at":"2024-02-28T08:51:56.000Z","size":8998,"stargazers_count":815,"open_issues_count":15,"forks_count":257,"subscribers_count":41,"default_branch":"master","last_synced_at":"2024-04-10T00:12:34.300Z","etag":null,"topics":["dump","elasticsearch","migration","opensearch","testing"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"hoffoo/elasticsearch-dump","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/medcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"patreon":"medcl","custom":["https://www.buymeacoffee.com/medcl"]}},"created_at":"2015-07-14T07:59:54.000Z","updated_at":"2024-04-03T07:56:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/medcl/esm","commit_stats":null,"previous_names":["medcl/esm-abandoned","medcl/elasticsearch-migration","medcl/elasticsearch-dump"],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medcl%2Fesm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medcl%2Fesm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medcl%2Fesm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medcl%2Fesm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/medcl","download_url":"https://codeload.gi
thub.com/medcl/esm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222688173,"owners_count":17023297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dump","elasticsearch","migration","opensearch","testing"],"created_at":"2024-08-01T06:00:55.416Z","updated_at":"2025-12-15T20:34:56.441Z","avatar_url":"https://github.com/medcl.png","language":"Go","funding_links":["https://patreon.com/medcl","https://www.buymeacoffee.com/medcl"],"categories":["Go","ElasticSearch-Manage"],"sub_categories":[],"readme":"# An Elasticsearch Migration Tool\n\nElasticsearch cross version data migration.\n\nLinks:\n- [Dec 3rd, 2020: [EN] Cross version Elasticsearch data migration with ESM](https://discuss.elastic.co/t/dec-3rd-2020-en-cross-version-elasticsearch-data-migration-with-esm/256516)\n- [Use INFINI Gateway to check the Document-Level differences between two clusters or indices after the migration](https://gateway.infinilabs.com/docs/tutorial/index_diff/)\n\n## Features:\n\n*  Cross version migration supported\n*  Overwrite index name\n*  Copy index settings and mapping\n*  Support http basic auth\n*  Support dumping an index to a local file\n*  Support loading an index from a local file\n*  Support http proxy\n*  Support sliced scroll (Elasticsearch 5.0+)\n*  Support running in the background\n*  Generate testing data by randomizing the source document id\n*  Support renaming field names\n*  Support unifying the document type name\n*  Support specifying which _source fields to return from the source\n*  Support specifying a query string query to filter the data 
source\n*  Support renaming source fields while bulk indexing\n*  Support incremental update (add/update/delete changed records) with `--sync`. Note: it uses a different implementation that handles only the ***changed*** records, but it is not as fast as the old way\n*  Load generating with \n\n## ESM is fast!\n\nA 3-node cluster (3 * c5d.4xlarge, 16C, 32GB, 10Gbps)\n\n```\nroot@ip-172-31-13-181:/tmp# ./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 -w 40 --sliced_scroll_size=60 -b 5 --buffer_count=2000000  --regenerate_id\n[12-19 06:31:20] [INF] [main.go:506,main] start data migration..\nScroll 10064570 / 10064570 [=================================================] 100.00% 55s\nBulk 10062602 / 10064570 [==================================================]  99.98% 55s\n[12-19 06:32:15] [INF] [main.go:537,main] data migration finished.\n```\nMigrated 10,000,000 documents within a minute; the Nginx logs were generated from kibana_sample_data_logs.\n\n\n## Before ESM\n\nBefore running esm, manually prepare the target index with mappings and optimized settings to improve speed, for example:\n\n```\nPUT your-new-index\n{\n  \"settings\": {\n    \"index.translog.durability\": \"async\", \n    \"refresh_interval\": \"-1\", \n    \"number_of_shards\": 10,\n    \"number_of_replicas\": 0\n  }\n}\n```\n\n## Example:\n\ncopy index `index_name` from `192.168.1.x` to `192.168.1.y:9200`\n\n```\n./bin/esm  -s http://192.168.1.x:9200   -d http://192.168.1.y:9200 -x index_name  -w=5 -b=10 -c 10000\n```\n\ncopy index `src_index` from `192.168.1.x` to `192.168.1.y:9200` and save it as `dest_index`\n\n```\n./bin/esm -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index -w=5 -b=100\n```\n\nuse the sync feature to incrementally update index `src_index` from `192.168.1.x` to `192.168.1.y:9200`\n```\n./bin/esm --sync -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index\n```\n\nsupport 
Basic-Auth\n```\n./bin/esm -s http://localhost:9200 -x \"src_index\" -y \"dest_index\"  -d http://localhost:9201 -n admin:111111\n```\n\ncopy settings and override the number of shards\n```\n./bin/esm -s http://localhost:9200 -x \"src_index\" -y \"dest_index\"  -d http://localhost:9201 -m admin:111111 -c 10000 --shards=50  --copy_settings\n\n```\n\ncopy settings and mappings, recreate the target index, add a query to the source fetch, refresh after migration\n```\n./bin/esm -s http://localhost:9200 -x \"src_index\" -q=query:phone -y \"dest_index\"  -d http://localhost:9201  -c 10000 --shards=5  --copy_settings --copy_mappings --force  --refresh\n\n```\n\ndump elasticsearch documents into a local file\n```\n./bin/esm -s http://localhost:9200 -x \"src_index\"  -m admin:111111 -c 5000 -q=query:mixer  --refresh -o=dump.bin \n```\n\ndump the source and target indices to local files and compare them to find differences quickly\n```\n./bin/esm --sort=_id -s http://localhost:9200 -x \"src_index\" --truncate_output --skip=_index -o=src.json\n./bin/esm --sort=_id -s http://localhost:9200 -x \"dst_index\" --truncate_output --skip=_index -o=dst.json\ndiff -W 200 -ry --suppress-common-lines src.json dst.json\n```\n\nload data from a dump file and bulk insert it into another elasticsearch instance\n```\n./bin/esm -d http://localhost:9200 -y \"dest_index\"   -n admin:111111 -c 5000 -b 5 --refresh -i=dump.bin\n```\n\nsupport proxy\n```\n ./bin/esm -d http://123345.ap-northeast-1.aws.found.io:9200 -y \"dest_index\"   -n admin:111111  -c 5000 -b 1 --refresh  -i dump.bin  --dest_proxy=http://127.0.0.1:9743\n```\n\nuse sliced scroll (available in elasticsearch 5.0 and later) to speed up scrolling, and update the number of shards\n```\n ./bin/esm -s=http://192.168.3.206:9200 -d=http://localhost:9200 -n=elastic:changeme -f --copy_settings --copy_mappings -x=bestbuykaggle  --sliced_scroll_size=5 --shards=50 --refresh\n```\n\nmigrate 5.x to 6.x and unify all the types to `doc`\n```\n./esm -s http://source_es:9200 -x \"source_index*\"  -u \"doc\" 
-w 10 -b 10 -t \"10m\" -d https://target_es:9200 -m elastic:passwd -n elastic:passwd -c 5000 \n\n```\n\nwhen migrating to version 7.x, you may need to rename `_type` to `_doc`\n```\n./esm -s http://localhost:9201 -x \"source\" -y \"target\"  -d https://localhost:9200 --rename=\"_type:type,age:myage\"  -u\"_doc\"\n\n```\n\nfilter migration with range query\n\n```\n./esm -s https://192.168.3.98:9200 -m elastic:password -o json.out -x kibana_sample_data_ecommerce -q \"order_date:[2020-02-01T21:59:02+00:00 TO 2020-03-01T21:59:02+00:00]\"\n\n```\n\nrange query, keyword type and escape\n\n```\n./esm -s https://192.168.3.98:9200 -m test:123 -o 1.txt -x test1  -q \"@timestamp.keyword:[\\\"2021-01-17 03:41:20\\\" TO \\\"2021-03-17 03:41:20\\\"]\"\n```\n\ngenerate testing data: if `input.json` contains 10 documents, the following command will ingest 100 documents, which is good for testing\n```\n./bin/esm -i input.json -d  http://localhost:9201 -y target-index1  --regenerate_id  --repeat_times=10 \n```\n\nselect source fields\n\n```\n ./bin/esm -s http://localhost:9201 -x my_index -o dump.json --fields=author,title\n```\n\nrename fields while bulk indexing\n\n```\n./bin/esm -i dump.json -d  http://localhost:9201 -y target-index41  --rename=title:newtitle\n```\n\nuse buffer_count to control the memory used by ESM, and use gzip to compress network traffic\n```\n./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 --regenerate_id -w 20 --sliced_scroll_size=60 -b 5 --buffer_count=1000000 --compress false \n```\n\n## Download\nhttps://github.com/medcl/esm/releases\n\n\n## Compile:\nIf the downloaded release does not fit your environment, you can try to compile it yourself. 
`go` required.\n\n`make build`\n* go version \u003e= 1.7\n\n## Options\n\n```\nUsage:\n  esm [OPTIONS]\n\nApplication Options:\n  -s, --source=                    source elasticsearch instance, ie: http://localhost:9200\n  -q, --query=                     query against source elasticsearch instance, filter data before migrate, ie: name:medcl\n      --sort=                      sort field when scroll, ie: _id (default: _id)\n  -d, --dest=                      destination elasticsearch instance, ie: http://localhost:9201\n  -m, --source_auth=               basic auth of source elasticsearch instance, ie: user:pass\n  -n, --dest_auth=                 basic auth of target elasticsearch instance, ie: user:pass\n  -c, --count=                     number of documents at a time: ie \"size\" in the scroll request (10000)\n      --buffer_count=              number of buffered documents in memory (100000)\n  -w, --workers=                   concurrency number for bulk workers (1)\n  -b, --bulk_size=                 bulk size in MB (5)\n  -t, --time=                      scroll time (1m)\n      --sliced_scroll_size=        size of sliced scroll, to make it work, the size should be \u003e 1 (1)\n  -f, --force                      delete destination index before copying\n  -a, --all                        copy indexes starting with . and _\n      --copy_settings              copy index settings from source\n      --copy_mappings              copy index mappings from source\n      --shards=                    set a number of shards on newly created indexes\n  -x, --src_indexes=               indexes name to copy,support regex and comma separated list (_all)\n  -y, --dest_index=                indexes name to save, allow only one indexname, original indexname will be used if not specified\n  -u, --type_override=             override type name\n      --green                      wait for both hosts cluster status to be green before dump. 
otherwise yellow is okay\n  -v, --log=                       setting log level,options:trace,debug,info,warn,error (INFO)\n  -o, --output_file=               output documents of source index into local file\n      --truncate_output=           truncate before dump to output file\n  -i, --input_file=                indexing from local dump file\n      --input_file_type=           the data type of input file, options: dump, json_line, json_array, log_line (dump)\n      --source_proxy=              set proxy to source http connections, ie: http://127.0.0.1:8080\n      --dest_proxy=                set proxy to target http connections, ie: http://127.0.0.1:8080\n      --refresh                    refresh after migration finished\n      --sync=                      sync will use scroll for both source and target index, compare the data and sync(index/update/delete)\n      --fields=                    filter source fields(white list), comma separated, ie: col1,col2,col3,...\n      --skip=                      skip source fields(black list), comma separated, ie: col1,col2,col3,...\n      --rename=                    rename source fields, comma separated, ie: _type:type, name:myname\n  -l, --logstash_endpoint=         target logstash tcp endpoint, ie: 127.0.0.1:5055\n      --secured_logstash_endpoint  target logstash tcp endpoint was secured by TLS\n      --repeat_times=              repeat the data from source N times to dest output, use align with parameter regenerate_id to amplify the data size\n  -r, --regenerate_id              regenerate id for documents, this will override the exist document id in data source\n      --compress                   use gzip to compress traffic\n  -p, --sleep=                     sleep N seconds after finished a bulk request (-1)\n\nHelp Options:\n  -h, --help                       Show this help message\n\n\n```\n\n## FAQ\n\n- Scroll ID too long, update `elasticsearch.yml` on source cluster.\n\n```\nhttp.max_header_size: 
16k\nhttp.max_initial_line_length: 8k\n```\n\nVersions\n--------\n\nFrom       | To\n-----------|-----------\n1.x | 1.x\n1.x | 2.x\n1.x | 5.x\n1.x | 6.x\n1.x | 7.x\n2.x | 1.x\n2.x | 2.x\n2.x | 5.x\n2.x | 6.x\n2.x | 7.x\n5.x | 1.x\n5.x | 2.x\n5.x | 5.x\n5.x | 6.x\n5.x | 7.x\n6.x | 1.x\n6.x | 2.x\n6.x | 5.x\n6.x | 6.x\n6.x | 7.x\n7.x | 1.x\n7.x | 2.x\n7.x | 5.x\n7.x | 6.x\n7.x | 7.x\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedcl%2Fesm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmedcl%2Fesm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedcl%2Fesm/lists"}