{"id":13514859,"url":"https://github.com/unidentifieddeveloper/blaze","last_synced_at":"2025-04-09T17:24:22.083Z","repository":{"id":45290219,"uuid":"150500571","full_name":"unidentifieddeveloper/blaze","owner":"unidentifieddeveloper","description":"A blazing fast exporter for your Elasticsearch data.","archived":false,"fork":false,"pushed_at":"2024-12-10T07:36:05.000Z","size":35,"stargazers_count":62,"open_issues_count":2,"forks_count":9,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-02T11:51:39.055Z","etag":null,"topics":["data-dump","data-export","data-processing","devops","devops-tools","elasticsearch","libcurl","rapidjson"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unidentifieddeveloper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-26T23:12:37.000Z","updated_at":"2025-02-27T10:22:48.000Z","dependencies_parsed_at":"2025-01-13T13:20:36.564Z","dependency_job_id":null,"html_url":"https://github.com/unidentifieddeveloper/blaze","commit_stats":{"total_commits":36,"total_committers":5,"mean_commits":7.2,"dds":0.2222222222222222,"last_synced_commit":"095fa56db4b29fbef02b8c07f69ca1668a8205ea"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unidentifieddeveloper%2Fblaze","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unidentifieddeveloper%2Fblaze/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/unidentifieddeveloper%2Fblaze/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unidentifieddeveloper%2Fblaze/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unidentifieddeveloper","download_url":"https://codeload.github.com/unidentifieddeveloper/blaze/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248075592,"owners_count":21043619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-dump","data-export","data-processing","devops","devops-tools","elasticsearch","libcurl","rapidjson"],"created_at":"2024-08-01T05:01:02.793Z","updated_at":"2025-04-09T17:24:22.066Z","avatar_url":"https://github.com/unidentifieddeveloper.png","language":"C++","readme":"# Blaze\n\nAre you running Elasticsearch? Want to take your data and get the heck outta\nDodge? 
**Blaze** provides everything you need in a neat, blazing fast package!\n\n| **Linux / OSX** |\n| --------------- |\n| [![Build Status](https://github.com/unidentifieddeveloper/blaze/workflows/CI/badge.svg?branch=master)](https://github.com/unidentifieddeveloper/blaze/actions?query=branch%3Amaster) |\n\n\n## Features\n\n - Uses the [Elasticsearch sliced scroll API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) to get your data hella fast.\n - Written in modern C++ using [libcurl](https://github.com/curl/curl) and [RapidJSON](https://github.com/Tencent/RapidJSON).\n - Distributed as a single, tiny binary.\n\n\n### Performance\n\nBlaze compared to other Elasticsearch dump tools. The index has ~3.5M rows and\nis ~5GB in size. Each tool is timed with `time`, measuring how long it takes to write\na simple JSON dump file.\n\n| **Tool**    | **Time** |\n| ----------- | -------- |\n| Blaze       | 00m40s   |\n| elasticdump | 04m38s   |\n\n\n## Usage\n\nGet the binary for your platform from the Releases page or compile it yourself.\nIf you use it often, it might make sense to put it somewhere in your `PATH`.\n\n```sh\n$ blaze --host=http://localhost:9200 --index=massive_1 \u003e dump.ndjson\n```\n\nThis will connect to Elasticsearch on the specified host and start downloading\nthe `massive_1` index to *stdout*. Make sure to redirect this somewhere, such as\na JSON file.\n\n\n### Output format\n\nBlaze will dump everything to *stdout* in a format compatible with the\nElasticsearch Bulk API, meaning you can use `curl` to put the data back.\n\n```sh\ncurl -H \"Content-Type: application/x-ndjson\" -XPOST localhost:9200/other_data/_bulk --data-binary \"@dump.ndjson\"\n```\n\nOne issue when working with large datasets is that Elasticsearch has an upper\nlimit on the size of HTTP requests (2GB). The solution is to split the file\nwith something like `parallel`. 
The split should be done on an even number of lines,\nsince each command is actually two lines in the file.\n\n```sh\ncat dump.ndjson | parallel --pipe -l 50000 curl -s -H \"Content-Type: application/x-ndjson\" -XPOST localhost:9200/other_data/_bulk --data-binary \"@-\"\n```\n\n\n### Command line options\n\n - `--host=\u003cvalue\u003e` - the host where Elasticsearch is running.\n - `--index=\u003cvalue\u003e` - the index to dump.\n - `--slices=\u003cvalue\u003e` - *(optional)* the number of slices to split the scroll into. Should be set to the\n   number of shards for the index (as seen on `/_cat/indices`). Defaults to *5*.\n - `--size=\u003cvalue\u003e` - *(optional)* the size of the response (i.e., the length of the `hits` array).\n   Defaults to *5000*.\n - `--dump-mappings` - specify this flag to dump the index mappings instead of the source.\n - `--dump-index-info` - specify this flag to dump the full index information (settings and mappings) instead of the source.\n\n#### Authentication\n\nTo use HTTP Basic authentication you need to pass the following options. *Note*\nthat passing a password on the command line will put it in your terminal\nhistory, so please use with care.\n\n - `--auth=basic` - enable HTTP Basic authentication.\n - `--basic-username=foo` - the username.\n - `--basic-password=bar` - the password.\n - `--insecure` - for HTTPS connections, specify this flag to skip server certificate validation.\n\n## Building from source\n\nBuilding Blaze is easy. It requires `libcurl`.\n\n### On Linux (and OSX)\n\n```sh\n$ git submodule update --init\n$ make\n```\n\n### Run it from Docker\n\n```terminal\ndocker build -t blaze .\ndocker run -it blaze blaze\n```\n\n## License\n\nCopyright © Viktor Elofsson and contributors.\n\nBlaze is provided as-is under the MIT license. 
For more information see\n[LICENSE](https://github.com/vktr/blaze/blob/master/LICENSE).\n\n - For libcurl, see https://curl.haxx.se/docs/copyright.html\n - For RapidJSON, see https://github.com/Tencent/rapidjson/blob/master/license.txt\n","funding_links":[],"categories":["C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funidentifieddeveloper%2Fblaze","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funidentifieddeveloper%2Fblaze","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funidentifieddeveloper%2Fblaze/lists"}