Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/unidentifieddeveloper/blaze
A blazing fast exporter for your Elasticsearch data.
data-dump data-export data-processing devops devops-tools elasticsearch libcurl rapidjson
- Host: GitHub
- URL: https://github.com/unidentifieddeveloper/blaze
- Owner: unidentifieddeveloper
- License: mit
- Created: 2018-09-26T23:12:37.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-12-10T07:36:05.000Z (2 months ago)
- Last Synced: 2025-02-09T22:05:24.642Z (11 days ago)
- Topics: data-dump, data-export, data-processing, devops, devops-tools, elasticsearch, libcurl, rapidjson
- Language: C++
- Size: 34.2 KB
- Stars: 61
- Watchers: 2
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-blazingly-fast - blaze - A blazing fast exporter for your Elasticsearch data. (C++)
README
# Blaze
Are you running Elasticsearch? Want to take your data and get the heck outta
Dodge? **Blaze** provides everything you need in a neat, blazing fast package!

| **Linux / OSX** |
| --------------- |
| [Build status](https://github.com/unidentifieddeveloper/blaze/actions?query=branch%3Amaster) |

## Features
- Uses the [Elasticsearch sliced scroll API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) to get your data hella fast.
- Written in modern C++ using [libcurl](https://github.com/curl/curl) and [RapidJSON](https://github.com/Tencent/RapidJSON).
- Distributed as a single, tiny binary.

### Performance
The table below compares Blaze with elasticdump, another Elasticsearch dump
tool. The test index has ~3.5M rows and is ~5GB in size. Each tool was run under
`time` to measure how long it takes to write a simple JSON dump file.

| **Tool**    | **Time** |
| ----------- | -------- |
| Blaze       | 00m40s   |
| elasticdump | 04m38s   |

## Usage
Get the binary for your platform from the Releases page or compile it yourself.
If you use it often, it might make sense to put it somewhere in your `PATH`.

```sh
$ blaze --host=http://localhost:9200 --index=massive_1 > dump.ndjson
```

This connects to Elasticsearch on the specified host and downloads the
`massive_1` index to *stdout*. Make sure to redirect the output somewhere, such
as a JSON file.

### Output format
Blaze will dump everything to *stdout* in a format compatible with the
Elasticsearch Bulk API, meaning you can use `curl` to put the data back.

```sh
curl -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/other_data/_bulk --data-binary "@dump.ndjson"
```

One issue when working with large datasets is that Elasticsearch has an upper
limit on the size of HTTP requests (2GB). The solution is to split the file
with something like `parallel`. Each chunk must contain an even number of
lines, since each command is actually two lines in the file.

```sh
cat dump.ndjson | parallel --pipe -l 50000 curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/other_data/_bulk --data-binary "@-"
```

### Command line options
- `--host=` - the host where Elasticsearch is running.
- `--index=` - the index to dump.
- `--slices=` - *(optional)* the number of slices to split the scroll into. Should be set to the
  number of shards for the index (as seen on `/_cat/indices`). Defaults to *5*.
- `--size=` - *(optional)* the size of each response (i.e., the length of the `hits` array).
  Defaults to *5000*.
- `--dump-mappings` - specify this flag to dump the index mappings instead of the source.
- `--dump-index-info` - specify this flag to dump the full index information (settings and mappings) instead of the source.

#### Authentication
To use HTTP Basic authentication, pass the following options. *Note*
that passing a password on the command line will put it in your shell
history, so use it with care.

- `--auth=basic` - enable HTTP Basic authentication.
- `--basic-username=foo` - the username.
- `--basic-password=bar` - the password.
- `--insecure` - for HTTPS connections, specify this flag to skip server certificate validation.

## Building from source
Building Blaze is easy. It requires `libcurl`.
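
Before running `make`, it can help to confirm that the libcurl development files are visible to your toolchain. The sketch below is not part of Blaze: the function name `check_libcurl` is hypothetical, and it assumes that either `curl-config` or `pkg-config` is installed, which is common but not guaranteed on every system.

```sh
# check_libcurl: hypothetical helper, not shipped with Blaze.
# Prints the detected libcurl version and returns 0, or returns 1 when
# neither curl-config nor pkg-config can locate libcurl.
check_libcurl() {
  if command -v curl-config >/dev/null 2>&1; then
    # curl-config is installed alongside the libcurl development files
    curl-config --version
  elif command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libcurl; then
    pkg-config --modversion libcurl
  else
    echo "libcurl development files not found" >&2
    return 1
  fi
}

check_libcurl || echo "install the libcurl development package before running make"
```

If the check fails, install the libcurl development package for your platform and try again.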
### On Linux (and OSX)
```sh
$ git submodule update --init
$ make
```

### Run it from Docker
```sh
docker build -t blaze .
docker run -it blaze blaze
```

## License
Copyright © Viktor Elofsson and contributors.

Blaze is provided as-is under the MIT license. For more information see
[LICENSE](https://github.com/vktr/blaze/blob/master/LICENSE).

- For libcurl, see https://curl.haxx.se/docs/copyright.html
- For RapidJSON, see https://github.com/Tencent/rapidjson/blob/master/license.txt