https://github.com/trocco-io/embulk-input-elasticsearch
Elasticsearch input plugin for Embulk. parallel query support.
elasticsearch embulk embulk-input-plugin jruby
- Host: GitHub
- URL: https://github.com/trocco-io/embulk-input-elasticsearch
- Owner: trocco-io
- License: mit
- Created: 2016-06-03T05:34:58.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2019-10-28T13:30:51.000Z (about 5 years ago)
- Last Synced: 2024-10-19T16:48:30.352Z (19 days ago)
- Topics: elasticsearch, embulk, embulk-input-plugin, jruby
- Language: Ruby
- Homepage:
- Size: 29.3 KB
- Stars: 6
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Elasticsearch input plugin for Embulk
[![Build Status](https://secure.travis-ci.org/toyama0919/embulk-input-elasticsearch.png?branch=master)](http://travis-ci.org/toyama0919/embulk-input-elasticsearch) [![Gem Version](https://badge.fury.io/rb/embulk-input-elasticsearch.svg)](http://badge.fury.io/rb/embulk-input-elasticsearch)
## Overview
* **Plugin type**: input
* **Resume supported**: yes
* **Cleanup supported**: yes
* **Guess supported**: no

## Configuration
- **nodes**: Elasticsearch nodes to connect to (array, required)
  - **host**: node hostname (string, required)
  - **port**: node port (integer, required)
- **queries**: array of Lucene queries (array, required)
- **index**: index name (string, required)
- **index_type**: index type (string)
- **request_timeout**: request timeout (integer)
- **per_size**: number of records to fetch per request (integer, required, default: `1000`)
- **limit_size**: maximum number of records to fetch per query (integer, default: unlimited)
- **num_threads**: number of threads used to run queries (integer, default: 1)
- **retry_on_failure**: number of retries on failure; `0` retries forever (integer, default: 5)
- **sort**: sort order (hash, default: nil)
- **scroll**: how long to keep the search context alive (string, default: '1m')
- **fields**: fields to extract (array, required)
  - **name**: field name (string, required)
  - **type**: field type (string, required)
  - **metadata**: whether the field is document metadata such as `_id` (boolean, default: false)
  - **time_format**: time format for timestamp fields (string)

## Example
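As a starting point, the required options can be combined into a small configuration. This is a minimal sketch, not taken from the project: the index name, query, and field names (`my_index`, `status: active`, `title`, `created_at`) are hypothetical placeholders, and the `time_format` value assumes strptime-style directives, which should be checked against your data. The `stdout` output is used only so the fetched records can be inspected.

```yaml
# Minimal sketch: one node, one query, records printed to stdout.
# Index name, query, and field names below are hypothetical placeholders.
in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'status: active'
  index: my_index
  per_size: 1000
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: title, type: string }
    # Assumption: time_format takes strptime-style directives.
    - { name: created_at, type: timestamp, time_format: '%Y-%m-%dT%H:%M:%S' }
out:
  type: stdout
```

A fuller configuration, using multiple queries, sorting, and threads, follows.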
```yaml
in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'page_type: HP'
    - 'page_type: GP'
  index: crawl
  index_type: m_corporation_page
  request_timeout: 60
  per_size: 1000
  limit_size: 200000
  num_threads: 2
  sort:
    m_corporation_id: desc
    employee_range: asc
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: _type, type: string, metadata: true }
    - { name: _index, type: string, metadata: true }
    - { name: _score, type: double, metadata: true }
    - { name: page_type, type: string }
    - { name: corp_name, type: string }
    - { name: corp_key, type: string }
    - { name: title, type: string }
    - { name: body, type: string }
    - { name: url, type: string }
    - { name: employee_range, type: long }
    - { name: m_corporation_id, type: long }
    - { name: cg_lv1, type: json }
    - { name: cg_lv2, type: json }
    - { name: cg_lv3, type: json }
```

## Support Type
* string
* long
* double
* timestamp
* json
* boolean

## Test
### Setup
```
curl -o embulk.jar --create-dirs -L "http://dl.embulk.org/embulk-latest.jar"
chmod +x embulk.jar
./embulk.jar gem install bundler
./embulk.jar bundle install --path vendor/bundle
```

### Run test
```
./embulk.jar bundle exec rake test
```

## Build
```
$ rake
```