https://github.com/slub/es2json
elasticsearch bulk harvester which is using the scroll-API
elasticsearch elasticsearch-client json line-delimited-json python3
- Host: GitHub
- URL: https://github.com/slub/es2json
- Owner: slub
- License: apache-2.0
- Created: 2020-01-30T07:11:12.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-12-14T15:47:18.000Z (about 1 year ago)
- Last Synced: 2024-04-14T22:49:12.599Z (8 months ago)
- Topics: elasticsearch, elasticsearch-client, json, line-delimited-json, python3
- Language: Python
- Size: 123 KB
- Stars: 1
- Watchers: 7
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# es2json
es2json is a simple Elasticsearch index download/search tool. You can supply your own queries via the `-body` switch or give it an idfile with `\n`-delimited IDs. The `-idfile_consume` switch consumes the idfile, leaving in the file only the IDs that could not be retrieved for any reason. Output is line-delimited JSON on STDOUT; unless you use `-headless`, Elasticsearch metadata is printed as well.
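
Because the output is line-delimited JSON, it can be consumed one record at a time. The following is a minimal sketch and not part of es2json itself: the helper name `iter_sources` and the index name `myindex` are hypothetical, and it assumes that records printed with metadata keep the document under the standard Elasticsearch `_source` key, while `-headless` records are the bare document.

```python
import json


def iter_sources(lines):
    """Yield the document from each line of line-delimited JSON.

    Assumption: records with Elasticsearch metadata hold the document
    under "_source"; records written with -headless are the document itself.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if isinstance(record, dict) and "_source" in record:
            yield record["_source"]
        else:
            yield record


# Hypothetical sample: one record with metadata, one headless record.
sample = [
    '{"_index": "myindex", "_id": "1", "_source": {"name": "foo"}}',
    '{"name": "bar"}',
]
docs = list(iter_sources(sample))
```

To read from a pipe instead of a list, pass `sys.stdin` as `lines`.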
## usage
```
usage: es2json [-h] [-server SERVER] [-ign-source] [-size N[:M]]
[-timeout TIMEOUT] [-includes INCLUDES] [-excludes EXCLUDES]
[-headless] [-body BODY] [-idfile IDFILE]
[-idfile_consume IDFILE_CONSUME] [-pretty] [-verbose]
[-chunksize CHUNKSIZE] [-auth [USER]]

Query elasticsearch indices/index/documents and print them formatted as JSON-Objects

optional arguments:
-h, --help show this help message and exit
-server SERVER use http://host:port/index/type/id.
host:port - hostname or IP with port of the elasticsearch node to query
default: localhost:9200
index - index to query
default: None → queries across all available indices
type - elasticsearch doctype to use (optional)
id - identifier of one specific document to query (optional)
-use-ssl use https instead of http
-ign-source return the Document or just the Elasticsearch-Metadata
-size N[:M] just return the first n-Records of the search,
or return a python slice, e.g. 2:10 returns a list
from the 2nd including the 9th element of the search
only works with the ESGenerator
Note: Not all slice variants may be supported
-timeout TIMEOUT Set the time in seconds after when a ReadTimeoutError can occur.
Default is 10 seconds. Raise for big/difficult querys
-includes INCLUDES just include following _source field(s) in the _source object
-excludes EXCLUDES exclude following _source field(s) from the _source object
-headless don't print Elasticsearch metadata
-body BODY Elasticsearch Query object that can be in the form of
1) a JSON string (e.g. '{"query": {"match": {"name": "foo"}}}')
2) a file containing the upper query string
-idfile IDFILE path to a file with \n-delimited IDs to process
-idfile_consume IDFILE_CONSUME
path to a file with \n-delimited IDs to process
-pretty prettyprint the json output
-verbose print progress for large dumps
-chunksize CHUNKSIZE chunksize of the search window to use
-auth [USER] Provide authentication, this can be done using:
1) set environment variables E2J_USER and E2J_PASSWD. In
this case there is no further argument needed here
2) as a string "username". The password is then asked interactively
3) as "username:password" (not recommended)
```
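
The `-size N[:M]` option is described in terms of a Python slice. As a rough illustration of those semantics only (this is not es2json's actual implementation, and the helper names `parse_size` and `take` are hypothetical), a slice specification can be applied to a stream of records with `itertools.islice`:

```python
from itertools import islice


def parse_size(spec):
    """Turn "N" or "N:M" into a (start, stop) pair for itertools.islice."""
    if ":" in spec:
        start, stop = spec.split(":", 1)
        return (int(start) if start else 0, int(stop) if stop else None)
    return (0, int(spec))  # plain "N" means the first N records


def take(records, spec):
    """Return the records selected by the slice specification."""
    start, stop = parse_size(spec)
    return list(islice(records, start, stop))


# Hypothetical stand-in for a stream of search hits.
hits = [f"doc{i}" for i in range(12)]
first_three = take(hits, "3")
window = take(hits, "2:10")
```

Here `window` holds the records at zero-based positions 2 through 9, which matches the help text's "from the 2nd including the 9th element" when counting from zero.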
## tests
This package comes with tests, which require some setup first. See tests/Readme for instructions.
Once set up, run the tests with `python3 -m pytest tests`.