https://github.com/slub/es2json

elasticsearch bulk harvester which is using the scroll-API

elasticsearch elasticsearch-client json line-delimited-json python3

Host: GitHub
URL: https://github.com/slub/es2json
Owner: slub
License: apache-2.0
Created: 2020-01-30T07:11:12.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2023-12-14T15:47:18.000Z (about 1 year ago)
Last Synced: 2024-04-14T22:49:12.599Z (8 months ago)
Topics: elasticsearch, elasticsearch-client, json, line-delimited-json, python3
Language: Python
Size: 123 KB
Stars: 1
Watchers: 7
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# es2json

es2json is a simple Elasticsearch index download/search tool. You can supply your own queries via the `-body` switch or pass an idfile containing newline-delimited IDs via `-idfile`. The `-idfile_consume` switch consumes the idfile, leaving in the file only those IDs that could not be retrieved. Output is line-delimited JSON on STDOUT; unless you pass `-headless`, the Elasticsearch metadata is printed as well.
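Since the output is line-delimited JSON, a downstream script can process it one record at a time. The following is a minimal sketch, not part of es2json itself: the helper names `read_records` and `source_of` are hypothetical, and it assumes that without `-headless` each record resembles an Elasticsearch hit with the document stored under `_source`.

```python
import io
import json
from typing import Iterator, TextIO


def read_records(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of line-delimited JSON."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def source_of(record: dict) -> dict:
    """Return the document body, with or without the metadata wrapper.

    Assumption: without -headless a record looks like an Elasticsearch
    hit, so the document sits under the "_source" key.
    """
    return record.get("_source", record)


# Hypothetical sample: one record with metadata, one headless record.
sample = io.StringIO(
    '{"_index": "books", "_id": "1", "_source": {"name": "foo"}}\n'
    '{"name": "bar"}\n'
)
names = [source_of(record)["name"] for record in read_records(sample)]
print(names)
```

To process real output, pass `sys.stdin` to `read_records` instead of the sample and pipe the es2json output into the script.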

## usage

```
usage: es2json [-h] [-server SERVER] [-ign-source] [-size N[:M]]
               [-timeout TIMEOUT] [-includes INCLUDES] [-excludes EXCLUDES]
               [-headless] [-body BODY] [-idfile IDFILE]
               [-idfile_consume IDFILE_CONSUME] [-pretty] [-verbose]
               [-chunksize CHUNKSIZE] [-auth [USER]]

Query elasticsearch indices/index/documents and print them formatted as JSON-Objects

optional arguments:
  -h, --help            show this help message and exit
  -server SERVER        use http://host:port/index/type/id.
                        host:port - hostname or IP with port of the elasticsearch node to query
                                    default: localhost:9200
                        index     - index to query
                                    default: None → queries across all available indices
                        type      - elasticsearch doctype to use (optional)
                        id        - identifier of one specific document to query (optional)
  -use-ssl              use https instead of http
  -ign-source           return the Document or just the Elasticsearch-Metadata
  -size N[:M]           just return the first n-Records of the search,
                        or return a python slice, e.g. 2:10 returns a list
                        from the 2nd including the 9th element of the search
                        only works with the ESGenerator
                        Note: Not all slice variants may be supported
  -timeout TIMEOUT      Set the time in seconds after when a ReadTimeoutError can occur.
                        Default is 10 seconds. Raise for big/difficult querys
  -includes INCLUDES    just include following _source field(s) in the _source object
  -excludes EXCLUDES    exclude following _source field(s) from the _source object
  -headless             don't print Elasticsearch metadata
  -body BODY            Elasticsearch Query object that can be in the form of
                        1) a JSON string (e.g. '{"query": {"match": {"name": "foo"}}}')
                        2) a file containing the upper query string
  -idfile IDFILE        path to a file with \n-delimited IDs to process
  -idfile_consume IDFILE_CONSUME
                        path to a file with \n-delimited IDs to process
  -pretty               prettyprint the json output
  -verbose              print progress for large dumps
  -chunksize CHUNKSIZE  chunksize of the search window to use
  -auth [USER]          Provide authentication, this can be done using:
                        1) set environment variables E2J_USER and E2J_PASSWD. In
                           this case there is no further argument needed here
                        2) as a string "username". The password is then asked interactively
                        3) as "username:password" (not recommended)
```
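The `-size N[:M]` option is described in terms of a Python slice. As a rough illustration of that idea, and not es2json's actual implementation, the sketch below applies such a value to a stream of records with `itertools.islice`. The helper names `parse_size` and `apply_size` are hypothetical, and only the `N` and `N:M` forms are handled.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple


def parse_size(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Turn 'N' into (None, N) and 'N:M' into (N, M)."""
    if ":" in value:
        start, stop = value.split(":", 1)
        return (int(start) if start else None, int(stop) if stop else None)
    return (None, int(value))


def apply_size(records: Iterable[dict], value: str) -> Iterator[dict]:
    """Lazily select records the way a Python slice would."""
    start, stop = parse_size(value)
    return islice(records, start, stop)


# Hypothetical search results, numbered from 0.
hits = ({"n": i} for i in range(20))
selected = [record["n"] for record in apply_size(hits, "2:10")]

more_hits = ({"n": i} for i in range(20))
first_three = [record["n"] for record in apply_size(more_hits, "3")]

print(selected)
print(first_three)
```

With zero-based counting, `2:10` selects the records at indices 2 through 9, and a plain `3` selects the first three records. Because `islice` is lazy, records past the stop index are never pulled from the generator.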

## tests

This package comes with tests, which require some setup before they can run. See tests/Readme for instructions.

Once the setup is done, run the tests with `python3 -m pytest tests`.