https://github.com/federicotdn/inelastic
Print an Elasticsearch inverted index as a CSV table or JSON object.
csv elastic elasticsearch index inverted json search
- Host: GitHub
- URL: https://github.com/federicotdn/inelastic
- Owner: federicotdn
- License: apache-2.0
- Archived: true
- Created: 2018-08-09T23:19:27.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-03-20T17:40:51.000Z (about 2 years ago)
- Last Synced: 2025-11-27T18:27:22.973Z (4 months ago)
- Topics: csv, elastic, elasticsearch, index, inverted, json, search
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 11
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# inelastic


Print an Elasticsearch inverted index as a CSV table or JSON object.
`inelastic` builds an approximation of what an [inverted index](https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up) would look like for a particular index and document field, using the [Multi termvectors API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html) on all stored documents.
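As a rough illustration of the idea (not `inelastic`'s actual implementation), the sketch below folds per-document term vectors into a single term-to-postings mapping. The helper name `build_inverted_index` is hypothetical, and `sample_response` is a hand-written dictionary that only assumes the general shape of a Multi termvectors response (`docs` → `term_vectors` → field → `terms` → `term_freq`); it is not real server output.

```python
from collections import defaultdict


def build_inverted_index(mtermvectors_response, field):
    """Aggregate per-document term vectors into a term -> postings mapping.

    Hypothetical helper for illustration only.
    """
    index = defaultdict(lambda: {"freq": 0, "doc_ids": []})
    for doc in mtermvectors_response.get("docs", []):
        # Documents without term vectors for the field contribute nothing.
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, stats in terms.items():
            entry = index[term]
            entry["freq"] += stats["term_freq"]   # total occurrences
            entry["doc_ids"].append(doc["_id"])   # one posting per document
    return dict(index)


# Hand-written sample (assumed response shape, not real server output).
sample_response = {
    "docs": [
        {"_id": "3", "term_vectors": {"content": {"terms": {
            "this": {"term_freq": 1}, "is": {"term_freq": 1},
            "an": {"term_freq": 1}, "example": {"term_freq": 1}}}}},
        {"_id": "5", "term_vectors": {"content": {"terms": {
            "adding": {"term_freq": 1}, "more": {"term_freq": 2},
            "and": {"term_freq": 1}, "tweets": {"term_freq": 1}}}}},
    ]
}

index = build_inverted_index(sample_response, "content")
for term in sorted(index):
    entry = index[term]
    print(term, entry["freq"], len(entry["doc_ids"]), entry["doc_ids"])
```

The result is an approximation because it only reflects what the term vectors report for the documents that were fetched, rather than reading the index's internal structures directly.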
## Installation
To install `inelastic`, run the following command:
```bash
$ pip3 install --upgrade inelastic
```
`inelastic` currently only supports Elasticsearch versions 6.X and 7.X.
## Example
Having the following index:
```
PUT /tweets
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  }
}
```
with the following documents:
```
POST /tweets/_bulk
{ "index": { "_id": 1 }}
{ "content": "This is my first tweet." }
{ "index": { "_id": 2 }}
{ "content": "Most Elasticsearch examples use tweets." }
{ "index": { "_id": 3 }}
{ "content": "This is an example." }
{ "index": { "_id": 4 }}
{ "content": "Adding some more tweets." }
{ "index": { "_id": 5 }}
{ "content": "Adding more and more tweets." }
```
`inelastic` could be used as follows (combined with the `column` command):
```bash
$ inelastic -i tweets -f content | column -t -s ,
```
This would output:
```
term           freq  doc_count  d0  d1  d2
adding         2     2          4   5
an             1     1          3
and            1     1          5
elasticsearch  1     1          2
example        1     1          3
examples       1     1          2
first          1     1          1
is             2     2          1   3
more           3     2          4   5
most           1     1          2
my             1     1          1
some           1     1          4
this           2     2          1   3
tweet          1     1          1
tweets         3     3          2   4   5
use            1     1          2
The `freq` field specifies the total number of times the term appears across all documents, and the `doc_count` field specifies how many documents contain the term at least once. The `d0`, `d1`... fields list the IDs of the documents containing the term.
The chosen document field's type must be `text` or `keyword`.
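For post-processing the CSV output in Python, a minimal sketch along these lines may help. It assumes the raw CSV (before `column` aligns it) has a header row followed by `term,freq,doc_count` and a variable number of document ID cells; whether short rows are padded with empty cells is an assumption, so the parser tolerates both. The name `parse_inverted_index_csv` and the `sample_csv` excerpt are hypothetical.

```python
import csv
import io

# Hand-written excerpt in the assumed raw CSV layout (not captured output).
sample_csv = """term,freq,doc_count,d0,d1,d2
adding,2,2,4,5,
more,3,2,4,5,
tweets,3,3,2,4,5
"""


def parse_inverted_index_csv(text):
    """Parse the assumed CSV layout into a dict keyed by term."""
    rows = {}
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row: term, freq, doc_count, d0, d1, ...
    for row in reader:
        if not row:
            continue  # ignore blank lines
        term, freq, doc_count, *doc_ids = row
        rows[term] = {
            "freq": int(freq),
            "doc_count": int(doc_count),
            # Drop empty padding cells, if any.
            "doc_ids": [d for d in doc_ids if d],
        }
    return rows


parsed = parse_inverted_index_csv(sample_csv)
for term, entry in parsed.items():
    print(term, entry)
```

In this layout `doc_count` should always equal the number of document IDs in the row, while `freq` can be larger when a term repeats within a document (as with `more` in document 5).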
## Usage
These are the arguments `inelastic` accepts:
- `-i` (`--index`): Index name (**required**).
- `-f` (`--field`): Document field name from which to generate the inverted index (**required**).
- `-l` (`--id-field`): Document field to use as ID when printing results (*default: _id*).
- `-o` (`--output`): Output format, `json` or `csv` (*default: csv*).
- `-p` (`--port`): Elasticsearch host port (*default: 9200*).
- `-e` (`--host`): Elasticsearch host address (*default: localhost*).
- `-q` (`--query`): Elasticsearch DSL JSON query to use when fetching documents (*default: None*).
- `-d` (`--doctype`): Document type (*default: _doc*) (**Elasticsearch 6.X only**).
- `-v` (`--verbose`): Print debug information (*default: False*).
- `-h` (`--help`): Show help and exit.
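When calling the CLI from another program, the flags above can be assembled programmatically. The sketch below only builds the argument list and prints it; `build_inelastic_command` is a hypothetical helper, and passing a bare `match` clause to `-q` is an assumption, since the exact JSON shape the option expects is not stated here.

```python
import json
import shlex


def build_inelastic_command(index, field, output="csv", host="localhost",
                            port=9200, query=None):
    """Assemble an argv list for the inelastic CLI (hypothetical helper)."""
    argv = ["inelastic", "-i", index, "-f", field, "-o", output,
            "-e", host, "-p", str(port)]
    if query is not None:
        # Serialize the query to JSON so it is passed as a single argument.
        argv += ["-q", json.dumps(query)]
    return argv


argv = build_inelastic_command(
    "tweets", "content", output="json",
    query={"match": {"content": "tweets"}},  # assumed query shape
)
print(shlex.join(argv))

# To actually run it (requires inelastic and a reachable cluster):
# import subprocess
# result = subprocess.run(argv, check=True, capture_output=True, text=True)
```

Building a list instead of a shell string avoids quoting problems with the JSON passed to `-q`.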
## Scripting
The `inelastic` module exposes the `InvertedIndex` class, which can be used in custom Python scripts:
```python
from inelastic import InvertedIndex
from elasticsearch import Elasticsearch  # Only with ES 7.X
# from elasticsearch6 import Elasticsearch  # Use this import instead with ES 6.X

# With no arguments, the client connects to localhost:9200
es = Elasticsearch()
ii = InvertedIndex(search_size=250, scroll_time='10s')

# Read the 'content' field of every document in the 'tweets' index
n_docs, errors = ii.read_index(es, 'tweets', 'content')
print('# docs: {}, # errors: {}'.format(n_docs, errors))

for entry in ii.to_list():
    print(entry)
```
When run, the previous script will output:
```
# docs: 5, # errors: 0
('adding', )
('an', )
('and', )
('elasticsearch', )
('example', )
('examples', )
('first', )
('is', )
('more', )
('most', )
('my', )
('some', )
('this', )
('tweet', )
('tweets', )
('use', )
```