Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jprante/elasticsearch-index-termlist
Elasticsearch Index Termlist
https://github.com/jprante/elasticsearch-index-termlist
Last synced: 13 days ago
JSON representation
Elasticsearch Index Termlist
- Host: GitHub
- URL: https://github.com/jprante/elasticsearch-index-termlist
- Owner: jprante
- License: apache-2.0
- Created: 2012-03-25T11:03:58.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2019-04-23T22:53:02.000Z (over 5 years ago)
- Last Synced: 2024-06-02T15:54:17.193Z (5 months ago)
- Language: Java
- Homepage:
- Size: 1.11 MB
- Stars: 117
- Watchers: 13
- Forks: 31
- Open Issues: 14
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# Elasticsearch Index Termlist Plugin
This plugin extends Elasticsearch with a term list capability. It presents a list of terms in a field of an index
and can also list each terms frequency. Term lists can be generated from one index or even of all of the
indexes.## Versions
| Elasticsearch | Plugin | Release date |
| -------------- | ------------ | ------------ |
| 2.3.0 | 2.3.0.0 | March 29, 2016 |
| 2.2.0 | 2.2.0.2 | March 22, 2016 |
| 1.5.2 | 1.5.2.0 | Jun 5, 2015 |
| 1.5.0 | 1.5.0.0 | Apr 9, 2015 |
| 1.4.4 | 1.4.4.0 | Mar 15, 2015 |
| 1.4.0 | 1.4.0.2 | Feb 19, 2015 |
| 1.4.0 | 1.4.0.1 | Jan 14, 2015 |
| 1.4.0 | 1.4.0.0 | Nov 18, 2014 |
| 1.3.2 | 1.3.0.0 | Aug 21, 2014 |
| 1.2.1 | 1.2.1.0 | Jul 3, 2014 |## Installation
./bin/plugin -install index-termlist -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.5.2.0/elasticsearch-index-termlist-1.5.2.0-plugin.zip
Do not forget to restart the node after installing.
## Project docs
The Maven project site is available at [Github](http://jprante.github.io/elasticsearch-index-termlist)
## Issues
All feedback is welcome! If you find issues, please post them at [Github](https://github.com/jprante/elasticsearch-index-termlist/issues)
# Introduction
Getting the list of all terms indexed is useful for various purposes, for example
- term statistics
- building dictionaries
- controlling the overall effects of analyzers on the indexed terms
- automatic query building on indexed terms, e.g. for load tests
- input to linguistic analysis tools
- for other post-processing of the indexed terms outside of ElasticsearchOptionally, the term list can be narrowed down to a field name. The field name is the Lucene field
name as found in the Lucene index.Only terms of field names not starting with underscore are listed. Terms of internal fields
like `_uid`, `_all`, or `_type` are always skipped.# Response
For each term, statistics are computed.
{
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"took": 384,
"numdocs": 51279,
"numterms": 100,
"terms": [
{
"term": "aacr2",
"totalfreq": 34699,
"docfreq": 34697,
"min": 1,
"max": 2,
"mean": 1.0000505458956723,
"geomean": 1.0000399550985877,
"sumofsquares": 34703,
"sumoflogs": 1.3862943611198906,
"sigma": 0.008475454987021664,
"variance": 0.00007183333723703039
}, ...
`took` - milliseconds required for executing`numdocs` - the number of documents examined
`numterms` - the number of terms returned
`terms` - the array of term infos
`term` - the name of the term
`totalfreq` - the total number of occurrences of this term
`docfreq` - the document count where this term appears in
`min` - the minimum number of occurrences of this term in a document
`max` - the maximum number of occurrences of this term in a document`mean` - the mean of the term occurences
`geomean` - the gemotric mean of the term occurrences
`sumofsquares` - sum of the squares of the term occurrences
`sumoflogs` - sum of the logarithms of the term occurences
`variance` - the variance of the term occurences
`sigma` - the standard deviation, equal to sqrt(variance)
# Example
Consider the following example
curl -XDELETE 'http://localhost:9200/test/'
curl -XPUT 'http://localhost:9200/test/'
curl -XPUT 'http://localhost:9200/test/test/1' -d '{ "test": "Hello World" }'
curl -XPUT 'http://localhost:9200/test/test/2' -d '{ "test": "Hello Jörg Prante" }'
curl -XPUT 'http://localhost:9200/test/test/3' -d '{ "message": "elastic search" }'Get term list of index ``test``
curl -XGET 'http://localhost:9200/test/_termlist'
Get term list of index `test` of field `message`
curl -XGET 'http://localhost:9200/test/_termlist?field=message'
Get term list of index `test` with total frequencies but only the first three of the list
curl -XGET 'http://localhost:9200/test/_termlist?size=3'
Get term list of terms starting with `hello` in index `test` field `test`
curl -XGET 'http://localhost:9200/test/_termlist?field=test&term=hello'
A page of 100 terms of a sorted list of terms in your index beginning with `a`
curl -XGET 'http://localhost:9200/books/_termlist?term=a&sortbyterms&pretty&from=0&size=100'
A page of 100 terms of a sorted list of terms in your index beginning with `frodo`,'frod','fro' and 'fr', since your backtracingcount is set to 3curl -XGET 'http://localhost:9200/books/_termlist?term=frodo&sortbyterms&pretty&from=0&size=100*backtracingcount=3'
# Caution
The term list is built internally into an unsorted, compact set of strings which i
s not streamed to the client. You should be aware that if you have lots of unique terms
in the index, this procedure consumes a lot of heap memory and may result in
out of memory situations that can render your Elasticsearch cluster unusable
until it is restarted.# License
Elasticsearch Term List Plugin
Copyright (C) 2011-2015 Jörg Prante
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.