https://github.com/cclient/elasticsearch-filter-limitbyfreq
An Elasticsearch token filter that limits tokens by frequency: it keeps the top N highest-weighted terms, in preparation for a subsequent simhash step.
- Host: GitHub
- URL: https://github.com/cclient/elasticsearch-filter-limitbyfreq
- Owner: cclient
- License: apache-2.0
- Created: 2017-09-01T13:30:57.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-09-01T14:23:42.000Z (about 8 years ago)
- Last Synced: 2025-01-16T22:20:06.094Z (9 months ago)
- Topics: byfreq, elasticsearch-filter, limit
- Language: Java
- Homepage:
- Size: 10.7 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
Limit Token Filter for Elasticsearch
==================================

Filter: `limit_by_freq`

Parameter: `max_token_count` (default: 512)

Description: tokens are sorted by frequency in descending order and only the top `max_token_count` are kept. Each token's frequency is stored in its payload so that later processing steps can use it.
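The ordering and truncation described above can be sketched in plain Java. This is a minimal illustration, not the plugin's Lucene `TokenFilter`: the class name `TopByFreqSketch` and the method `topByFreq` are hypothetical, and breaking ties by first occurrence is an assumption chosen to match the sample output in the Quick Example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java sketch, not the plugin's actual implementation.
class TopByFreqSketch {

    // Returns up to maxTokenCount (token, frequency) entries, most frequent first.
    static List<Map.Entry<String, Integer>> topByFreq(List<String> tokens, int maxTokenCount) {
        // LinkedHashMap keeps first-occurrence order, so ties stay in input order (assumption).
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String token : tokens) {
            freq.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        // List.sort is stable, so entries with equal frequency keep their relative order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        int limit = Math.min(maxTokenCount, entries.size());
        return new ArrayList<>(entries.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("hello", "hyper", "log", "log");
        System.out.println(topByFreq(tokens, 2)); // prints [log=2, hello=1]
    }
}
```

In the real filter the frequency would travel with each token as a payload; here it is simply the value of each returned entry.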
Install
-------

1. Download or compile the plugin.

   * Download a pre-built package from https://github.com/cclient/elasticsearch-filter-limitbyfreq/releases
   * Unzip the plugin into the folder `your-es-root/plugins/`

2. Restart Elasticsearch.
#### Quick Example
1. Create an index whose `limit_test` analyzer uses a `limit_by_freq` filter with `max_token_count` set to 2:
```bash
curl -XPUT http://localhost:9200/test_index -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "my_limit": {
          "type": "limit_by_freq",
          "max_token_count": 2
        }
      },
      "analyzer": {
        "limit_test": {
          "tokenizer": "standard",
          "filter": [
            "my_limit"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "desc": {
          "type": "text",
          "analyzer": "limit_test"
        }
      }
    }
  }
}'
```

2. Test the filter by referencing it directly by type:
```bash
# Quote the URL so the shell does not treat "&" as a background operator.
curl -XPOST 'http://localhost:9200/test_index/_analyze?tokenizer=standard&filter=limit_by_freq' -d'
hello hyper log log'
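```

Result (the filter is used here with its default `max_token_count` of 512, so all three distinct tokens are returned, most frequent first):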
```json
{
"tokens": [
{
"token": "log",
"start_offset": 0,
"end_offset": 0,
"type": "TOP_TOKEN",
"position": 0
},
{
"token": "hello",
"start_offset": 0,
"end_offset": 0,
"type": "TOP_TOKEN",
"position": 1
},
{
"token": "hyper",
"start_offset": 0,
"end_offset": 0,
"type": "TOP_TOKEN",
"position": 2
}
]
}
```

The same text run through the `limit_test` analyzer defined above:

```bash
curl -XPOST 'http://localhost:9200/test_index/_analyze?analyzer=limit_test' -d'
hello hyper log log'
```

Result (`max_token_count` is 2, so only the two most frequent tokens remain):
```json
{
"tokens": [
{
"token": "log",
"start_offset": 0,
"end_offset": 0,
"type": "TOP_TOKEN",
"position": 0
},
{
"token": "hello",
"start_offset": 0,
"end_offset": 0,
"type": "TOP_TOKEN",
"position": 1
}
]
}
```
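
The project description says the filtered tokens are meant to feed a later simhash step. A rough sketch of how frequency-weighted tokens could be turned into a 64-bit simhash fingerprint follows. Nothing in it comes from the plugin itself: the class `SimHashSketch`, its method names, and the choice of FNV-1a as the per-token hash are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: simhash is the downstream step the description mentions;
// this code is not part of the plugin.
class SimHashSketch {

    // 64-bit FNV-1a, chosen only because it is short; any well-mixed 64-bit hash would do.
    static long fnv1a64(String s) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Each token votes on all 64 bit positions, using its frequency as the vote weight.
    static long simhash(Map<String, Integer> weightedTokens) {
        long[] votes = new long[64];
        for (Map.Entry<String, Integer> e : weightedTokens.entrySet()) {
            long h = fnv1a64(e.getKey());
            int weight = e.getValue();
            for (int bit = 0; bit < 64; bit++) {
                boolean set = ((h >>> bit) & 1L) == 1L;
                votes[bit] += set ? weight : -weight;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= (1L << bit);
            }
        }
        return fingerprint;
    }

    // Number of differing bits; a small distance suggests near-duplicate documents.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        // Weights taken from the example above: "log" occurs twice, "hello" once.
        Map<String, Integer> doc = new LinkedHashMap<>();
        doc.put("log", 2);
        doc.put("hello", 1);
        System.out.println(Long.toHexString(simhash(doc)));
    }
}
```

Limiting the input to the top tokens keeps the fingerprint dominated by the terms that carry the most weight, which is presumably why the filter runs before simhash.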