Elasticsearch Analysis Synonym
=======================

## Overview

Elasticsearch Analysis Synonym Plugin provides NGramSynonymTokenizer.
For more details, see [LUCENE-5252](https://issues.apache.org/jira/browse/LUCENE-5252 "LUCENE-5252").

## Version

[Versions in Maven Repository](http://central.maven.org/maven2/org/codelibs/elasticsearch-analysis-synonym/)

### Issues/Questions

Please file an [issue](https://github.com/codelibs/elasticsearch-analysis-synonym/issues "issue").
(A Japanese forum is available [here](https://github.com/codelibs/codelibs-ja-forum "here").)

## Installation

### For 5.x

    $ $ES_HOME/bin/elasticsearch-plugin install org.codelibs:elasticsearch-analysis-synonym:5.3.0

### For 2.x

    $ $ES_HOME/bin/plugin install org.codelibs/elasticsearch-analysis-synonym/2.4.0

## Getting Started

### Create synonym.txt File

First, create a synonym dictionary file named synonym.txt in $ES\_CONF (e.g. /etc/elasticsearch).
(The following content is just a sample.)

    $ cat /etc/elasticsearch/synonym.txt
    あ,かき,さしす,たちつて,なにぬねの
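
The sample file can also be written from the shell. The following is a minimal sketch: `write_synonyms` and its directory argument are hypothetical names introduced for illustration and are not part of the plugin. It writes the same single line of comma-separated terms shown in the sample above.

```shell
# Hypothetical helper (not part of the plugin): writes the sample
# synonym dictionary into the given Elasticsearch config directory.
write_synonyms() {
  conf_dir="$1"
  # Same content as the sample above: comma-separated terms on one line.
  cat > "$conf_dir/synonym.txt" <<'EOF'
あ,かき,さしす,たちつて,なにぬねの
EOF
}
```

For example, `write_synonyms /etc/elasticsearch` would create /etc/elasticsearch/synonym.txt, assuming that directory is writable by the current user.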

### Create Index

NGramSynonymTokenizer is registered as the "ngram\_synonym" tokenizer type.
The following request creates an index that uses it:

    $ curl -XPUT localhost:9200/sample?pretty -d '
    {
      "settings":{
        "index":{
          "analysis":{
            "tokenizer":{
              "2gram_synonym":{
                "type":"ngram_synonym",
                "n":"2",
                "synonyms_path":"synonym.txt"
              }
            },
            "analyzer":{
              "2gram_synonym_analyzer":{
                "type":"custom",
                "tokenizer":"2gram_synonym"
              }
            }
          }
        }
      },
      "mappings":{
        "item":{
          "properties":{
            "id":{
              "type":"string",
              "index":"not_analyzed"
            },
            "msg":{
              "type":"string",
              "analyzer":"2gram_synonym_analyzer"
            }
          }
        }
      }
    }'

Then insert a document:

    $ curl -XPOST localhost:9200/sample/item/1 -d '
    {
      "id":"1",
      "msg":"あいうえお"
    }'
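
To see which tokens the analyzer emits for a given text, Elasticsearch's standard `_analyze` API can be called against the new index. The following is a sketch under stated assumptions: Elasticsearch listens on localhost:9200, the index and analyzer names are those used above, and `analyze_body` and `analyze_text` are hypothetical helper names. The text is interpolated without JSON escaping, so it must not contain quotes or backslashes.

```shell
# Hypothetical helpers for inspecting tokens through the _analyze API.
# Assumes Elasticsearch at localhost:9200 and the "sample" index above.

# Print the JSON request body for the given text.
# No JSON escaping is done: avoid quotes and backslashes in the text.
analyze_body() {
  printf '{"analyzer":"2gram_synonym_analyzer","text":"%s"}' "$1"
}

# Send the text to the _analyze endpoint and pretty-print the response.
# The Content-Type header is not used by the other commands in this
# README; it is added here as a precaution for newer servers.
analyze_text() {
  curl -s -XPOST "localhost:9200/sample/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1")"
}
```

Calling `analyze_text "あいうえお"` should return the token list for the indexed sample message, which helps explain why a given phrase query does or does not match.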

### Check Search Results

Try a few phrase searches:

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
      "query": {
        "match_phrase": {
          "msg": "あ"
        }
      }
    }'

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
      "query": {
        "match_phrase": {
          "msg": "あい"
        }
      }
    }'

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
      "query": {
        "match_phrase": {
          "msg": "かき"
        }
      }
    }'

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
      "query": {
        "match_phrase": {
          "msg": "かきい"
        }
      }
    }'
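
The four requests above differ only in the phrase, so they can be run in a loop. The following is a sketch assuming the same local server and index. `phrase_query` and `search_phrases` are hypothetical helper names, and the phrase is interpolated without JSON escaping, so it must not contain quotes or backslashes.

```shell
# Hypothetical helpers that repeat the match_phrase searches above.
# Assumes Elasticsearch at localhost:9200 and the "sample" index.

# Print a match_phrase query body for the "msg" field.
phrase_query() {
  printf '{"query":{"match_phrase":{"msg":"%s"}}}' "$1"
}

# Run one search per argument and print each raw response.
search_phrases() {
  for phrase in "$@"; do
    echo "== $phrase"
    curl -s -XPOST "http://localhost:9200/sample/_search?pretty" \
      -H 'Content-Type: application/json' \
      -d "$(phrase_query "$phrase")"
    echo
  done
}
```

For example, `search_phrases あ あい かき かきい` issues the same four searches in order, which makes it easy to compare the hits for each phrase side by side.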

### Reload synonyms_path File Dynamically

When the "dynamic\_reload" property is set to true, NGramSynonymTokenizer reloads the synonyms\_path file on the fly (the reload actually happens when the reset() method is called).
To change how often the file timestamp is checked, add "reload\_interval".

    $ curl -XPUT localhost:9200/sample?pretty -d '
    {
      "settings":{
        "index":{
          "analysis":{
            "tokenizer":{
              "2gram_synonym":{
                "type":"ngram_synonym",
                "n":"2",
                "synonyms_path":"synonym.txt",
                "dynamic_reload":true,
                "reload_interval":"10s"
              }
            },
            ...