https://github.com/codelibs/elasticsearch-analysis-synonym

NGramSynonymTokenizer for Elasticsearch
elasticsearch elasticsearch-plugin synonyms


Host: GitHub
URL: https://github.com/codelibs/elasticsearch-analysis-synonym
Owner: codelibs
License: apache-2.0
Created: 2014-12-11T13:43:43.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2021-12-14T20:32:07.000Z (over 4 years ago)
Last Synced: 2025-05-28T01:07:18.201Z (about 1 year ago)
Topics: elasticsearch, elasticsearch-plugin, synonyms
Language: Java
Homepage:
Size: 121 KB
Stars: 24
Watchers: 11
Forks: 8
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

Elasticsearch Analysis Synonym
=======================

## Overview

The Elasticsearch Analysis Synonym plugin provides NGramSynonymTokenizer.

For more details, see [LUCENE-5252](https://issues.apache.org/jira/browse/LUCENE-5252 "LUCENE-5252").

## Version

[Versions in Maven Repository](http://central.maven.org/maven2/org/codelibs/elasticsearch-analysis-synonym/)

### Issues/Questions

Please file an [issue](https://github.com/codelibs/elasticsearch-analysis-synonym/issues "issue").

(Japanese forum is [here](https://github.com/codelibs/codelibs-ja-forum "here").)

## Installation

### For 5.x

    $ $ES_HOME/bin/elasticsearch-plugin install org.codelibs:elasticsearch-analysis-synonym:5.3.0

### For 2.x

    $ $ES_HOME/bin/plugin install org.codelibs/elasticsearch-analysis-synonym/2.4.0

## Getting Started

### Create synonym.txt File

First, create a synonym dictionary file, synonym.txt, in $ES\_CONF (e.g. /etc/elasticsearch).

(The following content is just a sample...)

    $ cat /etc/elasticsearch/synonym.txt
    あ,かき,さしす,たちつて,なにぬねの
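
The sample above suggests one comma-separated group of equivalent terms per line. As a minimal sketch under that assumption, the following writes the sample group to a scratch file and lists its terms one per line, which can help when checking a larger dictionary by eye. The scratch path and the `list_terms` helper are illustrative and not part of the plugin.

```shell
# Scratch copy for inspection (hypothetical path); the real file belongs in $ES_CONF.
SYNONYM_FILE="${SYNONYM_FILE:-${TMPDIR:-/tmp}/synonym.txt}"

# Write the sample group from this README.
printf '%s\n' 'あ,かき,さしす,たちつて,なにぬねの' > "$SYNONYM_FILE"

# Print every term of every group on its own line (assumes "," separates terms).
list_terms() {
  tr ',' '\n' < "$1"
}

list_terms "$SYNONYM_FILE"
```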

### Create Index

NGramSynonymTokenizer is registered as the "ngram\_synonym" tokenizer type.

The following request creates an index that uses "ngram\_synonym":

    $ curl -XPUT localhost:9200/sample?pretty -d '
    {
      "settings":{
        "index":{
          "analysis":{
            "tokenizer":{
              "2gram_synonym":{
                "type":"ngram_synonym",
                "n":"2",
                "synonyms_path":"synonym.txt"
              }
            },
            "analyzer":{
              "2gram_synonym_analyzer":{
                "type":"custom",
                "tokenizer":"2gram_synonym"
              }
            }
          }
        }
      },
      "mappings":{
        "item":{
          "properties":{
            "id":{
              "type":"string",
              "index":"not_analyzed"
            },
            "msg":{
              "type":"string",
              "analyzer":"2gram_synonym_analyzer"
            }
          }
        }
      }
    }'

Then index a document:

    $ curl -XPOST localhost:9200/sample/item/1 -d '
    {
      "id":"1",
      "msg":"あいうえお"
    }'
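
To see which tokens the analyzer produces for a given text, the standard Elasticsearch `_analyze` API can be pointed at the `sample` index. This is a sketch rather than part of the plugin: the `ES_URL` variable and the `analyze_body` and `analyze` helpers are hypothetical names, and the JSON request body assumes an Elasticsearch version that accepts it (2.x or later).

```shell
# Base URL of the node (assumption: a local node on the default port).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _analyze request body for the analyzer defined above.
# Assumes the text contains no characters that need JSON escaping.
analyze_body() {
  printf '{"analyzer":"2gram_synonym_analyzer","text":"%s"}' "$1"
}

# Send the text through the analyzer and print the tokens.
# The Content-Type header is required by Elasticsearch 6.0 and later;
# it is assumed to be harmless on the versions this README targets.
analyze() {
  curl -s -XPOST "$ES_URL/sample/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1")"
}

# Usage (requires a running node with the index created above):
#   analyze 'あいうえお'
```

Looking at the token list makes it easier to reason about why a given phrase query does or does not match.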

### Check Search Results

Try a few phrase searches:

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
       "query": {
          "match_phrase": {
             "msg": "あ"
          }
       }
    }'

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
       "query": {
          "match_phrase": {
             "msg": "あい"
          }
       }
    }'

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
       "query": {
          "match_phrase": {
             "msg": "かき"
          }
       }
    }'

    $ curl -XPOST "http://localhost:9200/sample/_search" -d '
    {
       "query": {
          "match_phrase": {
             "msg": "かきい"
          }
       }
    }'
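
The four requests differ only in the phrase, so they can be driven from a small loop. The helpers below are a hypothetical convenience (the names `phrase_query` and `search_phrase` are not from the plugin), and no particular hit counts are assumed.

```shell
# Base URL of the node (assumption: a local node on the default port).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a match_phrase query on the "msg" field.
# Assumes the phrase contains no characters that need JSON escaping.
phrase_query() {
  printf '{"query":{"match_phrase":{"msg":"%s"}}}' "$1"
}

# Run one phrase search against the "sample" index.
# The Content-Type header is required by Elasticsearch 6.0 and later;
# it is assumed to be harmless on the versions this README targets.
search_phrase() {
  curl -s -XPOST "$ES_URL/sample/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(phrase_query "$1")"
}

# Usage (requires a running node with the sample document indexed):
#   for phrase in あ あい かき かきい; do
#     echo "== $phrase"
#     search_phrase "$phrase"
#   done
```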

### Reload synonyms_path File Dynamically

When the "dynamic\_reload" property is set to true, NGramSynonymTokenizer reloads the synonyms\_path file on the fly (in practice, the reload happens when the reset() method is called).

To change how often the file timestamp is checked, add "reload\_interval".

    $ curl -XPUT localhost:9200/sample?pretty -d '
    {
      "settings":{
        "index":{
          "analysis":{
            "tokenizer":{
              "2gram_synonym":{
                "type":"ngram_synonym",
                "n":"2",
                "synonyms_path":"synonym.txt",
                "dynamic_reload":true,
                "reload_interval":"10s"
              }
            },
    ...
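
One way to observe a reload, sketched under the assumption that the node reads the dictionary from the path below and that "dynamic\_reload" is enabled as above: append a group to the file, wait longer than "reload\_interval", and then analyze or search some text again. The `add_synonym_group` helper, the default path, and the example group are illustrative only.

```shell
# Dictionary path (assumption: matches the node's $ES_CONF/synonym.txt).
SYNONYM_FILE="${SYNONYM_FILE:-/etc/elasticsearch/synonym.txt}"

# Append one comma-separated group to the given file. Appending also
# updates the file timestamp, which is what the tokenizer checks.
add_synonym_group() {
  printf '%s\n' "$2" >> "$1"
}

# Usage (the group below is a made-up example):
#   add_synonym_group "$SYNONYM_FILE" 'はひふ,まみむめも'
#   sleep 15   # longer than the 10s reload_interval used above
```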
