https://github.com/o19s/elyzer
"Stop worrying about Elasticsearch analyzers", my therapist says
- Host: GitHub
- URL: https://github.com/o19s/elyzer
- Owner: o19s
- License: apache-2.0
- Created: 2015-09-17T02:36:43.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2021-06-01T21:49:27.000Z (almost 4 years ago)
- Last Synced: 2025-04-02T08:38:10.376Z (about 1 month ago)
- Topics: custom-analyzer, elasticsearch, elasticsearch-analyzers
- Language: Python
- Homepage:
- Size: 33.2 KB
- Stars: 155
- Watchers: 11
- Forks: 15
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Elyzer
See step-by-step how Elasticsearch custom analyzers decompose your text into tokens.
My therapist said this would be a good idea...
Doug: hi
Eliza: How are you today.. What would you like to discuss?
Doug: well I can't really see how my Elasticsearch analyzers work, on the inside
Eliza: How do you know you can't really see how your Elasticsearch analyzers work, on the inside?
Doug: cause I call _analyze, and it does the whole analyzer or just one step
Eliza: What other reasons might there be?
Doug: I guess nobody's gotten around to making a good debugger
Eliza: I'm not sure I understand you fully.
Doug: well just keep reading this README then...
Eliza: Come, come, elucidate your thoughts.

# Installation
Install through pip3 (supports ES 2.x & 5.x):

```
pip3 install elyzer
```
# Usage
Give Elyzer some text and have it analyzed, with the process broken down step-by-step on the command line.
Assuming I'd created an `english_bigrams` custom analyzer for the index `tmdb`, I would run the analyzer like so:
```
$ elyzer --es "http://localhost:9200" --index tmdb --analyzer english_bigrams "Mary had a little lamb"
TOKENIZER: standard
{1:Mary} {2:had} {3:a} {4:little} {5:lamb}
TOKEN_FILTER: standard
{1:Mary} {2:had} {3:a} {4:little} {5:lamb}
TOKEN_FILTER: lowercase
{1:mary} {2:had} {3:a} {4:little} {5:lamb}
TOKEN_FILTER: porter_stem
{1:mari} {2:had} {3:a} {4:littl} {5:lamb}
TOKEN_FILTER: bigram_filter
{1:mari had} {2:had a} {3:a littl} {4:littl lamb}
```

Output is each token, prefixed by its numerical position attribute in the token stream, at each step.
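Under the hood, this kind of step-by-step view can be produced by reading the analyzer definition from the index settings and replaying the chain through `_analyze`, one token filter at a time. A minimal sketch of that cumulative-filter idea (hypothetical helper name, not elyzer's actual code):

```python
def step_requests(analyzer, text):
    """Build one _analyze request body per pipeline step:
    char filters + tokenizer first, then one request per token
    filter, applied cumulatively."""
    base = {
        "char_filter": analyzer.get("char_filter", []),
        "tokenizer": analyzer["tokenizer"],
        "text": text,
    }
    filters = analyzer.get("filter", [])
    # Step 0: tokenizer only; step i: tokenizer plus the first i token filters.
    return [dict(base, filter=filters[:i]) for i in range(len(filters) + 1)]

english_bigrams = {
    "tokenizer": "standard",
    "filter": ["lowercase", "porter_stem", "bigram_filter"],
}
# Each body could be POSTed to http://localhost:9200/tmdb/_analyze;
# diffing consecutive responses shows what each filter did.
for body in step_requests(english_bigrams, "Mary had a little lamb"):
    print(body["filter"])
```

Diffing the responses between consecutive steps is what lets a tool attribute each change (lowercasing, stemming, shingling) to the filter that caused it.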
## Args
There are four required command line args:
- es: the Elasticsearch host (e.g. `http://localhost:9200`)
- index: name of the index where your custom analyzer can be found
- analyzer: name of your custom analyzer
- text: the text to analyze

# Shortcomings
aka "Areas for Improvement"
- Only works for custom analyzers right now (as it accesses the settings for your index)
- Attributes besides the token text and position would be handy

## Who?
Created by [OpenSource Connections](http://opensourceconnections.com)
## License
Released under [Apache 2](LICENSE.txt)