https://github.com/evolvedbinary/plmultianalyzer
- Host: GitHub
- URL: https://github.com/evolvedbinary/plmultianalyzer
- Owner: evolvedbinary
- License: mit
- Created: 2021-09-24T08:51:06.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-03-06T12:40:34.000Z (over 2 years ago)
- Last Synced: 2024-04-24T12:23:48.400Z (over 1 year ago)
- Language: Java
- Size: 98.6 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
PLMultiAnalyzer
===============
A custom Lucene analyzer that allows multiple tokens to be indexed for a single term.
It supports storing terms with mixed-case letters and terms with punctuation. In theory this should produce more accurate results, because it lets Lucene perform a more exact search.
Released under the [MIT License](https://opensource.org/licenses/MIT).
[CI](https://github.com/evolvedbinary/PLMultiAnalyzer/actions/workflows/ci.yaml?query=workflow%3ACI)
# Adding a custom analyzer to eXist-db
## Building eXist-db
```bash
$ git clone https://github.com/eXist-db/exist.git
$ cd exist
$ git checkout master
$ mvn -DskipTests package
```
We will refer to the eXist-db directory as `$EXIST_HOME`.
You can set it as follows:
**Linux/macOS:**
```bash
$ export EXIST_HOME=/your/path/to/eXist-db
```
**Windows:**
```cmd
> set EXIST_HOME=C:\your\path\to\eXist-db
```
## Copy the JAR into the eXist-db directory
**Linux/macOS:**
```shell
$ cp PLMultiAnalyzer-1.0.0-SNAPSHOT.jar $EXIST_HOME/exist-distribution/target/exist-distribution-[version]-dir/lib
```
**Windows:**
```cmd
> copy PLMultiAnalyzer-1.0.0-SNAPSHOT.jar %EXIST_HOME%\exist-distribution\target\exist-distribution-[version]-dir\lib
```
## Add the analyzer dependency to the eXist-db startup configuration
In your `$EXIST_HOME/exist-distribution/target/exist-distribution-[version]-dir/etc/startup.xml`,
add the following entry to the dependencies:
```xml
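...
<dependency>
    <groupId>com.evolvedbinary.lucene.analyzer</groupId>
    <artifactId>ohAnalyzer</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <relativePath>PLMultiAnalyzer-1.0.0-SNAPSHOT.jar</relativePath>
</dependency>
...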
```
## Start eXist-db
Run the startup script:
**Linux/macOS:**
```shell
$ $EXIST_HOME/exist-distribution/target/exist-distribution-[version]-dir/bin/startup.sh
```
**Windows:**
```cmd
> %EXIST_HOME%\exist-distribution\target\exist-distribution-[version]-dir\bin\startup.bat
```
## Index the data using the custom analyzer
When creating the index configuration, specify the analyzer class as `com.evolvedbinary.lucene.analyzer.OhAnalyzer`.
The analyzer takes two parameters:
* `minimumTermLength`: the minimum length of any decomposed term. Decomposed terms shorter than this are discarded. Set to 0 to indicate no minimum.
* `punctuationDictionary`: the set of punctuation characters to use for decomposition.
```xml
<value>'</value>
<value>-</value>
<value>’</value>
```
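To give a feel for how the two parameters might interact, here is a minimal plain-Java sketch. It is not the `OhAnalyzer` implementation and does not use Lucene. The class `DecompositionSketch` and its `decompose` method are hypothetical names, and the behaviour (keep the original term, then add each punctuation-separated part that meets the minimum length) is an assumption based only on the parameter descriptions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of PLMultiAnalyzer.
class DecompositionSketch {

    // Returns the original term followed by its punctuation-separated parts.
    // Parts shorter than minimumTermLength are discarded (0 means no minimum).
    static List<String> decompose(String term, Set<Character> punctuationDictionary, int minimumTermLength) {
        List<String> tokens = new ArrayList<>();
        tokens.add(term); // assumption: the full term is always kept as a token

        StringBuilder part = new StringBuilder();
        boolean foundPunctuation = false;
        for (char c : term.toCharArray()) {
            if (punctuationDictionary.contains(c)) {
                foundPunctuation = true;
                addIfLongEnough(tokens, part, minimumTermLength);
                part.setLength(0);
            } else {
                part.append(c);
            }
        }
        // Only emit the trailing part if the term was actually split.
        if (foundPunctuation) {
            addIfLongEnough(tokens, part, minimumTermLength);
        }
        return tokens;
    }

    private static void addIfLongEnough(List<String> tokens, StringBuilder part, int minimumTermLength) {
        if (part.length() > 0 && part.length() >= minimumTermLength) {
            tokens.add(part.toString());
        }
    }

    public static void main(String[] args) {
        // '\u2019' is the right single quotation mark from the dictionary above.
        Set<Character> dictionary = Set.of('\'', '-', '\u2019');
        System.out.println(decompose("O'Neil-Smith", dictionary, 2)); // [O'Neil-Smith, Neil, Smith]
        System.out.println(decompose("O'Neil-Smith", dictionary, 0)); // [O'Neil-Smith, O, Neil, Smith]
    }
}
```

With a minimum length of 2, the single-letter part `O` is dropped, while the original mixed-case, punctuated term is still kept as its own token. The real analyzer may emit additional or different tokens.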