{"id":30612640,"url":"https://github.com/evolvedbinary/plmultianalyzer","last_synced_at":"2025-08-30T05:34:47.463Z","repository":{"id":44549404,"uuid":"409896228","full_name":"evolvedbinary/PLMultiAnalyzer","owner":"evolvedbinary","description":null,"archived":false,"fork":false,"pushed_at":"2023-03-06T12:40:34.000Z","size":101,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-24T12:23:48.400Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evolvedbinary.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-24T08:51:06.000Z","updated_at":"2024-04-24T12:23:48.401Z","dependencies_parsed_at":"2023-02-17T02:01:51.048Z","dependency_job_id":null,"html_url":"https://github.com/evolvedbinary/PLMultiAnalyzer","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/evolvedbinary/PLMultiAnalyzer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evolvedbinary%2FPLMultiAnalyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evolvedbinary%2FPLMultiAnalyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evolvedbinary%2FPLMultiAnalyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evolvedbinary%2FPLMultiAnalyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evolvedbinary","download_url":"https://codeload.github.com/evolvedbinary/PLMultiAnalyzer/tar.gz/refs/heads/main",
"sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evolvedbinary%2FPLMultiAnalyzer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272808936,"owners_count":24996603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-30T05:34:46.296Z","updated_at":"2025-08-30T05:34:47.373Z","avatar_url":"https://github.com/evolvedbinary.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"PLMultiAnalyzer\n===============\nA custom Lucene analyzer that allows multiple tokens to be indexed for a single term.\nIt supports storing terms that contain mixed-case letters and terms that contain punctuation. 
In theory, this should produce more accurate results, as it allows Lucene to perform a more exact search.\n\nReleased under the [MIT License](https://opensource.org/licenses/MIT).\n\n[![CI](https://github.com/evolvedbinary/PLMultiAnalyzer/workflows/CI/badge.svg)](https://github.com/evolvedbinary/PLMultiAnalyzer/actions/workflows/ci.yaml?query=workflow%3ACI)\n\n# Adding a custom analyzer to eXist-db\n\n## Building eXist-db\n```bash\n$ git clone https://github.com/eXist-db/exist.git\n$ cd exist\n$ git checkout master\n$ mvn -DskipTests package\n```\nWe will refer to the eXist-db source directory (the `exist` clone above) as `$EXIST_HOME`. You can set it as follows.\n\n**Linux/macOS:**\n```bash\n$ export EXIST_HOME=/your/path/to/eXist-db\n```\n**Windows:**\n```cmd\n\u003e set EXIST_HOME=C:\\your\\path\\to\\eXist-db\n```\n\n## Copy the JAR into the eXist-db `lib` directory\n**Linux/macOS:**\n```shell\n$ cp PLMultiAnalyzer-1.0.0-SNAPSHOT.jar $EXIST_HOME/exist-distribution/target/exist-distribution-[version]-dir/lib\n```\n\n**Windows:**\n```cmd\n\u003e copy PLMultiAnalyzer-1.0.0-SNAPSHOT.jar %EXIST_HOME%\\exist-distribution\\target\\exist-distribution-[version]-dir\\lib\n```\n\n## Add the analyzer dependency to the eXist-db startup configuration\nIn your `$EXIST_HOME/exist-distribution/target/exist-distribution-[version]-dir/etc/startup.xml`, add the analyzer to the `\u003cdependencies\u003e` element:\n```xml\n\u003cdependencies\u003e\n    ... \u003c!-- other dependencies --\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003ecom.evolvedbinary.lucene.analyzer\u003c/groupId\u003e\n        \u003cartifactId\u003eohAnalyzer\u003c/artifactId\u003e\n        \u003cversion\u003e1.0.0-SNAPSHOT\u003c/version\u003e\n        \u003crelativePath\u003ePLMultiAnalyzer-1.0.0-SNAPSHOT.jar\u003c/relativePath\u003e \u003c!-- must exactly match the JAR file name in the lib folder --\u003e\n    \u003c/dependency\u003e\n    ... 
\u003c!-- other dependencies --\u003e\n\u003c/dependencies\u003e\n```\n\n## Start eXist-db\nRun the startup script.\n\n**Linux/macOS:**\n```shell\n$ $EXIST_HOME/exist-distribution/target/exist-distribution-[version]-dir/bin/startup.sh\n```\n\n**Windows:**\n```cmd\n\u003e %EXIST_HOME%\\exist-distribution\\target\\exist-distribution-[version]-dir\\bin\\startup.bat\n```\n\n## Index the data using the custom analyzer\nWhen creating the index configuration, set the analyzer class to `com.evolvedbinary.lucene.analyzer.OhAnalyzer`. The analyzer takes two parameters:\n* `minimumTermLength`: the minimum length of any decomposed term; shorter decomposed terms are discarded. Set to `0` for no minimum.\n* `punctuationDictionary`: the set of punctuation characters used to decompose terms.\n\n```xml\n\u003ccollection xmlns=\"http://exist-db.org/collection-config/1.0\"\u003e\n    \u003cindex xmlns:wiki=\"http://exist-db.org/xquery/wiki\" xmlns:html=\"http://www.w3.org/1999/xhtml\" xmlns:atom=\"http://www.w3.org/2005/Atom\"\u003e\n        \u003c!-- Lucene index is configured below --\u003e\n        \u003clucene\u003e\n            \u003canalyzer class=\"com.evolvedbinary.lucene.analyzer.OhAnalyzer\"\u003e\n                \u003cparam name=\"punctuationDictionary\" type=\"char[]\"\u003e\n                    \u003cvalue\u003e'\u003c/value\u003e\n                    \u003cvalue\u003e-\u003c/value\u003e\n                    \u003cvalue\u003e’\u003c/value\u003e\n                \u003c/param\u003e\n                \u003cparam name=\"minimumTermLength\" type=\"int\" value=\"2\" /\u003e\n            \u003c/analyzer\u003e\n            \u003ctext qname=\"doc\"/\u003e\n        \u003c/lucene\u003e\n    
\u003c/index\u003e\n\u003c/collection\u003e\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevolvedbinary%2Fplmultianalyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevolvedbinary%2Fplmultianalyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevolvedbinary%2Fplmultianalyzer/lists"}