{"id":28431400,"url":"https://github.com/dainiusjocas/lucene-text-analysis","last_synced_at":"2025-07-04T20:30:54.926Z","repository":{"id":40504929,"uuid":"488562865","full_name":"dainiusjocas/lucene-text-analysis","owner":"dainiusjocas","description":"(Micro)Library to inspect the output of the Lucene analyzers.","archived":false,"fork":false,"pushed_at":"2023-10-01T21:50:54.000Z","size":47,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-05T15:07:11.992Z","etag":null,"topics":["clojure","lucene","nl"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dainiusjocas.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":["dainiusjocas"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2022-05-04T11:34:21.000Z","updated_at":"2022-05-05T08:40:01.000Z","dependencies_parsed_at":"2023-10-01T23:33:21.246Z","dependency_job_id":null,"html_url":"https://github.com/dainiusjocas/lucene-text-analysis","commit_stats":{"total_commits":17,"total_committers":2,"mean_commits":8.5,"dds":0.05882352941176472,"last_synced_commit":"7652b4653fa9c45e94a058307e29d793268fbe31"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/dainiusjocas/lucene-text-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dainiusjocas%2Flucene-text-analysis","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/dainiusjocas%2Flucene-text-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dainiusjocas%2Flucene-text-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dainiusjocas%2Flucene-text-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dainiusjocas","download_url":"https://codeload.github.com/dainiusjocas/lucene-text-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dainiusjocas%2Flucene-text-analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263614457,"owners_count":23488920,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","lucene","nl"],"created_at":"2025-06-05T15:07:06.717Z","updated_at":"2025-07-04T20:30:54.920Z","avatar_url":"https://github.com/dainiusjocas.png","language":"Clojure","funding_links":["https://github.com/sponsors/dainiusjocas"],"categories":[],"sub_categories":[],"readme":"[![Clojars Project](https://img.shields.io/clojars/v/lt.jocas/lucene-text-analysis.svg)](https://clojars.org/lt.jocas/lucene-text-analysis)\n[![cljdoc badge](https://cljdoc.org/badge/lt.jocas/lucene-text-analysis)](https://cljdoc.org/d/lt.jocas/lucene-text-analysis/CURRENT)\n[![Tests](https://github.com/dainiusjocas/lucene-text-analysis/actions/workflows/test.yml/badge.svg)](https://github.com/dainiusjocas/lucene-text-analysis/actions/workflows/test.yml)\n\n# lucene-text-analysis\n\nLibrary to inspect the output 
of the [Lucene](https://lucene.apache.org) text analysis pipeline.  \n\nSupports 3 ways of analyzing text:\n- string to list of strings;\n- String to list of tokens (similar to the Elasticsearch/Opensearch `_analyze` API);\n- string to GraphViz program to draw a Lucene `TokenStream` as a graph.\n\n## Quickstart\n\nDependencies:\n```clojure\n{:deps {lt.jocas/lucene-text-analysis {:mvn/version \"1.0.21\"}}}\n```\n\nCode:\n```clojure\n(require '[lucene.custom.text-analysis :as analysis])\n\n(analysis/text-\u003etoken-strings \"Test TEXT\")\n;; =\u003e [\"test\" \"text\"]\n\n(analysis/text-\u003etokens \"Test TEXT\")\n;; =\u003e \n[#lucene.custom.text_analysis.TokenRecord{:token \"test\",\n                                          :type \"\u003cALPHANUM\u003e\",\n                                          :start_offset 0,\n                                          :end_offset 4,\n                                          :position 0,\n                                          :positionLength 1}\n #lucene.custom.text_analysis.TokenRecord{:token \"text\",\n                                          :type \"\u003cALPHANUM\u003e\",\n                                          :start_offset 5,\n                                          :end_offset 9,\n                                          :position 1,\n                                          :positionLength 1}]\n\n(analysis/text-\u003egraph \"Test TEXT\")\n;; =\u003e\n\"digraph tokens {\n   graph [ fontsize=30 labelloc=\\\"t\\\" label=\\\"\\\" splines=true overlap=false rankdir = \\\"LR\\\" ];\n   // A2 paper size\n   size = \\\"34.4,16.5\\\";\n   edge [ fontname=\\\"Helvetica\\\" fontcolor=\\\"red\\\" color=\\\"#606060\\\" ]\n   node [ style=\\\"filled\\\" fillcolor=\\\"#e8e8f0\\\" shape=\\\"Mrecord\\\" fontname=\\\"Helvetica\\\" ]\n \n   0 [label=\\\"0\\\"]\n   -1 [shape=point color=white]\n   -1 -\u003e 0 []\n   0 -\u003e 1 [ label=\\\"test / Test\\\"]\n   1 [label=\\\"1\\\"]\n   1 -\u003e 2 [ label=\\\"text / 
TEXT\\\"]\n   -2 [shape=point color=white]\n   2 -\u003e -2 []\n }\n \"\n```\n\nEvery function accepts a Lucene `Analyzer` as an optional second argument.\n\n## Use cases\n\n- Apply ASCII folding to person names:\n\nWith the helper library:\n```clojure\nlt.jocas/lucene-custom-analyzer {:mvn/version \"1.0.14\"}\n```\n\n```clojure\n(require '[lucene.custom.analyzer :as custom-analyzer])\n\n(lucene.custom.text-analysis/text-\u003etoken-strings \n  \"Thomas Müller\" \n  (custom-analyzer/create {:token-filters [{:asciiFolding {}}]}))\n;; =\u003e [\"Thomas\" \"Muller\"]\n```\n\n## How to draw a graph image?\n\nThe example assumes that the GraphViz `dot` program is installed:\n\n```shell\nclojure -M --eval '(require `lucene.custom.text-analysis)(println (lucene.custom.text-analysis/text-\u003egraph \"one two three\"))' | dot -Tpng -o docs/assets/images/token-graph.png\n```\nThis produces the following image:\n\n\u003cimg src=\"docs/assets/images/token-graph.png\"\nalt=\"Token Graph\" title=\"Token Graph\" /\u003e\n\n## Development\n\nCompile Java classes:\n\n```shell\nclojure -T:build compile-java\n```\n\nStart your REPL.\n\n## License\n\nCopyright \u0026copy; 2023 [Dainius Jocas](https://www.jocas.lt).\n\nDistributed under The Apache License, Version 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdainiusjocas%2Flucene-text-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdainiusjocas%2Flucene-text-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdainiusjocas%2Flucene-text-analysis/lists"}