{"id":20137843,"url":"https://github.com/scdh/icu-xpath-bindings","last_synced_at":"2025-08-23T14:14:04.293Z","repository":{"id":91213877,"uuid":"602714865","full_name":"SCDH/icu-xpath-bindings","owner":"SCDH","description":"Bring ICU's transliteration and normalization to XPath","archived":false,"fork":false,"pushed_at":"2024-11-06T17:32:32.000Z","size":167,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-02T23:13:36.768Z","etag":null,"topics":["icu","normalization","oxygenxml","saxon","tei","unicode","xpath","xquery","xslt"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SCDH.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-16T19:44:51.000Z","updated_at":"2024-11-06T17:32:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"a95d6a77-737a-4ce0-bfa2-11d9391eb003","html_url":"https://github.com/SCDH/icu-xpath-bindings","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/SCDH/icu-xpath-bindings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SCDH%2Ficu-xpath-bindings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SCDH%2Ficu-xpath-bindings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SCDH%2Ficu-xpath-bindings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
SCDH%2Ficu-xpath-bindings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SCDH","download_url":"https://codeload.github.com/SCDH/icu-xpath-bindings/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SCDH%2Ficu-xpath-bindings/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271751925,"owners_count":24814707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["icu","normalization","oxygenxml","saxon","tei","unicode","xpath","xquery","xslt"],"created_at":"2024-11-13T21:29:54.016Z","updated_at":"2025-08-23T14:14:04.276Z","avatar_url":"https://github.com/SCDH.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ICU XPath Bindings\n\nThis project provides XPath bindings of the\n[ICU](https://unicode-org.github.io/icu/) library for processing\ncommon Unicode tasks. It's based on the ICU library for Java (ICU4J)\nand can be used in the [Saxon XSLT/XQuery](https://www.saxonica.com)\nprocessor.\n\nThe bindings only use a small set of the ICU library. Other parts may\nbe added in future, if they are needed. 
XPath functions for the\nfollowing tasks are provided:\n\n- normalization\n- transliteration\n\n## XPath Functions\n\nThe namespace name of the XPath extension functions is\n`https://unicode-org.github.io/icu/`. In this documentation, we are\nusing the prefix `icu` bound to this namespace:\n`xmlns:icu=\"https://unicode-org.github.io/icu/\"`.\n\n- normalization\n  - [`icu:normalize(input as xs:string, normalizer as xs:string, mode as xs:string) as xs:string`](doc/normalization.md#icunormalize)\n- transliteration\n  - [`icu:transliterate(input as xs:string, transliterator-ID as\n    xs:string) as xs:string`](doc/transliteration.md#icutransliterate)\n  - [`icu:transliterator-from-rules(ID as xs:string, rules as xs:string, direction as xs:string) as xs:boolean`](doc/transliteration.md#icutransliterator-from-rules)\n  - [`icu:transliterator-ids() as xs:string*`](doc/transliteration.md#icutransliterator-ids)\n\n## Getting started\n\nFor getting started have a look at the example sections in the\n[transliteration](doc/transliteration.md) and\n[normalization](doc/normalization.md) documentation.\n\n\n## Installation\n\n### oXygen XML Editor\n\nInstallation for the oXygen XML editor is very simple. You only have\nto provide the following URL to the installation dialog from **Help**\n-\u003e **Install new add-ons...**:\n\n```\nhttps://scdh.github.io/icu-xpath-bindings/descriptor.xml\n```\n\nNote: As we don't have a key for signing the extension, we will have\nto proceed anyway at some stage of the installation process.\n\nAfter the installation, you can use the new XPath function everywhere\nin oXygen. You don't need to clone this repo.\n\n### Usage with Saxon's command line interface\n\n**tl;dr**: Run `mvn package` and use the `xslt.sh` or `saxon.sh` shell\nwrappers with the option `-config:saxon-config.xml`.\n\nTwo things are necessary:\n\n1. Tell Saxon that there are XPath functions. 
This can be done via a\n   [Saxon configuration\n   file](https://www.saxonica.com/html/documentation11/configuration/configuration-file/). Such\n   a configuration is provided in [`saxon-config.xml`](saxon-config.xml). You\n   can use it from the Saxon command line interface via the argument\n   `-config:saxon-config.xml`.\n\n2. Add a jar file to the classpath, so that the Java classes that\n   define the functions are available to Saxon. On the [releases\n   page](https://github.com/SCDH/icu-xpath-bindings/releases/), you\n   can find jar files for each release. Use\n   `icu-xpath-bindings-VERSION-with-dependencies.jar` or\n   `icu-xpath-bindings-VERSION.jar`. The former has everything but\n   Saxon packed into it. If you use the latter, its dependencies\n   also have to be added to the classpath:\n\n- icu4j\n- icu4j-charset\n- icu4j-localespi\n- slf4j-api\n\nYou can get the dependency jar files manually through [Maven\nCentral](https://mvnrepository.com/repos/central) or you can clone\nthis git repository and run the [Maven](https://maven.apache.org/)\nbuild process, which downloads and builds everything for you\nautomatically:\n\n```{shell}\nmvn package\n```\n\nAfter you have run `mvn package`, all the required jar files are\npresent within the project:\n\n- `bindings/target/icu-xpath-bindings-VERSION.jar`\n- `bindings/target/lib/icu4j-VERSION.jar`\n- `bindings/target/lib/icu4j-charset-VERSION.jar`\n- `bindings/target/lib/icu4j-localespi-VERSION.jar`\n- `bindings/target/lib/slf4j-api-VERSION.jar`\n\nFor convenience, after running `mvn package` there will also be the\nshell scripts `xslt.sh` and `saxon.sh` in the repo's root folder. They are\nshell wrappers around Saxon that set the classpath correctly.\n\n\n### Java\n\nWhen using Java, you should also have a look at the\n[`IcuXPathFunctionRegistry.register(Processor)`](bindings/src/main/java/de/wwu/scdh/xpath/icu/IcuXPathFunctionRegistry.java). 
Moreover,\nthe classes with the function definitions are registered for loading\nthrough the SPI.\n\n## Building locally\n\nYou can build and test the project locally. You can also install the\noXygen plugin from a local build. To do so, run\n\n```{shell}\nmvn -Drelease.url=\"\" package\n```\n\nThen provide the descriptor file at\n`oxygen/target/descriptor.xml` to the oXygen extension installation\ndialog.\n\n\n## Further Reading\n\n- [extension functions](https://www.saxonica.com/html/documentation11/extensibility/extension-functions-J/ext-full-J.html)\n\n- [strip accents with ICU\n  transliterator](https://stackoverflow.com/questions/2992066/code-to-strip-diacritical-marks-using-icu)\n\n\n## License\n\nMIT License\n\nCopyright (c) 2023 SCDH, Westfälische Wilhelms-Universität Münster\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscdh%2Ficu-xpath-bindings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscdh%2Ficu-xpath-bindings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscdh%2Ficu-xpath-bindings/lists"}