https://github.com/dbuenzli/uuseg
Unicode text segmentation for OCaml
- Host: GitHub
- URL: https://github.com/dbuenzli/uuseg
- Owner: dbuenzli
- License: isc
- Created: 2014-12-23T18:38:35.000Z
- Default Branch: master
- Last Pushed: 2024-09-10T22:52:21.000Z
- Last Synced: 2024-09-11T03:24:15.844Z
- Language: OCaml
- Homepage: http://erratique.ch/software/uuseg
- Size: 148 KB
- Stars: 23
- Watchers: 6
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-ocaml - Uuseg
- awesome-list - uuseg
README
Uuseg — Unicode text segmentation for OCaml
===============================================================================
Uuseg is an OCaml library for segmenting Unicode text. It implements
the locale-independent [Unicode text segmentation algorithms][1] to
detect grapheme cluster, word and sentence boundaries, and the [Unicode
line breaking algorithm][2] to detect line break opportunities.
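The following minimal sketch shows how a segmenter might be driven. It splits a UTF-8 string into segments for any of the boundary kinds above. It assumes the incremental `Uuseg.create`/`Uuseg.add` interface. After each `` `Uchar `` is added, the segmenter is drained with `` `Await `` until it asks for more input. It also assumes OCaml >= 4.14 for the Stdlib UTF-8 decoder `String.get_utf_8_uchar`. The name `utf_8_segments` is hypothetical and not part of Uuseg.

```ocaml
(* Split the UTF-8 string [s] into a list of segments delimited by
   [boundary] (e.g. `Grapheme_cluster, `Word, `Sentence, `Line_break). *)
let utf_8_segments boundary s =
  let seg = Uuseg.create boundary in
  let buf = Buffer.create 64 in
  (* Move the buffered characters into [acc] as one segment; empty
     segments (e.g. a boundary at the start of text) are dropped. *)
  let flush acc =
    let segment = Buffer.contents buf in
    Buffer.clear buf;
    if segment = "" then acc else segment :: acc
  in
  (* Feed [v] to the segmenter, then drain its output with `Await
     until it either needs more input or signals the end. *)
  let rec add acc v =
    match Uuseg.add seg v with
    | `Uchar u -> Buffer.add_utf_8_uchar buf u; add acc `Await
    | `Boundary -> add (flush acc) `Await
    | `Await | `End -> acc
  in
  (* Decode [s] character by character; malformed bytes decode to
     U+FFFD (Uchar.rep). *)
  let rec loop acc i =
    if i >= String.length s then flush (add acc `End)
    else
      let d = String.get_utf_8_uchar s i in
      let acc = add acc (`Uchar (Uchar.utf_decode_uchar d)) in
      loop acc (i + Uchar.utf_decode_length d)
  in
  List.rev (loop [] 0)
```

With this sketch, ``utf_8_segments `Word "Hello world"`` should give `["Hello"; " "; "world"]`. With `` `Line_break `` the same text gives `["Hello "; "world"]`, because the break opportunity falls after the space.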

The library is independent of any IO mechanism or Unicode text data
structure, and it can process text without a complete in-memory
representation.
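The segmenter is fed one character at a time and keeps its own state between calls, so the input never has to be assembled in memory. The sketch below illustrates this with a hypothetical `count_graphemes` that counts grapheme clusters over a sequence of UTF-8 chunks. It assumes the same `Uuseg.create`/`Uuseg.add` interface and OCaml >= 4.14. To stay short, it also assumes that no chunk ends in the middle of a multi-byte UTF-8 sequence.

```ocaml
(* Count grapheme clusters in UTF-8 text delivered as a sequence of
   chunks, without concatenating the chunks. *)
let count_graphemes (chunks : string Seq.t) : int =
  let seg = Uuseg.create `Grapheme_cluster in
  let count = ref 0 in
  (* [pending] is true when characters were output since the last counted
     boundary, so a boundary before the first character is not counted. *)
  let pending = ref false in
  let close_cluster () = if !pending then (incr count; pending := false) in
  (* Feed one input to the segmenter and drain it until it awaits. *)
  let rec add v =
    match Uuseg.add seg v with
    | `Uchar _ -> pending := true; add `Await
    | `Boundary -> close_cluster (); add `Await
    | `Await | `End -> ()
  in
  (* Decode one chunk; the segmenter state persists across chunks, so a
     cluster may span a chunk border. *)
  let add_chunk s =
    let rec loop i =
      if i < String.length s then begin
        let d = String.get_utf_8_uchar s i in
        add (`Uchar (Uchar.utf_decode_uchar d));
        loop (i + Uchar.utf_decode_length d)
      end
    in
    loop 0
  in
  Seq.iter add_chunk chunks;
  add `End;
  (* Count a final cluster even if no end-of-text boundary was reported. *)
  close_cluster ();
  !count
```

For example, `count_graphemes (List.to_seq ["e"; "\u{0301}a"])` should return `2`. The combining acute accent at the start of the second chunk still attaches to the `e` from the first.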

Uuseg is distributed under the ISC license. It depends on [Uucp].

[1]: http://www.unicode.org/reports/tr29/
[2]: http://www.unicode.org/reports/tr14/
[Uucp]: http://erratique.ch/software/uucp

Homepage: <http://erratique.ch/software/uuseg>

## Installation

Uuseg can be installed with `opam`:

```sh
opam install uuseg
opam install uuseg cmdliner uutf # For the usegtrip tool.
```

If you don't use `opam`, consult the [`opam`](opam) file for build
instructions.

## Documentation

The documentation can be consulted [online] or via `odig doc uuseg`.
Questions are welcome but are better asked on the [OCaml forum] than on
the issue tracker.

[online]: http://erratique.ch/software/uuseg/doc/
[OCaml forum]: https://discuss.ocaml.org/

## Sample programs

The [`usegtrip`] tool segments text provided on standard input.
See also the [doc examples].

[`usegtrip`]: test/usegtrip.ml
[doc examples]: test/examples.ml
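The sketch below is in the spirit of `usegtrip` but is not its actual code. It marks the boundaries in a string by inserting a separator, and a hypothetical `main` wires it to standard input. Both `mark_boundaries` and `main` are illustrative names. The sketch assumes the `Uuseg.create`/`Uuseg.add` interface and OCaml >= 4.14 (`String.get_utf_8_uchar`, `In_channel.input_all`).

```ocaml
(* Return [s] with [sep] inserted at every [boundary] that falls between
   two characters; boundaries at the very start or end are not marked. *)
let mark_boundaries boundary sep s =
  let seg = Uuseg.create boundary in
  let buf = Buffer.create (2 * String.length s) in
  let started = ref false and pending = ref false in
  (* Feed one input and drain the segmenter until it awaits more input. *)
  let rec add v =
    match Uuseg.add seg v with
    | `Boundary -> if !started then pending := true; add `Await
    | `Uchar u ->
        if !pending then (Buffer.add_string buf sep; pending := false);
        Buffer.add_utf_8_uchar buf u;
        started := true;
        add `Await
    | `Await | `End -> ()
  in
  let rec loop i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      add (`Uchar (Uchar.utf_decode_uchar d));
      loop (i + Uchar.utf_decode_length d)
    end
  in
  loop 0;
  add `End;
  Buffer.contents buf

(* Hypothetical entry point: mark word boundaries in standard input.
   An executable would call it with [let () = main ()]. *)
let main () =
  print_string (mark_boundaries `Word "|" (In_channel.input_all stdin))
```

Marking word boundaries in `"Hello world"` with `"|"` should produce `"Hello| |world"`. Only boundaries between two characters are marked, so the output does not depend on whether the segmenter also reports boundaries at the start and end of the text.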