https://github.com/hildjj/wrap-segments
Wrap lines at Unicode word boundaries, using Intl.Segmenter.
https://github.com/hildjj/wrap-segments
unicode wordwrap zalgo
Last synced: about 2 months ago
JSON representation
Wrap lines at Unicode word boundaries, using Intl.Segmenter.
- Host: GitHub
- URL: https://github.com/hildjj/wrap-segments
- Owner: hildjj
- License: other
- Created: 2023-05-11T18:46:28.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-13T19:02:54.000Z (about 2 years ago)
- Last Synced: 2025-02-12T19:38:33.517Z (3 months ago)
- Topics: unicode, wordwrap, zalgo
- Language: JavaScript
- Homepage:
- Size: 120 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# wrap-segments
Wrap lines at Unicode word boundaries, using [Intl.Segmenter](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter).
Existing wrapping libraries tend to work very well on plain ASCII-7 text.
However, the world has lots of other text that needs to be wrapped.You might want to turn this:
> f̵̩̣̺ö̶̧̧̢o̶̥̩̗̹ ̶̨̢͔̳b̷̧̥͍̥a̷̛̦͓̜r̴̡͕̳̪
into this:
> f̵̩̣̺ö̶̧̧̢o̶̥̩̗̹ ̶̨̢͔̳
> b̷̧̥͍̥a̷̛̦͓̜r̴̡͕̳̪by wrapping every 4 [grapheme clusters](https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
## Installation
```sh
npm install wrap-segments
```## API
None of the options are required, and you can omit the options entirely
to take all of the defaults. The below example shows the default options:```js
import {SegmentWrapper} from '../lib/index.js'const w = new SegmentWrapper({
escape: identityTransform, // Escape inputs before proessing
indent: '', // Can be a string or number
indentChar: ' ', // If indent is a number, repeat this that many times
indentEmpty: false, // If the input is empty, still indent?
indentFirst: true, // Indent the first line?
isEmpty: /^\s*$/u, // Is a given text segment empty? Only applies to non-wordLike segments.
isNewline: /((?![\r\n\v\f\x85\u2028\u2029])\s)*[\r\n\v\f\x85\u2028\u2029]+(\s*)/gu, // Replace newlines matching this with newlineReplacement
locale: DEFAULT_LOCALE, // Default is calculated by the JS runtime
newline: '\n', // Insert this at the end of every line
newlineReplacement: ' ', // What to replace isNewline with
trim: true, // Trim whitespace from the end of the input
width: 80, // In grapheme clusters, *including* indent
})const wrapped = w.wrap('Lorem Ipsum...')
```Generated [API documentation](https://hildjj.github.io/wrap-segments/) is
available.## Command line
A CLI is available as a [separate package](cli/README.md).
## Caveats
- This hasn't been tested with enough languages. Please submit an issue or PR if you speak Korean, a language that uses the Devanagari script, a language that uses a right-to-left script such as Arabic or Hebrew, etc.
- This does not implement the full [line breaking algorithm](https://unicode.org/reports/tr14/#Algorithm) from Unicode TR14. I'm hoping that the `Intl.Segmenter` word boundaries are "close enough" for most cases. It's hard to get access to all of the needed properties from the JS runtime without including version-specific Unicode data, which I don't want to do. However, there are some rules in that algorithm that would be worth adding, with some careful thought.---
[](https://github.com/hildjj/wrap-segments/actions/workflows/node.js.yml)
[](https://codecov.io/gh/hildjj/wrap-segments)