https://github.com/shelfio/fast-chunk-string
Chunk string into equal substrings with unicode support
chunk-string node-module nodejs npm-package string-manipulation unicode
- Host: GitHub
- URL: https://github.com/shelfio/fast-chunk-string
- Owner: shelfio
- License: mit
- Created: 2017-12-28T15:52:09.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2025-10-05T06:24:24.000Z (3 months ago)
- Last Synced: 2025-10-05T08:31:26.202Z (3 months ago)
- Topics: chunk-string, node-module, nodejs, npm-package, string-manipulation, unicode
- Language: TypeScript
- Size: 441 KB
- Stars: 18
- Watchers: 20
- Forks: 4
- Open Issues: 5
Metadata Files:
- Readme: readme.md
- License: license
# fast-chunk-string [CircleCI](https://circleci.com/gh/shelfio/fast-chunk-string)
> Chunk string into equal substrings with unicode support
Credits to [stackoverflow.com/a/29202760/2727317](https://stackoverflow.com/a/29202760/2727317)
## Install
```sh
$ yarn add @shelf/fast-chunk-string
```
## Usage
```js
import fastChunkString from '@shelf/fast-chunk-string';
// the fastest way: size counts UTF-16 code units
fastChunkString('unicorns', {size: 2, unicodeAware: false});
// => ['un', 'ic', 'or', 'ns']
// ignore unicode: still fast but inaccurate, because each emoji here
// is two UTF-16 code units, so size 2 yields one emoji per chunk
fastChunkString('😀😃😄😁', {size: 2, unicodeAware: false});
// => ['😀', '😃', '😄', '😁']
// respect unicode: slow but accurate, size counts visible characters
fastChunkString('😀😃😄😁', {size: 2, unicodeAware: true});
// => ['😀😃', '😄😁']
```
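The difference between the two modes comes down to what a "character" is in a JavaScript string. The sketch below is illustrative only and is not the library's implementation; `chunkByCodeUnits` and `chunkByCodePoints` are hypothetical helpers written for this example, and both assume `size` is a positive integer.

```js
// Illustrative only: these helpers are hypothetical, not part of the package.
const emoji = '😀😃😄😁';

// String length counts UTF-16 code units; each of these emoji needs two.
console.log(emoji.length); // 8

// Spreading a string iterates by code point, so each emoji is one element.
console.log([...emoji].length); // 4

// Chunk by code units, the way a plain slice() sees the string.
function chunkByCodeUnits(str, size) {
  const chunks = [];
  for (let i = 0; i < str.length; i += size) {
    chunks.push(str.slice(i, i + size));
  }
  return chunks;
}

// Chunk by code points, so surrogate pairs are never split.
function chunkByCodePoints(str, size) {
  const points = [...str];
  const chunks = [];
  for (let i = 0; i < points.length; i += size) {
    chunks.push(points.slice(i, i + size).join(''));
  }
  return chunks;
}

console.log(chunkByCodeUnits(emoji, 2)); // ['😀', '😃', '😄', '😁']
console.log(chunkByCodePoints(emoji, 2)); // ['😀😃', '😄😁']
```

With an odd size such as 3, the code-unit version cuts through the middle of a surrogate pair and produces chunks containing lone surrogates, which is the kind of inaccuracy the second usage example warns about.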
## Benchmarks
Run via `yarn benchmark`. Measured on an Apple M2 Max.
```
Running "Without Unicode" suite...
Progress: 100%
~33 kb split by 2 kb:
14 106 903 ops/s, ±1.71% | 86.19% slower
~33 kb split by 1 mb:
100 461 043 ops/s, ±1.45% | 1.63% slower
~330 kb split by 2 kb:
1 600 485 ops/s, ±0.63% | 98.43% slower
~330 kb split by 1 mb:
102 125 168 ops/s, ±1.50% | fastest
~3.3 mb split by 2 kb:
161 507 ops/s, ±1.19% | 99.84% slower
~3.3 mb split by 1 mb:
41 773 807 ops/s, ±1.54% | 59.1% slower
~33 mb split by 2 kb:
11 098 ops/s, ±0.25% | slowest, 99.99% slower
~33 mb split by 1 mb:
5 506 349 ops/s, ±0.58% | 94.61% slower
Finished 8 cases!
Fastest: ~330 kb split by 1 mb
Slowest: ~33 mb split by 2 kb
Running "Unicode Aware" suite...
Progress: 100%
~33 kb split by 2 kb with unicodeAware:
847 ops/s, ±0.99% | 12.14% slower
~33 kb split by 1 mb with unicodeAware:
964 ops/s, ±0.25% | fastest
~330 kb split by 2 kb with unicodeAware:
71 ops/s, ±0.76% | slowest, 92.63% slower
~330 kb split by 1 mb with unicodeAware:
90 ops/s, ±0.94% | 90.66% slower
Finished 4 cases!
Fastest: ~33 kb split by 1 mb with unicodeAware
Slowest: ~330 kb split by 2 kb with unicodeAware
```
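The "% slower" figures appear to be relative to the fastest case within each suite. That reading is an inference from the numbers above, not something the benchmark output states; `percentSlower` is a hypothetical helper used only to check the arithmetic.

```js
// Hypothetical helper: slowdown relative to the fastest case, in percent.
function percentSlower(opsPerSec, fastestOpsPerSec) {
  return ((1 - opsPerSec / fastestOpsPerSec) * 100).toFixed(2);
}

// Unicode Aware suite, fastest case is 964 ops/s.
console.log(percentSlower(847, 964)); // '12.14'
console.log(percentSlower(71, 964)); // '92.63'

// Without Unicode suite, fastest case is 102 125 168 ops/s.
console.log(percentSlower(14106903, 102125168)); // '86.19'
```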
## Recent optimizations — September 2025
September 2025 improvements were delivered autonomously by the gpt-5-codex model. We treated the hot paths like any latency-sensitive service and tuned the slowest sections:
- Single-pass unicode chunking – length and slicing now come from the same `runes()` walk, eliminating the extra `string-length` scan and keeping multicodepoint graphemes intact.
- Consolidated ASCII loop – collapsed the fast path into one traversal with early exits for empty inputs and oversized chunk sizes to trim per-call overhead.
- Fractional-size parity – restored the legacy `slice` coercion semantics so non-integer chunk sizes behave exactly as before, backed by new regression tests.
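A rough sketch of what a single-traversal fast path with early exits can look like, and of how `String.prototype.slice` coerces fractional offsets. This is not the library's source: `chunkAscii` is a hypothetical name, and the handling of empty input and of non-positive sizes is an assumption made for the example.

```js
// Hypothetical sketch of a code-unit fast path; not the library's source.
function chunkAscii(str, size) {
  // Assumption: reject sizes that would never advance the loop (0, negative, NaN).
  if (!(size > 0)) throw new RangeError('size must be a positive number');
  // Early exit: nothing to split (returning [] here is an assumption).
  if (str.length === 0) return [];
  // Early exit: the whole string fits into a single chunk.
  if (size >= str.length) return [str];

  const chunks = [];
  for (let offset = 0; offset < str.length; offset += size) {
    // slice() truncates fractional arguments toward zero.
    chunks.push(str.slice(offset, offset + size));
  }
  return chunks;
}

console.log(chunkAscii('unicorns', 2)); // ['un', 'ic', 'or', 'ns']
console.log(chunkAscii('abc', 10)); // ['abc']
// Offsets 0, 2.5, 5 are truncated to 0, 2, 5 by slice().
console.log(chunkAscii('abcdef', 2.5)); // ['ab', 'cde', 'f']
```

The last call shows why fractional sizes need regression tests: because `slice` truncates each boundary independently, the chunks are not all the same length.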
The result is steadier throughput in the ASCII suite: for example, ~33 kb split by 1 mb climbs from 85.6M to 100.5M ops/s. The unicode-aware scenarios see a 9–10× lift (e.g. ~33 kb split by 1 mb rises from ~101 ops/s to ~964 ops/s), while behaviour for combining marks and emoji ligatures is preserved.
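To illustrate the single-pass idea for multi-codepoint graphemes, here is a minimal sketch that swaps in the built-in `Intl.Segmenter` instead of the `runes()` walk the library uses. `chunkGraphemes` is a hypothetical helper, and its results may differ from the package's in edge cases.

```js
// Hypothetical sketch using Intl.Segmenter; the library itself relies on runes().
function chunkGraphemes(str, size) {
  if (!(size >= 1)) throw new RangeError('size must be at least 1');

  const segmenter = new Intl.Segmenter(undefined, {granularity: 'grapheme'});
  const chunks = [];
  let current = '';
  let count = 0;

  // One walk over the string: count graphemes and build chunks together.
  for (const {segment} of segmenter.segment(str)) {
    current += segment;
    count += 1;
    if (count >= size) {
      chunks.push(current);
      current = '';
      count = 0;
    }
  }
  // Flush the final, possibly shorter, chunk.
  if (count > 0) chunks.push(current);
  return chunks;
}

// Family emoji joined by zero-width joiners, and "e" plus a combining acute accent.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
const accented = 'e\u0301';

console.log(chunkGraphemes(family + accented + 'a', 2));
// two chunks: [family + accented, 'a']
```

Counting and slicing in the same loop avoids measuring the string once and then walking it again to cut it, which is the saving the first bullet describes.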
## See Also
- [fast-normalize-spaces](https://github.com/shelfio/fast-normalize-spaces)
- [fast-natural-order-by](https://github.com/shelfio/fast-natural-order-by)
- [fast-uslug](https://github.com/shelfio/fast-uslug)
## Publish
```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```
## License
MIT © [Shelf](https://shelf.io)