https://github.com/shelfio/fast-chunk-string

Chunk string into equal substrings with unicode support

Host: GitHub
URL: https://github.com/shelfio/fast-chunk-string
Owner: shelfio
License: mit
Created: 2017-12-28T15:52:09.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2025-10-05T06:24:24.000Z (5 months ago)
Last Synced: 2025-10-05T08:31:26.202Z (5 months ago)
Topics: chunk-string, node-module, nodejs, npm-package, string-manipulation, unicode
Language: TypeScript
Size: 441 KB
Stars: 18
Watchers: 20
Forks: 4
Open Issues: 5
Metadata Files:
- Readme: readme.md
- License: license


# fast-chunk-string [![CircleCI](https://img.shields.io/circleci/project/shelfio/fast-chunk-string.svg)](https://circleci.com/gh/shelfio/fast-chunk-string)

> Chunk string into equal substrings with unicode support

Credits to [stackoverflow.com/a/29202760/2727317](https://stackoverflow.com/a/29202760/2727317)

## Install

```
$ yarn add @shelf/fast-chunk-string
```

## Usage

```js
import fastChunkString from '@shelf/fast-chunk-string';

// the fastest way
fastChunkString('unicorns', {size: 2, unicodeAware: false});
// => ['un', 'ic', 'or', 'ns']

// ignore unicode: still fast, but inaccurate for emoji
// (`size` counts UTF-16 code units, and each emoji here is 2 of them)
fastChunkString('😀😃😄😁', {size: 2, unicodeAware: false});
// => ['😀', '😃', '😄', '😁']

// respect unicode: slower, but accurate
fastChunkString('😀😃😄😁', {size: 2, unicodeAware: true});
// => ['😀😃', '😄😁']
```
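To see why the two modes disagree on the emoji string, here is a minimal TypeScript sketch of code-unit chunking. It is not the package's source: `chunkByCodeUnits` is a hypothetical name, and the sketch assumes, as the emoji output above suggests, that the `unicodeAware: false` path slices by UTF-16 code units. The early exits for empty input and oversized chunk sizes are modeled on the optimization notes further down, not on the actual implementation.

```typescript
// Hypothetical helper (not the package's source): chunk by UTF-16 code units,
// the unit that `String.prototype.length` and `slice` work in.
function chunkByCodeUnits(str: string, size: number): string[] {
  // Reject zero, negative, or NaN sizes, which would loop forever or misbehave.
  if (!(size > 0)) throw new RangeError('size must be a positive number');
  // Early exits: nothing to split, or the whole string fits in one chunk.
  if (str.length === 0) return [];
  if (size >= str.length) return [str];

  const chunks: string[] = [];
  for (let offset = 0; offset < str.length; offset += size) {
    // `slice` truncates fractional indices, so a non-integer size still
    // produces (uneven) chunks instead of failing.
    chunks.push(str.slice(offset, offset + size));
  }
  return chunks;
}

console.log(chunkByCodeUnits('unicorns', 2)); // => ['un', 'ic', 'or', 'ns']
// Each emoji is a surrogate pair (2 code units), so size 2 yields one emoji per chunk.
console.log(chunkByCodeUnits('😀😃😄😁', 2)); // => ['😀', '😃', '😄', '😁']
```

In this sketch a fractional size such as `2.5` yields `['un', 'ico', 'rn', 's']` for `'unicorns'`, because `slice` truncates each fractional index toward zero.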

## Benchmarks

Run via `yarn benchmark`. Measured on an Apple M2 Max.

```
Running "Without Unicode" suite...
Progress: 100%

  ~33 kb split by 2 kb:
    14 106 903 ops/s, ±1.71%    | 86.19% slower

  ~33 kb split by 1 mb:
    100 461 043 ops/s, ±1.45%   | 1.63% slower

  ~330 kb split by 2 kb:
    1 600 485 ops/s, ±0.63%     | 98.43% slower

  ~330 kb split by 1 mb:
    102 125 168 ops/s, ±1.50%   | fastest

  ~3.3 mb split by 2 kb:
    161 507 ops/s, ±1.19%       | 99.84% slower

  ~3.3 mb split by 1 mb:
    41 773 807 ops/s, ±1.54%    | 59.1% slower

  ~33 mb split by 2 kb:
    11 098 ops/s, ±0.25%        | slowest, 99.99% slower

  ~33 mb split by 1 mb:
    5 506 349 ops/s, ±0.58%     | 94.61% slower

Finished 8 cases!
  Fastest: ~330 kb split by 1 mb
  Slowest: ~33 mb split by 2 kb

Running "Unicode Aware" suite...
Progress: 100%

  ~33 kb split by 2 kb with unicodeAware:
    847 ops/s, ±0.99%   | 12.14% slower

  ~33 kb split by 1 mb with unicodeAware:
    964 ops/s, ±0.25%   | fastest

  ~330 kb split by 2 kb with unicodeAware:
    71 ops/s, ±0.76%    | slowest, 92.63% slower

  ~330 kb split by 1 mb with unicodeAware:
    90 ops/s, ±0.94%    | 90.66% slower

Finished 4 cases!
  Fastest: ~33 kb split by 1 mb with unicodeAware
  Slowest: ~330 kb split by 2 kb with unicodeAware
```

## Recent optimizations — September 2025

The September 2025 improvements were delivered autonomously by the gpt-5-codex model. We treated the hot paths like any latency-sensitive service and tuned the slowest sections:

- Single-pass unicode chunking: length and slicing now come from the same `runes()` walk, eliminating the extra `string-length` scan and keeping multi-code-point graphemes intact.

- Consolidated ASCII loop: the fast path is collapsed into one traversal, with early exits for empty inputs and oversized chunk sizes, to trim per-call overhead.

- Fractional-size parity: the legacy `slice` coercion semantics are restored, so non-integer chunk sizes behave exactly as before; new regression tests cover this.

The result is higher, steadier throughput in the ASCII ("Without Unicode") suite (for example, ~33 kb split by 1 mb climbs from 85.6M to 100.5M ops/s) and a 9–10× lift in the unicode-aware scenarios (e.g. the ~33 kb splits rise from ~101 ops/s to ~964 ops/s), while preserving behaviour for combining marks and emoji ligatures.
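The single-pass idea behind the unicode-aware path can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's code: the package walks the string with `runes()`, while this sketch swaps in the built-in `Intl.Segmenter` (Node.js 16+ and modern browsers) to split into grapheme clusters, and `chunkByGraphemes` is a hypothetical name. The two splitters may disagree on some edge cases.

```typescript
// Hypothetical helper (not the package's source). Assumption: the built-in
// Intl.Segmenter stands in for the `runes()` splitter the package uses.
function chunkByGraphemes(str: string, size: number): string[] {
  // This sketch only accepts whole-number sizes of at least 1.
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  const chunks: string[] = [];
  let current = '';
  let count = 0;

  // One walk over the grapheme clusters: counting and chunking happen together,
  // so there is no separate pass to measure the string's length first.
  for (const { segment } of segmenter.segment(str)) {
    current += segment;
    count += 1;
    if (count === size) {
      chunks.push(current);
      current = '';
      count = 0;
    }
  }
  // Keep a trailing, shorter chunk if the grapheme count isn't a multiple of size.
  if (count > 0) chunks.push(current);
  return chunks;
}

console.log(chunkByGraphemes('😀😃😄😁', 2)); // => ['😀😃', '😄😁']
// 'e' + U+0301 (combining acute accent) is one grapheme, so it stays in one chunk.
console.log(chunkByGraphemes('e\u0301a', 1)); // => ['e\u0301', 'a']
```

Because a chunk boundary is only checked after a whole cluster has been appended, multi-code-point graphemes such as `e` + U+0301 or a ZWJ family emoji are never split across chunks in this sketch. The trade-off is a per-grapheme walk and string concatenation, versus plain `slice` calls in code-unit mode.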

## See Also

- [fast-normalize-spaces](https://github.com/shelfio/fast-normalize-spaces)

- [fast-natural-order-by](https://github.com/shelfio/fast-natural-order-by)

- [fast-uslug](https://github.com/shelfio/fast-uslug)

## Publish

```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```

## License

MIT © [Shelf](https://shelf.io)
