{"id":13658667,"url":"https://github.com/algolia/chunk-text","last_synced_at":"2025-06-19T15:42:15.052Z","repository":{"id":57108089,"uuid":"94689753","full_name":"algolia/chunk-text","owner":"algolia","description":"🔪 chunk/split a string by length without cutting/truncating words.","archived":false,"fork":false,"pushed_at":"2020-09-16T08:11:39.000Z","size":199,"stargazers_count":45,"open_issues_count":1,"forks_count":9,"subscribers_count":69,"default_branch":"master","last_synced_at":"2025-06-02T22:39:58.133Z","etag":null,"topics":["algolia","array","chunk","length","size","split","string","text"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/algolia.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-18T13:43:38.000Z","updated_at":"2024-12-18T11:17:47.000Z","dependencies_parsed_at":"2022-08-20T17:10:52.140Z","dependency_job_id":null,"html_url":"https://github.com/algolia/chunk-text","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/algolia/chunk-text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fchunk-text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fchunk-text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fchunk-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fchunk-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/algolia","download_url":"https://codeload.github.com/algolia/chu
nk-text/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fchunk-text/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260781391,"owners_count":23062231,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algolia","array","chunk","length","size","split","string","text"],"created_at":"2024-08-02T05:01:01.544Z","updated_at":"2025-06-19T15:42:10.040Z","avatar_url":"https://github.com/algolia.png","language":"JavaScript","readme":"Chunk Text\n===\n\n\u003e chunk/split a string by length without cutting/truncating words.\n\n\n``` javascript\nconst out = chunk('hello world how are you?', 7);\n/* ['hello', 'world', 'how are', 'you?'] */\n```\n\n\n## Installation\n\n``` bash\n$ npm install chunk-text\n# yarn add chunk-text\n```\n\n\n## Usage\n\nAll number values are parsed according to `Number.parseInt`.\n\n``` javascript\nconst chunk = require('chunk-text');\n```\n\n#### chunk(text, chunkSize);\n\nChunks the `text` string into an array of strings that each have a maximum length of `chunkSize`.\n\n``` javascript\nconst out = chunk('hello world how are you?', 7);\n/* ['hello', 'world', 'how are', 'you?'] */\n```\n\nIf no space is detected before `chunkSize` is reached, then it will truncate the word to always\nensure the resulting text chunks have at maximum a length of `chunkSize`.\n\n``` javascript\nconst out = chunk('hello world', 4);\n/* ['hell', 'o', 'worl', 'd'] */\n```\n\n#### chunk(text, chunkSize, chunkOptions);\n\nChunks the `text` string into an array of strings that each 
have a maximum length of `chunkSize`, as determined by `chunkOptions.charLengthMask`.\n\nOmitting `chunkOptions.charLengthMask` is equivalent to setting `chunkOptions.charLengthMask=-1`.\n\nFor single-byte characters, `chunkOptions.charLengthMask` never changes the results.\n\nFor multi-byte characters, `chunkOptions.charLengthMask` controls how multi-byte glyphs count towards the length, according to the following table:\n\n| `chunkOptions.charLengthMask` | result |\n|---|---|\n| -1 | - same as default, same as `chunkOptions.charLengthMask=1`\u003cbr /\u003e- each character counts as 1 towards length |\n| 0 | - each character counts as the number of bytes it contains |\n| \u003e0 | - each character counts as the number of bytes it contains, up to a limit of `chunkOptions.charLengthMask=N`\u003cbr /\u003e- a 7-byte ZWJ emoji such as runningPerson+ZWJ+femaleSymbol (🏃🏽‍♀️) counts as 2, when `chunkOptions.charLengthMask=2` |\n\nYou can also switch `chunkOptions.charLengthType` from its default of `length` to `TextEncoder`.\n\nThis lets you pass to `chunkOptions.textEncoder` any object that matches the signature `chunkOptions.textEncoder.encode(text).length`.\n\nIf your environment natively contains the `TextEncoder` prototype and `chunkOptions.textEncoder` isn't provided,\nthe module attempts `new TextEncoder()` 
in order to use this `chunkOptions.charLengthType`.\n\nA `ReferenceError` will occur when all of the following are true:\n\n- `chunkOptions.charLengthType` is set to `TextEncoder`.\n- `chunkOptions.textEncoder` isn't provided.\n- The `TextEncoder` prototype isn't provided by the environment.\n\n``` javascript\n// one woman runner emoji with a colour is seven bytes, or five characters\n// RUNNER(2) + COLOUR(2) + ZWJ + GENDER + VS16\n// (actually encodes to 17)\nconst runner = '🏃🏽‍♀️';\n\nconst outDefault = chunk(runner+runner+runner, 4);\n/* [ '🏃🏽‍♀️🏃🏽‍♀️🏃🏽‍♀️' ] */\n\nconst outZero = chunk(runner+runner+runner, 4, { charLengthMask: 0 });\n/* [ '🏃🏽‍♀️', '🏃🏽‍♀️', '🏃🏽‍♀️' ] */\n\nconst outTwo = chunk(runner+runner+runner, 4, { charLengthMask: 2 });\n/* [ '🏃🏽‍♀️🏃🏽‍♀️', '🏃🏽‍♀️' ] */\n\n// FLAG + RAINBOW\n// 2 each as length, 4 each as TextEncoder\n// 4 as length, 8 as TextEncoder\n// Node v14.5.0 does not provide TextEncoder natively.\nconst flags = '🏳️‍🌈🏳️‍🌈';\n\n// \\/ will fail if your environment doesn't already have TextEncoder prototype \\/\nchunk(flags, 8, { charLengthMask: 0, charLengthType: 'TextEncoder' });\n// [ '🏳️‍🌈', '🏳️‍🌈' ]\n// /\\ will fail if your environment doesn't already have TextEncoder prototype /\\\n\nchunk(flags, 4, {\n  charLengthMask: 0,\n  charLengthType: 'TextEncoder',\n  textEncoder: new TextEncoder(),\n})\n// [ '🏳️‍🌈', '🏳️‍🌈' ]\n\nchunk(flags, 999, {\n  charLengthMask: 0,\n  charLengthType: 'TextEncoder',\n  textEncoder: {\n    encode: () =\u003e ({ length: 999 }),\n  },\n})\n// [ '🏳️‍🌈', '🏳️‍🌈' ]\n```\n\n## Usage in Algolia context\n\nThis library was created by [Algolia](https://www.algolia.com/) to make it easier\nto optimize record payload sizes, which results in faster search responses from the API.\n\nIn general, each record has a single large \"content attribute\",\nand this package lets you split that content into small chunks of text.\n\nThe text chunks can then be [distributed over multiple 
records](https://www.algolia.com/doc/faq/basics/how-do-i-reduce-the-size-of-my-records/#faq-section).\n\nHere is an example of how to split an existing record into several records:\n\n``` javascript\nvar chunk = require('chunk-text');\nvar record = {\n  post_id: 100,\n  content: 'A large chunk of text here'\n};\n\nvar chunks = chunk(record.content, 600); // Limit the chunk size to a length of 600.\nvar records = [];\n// Copy the original record once per chunk, replacing only its content.\nchunks.forEach(function(content) {\n  records.push(Object.assign({}, record, {content: content}));\n});\n```\n","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falgolia%2Fchunk-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falgolia%2Fchunk-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falgolia%2Fchunk-text/lists"}