{"id":18486131,"url":"https://github.com/mathsgod/token-text-splitter","last_synced_at":"2026-01-25T11:01:57.209Z","repository":{"id":250646797,"uuid":"835055920","full_name":"mathsgod/token-text-splitter","owner":"mathsgod","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-12T04:05:13.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-20T20:53:49.587Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mathsgod.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-29T04:29:28.000Z","updated_at":"2024-08-12T04:04:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"7a91dce1-f435-4a75-ab8b-3076d57c4565","html_url":"https://github.com/mathsgod/token-text-splitter","commit_stats":null,"previous_names":["mathsgod/token-text-splitter"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/mathsgod/token-text-splitter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathsgod%2Ftoken-text-splitter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathsgod%2Ftoken-text-splitter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathsgod%2Ftoken-text-splitter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathsgod%2Ftoken-text-splitter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/mathsgod","download_url":"https://codeload.github.com/mathsgod/token-text-splitter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathsgod%2Ftoken-text-splitter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28752360,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-25T10:25:12.305Z","status":"ssl_error","status_checked_at":"2026-01-25T10:25:11.933Z","response_time":113,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T12:47:55.479Z","updated_at":"2026-01-25T11:01:57.170Z","avatar_url":"https://github.com/mathsgod.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Token Text Splitter\n\nThis library splits text into chunks by token count. It is useful for preparing input for token-based models such as GPT-3 and GPT-4. The splitter divides the text into chunks according to a token size and an overlap.\n\nIn some written languages (e.g., Chinese and Japanese), a single character can encode to two or more tokens. The splitter ensures that a chunk boundary never falls in the middle of such a multi-token character. 
Each chunk contains only well-formed Unicode characters.\n\n\n## Installation\n\n```bash\ncomposer require mathsgod/token-text-splitter\n``` \n\n\n## Usage\n    \n```php\n\nuse TextSplitter\\TokenTextSplitter;\n\n$text = \"蘋果公司（Apple Inc.）是美國的一家跨國科技公司，總部位於加利福尼亞州的庫比蒂諾。蘋果公司的硬體產品包括iPhone智慧型手機、iPad平板電腦、Mac個人電腦、iPod多媒體播放器、Apple Watch智慧手錶和Apple TV數位媒體機。蘋果公司的軟體產品包括iOS、iPadOS、macOS、watchOS和tvOS作業系統，iTunes多媒體播放軟體，Safari網頁瀏覽器，iLife和iWork生產力套件，Final Cut Pro X和Logic Pro X專業影音剪輯軟體。蘋果公司的線上服務包括App Store、Apple Music、iCloud、iTunes Store和Apple TV+。蘋果公司的零售店面遍佈全球，是全球最大的科技公司之一。\";\n\n// token size is 10\n// overlap is 5\n$splitter = new TokenTextSplitter(\"gpt-4o\", 10, 5);\n\n$chunks = $splitter-\u003esplitText($text);\n\nprint_r($chunks);\n\n\n```\n\n### Output\n\n```\nArray\n(\n    [0] =\u003e 蘋果公司（Apple Inc.）是\n    [1] =\u003e Apple Inc.）是美國的一家跨\n    [2] =\u003e 美國的一家跨國科技公司，總\n    [3] =\u003e 國科技公司，總部位於加利\n    [4] =\u003e 部位於加利福尼亞州的\n    [5] =\u003e 福尼亞州的庫比蒂諾\n    [6] =\u003e 庫比蒂諾。蘋果公司的\n    [7] =\u003e 。蘋果公司的硬體產品包括i\n    [8] =\u003e 硬體產品包括iPhone智慧型手機、\n    [9] =\u003e Phone智慧型手機、iPad平板電\n    [10] =\u003e iPad平板電腦、Mac個\n    [11] =\u003e 腦、Mac個人電腦、\n    [12] =\u003e 人電腦、iPod多媒體\n    [13] =\u003e iPod多媒體播放器、Apple Watch智慧\n    [14] =\u003e 播放器、Apple Watch智慧手錶和Apple\n    [15] =\u003e 手錶和Apple TV數位媒體\n    [16] =\u003e  TV數位媒體機。蘋果\n    [17] =\u003e 機。蘋果公司的軟體產品\n    [18] =\u003e 公司的軟體產品包括iOS、i\n    [19] =\u003e 包括iOS、iPadOS、macOS\n    [20] =\u003e PadOS、macOS、watchOS和tv\n    [21] =\u003e 、watchOS和tvOS作業系統\n    [22] =\u003e OS作業系統，iTunes多媒\n    [23] =\u003e ，iTunes多媒體播放軟體\n    [24] =\u003e 體播放軟體，Safari網頁\n    [25] =\u003e ，Safari網頁瀏覽器，iLife\n    [26] =\u003e 瀏覽器，iLife和iWork生\n    [27] =\u003e Life和iWork生產力套件，\n    [28] =\u003e 產力套件，Final Cut Pro X和\n    [29] =\u003e Final Cut Pro X和Logic Pro X專業\n    [30] =\u003e Logic Pro X專業影音剪輯軟\n    [31] =\u003e 影音剪輯軟體。蘋果\n    [32] =\u003e 體。蘋果公司的線上服務包括\n    [33] =\u003e 公司的線上服務包括App Store、Apple Music\n    [34] =\u003e App Store、Apple 
Music、iCloud、i\n    [35] =\u003e 、iCloud、iTunes Store和Apple TV\n    [36] =\u003e Tunes Store和Apple TV+。蘋果\n    [37] =\u003e +。蘋果公司的零售店面\n    [38] =\u003e 公司的零售店面遍佈全球，是\n    [39] =\u003e 遍佈全球，是全球最大的科技公司之一\n    [40] =\u003e 全球最大的科技公司之一。\n)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathsgod%2Ftoken-text-splitter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmathsgod%2Ftoken-text-splitter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathsgod%2Ftoken-text-splitter/lists"}