{"id":18111918,"url":"https://github.com/philnash/chunkers","last_synced_at":"2025-04-06T08:28:56.829Z","repository":{"id":254086416,"uuid":"844767766","full_name":"philnash/chunkers","owner":"philnash","description":"An exploration of text splitting and chunking in JavaScript","archived":false,"fork":false,"pushed_at":"2024-11-21T20:47:21.000Z","size":16258,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-21T00:09:23.288Z","etag":null,"topics":["langchain-js","llamaindex","text-chunking","text-splitter","text-splitting"],"latest_commit_sha":null,"homepage":"https://chunkers.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philnash.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-19T23:43:28.000Z","updated_at":"2024-11-21T20:47:24.000Z","dependencies_parsed_at":"2024-11-21T21:36:00.539Z","dependency_job_id":null,"html_url":"https://github.com/philnash/chunkers","commit_stats":null,"previous_names":["philnash/chunkers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philnash%2Fchunkers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philnash%2Fchunkers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philnash%2Fchunkers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philnash%2Fchunkers/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/philnash","download_url":"https://codeload.github.com/philnash/chunkers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247455558,"owners_count":20941727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["langchain-js","llamaindex","text-chunking","text-splitter","text-splitting"],"created_at":"2024-11-01T01:08:35.525Z","updated_at":"2025-04-06T08:28:56.806Z","avatar_url":"https://github.com/philnash.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Chunkers\nAn exploration of JavaScript text splitters.\n\n## What is chunking?\n\nWhen building a [Retrieval-Augmented Generation (RAG) based app](https://www.datastax.com/guides/what-is-retrieval-augmented-generation?utm_medium=display\u0026utm_source=datastax\u0026utm_campaign=chunkers), one of the most important things you need to do is to get your data AI-ready. One of the steps in that process is known as \"chunking\" as it is used to break down large blocks of text or unstructured data into smaller chunks. Read more about [why chunking is important and what to consider here](https://www.datastax.com/blog/chunking-to-get-your-data-ai-ready?utm_medium=display\u0026utm_source=datastax\u0026utm_campaign=chunkers).\n\nIn the JavaScript world, there are a few libraries that can help you with chunking your data. 
This project explores those tools, and you can read the write-up in the blog post on [how to chunk text in JavaScript for your RAG application](https://www.datastax.com/blog/how-to-chunk-text-in-javascript-for-rag-applications?utm_medium=display\u0026utm_source=datastax\u0026utm_campaign=chunkers).\n\n## The project\n\nThis is a Next.js application that lets you experiment with four JavaScript tools that provide different text chunking capabilities. The tools are:\n\n* [llm-chunk](https://github.com/golbin/llm-chunk)\n* [@langchain/textsplitters](https://js.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)\n* [LlamaIndex NodeParser](https://ts.llamaindex.ai/modules/node_parser)\n* [semantic-chunking](https://github.com/jparkerweb/semantic-chunking)\n\n\n## Running the project\n\nFirst, clone this repo:\n\n```sh\ngit clone https://github.com/philnash/chunkers.git\ncd chunkers\n```\n\nInstall the dependencies:\n\n```sh\nnpm install\n```\n\nThen, run the development server:\n\n```sh\nnpm run dev\n```\n\nOpen [http://localhost:3000](http://localhost:3000) in your browser to see the result.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilnash%2Fchunkers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilnash%2Fchunkers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilnash%2Fchunkers/lists"}