Projects in Awesome Lists tagged with chunking-algorithm
A curated list of projects in awesome lists tagged with chunking-algorithm.
https://github.com/chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
ai chonkie chunker chunking-algorithm llms rag retrieval-systems semantic-chunker splitting-algorithms text-splitter typescript
Last synced: 01 Feb 2026
https://github.com/nlfiedler/fastcdc-rs
FastCDC implementation in Rust
chunking-algorithm deduplication rust
Last synced: 04 Apr 2025
https://github.com/mg98/ae-chunker-go
Go implementation of the AE chunking algorithm.
chunking chunking-algorithm go golang
Last synced: 12 Apr 2025
https://github.com/arcadiasofts/clast-rs
A Rust library for Content-Defined Chunking (CDC).
chunking-algorithm content-defined-chunking rust-library
Last synced: 30 Nov 2025
https://github.com/isaka-james/chunks-to-file
A Node.js chunking system
chunk chunked-uploads chunking chunking-algorithm chunking-files chunks node-chunking nodejs nodejs-chunking
Last synced: 15 Jan 2026
https://github.com/ayush585/smartchunk
SmartChunk is a lightweight, structure-aware semantic chunking toolkit designed to supercharge RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text arbitrarily, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, ensuring cleaner, more coherent chunks.
agentic-workflow chunking chunking-algorithm cli llm nlp package pip rag semantic
Last synced: 07 Sep 2025
https://github.com/mudssrali/chunkify
A simple utility to split a given array into chunks of a given size, with an option to reverse the array
array array-splitter chunk chunking-algorithm chunking-array javascript split typescript
Last synced: 19 Nov 2025
https://github.com/mahnoorsheikh16/nlp-framework-for-literature-summarization-in-law-and-policy
Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, chunking), extractive summarization baselines, and fine-tuned abstractive models (PEGASUS and LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity for evaluation.
bleu-score chunking-algorithm cosine-similarity encoder-decoder-model led longformer-models nlp-keywords-extraction pegasus policy-analysis retrieval-chatbot rouge-metric text-summarization tokenization
Last synced: 09 Oct 2025
https://github.com/davidwrossiter/langchunk
Source code for chunking code written in multiple programming languages
chunking chunking-algorithm embedding llm-context vectorization
Last synced: 20 Mar 2025