An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with chunking-algorithm

A curated list of projects in awesome lists tagged with chunking-algorithm.

https://github.com/chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

ai chonkie chunker chunking-algorithm llms rag retrieval-systems semantic-chunker splitting-algorithms text-splitter typescript

Last synced: 01 Feb 2026

https://github.com/nlfiedler/fastcdc-rs

FastCDC implementation in Rust

chunking-algorithm deduplication rust

Last synced: 04 Apr 2025

https://github.com/mg98/ae-chunker-go

Go implementation of the AE chunking algorithm.

chunking chunking-algorithm go golang

Last synced: 12 Apr 2025

https://github.com/arcadiasofts/clast-rs

A Rust library for Content-Defined Chunking (CDC).

chunking-algorithm content-defined-chunking rust-library

Last synced: 30 Nov 2025
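
For readers unfamiliar with the term used in the entry above, content-defined chunking picks chunk boundaries from the data itself (via a rolling hash) rather than at fixed offsets, so an insertion early in a file tends to shift only nearby boundaries. The following is a minimal Python sketch of a gear-hash style chunker. It is not the algorithm used by clast-rs or fastcdc-rs; the function name `cdc_chunks`, the constants `MIN_SIZE`, `MAX_SIZE`, `MASK`, and the seeded `GEAR` table are all assumptions chosen for illustration.

```python
import random

# Hypothetical parameters for illustration; real libraries choose their own.
MIN_SIZE = 64            # never cut before this many bytes
MAX_SIZE = 1024          # always cut at this many bytes
MASK = (1 << 8) - 1      # cut when the low 8 bits of the hash are all zero

# A fixed table of pseudo-random 64-bit values, one per byte value.
# Seeded so that the same input always yields the same boundaries.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries (simplified gear hash)."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add rolling hash: older bytes drift out of the low bits,
        # so the cut test below depends only on the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_SIZE and (h & MASK) == 0
        if at_boundary or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        # Trailing bytes that never hit a boundary form the last chunk.
        chunks.append(data[start:])
    return chunks
```

Calling `cdc_chunks(payload)` on any `bytes` value returns a list whose concatenation equals `payload`; every chunk is at most `MAX_SIZE` bytes, and every chunk except possibly the last is at least `MIN_SIZE` bytes.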

https://github.com/ayush585/smartchunk

SmartChunk is a lightweight, structure-aware semantic chunking toolkit designed to supercharge RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text arbitrarily, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, ensuring cleaner, more coherent chunks.

agentic-workflow chunking chunking-algorithm cli llm nlp package pip rag semantic

Last synced: 07 Sep 2025

https://github.com/mudssrali/chunkify

A simple utility that splits a given array into chunks of a specified size, with an option to reverse the array.

array array-splitter chunk chunking-algorithm chunking-array javascript split typescript

Last synced: 19 Nov 2025
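
The kind of array chunking described in the entry above can be sketched in a few lines of Python. This is not chunkify's actual API: the function name, the parameter names, and the reading of the "reverse option" as reversing the input before splitting are all assumptions made for illustration.

```python
def chunkify(items: list, size: int, reverse: bool = False) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements.

    Assumption: `reverse=True` reverses the input before splitting.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Work on a copy so the caller's list is never modified.
    source = items[::-1] if reverse else list(items)
    return [source[i:i + size] for i in range(0, len(source), size)]
```

For example, `chunkify([1, 2, 3, 4, 5], 2)` returns `[[1, 2], [3, 4], [5]]`, and with `reverse=True` it returns `[[5, 4], [3, 2], [1]]`. The final chunk is shorter whenever the length is not a multiple of `size`.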

https://github.com/mahnoorsheikh16/nlp-framework-for-literature-summarization-in-law-and-policy

Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, chunking), extractive summarization baselines, and fine-tuned abstractive models (PEGASUS and LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity for evaluation.

bleu-score chunking-algorithm cosine-similarity encoder-decoder-model led longformer-models nlp-keywords-extraction pegasus policy-analysis retrieval-chatbot rouge-metric text-summarization tokenization

Last synced: 09 Oct 2025

https://github.com/davidwrossiter/langchunk

Source code for chunking code written in multiple programming languages.

chunking chunking-algorithm embedding llm-context vectorization

Last synced: 20 Mar 2025