Projects in Awesome Lists tagged with text-splitting
A curated list of projects in Awesome Lists tagged with text-splitting.
https://github.com/chonkie-ai/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
ai chunking etl nlp python rag retrieval semantic-segmentation text-chunking text-processing text-splitting vector-search
Last synced: 14 May 2025
https://github.com/isaacus-dev/semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
chunking isaacus nlp python semantic-chunking splitting text text-chunking text-splitting
Last synced: 15 May 2025
https://github.com/messkan/rag-chunk
A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
chunking document-chunking embedding-vectors ia langchain llm nlp python rag rag-pipeline retrieval-augmented-generation text-splitting vector-search
Last synced: 05 Mar 2026
https://github.com/jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
chunking embeddings llm semantic-chunking text-chunking text-splitter text-splitting vector
Last synced: 01 May 2025
https://github.com/speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
ai chunking chunks-algorithm chunks-processing code-chunking code-structure document-chunking natural-language-processing nlp rag text-splitting visualization
Last synced: 02 Mar 2026
https://github.com/sentencizer/sentencizer
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
ai golang llm natural-language-processing nlp-library rag retrieval-augmented-generation sentence-boundary-detection sentence-segmentation sentence-segmenter sentence-splitter sentence-splitting sentence-tokenizer text-splitter text-splitting
Last synced: 14 Jan 2026
https://github.com/jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text chunking within Java applications
chunk chunking etl-pipeline java rag text-splitter text-splitting
Last synced: 10 Mar 2026
https://github.com/hamedfathi/recursivetextsplitter
A smart C# text splitting library that intelligently chunks text while preserving semantic boundaries. Uses a hierarchical approach with configurable overlap and detailed metadata.
csharp dotnet dotnet-core dotnet-library dotnetcore recursive recursive-algorithm recursive-text-splitter text text-split text-splitter text-splitting
Last synced: 11 Mar 2026
https://github.com/hemaldholakiya12/pdfchat
A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.
ai api cors embeddings faiss fastapi groq huggingface langchain llama3 llm pdf pdf-processing pymupdf python question-answering semantic-search text-splitting transformers vector-store
Last synced: 30 Oct 2025
https://github.com/philnash/chunkers
An exploration of text splitting and chunking in JavaScript
langchain-js llamaindex text-chunking text-splitter text-splitting
Last synced: 06 Apr 2025
https://github.com/pranav-kural/ledaa-text-splitter
Specialized markdown text splitter - part of LEDAA project's data ingestion pipeline for RAG.
conversational-ai langchain ledaa python text-splitting
Last synced: 29 Apr 2026
https://github.com/ekimetrics/adaptive-chunking
Adaptive Chunking: automatically select the best chunking method per document for RAG. Accepted at LREC 2026.
chunking information-retrieval llm nlp rag text-splitting
Last synced: 02 Jun 2026
https://github.com/resetnetwork/n8n-nodes
A collection of custom n8n nodes for enhanced document processing, text splitting, and embeddings generation
ai document-processing embeddings langchain monorepo n8n n8n-community-nodes text-splitting typescript
Last synced: 11 Jun 2025
https://github.com/samliebl/word-matching
Matching strings between lists based on length
block-splitting string text text-splitter text-splitting
Last synced: 14 Mar 2025
https://github.com/vaidehishyara14/ayurveda-pdf-q-a-chatbot
An intelligent chatbot that allows users to upload text-based Ayurveda PDFs and ask questions about their content using RAG (Retrieval-Augmented Generation), which combines semantic search with LLM-based responses.
embeddings fastapi fiass langchain langchain-groq llama3 llm pdf pdfprocessing pymupdf python question-answering text-splitting vector-database
Last synced: 28 Apr 2026
https://github.com/skitsanos/streamlit-split-text
A text-splitting example using tiktoken
ai chunk chunking chunks genai llm rag streamlit text-split text-splitter text-splitting tiktoken
Last synced: 04 May 2026