An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with text-splitting

A curated list of projects in awesome lists tagged with text-splitting.

https://github.com/chonkie-ai/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

ai chunking etl nlp python rag retrieval semantic-segmentation text-chunking text-processing text-splitting vector-search

Last synced: 14 May 2025

https://github.com/isaacus-dev/semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

chunking isaacus nlp python semantic-chunking splitting text text-chunking text-splitting

Last synced: 15 May 2025

https://github.com/messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

chunking document-chunking embedding-vectors ia langchain llm nlp python rag rag-pipeline retrieval-augmented-generation text-splitting vector-search

Last synced: 05 Mar 2026

https://github.com/jparkerweb/semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

chunking embeddings llm semantic-chunking text-chunking text-splitter text-splitting vector

Last synced: 01 May 2025

https://github.com/speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

ai chunking chunks-algorithm chunks-processing code-chunking code-structure document-chunking natural-language-processing nlp rag text-splitting visualization

Last synced: 02 Mar 2026

https://github.com/jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text chunking within Java applications

chunk chunking etl-pipeline java rag text-splitter text-splitting

Last synced: 10 Mar 2026

https://github.com/hamedfathi/recursivetextsplitter

A smart C# text splitting library that intelligently chunks text while preserving semantic boundaries. Uses a hierarchical approach with configurable overlap and detailed metadata.

csharp dotnet dotnet-core dotnet-library dotnetcore recursive recursive-algorithm recursive-text-splitter text text-split text-splitter text-splitting

Last synced: 11 Mar 2026

https://github.com/hemaldholakiya12/pdfchat

A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.

ai api cors embeddings faiss fastapi groq huggingface langchain llama3 llm pdf pdf-processing pymupdf python question-answering semantic-search text-splitting transformers vector-store

Last synced: 30 Oct 2025

https://github.com/philnash/chunkers

An exploration of text splitting and chunking in JavaScript

langchain-js llamaindex text-chunking text-splitter text-splitting

Last synced: 06 Apr 2025

https://github.com/pranav-kural/ledaa-text-splitter

Specialized markdown text splitter - part of LEDAA project's data ingestion pipeline for RAG.

conversational-ai langchain ledaa python text-splitting

Last synced: 29 Apr 2026

https://github.com/ekimetrics/adaptive-chunking

Adaptive Chunking: automatically select the best chunking method per document for RAG. Accepted at LREC 2026.

chunking information-retrieval llm nlp rag text-splitting

Last synced: 02 Jun 2026

https://github.com/resetnetwork/n8n-nodes

A collection of custom n8n nodes for enhanced document processing, text splitting, and embeddings generation

ai document-processing embeddings langchain monorepo n8n n8n-community-nodes text-splitting typescript

Last synced: 11 Jun 2025

https://github.com/samliebl/word-matching

Matching strings between lists based on length

block-splitting string text text-splitter text-splitting

Last synced: 14 Mar 2025

https://github.com/vaidehishyara14/ayurveda-pdf-q-a-chatbot

An intelligent chatbot that allows users to upload text-based Ayurveda PDFs and ask questions based on the content using RAG (Retrieval-Augmented Generation) combining semantic search and LLM-based responses.

embeddings fastapi fiass langchain langchain-groq llama3 llm pdf pdfprocessing pymupdf python question-answering text-splitting vector-database

Last synced: 28 Apr 2026