An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with text-chunking

A curated list of projects in awesome lists tagged with text-chunking.

https://github.com/chonkie-ai/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

ai chunking etl nlp python rag retrieval semantic-segmentation text-chunking text-processing text-splitting vector-search

Last synced: 10 Apr 2025

https://github.com/isaacus-dev/semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

chunking isaacus nlp python semantic-chunking splitting text text-chunking text-splitting

Last synced: 14 Apr 2025

https://github.com/lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

chunking document-analysis llm nlp ocr pdf-parser pdfparser rag text-chunking

Last synced: 01 Apr 2025

https://github.com/umarbutler/semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

chunking nlp python semantic-chunking splitting text text-chunking text-splitting

Last synced: 18 Jan 2025

https://github.com/jparkerweb/semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

chunking embeddings llm semantic-chunking text-chunking text-splitter text-splitting vector

Last synced: 19 Apr 2025

https://github.com/drittich/semanticslicer

A recursive text chunker that attempts to break the text on meaningful boundaries.

ai azure-openai chat-gpt chatgpt chunker chunking embeddings gpt gpt-4 langchain llm openai text-chunking

Last synced: 02 Dec 2024
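Recursive chunking on meaningful boundaries generally means trying coarse separators first (paragraphs), then progressively finer ones (lines, sentences, words) only for pieces that are still too long. The following is a minimal, hypothetical Python sketch of that general technique. It is not semanticslicer's code (semanticslicer is a separate library with its own API). The separator list and the `max_chars` limit are assumptions chosen for illustration.

```python
def recursive_chunk(text: str, max_chars: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split text into chunks of at most max_chars, preferring coarse boundaries.

    Hypothetical illustration: separators are tried from coarsest to finest.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue  # this boundary type does not occur; try a finer one
        chunks: list[str] = []
        current = ""
        for piece in text.split(sep):
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= max_chars:
                current = candidate  # keep packing pieces into the current chunk
                continue
            if current:
                chunks.append(current)
            if len(piece) <= max_chars:
                current = piece
            else:
                # This piece alone is too long: recurse with finer separators only.
                chunks.extend(recursive_chunk(piece, max_chars, separators[i + 1:]))
                current = ""
        if current:
            chunks.append(current)
        return [c for c in chunks if c.strip()]
    # No separator applies: fall back to a hard split by length.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]


text = ("First paragraph about chunking.\n\n"
        "Second paragraph, which is a bit longer than the first one.")
chunks = recursive_chunk(text, max_chars=40)
print(chunks)
```

Trying coarse boundaries first keeps paragraphs intact whenever they fit, and only the oversized second paragraph is broken further, here at word boundaries.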

https://github.com/philnash/chunkers

An exploration of text splitting and chunking in JavaScript

langchain-js llamaindex text-chunking text-splitter text-splitting

Last synced: 06 Apr 2025

https://github.com/simonpierreboucher/embedding

A robust Python tool for generating embeddings from text files using OpenAI's API. This tool processes text files, splits them into chunks while preserving context headers, and generates embeddings using OpenAI's models, saving both text and embeddings in structured formats.

api-rate-limiting automated-text-analysis context-preservation data-preprocessing embeddings-generation error-handling json-and-npy-formats machine-learning metadata-management natural-language-processing openai-api python-tool text-chunking text-embedding yaml-configuration

Last synced: 30 Mar 2025
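Splitting text into chunks "while preserving context headers" can be sketched by remembering the most recent section heading and attaching it to every chunk produced under it. This is a minimal, hypothetical Python illustration, not the listed tool's implementation: it assumes Markdown-style `#` headings, uses a made-up `max_chars` limit on the chunk body, and omits the OpenAI embedding step entirely. JSON output stands in for the "structured formats" mentioned above.

```python
import json


def chunk_with_headers(text: str, max_chars: int = 300) -> list[dict]:
    """Group lines into chunks under the most recent Markdown heading.

    Hypothetical sketch: each chunk stores its heading and repeats it at the
    start of its text, so the context survives when the chunk is embedded alone.
    A single line longer than max_chars still becomes one (oversized) chunk.
    """
    chunks: list[dict] = []
    header = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered lines as one chunk under the current heading.
        body = "\n".join(buffer).strip()
        if body:
            chunks.append({"header": header,
                           "text": f"{header}\n{body}" if header else body})
        buffer.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()  # a new section starts: close the previous chunk first
            header = line.lstrip("#").strip()
            continue
        # Close the chunk before adding a line would push the body past the limit.
        if buffer and len("\n".join(buffer + [line])) > max_chars:
            flush()
        buffer.append(line)
    flush()
    return chunks


doc = "# Install\nRun pip install.\n# Usage\nCall the function.\nThen save the output."
chunks = chunk_with_headers(doc)
print(json.dumps(chunks, indent=2))
```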

https://github.com/adityapathak-cubastion/cubastion-hr-chatbot

Presenting Cubastion's HR chatbot: it answers queries based on all the latest HR documents published by Cubastion's HR team. This saves time by letting a Cubastion employee resolve a query without combing through the actual documents. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.

cosine-similarity huggingface llama3 pinecone prompt-engineering python sentence-transformers streamlit text-chunking text-embeddings text-extraction text-generation

Last synced: 26 Mar 2025