Projects in Awesome Lists tagged with text-chunking
A curated list of projects in awesome lists tagged with text-chunking.
https://github.com/chonkie-ai/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
ai chunking etl nlp python rag retrieval semantic-segmentation text-chunking text-processing text-splitting vector-search
Last synced: 10 Apr 2025
https://github.com/isaacus-dev/semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
chunking isaacus nlp python semantic-chunking splitting text text-chunking text-splitting
Last synced: 14 Apr 2025
https://github.com/lazyFrogLOL/llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.
chunking document-analysis llm nlp ocr pdf-parser pdfparser rag text-chunking
Last synced: 01 Apr 2025
https://github.com/umarbutler/semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
chunking nlp python semantic-chunking splitting text text-chunking text-splitting
Last synced: 18 Jan 2025
https://github.com/jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
chunking embeddings llm semantic-chunking text-chunking text-splitter text-splitting vector
Last synced: 19 Apr 2025
https://github.com/drittich/semanticslicer
A recursive text chunker that attempts to break the text on meaningful boundaries.
ai azure-openai chat-gpt chatgpt chunker chunking embeddings gpt gpt-4 langchain llm openai text-chunking
Last synced: 02 Dec 2024
https://github.com/philnash/chunkers
An exploration of text splitting and chunking in JavaScript
langchain-js llamaindex text-chunking text-splitter text-splitting
Last synced: 06 Apr 2025
https://github.com/simonpierreboucher/embedding
A robust Python tool for generating embeddings from text files using OpenAI's API. This tool processes text files, splits them into chunks while preserving context headers, and generates embeddings using OpenAI's models, saving both text and embeddings in structured formats.
api-rate-limiting automated-text-analysis context-preservation data-preprocessing embeddings-generation error-handling json-and-npy-formats machine-learning metadata-management natural-language-processing openai-api python-tool text-chunking text-embedding yaml-configuration
Last synced: 30 Mar 2025
https://github.com/adityapathak-cubastion/cubastion-hr-chatbot
Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team. This saves time, allowing a Cubastion employee to resolve a query without having to comb through the documents themselves. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.
cosine-similarity huggingface llama3 pinecone prompt-engineering python sentence-transformers streamlit text-chunking text-embeddings text-extraction text-generation
Last synced: 26 Mar 2025