https://github.com/runtime-error786/context-compression-retriever

huggingface-transformers langchain llama3-meta-ai

Host: GitHub
URL: https://github.com/runtime-error786/context-compression-retriever
Owner: runtime-error786
Created: 2024-08-23T20:34:24.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-23T20:44:48.000Z (11 months ago)
Last Synced: 2024-12-05T13:11:27.372Z (7 months ago)
Topics: huggingface-transformers, langchain, llama3-meta-ai
Language: Jupyter Notebook
Homepage:
Size: 6.2 MB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Document Compressor Pipeline

## Description
The LLM-Enhanced PDF Knowledge Extractor processes and analyzes large collections of PDF documents with Large Language Models (LLMs). It extracts the text from each PDF, splits it into smaller chunks, embeds each chunk with HuggingFace sentence-transformers, and stores the embeddings in a Chroma vector database. User queries are answered by retrieving from the stored embeddings, with contextual compression and filtering applied so that the retrieved content stays relevant to the query.

This tool is ideal for tasks such as document indexing, information retrieval, and knowledge management across large document sets.
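
The split, embed, store, and query flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: `split_text`, `embed`, `cosine`, and `InMemoryVectorStore` are hypothetical names, a word-count vector stands in for a sentence-transformers model, and a Python list stands in for Chroma.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Assumption: a toy word-count vector replaces a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Assumption: a list of (chunk, vector) pairs replaces a vector database."""

    def __init__(self):
        self.rows = []

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical page text; in the real tool this would come from parsed PDFs.
    page = (
        "Chunks are embedded and written to a vector store. "
        "At query time the store returns the chunks closest to the question."
    )
    store = InMemoryVectorStore()
    store.add_texts(split_text(page, chunk_size=60, overlap=15))
    print(store.search("which chunks does the store return?", k=1))
```

The overlap between neighbouring chunks is a common design choice: a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.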

## Features
- **PDF Parsing:** Loads and parses every PDF file in a specified directory.
- **Text Chunking:** Splits the parsed text into smaller chunks for efficient processing.
- **Embeddings Generation:** Uses HuggingFace sentence-transformers to generate an embedding for each text chunk.
- **Vector Store Management:** Stores and retrieves embeddings with the Chroma vector database.
- **Contextual Compression:** Compresses retrieved documents against the query so that only the most relevant information is returned from large document sets.
- **Custom Retrieval Chains:** Includes several LangChain document compressors for flexible query processing: `LLMChainExtractor`, `LLMChainFilter`, `EmbeddingsFilter`, and `DocumentCompressorPipeline`.
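
The idea behind contextual compression can be sketched without any library. In LangChain, `EmbeddingsFilter` drops documents whose embedding similarity to the query falls below a threshold, `LLMChainExtractor` asks an LLM to keep only the relevant parts of each document, and `DocumentCompressorPipeline` runs such steps in sequence. The sketch below imitates that shape only: every name in it is hypothetical, a word-count vector replaces the embedding model, and a keyword-overlap rule replaces the LLM, so it should be read as an illustration rather than as the notebook's implementation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Assumption: a toy word-count vector replaces a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_filter(threshold):
    """Build a step that drops whole chunks scoring below `threshold`."""
    def step(chunks, query):
        q = embed(query)
        return [c for c in chunks if cosine(q, embed(c)) >= threshold]
    return step


def sentence_extractor(chunks, query):
    """Keep only sentences sharing a word with the query (stand-in for an LLM)."""
    query_terms = set(embed(query))
    compressed = []
    for chunk in chunks:
        sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
        kept = [s for s in sentences if query_terms & set(embed(s))]
        if kept:
            compressed.append(" ".join(kept))
    return compressed


def run_pipeline(steps, chunks, query):
    """Apply each compression step in order; each sees the previous output."""
    for step in steps:
        chunks = step(chunks, query)
    return chunks


if __name__ == "__main__":
    retrieved = [
        "Chroma stores embeddings. The weather is nice.",
        "Bananas are yellow.",
    ]
    steps = [similarity_filter(0.1), sentence_extractor]
    print(run_pipeline(steps, retrieved, "chroma embeddings"))
```

Putting the cheap similarity filter first means the more expensive extraction step (an LLM call in a real pipeline) only runs on chunks that already look relevant.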
