https://github.com/runtime-error786/hypothetical-document-embedding
- Host: GitHub
- URL: https://github.com/runtime-error786/hypothetical-document-embedding
- Owner: runtime-error786
- Created: 2024-08-25T06:38:16.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T06:56:56.000Z (5 months ago)
- Last Synced: 2024-08-25T07:50:04.968Z (5 months ago)
- Topics: huggingface-transformers, langchain, llama3-meta-ai
- Language: Jupyter Notebook
- Homepage:
- Size: 6.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Hypothetical Document Embedding
## Project Description
This project uses LangChain to answer questions over a set of PDF documents. It splits the PDFs into chunks, embeds each chunk with a Hugging Face embedding model, and stores the resulting vectors in a Chroma vector store. At query time it retrieves the chunks most relevant to the user's question and passes them as context to an LLM served through Ollama, which generates the answer.
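The project name refers to hypothetical document embedding (HyDE): instead of embedding the raw question, the retriever embeds an LLM-drafted answer and searches for real chunks that resemble it. This README does not show how the notebook wires that up, so the following is only a dependency-free sketch of the idea. `toy_embed`, `draft_hypothetical_answer`, `hyde_retrieve`, and the sample chunks are all hypothetical stand-ins for the Hugging Face embedding model, the Ollama-served LLM, and the Chroma store.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def draft_hypothetical_answer(question):
    """Stand-in for the LLM call that drafts a plausible answer.

    A real system would prompt the model here; this canned lookup is
    an assumption made so the sketch runs without any model.
    """
    canned = {
        "How do plants make food?": (
            "Plants make food through photosynthesis, using sunlight, "
            "water and carbon dioxide to produce glucose."
        ),
    }
    return canned.get(question, question)


def hyde_retrieve(question, chunks, k=1):
    """Embed the drafted answer (not the question), then rank real chunks."""
    query_vec = toy_embed(draft_hypothetical_answer(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    "Photosynthesis converts sunlight, water and carbon dioxide into glucose and oxygen.",
    "The stock market closed higher on Friday.",
]
top = hyde_retrieve("How do plants make food?", chunks)
print(top)
```

The question shares no words with the first chunk, but the drafted answer does, which is the gap HyDE is meant to close. The drafted answer may be factually wrong; it is used only as a search key, and the final response is generated from the retrieved chunks.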
## Features
- **PDF Document Loading:** Automatically loads and processes PDF documents from a specified directory.
- **Text Chunking:** Uses LangChain's `RecursiveCharacterTextSplitter` to divide long text into smaller chunks that can be embedded and retrieved individually.
- **Document Embedding:** Uses Hugging Face embeddings to create vector representations of text chunks for efficient retrieval.
- **Vector Store Integration:** Stores and retrieves document embeddings using Chroma, allowing for quick and accurate search operations.
- **Contextual Question Answering:** Employs Ollama to generate responses based on retrieved document segments, providing answers in context.
- **Customizable Prompts:** Supports custom prompt templates, so the instructions given to the model can be tailored to specific requirements.
- **Dynamic Document Retrieval:** Automatically retrieves and provides contextually relevant document segments in response to queries.
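
Two of the features above, text chunking and customizable prompts, can be illustrated without any dependencies. The sketch below is a simplified approximation of recursive character splitting (it omits the chunk overlap that LangChain's `RecursiveCharacterTextSplitter` supports) plus a plain string prompt template. `recursive_split`, `build_prompt`, and `PROMPT_TEMPLATE` are hypothetical names chosen for illustration, not the notebook's code or LangChain's API.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters.

    Coarse separators (paragraphs) are tried first; finer ones (lines,
    words) are used only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


# Hypothetical template; the wording is an assumption, not the project's prompt.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def build_prompt(question, retrieved_chunks):
    """Fill the template with the retrieved chunks and the user's question."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


text = "Alpha beta.\n\nGamma delta epsilon zeta.\n\nEta."
pieces = recursive_split(text, chunk_size=20)
prompt = build_prompt("What is eta?", pieces[:2])
print(pieces)
print(prompt)
```

Smaller chunks tend to give more precise matches but carry less context each, so the chunk size is worth tuning against the embedding model and the kinds of questions asked.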