https://github.com/runtime-error786/parent-document-retriever
- Host: GitHub
- URL: https://github.com/runtime-error786/parent-document-retriever
- Owner: runtime-error786
- Created: 2024-08-24T13:34:04.000Z
- Default Branch: main
- Last Pushed: 2024-08-24T13:38:28.000Z
- Last Synced: 2025-06-25T02:44:14.851Z
- Topics: huggingface-transformers, langchain, llama3-meta-ai
- Language: Jupyter Notebook
- Homepage:
- Size: 643 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Parent Document Retriever
This project demonstrates how to load PDF documents, index them, and retrieve information from them with LangChain. The script reads PDF files, splits them into chunks, embeds the chunks, and answers questions (QA) about their contents through a retrieval system.
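The project is named after LangChain's ParentDocumentRetriever, which embeds and searches small child chunks (short passages give sharper similarity matches) but returns the larger parent document each matching chunk came from, so the language model gets fuller context. The dependency-free Python sketch below illustrates that pattern; it is not the notebook's code. As assumptions for illustration: the bag-of-words `embed` function stands in for the sentence-transformers model, a plain `list` and `dict` stand in for Chroma and the parent document store, each added document is treated as one parent, and `ParentChildIndex`, `child_words`, and the sample texts are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-transformers model:
    # a bag-of-words vector of lowercase token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ParentChildIndex:
    """Search over small child chunks, but return the parent document each came from."""

    def __init__(self, child_words: int = 4):
        self.child_words = child_words
        self.docstore: dict[str, str] = {}            # stand-in docstore: parent id -> full text
        self.vectors: list[tuple[str, Counter]] = []  # stand-in vector store: (parent id, child vector)

    def add_document(self, doc_id: str, text: str) -> None:
        # Each added document is one parent; split it into fixed-size word windows (children).
        self.docstore[doc_id] = text
        words = text.split()
        for start in range(0, len(words), self.child_words):
            child = " ".join(words[start:start + self.child_words])
            self.vectors.append((doc_id, embed(child)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Rank children by similarity, then map them back to k distinct parents.
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, item[1]), reverse=True)
        parent_ids: list[str] = []
        for parent_id, _ in ranked:
            if len(parent_ids) >= k:
                break
            if parent_id not in parent_ids:  # several children can share one parent
                parent_ids.append(parent_id)
        return [self.docstore[pid] for pid in parent_ids]


index = ParentChildIndex(child_words=4)
index.add_document("storage", "Chroma keeps the embedded child chunks so similarity search stays fast.")
index.add_document("answering", "Ollama runs the language model locally and writes the final answer.")
hits = index.retrieve("which tool runs the model locally", k=1)
print(hits)
```

Here the query's best match is a four-word child chunk from the `answering` document, yet `retrieve` returns that document's full text. Deduplicating by parent id matters because several children of the same parent often rank near the top.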
## Features
- **PDF Document Loading**: Load and process PDF documents from a specified folder.
- **Text Splitting**: Split large documents into smaller chunks that can be embedded and searched individually.
- **Embedding**: Embed document chunks with a Hugging Face sentence-transformers model.
- **Vector Store**: Store the embedded chunks in a Chroma vector store for fast similarity search.
- **Retrieval System**: Retrieve relevant information with LangChain's ParentDocumentRetriever, which searches the small embedded chunks but returns the larger parent documents they came from.
- **Question Answering**: Answer questions about the documents with a language model served by Ollama.
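The QA step can be sketched with the standard library alone: the retrieved parent texts are stuffed into one prompt, which is sent to the REST endpoint of a locally running Ollama server (`POST /api/generate` on the default port 11434). The notebook more likely goes through LangChain's Ollama integration; this plain-HTTP version, the prompt wording, the `llama3` model tag (suggested by the repository topics), and the helper names `build_prompt`, `make_request`, and `ask_ollama` are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, contexts: list[str]) -> str:
    # Hypothetical helper: "stuff" the retrieved parent documents into a single prompt.
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def make_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    # Non-streaming JSON body for Ollama's /api/generate endpoint.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask_ollama(question: str, contexts: list[str], model: str = "llama3") -> str:
    # Requires a running Ollama server with `model` already pulled.
    request = make_request(build_prompt(question, contexts), model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


prompt = build_prompt("Which tool runs the model?", ["Ollama runs the model locally."])
print(prompt)
```

`ask_ollama` is left uncalled here because it needs a running server with the model pulled (for example via `ollama pull llama3`). Setting `"stream": False` makes Ollama return a single JSON object whose `response` field holds the whole answer, instead of a stream of partial objects.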