Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/runtime-error786/parent-document-retriever
huggingface-transformers langchain llama3-meta-ai
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/runtime-error786/parent-document-retriever
- Owner: runtime-error786
- Created: 2024-08-24T13:34:04.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-24T13:38:28.000Z (5 months ago)
- Last Synced: 2024-08-24T14:47:40.290Z (5 months ago)
- Topics: huggingface-transformers, langchain, llama3-meta-ai
- Language: Jupyter Notebook
- Homepage:
- Size: 643 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Parent Document Retriever
This project demonstrates how to load PDF documents, index them, and retrieve information from them using LangChain. The script loads PDF files, splits them into chunks, embeds the text, and supports question answering (QA) through a retrieval system.
## Features
- **PDF Document Loading**: Load and process PDF documents from a specified folder.
- **Text Splitting**: Break down large documents into smaller chunks for efficient processing.
- **Embedding**: Use a Hugging Face sentence-transformer model to embed document chunks.
- **Vector Store**: Store the embedded chunks in a vector store (Chroma) for efficient retrieval.
- **Retrieval System**: Retrieve relevant information using LangChain's `ParentDocumentRetriever`, which searches over small chunks and returns the larger parent documents they came from.
- **Question Answering**: Perform QA on the documents using a language model served through Ollama.
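
The idea behind parent-document retrieval can be illustrated with a minimal, dependency-free sketch. This is not the project's code and not LangChain's API: `ToyParentRetriever`, `split_text`, `embed`, and `cosine` are hypothetical names, the bag-of-words `embed` function stands in for a sentence-transformer model, and plain Python containers stand in for Chroma and the document store. The sketch assumes word-based chunk sizes purely for readability.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyParentRetriever:
    """Hypothetical illustration: index small child chunks, return large parent chunks."""

    def __init__(self, parent_size=40, child_size=10):
        self.parent_size = parent_size
        self.child_size = child_size
        self.parents = {}   # parent_id -> parent text (plays the role of a docstore)
        self.children = []  # (vector, parent_id) pairs (plays the role of a vector store)

    def add_document(self, text):
        for parent in split_text(text, self.parent_size):
            parent_id = len(self.parents)
            self.parents[parent_id] = parent
            # Only the small child chunks are embedded and searched.
            for child in split_text(parent, self.child_size):
                self.children.append((embed(child), parent_id))

    def retrieve(self, query, k=1):
        query_vec = embed(query)
        ranked = sorted(self.children, key=lambda c: cosine(query_vec, c[0]), reverse=True)
        parent_ids = []
        for _, parent_id in ranked:
            if len(parent_ids) >= k:
                break
            # Several children may share a parent; return each parent once.
            if parent_id not in parent_ids:
                parent_ids.append(parent_id)
        return [self.parents[pid] for pid in parent_ids]


text = (
    "chroma stores embedded vectors for fast similarity search "
    "ollama runs llama models locally for question answering"
)
retriever = ToyParentRetriever(parent_size=8, child_size=4)
retriever.add_document(text)
print(retriever.retrieve("similarity search"))
```

Small chunks tend to match a query more precisely, while the larger parent gives the language model more surrounding context for its answer. In the real pipeline, the retrieved parents would be passed to the model as context for the question.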