https://github.com/runtime-error786/parent-document-retriever



huggingface-transformers langchain llama3-meta-ai


# Parent Document Retriever

This project demonstrates how to load PDF documents and retrieve information from them using LangChain. The script loads PDF files, splits them into chunks, embeds the chunks, and answers questions (QA) over them through a retrieval system.

## Features

- **PDF Document Loading**: Load and process PDF documents from a specified folder.
- **Text Splitting**: Break down large documents into smaller chunks for efficient processing.
- **Embedding**: Use a HuggingFace sentence-transformer model to embed document chunks.
- **Vector Store**: Store the embedded chunks in a vector store (Chroma) for efficient retrieval.
- **Retrieval System**: Retrieve relevant information using LangChain's `ParentDocumentRetriever`, which searches over small chunks and returns the larger parent documents (or parent chunks) they came from.
- **Question Answering**: Perform QA on the documents using a language model served through Ollama.
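
The parent-document idea behind the features above can be sketched in plain Python. This is a toy illustration, not the project's script and not LangChain's API: `ToyParentRetriever`, `split_words`, `embed`, and `cosine` are hypothetical names, the bag-of-words `embed` function stands in for a sentence-transformer model, and a Python list stands in for Chroma. It only shows the core mechanism: small child chunks are indexed and searched, while the full parent text is what gets returned.

```python
import math
import re
from collections import Counter


def split_words(text, words_per_chunk):
    """Split text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyParentRetriever:
    """Index small child chunks, but return the parent documents they belong to."""

    def __init__(self, words_per_child=6):
        self.words_per_child = words_per_child
        self.parents = {}    # parent_id -> full text (plays the role of a docstore)
        self.children = []   # (vector, parent_id) pairs (plays the role of a vector store)

    def add_documents(self, docs):
        for text in docs:
            parent_id = len(self.parents)
            self.parents[parent_id] = text
            for chunk in split_words(text, self.words_per_child):
                self.children.append((embed(chunk), parent_id))

    def retrieve(self, query, k=2):
        q = embed(query)
        scored = [(cosine(q, vec), pid) for vec, pid in self.children]
        # Keep the k best-matching child chunks that share at least one term.
        top = sorted((s for s in scored if s[0] > 0),
                     key=lambda s: s[0], reverse=True)[:k]
        # Map child hits back to their parents, dropping duplicates.
        parent_ids = []
        for _, pid in top:
            if pid not in parent_ids:
                parent_ids.append(pid)
        return [self.parents[pid] for pid in parent_ids]


docs = [
    "Chroma is a vector store. It keeps embeddings of text chunks so "
    "that similar chunks can be found quickly by vector search.",
    "Ollama runs language models locally. A retrieved passage can be "
    "passed to the model as context for answering a question.",
]
retriever = ToyParentRetriever(words_per_child=6)
retriever.add_documents(docs)
result = retriever.retrieve("Which vector store keeps embeddings?", k=1)
print(result)
```

Searching over short chunks keeps each embedding focused on one idea, while returning the parent gives the language model more surrounding context for its answer. In a real pipeline the returned parent text would be passed to the model as context for the QA step.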