https://github.com/runtime-error786/multimodaldocprocessor
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/runtime-error786/multimodaldocprocessor
- Owner: runtime-error786
- Created: 2024-08-07T19:52:49.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-10T10:36:22.000Z (4 months ago)
- Last Synced: 2024-08-10T11:40:52.942Z (4 months ago)
- Topics: huggingface, huggingface-transformers, langcahin, llava, multimodal
- Language: Jupyter Notebook
- Homepage:
- Size: 3.87 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multimodal PDF Content Extractor and Summarizer
This project extracts and summarizes content from PDF documents. It handles text, tables, and images, using language models and embeddings for content analysis and similarity search.
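
To make the similarity-search idea concrete, here is a minimal sketch in plain Python. It is not the project's code: the project uses FAISS for this step, whereas this example swaps in a brute-force cosine-similarity scan so it runs without extra packages. The labels, the 3-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are all made up for illustration; real embeddings would come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the k (label, score) pairs most similar to the query.

    `items` is a list of (label, vector) pairs. Every vector is scored
    against the query (a brute-force scan) and the best k are kept.
    """
    scored = [(label, cosine_similarity(query, vec)) for label, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, one per content type found in a PDF.
items = [
    ("text: quarterly revenue summary", [0.9, 0.1, 0.0]),
    ("table: revenue by region", [0.7, 0.3, 0.1]),
    ("image: team photo", [0.0, 0.2, 0.9]),
]

# Hypothetical query embedding, close to the revenue-related items.
query = [1.0, 0.0, 0.0]

results = top_k(query, items, k=2)
for label, score in results:
    print(f"{score:.3f}  {label}")
```

The same pattern applies whether the indexed items are text chunks, table summaries, or image summaries: each is stored as a vector alongside a reference to its source, and a query is answered by ranking stored vectors against the query vector. A dedicated index such as FAISS serves the same purpose at larger scale.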
## Features
- **PDF Extraction**: Extracts text and images from PDF files; table extraction is planned but not yet implemented.
- **Text Summarization**: Splits and summarizes large text chunks using an LLM.
- **Image Summarization**: Summarizes image content using a language model.
- **FAISS Integration**: Uses FAISS for efficient similarity search on text, table, and image embeddings.
- **Query System**: Allows querying across different content types to retrieve relevant information and descriptions.
- **Image Display**: Displays relevant images extracted from the PDF.

## Installation
To run this project, ensure you have Python 3.11.3 installed. Then, install the required packages using pip:
```bash
pip install numpy faiss-cpu pymupdf pillow langchain langchain_community