Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/runtime-error786/multimodaldocprocessor

huggingface huggingface-transformers langcahin llava multimodal

Host: GitHub
URL: https://github.com/runtime-error786/multimodaldocprocessor
Owner: runtime-error786
Created: 2024-08-07T19:52:49.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-08-10T10:36:22.000Z (4 months ago)
Last Synced: 2024-08-10T11:40:52.942Z (4 months ago)
Topics: huggingface, huggingface-transformers, langcahin, llava, multimodal
Language: Jupyter Notebook
Homepage:
Size: 3.87 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Multimodal PDF Content Extractor and Summarizer

This project extracts and summarizes content from PDF documents. It handles text and images, with table extraction planned, and uses language models and embeddings for content analysis and similarity search.

## Features

- **PDF Extraction**: Extracts text and images from PDF files; table extraction is planned but not yet implemented.
- **Text Summarization**: Splits and summarizes large text chunks using an LLM.
- **Image Summarization**: Summarizes image content using a language model.
- **FAISS Integration**: Uses FAISS for efficient similarity search on text, table, and image embeddings.
- **Query System**: Allows querying across different content types to retrieve relevant information and descriptions.
- **Image Display**: Displays relevant images extracted from the PDF.
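
The similarity search behind the query system can be illustrated with a minimal, dependency-free sketch. It is not the project's code and does not use FAISS: it performs the same kind of exact nearest-neighbour lookup with a brute-force scan over squared L2 distances. The `Entry` class, the `search` function, the `kind` labels, and the hand-written 3-dimensional vectors are all assumptions made for illustration; real embeddings would come from an embedding model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    """One indexed item (hypothetical structure, not from the project)."""
    kind: str            # content type label, e.g. "text" or "image"
    summary: str         # summary generated for the text chunk or image
    vector: List[float]  # embedding of the summary


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(entries: List[Entry], query: List[float], k: int = 2) -> List[Tuple[float, Entry]]:
    """Return the k entries closest to the query vector, nearest first."""
    scored = [(squared_l2(e.vector, query), e) for e in entries]
    scored.sort(key=lambda pair: pair[0])  # sort by distance only
    return scored[:k]


# Hand-written vectors stand in for real embeddings (assumption).
entries = [
    Entry("text", "Quarterly revenue summary", [0.9, 0.1, 0.0]),
    Entry("image", "Bar chart of revenue by region", [0.8, 0.2, 0.1]),
    Entry("text", "Appendix on methodology", [0.0, 0.1, 0.9]),
]

# A single query retrieves the nearest items regardless of content type.
results = search(entries, query=[1.0, 0.0, 0.0], k=2)
for distance, entry in results:
    print(f"{entry.kind:5s} {distance:.3f} {entry.summary}")
```

Keeping the content type next to each vector lets one query return both text summaries and image descriptions, so the caller can decide which matching images to display. A brute-force scan like this is only practical for small collections, which is where a library such as FAISS becomes useful.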

## Installation

To run this project, ensure you have Python 3.11.3 installed. Then, install the required packages using pip:

```bash
pip install numpy faiss-cpu pymupdf pillow langchain langchain_community