https://github.com/connectaman/ragalchamy
Summarize and perform RAG on PPTx/PPT file formats
- Host: GitHub
- URL: https://github.com/connectaman/ragalchamy
- Owner: connectaman
- Created: 2023-12-12T06:59:34.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-14T04:20:13.000Z (12 months ago)
- Last Synced: 2025-03-27T07:35:54.537Z (6 months ago)
- Topics: advance-rag, artificial-intelligence, generative-ai, large-language-models, natural-language-processing, ppt, rag, summarization
- Language: Jupyter Notebook
- Homepage:
- Size: 260 KB
- Stars: 15
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
# RAGAlchamy (🥇 Winning Solution for ZS Hackathon)
RAGAlchamy is a Python package for working with PowerPoint presentations.
It extracts and manipulates content within PPT files, including text and charts, and can perform OCR (Optical Character Recognition) on images embedded in your presentations.
# Installation
1. Install Python, or check your installed version. Python >= 3.10 is required.
2. Install all required packages
```
pip install -r requirements.txt
```
3. Install Tesseract
   - Windows
     - Download the Tesseract installer from https://github.com/UB-Mannheim/tesseract/wiki.
     - Copy the path to tesseract.exe and set it in the code (ragalchemy -> extractors -> image.py, line 11):
       ```
       pytesseract.pytesseract.tesseract_cmd = r'PATH TO tesseract.exe'
       ```
   - Linux
     ```
     sudo apt update
     sudo apt-get install tesseract-ocr
     ```
4. Run the sample notebook
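
Before opening the notebook, it can help to confirm that the prerequisites from steps 1 and 3 are in place. The following is a minimal sketch using only the standard library. The helper names `python_ok` and `find_tesseract` are hypothetical and are not part of RAGAlchamy; the sketch assumes the Tesseract executable is named `tesseract` and is either on `PATH` or at a location you pass in explicitly.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in step 1


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) Python version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def find_tesseract(extra_candidates=()):
    """Return a path to the Tesseract executable, or None if it is not found.

    Looks on PATH first, then checks any explicitly supplied candidate paths
    (for example, the Windows install location copied in step 3).
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    path = find_tesseract()
    print("tesseract:", path or "not found")
    # If a path is found, the same value could be assigned in image.py:
    # pytesseract.pytesseract.tesseract_cmd = path
```

On Windows, where the installer does not always add Tesseract to `PATH`, the copied location can be passed as a candidate, e.g. `find_tesseract([r'PATH TO tesseract.exe'])`.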