https://github.com/abd-al-rahmanh/multi-doc-retrieval-watsonx
This IBM Watsonx-powered chatbot processes documents (PDFs, CSVs, text etc..) to answer user queries and common questions accurately, streamlining information retrieval.
https://github.com/abd-al-rahmanh/multi-doc-retrieval-watsonx
chatbot rag streamlit watsonx-ai
Last synced: 7 months ago
JSON representation
This IBM Watsonx-powered chatbot processes documents (PDFs, CSVs, text etc..) to answer user queries and common questions accurately, streamlining information retrieval.
- Host: GitHub
- URL: https://github.com/abd-al-rahmanh/multi-doc-retrieval-watsonx
- Owner: Abd-al-RahmanH
- License: mit
- Created: 2024-11-10T05:49:10.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-01-20T17:19:13.000Z (9 months ago)
- Last Synced: 2025-01-20T18:26:52.712Z (9 months ago)
- Topics: chatbot, rag, streamlit, watsonx-ai
- Language: Python
- Homepage: https://huggingface.co/spaces/RAHMAN00700/chat_multi4_doc
- Size: 686 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
---
title: Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx
emoji: 😻
colorFrom: purple
colorTo: pink
sdk: streamlit
sdk_version: 1.40.0
app_file: app.py
pinned: false
---# Multi-Document Retrieval with Watsonx 😻
**A Streamlit-powered app for querying multiple document types using Watsonx and LangChain.**
This project allows users to upload various file formats (PDFs, DOCX, CSV, JSON, YAML, HTML, etc.) and retrieve contextually accurate responses using Watsonx LLM models and LangChain. The app provides a seamless interface to perform retrieval-augmented generation (RAG) from uploaded documents
**Note**: While this app runs efficiently on machines with low specifications, for faster indexing and response times, I recommend using a more powerful machine.
## Live App
[Link to live app](https://huggingface.co/spaces/RAHMAN00700/Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx)
---## Features
- **File Support**: Supports multiple file formats such as PDFs, Word documents, PowerPoint presentations, CSV, JSON, YAML, HTML, and plain text.
- **Watsonx LLM Integration**: Utilize IBM Watsonx's LLM models for querying and generating answers.
- **Embeddings**: Uses `HuggingFace` embeddings for document indexing.
- **RAG (Retrieval Augmented Generation)**: Combines document-based retrieval with LLMs for accurate responses.
- **Streamlit Interface**: Provides an intuitive user experience.---
## Installation
Follow these steps to clone and run the project locally:
### Prerequisites
1. **Python 3.8+** installed on your system.
2. Install `pip` (Python package manager).
3. An IBM Watsonx API key and Project ID.
4. Install Git if not already installed.### Clone the Repository
```bash
git clone https://github.com/Abd-al-RahmanH/Multi-Doc-Retrieval-Watsonx.git
cd Multi-Doc-Retrieval-Watsonx
```
### Install Dependencies
1. Create a virtual environment (optional but recommended):
```bash
python -m venv env
source env/bin/activate # On Windows: .\env\Scripts\activate
```2. Install required Python packages:
```bash
pip install -r requirements.txt
```### Set Environment Variables
Create a `.env` file in the project directory with the following keys:
```env
WATSONX_API_KEY=
WATSONX_PROJECT_ID=
```### Run the App
1. Start the Streamlit app by running:
```bash
streamlit run app.py
```2. Open the URL displayed in your terminal (usually [http://localhost:8501](http://localhost:8501)) to access the app.
---
## How to Use
1. **Upload Documents**: Drag and drop supported files (e.g., PDFs, DOCX, JSON) in the app sidebar.
2. **Select Model and Parameters**: Choose a Watsonx model and configure settings like output tokens and decoding methods.
3. **Ask Questions**: Enter queries in the chat input to retrieve answers based on the uploaded document.
---## Project Structure
```plaintext
Multi-Doc-Retrieval-Watsonx/
├── app.py # Main application file
├── requirements.txt # Python dependencies
├── README.md # Project documentation
└── .env # Environment variables (not included in repo, create manually)
```---
## Dependencies
- **Streamlit**: For building the user interface.
- **LangChain**: For document retrieval and RAG implementation.
- **HuggingFace Transformers**: For embedding and vector representation.
- **Watsonx Foundation Models**: For querying and text generation.
- **Various Python Libraries**: For file handling, including `pandas`, `python-docx`, `python-pptx`, and more.---
## Contributing
We welcome contributions! If you'd like to improve this project:
1. Fork the repository.
2. Create a feature branch: `git checkout -b feature-name`.
3. Commit your changes: `git commit -m 'Add a new feature'`.
4. Push to the branch: `git push origin feature-name`.
5. Open a Pull Request.---
## More Blogs and Interesting Projects
For more blogs and interesting projects, visit my personal website: [https://abdulrahmanh.com](https://abdulrahmanh.com)
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---