https://github.com/sourasishbasu/chatpdf-clone-llama2b

Chatbot app for interactively conversing with PDFs

astradb chatbot gradientai llama2 llama2-7b llm meta python streamlit-webapp vector-database

Host: GitHub
URL: https://github.com/sourasishbasu/chatpdf-clone-llama2b
Owner: SourasishBasu
License: apache-2.0
Created: 2023-12-31T06:18:47.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2023-12-31T07:48:40.000Z (about 1 year ago)
Last Synced: 2024-10-18T22:05:22.591Z (4 months ago)
Topics: astradb, chatbot, gradientai, llama2, llama2-7b, llm, meta, python, streamlit-webapp, vector-database
Language: Jupyter Notebook
Homepage:
Size: 23.4 KB
Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# ChatPDF Clone
> Chatbot app for interactively conversing with documents

# Introduction

Services like ChatGPT showcase the power of LLMs and RAG in generating contextually relevant responses, which motivated me to understand their underlying mechanics. That led me to build this chatbot, which not only converses intelligently but also answers questions about the content of PDF documents.

# Technical Overview

We use LlamaIndex for document indexing and retrieval, call the Llama 2 model through Gradient's LLM API, store embeddings in DataStax's Apache Cassandra as the vector database, and build the frontend UI in Python with Streamlit.

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/89c8505f-4c5a-429c-bade-03e4fdc7f7d2)

Architecture Diagram

### LlamaIndex
Serves as the framework for handling embeddings and for efficient document indexing and retrieval.

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/e6588ecf-f1ec-49ab-b354-11f23a76ea08)

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/be5095a7-4b4c-409c-861d-1a6c33092ddd)

Retriever Engine
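
To make the retrieval step more concrete, here is a minimal sketch of the idea in plain Python. It is not LlamaIndex's API: the helper names (`chunk_text`, `retrieve`, `build_prompt`) are hypothetical, and simple word overlap stands in for the learned embeddings a real retriever would use.

```python
import re


def chunk_text(text, max_words=40):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question, chunks, top_k=2):
    """Rank chunks by how many distinct words they share with the question.

    Word overlap is a toy stand-in for embedding similarity.
    """
    question_tokens = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into a single LLM prompt."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In the real app the ranking would come from vector similarity over stored embeddings, and the assembled prompt would be sent to the LLM.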

### GradientAI's LLM
Through Gradient's LLM solution, we use state-of-the-art open source language models such as Meta's Llama 2, specifically `llama2-7b-chat`, which lets the chatbot generate coherent and informed responses.

### Cassandra Vector Store
Apache Cassandra serves as the vector database: it stores the vector embeddings of the provided documents so that the passages relevant to a question can be retrieved efficiently.
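
As a rough illustration of what a vector store does, the sketch below keeps embeddings in memory and ranks them by cosine similarity. The class `InMemoryVectorStore` and its methods are hypothetical and are not part of Cassandra, AstraDB, or LlamaIndex. A real vector database adds persistence and approximate nearest-neighbour indexing on top of the same idea.

```python
import math


class InMemoryVectorStore:
    """Toy stand-in for a vector database table: row id -> (embedding, text)."""

    def __init__(self):
        self._rows = {}

    def upsert(self, row_id, embedding, text):
        """Insert a row, or overwrite it if the id already exists."""
        self._rows[row_id] = (list(embedding), text)

    @staticmethod
    def _cosine(a, b):
        """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, embedding, top_k=3):
        """Return up to top_k (score, row_id, text) tuples, most similar first."""
        scored = [
            (self._cosine(embedding, vector), row_id, text)
            for row_id, (vector, text) in self._rows.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

This version scans every row on each query, so its cost grows linearly with the number of stored chunks.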

### Streamlit
Simplifies the creation and deployment of web applications, providing a user-friendly interface for starting conversations with the chatbot and exploring the contents of PDF documents.

# Prerequisites

- Python 3.9 or above
- **GradientAI Account**
  - Create an account on GradientAI to access the LLMs required for training and deploying models.
  - Create a new workspace, then generate your `Access token` and `Workspace ID` credentials and store them as secrets/environment variables.

- **AstraDB Account**
  - Set up an account on AstraDB, a cloud-native database service built on Apache Cassandra, and create a Vector Database.
  - Under Connect, generate an App Token with the `Database Administrator` role and save the `app-token.json` and the `Secure-Connect-Bundle.zip`.

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/50570a51-fade-485d-b747-f8ba308f16e7)

# Setup

To set up ChatPDF Clone, follow these steps:

1. **Clone the Repository**: Clone the ChatPDF Clone repository to your local machine.
```bash
git clone https://github.com/SourasishBasu/ChatPDF-clone-llama2b.git
```

2. **Install Dependencies**: Navigate to the project directory and install the necessary dependencies.
```bash
cd ChatPDF-clone-llama2b/project
pip install -r requirements.txt
```

3. **Configure Credentials**: Add the GradientAI credentials as environment variables to your project environment. Copy the `Secure-Connect-Bundle.zip` and `app-token.json` into the project root directory.
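
Before starting the app, it can help to confirm that the credentials are in place. The sketch below is one way to do that, under some assumptions: the environment variable names `GRADIENT_ACCESS_TOKEN` and `GRADIENT_WORKSPACE_ID` and the helper `load_credentials` are illustrative, so adjust them to match whatever names your project actually uses.

```python
import json
import os
from pathlib import Path

# Assumed variable names; change them if your setup uses different ones.
REQUIRED_ENV_VARS = ("GRADIENT_ACCESS_TOKEN", "GRADIENT_WORKSPACE_ID")


def load_credentials(token_path="app-token.json",
                     bundle_path="Secure-Connect-Bundle.zip",
                     env=None):
    """Check that all credentials exist and return them in one dictionary."""
    env = os.environ if env is None else env

    # Fail early with a readable message if a Gradient variable is missing.
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Both AstraDB files are expected in the project root by default.
    for path in (token_path, bundle_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Expected file not found: {path}")

    with Path(token_path).open() as handle:
        astra_token = json.load(handle)

    return {
        "gradient_access_token": env["GRADIENT_ACCESS_TOKEN"],
        "gradient_workspace_id": env["GRADIENT_WORKSPACE_ID"],
        "astra_token": astra_token,
        "secure_connect_bundle": str(bundle_path),
    }
```

Calling `load_credentials()` from the project root raises a descriptive error if anything is missing, which is easier to debug than an authentication failure later on.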

## Open in Google Colab

The notebook can also be opened directly in Google Colab. Configure the access tokens under the Secrets section and upload PDFs into the Documents folder.

# Usage

Once the setup is complete, you can use ChatPDF Clone for interactive conversations. Run the script as follows and navigate to the localhost URL it prints to access the web app:

```bash
streamlit run main.py
```

In the following examples, I provided a PDF [summary of The Merchant of Venice](https://pennstatelaw.psu.edu/_file/TheMerchantofVeniceSummary.pdf) to the service.

### Screenshots

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/503c9af3-3f25-4b49-9a78-e4fb90444a85)

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/3c632afd-a807-408b-80c5-1fe815d4a41e)

![image](https://github.com/SourasishBasu/ChatPDF-clone-llama2b/assets/89185962/526195de-ca55-4289-9c72-694ba6e6c2b5)

## Challenges

During the development of ChatPDF Clone, we encountered several challenges, including:

- **Integration Complexity**: Integrating GradientAI and AstraDB posed challenges in terms of authentication and data synchronization.
- **Retrieval Performance**: Retrieval accuracy and speed degraded severely as the number of documents increased.
- **Handling Dynamic Conversations**: Adapting the chatbot to handle dynamic and evolving conversations while maintaining coherence presented a challenge.
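
One common way to approach the conversation-handling challenge is to keep only the most recent turns and prepend them to each new question. The sketch below illustrates that idea. It is not the approach used in this project, and the `ChatHistory` class is hypothetical.

```python
from collections import deque


class ChatHistory:
    """Keep the last max_turns exchanges and render them as prompt context."""

    def __init__(self, max_turns=3):
        # A bounded deque silently drops the oldest turn once it is full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, user_message, assistant_message):
        """Record one completed question/answer exchange."""
        self._turns.append((user_message, assistant_message))

    def as_prompt(self, new_question):
        """Render the retained turns followed by the new question."""
        lines = []
        for user_message, assistant_message in self._turns:
            lines.append(f"User: {user_message}")
            lines.append(f"Assistant: {assistant_message}")
        lines.append(f"User: {new_question}")
        lines.append("Assistant:")
        return "\n".join(lines)
```

A fixed window keeps the prompt short enough for the model's context limit, at the cost of forgetting older turns.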