Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/odeyiany2/academic-research-assistant-rag-project

This repo contains my mid-camp project for the AISOC camp hosted by Zion Pibowei
https://github.com/odeyiany2/academic-research-assistant-rag-project

Last synced: 7 days ago
JSON representation

This repo contains my mid-camp project for the AISOC camp hosted by Zion Pibowei

Host: GitHub
URL: https://github.com/odeyiany2/academic-research-assistant-rag-project
Owner: Odeyiany2
Created: 2024-08-26T08:18:30.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-09-11T09:42:31.000Z (2 months ago)
Last Synced: 2024-09-12T00:19:28.614Z (2 months ago)
Language: Python
Homepage:
Size: 51.1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# RAG Project: 📚 Academic Research Assistant ( Business and Finance )

## 1. Objective
As a penultimate student who is studying Accounting, the use case for my RAG chatbot will be based on Business and Finance. The goal is to build a simple RAG chatbot that takes in documents talking about business and finance via file upload, generates embeddings of the documents, stores the embeddings in a vector store, and retrieves relevant embeddings to answer the user's query. As a student I have had to read very large files for my courses and most times I find it difficult and time consuming to get answers from the files. With the chatbot access to answers will be easier, faster and more efficient.

## 2. Implementation Components
* FastAPI: Serves as the backend for the application, handling file uploads, query processing, and interacting with LangChain and ChromaDB.

* LangChain and Hugging Face Emeddings: generate embeddings for the doument and process user queries

* ChromaDB: store the generated embeddings in vector store

* Streamlit: For the deployment of the chatbot

### Models Used to be Used
- Via Groq:
- `llama-3.1-70b-versatile`
- Via Vertex AI on GCP (Not used yet):
- `gemini-1.5-pro-001`
- `mistral-large@2407`
- Via AnthropicVertex GCP (Not used yet):
- `claude-3-opus@20240229`
- `claude-3-5-sonnet@20240620`

## To Run Locally 👩🏽‍💻
Follow the steps below to run the codes locally and replicate the results.
- Clone the repo to your local machine
- VS code: use the link below to clone on vs code
```
https://github.com/Odeyiany2/Academic-Research-Assistant-RAG-Project.git
```
- Git Bash

```
gh repo clone Odeyiany2/Academic-Research-Assistant-RAG-Project
```
- Create a virtual environment, activate it and install the requirements.txt
```
pythom -m env venv
```

```
venv\Scripts\activate
```

```
pip install -r requirements.txt
```
- Run the app.py file
```
uvicorn app:app --host 127.0.0.1 --port 5000 --reload
```
- Run the streamlit_app.py file
```
streamlit run streamlit_app.py
```

**Note: Ensure to create a file to store your API keys and access them.**

## 🚀 Future Considerations to Build a more Robust RAG
- Training with more large files to help the LLM
- Ensuring the RAG can take larger files from users and efficiently summarize important details.
- Possibility of expanding the number of courses
- Getting access to more LLMS as the last two on the Model section to get better comparisons on how different models perform.

### Streamlit Demo Video
https://github.com/user-attachments/assets/4f8720e8-b2ba-40e3-95ad-2ad3de958da5