https://github.com/shibbir-ahmad24/ms-final-project-on-llm-rag-powered-medical-chatbot

Developing an LLM-RAG Powered Medical Chatbot for Clinical Question Answering
https://github.com/shibbir-ahmad24/ms-final-project-on-llm-rag-powered-medical-chatbot

chatbot clinical-nlp llm mimic-iv rag web-application

Last synced: 8 months ago
JSON representation

Developing an LLM-RAG Powered Medical Chatbot for Clinical Question Answering

Host: GitHub
URL: https://github.com/shibbir-ahmad24/ms-final-project-on-llm-rag-powered-medical-chatbot
Owner: shibbir-ahmad24
Created: 2025-02-27T07:14:12.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-03-04T00:18:22.000Z (8 months ago)
Last Synced: 2025-03-04T01:23:39.121Z (8 months ago)
Topics: chatbot, clinical-nlp, llm, mimic-iv, rag, web-application
Homepage:
Size: 275 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# MS Final Project on Medical Chatbot

# Project Title

Developing an **LLM-RAG Powered Medical Chatbot** for Clinical Question Answering

# Overview

Healthcare professionals and researchers often need quick and reliable insights from medical records, especially discharge summaries, which contain critical patient information. This project aims to build an **AI-powered medical chatbot** that leverages **Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs)** to answer clinical questions based on real-world patient discharge notes from the **MIMIC-IV dataset**. By integrating **advanced NLP techniques**, this chatbot will provide meaningful, **context-aware responses** to help healthcare professionals, researchers, and even patients understand complex medical summaries efficiently.

# Problem Statement

Medical discharge summaries are **dense, complex, and filled with medical jargon,** making it challenging for both healthcare providers and patients to extract relevant insights quickly. Traditional chatbots often:
- Struggle with **hallucinations (false or misleading responses).**
- Fail to retrieve **accurate, case-specific knowledge** from structured datasets.
- Lack **domain-specific adaptation** to medical terminology and discharge notes.

This project aims to answer the following key questions:
- **How can we ensure chatbot responses are grounded in real clinical discharge notes?**
- **Does integrating a RAG pipeline improve response accuracy compared to a traditional LLM chatbot?**
- **Do any versions of the chatbot hallucinate or provide false information?**
- **How can we effectively fetch and utilize MIMIC-IV discharge summaries for conditions like Heart Attack & Kidney Disease to improve chatbot reliability?**

# Objective

The primary goal is to develop and evaluate a **medical chatbot** that:

- **Implements a RAG-based pipeline** to fetch relevant discharge notes before generating responses.
- **Supports clinical question answering** for **Heart Attack & Kidney Disease**, ensuring responses are grounded in real clinical data.
- **Minimizes hallucinations and false information** by comparing **RAG vs. non-RAG chatbot responses**.
- **Fetches and processes MIMIC-IV discharge summaries** efficiently.
- **Deploys using Hugging Face Spaces or Streamlit**, integrating **VectorDB** for efficient data retrieval and response generation.

By the end of this project, the chatbot should enhance **medical decision-making** and act as a **reliable AI assistant** in healthcare settings.

# Data Source

- discharge note dataset from MIMIC-IV DB: https://physionet.org/content/mimic-iv-note/2.2/note/
- clinical trials from ClinicalTrials.gov API: https://clinicaltrials.gov/data-api/api

# Working Steps

1. Discharge Notes Collection from MIMIC-IV Database
2. Clinical Trials Retrieval from ClinicalTrials.gov API
3. Data Preprocessing (Clean, Segment, and Categorize)
4. Handling User Queries for Clinical Question Answering
- to design and optimize the system for answering Heart Attack and Kidney Disease-related queries
5. Vectorization using BioBERT Embedding Model
6. Vectorstore Setup for Efficient Retrieval
7. Query Handling with RAG
- to implement RAG pipeline to retrieve relevant information based on user query
8. Response Generation using LLM
- to generate responses based on the retrieved data, ensuring clarity, accuracy, and relevance in answering the user’s query.
10. Evaluation by Comparing RAG and Non-RAG Based LLM Models
11. User Interface Design and Deployment
- to design and deploy a clean, intuitive interface using Streamlit, enabling healthcare professionals to easily interact with the chatbot.

# RAG Pipeline

![RAG Pipeline](https://github.com/shibbir-ahmad24/MS-Final-Project-on-LLM-RAG-Powered-Medical-Chatbot/blob/main/RAG-Pipeline2.jpg)

# Tech Stack

- Python (Core programming language)
- Hugging Face Transformers (LLM integration)
- Faiss / ChromaDB (Vector database for retrieval)
- SQL (Structured data management)
- Jupyter Notebook (Exploration & prototyping)
- Streamlit (Deployment & UI)
- Docker (Containerization for deployment)

# Project Deadline

April 30, 2025

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/shibbir-ahmad24/ms-final-project-on-llm-rag-powered-medical-chatbot

Awesome Lists containing this project

README