An open API service indexing awesome lists of open source software.

https://github.com/priyam-hub/medicare-bot

MediCare-Bot provides clear, reliable health information by combining trusted medical sources with smart search and AI. It makes medical queries easy to understand and accessible for everyone.
https://github.com/priyam-hub/medicare-bot

chatbot groq-api langchain llama3 medical pinecone-db retrieval-augmented-generation sentence-transformers

Last synced: 3 months ago
JSON representation

MediCare-Bot provides clear, reliable health information by combining trusted medical sources with smart search and AI. It makes medical queries easy to understand and accessible for everyone.

Awesome Lists containing this project

README

          

# πŸ€– MediCare-Bot β€” Your trusted health companion.

[![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)](https://www.python.org/)
[![Flask](https://img.shields.io/badge/Flask-2.0+-orange.svg)](https://flask.palletsprojects.com/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![LLaMA 3](https://img.shields.io/badge/LLaMA-3%208B-purple.svg)](https://ai.meta.com/llama/)
[![Docker](https://img.shields.io/badge/Docker-Containerized-blue)](https://www.docker.com/)

*Supporting patients with trusted answers, one conversation at a time.*

[Features](#-features) β€’ [Installation](#-installation) β€’ [Usage](#-usage) β€’ [Tech Stack](#-tech-stack) β€’ [License](#-license) β€’ [Contact](#-contact)

---

## 🌟 Overview

**MediCare-Bot** is designed to assist users with reliable and easy-to-understand health information. It learns from trusted medical books and uses intelligent search techniques to find the most relevant content. By combining smart retrieval with a powerful language model, the chatbot offers clear and accurate responses to medical queries, making healthcare information more accessible and conversational for everyone.

---

## πŸ“š Dataset

**Source:** [Medical Book Dataset – Kaggle](https://www.kaggle.com/datasets/abhirajmandal/medical-book)

#### πŸ” Description

This dataset contains textual content extracted from medical books, designed to provide structured and informative knowledge on a wide range of medical topics. It includes concise explanations, definitions, symptoms, causes, and treatments for various diseases and conditions. The dataset is ideal for building knowledge-driven applications like medical chatbots, as it captures essential clinical and healthcare-related information in a digestible format.

**Use Case:**
The dataset serves as the foundational knowledge base for the chatbot, allowing it to generate helpful and accurate responses to user queries by referencing real medical literature.

---

## πŸ“Œ Features

* **Patient-Centric Design**
- Focused on delivering helpful and compassionate responses tailored to user needs.

* **Trusted Medical Knowledge**
- Built using information from reliable medical books to ensure accuracy.

* **Natural Conversations**
- Understands and responds in a human-like, conversational tone.

* **Instant Answers**
- Provides quick and relevant responses to a wide range of medical questions.

* **Context-Aware**
- Remembers and considers the context of your query for better responses.

* **User-Friendly Interface**
- Simple and clean chat interface that's easy to use for everyone.

* **Secure and Private**
- No personal data is stored or sharedβ€”ensuring privacy and confidentiality.

---

## πŸ› οΈ Installation

#### Step - 1: Repository Cloning

```bash
# Clone the repository
git clone https://github.com/priyam-hub/MediCare-Bot.git

# Navigate into the directory
cd MediCare-Bot
```

#### Step - 2: Enviornmental Setup and Dependency Installation

```bash
# Run env_setup.sh
bash env_setup.sh

# Select 1 to create Python Environment
# Select 2 to create Conda Environment

# Python Version - 3.10

# Make the Project to run as a Local Package
python setup.py
```

#### Step - 3: Create a .env file in the root directory to add Credentials

```bash
PINECONE_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
GROQ_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

#### Step - 4: Store the Vector Embeddings & Initialize RAG

```bash

# Run the Main Python Script
python main.py
```

#### Step - 5: Run the Flask Server

```bash

# Run the Web App using Flask Server
python web/app.py
```

Upon running, navigate to the provided local URL in your browser to interact with the MediCare-Bot

---

## βš™οΈ Technology Stack

* **Python** – Core programming language used for building the backend.
πŸ”— [Install Python](https://www.python.org/downloads/)

* **PyTorch** – Deep learning framework used under the hood for embedding models and LLMs.
πŸ”— [Install PyTorch](https://pytorch.org/get-started/locally/)

* **Transformers (Hugging Face)** – Used to load and manage pre-trained embedding models.
πŸ”— [Transformers Documentation](https://huggingface.co/docs/transformers/index)

* **LangChain** – Framework for building applications with LLMs, supporting RAG and vector search.
πŸ”— [LangChain Installation Guide](https://docs.langchain.com/docs/get_started/installation)

* **Flask** – Lightweight web framework to deploy the chatbot as a web API.
πŸ”— [Flask Installation](https://flask.palletsprojects.com/en/latest/installation/)

---

## 🧠 Artificial Intelligence Models Stack

* **Vector Embedding – `sentence-transformers/all-MiniLM-L6-v2`**
Lightweight and efficient embedding model for generating semantic vector representations of medical texts.
πŸ”— [Model on Hugging Face](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)

* **Vector Database – Pinecone**
High-performance vector store to manage and search large-scale embedding data.
πŸ”— [Pinecone Documentation](https://docs.pinecone.io/docs/overview)

* **Information Retrieval – RAG (Retrieval-Augmented Generation)**
Augments LLM responses by retrieving relevant content from medical embeddings.
πŸ”— [RAG on LangChain](https://docs.langchain.com/docs/components/retrievers/)

* **Inference Engine – Groq**
Ultra-fast inference platform to run LLaMa-3 and generate real-time answers.
πŸ”— [Groq API](https://groq.com/)

* **Large Language Model – LLaMa-3 8B**
Advanced open-source language model from Meta used for understanding medical context and generating responses.
πŸ”— [LLaMa 3 (Meta)](https://ai.meta.com/llama/)

* **API Integration – Flask**
Handles user requests and chatbot interactions through a simple web API.
πŸ”— [Flask Official Docs](https://flask.palletsprojects.com/en/latest/)

* **Environmental Image – Docker**
Ensures consistent, portable, and isolated runtime environment for the chatbot application.
πŸ”— [Docker Installation](https://docs.docker.com/get-docker/)

* **Deployment – EC2 Instance**
Scalable cloud server to host and serve the medical chatbot application.
πŸ”— [Amazon EC2](https://aws.amazon.com/ec2/)

---

## 🧠 Large Language Model Inference Comparison Chart

| **Model ID** | **Requests/Min** | **Requests/Day** | **Tokens/Min** | **Tokens/Day** |
| ----------------------------------------------- | ---------------- | ---------------- | -------------- | -------------- |
| `allam-2-7b` | 30 | 7,000 | 6,000 | No limit |
| `compound-beta` | 15 | 200 | 70,000 | No limit |
| `deepseek-r1-distill-llama-70b` | 30 | 1,000 | 6,000 | No limit |
| `gemma2-9b-it` | 30 | 14,400 | 15,000 | 500,000 |
| `llama3-70b-8192` | 30 | 14,400 | 6,000 | 500,000 |
| `llama3-8b-8192` | 30 | 14,400 | 6,000 | 500,000 |
| `mistral-saba-24b` | 30 | 1,000 | 6,000 | 500,000 |
| `qwen-qwq-32b` | 30 | 1,000 | 6,000 | No limit |

---

## πŸ“Š Success Rates of Different Meta-LlaMa Models

| **Category** | **Benchmark** | **LLaMA 3 8B** | **LLaMA 2 7B** | **LLaMA 2 13B** | **LLaMA 3 70B** | **LLaMA 2 70B** |
| ------------------------- | ---------------------------- | -------------- | -------------- | --------------- | --------------- | --------------- |
| **General** | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3–5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| **Knowledge Reasoning** | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| **Reading Comprehension** | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |

---

## πŸ“ Project Structure

```plaintext
MediCare-Bot/
β”œβ”€β”€ .docker_ignore # Ignoring Files in Docker
β”œβ”€β”€ .env # Store the Pinecone and Groq Credentials
β”œβ”€β”€ .gitignore # Ignoring files for Git
β”œβ”€β”€ Dockerfile # Stored the Docker Setup
β”œβ”€β”€ env_setup.sh # Package installation configuration
β”œβ”€β”€ folder_structure.py # Contains the Project Folder Structure
β”œβ”€β”€ LICENCE # MIT License
β”œβ”€β”€ main.py # Store Vector Embeddings and Initialize the RAG
β”œβ”€β”€ README.md # Project documentation
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ setup.py # Create the Project as Python Package
β”œβ”€β”€ config/ # Configuration files
β”‚ β”œβ”€β”€ __init__.py
β”‚ └── config.py/ # All Configuration Variables of Pipeline
β”œβ”€β”€ data/ # Data directory
β”‚ └── Medical_Book.pdf # Medical Book Dataset used for Vector Embedding
β”œβ”€β”€ notebooks/ # Jupyter notebooks for experimentation
β”‚ └── Medical_Chatbot.ipynb # Experimented Chatbot in Jupyter Notebook
β”œβ”€β”€ src/ # Source code
β”‚ β”œβ”€β”€ llm/ # Large Language Model Directory
β”‚ β”‚ β”œβ”€β”€ __init__.py
β”‚ β”‚ └── llm_builder.py # Python file to Build the LLM using ChatGroq
β”‚ β”œβ”€β”€ prompts/ # System Prompt Directory
β”‚ β”‚ β”œβ”€β”€ __init__.py
β”‚ β”‚ └── prompt_builder.py # Python file Build the Prompt for LLM Inference
β”‚ β”œβ”€β”€ text_splitting/ # Text Splitting Directory
β”‚ β”‚ β”œβ”€β”€ __init__.py
β”‚ β”‚ └── split_text.py # Python File to Split the Text
β”‚ β”œβ”€β”€ vector_index/ # Vector Index Directory
β”‚ β”‚ β”œβ”€β”€ __init__.py
β”‚ β”‚ └── index_manager.py # Python file to create index in Vector DB
β”‚ └── utils/ # Utility Functions Directory
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ download_embeddings.py # Download the Embedding Model
β”‚ β”œβ”€β”€ load_pdf.py # Load the PDF for Embedding
β”‚ └── logger.py # Logger Setup
└── web/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ animations/
β”‚ β”œβ”€β”€ chatbot.json # Chatbot Animations
β”‚ └── doctor.json # Doctor Animations
β”œβ”€β”€ static/
β”‚ β”œβ”€β”€ style.css # Styling of the Web Page
β”œβ”€β”€ templates/
β”‚ β”œβ”€β”€ chat.html # Default Web Page
└── app.py/ # To run the flask server

```
---

## πŸ› οΈ Future Work Roadmap

### **πŸ”Ή Phase 1: Feature Expansion (1–2 Months)**

**Goal**: Enhance user experience and improve core functionality.

* Integrate symptom checker with dynamic conversation flow.
* Add multilingual support for broader accessibility.
* Implement feedback mechanism for users to rate chatbot responses.
* Improve handling of edge cases in medical queries.

### **πŸ”Ή Phase 2: Clinical Reliability & Compliance (3–4 Months)**

**Goal**: Increase medical accuracy and build trust with regulatory alignment.

* Collaborate with healthcare professionals for validation of responses.
* Include references to verified medical literature in answers.
* Ensure HIPAA/GDPR compliance for secure data handling.
* Add a disclaimer and escalation option for complex queries.

### **πŸ”Ή Phase 3: Real-Time Integration & Monitoring (5–6 Months)**

**Goal**: Make the chatbot production-ready for deployment in real environments.

* Integrate with wearable health devices and EMR/EHR systems.
* Deploy monitoring system to track chatbot performance and anomalies.
* Introduce voice interaction using Whisper or similar model.
* Host full system with CI/CD on scalable cloud infrastructure (e.g., AWS/GCP).

---

## πŸ“œ License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more details.

---

**Made by Priyam Pal - AI and Data Science Engineer**

[↑ Back to Top](#-affective-ai--understanding-emotions-through-text)