https://github.com/priyam-hub/medicare-bot
MediCare-Bot provides clear, reliable health information by combining trusted medical sources with smart search and AI. It makes medical queries easy to understand and accessible for everyone.
- Host: GitHub
- URL: https://github.com/priyam-hub/medicare-bot
- Owner: priyam-hub
- License: apache-2.0
- Created: 2025-05-05T09:32:56.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-07-16T05:28:49.000Z (3 months ago)
- Last Synced: 2025-07-17T08:06:14.003Z (3 months ago)
- Topics: chatbot, groq-api, langchain, llama3, medical, pinecone-db, retrieval-augmented-generation, sentence-transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 10.7 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MediCare-Bot: Your trusted health companion
[Python](https://www.python.org/) [Flask](https://flask.palletsprojects.com/) [License](LICENSE) [LLaMA 3](https://ai.meta.com/llama/) [Docker](https://www.docker.com/)

*Supporting patients with trusted answers, one conversation at a time.*
[Features](#features) • [Installation](#installation) • [Usage](#usage) • [Tech Stack](#technology-stack) • [License](#license) • [Contact](#contact)
---
## Overview
**MediCare-Bot** is designed to assist users with reliable and easy-to-understand health information. It draws on trusted medical books and uses intelligent retrieval techniques to find the most relevant content. By combining smart retrieval with a powerful language model, the chatbot offers clear and accurate responses to medical queries, making healthcare information more accessible and conversational for everyone.
---
## Dataset
**Source:** [Medical Book Dataset - Kaggle](https://www.kaggle.com/datasets/abhirajmandal/medical-book)
#### Description
This dataset contains textual content extracted from medical books, designed to provide structured and informative knowledge on a wide range of medical topics. It includes concise explanations, definitions, symptoms, causes, and treatments for various diseases and conditions. The dataset is ideal for building knowledge-driven applications like medical chatbots, as it captures essential clinical and healthcare-related information in a digestible format.
**Use Case:**
The dataset serves as the foundational knowledge base for the chatbot, allowing it to generate helpful and accurate responses to user queries by referencing real medical literature.
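
To give a concrete picture of how this knowledge base feeds the pipeline, here is a minimal sketch of loading and chunking the book before embedding. It assumes the `data/Medical_Book.pdf` path from the project structure and the usual LangChain loader/splitter packages; the repo's own `load_pdf.py` and `split_text.py` helpers, and the chunk sizes, may differ.

```python
# Hypothetical sketch: load the medical book and split it into overlapping chunks.
# Assumes the langchain-community, langchain-text-splitters, and pypdf packages;
# the chunk_size / chunk_overlap values are illustrative, not the repo's settings.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def load_and_split(pdf_path: str = "data/Medical_Book.pdf"):
    # Each PDF page becomes one Document with page metadata attached
    documents = PyPDFLoader(pdf_path).load()

    # Overlapping chunks keep definitions, symptoms, and treatments in context
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)

if __name__ == "__main__":
    chunks = load_and_split()
    print(f"Created {len(chunks)} text chunks ready for embedding")
```
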
---
## Features
* **Patient-Centric Design**
  - Focused on delivering helpful and compassionate responses tailored to user needs.
* **Trusted Medical Knowledge**
  - Built using information from reliable medical books to ensure accuracy.
* **Natural Conversations**
  - Understands and responds in a human-like, conversational tone.
* **Instant Answers**
  - Provides quick and relevant responses to a wide range of medical questions.
* **Context-Aware**
  - Remembers and considers the context of your query for better responses.
* **User-Friendly Interface**
  - Simple and clean chat interface that's easy to use for everyone.
* **Secure and Private**
  - No personal data is stored or shared, ensuring privacy and confidentiality.

---
## Installation
#### Step 1: Clone the Repository

```bash
# Clone the repository
git clone https://github.com/priyam-hub/MediCare-Bot.git

# Navigate into the directory
cd MediCare-Bot
```

#### Step 2: Environment Setup and Dependency Installation
```bash
# Run env_setup.sh
bash env_setup.sh

# Select 1 to create a Python virtual environment
# Select 2 to create a Conda environment
# Python version: 3.10

# Install the project as a local package
python setup.py
```

#### Step 3: Create a `.env` File in the Root Directory with Your Credentials
```bash
PINECONE_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
GROQ_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

#### Step 4: Store the Vector Embeddings & Initialize RAG
```bash
# Run the Main Python Script
python main.py
```

#### Step 5: Run the Flask Server
```bash
# Run the Web App using Flask Server
python web/app.py
```

Upon running, navigate to the provided local URL in your browser to interact with MediCare-Bot.
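
For orientation, the sketch below shows the rough shape of a Flask chat app like the one in `web/app.py`; the route names and the stubbed `answer_question` helper are hypothetical, and the real app wires the question into the RAG chain built in `main.py`.

```python
# Minimal, hypothetical sketch of a Flask chat server; the real web/app.py may differ.
from flask import Flask, jsonify, render_template, request

app = Flask(__name__, template_folder="templates", static_folder="static")

def answer_question(question: str) -> str:
    # Stub: the real app would pass the question to the RAG chain
    return f"(placeholder answer for: {question})"

@app.route("/")
def index():
    # Serves the chat UI (templates/chat.html in the project structure)
    return render_template("chat.html")

@app.route("/chat", methods=["POST"])
def chat():
    payload = request.get_json(silent=True) or {}
    question = payload.get("message", "")
    return jsonify({"answer": answer_question(question)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```
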
---
## Technology Stack

* **Python** - Core programming language used for building the backend.
  [Install Python](https://www.python.org/downloads/)
* **PyTorch** - Deep learning framework used under the hood for embedding models and LLMs.
  [Install PyTorch](https://pytorch.org/get-started/locally/)
* **Transformers (Hugging Face)** - Used to load and manage pre-trained embedding models.
  [Transformers Documentation](https://huggingface.co/docs/transformers/index)
* **LangChain** - Framework for building applications with LLMs, supporting RAG and vector search.
  [LangChain Installation Guide](https://docs.langchain.com/docs/get_started/installation)
* **Flask** - Lightweight web framework used to deploy the chatbot as a web API.
  [Flask Installation](https://flask.palletsprojects.com/en/latest/installation/)
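
As a quick sanity check that the stack is installed correctly, the snippet below loads the same `all-MiniLM-L6-v2` model the chatbot uses and embeds a sample question; exact package versions live in `requirements.txt`.

```python
# Verify the embedding model loads and produces 384-dimensional vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vector = model.encode("What are the common symptoms of anemia?")
print(vector.shape)  # (384,) for all-MiniLM-L6-v2
```
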
---
## Artificial Intelligence Models Stack
* **Vector Embedding - `sentence-transformers/all-MiniLM-L6-v2`**
  Lightweight and efficient embedding model for generating semantic vector representations of medical texts.
  [Model on Hugging Face](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
* **Vector Database - Pinecone**
  High-performance vector store to manage and search large-scale embedding data.
  [Pinecone Documentation](https://docs.pinecone.io/docs/overview)
* **Information Retrieval - RAG (Retrieval-Augmented Generation)**
  Augments LLM responses by retrieving relevant content from the medical embeddings.
  [RAG on LangChain](https://docs.langchain.com/docs/components/retrievers/)
* **Inference Engine - Groq**
  Ultra-fast inference platform used to run LLaMA-3 and generate real-time answers.
  [Groq API](https://groq.com/)
* **Large Language Model - LLaMA-3 8B**
  Advanced open-source language model from Meta used for understanding medical context and generating responses.
  [LLaMA 3 (Meta)](https://ai.meta.com/llama/)
* **API Integration - Flask**
  Handles user requests and chatbot interactions through a simple web API.
  [Flask Official Docs](https://flask.palletsprojects.com/en/latest/)
* **Containerization - Docker**
  Ensures a consistent, portable, and isolated runtime environment for the chatbot application.
  [Docker Installation](https://docs.docker.com/get-docker/)
* **Deployment - EC2 Instance**
  Scalable cloud server to host and serve the medical chatbot application.
  [Amazon EC2](https://aws.amazon.com/ec2/)
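
Putting the pieces above together, here is a hedged end-to-end query sketch: embeddings plus an existing Pinecone index act as the retriever, and Groq-hosted LLaMA-3 8B generates the answer. The index name, prompt text, and package choices (`langchain-huggingface`, `langchain-pinecone`, `langchain-groq`) are assumptions; the repo's `main.py` and `src/` modules may organize this differently.

```python
# Hypothetical end-to-end RAG query; mirrors the stack above, not the repo's exact code.
from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

load_dotenv()  # reads PINECONE_API_KEY and GROQ_API_KEY from .env

# 1. Embeddings + an existing Pinecone index act as the retriever
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = PineconeVectorStore.from_existing_index(
    index_name="medicare-bot",  # hypothetical index name
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 2. Groq-hosted LLaMA-3 8B generates the final answer
llm = ChatGroq(model="llama3-8b-8192", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful medical assistant. Answer using only this context:\n{context}"),
    ("human", "{input}"),
])

# 3. Retrieval-augmented generation: retrieve chunks, then stuff them into the prompt
rag_chain = create_retrieval_chain(retriever, create_stuff_documents_chain(llm, prompt))
print(rag_chain.invoke({"input": "What are the early symptoms of diabetes?"})["answer"])
```
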
---
## Large Language Model Inference Rate Limits (Groq)
| **Model ID** | **Requests/Min** | **Requests/Day** | **Tokens/Min** | **Tokens/Day** |
| ----------------------------------------------- | ---------------- | ---------------- | -------------- | -------------- |
| `allam-2-7b` | 30 | 7,000 | 6,000 | No limit |
| `compound-beta` | 15 | 200 | 70,000 | No limit |
| `deepseek-r1-distill-llama-70b` | 30 | 1,000 | 6,000 | No limit |
| `gemma2-9b-it` | 30 | 14,400 | 15,000 | 500,000 |
| `llama3-70b-8192` | 30 | 14,400 | 6,000 | 500,000 |
| `llama3-8b-8192` | 30 | 14,400 | 6,000 | 500,000 |
| `mistral-saba-24b` | 30 | 1,000 | 6,000 | 500,000 |
| `qwen-qwq-32b`                                   | 30               | 1,000            | 6,000          | No limit       |
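
The figures above look like free-tier quotas and can change, so batch jobs (for example, bulk evaluation of chatbot answers) benefit from a client-side throttle. The helper below is a generic sketch, not part of the repo, sized for the 30 requests/minute rows in the table.

```python
# Generic client-side throttle to stay under a requests-per-minute budget (hypothetical helper).
import time

class RateLimiter:
    def __init__(self, requests_per_minute: int = 30):
        self.min_interval = 60.0 / requests_per_minute
        self.last_call = 0.0

    def wait(self) -> None:
        # Sleep just long enough to respect the per-minute budget before the next call
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(requests_per_minute=30)  # matches the llama3-8b-8192 row
for question in ["What causes migraines?", "How is anemia treated?"]:
    limiter.wait()
    # call the Groq-backed RAG chain here
```
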
---
## Benchmark Scores of Different Meta LLaMA Models
| **Category** | **Benchmark** | **LLaMA 3 8B** | **LLaMA 2 7B** | **LLaMA 2 13B** | **LLaMA 3 70B** | **LLaMA 2 70B** |
| ------------------------- | ---------------------------- | -------------- | -------------- | --------------- | --------------- | --------------- |
| **General** | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3β5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| **Knowledge Reasoning** | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| **Reading Comprehension** | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
|                           | DROP (3-shot, F1)            | 58.4           | 37.9           | 49.8            | 79.7            | 70.2            |

---
## Project Structure
```plaintext
MediCare-Bot/
├── .docker_ignore               # Files ignored by Docker
├── .env                         # Stores the Pinecone and Groq credentials
├── .gitignore                   # Files ignored by Git
├── Dockerfile                   # Docker setup
├── env_setup.sh                 # Package installation configuration
├── folder_structure.py          # Contains the project folder structure
├── LICENCE                      # MIT License
├── main.py                      # Stores vector embeddings and initializes the RAG pipeline
├── README.md                    # Project documentation
├── requirements.txt             # Python dependencies
├── setup.py                     # Installs the project as a Python package
├── config/                      # Configuration files
│   ├── __init__.py
│   └── config.py                # All configuration variables of the pipeline
├── data/                        # Data directory
│   └── Medical_Book.pdf         # Medical book dataset used for vector embedding
├── notebooks/                   # Jupyter notebooks for experimentation
│   └── Medical_Chatbot.ipynb    # Chatbot experiments in notebook form
├── src/                         # Source code
│   ├── llm/                     # Large language model directory
│   │   ├── __init__.py
│   │   └── llm_builder.py       # Builds the LLM using ChatGroq
│   ├── prompts/                 # System prompt directory
│   │   ├── __init__.py
│   │   └── prompt_builder.py    # Builds the prompt for LLM inference
│   ├── text_splitting/          # Text splitting directory
│   │   ├── __init__.py
│   │   └── split_text.py        # Splits the text into chunks
│   ├── vector_index/            # Vector index directory
│   │   ├── __init__.py
│   │   └── index_manager.py     # Creates the index in the vector DB
│   └── utils/                   # Utility functions directory
│       ├── __init__.py
│       ├── download_embeddings.py  # Downloads the embedding model
│       ├── load_pdf.py          # Loads the PDF for embedding
│       └── logger.py            # Logger setup
└── web/
    ├── __init__.py
    ├── animations/
    │   ├── chatbot.json         # Chatbot animations
    │   └── doctor.json          # Doctor animations
    ├── static/
    │   └── style.css            # Styling of the web page
    ├── templates/
    │   └── chat.html            # Default web page
    └── app.py                   # Runs the Flask server
```
---
## Future Work Roadmap
### Phase 1: Feature Expansion (1–2 Months)
**Goal**: Enhance user experience and improve core functionality.
* Integrate symptom checker with dynamic conversation flow.
* Add multilingual support for broader accessibility.
* Implement feedback mechanism for users to rate chatbot responses.
* Improve handling of edge cases in medical queries.

### Phase 2: Clinical Reliability & Compliance (3–4 Months)
**Goal**: Increase medical accuracy and build trust with regulatory alignment.
* Collaborate with healthcare professionals for validation of responses.
* Include references to verified medical literature in answers.
* Ensure HIPAA/GDPR compliance for secure data handling.
* Add a disclaimer and escalation option for complex queries.

### Phase 3: Real-Time Integration & Monitoring (5–6 Months)
**Goal**: Make the chatbot production-ready for deployment in real environments.
* Integrate with wearable health devices and EMR/EHR systems.
* Deploy monitoring system to track chatbot performance and anomalies.
* Introduce voice interaction using Whisper or similar model.
* Host the full system with CI/CD on scalable cloud infrastructure (e.g., AWS/GCP).

---
## License
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more details.
---
**Made by Priyam Pal - AI and Data Science Engineer**
[Back to Top](#medicare-bot-your-trusted-health-companion)