https://github.com/hlexnc/project-arepo
Data-driven stroke risk assessment & personalized recommendations, powered by machine-learning and an NLU-driven chatbot.
https://github.com/hlexnc/project-arepo
chatbot data-analysis docker docker-compose machine-learning nlu-chatbot python rasa scikit-learn sklearn streamlit
Last synced: 4 months ago
JSON representation
Data-driven stroke risk assessment & personalized recommendations, powered by machine-learning and an NLU-driven chatbot.
- Host: GitHub
- URL: https://github.com/hlexnc/project-arepo
- Owner: HlexNC
- License: gpl-3.0
- Created: 2024-10-08T08:40:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-14T22:02:38.000Z (over 1 year ago)
- Last Synced: 2025-08-13T20:21:35.570Z (10 months ago)
- Topics: chatbot, data-analysis, docker, docker-compose, machine-learning, nlu-chatbot, python, rasa, scikit-learn, sklearn, streamlit
- Language: Python
- Homepage:
- Size: 86.1 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Project Apero is a comprehensive web application—originally developed as a university project—designed to help individuals assess their stroke risk and receive personalized recommendations. It leverages data analysis, machine learning, and a conversational chatbot interface to provide actionable health insights.
---
## Table of Contents
- [Project Description](#project-description)
- [Screenshots](#screenshots)
- [Key Features](#key-features)
- [Installation](#installation)
- [Prerequisites](#prerequisites)
- [Steps to Get Started](#steps-to-get-started)
- [Training the Rasa Chatbot (Optional)](#training-the-rasa-chatbot-optional)
- [Data Overview](#data-overview)
- [Dataset](#dataset)
- [Data Handling and Augmentation](#data-handling-and-augmentation)
- [Usage](#usage)
- [Personalized Stroke Risk Assessment](#personalized-stroke-risk-assessment)
- [Interactive Data Analysis](#interactive-data-analysis)
- [Chatbot Assistance](#chatbot-assistance)
- [Project Structure](#project-structure)
- [Disclaimer](#disclaimer)
- [Repository Visualization](#repository-visualization)
- [License](#license)
- [Contact](#contact)
---
## Project Description
**Project Apero** provides an intuitive interface for users to input personal health metrics—such as age, gender, BMI, blood glucose levels, etc.—and receive an estimated probability of experiencing a stroke. It then offers evidence-based recommendations and insights to help reduce stroke risk.
The application is built using:
- **Streamlit** for the web front end
- **Scikit-Learn** for machine learning model training and prediction
- **Rasa** for an integrated chatbot
- **Docker** for containerization and ease of deployment
> **Why "Apero"?**
> The project name is derived from a playful acronym referencing a healthy approach to “Assessing Personal Event Risks Online,” aiming to make preventative health measures more approachable.
---
## Screenshots

---
## Key Features
1. **Personalized Stroke Risk Assessment**
- **Input Health Metrics**: Users provide personal data (e.g., age, gender, BMI, blood glucose levels, and more).
- **Risk Probability**: The machine learning model calculates a stroke risk score (0.0 to 1.0).
- **Actionable Recommendations**: Users receive suggestions tailored to their risk factors.
2. **Interactive Data Analysis & Visualization**
- **Data Exploration**: Filter and explore health-related metrics to reveal trends and correlations.
- **Visual Insights**: Interactive charts, heatmaps, and summary statistics enhance comprehension of stroke risk factors.
3. **Conversational Chatbot Assistance**
- **Real-Time Queries**: A Rasa-powered chatbot answers questions about stroke risk, data insights, and wellness tips.
- **Guided Interaction**: The chatbot can direct users to relevant pages, simplify data analysis steps, and deliver personalized feedback.
4. **Robust Data Handling and Augmentation**
- **Outlier Detection**: Statistical methods to detect and remove anomalies.
- **Synthetic Data Generation**: Augments the dataset by ~30% to improve model performance and generalization.
---
## Installation
### Prerequisites
- **Docker**
Make sure [Docker](https://www.docker.com/) is installed and running on your machine.
- **Git**
Required to clone the repository.
- **Python 3.9+ (Optional)**
If you want to run or modify components without Docker.
### Steps to Get Started
1. **Clone the Repository**
```bash
git clone https://github.com/HlexNC/Project-Arepo.git
cd Project-Arepo
```
2. **Build and Run Docker Services**
```bash
docker-compose up --build
```
- This command builds Docker images, trains machine learning models, and starts all necessary services.
- The initial startup may take several minutes as the environment sets up.
3. **Access the Web Application**
- Open a web browser and go to: [http://localhost:8501](http://localhost:8501).
- You will see the main page with navigation options for data analysis, personalized recommendations, and chatbot interaction.
> [!TIP]
> If you want to run or modify components without Docker, refer to the comments in the `docker-compose.yml` or optional instructions in the [`docs/`](./docs) folder (if provided).
### Training the Rasa Chatbot (Optional)
If you make changes to the chatbot’s training data or if the chatbot fails to respond:
1. **Open a terminal inside the Rasa server container**
```bash
docker exec -it rasa_server bash
```
2. **Train the Rasa Model**
```bash
rasa train
```
3. **Restart Services**
```bash
exit
docker-compose down
docker-compose up --build -d
```
---
## Data Overview
### Dataset
We utilize the publicly available [Stroke Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset). This dataset includes key health metrics:
- Age, BMI, Glucose Levels
- Hypertension, Heart Disease
- Smoking Status, etc.
> [!NOTE]
> This dataset is used strictly for educational and demonstration purposes.
### Data Handling and Augmentation
- **Outlier Detection**: Employs a Z-score based method to filter anomalous data points.
- **Synthetic Data Augmentation**: Adds ~30% realistic synthetic data to boost model robustness and representation.
- **Feature Scaling**: Numerical features are normalized or standardized, aligning with best practices from [Google’s ML Crash Course](https://developers.google.com/machine-learning/crash-course).
---
## Usage
### Personalized Stroke Risk Assessment
1. **Navigate to "Personalized Recommendations"**
From the sidebar, select **Personalized Recommendations**.
2. **Provide Personal Health Metrics**
Enter details such as age, gender, BMI, glucose levels, hypertension status, etc.
3. **Receive Risk Probability & Recommendations**
The system displays a **stroke risk** (0.0 to 1.0) and personalized tips (e.g., diet, exercise, medical follow-up).
### Interactive Data Analysis
1. **Go to "Data Analysis"**
Choose **Data Analysis** from the sidebar.
2. **Explore and Visualize**
- Filter or query the dataset to inspect correlations and distributions.
- View generated charts, tables, or heatmaps.
3. **Model Training & Evaluation**
Train or re-train the underlying machine learning models (Logistic Regression, Random Forest, SVM) within the app. Evaluate performance via metrics like Accuracy, Precision, Recall, etc.
### Chatbot Assistance
1. **Select "Chatbot"**
Click on **Chatbot** in the sidebar to open the conversational assistant interface.
2. **Interact with the Rasa-Powered Chatbot**
- Ask questions about stroke risk or how to interpret certain metrics.
- Get immediate recommendations and clarifications on data analysis results.
---
## Project Structure
Below is a simplified overview of the repository layout:
```bash
Project-Arepo/
├── actions/
│ ├── actions.py # Custom Rasa actions
│ ├── Dockerfile
│ └── requirements-actions.txt
├── data/
│ ├── data_analysis.py
│ ├── data_augmentation.py
│ ├── data_loader.py
│ ├── data_preprocessor.py
│ ├── raw/
│ │ └── healthcare-dataset-stroke-data.csv
│ └── processed/
├── src/
│ ├── app.py # Streamlit main entry point
│ ├── chatbot/
│ │ └── rasa_chatbot.py
│ └── web/
│ ├── home.py
│ ├── recommendations.py
│ ├── chatbot_page.py
│ └── data_analysis_page.py
├── models/
├── docs/
│ ├── img/
│ │ └── ASP_Banner.png
│ └── ...
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── LICENSE
└── README.md
```
---
> [!WARNING]
> **Project Apero is not a substitute for professional medical advice, diagnosis, or treatment.**
> The stroke risk assessments and recommendations are for **educational and research** purposes only. Always seek the advice of qualified healthcare professionals for any medical concerns.
> Use of this application is entirely at your own risk, and the developers assume **no liability** for any actions taken based on its output.
---
## Repository Visualization
Below is an automatically generated repository structure diagram. The diagram is updated whenever changes are pushed to the main branch:
---
## License
This project is licensed under the **GNU General Public License v3**.
See the [LICENSE](LICENSE) file for details.
---
## Contact
For inquiries, feedback, or suggestions:
- **GitHub Issues**: [Project Apero Repository](https://github.com/HlexNC/Project-Arepo/issues)
- **Email**: Please refer to the repository maintainers’ profiles.
We welcome contributions! Feel free to open a pull request or start a discussion in our GitHub repository.