https://github.com/oscartma/regression-project
Regression is a fundamental supervised machine learning technique used to predict continuous numerical outcomes based on input features.
- Host: GitHub
- URL: https://github.com/oscartma/regression-project
- Owner: OscarTMa
- License: mit
- Created: 2024-11-15T15:07:56.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-11-22T08:25:54.000Z (11 months ago)
- Last Synced: 2025-04-02T02:45:41.333Z (6 months ago)
- Topics: mae, mse, r-squared
- Language: Jupyter Notebook
- Homepage:
- Size: 307 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Regression-Project
## Table of Contents
1. [Description](#description)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Project Structure](#project-structure)
5. [Contributing](#contributing)
6. [License](#license)
7. [Workflows](#workflows)
## Description
Regression is a fundamental supervised machine learning technique used to predict continuous numerical outcomes based on input features. This project applies regression to the Boston house-price data (Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978), a structured dataset. The analysis is designed to provide insights into the relationships between input features and the target variable while also delivering an accurate predictive model.
## Key Concepts Covered
1. Exploratory Data Analysis (EDA)
   - Understand the data through visualization, summary statistics, and correlation analysis.
2. Data Preprocessing
   - Handle missing values and outliers.
   - Transform and encode categorical variables.
   - Standardize or normalize numerical features.
3. Modeling and Evaluation
   - Experiment with different regression techniques (Linear Regression, Decision Trees, Gradient Boosting, etc.).
   - Use metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) to evaluate model performance.
4. Feature Importance and Interpretability
   - Understand which features influence predictions the most.
   - Visualize model predictions versus actual values.
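
To make the preprocessing and evaluation steps above concrete, here is a minimal, dependency-free sketch of feature standardization, a one-feature least-squares fit, and the three metrics (MAE, MSE, R²). It is an illustration only, not the code in the project's notebooks: the helper names and the toy numbers are assumptions, and the numbers are made up rather than taken from the Boston data. In practice the same metrics are available in scikit-learn as `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a feature: subtract its mean, divide by its population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def fit_simple_linear(x, y):
    """Ordinary least squares with a single feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, in target units."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mse(y_true, y_pred):
    """Mean Squared Error: average squared error, penalizes large misses."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r_squared(y_true, y_pred):
    """R²: 1 minus (residual sum of squares / total sum of squares)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data (NOT Boston values): one feature, one target.
    rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
    price = [12.0, 17.0, 21.0, 26.0, 34.0]

    z = standardize(rooms)
    slope, intercept = fit_simple_linear(z, price)
    pred = [slope * v + intercept for v in z]

    print("MAE:", mae(price, pred))
    print("MSE:", mse(price, pred))
    print("R2 :", r_squared(price, pred))
```

MAE is reported in the same units as the target, MSE weights large errors more heavily, and R² measures the share of target variance the model explains, so reading the three together gives a fuller picture than any one alone.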
## Project Goals
1. Build a robust regression model to predict house prices.
2. Explore and visualize patterns in the data.
3. Highlight practical insights for stakeholders, such as the most influential factors affecting the target variable.
4. Provide three distinct **workflows** for deployment and environment setup across various platforms, each designed for a specific use case.
## Technologies Used
- Python for data processing and modeling.
- Pandas and NumPy for data manipulation.
- Matplotlib and Seaborn for visualizations.
- Scikit-learn for machine learning algorithms and metrics.
- AWS EC2
- Streamlit Cloud
- Ngrok
## Installation
1. Clone this repository:
```bash
git clone https://github.com/OscarTMa/Regression-Project.git
```
## Usage
```bash
jupyter notebook notebooks/exploratory_analysis.ipynb
```
## Project Structure
```
Regression-Project/
│
├── data/
│   ├── raw/
│   ├── processed/
│
├── notebooks/
│   ├── exploratory_analysis.ipynb
│   ├── regression_model.ipynb
│
├── scripts/
│   ├── data_preprocessing.py
│   ├── model_training.py
│   ├── evaluation.py
│
├── visuals/
│
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
│   ├── workflows
│   ├── workflows_aws
│   ├── workflows_ngrok
```
## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements.
## License
This project is licensed under the MIT License. See the LICENSE file for more details.
## Workflows
**Workflows: Automating Deployment and Setup**
This project provides three distinct workflows to enable deployment and environment setup across various platforms. Each is designed for specific use cases, as detailed below.
1. .github/workflows/streamlit.yml: **Deploying to Streamlit Cloud**

This workflow facilitates automatic deployment of the application to Streamlit Cloud, a free hosting service specifically designed for Streamlit-based applications.

**Key Features:**
- **Core File**: .github/workflows/streamlit.yml.
- Automatically installs dependencies from requirements.txt.
- Configures environment variables and prepares necessary files, such as kaggle.json, for data access.
- Ideal for simple and rapid deployments of Streamlit applications.
**Benefits:**
- Free hosting with public access.
- Streamlined management of dependencies and environment setup.
2. workflows_ngrok/streamlit_ngrok_solution: **Running with Ngrok**

This workflow is intended to expose the local Streamlit application to the internet using Ngrok, a tool for creating secure HTTPS tunnels to localhost.

**Key Features:**
- **Core File**: workflows_ngrok/streamlit_ngrok_solution.
- Configures Ngrok to create a tunnel and expose the application.
- **Currently Non-Functional** due to recent changes in Ngrok's tunneling and endpoint APIs.
**Current Limitations:**
- Ngrok updates have broken compatibility with the current implementation.
- Ongoing efforts aim to adapt this workflow to the latest Ngrok standards.
**Future Use:**
- This workflow has potential for temporary public exposure of local applications without a dedicated hosting platform.
3. workflows_aws/ec2_AWS.txt: **Deploying to AWS EC2**

This workflow outlines the steps for deploying the application on an AWS EC2 instance. It is suited for scalable and customizable environments.

**Key Steps:**
- **Launch an EC2 Instance:** Use the AWS Management Console or AWS CLI to create an instance.
- **User Data File:** Copy the commands in workflows_aws/ec2_AWS.txt and paste them into the "User Data" field during the instance setup. These commands:
  - Install Python, pip, and Streamlit.
  - Install dependencies from requirements.txt.
  - Launch the application on the configured port.

**Benefits:**
- Full control over the runtime environment.
- Scalability to handle varying levels of traffic and usage.
## Summary
Each workflow serves a distinct purpose:
- Streamlit Cloud: For quick and easy deployment.
- Ngrok: For temporary public testing (under development).
- AWS EC2: For robust and scalable deployments.

Follow the instructions in each workflow file to implement the desired deployment strategy.
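
As a rough illustration of the AWS EC2 user-data step (install Python, pip, and Streamlit; install requirements.txt; launch the app on a port), the sketch below assembles such a script as a string. It is not the content of workflows_aws/ec2_AWS.txt, which is not reproduced here. The helper name `build_user_data`, the Ubuntu/Debian package manager, the `/opt/app` location, the virtual environment, and the entry-point name `app.py` are all assumptions to adapt to the actual project.

```python
def build_user_data(repo_url, app_file="app.py", port=8501):
    """Return a bash script for the EC2 "User Data" field.

    Assumptions (hypothetical, adjust as needed): an Ubuntu/Debian AMI,
    an app entry point named `app_file`, and Streamlit's default port 8501.
    """
    lines = [
        "#!/bin/bash",
        # User data runs as root on first boot, so no sudo is needed.
        "apt-get update -y",
        "apt-get install -y python3 python3-pip python3-venv git",
        f"git clone {repo_url} /opt/app",
        # A virtual environment avoids system-wide pip restrictions.
        "python3 -m venv /opt/app/.venv",
        "/opt/app/.venv/bin/pip install streamlit -r /opt/app/requirements.txt",
        "cd /opt/app",
        # Bind to all interfaces so the app is reachable from outside.
        f"nohup .venv/bin/streamlit run {app_file} "
        f"--server.port {port} --server.address 0.0.0.0 "
        "> /var/log/streamlit.log 2>&1 &",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script so it can be pasted into the "User Data" field.
    print(build_user_data("https://github.com/OscarTMa/Regression-Project.git"))
```

The printed script can be pasted into the "User Data" field at launch. The instance's security group also needs an inbound rule for the chosen port, otherwise the application will not be reachable from a browser.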