https://github.com/oscartma/regression-project

Regression is a fundamental supervised machine learning technique used to predict continuous numerical outcomes based on input features.
https://github.com/oscartma/regression-project

mae mse r-squared

Last synced: 2 months ago
JSON representation

Regression is a fundamental supervised machine learning technique used to predict continuous numerical outcomes based on input features.

Host: GitHub
URL: https://github.com/oscartma/regression-project
Owner: OscarTMa
License: mit
Created: 2024-11-15T15:07:56.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-11-22T08:25:54.000Z (11 months ago)
Last Synced: 2025-04-02T02:45:41.333Z (6 months ago)
Topics: mae, mse, r-squared
Language: Jupyter Notebook
Homepage:
Size: 307 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Regression-Project

## Table of Contents
1. [Description](#description)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Project Structure](#project-structure)
5. [Contributing](#contributing)
6. [License](#license)
7. [Workflows](#workflows)

## Description
Regression is a fundamental supervised machine learning technique used to predict continuous numerical outcomes based on input features. This project focuses on applying regression to [The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.] using a structured dataset. The analysis is designed to provide insights into the relationships between input features and the target variable while also delivering an accurate predictive model.

## Key Concepts Covered
1.Exploratory Data Analysis (EDA)
Understand the data through visualization, summary statistics, and correlation analysis.

2.Data Preprocessing

- Handle missing values and outliers.
- Transform and encode categorical variables.
- Standardize or normalize numerical features.

3.Modeling and Evaluation

- Experiment with different regression techniques (Linear Regression, Decision Trees, Gradient Boosting, etc.).
- Use metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) to evaluate model performance.

4.Feature Importance and Interpretability

- Understand which features influence predictions the most.
- Visualize model predictions versus actual values.

## Project Goals
1.Build a robust regression model to predict [target variable, e.g., house prices].
2.Explore and visualize patterns in the data.
3.Highlight practical insights for stakeholders, such as the most influential factors affecting the target variable.
4.This project provides three distinct **workflows** to enable deployment and environment setup across various platforms. Each is designed for specific use cases

## Technologies Used
- Python for data processing and modeling.
- Pandas and NumPy for data manipulation.
- Matplotlib and Seaborn for visualizations.
- Scikit-learn for machine learning algorithms and metrics.
- EC2 AWS
- Streamlit Cloud
- Ngrok

## Installation
1. Clone this repository:
```bash
git clone https://github.com/oscar/Regression-Project.git

## Usage
jupyter notebook notebooks/exploratory_analysis.ipynb

## Project Structure
Regression-Project/
│
├── data/
│ ├── raw/
│ ├── processed/
│
├── notebooks/
│ ├──
│ ├──
│
├── scripts/
│ ├── data_preprocessing.py
│ ├── model_training.py
│ ├── evaluation.py
│
├── visuals/
│
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
│ ├── workflows
│ ├── workflows_aws
│ ├── workflows_ngrok exploratory_analysis.ipynb regression_model.ipynb

## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements.

## License
This project is licensed under the MIT License. See the LICENSE file for more details.

## Workflows

**Workflows: Automating Deployment and Setup**
This project provides three distinct workflows to enable deployment and environment setup across various platforms. Each is designed for specific use cases, as detailed below.

1. Workflows/streamlit.yml: **Deploying to Streamlit Cloud**
This workflow facilitates automatic deployment of the application to Streamlit Cloud, a free hosting service specifically designed for Streamlit-based applications.

**Key Features:**
- **Core File**: .github/workflows/streamlit.yml.
- Automatically installs dependencies from requirements.txt.
- Configures environment variables and prepares necessary files, such as kaggle.json, for data access.
- Ideal for simple and rapid deployments of Streamlit applications.

**Benefits:**
- Free hosting with public access.
- Streamlined management of dependencies and environment setup.

2. workflows_ngrok/streamlit_ngrok_solution: **Running with Ngrok**
This workflow is intended to expose the local Streamlit application to the internet using Ngrok, a tool for creating secure HTTPS tunnels to localhost.

**Key Features:**
- Core File: workflows_ngrok/streamlit_ngrok_solution.
- Configures Ngrok to create a tunnel and expose the application.
- **Currently Non-Functional** due to recent changes in Ngrok's tunneling and endpoint APIs.

**Current Limitations:**
- Ngrok updates have broken compatibility with the current implementation.
- Ongoing efforts aim to adapt this workflow to the latest Ngrok standards.
**Future Use:**
- This workflow has potential for temporary public exposure of local applications without a dedicated hosting platform.

3. workflows_aws/ec2_AWS.txt: **Deploying to AWS EC2**
This workflow outlines the steps for deploying the application on an AWS EC2 instance. It is suited for scalable and customizable environments.

**Key Steps:**
- **Launch an EC2 Instance:** Use the AWS Management Console or AWS CLI to create an instance.
- **User Data File:** Copy the commands in workflows_aws/ec2_AWS.txt and paste them into the "User Data" field during the instance setup. These commands:
- Install Python, pip, and Streamlit.
- Install dependencies from requirements.txt.
- Launch the application on the configured port.

- **Benefits:**
- Full control over the runtime environment.
- Scalability to handle varying levels of traffic and usage.

## Summary
Each workflow serves a distinct purpose:

- Streamlit Cloud: For quick and easy deployment.
- Ngrok: For temporary public testing (under development).
- AWS EC2: For robust and scalable deployments.

Follow the instructions in each workflow file to implement the desired deployment strategy.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/oscartma/regression-project

Awesome Lists containing this project

README