Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/ebadshabbir/company_profit-onehotencoding-

This project uses multiple linear regression to predict startup profits based on spending and location data from the **50 Startups** dataset. It includes data preprocessing, model training, and performance evaluation using Scikit-Learn.

jupyter-notebook machine-learning matplotlib multiple-linear-regression onehot-encoding pandas pyhton regression sklearn


Host: GitHub
URL: https://github.com/ebadshabbir/company_profit-onehotencoding-
Owner: EbadShabbir
Created: 2024-10-19T17:26:39.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-10-20T16:41:33.000Z (4 months ago)
Last Synced: 2025-02-11T17:58:12.644Z (8 days ago)
Topics: jupyter-notebook, machine-learning, matplotlib, multiple-linear-regression, onehot-encoding, pandas, pyhton, regression, sklearn
Language: Jupyter Notebook
Homepage: https://www.kaggle.com/code/ebadshabbir/company-profit
Size: 8.79 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Multiple Linear Regression on 50 Startups Dataset

This project demonstrates how to perform multiple linear regression using Python and Scikit-Learn on the **50 Startups** dataset. The dataset describes 50 startups: their R&D Spend, Administration, and Marketing Spend, the State they operate in, and the corresponding profit.
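As a rough sketch of the data layout described above, the snippet below loads a CSV into features and target with pandas. The column names (including `Profit`), the helper `load_startups`, and the three sample rows are assumptions made for illustration; the values are invented and are not records from the real `50_Startups.csv`.

```python
import io

import pandas as pd

# Illustrative rows only: made-up values in the assumed column layout,
# not records from the real 50_Startups.csv.
SAMPLE_CSV = """R&D Spend,Administration,Marketing Spend,State,Profit
1000.0,500.0,2000.0,New York,3000.0
800.0,450.0,1500.0,California,2500.0
600.0,400.0,1000.0,Florida,1800.0
"""


def load_startups(source):
    """Read the CSV and split it into features X and target y (hypothetical helper)."""
    df = pd.read_csv(source)
    X = df.drop(columns=["Profit"])  # three numeric columns plus the categorical State
    y = df["Profit"]                 # regression target
    return X, y


X, y = load_startups(io.StringIO(SAMPLE_CSV))
print(X.dtypes)
print(y.head())
```

To try this on the real file, pass its path instead of the in-memory sample, for example `load_startups("50_Startups.csv")`, assuming the file uses the same column names.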

## Project Overview

In this project, we:

1. **Preprocess the data**: Use OneHotEncoding to handle categorical variables (State).
2. **Avoid the dummy variable trap**: Remove one of the one-hot encoded columns.
3. **Split the data**: Divide the dataset into training and test sets.
4. **Fit the multiple linear regression model**: Train the model on the training set.
5. **Evaluate the model**: Measure how well the model fits by comparing its scores (R² for Scikit-Learn's `LinearRegression`) on the training and test sets.
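The five steps above can be sketched roughly as follows. This is a minimal example under stated assumptions, not necessarily the code in the repository: it runs on synthetic stand-in data rather than the real dataset, the column names and the `Profit` formula are invented for illustration, and `ColumnTransformer` with `OneHotEncoder(drop="first")` is one possible way to encode `State` while removing one dummy column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (NOT the real dataset), in the assumed column layout.
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 150_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 450_000, n),
    "State": rng.choice(["New York", "California", "Florida"], n),
})
# Invented relationship plus noise, only so the model has something to fit.
df["Profit"] = (50_000 + 0.8 * df["R&D Spend"]
                + 0.03 * df["Marketing Spend"] + rng.normal(0, 5_000, n))

X = df.drop(columns=["Profit"])
y = df["Profit"]

# Steps 1-2: one-hot encode State; drop="first" removes one dummy column
# to avoid the dummy variable trap. Numeric columns pass through unchanged.
encoder = ColumnTransformer(
    transformers=[("state", OneHotEncoder(drop="first"), ["State"])],
    remainder="passthrough",
)
X_enc = encoder.fit_transform(X)

# Step 3: hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_enc, y, test_size=0.2, random_state=0
)

# Step 4: fit ordinary least squares on the training set.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: score() returns the R^2 coefficient of determination.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"Train R^2: {train_r2:.3f}")
print(f"Test R^2:  {test_r2:.3f}")
```

Dropping one dummy column matters because the three `State` indicators always sum to 1, which makes them perfectly collinear with the model's intercept. With two indicators left, the dropped category becomes the baseline against which the other two are measured.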

## Libraries and Dependencies

- `numpy`: For array operations
- `pandas`: For data handling
- `matplotlib`: For plotting (not used in this case, but imported)
- `scikit-learn`: For machine learning tasks such as encoding, splitting, and regression

```bash
pip install numpy pandas matplotlib scikit-learn
```

## Cloning and Running the Project

Clone this repository to your local machine:

```bash
git clone https://github.com/EbadShabbir/50-startups-regression.git
```

Navigate to the project directory:

```bash
cd 50-startups-regression
```

Ensure that you have the `50_Startups.csv` dataset in the same directory as the script, or adjust the dataset path accordingly in the code.

Run the Python script:

```bash
python regression_model.py
```