https://github.com/vijaykumarr1452/startup_success_predictor

This project demonstrates the use of Multiple Linear Regression to predict the profits of startups based on investment in R&D, Administration, and Marketing of dataset (50_Startups.csv)

machine-learning multi-linear-regression numpy pandas python regression rsquare-values scikit-learn



Host: GitHub
URL: https://github.com/vijaykumarr1452/startup_success_predictor
Owner: vijaykumarr1452
Created: 2024-11-28T11:37:05.000Z (7 months ago)
Default Branch: main
Last Pushed: 2024-11-28T14:53:25.000Z (7 months ago)
Last Synced: 2025-02-01T01:31:04.231Z (5 months ago)
Topics: machine-learning, multi-linear-regression, numpy, pandas, python, regression, rsquare-values, scikit-learn
Language: Jupyter Notebook
Homepage: https://github.com/vijaykumarr1452/Startup_Success_Predictor
Size: 156 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Startup_Success_Predictor

## **Startup Profit Prediction Using Multiple Linear Regression**

This project demonstrates the use of **Multiple Linear Regression** to predict startup profits from spending on R&D, administration, and marketing. The dataset (`50_Startups.csv`) contains 50 startup records with the columns `R&D Spend`, `Administration`, `Marketing Spend`, `State`, and `Profit`. The analysis preprocesses the data (including encoding the categorical `State` column), splits it into training and test sets, and builds a predictive model with **pandas**, **NumPy**, and **scikit-learn**. Model performance is evaluated with metrics such as R-squared.

---

## **Description of the Dataset**

The `50_Startups.csv` dataset contains information about 50 startups, with the following attributes:

- *R&D Spend*: Investment in research and development ($).
- *Administration*: Investment in administrative operations ($).
- *Marketing Spend*: Budget allocated for marketing campaigns ($).
- *State*: The state where the startup operates (New York, California, or Florida).
- *Profit*: The target variable, representing the net profit of the startup ($).
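A minimal sketch of loading the file and checking that it matches the schema above. The exact column spellings (`R&D Spend`, `Marketing Spend`, ...) and the helper name `load_startups` are assumptions for illustration, not taken from the project's notebook; the two sample rows are made-up placeholders so the snippet runs without the real CSV.

```python
import io

import pandas as pd

# Column names as described above; exact spelling in the real CSV is assumed.
EXPECTED_COLUMNS = ["R&D Spend", "Administration", "Marketing Spend", "State", "Profit"]
NUMERIC_COLUMNS = ["R&D Spend", "Administration", "Marketing Spend", "Profit"]


def load_startups(source) -> pd.DataFrame:
    """Read the startup CSV (path or file-like object) and validate its schema.

    Hypothetical helper: raises if a column is missing or a money column is not numeric.
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Spend and profit columns should all parse as numbers (dollar amounts).
    for col in NUMERIC_COLUMNS:
        if not pd.api.types.is_numeric_dtype(df[col]):
            raise TypeError(f"column {col!r} is not numeric")
    return df


# Two placeholder rows with made-up values (NOT taken from 50_Startups.csv).
sample_csv = io.StringIO(
    "R&D Spend,Administration,Marketing Spend,State,Profit\n"
    "1000.0,2000.0,3000.0,New York,1500.0\n"
    "500.0,1800.0,2500.0,Florida,900.0\n"
)
df = load_startups(sample_csv)
print(df.shape)
print(df["State"].unique().tolist())
```

With the real file, the call would be `load_startups("50_Startups.csv")`, assuming it sits in the working directory.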

---

### **Key Insights**

Profit and R&D Spend: Startups with higher R&D spend tend to report higher profits, which suggests R&D spend is a key predictor of profit.

State Influence: `State` is a categorical feature with three values. It must be encoded (for example, one-hot encoded) before it can enter a linear regression, which then lets the model estimate its influence on profit.

Numeric Features: All numeric columns are monetary values in dollars. Ordinary least squares does not require feature scaling, so they can be used without it.
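The first two insights can be checked with a couple of pandas calls. This is a sketch on a randomly generated stand-in for `50_Startups.csv`; the value ranges and the assumed "profit driven mostly by R&D" relationship are invented so the code runs, and the printed numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for 50_Startups.csv (random values, NOT the real data).
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 160_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 470_000, n),
    "State": rng.choice(["New York", "California", "Florida"], n),
})
# Assumed relationship for the synthetic data only: profit driven mostly by R&D.
df["Profit"] = 50_000 + 0.8 * df["R&D Spend"] + rng.normal(0, 5_000, n)

# Pearson correlation of each numeric feature with Profit (needs pandas >= 1.5).
corr_with_profit = df.corr(numeric_only=True)["Profit"].drop("Profit")
print(corr_with_profit.sort_values(ascending=False))

# One-hot encode State. drop_first=True removes one dummy (California, the first
# level alphabetically) so the remaining dummies are not perfectly collinear
# with the regression intercept (the "dummy variable trap").
encoded = pd.get_dummies(df, columns=["State"], drop_first=True)
print(encoded.columns.tolist())
```

On the real data, `df` would come from `pd.read_csv("50_Startups.csv")` instead.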

---

## **Key Steps**

1. **Data Preparation**: Cleaned the data, encoded the categorical `State` column, separated the independent variables (`X`) from the dependent variable (`Profit`), and split the records into training and test sets.  

2. **Model Building**: Trained a Multiple Linear Regression model using `LinearRegression` from **scikit-learn**.  

3. **Prediction and Evaluation**: Predicted startup profits on the test data and assessed performance with metrics such as R-squared.  

Run the notebook to reproduce the workflow or adapt the model for similar regression problems.
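The three steps can be sketched end to end with scikit-learn. This is a hedged illustration, not the notebook's code: the 80/20 split, `random_state`, the `ColumnTransformer`/`Pipeline` layout, and the helper names `build_model` and `fit_and_evaluate` are assumptions, and the data is randomly generated as a stand-in for `50_Startups.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["R&D Spend", "Administration", "Marketing Spend"]
CATEGORICAL = ["State"]


def build_model() -> Pipeline:
    """Hypothetical helper: one-hot encode State (dropping one level), pass spend through, fit OLS."""
    preprocess = ColumnTransformer([
        ("state", OneHotEncoder(drop="first"), CATEGORICAL),
        ("spend", "passthrough", NUMERIC),
    ])
    return Pipeline([("preprocess", preprocess), ("ols", LinearRegression())])


def fit_and_evaluate(df: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Hypothetical helper: hold out a test split, train on the rest, return (model, test R^2)."""
    X = df[NUMERIC + CATEGORICAL]  # independent variables
    y = df["Profit"]               # dependent variable
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = build_model().fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


# Synthetic stand-in for 50_Startups.csv (random values, NOT the real dataset).
rng = np.random.default_rng(42)
n = 50
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 160_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 470_000, n),
    "State": rng.choice(["New York", "California", "Florida"], n),
})
df["Profit"] = (50_000 + 0.8 * df["R&D Spend"] + 0.03 * df["Marketing Spend"]
                + rng.normal(0, 5_000, n))

model, r2 = fit_and_evaluate(df)
print(f"Test R^2: {r2:.3f}")
```

Wrapping encoding and regression in one `Pipeline` means the encoder is fitted only on the training split, so the test rows are transformed with exactly the same columns the model was trained on.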

## **Metrics**

![Prediction metrics](prediction_metrics.png)
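The metrics in the figure are not reproduced here. As a sketch, R-squared is $R^2 = 1 - SS_{res}/SS_{tot}$, and adjusted R-squared penalizes it for the number of predictors $p$ given $n$ observations; for this model $p$ would be 5 (three spend columns plus two `State` dummies). The helper names below are hypothetical, and the four toy values are illustrative only, not profits from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def r_squared(y_true, y_pred) -> float:
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, computed by hand."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Hypothetical helper: adjusted R^2 for n observations and p predictors (needs n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Toy values for illustration only (not profits from 50_Startups.csv).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r_squared(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE as the square root of MSE
print(f"R^2={r2:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
print(f"adjusted R^2 (p=1): {adjusted_r_squared(r2, n=len(y_true), p=1):.4f}")
```

The hand-written `r_squared` matches `sklearn.metrics.r2_score`; it is spelled out only to make the formula concrete.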

---

## **Code**

```python
import pandas as pd
from nbformat import read

# Paths to the uploaded files (adjust to where the files live on your machine)
notebook_path = '/mnt/data/Multiple_Linear_Regression.ipynb'
dataset_path = '/mnt/data/50_Startups.csv'

# Load the Jupyter notebook content (nbformat version 4)
with open(notebook_path, 'r', encoding='utf-8') as f:
    notebook = read(f, as_version=4)

# Load the dataset to inspect its structure
dataset = pd.read_csv(dataset_path)

# Separate the notebook's cells by type
code_cells = [cell['source'] for cell in notebook['cells'] if cell['cell_type'] == 'code']
markdown_cells = [cell['source'] for cell in notebook['cells'] if cell['cell_type'] == 'markdown']

# Summarize the dataset and the notebook
dataset_info = {
    "columns": dataset.columns.tolist(),
    "shape": dataset.shape,
    "head": dataset.head().to_dict(),
}

notebook_info = {
    "total_cells": len(notebook['cells']),
    "code_cells_count": len(code_cells),
    "markdown_cells_count": len(markdown_cells),
    "example_code_snippet": code_cells[:2],  # first two code cells for reference
    "example_markdown": markdown_cells[:2],  # first two markdown cells for reference
}

# A bare expression only displays inside Jupyter; print so this also works as a script
print(dataset_info)
print(notebook_info)
```

---

### **Connect**

If you have any questions or suggestions, feel free to reach out to me:

- Email: [[email protected]](mailto:[email protected])

- GitHub: [Profile](https://github.com/vijaykumarr1452)

- LinkedIn: [LinkedIn](https://www.linkedin.com/in/rachuri-vijaykumar/)

- Twitter: [Twitter](https://x.com/vijay_viju1)

---
