https://github.com/rahulrmcoder/linear-regression--boston-house-value-prediction

Last synced: over 1 year ago
JSON representation

Host: GitHub
URL: https://github.com/rahulrmcoder/linear-regression--boston-house-value-prediction
Owner: RahulRmCoder
License: mit
Created: 2024-06-21T18:20:25.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-06-27T11:18:03.000Z (almost 2 years ago)
Last Synced: 2024-06-28T09:34:36.457Z (almost 2 years ago)
Language: Jupyter Notebook
Size: 422 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Linear-Regression--Boston-House-Value-Prediction

This project demonstrates the use of linear regression to predict the median value of owner-occupied homes in the Boston area using various predictors.

## Dataset

The dataset contains the following columns:

1. **crim**: per capita crime rate by town.

2. **zn**: proportion of residential land zoned for lots over 25,000 sq.ft.

3. **indus**: proportion of non-retail business acres per town.

4. **chas**: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).

5. **nox**: nitrogen oxides concentration (parts per 10 million).

6. **rm**: average number of rooms per dwelling.

7. **age**: proportion of owner-occupied units built prior to 1940.

8. **dis**: weighted mean of distances to five Boston employment centres.

9. **rad**: index of accessibility to radial highways.

10. **tax**: full-value property-tax rate per 10,000 dollars.

11. **ptratio**: pupil-teacher ratio by town.

12. **black**: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.

13. **lstat**: lower status of the population (percent).

14. **medv**: median value of owner-occupied homes in $1000s.

## Steps to Run the Analysis

### 1. Import Necessary Libraries

```python

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn import metrics

```

### 2. Load the Data

```python

data = pd.read_csv('path/to/your/boston.csv')  # Update the path to your dataset

data.head()

```

### 3. Data Visualization and Correlation Analysis

```python

fig = plt.figure(figsize=(15, 15))

sns.heatmap(data.corr(), annot=True)

plt.show()

```

### 4.Select Relevant Features Based on Correlation Analysis

```python

data2 = data[['indus', 'rm', 'lstat', 'medv']]

```

### 5. Check for Linearity

```python

fig = plt.figure(figsize=(15, 15))

plt.subplot(2, 3, 1)

plt.scatter(data2['indus'], data2['medv'])

plt.subplot(2, 3, 2)

plt.scatter(data2['rm'], data2['medv'])

plt.subplot(2, 3, 3)

plt.scatter(data2['lstat'], data2['medv'])

plt.show()

```

### 6. Split the Data into Training and Testing Sets

```python

X = pd.DataFrame(data2[['indus', 'rm', 'lstat']])

y = pd.DataFrame(data2['medv'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

```

### 7. Train the Linear Regression Model

```python

model = LinearRegression()

model.fit(X_train, y_train)

```

### 8. Model Evaluation

```python

y_pred = model.predict(X_test)

# Mean Absolute Error

mae = metrics.mean_absolute_error(y_test, y_pred)

print("Mean Absolute Error:", mae)

# Mean Squared Error

mse = metrics.mean_squared_error(y_test, y_pred)

print("Mean Squared Error:", mse)

# Root Mean Squared Error

rmse = np.sqrt(mse)

print("Root Mean Squared Error:", rmse)

# R-Squared

r2 = metrics.r2_score(y_test, y_pred)

print("R-Squared:", r2)

```

### 9. Calculate Adjusted R-Squared

```python

n = len(X_test)

k = X_test.shape[1]

adjusted_r2 = 1 - ((1 - r2) * (n - 1)) / (n - k - 1)

print("Adjusted R-Squared:", adjusted_r2)

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rahulrmcoder/linear-regression--boston-house-value-prediction

Awesome Lists containing this project

README