Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/derrickbaruga7/r-linear-model
This R project focuses on predicting housing prices based on various features such as building area, number of rooms, and more. The dataset used is cleaned and analyzed to build a predictive model using linear regression.
https://github.com/derrickbaruga7/r-linear-model
data-science linear-modeling r visualization
Last synced: about 2 months ago
JSON representation
This R project focuses on predicting housing prices based on various features such as building area, number of rooms, and more. The dataset used is cleaned and analyzed to build a predictive model using linear regression.
- Host: GitHub
- URL: https://github.com/derrickbaruga7/r-linear-model
- Owner: derrickbaruga7
- Created: 2024-07-26T14:23:10.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-26T14:29:47.000Z (6 months ago)
- Last Synced: 2024-07-26T16:07:52.246Z (6 months ago)
- Topics: data-science, linear-modeling, r, visualization
- Language: R
- Homepage:
- Size: 3.91 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# R-Linear-Model
This R project focuses on predicting housing prices based on various features such as building area, number of rooms, and more. The dataset used is cleaned and analyzed to build a predictive model using linear regression.Requirements
R version 4.3.1 or later
R packages:
tidyverse 2.0.0
tidymodels 1.1.1
InstallationInstall the required R packages using the following commands:
install.packages("tidyverse")
install.packages("tidymodels")Usage
1. Load the necessary libraries:
library(tidyverse)
library(tidymodels)2. Clean the data by removing rows with missing values:
housing_data_clean <- drop_na(housing_data)
3. Examine the correlation between numeric features:
cor(housing_data_clean[, sapply(housing_data_clean, is.numeric)])
4. Create scatter plots for initial visualization:
ggplot(housing_data_clean, aes(x = BuildingArea, y = Price)) +
geom_point() +
labs(title = "Price vs Building Area")5. Split the data into training and testing sets:
set.seed(123)
data_split <- initial_split(housing_data_clean, prop = 0.75)
train_data <- training(data_split)
test_data <- testing(data_split)6. Identify and remove outliers based on calculated fences:
calculate_fences <- function(data, column) {
Q1 <- quantile(data[[column]], 0.25)
Q3 <- quantile(data[[column]], 0.75)
IQR <- Q3 - Q1
list(lower = Q1 - 1.5 * IQR, upper = Q3 + 1.5 * IQR)
}price_fences <- calculate_fences(train_data, "Price")
building_area_fences <- calculate_fences(train_data, "BuildingArea")train_data <- train_data %>%
filter(Price > price_fences$lower, Price < price_fences$upper) %>%
filter(BuildingArea > building_area_fences$lower, BuildingArea < building_area_fences$upper)7. Build the linear regression model:
model <- lm(Price ~ BuildingArea, data = train_data)
8. Predict and visualize the results:
test_data$Predicted_Price <- predict(model, test_data)
ggplot(test_data, aes(x = BuildingArea, y = Price)) +
geom_point(aes(color = "Actual")) +
geom_point(aes(y = Predicted_Price, color = "Predicted")) +
labs(title = "Actual vs Predicted Prices")
LicenseThis project is licensed under the terms of the R Foundation for Statistical Computing.