# 🏡 House Rent Data Analysis and Prediction

## 👥 Team Members
- **TP065783**: Khaled Awad
- **TP064361**: Abdelrahman Mourad
- **TP066168**: Mohamed Khairy

## 📋 Overview

Welcome to the House Rent Data Analysis and Prediction project! This project analyzes a dataset of rental housing costs containing 4,746 observations across 12 columns. Our goal is to identify patterns and relationships among factors such as rent, area type, city, size, and furnishing status. We also aim to use predictive analysis to provide insights into the rental market.

## 📚 Table of Contents
- [🔧 Installing Packages](#-installing-packages)
- [📦 Loading Libraries](#-loading-libraries)
- [📂 Data Loading and Pre-processing](#-data-loading-and-pre-processing)
  - [📋 Data Cleaning](#-data-cleaning)
- [📊 Analysis and Visualizations](#-analysis-and-visualizations)
  - [📍 Relationship Between Rent, Area Type, and Point of Contact](#-relationship-between-rent-area-type-and-point-of-contact)
  - [🏙️ Relationship Between Rent, City, and Size](#-relationship-between-rent-city-and-size)
  - [🛋️ Relationship Between Rent, City, and Furnished Status](#-relationship-between-rent-city-and-furnished-status)
  - [🏘️ Most Popular Houses per Category](#-most-popular-houses-per-category)
  - [🌆 Cities with Highest Amounts in Each Category](#-cities-with-highest-amounts-in-each-category)
- [✨ Additional Features](#-additional-features)
- [📷 Screenshots](#-screenshots)
- [📌 Conclusion](#-conclusion)
- [📖 References](#-references)

## 🔧 Installing Packages

To perform the analysis and visualizations, install the following R packages:

```r
# tidyverse already bundles dplyr, ggplot2 and tidyr; they are listed explicitly for clarity
install.packages(c("dplyr", "ggplot2", "corrplot", "plotly", "tidyr", "tidyverse", "caTools"))
```

## 📦 Loading Libraries

Load the libraries used for data manipulation and visualization:

```r
library(dplyr)
library(ggplot2)
library(corrplot)
library(plotly)
library(tidyr)
library(tidyverse)  # also attaches dplyr, ggplot2 and tidyr
library(caTools)
```

## 📂 Data Loading and Pre-processing

### 🗂️ Loading the Dataset
We load the house rent dataset into R for our analysis:

```r
# Replace the placeholder path with the location of House_Rent_Dataset.csv
data <- read.csv("path_to_dataset/House_Rent_Dataset.csv")
head(data)
```

### 📋 Data Cleaning

#### Checking for Missing Values
Ensuring data quality is crucial, so we start by counting the missing values in each column:

```r
colSums(is.na(data))
```
The dataset contains no missing values.

#### Checking for Garbage Values
We list the distinct values of each categorical column to spot inconsistent or invalid entries:

```r
unique(data$Area.Type)
unique(data$City)
unique(data$Furnishing.Status)
unique(data$Tenant.Preferred)
unique(data$Point.of.Contact)
```
No garbage values were found in this dataset.

#### Summary Statistics
To understand the basic properties of the data, we generate summary statistics:

```r
summary(data)
```
- **Rent**: Average rent is `34,993`, with a maximum of `3,500,000`.
- **Size**: Average size is `967 sq ft`, with a maximum of `8,000 sq ft`.
- **Bathroom**: Average number of bathrooms is `1.9`, with a maximum of `10`.

#### Removing Outliers
Outliers are identified with the interquartile range (IQR) rule: a value is flagged if it lies more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3). Rows flagged in `Rent`, `Size`, or `Bathroom` are removed:

```r
# Returns TRUE for values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
outliers <- function(x) {
  Q1 <- quantile(x, probs = 0.25)
  Q3 <- quantile(x, probs = 0.75)
  iqr <- Q3 - Q1
  upper_limit <- Q3 + (iqr * 1.5)
  lower_limit <- Q1 - (iqr * 1.5)
  x > upper_limit | x < lower_limit
}

# Drops outlier rows one column at a time; the quartiles for each column
# are computed on the rows that survived the previous columns
remove_outliers <- function(df, cols = names(df)) {
  for (col in cols) {
    df <- df[!outliers(df[[col]]), ]
  }
  df
}
data <- remove_outliers(data, c("Rent", "Size", "Bathroom"))
```

## 📊 Analysis and Visualizations

### 📍 Relationship Between Rent, Area Type, and Point of Contact

#### 🧐 Analysis 1.1: Houses with "Contact Owner" as Point of Contact
We explored the properties rented through direct contact with the owner:

```r
# Listings where the owner is the point of contact
owner_listings <- data[data$Point.of.Contact == "Contact Owner", ]
head(owner_listings)
```
Most properties rented through direct contact with the owner are suitable for both singles and families.
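The singles-and-families finding comes from the `Tenant.Preferred` column of the owner-contact listings. Below is a minimal base-R sketch of that tally. It runs on a small hypothetical data frame that only mimics the dataset's column names, and it assumes the category labels are `Bachelors`, `Family` and `Bachelors/Family`; substitute `data` to reproduce the real counts.

```r
# Hypothetical toy rows mimicking the dataset's column names (not real data)
toy <- data.frame(
  Point.of.Contact = c("Contact Owner", "Contact Owner", "Contact Agent",
                       "Contact Owner", "Contact Owner"),
  Tenant.Preferred = c("Bachelors/Family", "Bachelors/Family", "Family",
                       "Bachelors", "Bachelors/Family")
)

# Keep only listings where the owner is the point of contact
owner <- toy[toy$Point.of.Contact == "Contact Owner", ]

# Count each tenant preference, then convert the counts to shares
counts <- table(owner$Tenant.Preferred)
shares <- prop.table(counts)
print(counts)
print(round(shares, 2))
```

Reporting the shares from `prop.table()` turns "most properties" into a checkable percentage.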

#### 💰 Analysis 1.2: Average and Maximum Rent
We compute the average and maximum rent after outlier removal:

```r
mean(data$Rent)
max(data$Rent)
```

#### 📈 Analysis 1.3: Rent Distribution by Area Type
We examine how rent varies across area types using a boxplot:

```r
ggplot(data = data, mapping = aes(x = Area.Type, y = Rent)) +
  geom_boxplot(col = "orange") +
  labs(title = "Distribution of Rent By Area Type")
```
- **Carpet Area**: Highest average rent.
- **Super Area**: Moderately priced.
- **Built Area**: Lowest average rent.

#### 🏠 Analysis 1.4: Average House Rents and Sizes by Point of Contact
We compute the average house rent and size for each point of contact and plot the average rents as a pie chart:

```r
# Average rent and size per point of contact
temp <- data %>%
  group_by(Point.of.Contact) %>%
  summarise(Avg_Rent = mean(Rent), Avg_Size = mean(Size))

ggplot(temp, aes(x = "", y = Avg_Rent, fill = Point.of.Contact)) +
  geom_col() +
  geom_text(aes(label = round(Avg_Rent, 2)), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  labs(title = "Average Rent By Point of Contact")
```
- **Contact Agent**: Highest average rent and size.
- **Contact Builder**: Lowest average rent and size.

### 🏙️ Relationship Between Rent, City, and Size

#### 🌟 Analysis 2.1: Most and Least Preferred Cities
We identify the most and least preferred cities, using the number of listings per city as the measure of preference:

```r
City_Count <- data %>% group_by(City) %>% summarise(count = n()) %>% arrange(desc(count))
ggplot(City_Count, mapping = aes(x = City, y = count, fill = count)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "City", y = "Count", title = "Houses Counted by Cities")
```
- **Most Preferred**: Chennai
- **Least Preferred**: Mumbai

#### 📏 Analysis 2.2: Rent Per Size
We calculate the rent per square foot for each property:

```r
data$Rent_per_size <- data$Rent / data$Size
```

#### 📐 Analysis 2.3: Relationship Between House Size and Rent
We explore how house size affects rent:

```r
ggplot(data, aes(x = Size, y = Rent)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Relationship Between Size & Rent")
```
- Positive relationship: larger houses generally have higher rents.
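The positive size/rent relationship can also be quantified instead of read off the scatter plot. A minimal sketch with base R's `cor()`: Pearson measures linear association, while Spearman works on ranks and is therefore less affected by a few very high rents. The toy vectors are hypothetical; with the project data you would pass `data$Size` and `data$Rent`.

```r
# Hypothetical toy values (size in sq ft, monthly rent); not taken from the dataset
size <- c(400, 550, 600, 800, 1000, 1200)
rent <- c(7000, 9000, 8500, 15000, 20000, 45000)

# Pearson: linear association on the raw values
pearson <- cor(size, rent, method = "pearson")
# Spearman: Pearson correlation of the ranks, so the 45000 outlier counts only as "largest"
spearman <- cor(size, rent, method = "spearman")

print(round(c(pearson = pearson, spearman = spearman), 3))
```

Reporting both coefficients is a cheap way to check whether the trend holds beyond the most expensive listings.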

#### 🏘️ Analysis 2.4: Average House Sizes by City
We compute the average house size for each city:

```r
temp <- data %>% group_by(City) %>% summarise(Avg_Size = mean(Size))
ggplot(data = temp, mapping = aes(x = City, y = Avg_Size, fill = City)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average House Sizes By City")
```
- **Largest Average Size**: Hyderabad
- **Smallest Average Size**: Delhi

### 🛋️ Relationship Between Rent, City, and Furnished Status

#### 🛠️ Analysis 3.1: Preferred Furnishing Status
We find the most and least preferred furnishing status, measured by the number of listings:

```r
Furnished_Status <- data %>% group_by(Furnishing.Status) %>% summarise(count = n())
ggplot(Furnished_Status, mapping = aes(x = Furnishing.Status, y = count, fill = count)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Furnished Status", y = "Count", title = "Count By Facilities")
```
- **Most Preferred**: Semi-Furnished

#### 🏙️ Analysis 3.2: City Impact on Rent Prices
We examine how rent prices vary by city:

```r
ggplot(data = data, mapping = aes(x = City, y = Rent)) +
  geom_boxplot(col = "black") +
  labs(title = "Distribution of Rent By City")
```
- **Highest Rent**: Mumbai
- **Lowest Rent**: Kolkata
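Because a boxplot draws each city's median, a ranking such as "highest in Mumbai, lowest in Kolkata" can be cross-checked by computing the per-city medians directly. A base-R sketch on hypothetical toy rows (the rents are made up); pass `data$Rent` and `data$City` instead to check the real ranking.

```r
# Hypothetical toy rows, not real listings
toy <- data.frame(
  City = c("Mumbai", "Mumbai", "Kolkata", "Kolkata", "Delhi", "Delhi"),
  Rent = c(40000, 55000, 9000, 11000, 20000, 26000)
)

# split() groups the rents by city; sapply() returns a named vector of medians
city_medians <- sapply(split(toy$Rent, toy$City), median)

# Order cities from highest to lowest median rent
city_medians <- sort(city_medians, decreasing = TRUE)
print(city_medians)
# Mumbai 47500, Delhi 23000, Kolkata 10000
```

Medians are used here rather than means so that the numbers match what the boxplot shows.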

### 🏘️ Most Popular Houses per Category

#### 🚿 Analysis 4.1: Most Popular Number of Bathrooms
We identify the most common numbers of bathrooms in rental properties:

```r
# Five most frequent bathroom counts
Bathroom_Count <- data %>% group_by(Bathroom) %>% summarise(count = n()) %>% top_n(5, count)
ggplot(Bathroom_Count, mapping = aes(x = Bathroom, y = count, fill = count)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Bathrooms", y = "Count", title = "Count Of Bathrooms in 1 House")
```
- **Most Common**: 2 Bathrooms

#### 🏡 Analysis 4.4: Most Popular House Sizes
We analyze the distribution of house sizes:

```r
# Eight most frequent sizes (ties can return more than eight rows)
Size_Count <- data %>% group_by(Size) %>% summarise(count = n()) %>% top_n(8, count)
ggplot(Size_Count, mapping = aes(x = Size, y = count, fill = count)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Count By Size")
```
- **Most Popular Size**: 600 sq ft

### 🌆 Cities with Highest Amounts in Each Category

#### 🏙️ Analysis 5.1: City with Highest Total Amount of BHK
We identify the city with the largest total BHK, i.e. the sum of the `BHK` column over all of its listings:

```r
Total_Amount_BHK_Per_city <- data %>% group_by(City) %>% summarise(Total_BHK = sum(BHK))
ggplot(Total_Amount_BHK_Per_city, mapping = aes(x = City, y = Total_BHK, fill = Total_BHK)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Total Amount Of BHK Per City")
```
- **Highest Total BHK**: Chennai

#### 💵 Analysis 5.3: City with Highest Total Rent
We find which city has the highest total rent:

```r
Total_Amount_Rent_Per_city <- data %>% group_by(City) %>% summarise(Total_Rent = sum(Rent))
ggplot(Total_Amount_Rent_Per_city, mapping = aes(x = City, y = Total_Rent, fill = Total_Rent)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Total Amount Of Rent Per City")
```
- **Highest Total Rent**: Mumbai and Chennai, with very similar totals.

## ✨ Additional Features

### 📊 Feature 1: Correlogram Matrix
We generate a correlogram to visualize the correlations between the numeric variables:

```r
# Select the numeric columns by name rather than by position
Correlogram_Matrix <- cor(data[, c("BHK", "Rent", "Size", "Bathroom")])
# addCoef.col takes a colour for the printed coefficients
corrplot(Correlogram_Matrix, addCoef.col = "black")
```

### 📈 Feature 2: Scatter Plot with Regression Line
We create a scatter plot of rent against house size with a fitted linear regression line:

```r
# Pass data= instead of calling attach(), which can mask other variables
plot(Rent ~ Size, data = data, main = "Scatterplot of Rent vs Size",
     xlab = "House size", ylab = "House rent")
abline(lm(Rent ~ Size, data = data), col = "blue", lwd = 2)
```
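The blue line comes from an ordinary least-squares fit, so its slope can be read as the estimated change in rent per additional square foot. A small base-R sketch of extracting the intercept, slope and R² with `coef()` and `summary()`; the toy values are hypothetical and chosen only so the fitted numbers are easy to verify by hand.

```r
# Hypothetical toy data: roughly rent = 2000 + 15 * size, with small deviations
size <- c(500, 750, 1000, 1250, 1500)
rent <- c(9600, 13100, 17200, 20500, 24600)

fit <- lm(rent ~ size)

# coef() returns a named vector with "(Intercept)" and "size"
intercept <- unname(coef(fit)["(Intercept)"])
slope <- unname(coef(fit)["size"])
r2 <- summary(fit)$r.squared

cat(sprintf("rent = %.1f + %.2f * size (R^2 = %.3f)\n", intercept, slope, r2))
# prints: rent = 2040.0 + 14.96 * size (R^2 = 0.999)
```

With the project data, `lm(Rent ~ Size, data = data)` gives the coefficients of the plotted line in the same way.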

### 🎻 Feature 3: Violin Plot for Rent by Size
We visualize the distribution of rent within house-size ranges using a violin plot:

```r
# Group Size into four equal-width ranges so each violin shows the rent
# distribution for one range (a continuous x would collapse into a single violin)
ggplot(data, aes(x = cut(Size, breaks = 4), y = Rent)) +
  geom_violin(trim = FALSE) +
  labs(x = "Size range (sq ft)", y = "Rent", title = "Distribution of Rent by Size")
```

## 📷 Screenshots
![R-9](https://github.com/user-attachments/assets/db6b3260-e6d1-410e-92a1-61cd5e2c24e9)
![R-8](https://github.com/user-attachments/assets/958d318a-7d1c-4b4e-ac34-5ff0942d72a5)
![R-7](https://github.com/user-attachments/assets/61da1f8b-b0d8-4430-9e4e-1cf8eaf5d1f2)
![R-6](https://github.com/user-attachments/assets/0c2fe23b-67d6-41e9-ab38-1b98eb80fedc)
![R-5](https://github.com/user-attachments/assets/6e95a893-0f6c-4d7c-8f88-60dcca3937d7)
![R-4](https://github.com/user-attachments/assets/fd5e036f-eb6d-4fc4-8cf8-6b0108ce145c)
![R-3](https://github.com/user-attachments/assets/8cec6caa-1ff0-4a60-bef0-36081456e51c)
![R-2](https://github.com/user-attachments/assets/eac34bbb-03df-489f-a89c-3c9c7923b62f)
![R-1](https://github.com/user-attachments/assets/99e9afaf-a45c-4e65-9d32-02a65af33e59)

## 📌 Conclusion

This analysis explores rental housing data and highlights the factors associated with rent prices, area preferences, and housing features. By understanding these patterns, both renters and property managers can make more informed decisions in the rental market.

## 📖 References

1. [How to Remove Outliers in R](https://www.r-bloggers.com/2021/09/how-to-remove-outliers-in-r-3/) (R-bloggers)
2. [Box plot by group in ggplot2](https://r-charts.com/distribution/box-plot-group-ggplot2/) (R Charts)
3. [ggplot2 scatter plots: Quick start guide - R software and data visualization](http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-start-guide-r-software-and-data-visualization) (STHDA)
4. [8 Tips for Better Data Visualization](https://towardsdatascience.com/8-tips-for-better-data-visualization-2f7118e8a9f4) (Towards Data Science)
5. [Tidyverse packages](https://www.tidyverse.org/packages/) (tidyverse.org)
6. [Predicting House Prices using R](https://www.kaggle.com/code/pradeeptripathi/predicting-house-prices-using-r) (Kaggle)