{"id":25301605,"url":"https://github.com/kristishqau/apartmentregressionanalysis","last_synced_at":"2026-05-01T13:31:28.167Z","repository":{"id":223338939,"uuid":"759943781","full_name":"kristishqau/ApartmentRegressionAnalysis","owner":"kristishqau","description":"This data science project aims to predict apartment prices through regression analysis. The dataset used contains information about apartments, and the project involves various steps such as data preprocessing, exploratory data analysis, feature engineering, and building a decision tree regression model.","archived":false,"fork":false,"pushed_at":"2024-02-24T10:59:51.000Z","size":1326,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-07T01:18:15.101Z","etag":null,"topics":["apartment-prices","data-preprocessing","data-science","data-visualization","decision-tree-regression","jupyter-notebook","prediction","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kristishqau.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-19T16:18:19.000Z","updated_at":"2025-01-24T23:16:02.000Z","dependencies_parsed_at":"2024-02-24T11:46:01.361Z","dependency_job_id":null,"html_url":"https://github.com/kristishqau/ApartmentRegressionAnalysis","commit_stats":null,"previous_names":["chralsh/apartment_price_ds_regression_project","kristishqau/apartment_price_ds_regression_project","kristishqau/apartmentregressionanalysis"],"tags_count":0,"template":fals
e,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristishqau%2FApartmentRegressionAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristishqau%2FApartmentRegressionAnalysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristishqau%2FApartmentRegressionAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristishqau%2FApartmentRegressionAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kristishqau","download_url":"https://codeload.github.com/kristishqau/ApartmentRegressionAnalysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247574092,"owners_count":20960497,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apartment-prices","data-preprocessing","data-science","data-visualization","decision-tree-regression","jupyter-notebook","prediction","python3"],"created_at":"2025-02-13T06:43:51.393Z","updated_at":"2026-05-01T13:31:27.893Z","avatar_url":"https://github.com/kristishqau.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Project Structure\n\n1. **Introduction**\n   - This project focuses on predicting apartment prices using regression analysis. It involves data preprocessing, exploratory data analysis, feature engineering, and building a decision tree regression model.\n\n2. 
**Libraries Used**\n   - The project uses the following libraries and modules:\n     - NumPy\n     - Pandas\n     - Matplotlib\n     - Seaborn\n     - Scikit-learn\n     - Geopy\n\n3. **Loading the Raw Data**\n   - The raw data is loaded from an Excel file using the Pandas library.\n   ```python\n   import pandas as pd\n\n   raw_data = pd.read_excel(\"C:\\\\Users\\\\user\\\\Downloads\\\\Apartments_Data.xlsx\")\n   ```\n\n4. **Preprocessing**\n   - **Exploring Descriptive Statistics:**\n     - Descriptive statistics are examined to gain insights into the variables.\n     - Two sets of descriptive statistics are presented: one covering all variables, numerical and categorical (`include='all'`), and one covering numerical variables only.\n     ```python\n     display(raw_data.describe(include='all'))\n     display(raw_data.describe())\n     ```\n\n   - **Determining Variables of Interest:**\n     - Calculating each apartment's geodesic distance (in km) from a fixed center coordinate, using latitude and longitude, as a potential factor affecting prices.\n     ```python\n     from geopy.distance import geodesic\n\n     raw_data['latitude_center'] = 41.327953\n     raw_data['longitude_center'] = 19.819025\n     raw_data['distance_from_center'] = raw_data.apply(lambda x: geodesic((x['lat'], x['lon']),(x['latitude_center'], x['longitude_center'])).km, axis=1)\n     ```\n\n   - **Adding New Columns:**\n     - Using the free-text comments in the dataset to add new columns such as 'Parkim' (parking) and 'Ashensor' (elevator), based on the presence of specific keywords.\n     ```python\n     raw_data['Parkim'] = raw_data['comments'].apply(hasParking)\n     raw_data['Ashensor'] = raw_data['comments'].apply(hasElevator)\n     ```\n\n   - **Cleaning Data:**\n     - Removing unwanted characters (e.g., '$') from the 'price' column and converting it to a float.\n     ```python\n     # Keep only digits and the decimal point, then convert to float\n     raw_data['price'] = raw_data['price'].replace(r'[^\\d.]', '', regex=True)\n     raw_data['price'] = raw_data['price'].astype(float)\n     ```\n\n   - **Handling Missing Values:**\n     - Counting the missing values per column and dropping the rows that contain them.\n     ```python\n     
# Count missing values per column\n     data.isnull().sum()\n     # Drop every row that contains a missing value\n     data_no_mv = data.dropna(axis=0)\n     ```\n\n   - **Exploring PDFs and Identifying Outliers:**\n     - Visualizing the probability density function (PDF) of the 'price' variable and addressing outliers.\n     ```python\n     # histplot with kde=True replaces the deprecated sns.distplot\n     sns.histplot(data_no_mv['price'], kde=True)\n     # Additional steps for handling outliers\n     ```\n\n5. **Exploratory Data Analysis**\n   - Visualizations depicting relationships between variables, including scatter plots of price against different features.\n   ```python\n   f, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, sharey=True, figsize=(15, 3))\n   ax1.scatter(data_cleaned['year'], data_cleaned['price'])\n   # Additional scatter plots for other features\n   plt.show()\n   ```\n\n6. **Create Dummy Variables**\n   - Creating dummy variables from categorical data using Pandas' 'get_dummies' function.\n\n7. **Regression Model**\n   - Declaring inputs and targets for the regression model.\n   - Scaling the data using StandardScaler.\n   - Splitting the data into training and testing sets.\n   - Building a decision tree regression model and evaluating its performance.\n     ```python\n     # Example code snippet\n     model = DecisionTreeRegressor()\n     model.fit(x_train, y_train)\n     ```\n\n8. **Model Evaluation**\n   - Checking R-squared scores and residual plots to assess the model's goodness of fit.\n     ```python\n     # Example code snippet\n     # score() returns R-squared for regressors\n     train_score = model.score(x_train, y_train)\n     ```\n\n9. 
**Testing**\n   - Hyperparameter tuning using grid search for DecisionTreeRegressor.\n     ```python\n     # Example code snippet\n     param_grid = {'max_depth': [None, 5, 10, 15], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4]}\n     grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')\n     grid_search.fit(x_train, y_train)\n     ```\n   - Evaluating the tuned model on the testing set.\n     ```python\n     # Example code snippet\n     best_params = grid_search.best_params_\n     # best_estimator_ is already refit on the training set (refit=True by default)\n     best_model = grid_search.best_estimator_\n     test_score = best_model.score(x_test, y_test)\n     ```\n\n10. **Results**\n    - Visualizations of actual vs predicted prices on both the training and testing sets.\n      ```python\n      # Example code snippet\n      plt.scatter(y_train, y_hat)\n      plt.xlabel('Targets (y_train)', size=18)\n      plt.ylabel('Predictions (y_hat)', size=18)\n      plt.show()\n      ```\n\n## Improvements\n\n**Note:** This project was developed for learning purposes and is not intended for production use. While it provides insights into regression analysis and decision tree modeling, there are several areas for improvement:\n\n1. **Data Quality Enhancement**\n\n2. **Feature Engineering**\n\n3. **Model Optimization**\n\n4. **Handling Categorical Data**\n\n5. **Model Interpretability and Testing**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkristishqau%2Fapartmentregressionanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkristishqau%2Fapartmentregressionanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkristishqau%2Fapartmentregressionanalysis/lists"}