{"id":20619858,"url":"https://github.com/mafda/seattle_airbnb_data_analysis","last_synced_at":"2026-05-29T12:04:18.765Z","repository":{"id":246462321,"uuid":"816774913","full_name":"mafda/seattle_airbnb_data_analysis","owner":"mafda","description":"This repository contains a comprehensive analysis of the Seattle Airbnb dataset, conducted using the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology.","archived":false,"fork":false,"pushed_at":"2024-07-03T05:57:45.000Z","size":2047,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-17T05:09:16.045Z","etag":null,"topics":["crisp-dm","data-analysis","data-science","jupyter-notebook","pandas-python","seattle-data"],"latest_commit_sha":null,"homepage":"https://medium.com/@mafda/seattle-airbnb-data-analysis-a-data-driven-journey-with-crisp-dm-for-data-scientists-b66b5672c617","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mafda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-18T11:34:16.000Z","updated_at":"2024-07-28T22:08:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"7c56adfa-c41e-4b23-9311-88ee7a23540f","html_url":"https://github.com/mafda/seattle_airbnb_data_analysis","commit_stats":null,"previous_names":["mafda/seattle_airbnb_data_analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mafda%2Fseattle_airbnb_data_analysis",
"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mafda%2Fseattle_airbnb_data_analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mafda%2Fseattle_airbnb_data_analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mafda%2Fseattle_airbnb_data_analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mafda","download_url":"https://codeload.github.com/mafda/seattle_airbnb_data_analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242277684,"owners_count":20101542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crisp-dm","data-analysis","data-science","jupyter-notebook","pandas-python","seattle-data"],"created_at":"2024-11-16T12:12:42.216Z","updated_at":"2026-05-29T12:04:13.729Z","avatar_url":"https://github.com/mafda.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Seattle Airbnb Data Analysis\n\nThe Seattle Airbnb dataset is a comprehensive collection of data related to\nAirbnb listings in Seattle. It provides detailed information across several\naspects of the Airbnb ecosystem, including listings, calendar availability, and\nuser reviews.\n\nThe Seattle Airbnb dataset offers a rich source of information for analyzing\nvarious aspects of Airbnb listings in Seattle, including trends in pricing and\navailability, geographic distribution of listings, and customer satisfaction\nbased on reviews. 
This dataset is valuable for conducting detailed exploratory\ndata analysis (EDA), machine learning projects, and deriving insights into the\nshort-term rental market in Seattle.\n\nThe dataset is divided into three main files:\n\n- `listings.csv`: Contains detailed information about each Airbnb listing in\n  Seattle.\n- `calendar.csv`: Provides daily availability and pricing information for each\n  listing.\n- `reviews.csv`: Contains user reviews for each listing.\n\n## Project Setup\n\n### Clone this repository\n\n```shell\n(base)$: git clone git@github.com:mafda/seattle_airbnb_data_analysis.git\n(base)$: cd seattle_airbnb_data_analysis\n```\n\n### Configure environment\n\n- Create the conda environment\n\n    ```shell\n    (base)$: conda env create -f environment.yml\n    ```\n\n- Activate the environment\n\n    ```shell\n    (base)$: conda activate seattle_airbnb\n    ```\n\n- Download the dataset from [Seattle Airbnb Open\n  Data](https://www.kaggle.com/datasets/airbnb/seattle/data), create a `data`\n  folder, and copy the three CSV files into it.\n\n    ```shell\n    (seattle_airbnb)$: mkdir data\n    ```\n\n## Project Structure\n\n```shell\n.\n├── README.md\n├── data\n│   ├── calendar.csv\n│   ├── listings.csv\n│   └── reviews.csv\n├── environment.yml\n└── src\n    └── seattle_airbnb_data_analysis.ipynb\n```\n\n## Methodology\n\nThis project follows the [Cross-Industry Standard Process for Data Mining\n(CRISP-DM)](https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining)\nmethodology.\n\n### Key Highlights\n\n- **Business Understanding**: Identify the key business questions, such as\n  factors influencing occupancy rates, pricing strategies, and customer\n  satisfaction.\n\n- **Data Understanding**: Explore the structure and content of the dataset,\n  including `listings.csv`, `calendar.csv`, and `reviews.csv`.\n\n- **Data Preparation**: Clean and preprocess the data by handling missing\n  values and converting data types.\n\n- **Modeling**: Apply statistical 
and machine learning models to analyze the\n  linear relationship between features and price. Use visualization techniques\n  to illustrate the relationships.\n\n- **Evaluation**: Validate the models and analysis results for accuracy and\n  reliability.\n\n- **Deployment**: Document the insights and recommendations based on the\n  analysis. Provide actionable strategies for optimizing listings, pricing, and\n  improving customer experience.\n\n## Findings\n\nThe main results of the exploratory analysis are summarized below.\n\n### Identify temporal patterns in reservations and prices\n\nThe analysis reveals a relationship between occupancy rates and prices that\nfollows a seasonal trend, with peaks in winter (January) and summer (July)\nfollowed by gradual decreases. This correlation suggests that higher occupancy\nrates, likely driven by seasonal demand, are associated with higher prices.\n\n![Occupancy rates and prices over time](./assets/q1.png)\n\n### Determine factors that influence property prices\n\nThe correlation matrix shows a high correlation between price and\nreview-related variables (such as number of reviews and review score).\nNotable correlations are also observed with the number of bathrooms, beds,\nbedrooms, accommodation capacity, and number of guests included. This suggests\nthat these factors are key determinants of listing prices, with larger,\nbetter-reviewed properties tending to command higher prices.\n\n![Correlation matrix of listing features and price](./assets/corr_matrix.png)\n\n### Evaluate customer satisfaction based on reviews\n\nThe most frequently used words in user reviews were \"great,\" \"clean,\"\n\"location,\" and \"comfortable.\" These keywords suggest that customers generally\nhad positive experiences and appreciated the cleanliness, location, and comfort\nof the accommodations. 
The analysis shows that customer satisfaction is\noverwhelmingly positive, with a few areas for improvement in handling\nmultilingual reviews.\n\n![Most frequent words in user reviews](./assets/q3.png)\n\n## Data Modeling\n\n**Linear Regression** is a simple model that attempts to capture the linear\nrelationship between **features and price**.\n\n- Pros: Easy to interpret and quick to train.\n- Cons: May not capture complex relationships between variables.\n\n### Results\n\nAlthough the model has a reasonable level of predictive ability, its R² of\n**0.558** means that **44.2%** (1 - 0.558) of the variability in price is not\ncaptured by the model. This indicates that some important features could be\nmissing or that the model is not complex enough to capture all relationships\npresent in the data.\n\n![Actual vs Predicted Prices](./assets/actual_predicted_prices.png)\n\n| Metric                    | Value |\n| ------------------------- | ----- |\n| Mean Absolute Error (MAE) | 38.09 |\n| R² Score                  | 0.558 |\n\n## Analysis of Results and Conclusions\n\nFor a detailed analysis of the results obtained, see our [Medium\nblog post](https://medium.com/@mafda/seattle-airbnb-data-analysis-a-data-driven-journey-with-crisp-dm-for-data-scientists-b66b5672c617).\nThere you will find a comprehensive explanation of the findings and their\ninterpretation in the context of the Seattle Airbnb data.\n\n\u003e [Seattle Airbnb: Temporal Trends, Price Influences, and Customer Satisfaction Analysis.](https://medium.com/@mafda/seattle-airbnb-data-analysis-a-data-driven-journey-with-crisp-dm-for-data-scientists-b66b5672c617)\n\n## References\n\n- [Data Scientist 
Nanodegree\n  Program](https://www.udacity.com/course/data-scientist-nanodegree--nd025)\n- [Seattle Airbnb Dataset](https://www.kaggle.com/datasets/airbnb/seattle/data)\n\n---\n\nmade with 💙 by [mafda](https://mafda.github.io/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmafda%2Fseattle_airbnb_data_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmafda%2Fseattle_airbnb_data_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmafda%2Fseattle_airbnb_data_analysis/lists"}