{"id":17526927,"url":"https://github.com/digital-engineering/airbnb-scraper","last_synced_at":"2025-03-06T06:31:09.723Z","repository":{"id":55520405,"uuid":"66564394","full_name":"digital-engineering/airbnb-scraper","owner":"digital-engineering","description":"Airbnb Scraper: Advanced Airbnb Search using Scrapy","archived":false,"fork":false,"pushed_at":"2022-10-20T08:54:26.000Z","size":169,"stargazers_count":193,"open_issues_count":0,"forks_count":66,"subscribers_count":18,"default_branch":"master","last_synced_at":"2024-11-01T16:11:47.836Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/digital-engineering.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-08-25T14:20:50.000Z","updated_at":"2024-10-30T16:20:08.000Z","dependencies_parsed_at":"2023-01-20T05:35:26.948Z","dependency_job_id":null,"html_url":"https://github.com/digital-engineering/airbnb-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digital-engineering%2Fairbnb-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digital-engineering%2Fairbnb-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digital-engineering%2Fairbnb-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digital-engineering%2Fairbnb-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/digital-engineering","download_url":"https://code
load.github.com/digital-engineering/airbnb-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242161545,"owners_count":20081891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T15:02:38.135Z","updated_at":"2025-03-06T06:31:09.704Z","avatar_url":"https://github.com/digital-engineering.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Airbnb Scraper: Advanced Airbnb Search using Scrapy\n\n## Disclaimer: No longer maintained\n\n### This project is not currently maintained due to difficulty in using Scrapy to make requests to the Airbnb API. The project is on hold until further notice. Currently exploring a simpler approach here: https://github.com/JoeBashe/stl-scraper\n\nUse Airbnb's unofficial API to efficiently search for rental properties.\nRegex matching, ranged search, open matched properties in a browser, save to CSV, xlsx, or Elasticsearch (alpha).\n\n## Notes\n\n- Airbnb's API is subject to change at any moment, which would break this scraper. They've already changed it several\n  times in the past. Also, using this probably violates their TOS. Please use it only for educational or research purposes.\n- The scraper was recently updated to work with Airbnb's new v3 GraphQL API. 
Some features are still being updated.\n- If you get 403 Forbidden errors when running this scraper, try browsing the Airbnb site in your web browser from the\n  same computer first, then try running the script again.\n\n## Requirements\n\n* **Python 3.10+**\n* [Scrapy](http://scrapy.org/)\n* [openpyxl](https://openpyxl.readthedocs.io/en/default/#installation)\n* Elasticsearch 7+ if using the Elasticsearch pipeline\n* See [requirements.txt](requirements.txt) for details\n\n## Installation (nix)\n\n```bash\n# Create venv\npython3.10 -m venv env\n\n# Activate venv\n. env/bin/activate\n\n# Install required packages\npip install -Ur requirements.txt\n\n# Create settings.py\ncp deepbnb/settings.py.dist deepbnb/settings.py\n\n# @NOTE: Don't forget to set AIRBNB_API_KEY in settings.py. To find your API key,\n# search Airbnb using Chrome, open dev tools, and look for the URL parameter\n# named \"key\" in async requests to /api/v2/explore_tabs under the Network tab.\n```\n\n## Configuration\n\nEdit `deepbnb/settings.py` to change settings. I've created some custom settings, which are\ndocumented [below](https://github.com/digital-engineering/airbnb-scraper#settings). 
The rest are documented\nin https://docs.scrapy.org/en/latest/topics/settings.html.\n\n## Example Usage\n\n#### Minimal scraper usage:\n\n    scrapy crawl airbnb -a query=\"Colorado Springs, CO\" -o colorado_springs.csv\n\n#### Advanced examples:\n\n##### Madrid, fixed dates\n\n```bash\nscrapy crawl airbnb \\\n    -a query=\"Madrid, Spain\" \\\n    -a checkin=2023-10-01 \\\n    -a checkout=2023-11-30 \\\n    -a max_price=1900 \\\n    -a min_price=1800 \\\n    -a neighborhoods=\"Acacias,Almagro,Arganzuela,Argüelles,Centro,Cortes,Embajadores,Imperial,Jerónimos,La Latina,Malasaña,Moncloa,Palacio,Recoletos,Retiro,Salamanca,Sol\" \\\n    -s MUST_HAVE=\"(atico|attic|balcon|terra|patio|outdoor|roof|view)\" \\\n    -s CANNOT_HAVE=\"studio\" \\\n    -s MINIMUM_WEEKLY_DISCOUNT=20 \\\n    -s WEB_BROWSER=\"/usr/bin/chromium\" \\\n    -o madrid.xlsx\n```\n\n##### New York ranged date search\n\n```bash\nscrapy crawl airbnb \\\n    -a query=\"New York, NY\" \\\n    -a checkin=\"2023-01-22+7-0\" \\\n    -a checkout=\"2023-02-22+14-3\" \\\n    -a max_price=1800 \\\n    -s CANNOT_HAVE=\"guest suite\" \\\n    -s MUST_HAVE=\"(walking distance|short walk|no car needed|walk everywhere|metro close|public transport)\" \\\n    -o newyork.csv\n```\n\n## Ranged date queries\n\nIf you have flexible checkin / checkout dates, use the ranged search feature to search a range of checkin / checkout\ndates.\n\n### Search checkin date range +5 days -2 days\n\n    scrapy crawl airbnb \\\n        -a query=\"Minneapolis, MN\" \\\n        -a checkin=\"2023-10-15+5-2\" \\\n        -a checkout=\"2023-11-15\" \\\n        -o minneapolis.csv\n\nThis search looks for rentals in Minneapolis using Oct 15, 2023 as the base check-in date, and also searches for\nrentals available for check-in from 2 days before up to 5 days after it, i.e., check-ins from Oct 13 to Oct 20. This\nis specified by the string `+5-2` appended to the checkin date: `2023-10-15+5-2`. 
The string follows the\npattern `+[days_after]-[days_before]`. When `[days_after]` and `[days_before]` are equal, you can use the shorthand\n`+-[days]`. The numbers may be any integer 0 or greater (large numbers untested).\n\n### Search checkin date +5 days -2 days, checkout date + or - 3 days\n\n    scrapy crawl airbnb \\\n        -a query=\"Florence, Italy\" \\\n        -a checkin=\"2023-10-15+5-2\" \\\n        -a checkout=\"2023-11-15+-3\" \\\n        -o firenze.csv\n\n## Scraping Description\n\nAfter running the crawl command, the scraper will start. It will first run the\nsearch query, then determine the number of result pages, and finally iterate\nthrough each of those, scraping each of the property listings on each page.\n\nScraped items (listings) will be passed to the default item pipeline, where,\noptionally, the `description`, `name`, and `reviews.description` fields will\nbe filtered using either or both of the `CANNOT_HAVE` and `MUST_HAVE` regexes.\nItems rejected by these filters will be dropped. Accepted items can optionally be\nopened in the web browser set by `WEB_BROWSER`, so that you can easily view your search results.\n\nFinally, the output can be saved to an xlsx format file for additional\nfiltering, sorting, and inspection.\n\n## Parameters\n\nYou can find the values for these by first doing a search manually on the\nAirbnb site.\n\n* `query`: Location to search, e.g. city and state, or city and country. **(required)**\n* `checkin`, `checkout`: Check-in and check-out dates (`YYYY-MM-DD`, optionally with a range suffix as described above).\n* `min_price`, `max_price`: Minimum and maximum price for the period.\n  *The Airbnb search algorithm calculates this based upon search length.\n  It will be either the daily or monthly price, depending on the length\n  of the stay.*\n* `neighborhoods`: Comma-separated list of neighborhoods within the city\n  to filter for.\n* `output`: Name of the output file, passed with the `-o` flag. 
Only `xlsx` output is tested.\n\n## Settings\n\nThese settings can be edited in the `settings.py` file, or appended to the\ncommand line using the `-s` flag as in the examples above.\n\n* `CANNOT_HAVE=\"\u003ccannot-have-regex\u003e\"`  \n  Don't accept listings that match the given regex pattern.\n  **(optional)**\n\n\n* `FIELDS_TO_EXPORT=\"['field1', 'field2', ...]\"`  \n  Can be found in settings.py. Contains a list of all possible fields to\n  export, i.e. all fields of `AirbnbScraperItem`. Comment out items to\n  remove undesired fields from the output. Applies only to `xlsx` output.\n\n\n* `MINIMUM_MONTHLY_DISCOUNT=30`  \n  Minimum monthly discount.\n  **(optional)**\n\n\n* `MINIMUM_WEEKLY_DISCOUNT=25`  \n  Minimum weekly discount.\n  **(optional)**\n\n\n* `MUST_HAVE=\"(\u003cmust-have-regex\u003e)\"`  \n  Only accept listings that match the given regex pattern.\n  **(optional)**\n\n\n* `ROOM_TYPES=\"['Camper/RV', 'Campsite', 'Entire guest suite']\"`  \n  Room types to filter.\n  **(optional)**\n\n\n* `SKIP_LIST=\"['12345678', '12345679', '12345680']\"`  \n  Property IDs to skip.\n  **(optional)**\n\n\n* `WEB_BROWSER=\"/path/to/browser %s\"`  \n  Web browser executable command. 
**(optional)**\n\n  *Examples*:\n    - macOS  \n      `WEB_BROWSER=\"open -a /Applications/Google\\ Chrome.app\"`\n\n    - Windows  \n      `WEB_BROWSER=\"C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\"`\n\n    - Linux  \n      `WEB_BROWSER=\"/usr/bin/google-chrome\"`\n\n## Elasticsearch\n\nEnable `deepbnb.pipelines.ElasticBnbPipeline` in the `ITEM_PIPELINES` setting in `settings.py`.\n\n## Credits\n\n- This project was originally inspired by [this excellent blog post](http://www.verginer.eu/blog/web-scraping-airbnb/)\n  by Luca Verginer.\n- In converting this to use the unofficial API, https://stevesie.com/apps/airbnb-api was very helpful.\n- [This analysis of Bali Airbnbs](https://github.com/daben/m2851-prac1) provided inspiration for more elegant code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigital-engineering%2Fairbnb-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigital-engineering%2Fairbnb-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigital-engineering%2Fairbnb-scraper/lists"}