https://github.com/shilongdai/apartment_scraper
Python webscraper for apartments.com
bs4 csv json python selenium webscraping
- Host: GitHub
- URL: https://github.com/shilongdai/apartment_scraper
- Owner: shilongdai
- Created: 2022-05-19T04:20:08.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-03-21T01:30:44.000Z (over 2 years ago)
- Last Synced: 2025-09-01T23:39:36.458Z (about 1 month ago)
- Topics: bs4, csv, json, python, selenium, webscraping
- Language: Python
- Homepage:
- Size: 8.01 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Apartment_Scraper
This is a Python + Selenium + bs4 based web scraper used to extract apartment data from apartments.com. The final output is a JSON file with the desired apartment information and a CSV file with the apartments in vector form.
## Sample Output
The sample output from scanning one apartment can be found in the repository at sample_compile.json and sample_compile.csv.
## Usage
First, edit the config.ini file so that the _DRIVER_ field points to the path of the chromedriver. Then, set _URL_TEMPLATE_ to the search result page of apartments.com for a given area. The two %d placeholders in the template narrow down the price range of the search results, so that each search stays under the 28-page limit on search results.
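To illustrate how two %d placeholders can split one large search into several smaller ones, here is a minimal sketch. It is not taken from scrape.py: the function name `price_band_urls`, the band width, and the example.com template are all hypothetical, and the real search URL format should be copied from apartments.com.

```python
def price_band_urls(url_template, low, high, step):
    """Fill the two %d placeholders with consecutive (min, max) price bands.

    url_template is assumed to contain exactly two %d placeholders,
    the first for the minimum price and the second for the maximum.
    """
    urls = []
    for band_min in range(low, high, step):
        # Clamp the last band so it never exceeds the overall maximum.
        band_max = min(band_min + step, high)
        urls.append(url_template % (band_min, band_max))
    return urls


# Hypothetical template; the real one comes from the URL_TEMPLATE field.
template = "https://example.com/search/%d-to-%d/"
for url in price_band_urls(template, 1000, 2000, 500):
    print(url)
```

Adjacent bands share a boundary price in this sketch, so a listing priced exactly on a boundary could appear in two searches and may need de-duplicating.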
After the config is done, execute the script:
```
python scrape.py
```
It will output a urls.json file containing the URLs of all the individual apartment pages.
Then, create a pages directory in the current working directory and execute:
```
python download.py urls.json pages/
```
This will download the HTML of all the apartment pages in the urls.json file.
After the download is complete, create an extract directory in the current working directory and execute:
```
python extract.py pages/ extract/
```
This will extract all the relevant strings from the HTML pages and write the results to the extract directory in JSON format.
Finally, after the extraction is complete, run:
```
python compile.py extract/
```
This will format the data and output the final .json and .csv files.
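The four steps above can also be chained from a single script. The following is a minimal sketch, assuming it is run from the repository root after config.ini has been edited. The helper names `pipeline_commands` and `run_pipeline` are hypothetical and not part of the repository; only the script names and arguments come from the usage steps.

```python
import subprocess
import sys
from pathlib import Path


def pipeline_commands(pages_dir="pages", extract_dir="extract", urls_file="urls.json"):
    """Return the four commands from the usage steps, in order."""
    py = sys.executable  # reuse the interpreter running this script
    return [
        [py, "scrape.py"],
        [py, "download.py", urls_file, f"{pages_dir}/"],
        [py, "extract.py", f"{pages_dir}/", f"{extract_dir}/"],
        [py, "compile.py", f"{extract_dir}/"],
    ]


def run_pipeline(pages_dir="pages", extract_dir="extract"):
    """Create the working directories, then run each step in turn."""
    Path(pages_dir).mkdir(exist_ok=True)
    Path(extract_dir).mkdir(exist_ok=True)
    for cmd in pipeline_commands(pages_dir, extract_dir):
        # check=True stops the pipeline as soon as one step fails.
        subprocess.run(cmd, check=True)


# Call run_pipeline() from the repository root to execute all four steps.
```

Stopping at the first failure avoids running extract.py or compile.py on an incomplete set of downloaded pages.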