https://github.com/pawod/gis-berlin-rents
A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.
apartment-rents berlin crawler gis immobilienscout24
- Host: GitHub
- URL: https://github.com/pawod/gis-berlin-rents
- Owner: pawod
- License: mit
- Created: 2017-06-22T10:50:31.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-07T23:59:10.000Z (about 2 years ago)
- Last Synced: 2024-08-02T12:47:24.398Z (5 months ago)
- Topics: apartment-rents, berlin, crawler, gis, immobilienscout24
- Language: Python
- Homepage:
- Size: 343 KB
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# gis-berlin-rents
## About
This tool crawls information about rental apartments in Berlin from [immobilienscout24.de](https://www.immobilienscout24.de/Suche/S-T/P-1/Wohnung-Miete/Berlin/Berlin). The crawled results are stored in a CSV file in the `./out` directory, using the following format:

| price in EUR | flat size in m^2 | number of rooms | address | WGS84 latitude | WGS84 longitude | UTM zone | UTM latitude band | UTM easting coord | UTM northing coord | date of crawling |
|---|---|---|---|---|---|---|---|---|---|---|

Apartments with incomplete or ambiguous addresses are omitted. Apartments with missing features or having price ranges instead of a fixed price are also omitted.
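The UTM zone and UTM latitude band columns can be derived directly from the WGS84 coordinates. The following is a minimal sketch of that derivation, not the code used by this project: the function names `utm_zone` and `utm_latitude_band` are hypothetical, and the special zone exceptions around Norway and Svalbard are ignored, which does not matter for Berlin.

```python
import math

# Latitude band letters from 80°S to 84°N; I and O are skipped by convention.
_BANDS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone(longitude: float) -> int:
    """Return the standard UTM zone number (1-60) for a WGS84 longitude.

    Assumption: the special zones around Norway and Svalbard are ignored.
    """
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1


def utm_latitude_band(latitude: float) -> str:
    """Return the UTM latitude band letter for a WGS84 latitude."""
    if not -80.0 <= latitude <= 84.0:
        raise ValueError("UTM is only defined between 80°S and 84°N")
    # Bands are 8° tall, except X, which spans 72°N to 84°N.
    index = min(int((latitude + 80.0) // 8), len(_BANDS) - 1)
    return _BANDS[index]


# Approximate coordinates of central Berlin.
print(utm_zone(13.405), utm_latitude_band(52.52))  # prints: 33 U
```

Computing the easting and northing columns requires a full map projection and is best left to a dedicated library.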
Crawled results are always appended to the file. In a separate step, the coordinates are looked up via the Google Maps API based on each apartment's address. A random delay of between 2 and 10 seconds is added between API calls to avoid getting blocked.
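The append behaviour and the random delay described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the helper names `append_rows` and `polite_pause`, the file name `rents.csv`, and the default comma delimiter are all hypothetical.

```python
import csv
import os
import random
import time


def append_rows(path, rows):
    """Append rows to a CSV file, creating the parent directory if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Mode "a" appends, so results of earlier runs are preserved.
    with open(path, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def polite_pause(min_seconds=2.0, max_seconds=10.0, sleep=time.sleep):
    """Sleep for a random duration and return the delay that was chosen."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

A caller would invoke `polite_pause()` between two geocoding requests and then `append_rows("./out/rents.csv", rows)` once the coordinates are known. Passing a custom `sleep` function makes the delay easy to skip in tests.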
## Requirements
- Python 3.5 or higher
## Set Up Your Python Environment
Make sure all required packages are installed. You can install them with the following command:

    pip install -r requirements.txt

The `requirements.txt` is located at the project's root.
## Configuration
The `settings.py` file allows you to adjust the following settings:
- USER_AGENT: The user agent string to be used by the web crawler
- DOWNLOAD_DELAY: The delay in seconds between crawling each page
- PAGE_START: The page number to start crawling at
- PAGE_END: The number of the last page to be crawled

The remaining variables should not be changed unless the crawler needs to be adapted to changes in the website being crawled.
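As an illustration of how these four settings fit together, here is a minimal sketch. Only the setting names come from this README. The values, the `SEARCH_URL` template, and the `page_urls` helper are hypothetical, and the sketch assumes that the `P-1` segment of the search URL linked in the About section is the page number.

```python
# Hypothetical values; only the four setting names come from the README.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler)"
DOWNLOAD_DELAY = 3  # seconds between page requests
PAGE_START = 1
PAGE_END = 5

# Assumption: the "P-1" segment of the search URL is the page number.
SEARCH_URL = (
    "https://www.immobilienscout24.de/Suche/S-T/P-{page}"
    "/Wohnung-Miete/Berlin/Berlin"
)


def page_urls(start=PAGE_START, end=PAGE_END):
    """Return the search URLs for pages start..end, both inclusive."""
    if start > end:
        raise ValueError("PAGE_START must not exceed PAGE_END")
    return [SEARCH_URL.format(page=page) for page in range(start, end + 1)]


for url in page_urls():
    print(url)
```

With the values above, the loop prints five URLs, from `P-1` to `P-5`, because PAGE_END is treated as the last page to crawl and is therefore included.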