# gis-berlin-rents

## About

This tool crawls information about rental apartments in the Berlin area from [immobilienscout24.de](https://www.immobilienscout24.de/Suche/S-T/P-1/Wohnung-Miete/Berlin/Berlin). It was implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin. The crawled results are stored in a CSV file in the `./out` directory, using the following format:

price in EUR | flat size in m² | number of rooms | address | WGS84 latitude | WGS84 longitude | UTM zone | UTM latitude band | UTM easting | UTM northing | date of crawling

Apartments with incomplete or ambiguous addresses are omitted, as are apartments with missing features or with a price range instead of a fixed price.
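
For illustration, appending one record in the column order above might look like the following sketch (the function and field names are hypothetical, not taken from the project's source):

```python
import csv
from datetime import date

# Hypothetical helper; the column order mirrors the format documented above.
def append_record(path, record):
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([
            record["price_eur"],       # price in EUR
            record["size_m2"],         # flat size in m²
            record["rooms"],           # number of rooms
            record["address"],
            record["lat"],             # WGS84 latitude
            record["lng"],             # WGS84 longitude
            record["utm_zone"],
            record["utm_band"],        # UTM latitude band
            record["utm_easting"],
            record["utm_northing"],
            date.today().isoformat(),  # date of crawling
        ])
```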

Crawled results are always appended to the file. In a separate step, the coordinates are obtained from the Google Maps API based on the apartments' addresses. A random delay of between 2 and 10 seconds is added between API calls to avoid getting blocked.
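
A minimal sketch of that geocoding step, assuming the Google Maps Geocoding web API and the `requests` package (the function name is illustrative, not the project's actual code):

```python
import random
import time

import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode(addresses, api_key):
    """Resolve addresses to WGS84 coordinates, pausing 2-10 s between calls."""
    results = {}
    for address in addresses:
        response = requests.get(GEOCODE_URL, params={"address": address, "key": api_key})
        matches = response.json().get("results", [])
        if matches:
            location = matches[0]["geometry"]["location"]
            results[address] = (location["lat"], location["lng"])
        # Random delay between 2 and 10 seconds to avoid getting blocked.
        time.sleep(random.uniform(2, 10))
    return results
```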

## Requirements

- Python 3.5 or higher

## Setup your Python Environment

Make sure to have all required packages installed. You can install them via the following command:

```
pip install -r requirements.txt
```

The `requirements.txt` is located at the project's root.

## Configuration

The `settings.py` allows you to adjust the following settings:

- `USER_AGENT`: the user agent string used by the crawler
- `DOWNLOAD_DELAY`: the delay in seconds between crawling each page
- `PAGE_START`: the page number to start crawling at
- `PAGE_END`: the number of the last page to be crawled

The remaining variables should not be changed unless the crawler needs to be adapted to changes on the website being crawled.
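
For illustration, a `settings.py` covering these four options might look like the following sketch (the values are placeholders, not the project's actual defaults):

```python
# settings.py -- example values only, not the project's defaults

# User agent string sent with every request.
USER_AGENT = "Mozilla/5.0 (compatible; gis-berlin-rents)"

# Delay in seconds between crawling each result page.
DOWNLOAD_DELAY = 5

# First and last result page to crawl.
PAGE_START = 1
PAGE_END = 50
```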