https://github.com/reisdev/reads

Real Estate Agency Data Scraper

crawler python scraping scrapy selenium-python selenium-webdriver spider

Host: GitHub
URL: https://github.com/reisdev/reads
Owner: reisdev
License: mit
Created: 2018-05-15T18:02:59.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2023-10-30T19:01:33.000Z (over 2 years ago)
Last Synced: 2024-05-01T16:03:46.725Z (about 2 years ago)
Topics: crawler, python, scraping, scrapy, selenium-python, selenium-webdriver, spider
Language: Python
Homepage:
Size: 92.8 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# R.E.A.D.S - Real Estate Agency Data Scraper

A project built to crawl real estate agency websites. It can extract the price, location, and any other field exposed on a listing page.

Built with [Scrapy](https://scrapy.org/), a [Python](https://python.org) framework for extracting data from web pages.

The project currently has spiders for the following websites:

| Country | Agency |
|-|-|
| Brazil | [Stória Imóveis](https://www.storiaimoveis.com.br/) |
| Brazil | [ImovelWeb](http://www.imovelweb.com.br/) |
| Brazil | [ZapImóveis](http://zapimoveis.com.br/) |
| Brazil | [VivaReal](https://www.vivareal.com.br/) |

## Dependencies

### Major

| Package | Version |
| - | - |
| [Python](https://python.org) | v3.6.5 |

### Python

| Package | Version |
| - | - |
| [Selenium](http://selenium-python.readthedocs.io/) | v3.12.0 |

### Extra

| Package | Version |
| - | - |
| [GeckoDriver](https://github.com/mozilla/geckodriver/releases)¹ | v0.20.1 |

**¹ :** GeckoDriver can also be installed with the command `npm install -g geckodriver`.
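
Before running a spider, it can help to confirm that the tools listed above are reachable. The sketch below is not part of the project: `check_environment`, `MIN_PYTHON`, and the `which` and `version` parameters are hypothetical names. It assumes that GeckoDriver is installed as an executable called `geckodriver` on the `PATH`, and it treats the listed Python version as a minimum, which the README does not state.

```python
import shutil
import sys

# Assumption: the Python version in the Dependencies table is a minimum.
MIN_PYTHON = (3, 6)


def check_environment(tools=("geckodriver",), which=shutil.which,
                      version=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: `which` and `version` are injectable so the
    check can be exercised without touching the real system.
    """
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append("%s was not found on PATH" % tool)
    return problems
```

Calling `check_environment()` with no arguments inspects the current interpreter and `PATH`; printing the returned list is enough to see what is missing.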

## How to

### Clone the repository

To clone the repository, run the following on the command line:

```bash
$ git clone https://github.com/reisdev/reads
$ cd reads
```

### Install python dependencies

Run the command below:

```bash
$ pip install -r requirements.txt
```

### Create the results folder

Run the command:

```bash
$ mkdir results
```
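
The same step can be done from Python, which is convenient when the folder should be created automatically before a crawl. This is a minimal sketch, not code from the project: `ensure_results_dir` and its parameters are hypothetical, and only the folder name `results` comes from the command above.

```python
from pathlib import Path


def ensure_results_dir(base=".", name="results"):
    """Create `<base>/<name>` if it is missing and return its path."""
    path = Path(base) / name
    # exist_ok=True makes the call safe to repeat, like `mkdir -p`.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Unlike a plain `mkdir results`, this does not fail when the folder already exists.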

## Usage

### Spiders available

The available spider names are:

* storia

* imovelweb

* zapimoveis

* vivareal

#### Run a spider

To run a specific spider, pass its name to `scrapy crawl`:

```bash
scrapy crawl <spider_name>
```
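
To launch a spider from a script instead of the shell, the command can be assembled and validated in Python. The sketch below is illustrative only: `SPIDERS`, `build_crawl_command`, and `run_spider` are hypothetical names, and exporting to `results/<spider>.json` through Scrapy's `-o` feed option is an assumption, since the project may already write its own output to the `results` folder.

```python
import subprocess

# Spider names copied from the "Spiders available" list.
SPIDERS = ("storia", "imovelweb", "zapimoveis", "vivareal")


def build_crawl_command(spider, output_dir="results"):
    """Return the argument list for running one spider.

    Assumption: items are exported with Scrapy's `-o` option to
    `<output_dir>/<spider>.json`; the project may handle output itself.
    """
    if spider not in SPIDERS:
        raise ValueError("unknown spider: %r" % spider)
    return ["scrapy", "crawl", spider, "-o",
            "%s/%s.json" % (output_dir, spider)]


def run_spider(spider):
    """Run the spider from the repository root; raises if Scrapy fails."""
    subprocess.run(build_crawl_command(spider), check=True)
```

For example, `run_spider("vivareal")` would invoke `scrapy crawl vivareal -o results/vivareal.json`, while an unknown name is rejected before any process is started.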
