https://github.com/reisdev/reads
Real Estate Agency Data Scraper
crawler python scraping scrapy selenium-python selenium-webdriver spider
- Host: GitHub
- URL: https://github.com/reisdev/reads
- Owner: reisdev
- License: mit
- Created: 2018-05-15T18:02:59.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-10-30T19:01:33.000Z (over 2 years ago)
- Last Synced: 2024-05-01T16:03:46.725Z (about 2 years ago)
- Topics: crawler, python, scraping, scrapy, selenium-python, selenium-webdriver, spider
- Language: Python
- Homepage:
- Size: 92.8 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# R.E.A.D.S - Real Estate Agency Data Scraper
A project for crawling real estate agency websites. It can extract the price, location, and other details of each listing.
Built with [Scrapy](https://scrapy.org/), a [Python](https://python.org) framework for
extracting data from web pages.
The project currently has spiders for the following websites:
| Country | Agency |
|-|-|
| Brazil | [Stória Imóveis](https://www.storiaimoveis.com.br/) |
| Brazil | [ImovelWeb](http://www.imovelweb.com.br/) |
| Brazil | [ZapImóveis](http://zapimoveis.com.br/) |
| Brazil | [VivaReal](https://www.vivareal.com.br/) |
## Dependencies
### Major
|Package|Version|
| - | - |
| [Python](https://python.org) | v3.6.5 |
### Python
| Package | Version |
| - | - |
| [Selenium](http://selenium-python.readthedocs.io/) | v3.12.0 |
### Extra
|Package|Version|
|-|-|
| [GeckoDriver](https://github.com/mozilla/geckodriver/releases)¹| v0.20.1 |
**¹ :** GeckoDriver can also be installed with the command `npm install -g geckodriver`
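Before installing anything else, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper, and treating v3.6.5 as a minimum Python version (rather than an exact pin) is an assumption.

```python
import importlib.util
import shutil
import sys

# Version taken from the "Major" table; using it as a minimum is an assumption.
MIN_PYTHON = (3, 6, 5)

# Packages the README names as dependencies.
REQUIRED_PACKAGES = ("scrapy", "selenium")


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a dict describing which prerequisites appear to be available."""
    current = tuple(sys.version_info[:3])
    return {
        "python_ok": current >= tuple(min_python),
        "python_version": ".".join(str(part) for part in current),
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "geckodriver_path": shutil.which("geckodriver"),
        # find_spec returns None when a top-level package cannot be imported.
        "missing_packages": [
            name
            for name in REQUIRED_PACKAGES
            if importlib.util.find_spec(name) is None
        ],
    }


def print_report(status):
    """Print one line per prerequisite."""
    state = "ok" if status["python_ok"] else "too old"
    print(f"Python {status['python_version']}: {state}")
    print(f"geckodriver: {status['geckodriver_path'] or 'not found on PATH'}")
    missing = ", ".join(status["missing_packages"]) or "none"
    print(f"missing packages: {missing}")
```

Calling `print_report(check_prerequisites())` prints a short summary. The check only looks at whether each item is present, not at the exact versions of Selenium or GeckoDriver.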
## How to
### Clone the repository
To clone the repository, run in the command line:
```bash
$ git clone https://github.com/reisdev/reads
$ cd reads
```
### Install Python dependencies
Run the command below:
```bash
$ pip install -r requirements.txt
```
### Create the results folder
Run the command:
```bash
$ mkdir results
```
## Usage
### Spiders available
List of names of the available spiders:
* storia
* imovelweb
* zapimoveis
* vivareal
### Run a spider
To run a specific spider, pass its name to `scrapy crawl`:
```bash
scrapy crawl <spider_name>
```
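To run several spiders in sequence, the command can be wrapped in a small script. This is a minimal sketch under stated assumptions: `build_crawl_command` and `run_spiders` are hypothetical helpers, and writing each spider's items to `results/<spider>.json` through Scrapy's `-o` option is an assumption, since the project may already save its output on its own.

```python
import subprocess
from pathlib import Path

# Spider names as listed in this README.
SPIDERS = ("storia", "imovelweb", "zapimoveis", "vivareal")


def build_crawl_command(spider, results_dir="results"):
    """Build the `scrapy crawl` argument list for one spider.

    The -o output path (results/<spider>.json) is an assumption,
    not something the project documents.
    """
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    output = Path(results_dir) / f"{spider}.json"
    return ["scrapy", "crawl", spider, "-o", output.as_posix()]


def run_spiders(spiders=SPIDERS, results_dir="results", dry_run=True):
    """Run each spider in turn, or only print the commands when dry_run is True."""
    commands = [build_crawl_command(name, results_dir) for name in spiders]
    if not dry_run:
        # Make sure the output folder exists before Scrapy writes to it.
        Path(results_dir).mkdir(exist_ok=True)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a crawl exits with an error.
            subprocess.run(command, check=True)
    return commands
```

`run_spiders()` only prints the four commands; `run_spiders(["storia"], dry_run=False)` would actually start a crawl. As with the plain command, this needs to be run from the project root so that Scrapy can find the project settings.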