https://github.com/nothingnothings/zap-scraper
Zap Imóveis Website Scraper Built with Python
beautifulsoup beautifulsoup4 docker python scraper selenium sql zap-imoveis
- Host: GitHub
- URL: https://github.com/nothingnothings/zap-scraper
- Owner: nothingnothings
- Created: 2024-08-20T17:31:20.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2024-10-02T20:16:46.000Z (4 months ago)
- Last Synced: 2024-11-05T10:20:07.456Z (3 months ago)
- Topics: beautifulsoup, beautifulsoup4, docker, python, scraper, selenium, sql, zap-imoveis
- Language: HTML
- Homepage:
- Size: 447 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Zap Scraper - A Web Scraper Built with Python
This script is an example of a web scraper built in Python.
## Introduction
[![en](https://img.shields.io/badge/lang-en-red.svg?style=flat-square)](https://github.com/nothingnothings/zap-scraper)
[![pt-br](https://img.shields.io/badge/lang-pt--br-green.svg?style=flat-square)](https://github.com/nothingnothings/zap-scraper/blob/master/README.pt-br.md)

This Python script extracts information about listings available on the Zap Imóveis website and stores it in a containerized SQL database.
The script uses Selenium for web scraping and BeautifulSoup to parse the HTML.
For more information about how to use it, read the instructions below.
## Project's Directory Structure
```
.\
│
├── docker\
│ └── docker-compose.yml
│
├── .env
├── .env.example
├── .gitignore
├── README.md
├── output_format_example.json
├── requirements.txt
├── scraped_page_example.html
├── scraper-logo.png
├── test.py
└── zap.py
```

## Requirements.txt
```
pymysql
requests
beautifulsoup4
selenium
python-dotenv
```

## Installation/Usage
1. Run `git clone https://github.com/nothingnothings/zap-scraper` to clone the project to your machine.
2. Create a free account on [ZenRows](https://www.zenrows.com/) to obtain a proxy. After creating the account, copy the proxy URL found at `https://app.zenrows.com/builder`, just below the API key.
3. Rename the `.env.example` file at the root of the project to `.env`, then add the proxy URL received from ZenRows (e.g., `http://:@proxy.zenrows.com:8001`) in this format:
```
PROXY_URL=
```
4. Download and install the [geckodriver](https://github.com/mozilla/geckodriver/releases) release that matches your operating system (Linux, Mac, or Windows). The script needs it to work correctly.
5. The `docker-compose.yml` file defines a ready-to-use SQL database. To start it, with Docker installed and running, run the following commands:
```
cd docker
docker-compose up -d
```
6. Install the necessary dependencies listed in the `requirements.txt` file using `pip`:
```
pip install -r requirements.txt
```
7. Before running the main script `zap.py`, it is recommended to run the test script `test.py`, which opens a Google page to ensure everything is working correctly:
```
python test.py
```

## Notes
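As a companion to step 3 of the installation, here is a minimal sketch of how the `PROXY_URL` value could be sanity-checked before starting the scraper. It uses only the standard library and assumes the variable is already present in the environment (for example after `python-dotenv` has loaded `.env`). The `parse_proxy_url` helper is hypothetical and is not part of `zap.py`; how the script itself consumes `PROXY_URL` is not shown here.

```python
import os
from urllib.parse import urlsplit


def parse_proxy_url(proxy_url: str) -> dict:
    """Split a proxy URL into its parts, raising ValueError if it is malformed.

    Hypothetical helper for illustration; not taken from zap.py.
    """
    parts = urlsplit(proxy_url)
    # A usable proxy URL needs an http(s) scheme, a host, and an explicit port.
    if parts.scheme not in ("http", "https") or not parts.hostname or parts.port is None:
        raise ValueError("PROXY_URL must look like http://user:password@host:port")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials may be missing or empty; normalize both cases to "".
        "username": parts.username or "",
        "password": parts.password or "",
    }


if __name__ == "__main__":
    value = os.environ.get("PROXY_URL", "")
    if value:
        # Print only the host so credentials never end up in logs.
        print("Proxy host:", parse_proxy_url(value)["host"])
    else:
        print("PROXY_URL is not set")
```

Failing early on a malformed value is usually easier to debug than a browser session that silently ignores the proxy.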
- In the root of the project, there is an HTML file called `scraped_page_example.html` that shows the format of the page the script scrapes.
- Also at the root of the project, there is the `output_format_example.json` file, which shows how each property from the page is inserted into the final SQL table `properties`.
- The script examines only one page at a time to avoid bot detection by the website. If you want to scan more than one page, change the value of the `url` variable in the `searchZapImoveis` function to the desired page.
Example:

```
# first page
https://www.zapimoveis.com.br/venda/apartamentos/sp+sao-paulo/?__ab=sup-hl-pl:newC,exp-aa-test:control,super-high:new,olx:control,phone-page:control,off-no-hl:new,zapcopsmig:control&transacao=venda&onde=,S%C3%A3o%20Paulo,S%C3%A3o%20Paulo,,,,,city,BR%3ESao%20Paulo%3ENULL%3ESao%20Paulo,-23.555771,-46.639557,&tipos=apartamento_residencial&pagina=1
# second page
https://www.zapimoveis.com.br/venda/apartamentos/sp+sao-paulo/?__ab=sup-hl-pl:newC,exp-aa-test:control,super-high:new,olx:control,phone-page:control,off-no-hl:new,zapcopsmig:control&transacao=venda&onde=,S%C3%A3o%20Paulo,S%C3%A3o%20Paulo,,,,,city,BR%3ESao%20Paulo%3ENULL%3ESao%20Paulo,-23.555771,-46.639557,&tipos=apartamento_residencial&pagina=2
```
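
The two URLs above differ only in the `pagina` query parameter, so the page switch can be sketched as a small helper rather than edited by hand. The `with_page` function below is a hypothetical illustration and not part of `zap.py`; it rewrites only the `pagina` value and leaves the rest of the URL untouched. The shortened URL in the usage lines is an assumption made for readability.

```python
import re


def with_page(url: str, page: int) -> str:
    """Return `url` with its `pagina` query parameter set to `page`.

    Hypothetical helper for illustration; not taken from zap.py.
    """
    if page < 1:
        raise ValueError("page must be 1 or greater")
    # Replace only the digits after 'pagina=' so every other character of
    # the URL (including its existing percent-encoding) is preserved.
    new_url, count = re.subn(
        r"([?&]pagina=)\d+",
        lambda match: f"{match.group(1)}{page}",
        url,
    )
    if count == 0:
        # No 'pagina' parameter yet: append one with the right separator.
        separator = "&" if "?" in url else "?"
        new_url = f"{url}{separator}pagina={page}"
    return new_url


if __name__ == "__main__":
    # Shortened version of the search URL, used here only as an example.
    first_page = (
        "https://www.zapimoveis.com.br/venda/apartamentos/sp+sao-paulo/"
        "?transacao=venda&tipos=apartamento_residencial&pagina=1"
    )
    print(with_page(first_page, 2))
```

The result can then be assigned to the `url` variable in `searchZapImoveis`, one page per run, which keeps the one-page-at-a-time behavior described above.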