{"id":18255889,"url":"https://github.com/nothingnothings/zap-scraper","last_synced_at":"2026-05-05T11:37:18.294Z","repository":{"id":253991637,"uuid":"845155675","full_name":"nothingnothings/zap-scraper","owner":"nothingnothings","description":"Zap Imóveis Website Scraper Built with Python","archived":false,"fork":false,"pushed_at":"2024-10-02T20:16:46.000Z","size":458,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-08T22:23:59.393Z","etag":null,"topics":["beautifulsoup","beautifulsoup4","docker","python","scraper","selenium","sql","zap-imoveis"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nothingnothings.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-20T17:31:20.000Z","updated_at":"2025-03-08T20:30:17.000Z","dependencies_parsed_at":"2024-08-23T20:53:25.823Z","dependency_job_id":"3b29a710-ae50-42ee-b697-683fddff77a1","html_url":"https://github.com/nothingnothings/zap-scraper","commit_stats":null,"previous_names":["nothingnothings/zap-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nothingnothings/zap-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nothingnothings%2Fzap-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nothingnothings%2Fzap-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nothingnothings%2Fzap-scraper/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nothingnothings%2Fzap-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nothingnothings","download_url":"https://codeload.github.com/nothingnothings/zap-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nothingnothings%2Fzap-scraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264476832,"owners_count":23614579,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","beautifulsoup4","docker","python","scraper","selenium","sql","zap-imoveis"],"created_at":"2024-11-05T10:19:00.018Z","updated_at":"2026-05-05T11:37:18.286Z","avatar_url":"https://github.com/nothingnothings.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eZap Scraper - A Web Scraper Built with Python\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"scraper-logo.png\" alt=\"Zap-Scraper-logo\" width=\"120px\" height=\"120px\"/\u003e\n  \u003cbr\u003e\n  \u003ci\u003eThis script is an example of a Web Scraper built in\n    \u003cbr\u003ePython.\u003c/i\u003e\n  \u003cbr\u003e\n\u003c/p\u003e\n\n\n\n\n## Introduction\n\n[![en](https://img.shields.io/badge/lang-en-red.svg?style=flat-square)](https://github.com/nothingnothings/zap-scraper)\n[![pt-br](https://img.shields.io/badge/lang-pt--br-green.svg?style=flat-square)](https://github.com/nothingnothings/zap-scraper/blob/master/README.pt-br.md)\n\nThis Python script extracts and 
stores information about listings available on the Zap Imóveis website in a containerized SQL database.\n\n\nThe script uses Selenium for web scraping and BeautifulSoup for parsing the HTML.\n\n\nFor more information about how to use it, read the instructions below.\n\n\n\n\n## Project's Directory Structure\n\n\n```\n.\\\n│\n├── docker\\\n│   └── docker-compose.yml\n│\n├── .env\n├── .env.example\n├── .gitignore\n├── README.md\n├── output_format_example.json\n├── requirements.txt\n├── scraped_page_example.html\n├── scraper-logo.png\n├── test.py\n└── zap.py\n```\n\n## Requirements.txt\n\n```\npymysql\nrequests\nbeautifulsoup4\nselenium\npython-dotenv\n```\n\n## Installation/Usage\n\n1. Run `git clone` to clone the project to your local machine.\n\n2. Create a free account on [ZenRows](https://www.zenrows.com/) to obtain a proxy. After creating the account, copy the proxy URL found at `https://app.zenrows.com/builder`, just below the API key.\n\n3. Add the proxy URL received from ZenRows (e.g., `http://\u003cYOUR_API_KEY\u003e:@proxy.zenrows.com:8001`) to the `.env` file at the root of the project. To do this, rename the `.env.example` file to `.env` and add the proxy URL in this format:\n```\nPROXY_URL=\u003cYOUR_PROXY_URL\u003e\n```\n4. Download and install the [geckodriver](https://github.com/mozilla/geckodriver/releases) release that matches your operating system (Linux, Mac, Windows). The script needs it to work correctly.\n\n5. The `docker-compose.yml` file defines a ready-to-use SQL database. With Docker installed and running, start it with the following commands:\n\n```\ncd docker\ndocker-compose up -d\n```\n\n6. Install the necessary dependencies listed in the `requirements.txt` file using `pip`:\n\n```\npip install -r requirements.txt\n```\n\n7. 
Before running the main script `zap.py`, run the test script `test.py`, which opens a Google page to confirm that everything is working correctly:\n\n```\npython test.py\n```\n\n## Notes\n\n- In the root of the project, the HTML file `scraped_page_example.html` shows the format of the page that the script scrapes.\n- Also in the root of the project, the `output_format_example.json` file shows how each property from the page is inserted into the final SQL table `properties`.\n- The script has a Zap Imóveis version and an Imovel Web version. Choose whichever suits your needs (the Imovel Web version includes a Cloudflare verification bypass, which is needed to access the site's contents).\n\nExample of Zap Imóveis listing URLs, where consecutive pages differ only in the `pagina` parameter:\n\n```\n# first page\nhttps://www.zapimoveis.com.br/venda/apartamentos/sp+sao-paulo/?__ab=sup-hl-pl:newC,exp-aa-test:control,super-high:new,olx:control,phone-page:control,off-no-hl:new,zapcopsmig:control\u0026transacao=venda\u0026onde=,S%C3%A3o%20Paulo,S%C3%A3o%20Paulo,,,,,city,BR%3ESao%20Paulo%3ENULL%3ESao%20Paulo,-23.555771,-46.639557,\u0026tipos=apartamento_residencial\u0026pagina=1\n\n# second page\nhttps://www.zapimoveis.com.br/venda/apartamentos/sp+sao-paulo/?__ab=sup-hl-pl:newC,exp-aa-test:control,super-high:new,olx:control,phone-page:control,off-no-hl:new,zapcopsmig:control\u0026transacao=venda\u0026onde=,S%C3%A3o%20Paulo,S%C3%A3o%20Paulo,,,,,city,BR%3ESao%20Paulo%3ENULL%3ESao%20Paulo,-23.555771,-46.639557,\u0026tipos=apartamento_residencial\u0026pagina=2\n\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnothingnothings%2Fzap-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnothingnothings%2Fzap-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnothingnothings%2Fzap-scraper/lists"}