https://github.com/scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for +40 popular domains
- Host: GitHub
- URL: https://github.com/scrapfly/scrapfly-scrapers
- Owner: scrapfly
- License: other
- Created: 2023-05-22T09:11:49.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-27T10:28:00.000Z (27 days ago)
- Last Synced: 2025-04-11T23:16:40.646Z (11 days ago)
- Topics: antibot, automation, captcha-bypass, crawler, crawling, crawling-python, datascraping, proxies, python, python-scraper, scraper, scraping, scraping-python, spider, twitter-scraper, web-crawler, web-scraping, web-scraping-python, webscraper, webscraping
- Language: Python
- Homepage: https://scrapfly.io
- Size: 4.69 MB
- Stars: 471
- Watchers: 12
- Forks: 119
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# ScrapFly Scrapers 🕷️
This repository contains educational example scrapers for popular web scraping targets using the [ScrapFly](https://scrapfly.io) web scraping API and Python.
Most scrapers use a simple web scraping stack:
- Python version 3.10+
- [Scrapfly's Python SDK](https://github.com/scrapfly/python-scrapfly) for sending HTTP requests, bypassing blocking, and parsing the HTML with the built-in [parsel](https://pypi.org/project/parsel/) selector.
- [asyncio](https://pypi.org/project/asyncio/) for writing concurrent code using the async/await syntax.
- [JMESPath](https://pypi.org/project/jmespath/) and [nested-lookup](https://pypi.org/project/nested-lookup/) for JSON parsing when needed.
- [loguru](https://pypi.org/project/loguru/) for logging.

To learn more about web scraping, see our full tutorials on how to scrape these targets (and many others) in the [scrapeguide directory](https://scrapfly.io/blog/tag/scrapeguide/).
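
The concurrency part of the stack above can be sketched with the standard library alone. This is a minimal illustration, not code from the scrapers themselves: `fetch_page` is a hypothetical stand-in that simulates a request instead of calling the Scrapfly SDK, and `scrape_many` and `max_concurrency` are names chosen for this example.

```python
import asyncio


# Hypothetical stand-in for a real HTTP call. It is NOT part of the
# Scrapfly SDK; it only simulates latency and returns fake HTML.
async def fetch_page(url: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><title>{url}</title></html>"


async def scrape_many(urls: list[str], max_concurrency: int = 5) -> list[str]:
    """Fetch all URLs concurrently with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fetch_page(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    pages = asyncio.run(scrape_many(urls))
    print(f"fetched {len(pages)} pages")
```

Bounding concurrency with a semaphore keeps the number of simultaneous requests predictable, which is usually preferable to launching every request at once.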
## List of Scrapers
Below is the list of available web scrapers for the supported domains along with their scrape guide, sample datasets, and status. 👇
| Domain | Guide | Sample Datasets | Status |
| --- | --- | --- | --- |
| Amazon.com | How to Scrape Amazon.com Product Data and Reviews | | |
| BestBuy.com | How to Scrape BestBuy Product, Offer and Review Data | | |
| Bing.com | How to Scrape Bing Search with Python | | |
| Booking.com | How to Scrape Booking.com (2023 Update) | | |
| Crunchbase.com | How to Scrape Crunchbase in 2024 | | |
| Domain.com.au | How to Scrape Domain.com.au Real Estate Property Data | | |
| Ebay.com | How to Scrape Ebay Using Python (2024 Update) | | |
| Etsy.com | How to Scrape Etsy.com Product, Shop and Search Data | | |
| Fashionphile.com | How to Scrape Fashionphile for Second Hand Fashion Data | | |
| Glassdoor.com | How to Scrape Glassdoor (2024 update) | | |
| Goat.com | How to Scrape Goat.com for Fashion Apparel Data in Python | | |
| Google.com | How to Scrape Google Search Results - How to Scrape Google Maps | | |
| Homegate.ch | How to Scrape Homegate.ch Real Estate Property Data | | |
| Idealista.com | How to Scrape Idealista.com in Python - Real Estate Property Data | | |
| Immobilienscout24.de | How to Scrape Immobilienscout24.de Real Estate Data | | |
| Immoscout24.ch | How to Scrape Immoscout24.ch Real Estate Property Data | | |
| Immowelt.de | How to Scrape Immowelt.de Real Estate Data | | |
| Indeed.com | How to Scrape Indeed.com (2024 Update) | | |
| Instagram.com | How to Scrape Instagram in 2024 | | |
| Leboncoin.fr | How to Web Scrape Leboncoin.fr using Python | | |
| Nordstrom.com | How to Scrape Nordstrom Fashion Product Data | | |
| Realestate.com.au | How to Scrape Realestate.com.au Property Listing Data | | |
| Realtor.com | How to Scrape Realtor.com - Real Estate Property Data | | |
| Reddit.com | How to Scrape Reddit Posts, Subreddits and Profiles | | |
| Redfin.com | How to Scrape Redfin Real Estate Property Data in Python | | |
| Rightmove.com | How to Scrape RightMove Real Estate Property Data with Python | | |
| Seloger.com | How to Scrape Seloger.com - Real Estate Listing Data | | |
| Similarweb.com | How to Scrape SimilarWeb Website Traffic Analytics | | |
| Stockx.com | How to Scrape StockX e-commerce Data with Python | | |
| Threads.net | How to scrape Threads by Meta using Python (2024 Update) | | |
| TikTok.com | How To Scrape TikTok in 2024 | | |
| Tripadvisor.com | How to Scrape TripAdvisor.com (2024 Updated) | | |
| Trustpilot.com | How to Scrape Trustpilot.com Reviews and Company Data | | |
| Twitter(X).com | How to Scrape X.com (Twitter) using Python (2024 Update) | | |
| VestiaireCollective.com | How to Scrape Vestiaire Collective for Fashion Product Data | | |
| G2.com | How to Scrape G2 Company Data and Reviews | | |
| Walmart.com | How to Scrape Walmart.com Product Data (2024 Update) | | |
| Wellfound.com | How to Scrape Wellfound Company Data and Job Listings | | |
| Linkedin.com | How to Scrape LinkedIn in 2024 | | |
| Yellowpages.com | How to Scrape YellowPages.com Business Data and Reviews (2024 Update) | | |
| Yelp.com | How to Web Scrape Yelp.com (2024 update) | | |
| Zillow.com | How to Scrape Zillow Real Estate Property Data in Python | | |
| Zoominfo.com | How to Scrape Zoominfo Company Data (2024 Update) | | |
| Zoopla.co.uk | How to Scrape Zoopla Real Estate Property Data in Python | | |
## Fair Use and Legal Disclaimer
This repository contains _educational_ reference material that illustrates how accessible web scraping can be. The provided programs are not intended for production web scraping.
That being said, the Scrapfly team is constantly updating and improving this code for an optimal experience.
Scrapfly does not offer legal advice. As always, consult a lawyer when creating programs that interact with other people's websites. Still, here is a good general introduction to what NOT to do:
- Do not store PII (personally identifiable information) of EU citizens who are protected by GDPR.
- Do not scrape and repurpose entire public datasets which can be protected by database protection laws in some countries.
- Do not scrape at rates that could damage the website and scrape only publicly available data.
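
One way to respect the rate guidance above is to enforce a minimum delay between request starts. The following is a rough sketch under stated assumptions: `MinIntervalThrottle` and `run_throttled` are hypothetical names invented for this example, the requests are simulated, and a suitable interval depends entirely on the target website.

```python
import asyncio
import time


class MinIntervalThrottle:
    """Allow at most one request start every `min_interval` seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0  # monotonic time of the next permitted start

    async def wait(self) -> None:
        # The lock serializes callers so each one gets its own time slot.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


async def run_throttled(urls: list[str], min_interval: float) -> list[float]:
    """Return the start time of each simulated request."""
    throttle = MinIntervalThrottle(min_interval)
    start_times: list[float] = []

    async def worker(url: str) -> None:
        await throttle.wait()
        start_times.append(time.monotonic())
        # A real scraper would send the request for `url` here.

    await asyncio.gather(*(worker(u) for u in urls))
    return start_times


if __name__ == "__main__":
    times = asyncio.run(run_throttled(["a", "b", "c"], min_interval=0.5))
    print([round(t - times[0], 2) for t in times])
```

The throttle spaces out request starts regardless of how many tasks are waiting, so adding more workers does not increase the load on the target site.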