https://github.com/scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

# ScrapFly Scrapers 🕷️

This repository contains educational example scrapers for popular web scraping targets using the [ScrapFly](https://scrapfly.io) web scraping API and Python.
Most scrapers use a simple web scraping stack:
- Python version 3.10+
- [Scrapfly's Python SDK](https://github.com/scrapfly/python-scrapfly) for sending HTTP requests, bypassing blocking, and parsing HTML with the built-in [parsel](https://pypi.org/project/parsel/) selector.
- [asyncio](https://docs.python.org/3/library/asyncio.html) (part of the Python standard library) for writing concurrent code using the async/await syntax.
- [JMESPath](https://pypi.org/project/jmespath/) and [nested-lookup](https://pypi.org/project/nested-lookup/) for JSON parsing when needed.
- [loguru](https://pypi.org/project/loguru/) for logging.
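
As a rough illustration of how these pieces fit together, here is a minimal sketch that uses only the standard library so it runs offline. It is not code from this repository: `fetch_page`, `find_key`, and `scrape_all` are hypothetical names, `fetch_page` stands in for a real request made through the Scrapfly SDK, and `find_key` is a simplified stand-in for the kind of nested JSON lookup that nested-lookup provides.

```python
import asyncio
import json


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request; returns canned JSON."""
    await asyncio.sleep(0.01)  # simulate network latency
    return json.dumps({"url": url, "data": {"product": {"name": f"item from {url}"}}})


def find_key(document, key):
    """Recursively collect every value stored under `key` in nested dicts/lists."""
    results = []
    if isinstance(document, dict):
        for k, v in document.items():
            if k == key:
                results.append(v)
            results.extend(find_key(v, key))
    elif isinstance(document, list):
        for item in document:
            results.extend(find_key(item, key))
    return results


async def scrape_all(urls, max_concurrency: int = 2):
    """Scrape all URLs concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url):
        async with semaphore:
            body = await fetch_page(url)
        return find_key(json.loads(body), "name")

    # gather() returns results in the same order as the input URLs
    return await asyncio.gather(*(scrape_one(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/product/{i}" for i in range(3)]
    print(asyncio.run(scrape_all(urls)))
```

The semaphore caps how many requests run at once, which keeps a concurrent scraper from flooding the target site.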

For full tutorials on how to scrape these targets (and many others), see the [scrapeguide directory](https://scrapfly.io/blog/tag/scrapeguide/).

## List of Scrapers
Below is the list of available web scrapers for the supported domains along with their scrape guide, sample datasets, and status. 👇


| Domain | Guide | Sample Datasets | Status |
|---|---|---|---|
| Aliexpress.com | How to Scrape Aliexpress.com (2024 Update) | | aliexpress-scraper-status |
| Amazon.com | How to Scrape Amazon.com Product Data and Reviews | | amazon-scraper-status |
| BestBuy.com | How to Scrape BestBuy Product, Offer and Review Data | | bestbuy-scraper-status |
| Bing.com | How to Scrape Bing Search with Python | | bing-scraper-status |
| Booking.com | How to Scrape Booking.com (2023 Update) | | booking-scraper-status |
| Crunchbase.com | How to Scrape Crunchbase in 2024 | | crunchbase-scraper-status |
| Domain.com.au | How to Scrape Domain.com.au Real Estate Property Data | | domaincom-scraper-status |
| Ebay.com | How to Scrape Ebay Using Python (2024 Update) | | ebay-scraper-status |
| Etsy.com | How to Scrape Etsy.com Product, Shop and Search Data | | etsy-scraper-status |
| Fashionphile.com | How to Scrape Fashionphile for Second Hand Fashion Data | | fashionphile-scraper-status |
| Glassdoor.com | How to Scrape Glassdoor (2024 update) | | glassdoor-scraper-status |
| Goat.com | How to Scrape Goat.com for Fashion Apparel Data in Python | | goat-scraper-status |
| Google.com | How to Scrape Google Search Results; How to Scrape Google Maps | | goat-scraper-status |
| Homegate.ch | How to Scrape Homegate.ch Real Estate Property Data | | homegate-scraper-status |
| Idealista.com | How to Scrape Idealista.com in Python - Real Estate Property Data | | idealista-scraper-status |
| Immobilienscout24.de | How to Scrape Immobilienscout24.de Real Estate Data | | immobilienscout24-scraper-status |
| Immoscout24.ch | How to Scrape Immoscout24.ch Real Estate Property Data | | immoscout24-scraper-status |
| Immowelt.de | How to Scrape Immowelt.de Real Estate Data | | immowelt-scraper-status |
| Indeed.com | How to Scrape Indeed.com (2024 Update) | | indeed-scraper-status |
| Instagram.com | How to Scrape Instagram in 2024 | | instagram-scraper-status |
| Leboncoin.fr | How to Web Scrape Leboncoin.fr using Python | | leboncoin-scraper-status |
| Nordstrom.com | How to Scrape Nordstrom Fashion Product Data | | nordstorm-scraper-status |
| Realestate.com.au | How to Scrape Realestate.com.au Property Listing Data | | realestate-scraper-status |
| Realtor.com | How to Scrape Realtor.com - Real Estate Property Data | | realtor-scraper-status |
| Reddit.com | How to Scrape Reddit Posts, Subreddits and Profiles | | Reddit-scraper-status |
| Redfin.com | How to Scrape Redfin Real Estate Property Data in Python | | redfin-scraper-status |
| Rightmove.com | How to Scrape RightMove Real Estate Property Data with Python | | Rightmove-scraper-status |
| Seloger.com | How to Scrape Seloger.com - Real Estate Listing Data | | Seloger-scraper-status |
| Similarweb.com | How to Scrape SimilarWeb Website Traffic Analytics | | Similarweb-scraper-status |
| Stockx.com | How to Scrape StockX e-commerce Data with Python | | Stockx-scraper-status |
| Threads.net | How to scrape Threads by Meta using Python (2024 Update) | | Threads-scraper-status |
| TikTok.com | How To Scrape TikTok in 2024 | | TikTok-scraper-status |
| Tripadvisor.com | How to Scrape TripAdvisor.com (2024 Updated) | | Tripadvisor-scraper-status |
| Trustpilot.com | How to Scrape Trustpilot.com Reviews and Company Data | | Trustpilot-scraper-status |
| Twitter(X).com | How to Scrape X.com (Twitter) using Python (2024 Update) | | Twitter-scraper-status |
| VestiaireCollective.com | How to Scrape Vestiaire Collective for Fashion Product Data | | Vestiaire-Collective-scraper-status |
| G2.com | How to Scrape G2 Company Data and Reviews | | G2-scraper-status |
| Walmart.com | How to Scrape Walmart.com Product Data (2024 Update) | | Walmart-scraper-status |
| Wellfound.com | How to Scrape Wellfound Company Data and Job Listings | | Wellfound-scraper-status |
| Linkedin.com | How to Scrape LinkedIn in 2024 | | LinkedIn-scraper-status |
| Yellowpages.com | How to Scrape YellowPages.com Business Data and Reviews (2024 Update) | | Yellowpages-scraper-status |
| Yelp.com | How to Web Scrape Yelp.com (2024 update) | | Yelp-scraper-status |
| YouTube.com | | | YouTube-scraper-status |
| Zillow.com | How to Scrape Zillow Real Estate Property Data in Python | | Zillow-scraper-status |
| Zoominfo.com | How to Scrape Zoominfo Company Data (2024 Update) | | Zoominfo-scraper-status |
| Zoopla.co.uk | How to Scrape Zoopla Real Estate Property Data in Python | | Zoopla-scraper-status |

## Fair Use and Legal Disclaimer

This repository contains _educational_ reference material that illustrates how accessible web scraping can be. The provided programs are not intended for production web scraping.
That being said, the Scrapfly team is constantly updating and improving all of this code for an optimal experience.

Scrapfly does not offer legal advice. As always, consult a lawyer when creating programs that interact with other people's websites. That said, here is a good general introduction to what NOT to do:
- Do not store PII (personally identifiable information) of EU citizens who are protected by GDPR.
- Do not scrape and repurpose entire public datasets which can be protected by database protection laws in some countries.
- Do not scrape at rates that could damage the website, and scrape only publicly available data.
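
To make the last point concrete, here is a minimal sketch of one way to cap the request rate in an asyncio scraper. It is an illustration only, not code from this repository: the `Throttle` class and the `polite_scrape` helper are hypothetical names, and the fetch step is left as a placeholder comment.

```python
import asyncio
import time


class Throttle:
    """Allow at most one request to start every `min_interval` seconds."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0  # monotonic time at which the next request may start

    async def wait(self):
        # The lock serializes callers so each one reserves its own time slot.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


async def polite_scrape(urls, min_interval: float = 1.0):
    """Visit each URL, spacing request starts by at least `min_interval` seconds."""
    throttle = Throttle(min_interval)
    visited = []
    for url in urls:
        await throttle.wait()
        # Placeholder: a real scraper would send the request here.
        visited.append(url)
    return visited


if __name__ == "__main__":
    print(asyncio.run(polite_scrape(["https://example.com/1", "https://example.com/2"], 0.5)))
```

A suitable interval depends on the target site; the one-second default here is an arbitrary assumption, not a recommendation from Scrapfly.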