Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/luminati-io/awesome-web-scraping
A list of libraries, tools, and APIs for web scraping and data processing. Find everything you need for extracting, managing, and processing data from the web, from HTTP libraries to browser automation tools and proxy services.
List: awesome-web-scraping
- Host: GitHub
- URL: https://github.com/luminati-io/awesome-web-scraping
- Owner: luminati-io
- Created: 2024-11-06T12:59:18.000Z (2 days ago)
- Default Branch: main
- Last Pushed: 2024-11-06T14:54:33.000Z (2 days ago)
- Last Synced: 2024-11-06T15:47:21.719Z (2 days ago)
- Topics: crawling, data, data-collection, frameworks, golang, guides, http-requests, java, javascript, perl, php, proxies, python, r, ruby, rust, scraping, scraping-tool, web-scraping, webscraping
- Homepage: https://brightdata.com
- Size: 57.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Awesome Web Scraping by Bright Data
[![Promo](https://github.com/luminati-io/Amazon-scraper/blob/main/images/Proxies%20and%20scrapers%20GitHub%20bonus%20banner.png)](https://brightdata.com/?promo=github15)

***Limited time promotion: Bright Data is matching your first deposit, up to $500!***
Awesome Web Scraping by Bright Data is a collection of resources, tools, and guides for web scraping. It covers libraries in multiple programming languages, proxy integration, CAPTCHA-solving services, automation tips, and free dataset samples to help you handle common web scraping challenges.
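Two of the building blocks covered by the lists below, an HTTP client and an HTML parser, are the core of most scrapers. As a rough, non-authoritative sketch of how they fit together, here is a version that uses only the Python standard library. `fetch_html`, `extract_links`, and `LinkCollector` are hypothetical helper names, and the `User-Agent` value is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page URL.
                self.links.append(urljoin(self.base_url, value))


def fetch_html(url: str, user_agent: str = "Mozilla/5.0 (placeholder)") -> str:
    """Download a page, sending a custom User-Agent header (placeholder value)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html: str, base_url: str) -> list[str]:
    """Parse HTML and return every link it contains, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links(fetch_html(url), url)` downloads a page and lists its links. In practice, the libraries in each language list replace these stdlib pieces with more capable HTTP clients and parsers.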
## Topics
* [Python](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/python.md) - A collection of Python libraries, tools, and frameworks for web scraping, data parsing, export, and processing, with support for anti-bot bypass, proxy integration, and automation.
* [PHP](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/php.md) - A collection of PHP libraries, frameworks, and tools for web scraping, data parsing, export, and automation, featuring solutions for proxy integration, CAPTCHA solving, and task scheduling.
* [Ruby](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/ruby.md) - A collection of Ruby resources for web scraping, data parsing, and automation, covering libraries for HTTP clients, parsers, proxy integration, CAPTCHA solving, and task scheduling.
* [JavaScript](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/javascript.md) - A collection of JavaScript resources for web scraping, data parsing, and automation, featuring libraries for HTTP clients, parsers, proxy integration, CAPTCHA solving, user-agent spoofing, and task scheduling.
* [Go](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/go.md) - A collection of Go tools and libraries for web scraping, parsing, and data automation, including HTTP clients, proxy integration, CAPTCHA solving, serialization, and task scheduling.
* [R](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/r.md) - A collection of R libraries and tools for web scraping, data parsing, automation, and export, with support for HTTP clients, proxy integration, CAPTCHA solving, and user-agent spoofing.
* [Rust](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/rust.md) - A collection of Rust tools and libraries for web scraping, parsing, and data automation, including HTTP clients, proxy integration, CAPTCHA handling, and browser automation.
* [Perl](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/perl.md) - A collection of Perl tools and libraries for web scraping, data parsing, and automation, covering HTTP clients, proxy integration, CAPTCHA solving, and data export.
* [Java](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/java.md) - A collection of Java tools and libraries for web scraping, parsing, and automation, including HTTP clients, proxy integration, CAPTCHA solving, data processing, and scheduling.
* [Web Scraping Guides, Tips, and Tricks](https://github.com/luminati-io/Awesome-Web-Scraping/blob/main/guides.md) - A comprehensive document of web scraping guides, tips, and tricks for efficiently navigating web scraping challenges, handling anti-bot measures, optimizing proxy use, and much more.

## Recommended CAPTCHA Solving Services
* [Bright Data's CAPTCHA Solver](https://brightdata.com/products/web-unlocker/captcha-solver)
* [2Captcha](https://2captcha.com/)

## Recommended Proxy Types
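Whichever proxy type you choose, client-side integration usually means routing requests through a proxy endpoint, often with credentials embedded in the proxy URL. Below is a minimal sketch using Python's standard `urllib`, assuming a hypothetical HTTP proxy at `proxy.example.com:8080`. The real host, port, and username format come from your provider, and `make_proxy_opener` is a hypothetical helper name.

```python
from urllib.parse import quote
from urllib.request import OpenerDirector, ProxyHandler, build_opener


def make_proxy_opener(host: str, port: int, username: str, password: str) -> OpenerDirector:
    """Build a urllib opener that sends http and https traffic through one proxy.

    host, port, username, and password are placeholders; use your provider's values.
    """
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # One proxy URL serves both schemes; HTTPS requests are tunneled via CONNECT.
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)
```

`make_proxy_opener("proxy.example.com", 8080, "user", "pass").open("https://example.com/", timeout=10)` would then fetch the page through the proxy. With the third-party `requests` library, the equivalent is passing `proxies={"http": proxy_url, "https": proxy_url}` to `requests.get`.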
* [Residential Proxies](https://brightdata.com/proxy-types/residential-proxies) - Suited to large-scale, complex projects that need IP addresses belonging to real residential users.
* [Datacenter Proxies](https://brightdata.com/proxy-types/datacenter-proxies) - A cost-effective, high-speed option for large-scale scraping of websites with less strict anti-bot measures.

## Free Dataset Samples
Skip scraping completely and get the data you need. Download 1000+ records for free!
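If you download one of the samples below, a quick way to inspect it in Python is to load it into a list of dictionaries. This sketch assumes the file is either a CSV with a header row or a JSON array of objects. Check the actual format and file names in the repository you pick. `load_records` is a hypothetical helper name.

```python
import csv
import json
from pathlib import Path


def load_records(path: str) -> list[dict]:
    """Load a downloaded sample file into a list of dicts.

    Assumption: the file is CSV with a header row or a JSON array of objects.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    # "utf-8-sig" also strips a byte-order mark if the file has one.
    if suffix == ".csv":
        # newline="" lets the csv module handle line endings itself.
        with p.open(newline="", encoding="utf-8-sig") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with p.open(encoding="utf-8-sig") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data
    raise ValueError(f"unsupported file type: {suffix}")
```

For example, `len(load_records("sample.csv"))` gives the record count, and the keys of the first record show the available fields. Note that CSV values are returned as strings.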
* [Amazon data](https://github.com/luminati-io/Amazon-dataset-samples)
* [Facebook data](https://github.com/luminati-io/Facebook-dataset-samples)
* [Zillow data](https://github.com/luminati-io/Zillow-dataset-samples)
* [LinkedIn data](https://github.com/luminati-io/LinkedIn-dataset-samples)
* [Crunchbase data](https://github.com/luminati-io/Crunchbase-dataset-samples)
* [Glassdoor data](https://github.com/luminati-io/Glassdoor-dataset-samples)
* [Target data](https://github.com/luminati-io/Target-dataset-samples)
* [Indeed data](https://github.com/luminati-io/Indeed-dataset-samples)
* [Walmart data](https://github.com/luminati-io/Walmart-dataset-samples)
* [Airbnb data](https://github.com/luminati-io/Airbnb-dataset-samples)
* [Shopee data](https://github.com/luminati-io/Shopee-dataset-samples)
* [Shein data](https://github.com/luminati-io/Shein-dataset-samples)
* [TikTok data](https://github.com/luminati-io/TikTok-dataset-samples)
* [Google Maps data](https://github.com/luminati-io/Google-Maps-dataset-samples)
* [Twitter data](https://github.com/luminati-io/Twitter-X-dataset-samples)
* [B2B data](https://github.com/luminati-io/B2B-business-dataset-samples)
* [ZoomInfo data](https://github.com/luminati-io/ZoomInfo-dataset-samples)
* [Pinterest data](https://github.com/luminati-io/Pinterest-dataset-samples)

## Popular Web Scraping Videos (Bright Data's Collaborations)
* [I turned Tinder into a pet adoption app](https://www.youtube.com/embed/_DAb1XDsaHM)
* [3 million dollar project ideas for developers](https://www.youtube.com/embed/outB8eBDzD4)
* [Overcoming web scraping challenges, price alert monitoring with Puppeteer, NodeJS, and Hono](https://www.youtube.com/embed/TmOumwzswyU)
* [What's the best Python web scraping library?](https://www.youtube.com/embed/CwUADWr5nAI)
* [How to create custom datasets to train LLMs using Bright Data](https://www.youtube.com/embed/oTI41JHkCoc)
* [Real estate end to end data engineering using AI](https://www.youtube.com/embed/Qx6BAVqnMrs)
* [eCommerce web scraping tutorial (Puppeteer, Cheerio, and Node.js)](https://www.youtube.com/embed/BGzK0xd-F5A)
* [How to scrape any website (ft. scraping browser)](https://www.youtube.com/embed/tcFz6NY3zpc)
* [Easiest way to web scraping using Playwright](https://www.youtube.com/embed/VH3gj1J_Ba8)
* [The ultimate guide to Python & ChatGPT data analysis](https://www.youtube.com/embed/eISqvRLfzTg)
* [Build a fullstack SEO rank tracker app with Next.js and Bright Data](https://www.youtube.com/embed/3oy8Mqc8zec)

**For more web scraping videos, visit our [Web Data Masterclass](https://brightdata.com/web-data-masterclass)**