https://github.com/lorien/awesome-web-scraping
  
  
    List of libraries, tools and APIs for web scraping and data processing. 
  
List: awesome-web-scraping
captcha-bypass captcha-recaptcha crawler crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python scraping-tool spider web-scraping webscraping
        Last synced: 6 months ago 
- Host: GitHub
 - URL: https://github.com/lorien/awesome-web-scraping
 - Owner: lorien
 - License: other
 - Created: 2015-08-12T19:55:27.000Z (about 10 years ago)
 - Default Branch: master
 - Last Pushed: 2024-12-27T09:17:08.000Z (10 months ago)
 - Last Synced: 2025-05-01T11:05:17.411Z (6 months ago)
 - Topics: captcha-bypass, captcha-recaptcha, crawler, crawling, crawling-framework, crawling-python, crawling-tool, scraping, scraping-framework, scraping-python, scraping-tool, spider, web-scraping, webscraping
 - Language: Makefile
 - Homepage:
 - Size: 473 KB
 - Stars: 6,982
 - Watchers: 232
 - Forks: 808
 - Open Issues: 0
Metadata Files:
- Readme: README.md
 - Contributing: CONTRIBUTING.md
 - License: LICENSE
 
 
Awesome Lists containing this project
- my-awesome-starred - awesome-web-scraping - List of libraries, tools and APIs for web scraping and data processing. (Makefile)
 - awesome - lorien/awesome-web-scraping - List of libraries, tools and APIs for web scraping and data processing. (Makefile)
 - awesome-awesome - awesome-web-scraping
 - awesome-open-source-marketing - lorien/awesome-web-scraping
 - fucking-lists - awesome-web-scraping
 - awesomelist - awesome-web-scraping
 - AwesomeGenomics - scrap the web
 - awesome-rainmana - lorien/awesome-web-scraping - List of libraries, tools and APIs for web scraping and data processing. (Makefile)
 - awesome-golang-repositories - awesome-web-scraping
 - collection - awesome-web-scraping
 - lists - awesome-web-scraping
 - awesome-browser-automation - Awesome Web Scraping - Comprehensive list of tools, programming libraries and web services used in web scraping. (Resources / Related tools)
 - jimsghstars - lorien/awesome-web-scraping - List of libraries, tools and APIs for web scraping and data processing. (Makefile)
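 - awesome-security-collection - **3198** stars
 - StarryDivineSky - lorien/awesome-web-scraping - web-scraping is a resource collection focused on web crawling and data processing, offering developers a complete set of solutions from basic tools to advanced techniques. The project categorizes and gathers a large number of libraries and practical techniques for web scraping, data cleaning, anti-scraping strategies and API interfaces, helping developers carry out data collection and processing tasks efficiently. Its distinguishing feature is a modular design covering four core areas: web crawling libraries (such as Python's Requests and Scrapy, and Node.js's Puppeteer), data parsing tools (such as BeautifulSoup, Cheerio and JSONPath), anti-scraping countermeasures (such as proxy IP pools, headless browsers and browser fingerprint spoofing), and data processing APIs (such as CSV/Excel handling, database interaction and machine learning data preprocessing). Each tool comes with a brief description covering its use cases, core features and usage examples; for example, Scrapy supports asynchronous crawling and distributed deployment, and Puppeteer can simulate browser actions to get around JS rendering restrictions. The project also places particular emphasis on compliance, advising developers to respect the target site's robots.txt and providing secure handling methods such as data masking and encryption. By bringing technical documentation, tutorials and community discussion together in a single repository, the project lowers the learning curve and serves as a practical guide for data engineers and crawler developers. (Network Information Services / Web Crawlers)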
 - awesome-data-analysis - Awesome Web Scraping - List of libraries, tools, and APIs for web scraping and data processing. (🕸️ Web Scraping & Crawling / Resources)
 
README
# Awesome Web Scraping
Lists of packages, services and manuals related to web scraping.
## Topics
* [Python](https://github.com/lorien/web-scraping/blob/master/python.md) - Python packages
* [PHP](https://github.com/lorien/web-scraping/blob/master/php.md) - PHP packages
* [Ruby](https://github.com/lorien/web-scraping/blob/master/ruby.md) - Ruby packages
* [JavaScript](https://github.com/lorien/web-scraping/blob/master/javascript.md) - JavaScript packages
* [Go](https://github.com/lorien/web-scraping/blob/master/golang.md) - Go packages
* [Command Line Tools](https://github.com/lorien/web-scraping/blob/master/cli.md) - tools with a command line interface
* [Web Scraping Manuals](https://github.com/lorien/awesome-web-scraping/blob/master/manuals.md) - list of articles and books teaching web scraping
* [dhamaniasad / HeadlessBrowsers](https://github.com/dhamaniasad/HeadlessBrowsers) - list of (almost) all headless web browsers in existence
* [DNS over HTTPS providers](https://github.com/curl/curl/wiki/DNS-over-HTTPS) - list of DNS over HTTPS providers
* [Awesome Pastebins](https://github.com/lorien/awesome-pastebins) - list of pastebin sites
## Captcha Solving Services
* [https://2captcha.com](https://2captcha.com/?from=3019071)
## Proxy Server Marketplaces
* https://www.blackhatworld.com/forums/proxies-for-sale.112/
* https://forum.antichat.com/forums/147/
## Telegram Discussion Groups
* [@grablab](https://t.me/grablab) - talks in English
* [@grablab_ru](https://t.me/grablab_ru) - talks in Russian
## How to Contribute to This List
See the [Contributing](https://github.com/lorien/web-scraping/blob/master/CONTRIBUTING.md) guide.
## Credits
The list was initially based on data from these sources: [awesome-python](https://github.com/vinta/awesome-python), [awesome-php](https://github.com/ziadoz/awesome-php), [awesome-ruby](https://github.com/markets/awesome-ruby), [ruby-nlp](https://github.com/diasks2/ruby-nlp), [awesome-javascript](https://github.com/sorrycc/awesome-javascript).