Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-web-scraping
Best scraping tools collection in town. Find everything you need for scraping, crawling, and processing data from the web.
https://github.com/lukas-bear/awesome-web-scraping
Last synced: 5 days ago
-
Core Libraries
-
Python
- MechanicalSoup - Web automation library
- Scrapy - Comprehensive web scraping framework
- Beautiful Soup - HTML/XML parsing library
- requests - HTTP library for humans
- aiohttp - Asynchronous HTTP client/server
- pyspider - Web crawler with GUI interface
-
JavaScript/Node.js
- Puppeteer - Chrome automation API
- Cheerio - Fast jQuery-like parsing
- Axios - Promise based HTTP client
- node-crawler - Web crawler with jQuery
- Crawlee - Web scraping and browser automation
- Nightmare - High-level browser automation
-
Java
- JSoup - HTML parsing and manipulation
- Selenium WebDriver - Browser automation
- Apache HttpClient - HTTP client library
- crawler4j - Multithreaded crawler
- webmagic - Distributed crawler framework
-
Go
- Colly
- Fetchbot - A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
- Goquery - jQuery-like API for parsing and manipulating HTML documents.
- Rod - High-level browser automation framework powered by Chromium DevTools.
- Playwright-go - headless browser automation.
- Gocrawl - Polite, slim and concurrent web crawler.
-
Ruby
- Nokogiri - HTML/XML parsing
- Mechanize - Automated web interaction
- Kimurai - Modern scraping framework
- Watir - Ruby browser automation
- Anemone - Web spider framework
-
PHP
- DiDOM - A blazing-fast and easy-to-use HTML parser.
- Crawler - A powerful library for rapid web scraping and crawling development.
- Goutte - A lightweight PHP web scraper for effortless data extraction.
-
Programming Languages
Categories
Keywords
- crawler (18)
- scraping (10)
- python (10)
- scraper (10)
- web (8)
- automation (8)
- headless-chrome (8)
- crawling (8)
- golang (6)
- jquery (6)
- go (6)
- ruby (6)
- framework (6)
- javascript (6)
- nodejs (6)
- web-scraping (6)
- headless (6)
- http-client (4)
- parser (4)
- http (4)
- robots-txt (4)
- html (4)
- dom (4)
- cheerio (4)
- testing (4)
- selenium (4)
- firefox (4)
- playwright (4)
- chromium (4)
- xml (4)
- requests (4)
- spider (4)
- apify (2)
- npm (2)
- puppeteer (2)
- typescript (2)
- extract-data (2)
- promise (2)
- selector (2)
- htmlparser2 (2)
- htmlparser (2)
- node-module (2)
- developer-tools (2)
- chrome (2)
- python-library (2)
- pypi (2)
- mechanicalsoup (2)
- python-requests (2)
- aiohttp (2)
- async (2)