Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-crawlers

https://github.com/fs-hao/awesome-crawlers

Last synced: 4 days ago

All
- Scrapy - 08-25 | A fast high-level screen scraping and web crawling framework. |
- you-get - 08-25 | Dumb downloader that scrapes the web. |
- colly - 08-25 | Fast and Elegant Scraping Framework for Gophers. |
- pyspider - 08-25 | A powerful spider system. |
- newspaper - 08-24 | News, full-text, and article metadata extraction in Python 3 |
- Webmagic - 08-24 | A scalable crawler framework. |
- Goutte - 08-24 | A screen scraping and web crawling library for PHP. |
- portia - 08-24 | Visual scraping for Scrapy. |
- crawlee - 08-25 | A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast. |
- spider-flow - 08-25 | A visual spider framework; it's so good that you don't need to write any code to crawl a website. |
- node-crawler - 08-24 | Node-crawler has a clean, simple API. |
- Nokogiri - 08-24 | A Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support. |
- ferret - 08-23 | Declarative web scraping. |
- headless-chrome-crawler - 08-24 | Headless Chrome crawls with jQuery support |
- Scrapy-Redis - 08-25 | Redis-based components for Scrapy. |
- Crawler4j - 08-24 | Simple and lightweight web crawler. |
- mechanize - 08-21 | Automated web interaction & crawling. |
- node-osmosis - 08-23 | HTML/XML parser and web scraper for Node.js. |
- scrape-it - 08-19 | A Node.js scraper for humans. |
- Hakrawler - 08-24 | Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application |
- dom-crawler - 08-23 | The DomCrawler component eases DOM navigation for HTML and XML documents. |
- scraperjs - 08-15 | A complete and versatile web scraper. |
- RoboBrowser - 08-23 | A simple, Pythonic library for browsing the web without a standalone web browser. |
- distribute_crawler - 08-23 | Uses Scrapy, Redis, MongoDB, and Graphite to create a distributed spider. |
- Hawk - 08-25 | Advanced Crawler and ETL tool written in C#/WPF. |
- WebCollector - 08-17 | Simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes. |
- anthelion - 08-07 | A plugin for Apache Nutch to crawl semantic annotations within HTML pages. |
- dht - 08-24 | BitTorrent DHT Protocol && DHT Spider. |
- httrack - 08-24 | Copy websites to your computer. |
- QueryList - 08-07 | The progressive PHP crawler framework. |
- Heritrix3 - 08-25 | Extensible, web-scale, archival-quality web crawler project. |
- Gecco - 08-18 | An easy-to-use, lightweight web crawler
- spatie/crawler - 08-24 | An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript. |
- Abot - 08-24 | C# web crawler built for speed and flexibility. |
- Gain - 08-11 | Web crawling framework based on asyncio for everyone. |
- gocrawl - 08-23 | Polite, slim and concurrent web crawler. |
- SeimiCrawler - 08-24 | An agile, distributed crawler framework. |
- Scrapely - 08-20 | A pure-python HTML screen-scraping library. |
- go_spider - 08-18 | An awesome Go concurrent crawler (spider) framework. |
- PSpider - 08-24 | A simple spider frame in Python3. |
- aspider - 08-14 | An async web scraping micro-framework based on asyncio. |
- upton - 08-20 | A batteries-included framework for easy web scraping. Just add CSS (or do more). |
- scrape - 08-22 | A simple, higher level interface for Go web scraping. |
- open-source-search-engine - 08-23 | A distributed open source search engine and spider/crawler written in C/C++. |
- cola - 07-29 | A distributed crawling framework. |
- rvest - 08-23 | Simple web scraping for R. |
- php-spider - 08-24 | A configurable and extensible PHP web spider. |
- wombat - 08-18 | Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages. |
- web-scraper-chrome-extension - 08-22 | Web data extraction tool implemented as chrome extension. |
- scrapy-cluster - 08-23 | Uses Redis and Kafka to create a distributed on demand scraping cluster. |
- django-dynamic-scraper - 08-25 | Creating Scrapy scrapers via the Django admin interface. |
- sukhoi - 07-01 | Minimalist and powerful Web Crawler. |
- creeper - 08-23 | The Next Generation Crawler Framework (Go). |
- fetchbot - 08-24 | A simple and flexible web crawler that follows the robots.txt policies and crawl delays. |
- Spidr - 07-06 | Spider a site, multiple domains, certain links or infinitely. |
- Dataflow kit - 08-24 | Extract structured data from web pages. Website scraping. |
- webster - 08-13 | A reliable web crawling framework which can scrape ajax and js rendered content in a web page. |
- laravel-goutte - 08-10 | Laravel 5 Facade for Goutte. |
- ACHE Crawler - 08-17 | An easy to use web crawler for domain-specific search. |
- PHPScraper - 08-23 | PHPScraper is a scraper & crawler built for simplicity. |
- Spark-Crawler - 08-25 | Evolving Apache Nutch to run on Spark. |
- ants-go - 03-13 | An open source, distributed, RESTful crawler engine in Golang. |
- supercrawler - 08-09 | Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits. |
- MSpider - 05-31 | A simple, easy spider using gevent and JS rendering. |
- ebot - 07-05 | A scalable, distributed and highly configurable web crawler. |
- spidy - 08-03 | The simple, easy to use command line web crawler. |
- spider - 08-22 | The fastest web crawler and indexer. |
- pspider - 05-22 | Parallel web crawler written in PHP. |
- js-crawler - 08-06 | Web crawler for Node.js; both HTTP and HTTPS are supported. |
- Cobweb - 01-02 | Web crawler with very flexible crawling options, standalone or using sidekiq. |
- Infinity Crawler - 08-15 | A simple but powerful web crawler library in C#. |
- webBee - 07-27 | A DFS web spider. |
- crawley - 08-10 | Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations. |
- CoCrawler - 08-03 | A versatile web crawler built using modern tools and concurrency. |
- Squidwarc - 05-13 | High fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head |
- brownant - 05-05 | A lightweight web data extracting framework. |
- crawler - 07-15 | Scala DSL for web crawling. |
- RubyRetriever - 08-06 | RubyRetriever is a Web Crawler, Scraper & File Harvester. |
- scrala - 03-15 | Scala crawler (spider) framework, inspired by Scrapy. |
- Demiurge - 06-02 | PyQuery-based scraping micro-framework. |
- web-scraper - 05-27 | Web Scraping Toolkit using HTML and CSS Selectors or XPath expressions. |
- crawlzone/crawlzone - 08-04 | Crawlzone is a fast asynchronous internet crawling framework for PHP. |
- ferrit - 11-02 | Ferrit is a web crawler service written in Scala using Akka, Spray and Cassandra. |
- SkyScraper - 03-05 | An asynchronous web scraper / web crawler using async / await and Reactive Extensions. |
- Apache Nutch | Highly extensible, highly scalable web crawler for production environments. |
- JSoup | Scrapes, parses, manipulates and cleans HTML. |
- Open Search Server | A full set of search functions. Build your own indexing strategy. Parsers extract full-text data. The crawlers can index everything. |
- Spiderman | A scalable, extensible, multi-threaded web crawler. |
- pholcus - 08-24 | A distributed, high concurrency and powerful web crawler. |
- x-ray - 08-19 | Web scraper with pagination and crawler support. |
- MechanicalSoup - 08-24 | A Python library for automating interaction with websites. |
- DotnetSpider - 08-24 | A cross-platform, lightweight spider developed in C#. |
- simplecrawler - 08-24 | Event driven web crawler. |
- StormCrawler - 08-22 | An open source collection of resources for building low-latency, scalable web crawlers on Apache Storm |

Programming Languages

Python (27), Java (14), Go (12), JavaScript (11), PHP (9), Ruby (4), HTML (3), C# (3), Scala (2), R (2)
