awesome-crawler

A collection of awesome web crawlers and spiders in different languages.
https://github.com/asciimoo/awesome-crawler

  • Python

    • Scrapy - A fast high-level screen scraping and web crawling framework.
    • django-dynamic-scraper - Creating Scrapy scrapers via the Django admin interface.
    • scrapy-cluster - Uses Redis and Kafka to create a distributed, on-demand scraping cluster.
    • distribute_crawler - Uses Scrapy, Redis, MongoDB, and Graphite to create a distributed spider.
    • pyspider - A powerful spider system.
    • Demiurge - PyQuery-based scraping micro-framework.
    • Scrapely - A pure-python HTML screen-scraping library.
    • Scrapy-Redis - Redis-based components for Scrapy.
    • cola - A distributed crawling framework.
  • Java

    • Spiderman2 - A distributed web crawler framework with support for JavaScript rendering.
    • websphinx - Website-Specific Processors for HTML information extraction.
  • C#

    • SimpleCrawler - A simple spider based on multithreading and regular expressions.
    • ccrawler - Built with C# 3.5. It includes a simple web content categorizer extension that can sort web pages according to their content.
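
SimpleCrawler is described as a spider built on multithreading and regular expressions. The Python sketch below illustrates that general approach only; it is not SimpleCrawler's code (which is written in C#). The names `crawl`, `extract_links`, and the caller-supplied `fetch` hook are hypothetical, chosen so the example can run against an in-memory set of pages instead of the network. Regex-based link extraction is deliberately naive here, and a real HTML parser is more robust.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin, urldefrag

# Naive href extraction with a regular expression (illustrative only).
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs found in href attributes."""
    links = []
    for raw in HREF_RE.findall(html):
        absolute, _fragment = urldefrag(urljoin(base_url, raw))
        links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=50, workers=4):
    """Breadth-first crawl that fetches each batch of URLs on a thread pool.

    `fetch` is a hypothetical hook: url -> html string, or None on failure.
    Returns the set of URLs that were fetched successfully.
    """
    seen = {start_url}          # URLs already queued, to avoid duplicates
    frontier = [start_url]      # URLs waiting to be fetched, in BFS order
    visited = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(visited) < max_pages:
            batch = frontier[: max_pages - len(visited)]
            frontier = frontier[len(batch):]
            # pool.map preserves input order, so zip pairs each URL with its page.
            for url, html in zip(batch, pool.map(fetch, batch)):
                if html is None:
                    continue
                visited.add(url)
                for link in extract_links(url, html):
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "web"; dict.get returns None for unknown URLs.
    pages = {
        "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
        "http://example.test/a": '<a href="http://example.test/">home</a>',
        "http://example.test/b": '<a href="c">C</a>',
        "http://example.test/c": "no links",
    }
    print(sorted(crawl("http://example.test/", pages.get)))
    # ['http://example.test/', 'http://example.test/a',
    #  'http://example.test/b', 'http://example.test/c']
```

Fetching a whole frontier batch in parallel keeps the traversal breadth-first while still overlapping slow network requests; the `seen` set prevents the same URL from being queued twice.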