Crawler | Ecosyste.ms: Awesome

https://github.com/shin202/flixhq-core

Node.js library that provides an API for obtaining movie information from the FlixHQ website.

api apis core crawler library movies movies-api node-js nodejs scraper typescript

Last synced: 11 Apr 2025

https://github.com/dachcom-digital/pimcore-lucene-search

Pimcore Website Indexer (powered by Zend Search Lucene)

crawler lucene lucenesearch pimcore

Last synced: 07 Mar 2026

https://github.com/novemberde/serverless-crawler-demo

Serverless Architecture Crawler demo

aws crawler demo handson serverless

Last synced: 21 Jul 2025

https://github.com/bartozzz/crawlerr

A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.

crawler jsdom nodejs scraper spider web-crawler

Last synced: 23 Apr 2025

https://github.com/pithyone/zhihu-crawler

Lightweight Zhihu crawler supporting questions, collections, and this month's hottest content

crawler php zhihu

Last synced: 24 Jan 2026

https://github.com/nicolasmure/crawlerdetectbundle

A Symfony bundle for the Crawler-Detect library (detects bots/crawlers/spiders via the user agent)

bot bundle crawler php symfony

Last synced: 30 Jul 2025
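
Crawler-Detect works by matching the request's User-Agent header against known crawler signatures. A minimal Python sketch of that idea follows; the tiny `CRAWLER_PATTERNS` list and the `is_crawler` function are illustrative assumptions, not the library's actual pattern set or API.

```python
import re

# Hypothetical, deliberately tiny pattern list (an assumption for illustration);
# the real Crawler-Detect library ships a much larger, regularly updated set.
CRAWLER_PATTERNS = [
    r"googlebot",
    r"bingbot",
    r"yandex(bot|images)",
    r"baiduspider",
    r"duckduckbot",
    r"crawler",
    r"spider",
    r"bot\b",
]

# Combine all patterns into one case-insensitive regex so each check is a single search.
_CRAWLER_RE = re.compile("|".join(CRAWLER_PATTERNS), re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string matches any known crawler pattern."""
    if not user_agent:
        # A missing or empty User-Agent is treated as "not detected" here.
        return False
    return _CRAWLER_RE.search(user_agent) is not None
```

Compiling the patterns into one regex keeps each lookup to a single search; a production list needs far more entries and regular updates to stay accurate.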

https://github.com/drogbadvc/crawlit

A web crawler based on Scrapy, with 2D visualization and PageRank

crawler scrapy seo streamlit

Last synced: 24 Oct 2025

https://github.com/owenliang/dht

A DHT crawler

bencode crawler dht

Last synced: 13 Jul 2025

https://github.com/weihanli/proxycrawler

Proxy crawler service that crawls proxy IPs and saves them to Redis (topshelf + Quartz.Net + redis)

crawler proxy proxy-ip redis

Last synced: 19 Jun 2025

https://github.com/Ryaang/gpt-web-crawler

A web crawler for GPTs to build knowledge bases

chatgpt crawler gpt-crawler knowledge-base

Last synced: 11 Jul 2025

https://github.com/nstapelbroek/estate-crawler

Scraping the real estate agencies for up-to-date house listings as soon as they arrive!

appartments crawler huurwoningen nederland python real-estate-agencies scrapy scrapy-crawler

Last synced: 18 Jan 2026

https://github.com/generals-space/site-mirror-go

From [Gitee](https://gitee.com/generals-space/site-mirror-go): general-purpose crawler, site-cloning tool, and whole-site downloader

commoncrawl crawler mirror spider

Last synced: 14 Jan 2026

https://github.com/paceaux/selector-finder

Find a CSS selector on a public site

crawler css javascript nodejs screenshots selector-finder

Last synced: 12 Jan 2026

https://github.com/jimouchen/bing-chat-fxxk

New Bing API built with Playwright

bing-api crawler crawler-python gpt

Last synced: 16 Jun 2025

https://github.com/flashnuke/webrecon

A collection of pentesting web scanners

crawler cyber directory-serach dns-enumeration kali-linux pentest pentesting port-scanning python recon reconnaissance scanner security

Last synced: 17 Mar 2025

https://github.com/feng19/spider_man

SpiderMan, a fast, high-level web crawling & scraping framework for Elixir, based on Broadway.

crawler data-mining elixir erlang framework spider

Last synced: 03 Apr 2025

https://github.com/mechazawa/redbetter-wm2

Better.php crawler for Redacted that uses WhatManager

crawler flac redacted seedbox transcoding whatcd whatmanager

Last synced: 15 Mar 2026

https://github.com/bitxx/pholcus

Fixes and improvements to the Go-based henrylee2cn/pholcus crawler framework, tailored to my own needs

crawler golang pholcus

Last synced: 12 Jul 2025

https://github.com/qibinlou/faceplusplus-stars-library-images-crawler

Crawler and image collection for the Face++ starlib celebrity headshot annotation set, for face recognition training

crawler faceplusplus image-recognition images traning

Last synced: 11 Jul 2025

https://github.com/omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 23 Apr 2025

https://github.com/kagami/tistore

:camera: Tistory photo grabber

crawler cross-platform electron tistory

Last synced: 05 May 2025

https://github.com/rajat19/torrent-crawler

Crawls and stores a list of torrent links

bs4 crawler hacktoberfest python3 torrent

Last synced: 08 Apr 2026

https://github.com/tokahuke/lopez

Crawling and scraping the Web for fun and profit

crawler rust scraper seo web-scraping

Last synced: 01 Mar 2026

https://github.com/HHN/crawler4j

Open Source Web Crawler for Java - A fork of yasserg/crawler4j

crawler crawler4j java spider web-crawler web-spider

Last synced: 05 Oct 2025

https://github.com/alessandrodd/googleplay_api

Google Play Unofficial Python 3 API Library

android crawler googleplay googleplay-api playstore

Last synced: 19 Mar 2025

https://github.com/spider-rs/spider-clients

Python, Javascript, and Rust libraries for the Spider Cloud API.

ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider supabase web-scraping

Last synced: 09 Apr 2026

https://github.com/xiongwilee/techweekly

Highly configurable tool for sending tech weekly newsletters by email

crawler nodejs techweekly

Last synced: 19 Apr 2025

https://github.com/thd3r/godork

Advanced & Fast Google Dorking Tool

crawler dork-scanner dorking godork google-dorking google-dorks osint-tool python

Last synced: 09 Apr 2026

https://github.com/jroakes/crawlnchat

A modular web crawling and chat system that allows for ingesting website content through XML sitemaps, converting to vector embeddings, and providing AI-powered chat interfaces through multiple frontend options.

crawler langgraph-python openai pinecone rag

Last synced: 11 Apr 2025

https://github.com/xiyuan-fengyu/ppspider_example

ppspider crawler examples: scraping Bilibili video info and comments, QQ Music info and comments, and Twitter topic comments and user info

bilibili cheerio crawler ppspider puppeteer qq-music spider twitter

Last synced: 04 Oct 2025

https://github.com/testomato/minicrawler

Multiplexing web client supporting HTTP/2 and WHATWG URL compliant parser written in C

agpl c cookie crawler http2 icu multiplexing nghttp2 parser ssl whatwg

Last synced: 12 Mar 2026

https://github.com/ph-7/emails-scraper

:ram: Simple PHP email grabber that extracts emails from pages listed in a .txt file of URLs (one URL per line).

crawler email email-grabber email-scraper grabber php php-scraper scrape scrape-email scraper scraping script

Last synced: 09 Apr 2025

https://github.com/alanshaw/libp2p-dht-scrape-aas

🧹 A libp2p DHT scraper as a service that lets anyone collect and consume DHT data and use it to generate useful reports & visualisations.

crawler dht kademlia libp2p p2p scraper

Last synced: 17 Nov 2025

https://github.com/pyladies-brazil/crawler-tutorial

Data scraping tutorial produced in partnership with JusBrasil

beautifulsoup brazil crawl crawler crawler-python crawling-python pyladies pyladies-brasil pyladies-workshop python python-tutorial raspagem-de-dados requests-python

Last synced: 07 Feb 2026

https://github.com/yokawasa/scrapy-azuresearch-crawler-samples

Scrapy as a Web Crawler for Azure Search Samples

azure azure-search crawler python python3 scrapy search

Last synced: 26 Mar 2025

https://github.com/jianboy/crawl_xuexi

Batch downloader for the machine learning course MOOC videos on the Xuexi Qiangguo app

crawler machine-learning python xuexi

Last synced: 17 Jul 2025

https://github.com/norconex/collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.

crawler filesystem-crawler java norconex-filesystem-collector search-engine

Last synced: 05 Nov 2025

https://github.com/racinmat/premium-downloader

crawler pornhub pornhub-downloader python

Last synced: 07 Apr 2025

https://github.com/postman-open-technologies/openapi-web-search

OpenAPI Web Search: Revolutionizing the Way Developers find API Definitions 🚀

crawler dataset gsoc gsoc-2023 openapi search-engine swagger

Last synced: 10 Apr 2025

https://github.com/petehouston/udemy-crawler

Crawls Udemy course info and saves it in JSON format.

crawler crawling node node-cli udemy udemy-api udemy-crawl

Last synced: 06 Jul 2025

https://github.com/capjamesg/indieweb-search

Source code for the IndieWeb search engine.

crawler indieweb search search-engine

Last synced: 10 May 2025

https://github.com/gruppio/slackwebhooksgithubcrawler

Searches for Slack webhook tokens publicly exposed on GitHub

crawler crawling hack messages nodejs puppeteer slack slack-bot slack-webhook slackbot webhook

Last synced: 15 Apr 2025

https://github.com/HengXin666/BiLiBiLi_DanMu_Crawling

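Scrapes historical/complete danmaku (bullet comments) from Bilibili, including advanced danmaku and BAS danmaku. Working as of 2025. Includes an algorithm that cuts the number of requests, and so speeds up scraping, while losing almost no danmaku; has a GUI and supports resuming an interrupted crawl. It first uses binary search to find the earliest date that has danmaku, then scrapes from there; built-in deduplication and merging of danmaku files.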

bilibili-danmaku crawler danmaku python

Last synced: 27 Mar 2025
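
The project says it locates the earliest date with danmaku by binary search before crawling. A minimal Python sketch of that step, assuming a hypothetical `has_danmaku(day)` check that is monotone (false for every day before the first comment, true from then on); `earliest_true_day` is an illustrative name, not the project's code.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def earliest_true_day(start: date, end: date,
                      predicate: Callable[[date], bool]) -> Optional[date]:
    """Return the first day in [start, end] for which predicate is True.

    Assumes predicate is monotone over the range (False ... False True ... True).
    Returns None if predicate(end) is False, i.e. no day in the range qualifies.
    """
    if not predicate(end):
        return None
    # Search over day offsets from start; offset `hi` is known to satisfy predicate.
    lo, hi = 0, (end - start).days
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(start + timedelta(days=mid)):
            hi = mid          # earliest qualifying day is at mid or earlier
        else:
            lo = mid + 1      # earliest qualifying day is after mid
    return start + timedelta(days=lo)


if __name__ == "__main__":
    first_comment = date(2021, 3, 14)  # made-up value for the demo
    probes = []

    def has_danmaku(day: date) -> bool:
        # Hypothetical stand-in for one API request per probed day.
        probes.append(day)
        return day >= first_comment

    found = earliest_true_day(date(2020, 1, 1), date(2025, 1, 1), has_danmaku)
    print(found, len(probes))
```

Each probe stands in for one request, so a five-year window takes about a dozen requests rather than one per day.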

https://github.com/Actomaton/ActoCrawler

🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

crawler swift

Last synced: 22 Jul 2025

https://github.com/chainski/proxyscraper

cplusplus crawler http https proxies proxy proxy-api proxygrabber proxyscraper proxytool scraper socks4 socks5

Last synced: 01 Apr 2026

https://github.com/RuedigerVoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 17 Apr 2025

https://github.com/nvk681/gumo

A crawler that extracts data from a dynamic webpage. Written in Node.js.

crawler elasticsearch neo4j nodejs

Last synced: 16 Mar 2026

https://github.com/asing1001/movierater

A useful website for finding movie ratings in Chinese and English, built by crawling Yahoo, PTT, and IMDb.

apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript

Last synced: 12 Jan 2026

https://github.com/kangfend/ig-scraper

Instagram hashtag scraper

crawler hashtag-scraper ig-scraper instagram instagram-hashtag-scraper scraper

Last synced: 14 Dec 2025

https://github.com/fanhuaandluomu/qqspider

Scrapes QQ user information (basic details such as QQ number, nickname, birthday, and address) and performs a brief analysis.

crawler python qq spider

Last synced: 01 May 2025

https://github.com/thaoshibe/crawl-original-google-images

Python scripts for crawling original images from Google Images

chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search

Last synced: 28 Oct 2025

https://github.com/waynechang65/ptt-crawler

ptt-crawler is a web crawler module designed to scrape data from PTT.

api crawl crawler javascript nodejs ptt scrape scraper scraping spider typescript web-crawler webcrawler

Last synced: 08 Oct 2025

https://github.com/woojubb/html-article-extractor

A web page content extractor

article-extracting article-extractor crawler crawling extraction extractor

Last synced: 24 Dec 2025

https://github.com/capturr/scraper

All-in-one API to easily scrape data from any website without worrying about captchas and bot detection mechanisms.

captcha cheerio crawler crawling data declarative extract growth-hacking html javascript json jsonld nodejs recaptcha scraper scraping spider typescript web web-scraping

Last synced: 02 Aug 2025

https://github.com/ArchiveTeam/WebArchiver

Decentralized web archiving

archiver archiving crawler decentralized python warc web webarchiving

Last synced: 07 Apr 2025

https://github.com/fmw666/python

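🍋 Python basics, Pygame game programming, Python algorithms and interview questions, four commonly used Python web frameworks, web crawlers, data visualization, and machine learning: seven major Python topics in all!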

algorithm basis crawler files gui learning-notes markdown pygame pyqt5 python3 script web

Last synced: 27 Oct 2025

https://github.com/aman-codes/webevaluator

A web crawling tool that tests websites for SSL, cookie, and ADA compliance and suggests ways to fix any issues.

ada collaborate compliance cookie crawler learn ssl

Last synced: 01 Jul 2025

https://github.com/neuralegion/bright-cli

Command Line Interface (CLI) tool for BrightSec's solutions.

api cli crawler cyber-security devops har nexploit oas secops security typescript

Last synced: 01 Apr 2026

https://github.com/gplumb/netcrawlerdetect

A .net standard port of JayBizzle's CrawlerDetect project (https://github.com/JayBizzle/Crawler-Detect).

bots c-sharp crawler detect dotnet-core dotnet-standard spider user-agent

Last synced: 11 Apr 2025

https://github.com/raintree-technology/docpull

Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output

ai-training-data cli crawler developer-tools documentation llm markdown mcp pypi python rag web-scraping

Last synced: 24 Apr 2026

https://github.com/tokenmill/crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

crawler crawling crawling-framework elasticsearch java scraping storm storm-crawler vaadin

Last synced: 22 Apr 2025

https://github.com/p0dalirius/crawlersuseragents

Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

bugbounty crawler crawlers pentest request tool user-agent web

Last synced: 03 Sep 2025

https://github.com/s045pd/sharingan

We try to find as much of your visible basic footprint on social media as possible. 😤 More sites are coming soon.

asyncio crawler httpx python38 social-network

Last synced: 11 Apr 2025

https://github.com/spider-rs/web-crawling-guides

How-to guides on web crawling and scraping

agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping

Last synced: 30 Jun 2025

https://github.com/zyszys/zhengfang_system_spider

:bug: A small crawler that logs in to the Zhengfang academic management system and scrapes data

crawler python spider zhengfang

Last synced: 16 May 2025

https://github.com/smolijar/offensive-fortune

A script for generating fortune cookies from the funniest and most offensive stuff collected off the Internet.

crawler fortune fortune-cookie vilejoke

Last synced: 06 Sep 2025

https://github.com/axel-dev/anime-tracker

:spider_web: Track all your favorite anime in one place

angular anime anime-scraper crawler scraper web-extension

Last synced: 08 Oct 2025

https://github.com/loomisloud/onion-crawler

Tor website crawler (specific to AlphaBay at the time)

crawler onion parser python tor

Last synced: 11 May 2025

https://github.com/iflycn/hero

百万英雄 (Million Hero) quiz-answering assistant - compatible with all quiz apps

adb android crawler orc python3

Last synced: 09 Jul 2025

https://github.com/xbynet/crawler

A simple and flexible web crawler framework for java.

crawler httpclient java jsoup spider

Last synced: 14 Jan 2026

https://github.com/sigoden/rag-crawler

Crawl a website to generate a knowledge file for RAG

crawler knowledge llm rag

Last synced: 24 Jul 2025

https://github.com/mediamonks/crawler

Crawl your own website with various clients for SEO and indexing purposes.

browserkit crawler crawling php prerender prerenderio seo spider

Last synced: 28 Jul 2025

https://github.com/pourmand1376/persiancrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 28 Oct 2025

https://github.com/wx-chevalier/sentinel-cendertron

Cendertron = Crawler + cendertron. It crawls AJAX-heavy client-side Single Page Applications (SPAs), deploys with Docker, and focuses on scraping requests (page URLs, APIs, etc.) to feed pentest tools (Sqlmap, etc.). Cendertron can be used to extract requests (page URLs, APIs, etc.) from your Web 2.0 pages.

cendertron crawler crawler-cendertron wx-be wx-code wx-pentest

Last synced: 25 Feb 2026

https://github.com/chairco/2017_pycontw_talk

crawler django django-q pycontw scheduled-tasks task

Last synced: 30 Jul 2025

https://github.com/fanyong920/crawlitem

A Google Chrome extension for scraping Taobao and Tmall web pages

crawler javascript taobao tmall

Last synced: 21 Jul 2025

https://github.com/tower1229/crawler

Node.js crawler for cnbeta.com

crawler nodejs

Last synced: 05 Oct 2025

https://github.com/crackcomm/crawl

Lightweight library for scalable crawlers in Go.

crawl crawler go

Last synced: 16 Feb 2026

https://github.com/discovai/discovai-crawl

🕷️ DiscovAI Crawl API(🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

ai api crawler embedding vector-database web-scraping