Crawler | Ecosyste.ms: Awesome

https://github.com/ronin-rb/ronin-web

ronin-web is a collection of useful web helper methods and commands.

cli crawler hacktoberfest helpers html proxy-server ronin-rb ruby server spider web xml

Last synced: 03 Oct 2025

https://github.com/xiantang/spider

web crawler

crawler python3

Last synced: 14 Apr 2025

https://github.com/spider-rs/spider-nodejs

Spider ported to Node.js

crawler distributed-systems headless-chrome indexer nodejs scraper spider typescript

Last synced: 31 Mar 2025

https://github.com/kant2002/ncrawler

Web Crawler written in C#

crawler scrapper

Last synced: 17 Jul 2025

https://github.com/threekiii/awesome-scrapy

A Scrapy-based code library of data-collection crawlers

appium crawler fiddler python python3 scrapy selenuim spider

Last synced: 21 Aug 2025

https://github.com/himself65/luogucrawler

A Python crawler that scrapes various information from Luogu

crawler python python3

Last synced: 09 Oct 2025

https://github.com/zenrows/scaling-to-distributed-crawling

Repository with the final code for the blog post "Mastering Web Scraping in Python: Scaling to Distributed Crawling".

crawler crawling distributed python python3 scraping spider

Last synced: 18 Mar 2026

https://github.com/moskrc/crawlerdetect

🕷CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.

bot crawler detect python spider user-agent

Last synced: 16 Jan 2026

https://github.com/Maicius/UniversityRecruitment-sSurvey

Using rigorous data to answer the question: "What kinds of companies recruit at what kinds of universities?"

analysis beautifulsoup crawler data redis university

Last synced: 06 Mar 2025

https://github.com/wael-sudo2/facebook-page-info-scraper

Free Facebook pages MetaData Scraping Library - Unlimited Calls

crawler crawling-python crm data-analysis data-mining facebook facebook-apis facebook-page-information facebook-page-scraper facebook-scraper facebook-scraping leadgeneration marketing metadata python scraping scraping-python selenium

Last synced: 04 Feb 2026

https://github.com/mrxujiang/crawel

A rather interesting crawler platform built with Apify, Node, and React

apify crawler node puppeteer react react-hooks umi umi3

Last synced: 13 Apr 2025

https://github.com/jonaslejon/lolcrawler

Headless web crawler for bugbounty and penetration-testing/redteaming

bugbounty crawler docker penetration-testing penetration-testing-tools redteam redteam-tools redteaming

Last synced: 12 Jul 2025

https://github.com/elboletaire/php-crawler

:spider: A simple crawler (spider) written in PHP just for fun, with zero dependencies

crawler php spider

Last synced: 10 Jan 2026

https://github.com/p0dalirius/robotstester

This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.

bugbounty crawler pentesting python robots tool

Last synced: 21 Aug 2025

https://github.com/axetroy/crawler

A crawler framework for Node.js

crawler nodejs

Last synced: 18 Jun 2025

https://github.com/maicius/universityrecruitment-ssurvey

Using rigorous data to answer the question: "What kinds of companies recruit at what kinds of universities?"

analysis beautifulsoup crawler data redis university

Last synced: 28 Apr 2025

https://github.com/scrapfly/python-scrapfly

Scrapfly Python SDK for headless browsers and proxy rotation

crawler headless-browser python scraper scraping scraping-api sdk web-scraper web-scraping

Last synced: 14 Apr 2025

https://github.com/botcity-dev/botcity-framework-web-python

BotCity Framework Web - Python

automation automation-framework crawler python robotic-process-automation rpa selenium testing web webdriver webscraping

Last synced: 05 Apr 2025

https://github.com/rix4uni/uforall

uforall is a fast URL crawler that collects URLs from a number of different sources: AlienVault, Wayback Machine, urlscan, and Common Crawl

alienvault bugbounty commoncrawl crawler osint recon reconnaissance urlscan wayback

Last synced: 15 Apr 2025

https://github.com/veliovgroup/spiderable-middleware

Pre-rendering for JavaScript websites that delivers SSR-level SEO, enhanced link previews, and performance via effortless middleware integration — ideal for PWAs, SPAs, and modern JS-driven apps, websites, and webpages

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 12 Apr 2025

https://github.com/charlespikachu/seleniumlogin

Log in to some websites using Selenium.

crawler selenium selenium-webdriver spider taobao

Last synced: 23 Oct 2025

https://github.com/VeliovGroup/spiderable-middleware

Pre-rendering for JavaScript websites that delivers SSR-level SEO, enhanced link previews, and performance via effortless middleware integration — ideal for PWAs, SPAs, and modern JS-driven apps, websites, and webpages

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 13 May 2025

https://github.com/taseikyo/crawler

:snake: A collection of simple Python crawlers.

baidu-tieba bilibili bing crawler douban pixiv python-crawler python3 youku

Last synced: 19 Oct 2025

https://github.com/kkomelin/insecres

A console tool that finds insecure resources on HTTPS sites

crawler finder https security

Last synced: 22 Jun 2025

https://github.com/VAllens/CrawlerSamples

Sample crawler console apps using Puppeteer and AngleSharp, written in C# 7.1 and built with .NET Core.

anglesharp chsarp crawler dotnetcore headless headless-browsers headless-chrome headless-chromium puppeteer

Last synced: 04 May 2025

https://github.com/ryuchen/deadpool

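A crawler application built on Celery as its core framework. Crawl tasks can be added flexibly, and crawls of multiple sites can run at the same time. All components natively support concurrency at scale and distributed operation, and together with Celery's native distributed task invocation this enables large-scale concurrency.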

celery crawler deadpool python3 spider taobao taobao-spider tmall tmall-spider

Last synced: 21 Mar 2025

https://github.com/armand1m/papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

cache crawler jsdom nodejs scraper scraping typescript web-scraping

Last synced: 28 Jan 2026

https://github.com/cyb3rmx/d00r

Simple directory brute-force tool written in Python.

brute-force bruteforce crawler directory-lister hunt hunter linux login pentesting python3 security security-tools termux-hacking

Last synced: 11 Jul 2025

https://github.com/kylemocode/medium-stat-box

A practical pinned gist that shows your latest Medium status 📌

awesome-pinned-gists crawler github-action github-gists medium-stats

Last synced: 17 Apr 2025

https://github.com/NatsuFox/Tapestry

Tapestry - a lightweight bookmark knowledge base built on Agent Skill Bundle https://natsufox.github.io/Tapestry

agent-skills claude-code codex crawler knowledge-base openclaw workflow

Last synced: 27 Apr 2026

https://github.com/m-ahmadi/tse-client

A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.

browser caching cli cli-app compression crawler data dataset downloader iran node-module stock stock-data stock-market stock-prices tehran ticker tsetmc universal

Last synced: 18 Feb 2026

https://github.com/alwalxed/wayurls

CLI tool for fetching URLs from Wayback Machine, Common Crawl, and VirusTotal.

bugbounty bugcrowd crawler cyber-security cybersecurity golang golang-tools hackerone infosec intigriti osint osint-tool projectdiscovery tomnomnom tools virustotal wayback-machine web web-security

Last synced: 05 Sep 2025

https://github.com/m-haisham/novelsave_sources

A collection of webnovel sources offering varying amounts of scraping capability.

crawler lightnovel scraper

Last synced: 22 Jan 2026

https://github.com/BitTheByte/Domainker

BugBounty Tool

bb bugbounty bugcrowd checker code crawler h1 hackerone hacking hacking-tool injection python rce response struts2 subdomain sudomains

Last synced: 10 Mar 2025

https://github.com/migalabs/armiarma

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network

crawler ethereum libp2p monitoring

Last synced: 21 Aug 2025

https://github.com/bin-huang/nodespider

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

async crawl crawler node pipeline promise spider web

Last synced: 19 Sep 2025

https://github.com/hengxin666/bilibili_danmu_crawling

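Crawls historical and complete danmaku (bullet comments) from Bilibili, including advanced danmaku and BAS danmaku. Working as of 2025. A built-in algorithm reduces the number of requests, and so speeds up crawling, while losing almost no danmaku. Has a GUI and supports resuming a crawl. Uses binary search to find the earliest date that has danmaku before crawling. Includes built-in deduplication and merging of danmaku files.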

bilibili-danmaku crawler danmaku python

Last synced: 24 Jul 2025

https://github.com/iljan/narr

Download audio tracks from Netflix to sample your favorite shows

chrome-devtools-protocol cli crawler downloader music

Last synced: 27 Jul 2025

https://github.com/scrapy-plugins/scrapy-zyte-api

Zyte API integration for Scrapy

crawler plugin proxy scraping scrapy

Last synced: 04 Apr 2025

https://github.com/basemax/googleplaywebserviceapi

Tiny PHP script to crawl information about a specific application in the Google Play Store.

api crawler crawler-php crawlers google-play google-play-api google-play-games google-play-service google-play-services google-play-store google-playstore hacktoberfest hacktoberfest2020 php php-crawler

Last synced: 05 May 2025

https://github.com/crawlerclub/crawler

Crawler4U, a general purpose focused crawler

crawler information-extraction spider

Last synced: 17 Jan 2026

https://github.com/twtrubiks/auto_crawler_ptt_beauty_image

Automatic crawler for PTT Beauty images using Python schedule

beauty crawler heroku image ptt python schedule tutorial

Last synced: 26 Jun 2025

https://github.com/hackfengJam/ArticleSpider

Crawls Zhihu, Jobbole, and Lagou with Scrapy, and uses Elasticsearch and Django to build a search engine website. See README_zh.md (covers the implementation roadmap, distributed crawling, and coping with anti-crawling strategies).

crawler distributed-systems django elasticsearch scrapy

Last synced: 28 Mar 2025

https://github.com/flulemon/sneakpeek

Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It's the best choice for scrapers that have some specific complex scraping logic that needs to be run on a constant basis.

crawler crawler-python crawlers crawling crawling-engine crawling-framework python python3 scraper scraper-api scraper-engine scrapers scraping scraping-framework vue website-crawler

Last synced: 14 Jan 2026

https://github.com/jfreegman/toxcrawler

A Tox DHT network crawler

crawler dht dht-network tox toxcore

Last synced: 14 Apr 2025

https://github.com/scrapingant/scrapingant-client-python

ScrapingAnt API client for Python.

crawler scraper scraping scrapingant scrapy webscraping

Last synced: 23 Sep 2025

https://github.com/safonovpro/node-html-crawler

Simple-to-use Node HTML crawler (spider) for a site's web pages

crawler es6 node spider

Last synced: 12 Mar 2026

https://github.com/gamemann/bestbuy-parser

A personal tool using Python's Scrapy framework to scrape Best Buy's product pages for RTX 3080 TIs and notify if available/not sold out.

3080 automation best bestbuy bot buy crawler parser python python3 rtx scrapy ti

Last synced: 11 Mar 2026

https://github.com/heyingcai/cetty

A crawler framework based on event dispatching

crawler event-dispatcher gather spider

Last synced: 04 May 2025

https://github.com/xfgryujk/taobaoanalysis

A project for practicing NLP by analyzing Taobao reviews

crawler nlp taobao

Last synced: 16 Apr 2025

https://github.com/andreaskoch/gargantua

The fast website crawler

command-line crawler golang xml-sitemap

Last synced: 14 Apr 2025

https://github.com/haxzie-xx/instagram-downloader

Node.js/Express app to retrieve Instagram video/image download URLs

crawler downloader express instagram instagram-scraper nodejs

Last synced: 18 Mar 2025

https://github.com/proxzima/darkspider

Anatomy and visualization of the network structure of the dark web using a multi-threaded crawler

collaborate crawler dark-web extractor github github-pages hacktoberfest networkx onion osint python scraper tor

Last synced: 14 Mar 2026

https://github.com/wolverinn/igxe-c5-buff-csgo-skins-sale-data-catch

Automatically fetches CS:GO skin sale data from igxe.cn, buff, and c5game.com. You can choose specific skins to fetch data for.

crawler csgo-skin

Last synced: 25 Mar 2025

https://github.com/apocelipes/schannel-qt5

A GUI client of schannel powered by therecipe/qt and golang

client-side crawler go golang goqt linux qcharts qt5

Last synced: 07 May 2025

https://github.com/ph-7/crawling-emails

Very simple bash script to crawl email addresses from a specific website.

bash crawler email email-scraper scrape scrape-email scraper scraping shell wget

Last synced: 22 Aug 2025

https://github.com/miry/medup

Download all content from Medium and Dev.to to a local folder

cli crawler devto json markdown medium sync tool

Last synced: 08 Apr 2025

https://github.com/helviojunior/filecrawler

File Crawler indexes files and searches for hard-coded credentials

crawler crawling-python elasticsearch leaks leaks-scanner

Last synced: 08 Apr 2025

https://github.com/subins2000/phpwebcrawler

A Web Crawler Created in PHP

crawler php

Last synced: 05 May 2025

https://github.com/harismuneer/android-apps-downloader

📱 A utility for downloading Android apps from the Google Play Store and Xiaomi App Store (the Chinese App Store).

android-application-downloader android-apps-crawler android-market-scraper android-research android-scraper android-tool app-downloader crawler crawling-tool google-play-application-downloader google-play-store-scraper gplaycli open-source-project python-scraper research-tool scraper scraping-tool wget-utility xiaomi-apps xiaomi-store-scraper

Last synced: 30 Apr 2025

https://github.com/minhhungit/github-action-rss-crawler

Automatically crawl RSS feeds using GitHub Actions

crawler csharp github-actions litedb netcore rss rss-crawler rss-items

Last synced: 15 Jul 2025

https://github.com/juzeon/advanced-php-crawler

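A crawler for Sina blog posts and the wenku8 light novel library. It can fetch and save images and build e-books in one click. A must-have tool for Kindle readers!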

calibre crawler gitbook kindle php sina

Last synced: 20 Feb 2026

https://github.com/pykong/pypergrabber

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper

Last synced: 15 Apr 2025

https://github.com/kshru9/web-crawler

A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures

concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader

Last synced: 23 Mar 2025

https://github.com/code4everything/visual-spider

Try our brand-new desktop productivity tool RunFlow: https://myrest.top/myflow

crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization

Last synced: 04 Oct 2025

https://github.com/a252937166/toutiaocrawler

Example crawler for Toutiaohao (Toutiao accounts)

crawler toutiao

Last synced: 06 Jul 2025

https://github.com/aliosm/kontests

Competitive programming contests schedule

a2oj atcoder codeforces codeforces-gym codeshef competitive-programming crawler csacademy hackerearth hackerrank kickstart leetcode topcoder

Last synced: 23 Oct 2025

https://github.com/dept/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 04 Mar 2026

https://github.com/fanhuaandluomu/sina_spider

Sina Weibo crawler: login, keyword search of posts, and post monitoring

crawler python-2 sina-spider

Last synced: 09 Apr 2025

https://github.com/mamal72/iranian-calendar-events

Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website

crawler events iranian jalali jalali-calendar persian

Last synced: 07 May 2025

https://github.com/debugtalk/webcrawler

A web crawler based on requests-html, mainly intended for URL validation testing.

crawler requests-html web-crawler weblink

Last synced: 15 Apr 2025

https://github.com/deptagency/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 08 Jul 2025

https://github.com/gomjellie/pysaint

[deprecated] u-SAINT Python client

crawler sap soongsil unofficial

Last synced: 30 Apr 2025

https://github.com/howie6879/php-google

Google search results crawler: get the Google search results you need (PHP)

crawler google-search php-google

Last synced: 16 May 2025

https://github.com/gimnathperera/abans-lk-webscraping

🌐 Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan websites

crawler python scrapy spider

Last synced: 30 Jun 2025

https://github.com/mjavadhpour/telegram-member-inviter

Crawling client's groups and channels to invite their members to a target group.

crawler python python3 robot telegram telegram-client telethon

Last synced: 19 Apr 2025

https://github.com/Decodo/Python-scraper-tutorial

A short introduction to scraping with Python with given steps and an example scraper script.

beautifulsoup crawler data-mining data-science github-python json-database-python learning python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 02 May 2025

https://github.com/koallen/google-image-downloader

A script to download images from images.google.com

crawler google-images selenium

Last synced: 18 Jan 2026

https://github.com/ivan-sincek/chad

Search Google Dorks like Chad. / Broken link hijacking tool.

broken-link-takeover bug-bounty crawler ethical-hacking google google-dorking google-dorks offensive-security penetration-testing playwright python red-team-engagement scraper search-engine security social-media social-media-takeover threat-hunting threat-intelligence web-penetration-testing

Last synced: 10 Mar 2026

https://github.com/tychozzz/article_crawler

✨ Article Crawler is a package used to crawl articles in Markdown format from a specific webpage and store them locally in HTML / Markdown formats.

article crawler html markdown pypi python

Last synced: 30 Apr 2025

https://github.com/k1low/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 16 Apr 2025

https://github.com/mattwang44/uspto-patft-web-crawler

Crawler for fetching information of US Patents and PDF bulk download

crawler patent patent-crawler pyqt5 python3 uspto

Last synced: 11 Oct 2025

https://github.com/ndgigliotti/shopify-spy

Extract structured data from Shopify websites.

crawler data data-acquisition data-science dropshipping ecommerce scrape scraper scraping scrapy shopify spider

Last synced: 26 Jan 2026

https://github.com/k1LoW/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 08 Jul 2025

https://github.com/jurooravec/crawlee-one

Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.

actor apify crawlee crawler framework scraper scraping web

Last synced: 09 Feb 2026

https://github.com/simionrobert/bitinsight

:earth_africa: Bittorrent Network Overview through Infohash Indexing, Metadata and IP visualisations of the DHT network

bep51 bittorrent crawler dht elasticsearch infohash javascript nodejs torrent

Last synced: 13 Apr 2025

https://github.com/endermanch/ddom

A simple, open-source, easy to use, and free download manager for malware samples.

crawler downloader malware manager samples

Last synced: 06 Sep 2025

https://github.com/codelibs/fess-crawler

Web/FileSystem Crawler Library

crawler java

Last synced: 07 Apr 2025

https://github.com/italia/publiccode-crawler

publiccode.yml crawler for the Open Source software catalog of Developers Italia

crawler developers-italia hacktoberfest publiccode publiccodeyml

Last synced: 10 Feb 2026

https://github.com/bigsk1/supa-crawl-chat

Integrates Supabase with Crawl4AI and AI Chat to create a powerful web crawling and semantic search solution. Streamlit supabase data visualization. Run all in Docker. API and more!

crawl4ai crawler docker embeddings fastapi gpt-4o openai-api pgvector postgresql scraping streamlit supabase