Crawler | Ecosyste.ms: Awesome

https://github.com/rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver

Last synced: 14 May 2025

https://github.com/ma6254/FictionDown

biquge crawler fiction golang novels qidian spider

Last synced: 04 May 2025

https://github.com/fffonion/xehentai

Doujinshi downloader

crawler json-rpc python xehentai

Last synced: 16 May 2025

https://github.com/xuxueli/xxl-crawler

A lightweight web crawler framework (Java crawler framework).

crawler distributed flexible java object-oriented spider web xxl-crawler

Last synced: 15 May 2025

https://github.com/polyrabbit/hacker-news-digest

:newspaper: Let ChatGPT Summarize Hacker News for You

chatgpt chatgpt-api crawler data-extraction extract-summaries hacker-news hacker-news-digest hacker-news-reader machine-learning news-aggregator openai openai-api python rss spider

Last synced: 15 May 2025

https://github.com/lixi5338619/lxbook

Code repository for the book 《爬虫逆向进阶实战》 (Advanced Practice in Crawler Reverse Engineering)

android-resever crawler frida java javascript python smali spiders unidbg xposed

Last synced: 13 Apr 2025

https://github.com/jsrei/js-cookie-monitor-debugger-hook

A JS cookie reverse-engineering tool: visual monitoring of JS cookie changes, plus JS cookie hooks for setting conditional breakpoints

crawler js-reverse red-team reverse-engineering userscript web-security-research

Last synced: 15 May 2025

https://github.com/xiaoxiunique/x-kit

A tool for scraping and analyzing X (Twitter) user data and tweets.

crawler kols twitter x

Last synced: 15 May 2025

https://github.com/ohhsodead/FileMasta

A search application to explore, discover and share online files

apps archives books crawler database file file-search files games indexing internet json music search search-engine software subtitles torrents videos web

Last synced: 05 Apr 2025

https://github.com/python3webspider/douyin

DouYin API for Humans, used to crawl popular videos and music

crawler douyin spider videos

Last synced: 04 Apr 2025

https://github.com/StanGirard/seo-audits-toolkit

SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc ...

analysis audits crawler dashboard extractor headers internal-links lighthouse link-extractor python securityheader seo seo-tools serp summarizer

Last synced: 26 Mar 2025

https://github.com/Kharacternyk/dotcommon

What do people have in their dotfiles?

crawler dotfiles

Last synced: 29 Mar 2025

https://github.com/stangirard/seo-audits-toolkit

SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc ...

analysis audits crawler dashboard extractor headers internal-links lighthouse link-extractor python securityheader seo seo-tools serp summarizer

Last synced: 04 Apr 2025

https://github.com/fengzhizi715/NetDiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3

Last synced: 03 May 2025

https://github.com/nhoya/gosint

OSINT Swiss Army Knife

axfr bitbucket crawler forensics git github go golang haveibeenpwnd infosec osint pentest pgp scraper security shodan shodan-api spider telegram

Last synced: 16 Jan 2026

https://github.com/fengzhizi715/netdiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3

Last synced: 04 Apr 2025

https://github.com/rugantio/fbcrawl

A Facebook crawler

crawl crawler facebook python scraper scrapy spider

Last synced: 07 Apr 2025

https://github.com/3nock/spidersuite

Advanced web security spider/crawler

bugbounty cplusplus crawler gui information-gathering osint-tool pentest qt5 recon security-tools spider web-spider webcrawler

Last synced: 29 Oct 2025

https://github.com/go-crawler/go_jobs

A look at the Golang job market

crawler go golang lagou spider

Last synced: 16 Jan 2026

https://github.com/rndinfosecguy/Scavenger

Crawler (Bot) searching for credential leaks on paste sites.

bot crawler credentials leaks osint paste pastebin python

Last synced: 20 Mar 2025

https://github.com/Nhoya/gOSINT

OSINT Swiss Army Knife

axfr bitbucket crawler forensics git github go golang haveibeenpwnd infosec osint pentest pgp scraper security shodan shodan-api spider telegram

Last synced: 30 Mar 2025

https://github.com/josephlimtech/linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

crawler crawling expressjs json linkedin linkedin-bot linkedin-crawler linkedin-profile linkedin-profile-scraper linkedin-scraper linkedin-scraping nodejs profile-data puppeteer scraper scrapers scraping scraping-websites spider website-scraper

Last synced: 04 Apr 2025

https://github.com/linkedtales/scrapedin

LinkedIn Scraper (currently working 2020)

crawler linkedin linkedin-scraper scraper

Last synced: 14 May 2025

https://github.com/speed/newcrawler

Free Web Scraping Tool with Java

crawler docker scraping spider

Last synced: 02 Apr 2025

https://github.com/yhy0/Jie

Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers. A vulnerability-hunting helper tool (vulnerability scanning, information gathering).

apollo-exp bugcrowd crawler hackerone jie scan scanner security-copilot shiro-exp src vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners

Last synced: 07 Sep 2025

https://github.com/ChenZixinn/spider_reverse

crawler python requests spider

Last synced: 28 Mar 2025

https://github.com/setvisible/ArrowDL

ArrowDL (Arrow Downloader) is a download manager for Windows, MacOS and Linux

batch-download crawler download download-manager libtorrent magnet-link mass-downloader mozilla-firefox nativeclient picture-download qt stream-downloader streaming torrent-client torrent-downloader video-downloader web-engine webextensions youtube-dl youtube-downloader

Last synced: 14 Mar 2025

https://github.com/TumblThreeApp/TumblThree

A Tumblr and Twitter Blog Backup Application

backup blog-backup c-sharp crawler csharp dotnet downloader mvvm tumblr tumblr-backup tumblr-backup-application tumblr-blog tumblr-like tumblr-search twitter twitter-backup twitter-backup-application twitter-blog windows wpf

Last synced: 22 Mar 2025

https://github.com/rajatomar788/pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.

archive-tool crawler html html-parser mirror python web webpage

Last synced: 08 Jul 2025

https://github.com/avidlearnerinprogress/python-automation-scripts

Simple yet powerful automation stuff.

beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day

Last synced: 05 Apr 2025

https://github.com/avidLearnerInProgress/python-automation-scripts

Simple yet powerful automation stuff.

beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day

Last synced: 24 Apr 2025

https://github.com/iudicium/pryingdeep

Prying Deep - An OSINT tool to collect intelligence on the dark web.

crawler darkweb go gocolly golang-osint onion osint osint-tools pryingdeep security-tools

Last synced: 14 Jan 2026

https://github.com/c0d3d3v/moodle-dl

Moodle-DL downloads course content fast from Moodle (eg. lecture pdfs)

crawler downloader hacktoberfest moode-crawler moodle moodle-dl moodle-downloader scraper sync

Last synced: 21 Oct 2025

https://github.com/zhuyingda/webster

a reliable high-level web crawling & scraping framework for Node.js.

automation-test automation-ui chromium crawler crawling headless-chrome javascript javascript-framework nodejs nodejs-framework puppeteer scraping-framework spider

Last synced: 15 May 2025

https://github.com/crawljax/crawljax

Crawljax

crawler crawling dom dynamic event-driven-crawling javascript test-generation web-analysis web-testing

Last synced: 16 May 2025

https://github.com/abhisharma404/vault

swiss army knife for hackers

crawler fuzzing hacking hacking-tool information-gathering lfi networking offensive-security osint pentesting port-scanner python rfi scanner scrapy security sqlite ssl-inspection vault xss-vulnerability

Last synced: 02 Apr 2025

https://github.com/nanshihui/Scan-T

A new Python-based crawler with additional features, including network fingerprint search

crawler netfingerprint python sybersecurity

Last synced: 04 May 2025

https://github.com/nanshihui/scan-t

A new Python-based crawler with additional features, including network fingerprint search

crawler netfingerprint python sybersecurity

Last synced: 02 Apr 2025

https://github.com/jaeksoft/opensearchserver

Open-source Enterprise Grade Search Engine Software

crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling

Last synced: 04 Apr 2025

https://github.com/chenjiandongx/mmjpg

👩 Crawler for photo sets of beautiful women (part 1)

crawler meinv

Last synced: 05 Apr 2025

https://github.com/chushuai/wscan

Wscan is a web security scanner that focuses on web security, dedicated to making web security accessible to everyone.

cel-go chromedp crawler headless martian passive-vulnerability-scanner poc sql-injection subdomains testwaf vulnerability-scanner waf webscan wscan xss

Last synced: 11 Jul 2025

https://github.com/dirtyfilthy/freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

crawler darknet hidden-services onion scraper spider tor

Last synced: 07 Apr 2025

https://github.com/stanzhai/html2article

HTML web page main-content extraction

article content crawler html spider topic

Last synced: 16 May 2025

https://github.com/AlexMathew/scrapple

A framework for creating semi-automatic web content extractors

beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression

Last synced: 29 Mar 2025

https://github.com/stanzhai/Html2Article

HTML web page main-content extraction

article content crawler html spider topic

Last synced: 08 Jul 2025

https://github.com/snakem982/pandora-box

A Simple Mihomo GUI.

crawler gui linux mac mihomo windows

Last synced: 28 Jan 2026

https://github.com/scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping

Last synced: 11 Apr 2025

https://github.com/yhy0/jie

Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers.(expectations)

apollo-exp crawler jie scan scanner security-copilot shiro-exp vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners

Last synced: 05 Apr 2025

https://github.com/cyubuchen/free_proxy_website

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

crawler free-proxy-list ip proxy proxy-checker spider

Last synced: 11 May 2025

https://github.com/shaohua0116/ICLR2020-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

conference crawler data-analysis iclr iclr2020 machine-learning visualization

Last synced: 19 Jul 2025

https://github.com/TikHub/TikHub-API-Python-SDK

High-performance asynchronous API for Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, Twitter (X), Captcha Solver, and Temp Mail.

api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin

Last synced: 11 May 2025

https://github.com/snakem982/Pandora-Box

A Simple Mihomo GUI.

crawler gui linux mac mihomo windows

Last synced: 24 Mar 2025

https://github.com/AndyTheFactory/newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping

Last synced: 14 Mar 2025

https://github.com/tasos-py/Search-Engines-Scraper

Search google, bing, yahoo, and other search engines with python

bing crawler google python scraper search-engine yahoo

Last synced: 09 Jul 2025

https://github.com/lgraubner/sitemap-generator

Easily create XML sitemaps for your website.

crawler google seo sitemap sitemap-generator xml-sitemap

Last synced: 15 May 2025

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 16 Mar 2025

https://github.com/gadfly0x/signature_algorithm

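Request-signing and encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee, and bangkokair (Bangkok Airways).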

crawler reverse-engineering spider

Last synced: 27 Apr 2025

https://github.com/0xMassi/webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping

Last synced: 04 Apr 2026

https://github.com/heqin-zhu/music-recover

:musical_note: Convert cached files to MP3 files

crawler mp3 python regex

Last synced: 16 Jan 2026

https://github.com/howie6879/magic_google

Google search results crawler, get google search results that you need

crawler google google-search spider

Last synced: 14 Dec 2025

https://github.com/0x676e67/wreq

An ergonomic Rust HTTP Client with TLS fingerprint

akamai boringssl crawler fingerprint http http-client http2 https impersonate ja3 ja4 requests rust scraper tls tls-client tls-fingerprint web-scraper web-scraping websocket

Last synced: 02 Aug 2025

https://github.com/smuyyh/crawlerforreader

A local web-novel crawler for Android, based on jsoup and XPath

android bookreader crawler jsoup xpath

Last synced: 06 Apr 2025

https://github.com/shaohua0116/ICLR2019-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

crawler crawling-python openreview tutorial

Last synced: 19 Jul 2025

https://github.com/mhmdiaa/second-order

Second-order subdomain takeover scanner

crawler crawling infosec mapping penetration-testing penetration-testing-tools pentesting recon reconnaissance security security-tools web-application-security wordlist wordlist-generator

Last synced: 05 Apr 2025

https://github.com/Josue87/EmailFinder

Search emails from a domain through search engines

crawler osint

Last synced: 05 May 2025

https://github.com/python3spiders/allnewsspider

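The Paper (澎湃新闻), Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News: aims to crawl news from all news portal sites. Commercial use of the collected data is prohibited!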

bbc-news crawler newsapi nytimes sina sohu spider tencent thetimes xwlb

Last synced: 06 Apr 2025

https://github.com/elvisyjlin/media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

crawler instagram pixiv reddit scraper tiktok tumblr twitter

Last synced: 03 Apr 2025

https://github.com/flairnlp/fundus

A very simple news crawler with a funny name

cc-news commoncrawl corpus corpus-tools crawler datasets image-classification image-extraction news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping

Last synced: 08 Jan 2026

https://github.com/mikemeliz/torcrawl.py

Crawl and extract (regular or onion) webpages through TOR network

crawler extractor onion osint python tor

Last synced: 14 Jan 2026

https://github.com/brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

crawler distributed-crawler robot sitemap web-crawler

Last synced: 12 Jan 2026

https://github.com/microsoft/ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

crawler data github github-api github-webhooks ospo

Last synced: 27 Sep 2025

https://github.com/dennis-tra/nebula

🌌 An agnostic network crawler exposing comprehensive peer information and network topology information.

cid crawler filecoin golang hacktoberfest ipfs libp2p

Last synced: 09 Jun 2026

https://github.com/xorbit01/webpalm

🕸️ Crawl in the web network

crawler crawling data data-science datamining go golang hack mining osint redteam spider tool

Last synced: 15 Dec 2025

https://github.com/chishui/jssoup

JavaScript + BeautifulSoup = JSSoup

beautifulsoup crawler html javascript nodejs parser react-native spider

Last synced: 16 May 2025

https://github.com/scrapy-plugins/scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

crawler crawler-detection plugin proxy scraping scrapy

Last synced: 16 May 2025

https://github.com/XORbit01/webpalm

🕸️ Crawl in the web network

crawler crawling data data-science datamining go golang hack mining osint redteam spider tool

Last synced: 14 Apr 2025

https://github.com/duzun/hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

broken-html crawler css-selectors domcrawler fast hquery html html-parser invalid-html jquery-like jquery-selectors parser php psr-0 psr-4 scraper selectors xml xml-parser

Last synced: 14 May 2025

https://github.com/crwlrsoft/crawler

Library for Rapid (Web) Crawler and Scraper Development

crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping

Last synced: 15 May 2025

https://github.com/salimk/rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 12 Apr 2025

https://github.com/salimk/Rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 14 Mar 2025

https://github.com/Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

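⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on a locally hosted Whisper model, with multi-GPU concurrency support and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.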

asr crawler douyin-api fastapi faster-whisper openai-whisper speech-recognition speech-to-text speech-to-text-api tiktok-analytics tiktok-api tiktok-crawler video-analysis whisper-ai whisper-api whisperbot

Last synced: 05 Apr 2025

https://github.com/evil0ctal/fast-powerful-whisper-ai-services-api

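⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on a locally hosted Whisper model, with multi-GPU concurrency support and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.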

asr crawler douyin-api fastapi faster-whisper openai-whisper speech-recognition speech-to-text speech-to-text-api tiktok-analytics tiktok-api tiktok-crawler video-analysis whisper-ai whisper-api whisperbot

Last synced: 16 May 2025

https://github.com/rivermont/spidy

The simple, easy to use command line web crawler.

crawler crawling python python3 web-crawler web-spider

Last synced: 16 Jan 2026

https://github.com/misaka10843/copymanga-downloader

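Downloads manga from copymanga (拷贝漫画) using Python and the copymanga API (no rate limit). Supports batch and per-chapter downloads, fetching and downloading your bookmarked manga, and semi-automatic subscription downloads. Cross-platform (via PyPI). For the NAS version, see copymanga-nasdownloader.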

comic copymanga crawler downloader python python3

Last synced: 14 Jan 2026

https://github.com/commoncrawl/news-crawl

News crawling with StormCrawler - stores content as WARC

apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler

Last synced: 12 Jun 2025

https://github.com/StJudeWasHere/seonaut

Open source SEO audit tool.

audit crawler crawlergo crawlers crawling docker docker-compose go golang multiuser search-engine-optimization seo seo-audit seotools web

Last synced: 23 Apr 2025

https://github.com/xiyuan-fengyu/ppspider

A web spider framework built on puppeteer. Provides flexible task-queue management and task scheduling via decorators, convenient data storage (nedb / mongodb), and data visualization with user interaction.

angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript