Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-20 00:06:46 UTC
https://github.com/natlee/ehentai-crawler
Clone a panda yourself.
anime chrome crawler downloader ehentai ehentai-crawler exhentai python selenium
Last synced: 11 Jul 2025
https://github.com/wahengchang/node-dcard-scraper
An example of implementing a cheerio scraper that extracts images from dcard
cheerio crawler dcard example javascript nodejs npm scraper tutorial
Last synced: 11 Apr 2025
https://github.com/DiscovAI/DiscovAI-crawl
🕷️ DiscovAI Crawl API(🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.
ai api crawler embedding vector-database web-scraping
Last synced: 11 Sep 2025
https://github.com/scrape-do/scrapedo-scrapers
Web scraping examples with Scrape.do 😎
antibot crawler datascraping proxies python scraper spider web-crawler web-scraping webscraper
Last synced: 12 Jun 2026
https://github.com/szczyglis-dev/php-ultra-small-proxy
[PHP] Lightweight proxy with full support for sessions, cookies, POST/FORM submissions, and URL rewriting. The proxy offers two methods of URL rewriting: XML and Regex. It also includes features such as HTTP Auth, caching, and more.
cookies crawler crawler-php css http-client http-proxy networking proxy proxy-server webbrowser website www
Last synced: 05 Oct 2025
https://github.com/sunsetmkt/bilibili-video-reply-crawler
Python crawler for fetching comments on Bilibili videos and articles
bilibili crawler github-actions python python3 spider
Last synced: 11 Apr 2025
https://github.com/DavideViolante/socialblade-com-api
Unofficial APIs for socialblade.com website.
crawler scraper scraping social social-media socialblade
Last synced: 30 Jun 2025
https://github.com/vignif/crawler-google-scholar
This bot crawls and downloads statistics and pictures of researchers from Google Scholar.
crawler downloading-statistics google-scholar indexes statistics
Last synced: 07 Apr 2025
https://github.com/pourmand1376/PersianCrawler
Open source crawler for Persian websites.
crawler machine-learning news python scrapy tasnim text-classification
Last synced: 09 Jul 2025
https://github.com/nothing12321/proxy-grabber
Python-based Massive Proxy Grabber. This bot grabs proxies from public websites so you can use them.
bot checker crawler grabber javascript parser proxies proxies-scraper proxy proxy-checker proxy-list proxy-parser proxy-scraper proxy-scrapper proxy-tool proxygrabber python socks socks4 socks5
Last synced: 15 Apr 2025
https://github.com/shaoxiongdu/skyeye
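A SpringBoot-based crawler project for trending topics across the web. Raw trending-search data is stored in a database, and word-segmentation statistics are stored in Redis, which makes later data analysis easier.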
crawler crawlers mysql redis spring spring-boot
Last synced: 31 Jul 2025
https://github.com/knovour/json-web-crawler
Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.
crawler javascript jquery json web-crawler
Last synced: 06 Oct 2025
https://github.com/bitscoper/bitscoper_cyberkit
A Flutter App: Bluetooth LE Scanner, IPv4 Subnet Scanner, mDNS Scanner, UPnP Scanner, Route Tracer, TCP Port Scanner, Pinger, File Hash Calculator, String Hash Calculator, CVSS Calculator, Base Encoder, Morse Code Translator, QR Code Generator, OGP Data Extractor, Series URI Crawler, DNS Record Retriever, WHOIS Retriever, and Wi-Fi Details Viewer
android calculator crawler cybersecurity dart decoder docker encoder extractor flutter github-action ios mac retriever scanner tracer translator web windows
Last synced: 03 Apr 2026
https://github.com/Knovour/json-web-crawler
Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.
crawler javascript jquery json web-crawler
Last synced: 10 May 2025
https://github.com/victormartinez/shub_cli
A CLI for dealing with the features of ScrapingHub
cli crawler scrapinghub scrapinghub-api scrapy shub-cli spider spiders
Last synced: 08 Feb 2026
https://github.com/arshadkazmi42/github-scanner-local
Locally scan all the repositories of a github organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 12 Aug 2025
https://github.com/rsoury/serverless-web-crawler
Serverless Web Crawler that executes for an indefinite amount of time. Perfect for Crawling Jobs that last longer than a minute and only need to be executed once or twice a month.
boilerplate crawler fargate serverless serverless-framework template
Last synced: 23 Apr 2025
https://github.com/rmncldyo/google-reverse-image-search
A simple Python wrapper designed for leveraging Google's search by image capabilities to perform reverse image searches programmatically.
beautifulsoup beautifulsoup4 crawler google google-image google-image-crawler google-image-scraper google-image-search google-images google-reverse-image-crawler google-reverse-image-scraper google-reverse-image-search image image-search python python3 requests reverse-image-search scraper search-by-image
Last synced: 11 Jul 2025
https://github.com/shadawck/recon-archy
LinkedIn tools (and maybe other sources later) to reconstruct a company hierarchy by scraping relations and job titles
automation company-data crawler cybersecurity geckodriver golang linkedin organisational-analysis osint osinttool reconnaissance scraper selenium
Last synced: 13 Apr 2025
https://github.com/davideviolante/socialblade-com-api
Unofficial APIs for socialblade.com website.
crawler scraper scraping social social-media socialblade
Last synced: 07 May 2025
https://github.com/axelhahn/ahcrawler
Crawler for a search engine on your website, plus website analytics
crawler http-header http-header-check link-checker multilanguage-support php search-engine ssl-certificate-check website-analytics webui
Last synced: 01 May 2026
https://github.com/achannarasappa/locust
Distributed web data discovery and collection framework built for serverless
aws-lambda crawler locust scraping serverless
Last synced: 13 May 2025
https://github.com/risyasin/arachnod
High performance crawler for Nodejs
cheerio crawler javascript nodejs redis scraper spider
Last synced: 05 Apr 2025
https://github.com/gambolputty/newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
corpus crawler news newsarticles scraper
Last synced: 16 Jan 2026
https://github.com/wuseman/wmirror
wmirror allows you to download any website from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.
Last synced: 10 Apr 2025
https://github.com/pceuropa/youtube-crawler
Youtube crawler & scraper based on scrapy. Written in Python3.
crawler csv mariadb python3 scraper scrapy sqlalchemy youtube
Last synced: 04 May 2025
https://github.com/twtrubiks/eynycrawlermega
Crawler for eyny movie Mega and Google links, written in Python
Last synced: 15 Apr 2025
https://github.com/fooock/robots.txt
:robot: robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot
Last synced: 14 Feb 2026
https://github.com/valmisson/ytubes
Search for videos, playlists, channels, movies, live streams, and music on YouTube without an API key.
channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube
Last synced: 28 Oct 2025
https://github.com/kirillplatonov/proxy_manager
Ruby proxy manager. A gem for easy proxy usage in parsers and web bots.
Last synced: 24 Apr 2025
https://github.com/ze3kr/wheres-my-offer
University Admission Portal Checker
crawler offer university university-admission
Last synced: 03 Oct 2025
https://github.com/hearmeneigh/dataset-rising
Toolchain for creating custom datasets and training Stable Diffusion (1.x, 2.x, XL) models and LoRAs
booru crawler danbooru dataset dataset-generation diffusers e621 finetuning gelbooru huggingface huggingface-diffusers imageboard imagebooru lora machine-learning ml mlops sdxl stable-diffusion training
Last synced: 12 Apr 2025
https://github.com/montferret/worker
Containerized Ferret worker
chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker
Last synced: 24 May 2026
https://github.com/selmi-karim/img-cli
An interactive command-line interface built in NodeJS for downloading one or more images to disk from a URL
buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage
Last synced: 12 Jul 2025
https://github.com/omarhashem123/venom
Tool designed for fast crawling and endpoint extraction
Last synced: 12 Jul 2025
https://github.com/nanitefactory/chromebot
Run headless Chrome using Go.
automation bot chrome-devtools chromebot crawler developer-tools golang headless-browser headless-chrome testing web
Last synced: 12 Apr 2025
https://github.com/tn3w/flask-humanify
A strong bot protection system for Flask with many features: rate limiting, special rules for users, web crawler detection, and automatic bot detection.
bot-protection captcha crawler ddos flask python rate-limiting robot
Last synced: 01 Jul 2025
https://github.com/toannd96/crawler_web_js
Uses scrapy-splash combined with a Lua script to crawl websites that use JavaScript (websosanh)
crawler javascript lua-script scrapy scrapy-splash splash
Last synced: 13 May 2025
https://github.com/Selbi182/SpotifyDiscoveryBot
A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!
bot crawler java music spotify spring-boot springboot sqlite
Last synced: 17 Mar 2025
https://github.com/jsrei/javascript-window-listener-library
A basic component for JavaScript reverse engineering that monitors changes to the window object
crawler js-library js-reverse reverse-engineering web-security-research
Last synced: 19 Apr 2025
https://github.com/agenty/scrapingai
Build web scraping agents with Agenty that use AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web
crawler crawling datascraping extract-data scraping webscraper webscraping
Last synced: 12 Apr 2025
https://github.com/wux1an/fake-useragent
Provide random user agent
crawler random spider ua user-agent useragent
Last synced: 11 Sep 2025
https://github.com/kasthack-labs/kasthack.osp
Generator of raw dumps of VK users.
crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte
Last synced: 29 Sep 2025
https://github.com/ruichongliu/Crawler_pubg.op.gg
This is a web crawler for pubg.op.gg, written by Ruichong Liu. It scrapes PUBG (PlayerUnknown's Battlegrounds) game data.
beautifulsoup4 crawler pubg python3 scrape selenium
Last synced: 25 Mar 2025
https://github.com/hoangsonww/ai-gov-content-curator
💡An end-to-end solution for aggregating, summarizing, and displaying news articles using an AI-powered backend, an automated CRON crawler, and a responsive Next.js frontend. It integrates technologies like Express.js, MongoDB, Puppeteer, and GenAI/LLMs to deliver up-to-date, curated content to government staff and other users.
artificial-intelligence axios cheerio crawler cron cronjob docker express expressjs google-generative-ai mongodb mongoose nextjs nodejs puppeteer react shadcn-ui tailwindcss typescript vercel
Last synced: 09 Apr 2025
https://github.com/betta-cyber/netease_music_api
netease cloud music api for python
crawler data-analysis netease-cloud-music
Last synced: 30 Jul 2025
https://github.com/danhje/dead-link-crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.
async broken-links crawler dead-links python python3
Last synced: 23 Jun 2025
https://github.com/shavit/crawlero
Distributed web crawlers. Fault tolerance, user-agent randomizer, RabbitMQ, Tor, PostgreSQL.
crawler marketing-automation marketing-tools pbn proxy rabbitmq tor
Last synced: 15 Jul 2025
https://github.com/ikergarcia1996/questionclustering
Question classifier written in Python 3 that was implemented in the following video: https://youtu.be/qnlW1m6lPoY
clustering crawler deep-learning inteligencia-artificial machine-learning natural-language-processing nlp pln sentiment-analysis techonology unsupervised-machine-learning word-embeddings
Last synced: 05 Oct 2025
https://github.com/douglasdcm/caqui
Run synchronous and asynchronous commands in WebDrivers
appium asynchronous crawler python scraper synchronous webdriver winappdriver winium
Last synced: 01 Apr 2026
https://github.com/clasense4/scrapy-bhinneka-crawler
Scraping bhinneka.com, just for fun
Last synced: 17 Dec 2025
https://github.com/refraction-ray/wos-statistics
A crawler for Web of Science data, focused especially on the analysis of citation data
aiohttp citation crawler webofscience
Last synced: 14 Oct 2025
https://github.com/src-d/rovers
Rovers is a service to retrieve repository URLs from multiple repository hosting providers.
Last synced: 05 May 2025
https://github.com/ravern/gollum
Robots.txt parser and fetcher for Elixir
crawler elixir robots-parser robots-txt
Last synced: 11 Dec 2025
https://github.com/isolateob/exiainvasion
An open-source Chrome extension that obtains Nikke character data from blablalink and generates a progress tracker.
chrome-extension crawler javascript material-ui nikke-goddess-of-victory python react vite
Last synced: 09 Apr 2026
https://github.com/twtrubiks/crawler_click_tutorial
Click tutorial (crawler) using Python
click command-line-tool crawler python tutorial
Last synced: 07 Oct 2025
https://github.com/matheuscas/pynfce
Fetches and extracts data from an NFCe given its access URL.
Last synced: 17 Jun 2025
https://github.com/fedebotu/neurips2022-openreviewdata
Crawl & Visualize NeurIPS 2022 Data from OpenReview
crawler dataset neurips neurips-2022 openreview peer-review review scraper
Last synced: 09 Apr 2025
https://github.com/sadeghhayeri/twitter-friend-connections
Visualizing Twitter Friend Connections
crawler data gephi gephi-visualizations graph jupyter-notebook network-analysis networkx twitter twitter-api twitter-crawler visualization
Last synced: 30 Apr 2025
https://github.com/MontFerret/worker
Containerized Ferret worker
chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker
Last synced: 03 Apr 2025
https://github.com/cyclone-github/spider
URL Spider - web crawler and wordlist / ngram generator
cewl crawler cyclone generator gramify n-gram ngram ngram-generator scaping scraper spider url url-crawler url-spider web web-crawler web-scraping wordlist wordlist-generator
Last synced: 07 Apr 2025
https://github.com/gabrielguarisa/brdata
Brazilian financial market data sources
Last synced: 12 Apr 2025
https://github.com/dev-chenxing/jjwxc-crawler
A simple tool, built with Python and Scrapy, that scrapes and downloads the non-V chapters of any novel from jjwxc.net by book number and generates an editable Word (.docx) document
chinese cli crawler docx download jjwxc open-source python scraping scrapy terminal word
Last synced: 09 Apr 2025
https://github.com/cybercongress/crawler
A toolchain for bringing web2 to web3
cosmos-sdk crawler cyber cyberd ipfs web3 wiki
Last synced: 15 Dec 2025
https://github.com/chinmayrane16/scraping-amazon-for-mobile-details-with-scrapy
Scraping Amazon website using Proxies for extracting Mobile details
amazon-scraper crawler googlebot json proxy pycharm pypiwin32 scrapy user-agents
Last synced: 18 Mar 2025
https://github.com/qieguo2016/doffy
A web automation library based on headless Chrome
casper chrome-headless crawler nightmare uitest
Last synced: 13 Jul 2025
https://github.com/ezzcodeezzlife/scraper-instagram
Scrape data from Instagram without applying for the authenticated API 🎯
auth authentication crawler ig instagram instagram-api instagram-client instagram-scraper javascript js nodejs npm scraper scraper-instagram scraping wrapper
Last synced: 17 Aug 2025
https://github.com/a3r0id/httpscan
Scan a host for open HTTP ports and gain information about the services present.
crawler hacking hacking-tool http low-level penetration-testing pentest pentesting portscan portscanner scan scanner scanner-web scraper security service-discovery
Last synced: 06 Apr 2025
https://github.com/maxgio92/krawler
A crawler for kernel releases distributed by the major Linux distributions.
Last synced: 22 Mar 2025
https://github.com/minicli/curly
Simple Curl Client
crawler curl hacktoberfest php
Last synced: 22 Jun 2025
https://github.com/somnisomni/twitter-account-data-crawler
Crawl and track followers count of Twitter account
crawler crawling follower-count follower-tracker selenium selenium-python twitter twitter-api twitter-crawler twitter-crawling
Last synced: 12 Jul 2025
https://github.com/emijrp/internet-archive
Scripts for Internet Archive
archive archiving crawler digital-preservation internet-archive webpage website
Last synced: 21 Jun 2025
https://github.com/supadata-ai/mcp
Official Supadata MCP Server - Adds powerful video & web scraping to Cursor, Claude and any other LLM clients.
ai crawler llm mcp scrape tiktok transcript whisper youtube
Last synced: 14 Oct 2025
https://github.com/ivan-sincek/scrapy-scraper
Web crawler and scraper based on Scrapy and Playwright's headless browser.
bug-bounty crawler crawling downloader downloading ethical-hacking headless-browser javascript offensive-security penetration-testing python red-team-engagement scraper scraping scrapy security spider spidering web web-penetration-testing
Last synced: 15 Apr 2025
https://github.com/wearetyomsmnv/gptbuster
Generative web directory fuzzer, crawler, and subdomain checker based on ChatGPT
crawler gpt hacking pentesting python3 reconnaissance web
Last synced: 13 Apr 2025
https://github.com/xiaoluoboding/metafy-svg
Easily crawl a website's metadata and generate SVG as a service.
crawler metadata saas serverless-functions svg vercel-serverless
Last synced: 23 Mar 2025
https://github.com/charles-hsiao/python-flightradar
Python airline/flights data crawler
airlines crawler flightradar flightradar24 flights python python-crawler python3
Last synced: 08 Jul 2025
https://github.com/hoc081098/comic_app_server_nodejs
Node.js server for an Android comic app | https://comic-app-081098.herokuapp.com/
comic-app crawler nodejs nodejs-crawler nodejs-typescript typescript
Last synced: 06 Mar 2026
https://github.com/niloysikdar/go-imdb-crawler
Want to know which celebrities have a common birthday with yours? 👀 Get the full data about them. Made using Go + Colly
Last synced: 23 Oct 2025
https://github.com/floschnell/flatcrawl-processors
A set of processors that will instantly inform users via a set of channels (e.g. Telegram) of new flats that are found on different rental websites.
bot crawler flatcrawl flats real-estate rentals-search telegram
Last synced: 01 Feb 2026
https://github.com/bgadrian/warmcache
A simple tool to scan your website to keep your cache hot & ready. Helper tool for Prerender, Squid, CDN, etc.
cache cdn crawler go golang prerender prerenderio squid
Last synced: 13 Apr 2025
https://github.com/gridaco/figma-archives
Figma Files Scraper for Research & Studies
crawler dataset design-database figma machine-learning scrapy selenium
Last synced: 06 Oct 2025
https://github.com/saltyshiomix/web-master
Web mastering tools for my personal services
crawler javascript nodejs scraper typescript web
Last synced: 16 Mar 2025
https://github.com/dxsooo/shortvideocrawl
Short video crawler based on scrapy
crawler kuaishou scrapy spider video-crawler
Last synced: 26 Jul 2025
https://github.com/freekatz/jd_sentiment_analysis
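A simple hands-on project covering crawling, processing, visualization, sentiment analysis, and model evaluation for JD.com product reviews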
Last synced: 09 Apr 2025
https://github.com/1491270550/xueqiu_spider_lqh_lzq
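Xueqiu crawler that efficiently crawls recent comments on Shanghai and Shenzhen A-share stocks and automatically generates a sentiment analysis report in PDF format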
crawler python3 spider xueqiu xueqiu-stock
Last synced: 12 Jun 2025
https://github.com/fanhuaandluomu/qqzoneparse
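Simulates logging in to QQ Zone, retrieves friends' information, and analyzes it (age distribution, gender distribution, location distribution, etc.). For details, see the documentation and the analysis results shown in the 1049755192 folder.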
crawler python27 qqzone spider
Last synced: 01 May 2025
https://github.com/wangy8961/python3-concurrency-pics-01
Crawler that uses multithreading or asynchronous I/O to download the pictures of women shared at http://gank.io/api/data/%E7%A6%8F%E5%88%A9/1000/1
aiohhtp asyncio coroutine crawler progressbar python3 requests threadpool
Last synced: 10 Jun 2025
https://github.com/ototot/judgegirl-scoreboard
A Fancy Scoreboard for JudgeGirl
crawler judgegirl judgegirl-scoreboard php scoreboard tocas-ui tocasui vuejs vuejs2
Last synced: 15 Apr 2025
https://github.com/amirzenoozi/insta-downloader
Download Instagram posts with this script
crawler crawling downloader instagram
Last synced: 20 Jul 2025