Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-22 00:06:47 UTC
https://github.com/ronin-rb/ronin-web-spider
A collection of common web spidering routines
crawler infosec recon ruby scraper spider utils web websecurity
Last synced: 01 Aug 2025
https://github.com/xunzhuo/airspider
A Fast and Light Python Spider Framework 🕷️
asynchronous crawler crawler-python distributed python3 redis spider spider-framework web
Last synced: 23 Mar 2025
https://github.com/helingfeng/stay-reader
📚Miniprogram Book Reader
crawler laravel-application miniprogram php
Last synced: 30 Jul 2025
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 07 May 2025
https://github.com/michaelradu/web-crawler
A Web Crawler developed in Python.
crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling
Last synced: 25 Jul 2025
https://github.com/nerohin/millions-crawler
Homework III of the NCKU course WEB RESOURCE DISCOVERY AND EXPLOITATION. I used a distributed crawler to crawl over a million web pages.
crawler distributed scrapy spider web-crawler
Last synced: 09 Feb 2026
https://github.com/gbolmier/newspaper-crawler
:spider: An autonomous French newspaper crawler based on Scrapy framework
Last synced: 11 Apr 2025
https://github.com/twtrubiks/google-play-store-spider-selenium
Google-Play-Store-spider uses Selenium + Beautiful Soup in Python
beautifulsoup chrome crawler firefox python selenium spider sqlite
Last synced: 15 Apr 2025
https://github.com/flute/instagram-crawler
Instagram crawler that downloads all videos and photos from users or tags
crawler instagram instagram-crawler instagram-downloader
Last synced: 11 Jun 2025
https://github.com/anikhasibul/stackoverflow-scraper-messenger-bot
A Messenger bot that answers messages by scraping Stack Overflow questions and answers
chatbot crawler messenger-bot scrapper stackoverflow
Last synced: 09 Apr 2025
https://github.com/ammarfaizi2/newsscraper
News Scraper
api api-service crawler scraper web-scrapper
Last synced: 14 Apr 2025
https://github.com/SupervisedCo/HyperCrawlTurbo
HypercrawlTurbo is a turbocharged web scraper for extracting URLs from a webpage.
ai crawler ml nlp retrieval retrieval-augmented-generation
Last synced: 29 Jul 2025
https://github.com/vndee/visee
Just a typical search engine in this universe :fire::fire::fire:
crawler django docker e-commerce elasticsearch flask kafka python visual-search
Last synced: 26 Jun 2025
https://github.com/alaouimehdi1995/simplified-search-engine
Multithreaded Web Crawler, Scraper, Indexer
container crawl crawler crawling database docker docker-compose engine index indexer indexing mongodb python python-3 scraper scraping search-algorithm search-engine searching
Last synced: 06 Mar 2025
https://github.com/pi-2r/devoxxfr2025-tock-studio-ia-gen
Project from the Devoxx France 2025 codelab “À la recherche du RAG perdu”: a 3-hour workshop on building an autonomous, local, Internet-free generative AI chatbot based solely on open-source frameworks
ai chatbot crawler devoxx devoxx-fr-2025 docker generative-ai jailbreak kotlin langchain langfuse localai mistral ollama open-source rag scrapoxy scrapy
Last synced: 07 Oct 2025
https://github.com/dori-dev/flask-corona-info
Live Corona statistics and information site with flask.
coronavirus-real-time coronavirus-tracking crawler flask python python3 scrapy spider
Last synced: 13 Sep 2025
https://github.com/kevincobain2000/go-app-reviews-scraper
Apple app store reviews and ratings scraper.
applestore applestoreconnect crawler ios iosapp ratings ratings-extractor reviews reviewscrapper scraper
Last synced: 12 May 2025
https://github.com/azimjohn/musicspider
MusicSpider API
crawler flask google-assistant music
Last synced: 01 Mar 2025
https://github.com/wisecirno/wechat-official-account-toolkit
A toolkit for processing WeChat Official Account articles
crawler image-downloader wechat wechat-official-account
Last synced: 15 Jun 2025
https://github.com/omkarcloud/web-scraping-template
🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 24 Oct 2025
https://github.com/sinramyeon/go_slack_bot
A Go-based Slack crawling bot. Interactive Slack bot written in Go, including RSS feed parsing, web crawling, and GitHub commit alerts
bot crawler github-api go golang rss-feed rss-feed-scraper slack slackapi slackbot
Last synced: 16 Jan 2026
https://github.com/Ivan-Alone/InstaStories-Saver
Program for saving Instagram Stories
api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories
Last synced: 13 Jul 2025
https://github.com/tosone/githubtraveler
Traverse all GitHub users, orgs, and repos.
Last synced: 11 Jun 2025
https://github.com/shawon922/jobs-crawler
Crawl IT/Telecommunication jobs from bdjobs.com
beautifulsoup4 crawler python3
Last synced: 22 Apr 2025
https://github.com/visuellverstehen/t3fetch
Fetches a website (including all subpages), so the TYPO3 cache gets filled.
cache crawler fetch typo3 typo3-extension
Last synced: 03 Aug 2025
https://github.com/tsoliangwu0130/spotify-news
A Flask application that retrieves the latest news about the singer of the song currently playing on your Spotify account.
bootstrap crawler flask oauth2 python3 restful-api spotify-api
Last synced: 26 Apr 2025
https://github.com/blesstosam/registerappleid
A Node.js program for registering an Apple ID automatically
Last synced: 14 Mar 2026
https://github.com/pawod/gis-berlin-rents
A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.
apartment-rents berlin crawler gis immobilienscout24
Last synced: 03 Apr 2025
https://github.com/webcoast-dk/versatile-crawler
Extendable and easy to use crawler extension for TYPO3 CMS
crawler extendable indexing search typo3
Last synced: 18 Oct 2025
https://github.com/igeligel/BackpackLogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 05 May 2025
https://github.com/mediamonks/symfony-crawler-bundle
Implements the crawler package into Symfony
crawler php symfony symfony-bundle
Last synced: 28 Jul 2025
https://github.com/chipscoco/oceanmonkey
OceanMonkey is a high-level distributed web crawling and web scraping framework based on multiple processes and coroutines, used to crawl websites and extract structured data from their pages, like the classic Scrapy framework.
coroutines crawler multiprocessing python python3 scraper scraping spider
Last synced: 11 Sep 2025
https://github.com/a252937166/quick-selenium
Mainly uses the quick-spring and selenium frameworks to crawl information from various dynamic web pages
Last synced: 20 Jul 2025
https://github.com/sabinbajracharya/insta-crawler
Pulls data from instagram and saves it to Firebase for storage and Algolia for search
accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper
Last synced: 03 Jul 2025
https://github.com/spekulatius/spatie-crawler-cached-queue-example
Example to demonstrate the usage of cached queues across multiple requests.
crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler
Last synced: 01 May 2025
https://github.com/bbc2/discolinks
Command-line tool which checks a website for broken links.
broken-links crawler html http link-checker link-checkers link-checking validator web
Last synced: 22 Mar 2025
https://github.com/duongdev/facebook-group-crawler
Facebook Groups Discussions Crawler
crawler facebook groups puppeteer
Last synced: 01 May 2025
https://github.com/insign/spatie-crawler-queue-with-laravel-model
Spatie's Crawler with Laravel Model as Queue
cache crawler eloquent laravel queues spatie spatie-crawler
Last synced: 15 Apr 2025
https://github.com/twtrubiks/pttstatistics
Statistics on popular keywords in PTT board comments or article titles, in Python
crawler ptt ptt-hot-key python statistics
Last synced: 15 Apr 2025
https://github.com/umihico/minigun-requests
Web scraping API to outsource tons of GET & xpath to cloud computing
crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping
Last synced: 13 Apr 2025
https://github.com/activatedgeek/winemag-dataset
Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:
crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting
Last synced: 09 Oct 2025
https://github.com/samiahmedsiddiqui/http-auth
Provides comprehensive security during development by protecting your entire site and your admin pages from brute-force attacks.
admin auth authentication brute-force brute-force-attacks crawl crawler http-auth http-authentication locked login restrict-pages restrict-site wordpress wordpress-plugin
Last synced: 12 Apr 2025
https://github.com/capturr/jsonld-extract
A damn simple tool to extract JSON-LD metadata from a webpage using a jQuery-like API (jQuery, Cheerio, CashDom ...).
cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript
Last synced: 24 Mar 2025
https://github.com/bdadam/metatag-crawler
This is a simple node.js module for scraping meta information from web pages.
crawler metadata nodejs parser
Last synced: 26 Jun 2025
https://github.com/sabinbajracharya/Insta-crawler
Pulls data from instagram and saves it to Firebase for storage and Algolia for search
accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper
Last synced: 12 Apr 2025
https://github.com/trungdq88/movie-showtimes
Web Service & Android Application to look up Vietnam movie showtimes
crawler java movie-showtimes theater
Last synced: 12 Apr 2025
https://github.com/0memo07/web-crawler
Web Crawler with Python
beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler
Last synced: 24 Apr 2025
https://github.com/irq0/llar
🖖 Live Long and Read! A self-hosted news aggregator focused on customizability.
clojure crawler feed-reader hackernews-api news-aggregator news-reader reddit-api rss rss-reader
Last synced: 30 Jan 2026
https://github.com/keul/allanon
A Web crawler that visits a predictable set of URLs and automatically downloads the resources you want from them
Last synced: 28 Apr 2025
https://github.com/petersonjr/MetadataCrawler
A simple tool to extract metadata from relational databases
avro crawler database-schemas java jdbc metadata rdms relational-databases
Last synced: 06 May 2025
https://github.com/flute/coub-crawler
coub.com crawler that downloads all videos.
coub coub-com-crawler coub-crawler crawler
Last synced: 01 Mar 2026
https://github.com/pps-22-scooby/pps-22-scooby
Scala application that allows web crawling and web scraping of web pages given as input with the use of special rules passed to it through the use of a DSL.
crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers
Last synced: 24 Oct 2025
https://github.com/igeligel/backpacklogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 15 May 2025
https://github.com/eight04/ptt-mail-backup
A BBS bot for fetching PTT in-site mail
bbs cli crawler ptt ptt-crawler python python3
Last synced: 05 Jul 2025
https://github.com/sebobo/shel.crawler
Neos based crawler for nodes and sites
Last synced: 12 Apr 2025
https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker
A Python script that gets the prices of certain CeX products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 10 Mar 2025
https://github.com/luizppa/web-crawler
A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.
chilkat cpp crawler webcrawler
Last synced: 17 Aug 2025
https://github.com/luyadev/luya-module-crawler
Crawl a website and provide intelligent search results
crawler hacktoberfest intelligent-search luya search yii2
Last synced: 25 Oct 2025
https://github.com/fanzeyi/torchic
A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.
Last synced: 28 Apr 2025
https://github.com/bajins/scripts_python
Python scripts
crawler faker faker-generator python-3 python3 rclone rclone-client rclone-config rclone-configuration reptile reptile-image reptiles scraper spider
Last synced: 03 Oct 2025
https://github.com/yerkopalma/bash-crawler
:computer: Get a site's links with Bash
Last synced: 05 Aug 2025
https://github.com/bfwg/node-tinycrawler
Tiny web crawler in a nutshell for Node.js
Last synced: 10 Nov 2025
https://github.com/adambankz/tiktok-scraper
A simple, no download scraper for social media platforms like TikTok. Just input parameters and parse useful data. Download TikTok videos with no watermark
crawler no-watermark parse scraper scraper-site tiktok-no-watermark tiktok-scraper
Last synced: 19 Feb 2026
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 14 Jan 2026
https://github.com/vshawn/tutiempo_crawler
a crawler for climate data on en.tutiempo.net
climate-data crawler tutiempo-crawler
Last synced: 15 May 2025
https://github.com/print3m/pathfinder
The ultimate crawler designed for lightning-fast recursive URL scraping.
bugbounty-tool crawler crawlergo go golang information-gathering infosec osint osint-reconnaissance path-extractor pathfinder pentesting scraper webscraping
Last synced: 23 Jun 2025
https://github.com/sweeticelolly/sao_title_bot
A bot that generates flashy academic paper titles
chrome-dr chromedriver crawler generator language-learning language-model numpy python robot scholar scholarly-articles selenium selenium-webdriver
Last synced: 25 Jul 2025
https://github.com/appliedsoul/promise-crawler
Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)
crawler node-crawler nodejs promise-node-crawler spider
Last synced: 28 Feb 2026
https://github.com/omilab/internet-archive-link-extractor
Tool for extracting external links of a URL from Internet Archive snapshots
Last synced: 18 Jul 2025
https://github.com/thesp0nge/nightcrawler-mitm
A Python program that crawls a website and tries to stress it by polluting forms with bogus data
crawler offensive-scripts offensive-security stress-test web-crawler web-crawling
Last synced: 30 Apr 2025
https://github.com/ppoak/crawler
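Notes and scripts related to Python crawlers, plus some tools for automated data collection; automated crawling of Weibo search; high-speed multi-process crawling of Guba comments; automatic movie downloads, etc...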
Last synced: 27 Feb 2026
https://github.com/xvc323/omnidocs
Automated documentation crawler that generates LLM-friendly Markdown from any docs site. Export as single or multi-file, ready for AI ingestion.
crawler documentation llm markdown
Last synced: 27 Jun 2025
https://github.com/vmarcosp/supervise-crawler
:male_detective: Supervise crawler
crawler esy ocaml reasonml webcrawler
Last synced: 13 May 2025
https://github.com/tghoul/spider914j
91 web spider for java.
91porn crawler spring-boot webmagic
Last synced: 11 Jul 2025
https://github.com/markmelnic/mobile-de-crawler
A crawler for mobile.de to index all car listings on the website.
crawler requests scraper sqlite3
Last synced: 08 Oct 2025