Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
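The definition above describes a crawler as a bot that systematically browses pages by following links. A minimal sketch of that idea is a breadth-first traversal that stays on one host. The names `crawl`, `LinkParser`, `PAGES` and the `fetch` parameter are illustrative assumptions, not taken from any project listed here, and the in-memory `PAGES` dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url, staying on the same host.

    fetch is a caller-supplied function mapping a URL to HTML text,
    or None when the page cannot be retrieved. max_pages is an
    arbitrary safety limit chosen for this sketch.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory site used in place of real HTTP requests.
PAGES = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="https://other.test/">X</a>',
    "https://example.test/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}

order = crawl("https://example.test/", PAGES.get)
```

Passing `fetch` as a parameter keeps the traversal logic separate from networking, so the same function could be given a real HTTP client, ideally one that also honors robots.txt and rate limits.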
https://github.com/heyihuang826/ncku_course
Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).
Last synced: 01 Mar 2026
https://github.com/sanhphanvan96/php-training-crawler
Simple PHP crawler for training purposes
crawler docker docker-compose nginx php php-fpm
Last synced: 13 Apr 2026
https://github.com/joyceannie/moviespider
This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.
crawler datascience python scrapy spider webscraper
Last synced: 24 Mar 2025
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 18 Mar 2025
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 15 Apr 2026
https://github.com/solracsf/perplexitybot-ips
Collected PerplexityBot IPs
bots crawler ip ipset perplexity
Last synced: 15 Feb 2026
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 23 Mar 2025
https://github.com/thejoin95/free-proxies.info
API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 29 Oct 2025
https://github.com/nyarla/net-paranoid-go
(WIP) Paranoid helpers for crawling untrusted web content
crawler filtering golang helper
Last synced: 14 Jan 2026
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 22 Jun 2025
https://github.com/wilmsn/simple_deye_crawler
A simple crawler to get data from the Deye Inverter using the status webpage
crawler deye fhem inverter shell-script
Last synced: 27 May 2026
https://github.com/seart-group/github-keyword-crawler
A simple and easy-to-deploy script for mining mentions of keywords across various GitHub API endpoints
api-mining crawler dockerized github-api miner mongodb-database python-script
Last synced: 04 Aug 2025
https://github.com/moj124/web_crawler
web_crawler is an asynchronous, gevent-based link crawler that maps all local links reachable from the input webpage URL.
crawler crawler-python links-spider
Last synced: 13 Mar 2025
https://github.com/moparisthebest/nginx-limit-crawlers
Rate-limit crawlers in nginx
Last synced: 14 Mar 2025
https://github.com/ymdarake/otenki-crawler
Yet another weather data scraper.
Last synced: 02 Feb 2026
https://github.com/viko16/hatcher
🐣[WIP] Provides APIs by simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 08 Oct 2025
https://github.com/tom-draper/wiki-crawl
A game of path finding through Wikipedia topics.
api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search
Last synced: 09 Mar 2026
https://github.com/romangw/lukki
Completely free code for a webcrawling bot.
crawler python web-scraping web-scraping-python
Last synced: 08 Oct 2025
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 27 Mar 2025
https://github.com/killianmeersman/wander
Convenient scraping library for Gophers
crawler data-mining golang scraper spider
Last synced: 14 Jan 2026
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 30 Mar 2025
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 22 May 2026
https://github.com/xiangronglin/novel2go
Android app that creates a PDF from a website and sends it to your Kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 31 Jan 2026
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 07 Jun 2026
https://github.com/bernieyangmh/check-link
Checks an entire website and identifies broken links.
Last synced: 14 Jan 2026
https://github.com/knguyen780/web-crawler
about crawl data
crawler jsoup-library scraper selenium-java
Last synced: 25 Jun 2025
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 18 Mar 2025
https://github.com/kyungw00k/stealth-wright
Silent browser automation CLI with stealth capabilities
crawler go playwright stealth-automation
Last synced: 31 May 2026
https://github.com/daitangio/find
Python + SQLite search engine
crawler indexer python search-engine
Last synced: 18 Jan 2026
https://github.com/panagiotisptr/codeforces-companion
A codeforces parser, code tester and testcase generator in Go
codeforces-parser competitions crawler go golang parser test-automation testing
Last synced: 14 Jan 2026
https://github.com/namchee/hackerbits
Web crawler and clustering for the HackerNews website.
Last synced: 09 Oct 2025
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 23 Mar 2025
https://github.com/dappsar/ethglobal-crawler
A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.
Last synced: 09 Oct 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 09 Oct 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 11 May 2025
https://github.com/jiusanzhou/reaper
Distributed Elegant Scraper and Crawler Framework for Rust.
crawler data-scraping rust scraper spider
Last synced: 24 Jul 2025
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio~
Last synced: 15 Mar 2025
https://github.com/xyk2002/aqistudy-crawler
An asynchronous protocol-level crawler for air quality data from https://www.aqistudy.cn/historydata/; it retrieves data quickly and saves it to a CSV file.
Last synced: 22 Aug 2025
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 18 May 2026
https://github.com/zrquan/gatherer
Gatherer is a simple crawler tool
crawler infosec pentest security
Last synced: 14 Jan 2026
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 06 Apr 2025
https://github.com/huakunshen/cron-crawler-template
Web Crawler Cron Job Template running with GitHub Action. Capable of sending email notifications.
Last synced: 15 May 2026
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions / subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 31 Mar 2025
https://github.com/iyowei/fs-deep-walk
Focused on deep scanning of a specified disk location.
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 20 May 2026
https://github.com/humbertodias/go-nie-crawler
Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.
Last synced: 03 Mar 2025
https://github.com/ninja-yubaraj/lootbin
A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.
crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin
Last synced: 11 Oct 2025
https://github.com/andreposman/magic-number
A CLI tool/API to calculate passive income from FIIs
Last synced: 14 Jan 2026
https://github.com/zenoyang/webcrawler
Some crawler code
crawler scrapy spider web-crawler
Last synced: 02 Aug 2025
https://github.com/radityaharya/sitesweeper
Sitesweeper is a Python package that helps you automate your web scraping process, outputting pages to a file
crawler pdf python website-crawler
Last synced: 27 Mar 2025
https://github.com/katronquillo/grimm
Simple search engine for the Brothers Grimm Fairy Tales
Last synced: 24 Apr 2026
https://github.com/prorobot-ai/worker
A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).
crawler golang grpc-server scraper
Last synced: 29 Jul 2025
https://github.com/alphadev3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 26 Dec 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 09 Aug 2025
https://github.com/hackthedev/botnet
Tool to find IPs on the Web, check SSH availability, and brute-force logins with a wordlist. For educational purposes only!
botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web
Last synced: 17 Mar 2025
https://github.com/jenting/compare-drugstore-price
Compare prices between cosmeceutical shops
cosmed crawler golang poya side-project watsons
Last synced: 27 Mar 2025
https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai
This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.
artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec
Last synced: 04 May 2026
https://github.com/ignmaro/new
The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.
bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3
Last synced: 13 Oct 2025
https://github.com/daviddavo/blogspot-crawler
Crawler for Blogspot and Blogger using BeautifulSoup
Last synced: 19 Apr 2026
https://github.com/marcosvbras/twitton
A simple Python library that makes the Twitter Search API easy to use
crawler crawling python spider twitter twitter-api
Last synced: 27 Mar 2025
https://github.com/nabi-allenby/web-crawler
BFS web crawler
crawler docker k8s kubernetes reconnaissance rust rust-lang webcrawler
Last synced: 02 Mar 2026
https://github.com/atasoglu/websense
A modular AI-powered web scraper for data pipelines.
ai automation crawler data-extraction llm parsing scraper structured-output web-scraping
Last synced: 31 Jan 2026
https://github.com/claudio-code/nap-web-crawler
A crawler that finds broken links in the docs of frameworks and languages
Last synced: 07 Jul 2025
https://github.com/kiranjisonawane143/blockchain-data-crawler
🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 06 May 2026
https://github.com/zhima-mochi/wordpress-articles-list-generator
Auxiliary tool
Last synced: 14 Oct 2025
https://github.com/dmarcosl/upshelf-technical-test
Technical test for Upshelf
crawler interview python scraping scrapy spider technical-test web-scraping
Last synced: 09 Apr 2025
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 01 Jul 2025
https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer
An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.
crawler crypto cryptocurrency data-mining datamining information-retrieval llm python
Last synced: 25 Sep 2025
https://github.com/jonesrussell/pipelinex
Firecrawl-style web intelligence pipeline powered by North Cloud
Last synced: 09 Mar 2026
https://github.com/yosh1/mio-crawler
A crawler that retrieves data usage from IIJmio.
Last synced: 10 May 2026
https://github.com/instagram-automations/apify-instagram-scraper
Apify Instagram scraper: a data extraction tool
api apify apify-instagram-scraper automation bot crawler data-mining docker instagram nodejs playwright proxy python scraper social-media
Last synced: 14 Oct 2025
https://github.com/dpbm/opendatasus-crawler
A simple crawler using puppeteer
brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus
Last synced: 14 Apr 2026