Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
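The "systematically browses" part of the definition above can be illustrated with a minimal breadth-first traversal sketch. Everything here is hypothetical and for illustration only: the `crawl` function, its `get_links` callback, and the in-memory `toy_web` graph stand in for real HTTP fetching and link extraction, which this sketch does not attempt.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first from `start`, returning them in visit order.

    `get_links` is an assumed callback that returns the outgoing links of a
    page; a real crawler would fetch and parse the page here.
    """
    seen = {start}          # pages already queued, to avoid revisiting
    queue = deque([start])  # frontier of pages waiting to be visited
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical link graph standing in for the Web.
toy_web = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

visited = crawl("a", lambda url: toy_web.get(url, []))
print(visited)
```

The `seen` set keeps cycles (such as `c` linking back to `a`) from causing repeat visits, and `max_pages` bounds the crawl. A production crawler would also need politeness delays and robots.txt handling, which are omitted here.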
https://github.com/tjdsneto/jcnet-crawler
Extract (scrape) movie schedule info from the JCNet movies page
Last synced: 11 Apr 2026
https://github.com/heyihuang826/ncku_course
Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).
Last synced: 01 Mar 2026
https://github.com/tetreum/xupopter_chrome_extension
Extension to easily create crawling recipes
crawler scrapper scrapping webscraper
Last synced: 04 Apr 2025
https://github.com/thesurlydev/surly-spider
A command line interface for the spider library
crawl crawler rust spider surly surly-spider
Last synced: 16 Feb 2026
https://github.com/billy0402/tibame-python-data-analysis
A learning project from TibaMe Python data analysis course.
ai course crawler jupyter-notebook matplotlib pandas python requests
Last synced: 10 Apr 2026
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 09 Aug 2025
https://github.com/gnehs/twse-financial-ratios-crawler
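Automatically fetches financial ratio data from the Market Observation Post System (公開資訊觀測站) for a specified list of stock codes, and automatically calculates the averages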
Last synced: 29 Apr 2026
https://github.com/hong539/ip_lookup
IP lookup using public or private APIs
Last synced: 19 Aug 2025
https://github.com/sanhphanvan96/php-training-crawler
Simple PHP crawler for training purposes
crawler docker docker-compose nginx php php-fpm
Last synced: 13 Apr 2026
https://github.com/nyarla/net-paranoid-go
(WIP) Paranoid helpers for a crawler of untrusted web content
crawler filtering golang helper
Last synced: 14 Jan 2026
https://github.com/jonesrussell/pipelinex
Firecrawl-style web intelligence pipeline powered by North Cloud
Last synced: 09 Mar 2026
https://github.com/abx123/crawler
Simple lambda function to crawl daily web novel updates.
crawler firebase-database golang lambda-functions
Last synced: 28 Mar 2025
https://github.com/luickk/vulnerability-crawler
Small Python program meant to analyze random sites found on Google for any vulnerabilities!
Last synced: 20 Aug 2025
https://github.com/atasoglu/websense
A modular AI-powered web scraper for data pipelines.
ai automation crawler data-extraction llm parsing scraper structured-output web-scraping
Last synced: 31 Jan 2026
https://github.com/rabattkarte/free-domain-scanner
crawler dns domain domain-name domain-names go golang scanner whois
Last synced: 26 May 2026
https://github.com/bockstaller/europarl-crawler
Crawler for the documents published by the European Parliament
crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union
Last synced: 16 May 2026
https://github.com/mohammadreza-mohammadi94/python-webscraper-projects
A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.
bs4 crawler object-oriented-programming python requests scrapy webscraping
Last synced: 13 Jul 2025
https://github.com/joyceannie/moviespider
This project is used to crawl movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.
crawler datascience python scrapy spider webscraper
Last synced: 24 Mar 2025
https://github.com/viko16/hatcher
🐣 [WIP] Provides APIs through simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 08 Oct 2025
https://github.com/romangw/lukki
Completely free code for a web-crawling bot.
crawler python web-scraping web-scraping-python
Last synced: 08 Oct 2025
https://github.com/jayzhan211/python-crawler-startups
Python crawler learning
Last synced: 20 Mar 2025
https://github.com/killianmeersman/wander
Convenient scraping library for Gophers
crawler data-mining golang scraper spider
Last synced: 14 Jan 2026
https://github.com/intina47/ee_error
Implementation of a web crawler using C++
cpp crawler curl gumbo libcurl stanford-nlp web
Last synced: 31 Jan 2026
https://github.com/tetreum/xupopter_runner
Executes crawling recipes coming from Xupopter Chrome Extension.
crawler scrapper scrapping webscraper
Last synced: 08 Aug 2025
https://github.com/tetreum/xupopter_client
Simple interface to manage Xupopter recipes as well as its runners.
crawler scrapper scrapping webscraper
Last synced: 04 Apr 2025
https://github.com/bernieyangmh/check-link
Checks an entire website and identifies broken links.
Last synced: 14 Jan 2026
https://github.com/moparisthebest/nginx-limit-crawlers
Rate-limit crawlers in nginx
Last synced: 14 Mar 2025
https://github.com/krishpranav/gozap
⚡️ Multi-target ZAP scanning, written in Go
cli crawler go go-crawler golang zap
Last synced: 27 Mar 2025
https://github.com/kyungw00k/stealth-wright
Silent browser automation CLI with stealth capabilities
crawler go playwright stealth-automation
Last synced: 31 May 2026
https://github.com/daitangio/find
Python + SQLite search engine
crawler indexer python search-engine
Last synced: 18 Jan 2026
https://github.com/panagiotisptr/codeforces-companion
A Codeforces parser, code tester and testcase generator in Go
codeforces-parser competitions crawler go golang parser test-automation testing
Last synced: 14 Jan 2026
https://github.com/namchee/hackerbits
Web crawler and clustering for the HackerNews website.
Last synced: 09 Oct 2025
https://github.com/dappsar/ethglobal-crawler
A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.
Last synced: 09 Oct 2025
https://github.com/grayhat12/grawler
A web-based crawler that takes two inputs (search item, number of sites to search) and currently displays readable content in text format, but the code can be modified to display the HTML code.
crawler scraping scraping-websites scrapper scrapy-crawler
Last synced: 27 Mar 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 09 Oct 2025
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 18 Mar 2025
https://github.com/xiangronglin/novel2go
Android app that creates a PDF from a website and sends it to your Kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 31 Jan 2026
https://github.com/mohitk05/drstrange
A simple breadth-first search web crawler
Last synced: 22 Aug 2025
https://github.com/jofaval/open-graph-visualizer
Web scraping showcase of how crawlers retrieve a site's details through the Open Graph protocol
crawler javascript opengraph scraping web web-scraping
Last synced: 08 Sep 2025
https://github.com/kartikmehta8/pycrawler
PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.
Last synced: 13 Sep 2025
https://github.com/pengkobe/my-web-crawler
Automatically pulls blog updates from bloggers. Developed with Angular 2.
Last synced: 18 May 2026
https://github.com/leegeunhyeok/python-gongucrawler
Python 3 crawler for images and detailed information from Gongu Madang (공유마당)
Last synced: 24 Aug 2025
https://github.com/zrquan/gatherer
Gatherer is a simple crawler tool
crawler infosec pentest security
Last synced: 14 Jan 2026
https://github.com/onetail/crawler-with-kafka-docker
Homework on crawling and analysis
Last synced: 18 Mar 2025
https://github.com/jimut123/leaderbehaviour
Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website!
crawler leaderbehaviour newsscraper scrapy timesofindia
Last synced: 16 Jan 2026
https://github.com/waived/google-drive-crawler
Proxy-based crawler to expose public (shared) Google Drive links
crawler crawler-python file-crawler google-drive-api shared-folders web-spider
Last synced: 27 Mar 2025
https://github.com/ninja-yubaraj/lootbin
A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.
crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin
Last synced: 11 Oct 2025
https://github.com/andreposman/magic-number
A CLI tool/API to calculate passive income from FIIs
Last synced: 14 Jan 2026
https://github.com/matheusfaustino/phrawl
Phrawl: A web crawling framework in PHP (or it seems so)
crawler crawling crawling-framework php scraper wip
Last synced: 08 Sep 2025
https://github.com/avsbharadwaj/web_crawler
A basic web crawler that recursively prints out the links and descriptions present on a website
Last synced: 21 Apr 2026
https://github.com/katronquillo/grimm
Simple search engine for the Brothers Grimm Fairy Tales
Last synced: 24 Apr 2026
https://github.com/kofj/octopus
Octopus is open-source software for collecting data from web pages.
Last synced: 15 May 2026
https://github.com/joaooliveirapro/trawlergo
TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.
Last synced: 09 Jun 2026
https://github.com/ashwantmanikoth/intellilsearch
This is an AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 15 Apr 2026
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 15 May 2026
https://github.com/jackfsuia/chats-crawler
Discourse chat data crawling and on-the-fly parsing, ready for LLM instruction fine-tuning.
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 09 Jul 2025
https://github.com/lucasromualdo/glassdoorcrawler
Python crawler that collects job listings from Glassdoor and exports them to Excel
cli crawler glassdoor openpyxl pandas python web-scraping
Last synced: 25 Feb 2026
https://github.com/ignmaro/new
The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.
bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3
Last synced: 13 Oct 2025
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Sep 2025
https://github.com/kahsolt/qzone_mood_dumper
Dump your Qzone mood (说说) history to a local SQL database
Last synced: 25 Aug 2025
https://github.com/nabi-allenby/web-crawler
BFS web crawler
crawler docker k8s kubernetes reconnaissance rust rust-lang webcrawler
Last synced: 02 Mar 2026
https://github.com/xoraus/revieworacle
The proposed system assists users in deciding which product to buy. It gathers reviews, along with product details, from multiple websites that sell the product. In addition, the system is trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 07 May 2026
https://github.com/iamtonmoy0/sitemap-crawler
Sitemap crawler with Golang and goquery
Last synced: 23 Feb 2025
https://github.com/lolyratul025/web-email-bundler
A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.
crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping
Last synced: 22 May 2026
https://github.com/mevljas/gov.si-crawler-playwright
A standalone crawler that crawls only .gov.si web sites using Playwright.
crawler multithreading playwright sqlachemy
Last synced: 19 Jan 2026
https://github.com/hdevlinz/affiliate-chrome-extension
chrome-extension crawler tiktok
Last synced: 14 May 2025
https://github.com/g-ongenae/morphalou-crawler
A Crawler for CNRTL's Morphologie words
crawler french lexical-databases list-of-words words
Last synced: 25 Feb 2025
https://github.com/jonesrussell/north-cloud
A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.
Last synced: 25 Jan 2026
https://github.com/truongdd03/searchengine
A search engine written in C++.
cpp crawler search search-engine
Last synced: 06 Apr 2025
https://github.com/faridfr/dribbble-crawler-php
Dribbble crawler with PHP
crawler dribbble dribbble-crawler php php-crawler user-interface
Last synced: 17 Mar 2025
https://github.com/luminovrym/crawler-tools-js
Crawler Tools Js is an application used for scraping data from a website
crawler crawler-js data js web-scraping
Last synced: 08 Sep 2025
https://github.com/crosscutsaw/iscsicrawler
iscsicrawler is a Bash script that easily crawls files in iSCSI targets.
crawler iscsi iscsi-target iscsiadm
Last synced: 16 Jan 2026
https://github.com/constaf79/pycn
🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.
cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web
Last synced: 14 May 2026