Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-01 00:06:39 UTC
https://github.com/jongwony/boardgame_finder
A project that crawls the entire board game category on Namuwiki so that specific filters can be applied to it.
asyncio crawler namuwiki python38
Last synced: 27 Feb 2026
https://github.com/basemax/my-site-url-finders
A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.
crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder
Last synced: 15 Oct 2025
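The same-domain link following and URL filtering described for my-site-url-finders can be sketched with only the Python standard library. This is a hypothetical illustration, not that project's code: the function name `same_domain_links` and the `SKIP_EXTENSIONS` and `SKIP_PATH_PREFIXES` filter lists are assumptions chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical filter lists; a real URL finder would make these configurable.
SKIP_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip")
SKIP_PATH_PREFIXES = ("/admin", "/login")


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return absolute, de-duplicated URLs on page_url's host, in page order."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue  # off-site link, or mailto:/javascript: and similar
        path = parts.path.lower()
        if path.endswith(SKIP_EXTENSIONS):
            continue  # unwanted file type
        if path.startswith(SKIP_PATH_PREFIXES):
            continue  # unwanted path
        if absolute not in found:
            found.append(absolute)
    return found
```

A recursive crawler would call `same_domain_links` on each returned URL it has not visited yet, which keeps the crawl inside the starting domain.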
https://github.com/supratikchatterjee16/serp_bot
A generic SERP bot that can be used with just about any search engine.
bot crawler python requests scraping search serp user-agent-spoofer
Last synced: 14 Dec 2025
https://github.com/dimitar0528/crawlitics
An AI-powered Next.js and Python-based ecommerce web crawler, scraper and data-analyst platform that transforms scattered product data into clear market insights.
crawler nextjs product-analysis python scraper
Last synced: 08 Sep 2025
https://github.com/juan-kabbali/glassdoor-linkedin-web-scrapper
CLI application that acts as a web scraper to retrieve Glassdoor and LinkedIn information
Last synced: 29 Jan 2026
https://github.com/birdroad1/server-pinger
Server pinger for Minecraft written in C++
cpp crawler make minecraft minecraft-scanner postgres scanner server
Last synced: 14 Apr 2026
https://github.com/bujosa/aldebaran
Example of using App Engine with Python 3, a ThreadPool, and web scraping
appengine crawler flask gcp python3 thread-pool
Last synced: 19 Oct 2025
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 12 Mar 2025
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 21 Mar 2025
https://github.com/igorbrizack/web-scraper
An HTML data-scraping application built in Python.
crawler pytest python3 scraper
Last synced: 08 May 2026
https://github.com/hong539/acgbox_crawler
A web crawler for gamer.com.tw/acgbox
beautifulsoup4 crawler pandas python requests scrapy sqlalchemy web-crawler
Last synced: 05 Apr 2025
https://github.com/russellsteadman/netscrape
A Node.js framework for creating good bots
bot crawler crawling exclusion rfc9309 scraper scraping web-scraping
Last synced: 20 Jun 2026
https://github.com/madret/selenium_crawler
Selenium web crawler based on ChromeDriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Apr 2026
https://github.com/yosh1/mio-crawler
A crawler that retrieves data usage from IIJmio.
Last synced: 10 May 2026
https://github.com/miiraak/scrapc
C# WinForms crawler and scraper for web content
crawler csharp html scraper url web windows-forms
Last synced: 29 Jan 2026
https://github.com/atasoglu/websense
A modular AI-powered web scraper for data pipelines.
ai automation crawler data-extraction llm parsing scraper structured-output web-scraping
Last synced: 31 Jan 2026
https://github.com/intina47/ee_error
Implementation of a web crawler using C++
cpp crawler curl gumbo libcurl stanford-nlp web
Last synced: 31 Jan 2026
https://github.com/xiangronglin/novel2go
Android app to create a PDF from a website and send it to your Kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 31 Jan 2026
https://github.com/ashwantmanikoth/intellilsearch
This is an AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 15 Apr 2026
https://github.com/lucasromualdo/glassdoorcrawler
A Python crawler that collects job listings from Glassdoor and exports them to Excel
cli crawler glassdoor openpyxl pandas python web-scraping
Last synced: 25 Feb 2026
https://github.com/mevljas/gov.si-crawler-playwright
A standalone crawler that crawls only .gov.si web sites using Playwright.
crawler multithreading playwright sqlachemy
Last synced: 19 Jan 2026
https://github.com/constaf79/pycn
🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.
cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web
Last synced: 14 May 2026
https://github.com/dasantonym/node-cesspoll
:poop: Turd Miner Node Module
crawler news poopetry potty-humour
Last synced: 28 Oct 2025
https://github.com/huyduc1602/uniapp-crawler
Crawl and translate the Uni-app documentation
Last synced: 25 Jan 2026
https://github.com/martincastroalvarez/web-to-pdf
Web crawlers using Python & Beautiful Soup
Last synced: 08 Apr 2025
https://github.com/zenixls2/2chpreprocess
Dump messages from 2ch with some preprocessing for ML analysis
Last synced: 26 Mar 2025
https://github.com/forattini-dev/crawlex
The stealth crawler that actually looks like Chrome.
Last synced: 14 May 2026
https://github.com/murilobsd/icrop-csv
Icrop-csv, for automating the process of downloading reports.
Last synced: 17 Nov 2025
https://github.com/laffrex/xiaolanben_crawler
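An efficient, stable data-collection tool for the Xiaolanben website that automatically extracts information such as company and group products, media, and shareholders. It supports intelligent handling of pop-ups and automated classification and organization of the collected data; its ultimate purpose is to make SRC information gathering easier.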
Last synced: 23 Mar 2025
https://github.com/rafaelmoraes003/tech-news
Analysis and manipulation of news data from a technology website obtained through data scraping using Python.
crawler data-scraping https mongodb parsel pymongo python web-scraping
Last synced: 05 May 2026
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Apr 2026
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 15 Mar 2025
https://github.com/dinofizz/sitemapper
sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.
astradb cassandra concurrency crawler go golang kubernetes nats sitemap
Last synced: 16 Jan 2026
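sitemapper's configurable crawl depth and "concurrent limited" mode can be illustrated with a rough Python analogue. sitemapper itself is written in Go and this is not its code: it is a sketch, assuming a hypothetical caller-supplied coroutine `fetch_links(url)` that returns the internal links on a page. Pages at each depth are fetched concurrently, and an `asyncio.Semaphore` caps how many fetches are in flight at once.

```python
import asyncio


async def crawl(start, fetch_links, max_depth=2, max_concurrency=4):
    """Depth-limited crawl that fetches each depth level concurrently.

    fetch_links(url) is a caller-supplied coroutine (hypothetical) that
    returns the internal URLs linked from url. The result maps every
    visited URL to the links found there, like a site map.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def visit(url):
        async with semaphore:  # at most max_concurrency fetches in flight
            return url, await fetch_links(url)

    site_map = {}
    seen = {start}
    level = [start]
    for depth in range(max_depth + 1):
        results = await asyncio.gather(*(visit(url) for url in level))
        next_level = []
        for url, links in results:
            site_map[url] = list(links)
            if depth < max_depth:  # only descend while under the depth limit
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_level.append(link)
        if not next_level:
            break
        level = next_level
    return site_map
```

Dropping the semaphore gives an unbounded "concurrent" mode, and awaiting each `visit` one at a time gives a "synchronous" one; the bound matters because a wide site can otherwise open hundreds of connections at once.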
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 23 Mar 2025
https://github.com/gnehs/twse-financial-ratios-crawler
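Automatically fetches financial ratio data from the Market Observation Post System (MOPS) for a specified list of stock ticker symbols, then automatically computes the averages.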
Last synced: 29 Apr 2026
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Sep 2025
https://github.com/jimut123/leaderbehaviour
Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website!
crawler leaderbehaviour newsscraper scrapy timesofindia
Last synced: 16 Jan 2026
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 23 Mar 2025
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates when case results will arrive, for applicants who are waiting for their USCIS case results and have their receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 16 Jun 2025
https://github.com/agucova/needs-seeding
🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.
Last synced: 12 Oct 2025
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 15 May 2025
https://github.com/sanskar107/c-subject-predictor
Predicts the topic of a piece of code.
Last synced: 14 Mar 2025
https://github.com/rabattkarte/free-domain-scanner
crawler dns domain domain-name domain-names go golang scanner whois
Last synced: 26 May 2026
https://github.com/leshniak/robotstxt-debug
A tool for debugging robots.txt
crawler debugger indexing robots-txt seo seo-optimization seo-tools tester
Last synced: 25 Jun 2025
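For a quick local check of robots.txt rules, Python's standard `urllib.robotparser` can evaluate rules from text already in memory. This is a generic sketch, not how robotstxt-debug works; the sample rules and the user-agent names `MyCrawler` and `BadBot` are made up. Note that `urllib.robotparser` uses simple prefix matching and does not, for example, support the `*` and `$` path wildcards that RFC 9309 crawlers commonly handle.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""


def make_robot_parser(robots_txt):
    """Build a RobotFileParser from robots.txt text already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # no network fetch involved
    return parser


if __name__ == "__main__":
    rp = make_robot_parser(SAMPLE_ROBOTS_TXT)
    for agent, url in [
        ("MyCrawler", "https://example.com/public/page"),
        ("MyCrawler", "https://example.com/private/page"),
        ("BadBot", "https://example.com/public/page"),
    ]:
        print(agent, url, "allowed" if rp.can_fetch(agent, url) else "blocked")
    print("Crawl-delay for MyCrawler:", rp.crawl_delay("MyCrawler"))
```

Because `MyCrawler` matches no named group, it falls back to the `User-agent: *` rules, so only `/private/` is blocked for it and its crawl delay is 5 seconds.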
https://github.com/lfsc09/crawl-this-go
Simple CLI tool for crawling PDF documents and HTML pages
Last synced: 18 Jun 2025
https://github.com/kweonminsung/crawl2toast
Real-time toast notifications of data crawled using CSS selectors (Windows only)
beautifulsoup4 crawler selenium tkinter toast-notifications
Last synced: 18 May 2026
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 22 May 2026
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawls movie information from the Douban Movie Top 250, including movie name, movie link, director, cast, release date, production country/region, genre, rating, number of reviews, and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 10 Jun 2026
https://github.com/alphadev3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 26 Dec 2025
https://github.com/crosscutsaw/iscsicrawler
iscsicrawler is a bash script that crawls files in the iscsi targets with ease.
crawler iscsi iscsi-target iscsiadm
Last synced: 16 Jan 2026
https://github.com/pengkobe/my-web-crawler
Automatically pulls blog updates from bloggers; built with Angular 2.
Last synced: 18 May 2026
https://github.com/kartikmehta8/pycrawler
PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.
Last synced: 13 Sep 2025
https://github.com/moparisthebest/nginx-limit-crawlers
Rate-limit crawlers in nginx
Last synced: 14 Mar 2025
https://github.com/jonesrussell/pipelinex
Firecrawl-style web intelligence pipeline powered by North Cloud
Last synced: 09 Mar 2026
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 09 Aug 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 18 Jun 2026
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is a software development kit for the DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 11 Jul 2025
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 01 Jul 2025
https://github.com/jiusanzhou/reaper
Distributed Elegant Scraper and Crawler Framework for Rust.
crawler data-scraping rust scraper spider
Last synced: 24 Jul 2025
https://github.com/alphabs/navercafeclient
.NET library for crawling post lists from Naver Cafe
crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping
Last synced: 06 May 2026
https://github.com/tech-espm/misc-webbot
This project aims to create personal assistants that reply to messages about specific issues.
classification-model crawler nlp
Last synced: 12 Jun 2026
https://github.com/fusetim/bitcrawler
Small experiments to learn a bit more about BitTorrent, DHT and etc. Might also be a BitTorrent DHT crawler one day?
Last synced: 30 Mar 2025
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions/subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 31 Mar 2025
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 03 Feb 2026
https://github.com/diegojromerolopez/relwrac
A basic crawler developed with python and asyncio
asyncio crawler page-rank python
Last synced: 11 Nov 2025
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio.
Last synced: 15 Mar 2025
https://github.com/tormol/zenphoto-dl
A script for recursively downloading all pictures from zenphoto-based photo albums.
Last synced: 30 Aug 2025
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 11 Nov 2025
https://github.com/ekojs/web-crawler
Web crawler for retrieving research titles from Google Scholar
Last synced: 12 Apr 2026
https://github.com/orkan/tlc
Simple PHP/cURL/FlareSolverr framework with Logger, Cache and more!
crawler curl flaresolverr net scrap
Last synced: 27 Aug 2025
https://github.com/kahsolt/qzone_mood_dumper
Dump your Qzone mood (说说, "shuoshuo") history to local SQL database storage
Last synced: 25 Aug 2025
https://github.com/leegeunhyeok/python-gongucrawler
Python 3 crawler for images and detailed information from Gongu Madang
Last synced: 24 Aug 2025
https://github.com/mohitk05/drstrange
A simple breadth-first search web crawler
Last synced: 22 Aug 2025
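The breadth-first strategy named in the drstrange entry can be sketched with a FIFO queue. This is a generic illustration, not drstrange's code; `get_links` is a hypothetical callable standing in for "fetch the page and extract its links", so the example runs against an in-memory link graph instead of the network.

```python
from collections import deque


def bfs_crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first from start; return URLs in visit order.

    get_links(url) is a hypothetical callable that fetches url and returns
    the links on it; here it can be any function, so no network is needed.
    """
    frontier = deque([start])  # FIFO queue: earliest-discovered URL first
    seen = {start}             # URLs ever queued, so none is queued twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["A"], "E": []}
    print(bfs_crawl("A", lambda url: graph.get(url, [])))
    # prints ['A', 'B', 'C', 'D', 'E']
```

The FIFO order is what makes pages closest to the start URL get visited first; popping from the right end of the deque instead would turn this into a depth-first crawl.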