Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-02 00:06:49 UTC
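A crawler's systematic browsing, as described above, usually comes down to a loop over a frontier queue plus a visited set. The following is a minimal Python sketch of a breadth-first, same-host crawl. It is not taken from any project listed below. `fetch_links` is a hypothetical callback (download a page, return its absolute links) that is injected so the traversal can run without network access, and `max_depth` is an illustrative parameter.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def crawl(seed: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_depth: int = 2) -> list[str]:
    """Breadth-first crawl from `seed`, staying on the seed's host.

    Returns URLs in the order they were visited. `fetch_links` is a
    hypothetical callback that downloads a page and returns the absolute
    URLs it links to.
    """
    host = urlparse(seed).netloc
    visited = {seed}                 # URLs already queued, never re-queued
    frontier = deque([(seed, 0)])    # FIFO of (url, depth) pairs
    order = []
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)
        if depth == max_depth:
            continue                 # visit this page but do not expand it
        for link in fetch_links(url):
            if urlparse(link).netloc == host and link not in visited:
                visited.add(link)
                frontier.append((link, depth + 1))
    return order


# Usage with an in-memory "site" instead of real HTTP requests.
site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b",
                             "https://other.org/"],
    "https://example.com/a": ["https://example.com/c", "https://example.com/"],
    "https://example.com/c": ["https://example.com/d"],
}
print(crawl("https://example.com/", lambda u: site.get(u, []), max_depth=2))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```

A FIFO queue visits pages in order of link distance from the seed, which makes a depth limit simple to enforce. Swapping the deque for a stack would give a depth-first crawl instead.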
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 20 Oct 2025
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 23 Jan 2026
https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper
Crawls the GameFAQs website for video games released exclusively on the specified platforms and generates a Markdown table of the crawled titles.
console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox
Last synced: 09 May 2026
https://github.com/kgruiz/stealth-crawler
Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.
asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler
Last synced: 25 Oct 2025
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 23 Feb 2026
https://github.com/bigmeech/mangaka
Crawl scanlation websites for manga pages
comic crawler manga scanlation webtoon
Last synced: 23 Jan 2026
https://github.com/bkdev98/ebooks-crawler
Ebooks crawler for personal use, built with ReactJS.
crawler material-ui nodejs reactjs
Last synced: 12 Apr 2026
https://github.com/dhchenx/quick-crawler
A toolkit for quickly performing crawler functions
Last synced: 27 Oct 2025
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 28 Oct 2025
https://github.com/amirespahbodi/url_crawler
Async Web Crawler for Website Title and Favicon
crawler fastapi pydantic python3 sqlalchemy
Last synced: 15 Apr 2026
https://github.com/citiususc/polypus
Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis
analytics bigdata crawler scraper sentiment-analysis twitter
Last synced: 09 Feb 2026
https://github.com/piopi/behatcrawler
A Behat extension that crawls the links on a website and executes a user-defined function on each of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 09 Feb 2026
https://github.com/mc256/node-static-webpage-crawler
Downloads an entire website along with its directory structure.
cache-server crawler nodejs static-site
Last synced: 16 Apr 2026
https://github.com/jongwony/boardgame_finder
A project that crawls the entire board game category of Namuwiki in order to apply specific filters to it.
asyncio crawler namuwiki python38
Last synced: 27 Feb 2026
https://github.com/dhsagaryt/multisearch
Search efficiently across different platforms with ease. Type your query and choose from multiple search engines, streamlining your experience.
browser crawler internet search search-algorithm search-engine searchbar searchengine webcrawler
Last synced: 14 Feb 2026
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.
crawler crawler-python crawling-python python3
Last synced: 15 Feb 2026
https://github.com/seanghay/wpget
⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API
Last synced: 08 Feb 2026
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 28 May 2026
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 31 Mar 2025
https://github.com/linjonh/videowebsidesparser
This project is used to parse a video website to remove ads.
Last synced: 13 Jun 2025
https://github.com/jlenon7/sef_automation
📑 Crawler that automatically enrolls in open vacancies on the SEF website.
athenna crawler esm nodejs playwright portugal residence sef typescript
Last synced: 03 Mar 2026
https://github.com/danielemoraschi/go-sitemap-app
crawler golang sitemap sitemap-generator
Last synced: 29 Apr 2026
https://github.com/danielemoraschi/sitemap-common
Simple PHP Sitemap generator and crawler library.
crawler php php-library php-sitemap-generator sitemap
Last synced: 11 Mar 2026
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 10 May 2026
https://github.com/rsheremeta/web-crawler
A tiny web crawler that finds links, extracts them, and prints them concurrently to the terminal.
crawler go golang web-crawler webcrawler
Last synced: 12 Jun 2026
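The link-extraction step described for this tool can be sketched with Python's standard library alone. The project itself is written in Go, and this sketch is not its implementation. `LinkExtractor` and `extract_links` are hypothetical names, and resolving relative URLs with `urljoin` and dropping `#fragment`s is an assumption about what counts as a distinct link.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page URL and strip
                # "#fragment" so page.html and page.html#top are one link.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the unique absolute links in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<a href="/about">About</a> <a href="post.html#top">Post</a> '
        '<a href="post.html">Again</a> <a name="no-href">Anchor</a> '
        '<a href="https://other.org/">Elsewhere</a>')
for link in extract_links(page, "https://example.com/blog/"):
    print(link)
```

Keeping extraction separate from fetching makes it easy to run many fetches concurrently and feed each downloaded page through the same parser.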
https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen
Fetch Keskisuomalainen kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/raspi/scrapy-kuntavaalit2021-almamedia
Fetch Almamedia kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/dubniczky/bad-robot
This is a Python crawler that disregards robots.txt rules and downloads disallowed resources
crawler osint-python osint-tool python robots-txt
Last synced: 31 Mar 2025
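For contrast, a crawler that honors robots.txt instead of disregarding it needs only a few lines with Python's standard `urllib.robotparser` module. This is a minimal sketch that assumes the robots.txt body has already been downloaded. The rules, the `ExampleBot` user agent, and the `build_robots` helper are illustrative.

```python
from urllib import robotparser


def build_robots(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt content that was already fetched from a site."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_robots("""
User-agent: *
Disallow: /private/
Crawl-delay: 5
""")

for url in ["https://example.com/private/report.pdf",
            "https://example.com/public/index.html"]:
    # can_fetch() reports whether this user agent may request the URL.
    print(url, rules.can_fetch("ExampleBot", url))

# Seconds to wait between requests, or None if robots.txt sets no delay.
print(rules.crawl_delay("ExampleBot"))
```

A polite crawler would call `can_fetch` before queuing each URL and sleep for `crawl_delay` between requests to the same host.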
https://github.com/dubniczky/webmap
Website mapping crawler implemented in Python
crawler mapping mapping-tools package python scraping security
Last synced: 31 Mar 2025
https://github.com/sedrubal/webcrawler
Crawl sites and search for security issues.
crawler script security website-auditing
Last synced: 17 Mar 2025
https://github.com/basemax/crawler-news-currency-gold-coins
PHP crawler that collects Persian news related to currency, coins, and gold.
crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler
Last synced: 05 Jul 2025
https://github.com/solracsf/perplexitybot-ips
Collected PerplexityBot IPs
bots crawler ip ipset perplexity
Last synced: 15 Feb 2026
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 03 Mar 2025
https://github.com/basemax/okala-store-ids
A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.
crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala
Last synced: 10 Jun 2025
https://github.com/shentengtu/cht-yp-crawler
Simple crawler for www.iyp.com.tw.
crawler node-js nodejs yellow-pages yellowpages
Last synced: 09 May 2026
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 12 Apr 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 28 Mar 2025
https://github.com/waived/pastebin-ripper
Scrapes all pastes from a Pastebin page and its sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 24 Jun 2025
https://github.com/ymdarake/otenki-crawler
Yet another weather data scraper.
Last synced: 02 Feb 2026
https://github.com/mnoalett/cscrawler
BSc degree thesis - crawler for www.couchsurfing.org
bsc-thesis couchsurfing crawler data-analysis database python
Last synced: 02 May 2026
https://github.com/massongit/ibaraki-univ-circle-crawler
Crawls the official circles (student clubs) of Ibaraki University from the university's website
Last synced: 25 Mar 2025
https://github.com/w3labkr/ipynb-scraper
A collection of frequently used Jupyter notebook code.
crawler ipynb jupyter jupyter-notebook python scrapper
Last synced: 19 Apr 2026
https://github.com/hvtuananh/twitter_crawler
Daemon that fetches tweets from the Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 11 Mar 2025
https://github.com/cls1991/gank.io-go
A simple crawler for fetching pictures from http://gank.io, implemented in golang.
crawler gankio goquery pictures
Last synced: 27 Feb 2025
https://github.com/precioux/pacman
AI Course Projects - Fall 2022
adversial artificial-intelligence bfs-search crawler csp dfs mdp pacman-agent pacman-game pacman-projects reinforcement-learning ucs
Last synced: 28 May 2026
https://github.com/tjdsneto/jcnet-crawler
Extracts (scrapes) movie schedule info from the JCNet movies page
Last synced: 11 Apr 2026
https://github.com/sanhphanvan96/php-training-crawler
Simple PHP crawler for training purposes
crawler docker docker-compose nginx php php-fpm
Last synced: 13 Apr 2026
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 27 Mar 2025
https://github.com/joyceannie/moviespider
This project is used to crawl movie data from IMDb. The Scrapy framework is used to extract relevant information such as the movie title, datePublished, summary, genres, and director.
crawler datascience python scrapy spider webscraper
Last synced: 24 Mar 2025
https://github.com/ericc-ch/crawldown
Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.
Last synced: 05 Jul 2025
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 02 Jul 2026
https://github.com/lilchen96/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 28 Dec 2025
https://github.com/matheusfaustino/jazzmaster_crawler
A crawler for fetching the audio episodes of a specific radio program called Jazzmaster
Last synced: 14 Jun 2025
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio.
Last synced: 15 Mar 2025
https://github.com/faridfr/dribbble-crawler-php
Dribbble crawler with PHP
crawler dribbble dribbble-crawler php php-crawler user-interface
Last synced: 17 Mar 2025
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions/subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 31 Mar 2025
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 01 Jul 2025
https://github.com/kartikmehta8/pycrawler
PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.
Last synced: 13 Sep 2025
https://github.com/pengkobe/my-web-crawler
Automatically pulls blog updates from bloggers. Built with Angular 2.
Last synced: 18 May 2026
https://github.com/kweonminsung/crawl2toast
Real-time toast notifications of data crawled with CSS selectors (Windows only)
beautifulsoup4 crawler selenium tkinter toast-notifications
Last synced: 18 May 2026
https://github.com/lfsc09/crawl-this-go
Simple CLI tool for crawling PDF documents and HTML pages
Last synced: 18 Jun 2025
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates when case results will arrive, for applicants who are waiting for their USCIS case results and have their receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 16 Jun 2025
https://github.com/jimut123/leaderbehaviour
Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website!
crawler leaderbehaviour newsscraper scrapy timesofindia
Last synced: 16 Jan 2026
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links up to a specified depth.
crawler flask-api flask-restful
Last synced: 02 Apr 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 15 Mar 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Apr 2026
https://github.com/rafaelmoraes003/tech-news
Analysis and manipulation of news data from a technology website obtained through data scraping using Python.
crawler data-scraping https mongodb parsel pymongo python web-scraping
Last synced: 05 May 2026
https://github.com/laffrex/xiaolanben_crawler
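An efficient, stable data collection tool for the Xiaolanben website. It automatically extracts information such as company and group products, media, and shareholders, supports intelligent handling of pop-ups and automated classification and organization of the data, and is ultimately intended to make SRC (security response center) information gathering easier.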
Last synced: 23 Mar 2025
https://github.com/murilobsd/icrop-csv
Icrop-csv automates the process of downloading reports.
Last synced: 17 Nov 2025
https://github.com/dinofizz/sitemapper
sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.
astradb cassandra concurrency crawler go golang kubernetes nats sitemap
Last synced: 16 Jan 2026
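One way to read sitemapper's "concurrent limited" mode is a crawl in which a semaphore caps the number of in-flight fetches. sitemapper is written in Go and may work differently; this Python asyncio sketch is only an illustration under that assumption. `SITE`, `fake_fetch_links`, and `map_site` are hypothetical, an in-memory site stands in for HTTP, and the output follows the described JSON shape: each internal URL mapped to the internal links found at it.

```python
import asyncio
import json
from typing import Awaitable, Callable
from urllib.parse import urlparse

# Hypothetical in-memory site standing in for real HTTP responses.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b",
                             "https://other.org/"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}


async def fake_fetch_links(url: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulate network latency
    return SITE.get(url, [])


async def map_site(seed: str,
                   fetch_links: Callable[[str], Awaitable[list[str]]],
                   max_depth: int = 2,
                   limit: int = 3) -> dict[str, list[str]]:
    """Crawl level by level, running at most `limit` fetches at once.

    Returns {url: sorted internal links found at that url}.
    """
    host = urlparse(seed).netloc
    semaphore = asyncio.Semaphore(limit)
    result: dict[str, list[str]] = {}

    async def visit(url: str) -> list[str]:
        async with semaphore:  # waits while `limit` fetches are in flight
            links = await fetch_links(url)
        internal = sorted({link for link in links if urlparse(link).netloc == host})
        result[url] = internal
        return internal

    level, seen = [seed], {seed}
    for _ in range(max_depth + 1):
        if not level:
            break
        found = await asyncio.gather(*(visit(url) for url in level))
        # dict.fromkeys removes duplicates while keeping discovery order.
        level = list(dict.fromkeys(
            link for links in found for link in links if link not in seen))
        seen.update(level)
    return result


sitemap = asyncio.run(map_site("https://example.com/", fake_fetch_links, max_depth=1))
print(json.dumps(sitemap, indent=2, sort_keys=True))
```

With `limit=1` the fetches run one at a time, similar in effect to a synchronous crawl, and dropping the semaphore gives unbounded concurrency. Whether sitemapper's three modes map onto these variants exactly is an assumption.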
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 13 Jun 2026
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 23 Mar 2025
https://github.com/gnehs/twse-financial-ratios-crawler
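Automatically fetches financial ratio information from the Market Observation Post System (MOPS) for a specified list of stock codes, and automatically calculates the averages.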
Last synced: 29 Apr 2026
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Sep 2025
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 23 Mar 2025
https://github.com/agucova/needs-seeding
🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.
Last synced: 12 Oct 2025
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 15 May 2025
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 09 Apr 2025
https://github.com/sanskar107/c-subject-predictor
Predicts the topic of a piece of code.
Last synced: 14 Mar 2025