Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-02 00:06:49 UTC
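The "systematic browsing" in the definition above can be illustrated with a minimal breadth-first sketch. It is not taken from any repository listed here. The names `crawl`, `LinkExtractor`, and `PAGES`, as well as the `example.test` URLs, are hypothetical, and the pages are held in memory so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; `fetch(url)` returns HTML or None."""
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # every URL ever queued, to avoid repeats
    index = {}                 # visited URL -> list of outgoing links
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:       # page could not be fetched: skip it
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links against the current page and drop #fragments.
        links = [urldefrag(urljoin(url, href)).url for href in parser.links]
        index[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index


# Hypothetical three-page site held in memory, standing in for real HTTP.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b#top">B</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

result = crawl("http://example.test/", PAGES.get)
for page, links in result.items():
    print(page, "->", links)
```

A real crawler would presumably swap `PAGES.get` for an HTTP client and add politeness measures such as robots.txt checks and rate limiting, which this sketch leaves out.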
https://github.com/illm4tic/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 21 Apr 2026
https://github.com/v-bible/crawler
A collection of web crawlers for Catholic resources in the Vietnamese language
catholic corpus-linguistics crawler nlp playwright
Last synced: 22 Apr 2026
https://github.com/thc1006/nycu_timtable_crawler
🎓 NYCU Course Data Crawler & Timetable System | National Yang Ming Chiao Tung University course crawler and course selection system - Python web scraper for course schedules, syllabi & educational data analysis. Crawls 18K+ courses with a 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, resume capability.
academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping
Last synced: 24 Apr 2026
https://github.com/theshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 24 Apr 2026
https://github.com/dnlzrgz/excursionist
Scrapy-powered flight price crawler.
crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling
Last synced: 24 Apr 2026
https://github.com/monumentality/ifiend
Check latest YouTube uploads without leaving the comfort of your terminal.
crawler headless-chrome terminal-based youtube yt-dlp
Last synced: 25 Apr 2026
https://github.com/liu233w/ojhunt-lite
A lightweight async Python tool for querying Online Judge (OJ) statistics across multiple platforms. Track your accepted problems (AC) and total submissions from 29+ competitive programming platforms.
acm-icpc codechef-api codeforces-api crawler spoj-api
Last synced: 05 May 2026
https://github.com/palpitate-xus/sge_data_insert
Uses GitHub Actions to automatically fetch SGE data and store it in a database
Last synced: 26 Apr 2026
https://github.com/bingxyz/btcethcrawler
Telegram broadcast channel for Bitcoin and Ethereum
bash bash-script crawler telegram-bot
Last synced: 26 Apr 2026
https://github.com/taiizor/gocrawler
A high-performance web crawler with concurrent processing capabilities written in Go.
crawler csv go golang golang-application golang-library json storage url web
Last synced: 26 Apr 2026
https://github.com/mg98/ipfs-replicate
Replicate IPFS' distributed data structure locally, based on network traces.
crawler dag ipfs redisgraph scraper
Last synced: 02 May 2026
https://github.com/twknab/django_ajax_web_crawler
Web crawler which retrieves all links on any page. Python & Django-powered.
beautifulsoup4 crawler django-application
Last synced: 27 Apr 2026
https://github.com/dearvn/crawl-mortgage-broker
A script to crawl data from the website https://findamortgagebroker.com/
crawler findamortgagebroker mortgage-lenders mortgage-loans nmls php7 python3 seleniumbase
Last synced: 28 Apr 2026
https://github.com/justserpapi/web-html
JustSerpAPI Crawl Webpage HTML API Python SDK examples, with related Google Search API, Google Lens API, Google Maps API, Google News API, Google Shopping API, Google Scholar API, Google Finance API, Google Trends API, Google Jobs API, Google Patents API, Google Hotels API, and Web APIs.
crawler google-finance-api google-hotels-api google-jobs-api google-lens-api google-maps-api google-news-api google-patents-api google-scholar-api google-search-api google-shopping-api google-trends-api html-api justserpapi python serp-api web-crawling web-html-api web-scraping
Last synced: 08 Jun 2026
https://github.com/kkuvam/web-scrape
Web Scraping Technology Evaluation - Evaluation of different web scraping technologies in Python, with a focus on Requests, BeautifulSoup, and Scrapy. Benchmarked each technology for ease of use, performance, scalability, and maintainability
beautifulsoup crawler requests scraping scrapy
Last synced: 28 Apr 2026
https://github.com/josepedrodias/naivebot
An attempt to mimic Googlebot behaviour in Node.js with Nightmare.js
crawler googlebot nightmarejs nodejs robots
Last synced: 29 Apr 2026
https://github.com/chunkingz/youtubelinks-scraper
A Python script that scrapes YouTube links from a predefined website of choice.
crawler python scraper spider websitescraper youtube
Last synced: 29 Apr 2026
https://github.com/ryu1kn/procedural-page-crawler
Page Crawler. Tell it where to go and what to look for.
Last synced: 30 Apr 2026
https://github.com/antoniowd/crawly
A web crawler that explores the web in search of specific information (emails, phone numbers, etc.)
crawler got jsdom nodejs webcrawler webscraping
Last synced: 01 May 2026
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling Burmese Wikipedia using the MediaWiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 01 May 2026
https://github.com/qqxs/usda_pomological_watercolors
Crawls data from the USDA pomological (fruit tree) watercolors
crawler koa2 nodejs watercolors
Last synced: 01 May 2026
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 02 May 2026
https://github.com/cold-bin/jwzx-mail
Uses Go to build a cqupt-jwzx crawler application
Last synced: 09 Jun 2026
https://github.com/alexnthnz/web-crawler
Scalable web crawler built with Python, Redis, and Cassandra, inspired by Alex Xu's design. Crawls, indexes, and stores web content with robots.txt compliance and duplicate detection.
Last synced: 03 May 2026
https://github.com/soffits/oogc-resource-index
Spreadsheet-ready OOGC resource indexing with incremental crawl, authenticated download URLs, and Seafile export.
agpl-3 automation cli crawler python uv
Last synced: 03 May 2026
https://github.com/rebrowser/iaai-dataset
IAAI salvage auction data: vehicle listings with loss types, damage codes, title brands, mileage, drivetrain, condition grades, and branch locations. Updated daily.
automotive-data crawler data-collection data-science dataset iaai insurance-auto-auction open-data parquet salvage-auction salvage-vehicles scraper total-loss vehicle-auction web-scraping
Last synced: 03 May 2026
https://github.com/rebrowser/iheart-dataset
iHeart radio station database: 3,600+ stations with call letters, formats, markets, cume audience, stream URLs, and 185M+ daily airplay records. Updated daily.
airplay crawler data-collection data-science dataset datasets iheart music-data open-data radio radio-stations scraper web-scraping
Last synced: 03 May 2026
https://github.com/oleksandr-moik/spring-boot-web-crawler
Web crawler app built on Spring Boot. Fetches categories and the relevant news category.
crawler gradle java spring-boot
Last synced: 03 May 2026
https://github.com/yann-github/webcrawler-http
Command line application to crawl a website and generate a report of internal linking structure
crawler csv-format javascript jest node report tdd
Last synced: 03 May 2026
https://github.com/qeqqe/cog
An MCP-integrated intelligent RAG that gives relevant context to LLMs through crawled docs
backend-api claude-desktop crawl4ai crawler fastapi mcp python rag sementic-chunking
Last synced: 04 May 2026
https://github.com/kareemsasa3/arachne
A resilient, concurrent web scraper service built in Go, featuring a REST API, Redis-backed job queue, and circuit breaker for fault tolerance.
asynchronous circuit-breaker concurrency crawler docker docker-compose go golang job-queue rate-limiting redis rest-api web-scraper web-scraping
Last synced: 04 May 2026
https://github.com/basemax/crawleryjc
This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.
crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news
Last synced: 05 May 2026
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 05 May 2026
https://github.com/lanesun/one-link
"One Link to rule them all."
crawler curl http svelte web-service
Last synced: 05 May 2026
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 06 May 2026
https://github.com/hasdata/find-urls-from-any-domain
This repository provides practical examples of website link scraping using Python and Node.js.
ai-extraction crawler hasdata-api nodejs python sitemap-parser url-extraction web-crawling web-scraping
Last synced: 06 May 2026
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 07 May 2026
https://github.com/tylpk1216/new-taipei-parkinfo
Find the available parking in New Taipei, Taiwan.
Last synced: 07 May 2026
https://github.com/zhqiang1989/youtube-graph-collector
A Python demo of how to collect YouTube video engagement graph data
Last synced: 07 May 2026
https://github.com/ireddragonicy/booruprompt
A simple web application built with NextJS to extract tags from booru websites. Just paste the URL of a booru post, and this tool will fetch and display the associated tags, ready for you to copy.
booru cleaning-data crawler nextjs noobai tags typescript web
Last synced: 07 May 2026
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 08 May 2026
https://github.com/tsaohucn/crawler_fb_page
A crawler for Facebook pages that uses Selenium
crawler facebook-page rails ruby selenium
Last synced: 09 May 2026
https://github.com/allotmentandy/socialmedialinkextractor
PHP Laravel package that extracts social media links from an array of links; used as part of a spider that checks links on the londinium.com website
crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube
Last synced: 09 May 2026
https://github.com/basemax/okala-product-ids
A PHP script to fetch and save product IDs from Okala's online store API across multiple categories and store branches.
crawler crawler-okala crawler-php crawlers data database ids ir iran json okala okala-crawler php php-crawler product
Last synced: 09 May 2026
https://github.com/catbraaain/search-crawl
Search the web and crawl content stealthily, with optional extraction using LLMs.
crawl crawler fastapi playwright scrape scraping searxng
Last synced: 09 May 2026
https://github.com/a-b-z-b/web-spider
A Humble Web Crawler
crawler docker-compose go mongodb web-crawler
Last synced: 09 May 2026
https://github.com/victorbaumgartner/electron-crawler-ui
Desktop app built with axios and Electron to crawl websites across multiple servers
app axios crawler desktop electronjs macos multiple-servers multithreading
Last synced: 09 May 2026
https://github.com/machinecyc/lotteryinsight
Uses a crawler to collect Taiwan Lotto data and save it to a local MySQL server.
crawler data docker lottery mysql-database python3 taiwan
Last synced: 09 May 2026
https://github.com/khanof89/twitter_scraper
Scrape tweet details from a user profile using Selenium
crawler scraper selenium twitter twitter-bot
Last synced: 11 May 2026
https://github.com/woshiluo/bilibilicomic-download
bilibili crawler downloader manga
Last synced: 11 May 2026
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 12 May 2026
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 12 May 2026
https://github.com/fredcodee/pexel.com-image-scrapper
Download images from pexel.com
Last synced: 13 May 2026
https://github.com/nextlevelshit/node-crawl
Webcrawler for nodejs
crawl crawler javascript nodejs
Last synced: 14 May 2026
https://github.com/scrape-do/dotnet-example
Best Rotating Proxy & Scraping API Alternative. C# Example.
captcha captcha-solver crawler crawlers crawling data-mining data-science data-scraping free free-proxy free-proxy-list proxy proxy-list proxylist rotating-proxy scraper scraping scraping-api scraping-tool
Last synced: 12 Jun 2026
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 13 Jun 2026
https://github.com/soenneker/soenneker.playwrights.crawler
A configurable Playwright crawler with rich stealth and control options.
browser chrome chromium crawl crawler csharp dotnet playwright playwrightcrawler playwrights scrape scraper stealth util
Last synced: 14 Jun 2026
https://github.com/vhdm/twitter-hashtag-crawler
Twitter hashtag crawler built with Selenium, without using the Twitter API ;)
Last synced: 14 Jun 2026
https://github.com/tri613/nespresso
A mobile version of the Nespresso coffee website ☕
Last synced: 15 Jun 2026
https://github.com/arman-aminian/divar-text-exploring
The first practice of Dr. Asgari's NLP lesson - Data Exploration
crawler natural-language-processing nlp preprocessing scrapy
Last synced: 15 Jun 2026
https://github.com/zhanziyuan/webdownloader
Download elements from the specified website.
crawler downloader image image-downloader python python-crawler web
Last synced: 15 Jun 2026
https://github.com/zzzzer91/match_spider
Crawler for a certain betting website; the site has since shut down 😥
Last synced: 16 Jun 2026
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 17 Jun 2026
https://github.com/manchittlab/TheCrawler
Open-source web scraper + LLM-powered structured extraction. PDF/DOCX, markdown, JSON-LD, microdata, commerce data, forms, detection of 16 analytics trackers. Structured errors with retryable flags. Adaptive Cheerio->Playwright. CLI, npm, REST API, and MCP server. AGPL-3.0.
agpl apify cheerio crawler llm markdown mcp mcp-server model-context-protocol nodejs playwright rag scraper typescript web-scraping
Last synced: 20 Jun 2026
https://github.com/mirusu400/berryz-dl
Batch download berryz webshare files recursively!
berryz berryz-webshare crawler downloader scraper
Last synced: 22 Jun 2026
https://github.com/theognis1002/nimbus-crawler
Highly concurrent web crawler written in Go
crawler docker golang message-queue postgresql redis
Last synced: 23 Jun 2026
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 23 Jun 2026
https://github.com/gastonstat/simpsons-transcripts
Scraping The Simpsons Transcripts with R
crawler data-science r scraping scripts simpsons simpsons-dataset webscraping webscraping-data
Last synced: 23 Jun 2026
https://github.com/bandie91/extip
Fetch the external IP address from known external IP providers
address cli crawler external ip ipv4-address parallel
Last synced: 24 Jun 2026
https://github.com/poran-dip/frenderer
Execute client-side JavaScript and extract fully rendered HTML or text — without a browser.
crawler headless-browser prerender rendering seo spa ssr
Last synced: 25 Jun 2026
https://github.com/dots-suite/thunderdots
ThunderDoTS: a DTS Crawler via DoTS
api corpora crawler digital-humanities distributed-text-services dots dts humanities
Last synced: 25 Jun 2026
https://github.com/chrisabruce/scrapling-rs
Adaptive web scraping, built in Rust. A high-performance port of Python Scrapling.
ai ai-scraping automation crawler crawling crawling-rust data data-extraction mcp mcp-server playwright rust-lang scraping selectors stealth web-scraper web-scraping web-scraping-rust webscraping xpath
Last synced: 26 Jun 2026
https://github.com/disposable/public-dns-crawler
DNS and DoH resolver inventory
crawler dns dns-over-https doh python
Last synced: 28 Jun 2026