Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-22 00:06:47 UTC
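As a rough illustration of what "systematically browses" can mean in practice, the following is a minimal sketch of a breadth-first crawl loop. It is not taken from any project listed here. The names `LinkExtractor`, `crawl`, `PAGES` and the `example.test` URLs are hypothetical, and the `fetch` argument is assumed to be any callable that returns a page's HTML or `None`. A real crawler would also need to honor robots.txt, rate limits and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    fetch is an assumed callable: URL -> HTML string, or None on failure.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

Passing the fetcher in as a parameter keeps the traversal logic separate from networking, so the same loop could be pointed at a real HTTP client.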
https://github.com/vshulcz/youtube_crawler
A simple YouTube crawler that lets you quickly collect data from channels, view and sort it in a table, and run SQL queries and advanced searches by various parameters.
crawler database gui osint parser python requests reverse-engineering sql tkinter youtube
Last synced: 16 Jan 2026
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories - rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 15 Mar 2026
https://github.com/wangshouh/sdufelib_seat_crawler
SDUFE Library Reservation Seat Monitoring Crawler
Last synced: 12 Feb 2026
https://github.com/karambir/ugc-colleges
Python script to extract college names from the UGC (India) website.
college crawler extract html-parser python python-script ugc
Last synced: 19 Apr 2025
https://github.com/juliandavidmr/raptor
Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages for websites to visit, with automatic indexing.
Last synced: 20 Oct 2025
https://github.com/supadata-ai/js
Official TypeScript/JavaScript SDK for the Supadata API.
ai crawler llm markdown scraper transcript web-crawler youtube
Last synced: 22 Mar 2025
https://github.com/typingmonk/mnd_adiz_news_crawler
Web crawler that targets mnd.gov.tw posts related to ADIZ (防空識別區, Air Defense Identification Zone) reports.
Last synced: 10 Jul 2025
https://github.com/robmch/mindfactory_crawling
A Python 3 Crawler for Mindfactory.de
crawler crawling data webcrawler webcrawling
Last synced: 07 May 2025
https://github.com/xcrypt0r/hyacinth
🌸 Dcinside image crawler with a dead-simple structure
beautifulsoup4 crawler dcinside parsing pyqt5 pyside2
Last synced: 28 Apr 2025
https://github.com/pyaesoneaungrgn/2d-crawler
2D crawler for set.or.th
2d 2d-crawler crawler myanmar php
Last synced: 28 Apr 2025
https://github.com/archan937/webhead
An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.
api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing
Last synced: 25 Apr 2025
https://github.com/leomaurodesenv/smm-course-search
A package for searching Super Mario Maker courses
bookmark-site crawler javascript json mario-game mario-maker nodejs
Last synced: 01 Apr 2025
https://github.com/cr0hn/feed-to-exporter
Get an RSS feed and export it as a WordPress post
Last synced: 30 Oct 2025
https://github.com/choi-jiwoo/naver-place-scraper
Scrape reviews from Naver Place
Last synced: 14 Jan 2026
https://github.com/librecodecoop/querido-diario-php
Brazilian government gazettes, accessible to everyone.
civic-tech crawler data-science gazette-crawler governments-gazettes govtech hacktoberfest open-data php php7 politics spider
Last synced: 18 Oct 2025
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 21 Mar 2025
https://github.com/spencerlepine/readme-crawler
A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL
crawler javascript node nodejs readme scraper web-crawler webcrawer
Last synced: 03 May 2025
https://github.com/hktalent/scrapysite
ScrapySite: Go web crawler (spider), scraping, intelligence gathering
crawler elasticsearch go scraping site spider web
Last synced: 14 May 2025
https://github.com/aminehsan/crawler-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scarping
Last synced: 14 Oct 2025
https://github.com/leelow/nightmare-screenshot-selector
👻 📷 A Nightmare plugin to easily take screenshots.
crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler
Last synced: 12 Apr 2025
https://github.com/zebbern/dezcrwl
🕷️ | dezcrwl is a website history crawler that gathers hidden information, checks extracted .js endpoints for vulnerabilities, and much more!
crawl crawler crawler-python crawlers ctf-tools hacking historical-data information information-gathering information-retrieval information-security infosec osint osint-tool pentesting-tools python reconnaissance tool web website
Last synced: 14 Apr 2025
https://github.com/aprilnea/xjtlu
This is how to get all the network resources of XJTLU.
crawler gateway http-auth python spider web-crawler xjtlu
Last synced: 01 Aug 2025
https://github.com/natlee/myanimelist-comment-crawler
Crawl all reviews and information for anime works on MyAnimeList. ;)
anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite
Last synced: 14 Apr 2025
https://github.com/farkaskid/webcrawler
Simple and fast web crawler.
crawler go golang goroutines web webcrawler
Last synced: 14 Jan 2026
https://github.com/holmofy/spring-spider
Spring Spider App Utility Library.
crawler java spider spring spring-spider
Last synced: 17 Mar 2025
https://github.com/filipefilardi/wpp-broadcaster
Crawler made with Selenium and Python to constantly receive video/audio from a target and broadcast it to a list of contacts.
broadcast crawler python selenium
Last synced: 30 May 2026
https://github.com/basemax/firstselenium
Some sample code for using Selenium in Python, just for fun.
crawl crawler crawlers crawling python python-selenium python3 selenium selenium-example selenium-py selenium-python selenium-sample selenium-tests selenium-website
Last synced: 05 May 2025
https://github.com/xdk78/grabbi
grabbi: a simple web scraper/crawler
crawler html scraper web-scraper
Last synced: 14 Apr 2025
https://github.com/kstrassheim/datawarehouse-crawler
A content and schema crawler tool to receive, update and import various kinds of data into an on-premises or cloud-based SQL Server or Azure Synapse Analytics (Azure Data Warehouse SQL Server). As sources it supports SQL Server tables, OData endpoints, CSV files and Excel files. With multiple sources it can run in parallel mode, creating one thread per connection. What sets this crawler apart is that it creates the target tables itself, using the additional information in source.json. For Azure Synapse Analytics it estimates the distribution type and keys. Syncing works entirely without SQL transactions by using a consistency correction algorithm for very frequently updated fact tables. There are 5 syncing algorithms to choose from (see Manual/Insert), as well as one update algorithm.
azure-data-warehouse azure-synapse-analytics business-intelligence crawler csv data-import data-science datawarehouse datawarehousing docker dotnet-core-2 excel integration-testing odata parallel-computing sql
Last synced: 28 Apr 2026
https://github.com/oxylabs/web-crawler
Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.
api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping
Last synced: 01 Aug 2025
https://github.com/mrmarble/mineseek
Minecraft server scanner
crawler minecraft minecraft-server scanner slp
Last synced: 07 Apr 2026
https://github.com/crwlrsoft/laravel-crawler
Laravel adapter for the crwlr/crawler package.
crawler crawling crawling-framework hacktoberfest laravel laravel-package php scraper scraping web-crawler web-crawling web-scraping
Last synced: 28 Feb 2025
https://github.com/giscafer/airlevel-crawler
A demo crawler for air-level.com
Last synced: 28 Apr 2025
https://github.com/vitorebatista/horoscopefree
Astrology REST API for daily horoscopes
crawler horoscope horoscope-crawler horoscopes-api
Last synced: 14 Oct 2025
https://github.com/inishchith/python-scripts
Some Scripts & Projects
crawler python-script python3 scripts youtube
Last synced: 19 Jul 2025
https://github.com/basemax/instagramseleniumhashtagimagepython
Instagram Selenium Python: A selenium-based crawler to extract images from special hashtags on Instagram.
crawler crawler-python crawlers instagram python python-selenium selenium selenium-python
Last synced: 15 May 2026
https://github.com/hctilg/pinterest-crawler
Downloads all images matching a search
Last synced: 12 Apr 2025
https://github.com/coghost/iparse
Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support
crawler parser parser-library python xkcd yaml
Last synced: 12 Oct 2025
https://github.com/fzdwx/go-pachong
Go web crawler that continuously crawls data starting from an entry URL.
Last synced: 28 Apr 2025
https://github.com/yanliu1111/dashboard-flask-echarts
📊 Pandemic Monitoring Realtime Dashboard
crawler echarts flask mysql selenium-webdriver
Last synced: 20 Jun 2025
https://github.com/zurdi15/nbz
Bot to automate internet browsing
automation bot browser-automation browsermob-proxy crawler selenium testing web
Last synced: 10 Jun 2025
https://github.com/x-tropy/docroll
Turn complex programming knowledge 📚 into engaging, AI-powered video 📺 lessons.
ai animation course crawler documentation slides tutorial video
Last synced: 11 Oct 2025
https://github.com/jimmylaurent/node-crawling-framework
✨ NodeJs crawling & scraping framework heavily inspired by Scrapy
crawler crawling crawling-framework elasticsearch headless-chrome middleware mongodb nodejs-framework scraper scraping scraping-framework scrapy spider
Last synced: 15 Mar 2025
https://github.com/developerdavi/meli-crawler
Basic web crawler API for getting products from MercadoLibre (BRL | MLB)
api crawler meli-crawler mercadolibre mercadolibre-sdk mercadolivre mercadolivre-sdk nextjs now products react zeit
Last synced: 12 Apr 2025
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Apr 2025
https://github.com/itszeeshan/crawlinit
A web crawler written in python3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 13 Jun 2025
https://github.com/sujinleeme/koreamarathonapi
APIs of Marathon Events in Korea
crawler korea marathon-events python3
Last synced: 23 Jun 2025
https://github.com/hxr16f/ss-grabber
Automation script for downloading user screenshots.
automation crawler downloader grabber lightshot screenshot script
Last synced: 20 Jul 2025
https://github.com/giant-stone/gmq
A simple message queue with a configurable consumption rate. Simple, reliable, lightweight and efficient task queue in Go
crawler message-queue redis task-manager
Last synced: 12 Jan 2026
https://github.com/feedeo/youtube-channel-crawler
YouTube Channel 📺 Crawler
crawler youtube youtube-channel
Last synced: 06 Feb 2026
https://github.com/mirocow/yii2-crawler
Concurrent HTTP crawler for Yii2
concurrency crawler guzzle yii2-extension
Last synced: 27 May 2026
https://github.com/spire-rs/spire
🗼 A flexible async framework for building high-performance crawlers and scrapers, designed for developers who need extensible pipelines, strong concurrency, and robust middleware support.
crawler framework scraper webdriver
Last synced: 21 Jan 2026
https://github.com/gatenlp/wpextract
Create datasets from WordPress sites for research or archiving
corpus crawler nlp text-extraction text-mining web-scraping wordpress
Last synced: 25 Jun 2025
https://github.com/zain-ul-din/lgu-crawler
LGU timetable Crawler
contribute crawler lahore-garrison-university lahore-garrison-university-timetable open-source
Last synced: 08 Aug 2025
https://github.com/synacktraa/crawl
Web crawler designed to efficiently retrieve unique href, script and form links from a web application.
bash crawler regex shell web-spidering
Last synced: 06 Apr 2026
https://github.com/kernelerr/pixivsync
Pixiv image download and sync tool
crawler pixiv pixiv-crawler python
Last synced: 14 May 2025
https://github.com/eished/tujigu_crawler
Multithreaded Node.js crawler for tujigu.com (图集谷)
Last synced: 28 Apr 2026
https://github.com/firesjoeng/bfo
Bilibili Followers Observer | Real-time Bilibili follower count monitor
Last synced: 13 Apr 2025
https://github.com/samnoh/cliboards
⌨️ Surf your online communities on CLI
cli-application crawler javascript
Last synced: 17 Jan 2026
https://github.com/code-inside/sloader
Worker that loads and retrieves data from "slow" endpoints.
Last synced: 03 Sep 2025
https://github.com/serkan-ozal/driflyte-mcp-server
The Driflyte MCP Server exposes tools that allow AI assistants to query and retrieve topic-specific knowledge from recursively crawled and indexed web pages.
ai crawler mcp model-context-protocol opentelemetry rag
Last synced: 05 Oct 2025
https://github.com/liyifeng1994/go-crawler
Distributed crawler project based on Golang
crawler elastic elasticsearch golang
Last synced: 01 May 2025
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Golang
Last synced: 24 Feb 2025
https://github.com/1uc1f3r616/dark-net-websites-dataset
Dataset of Onion Websites
crawler darknet data-analysis dataset onion search-engine website
Last synced: 27 Feb 2025
https://github.com/kulkultech/asos-crawler
Asos Crawler for Apify
apify asos crawler made-in-indonesia scrapper
Last synced: 27 Jan 2026
https://github.com/feliz-szk/berserk
Berserk: Crawler to increase web traffic (based on Tor and Privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 21 Apr 2026
https://github.com/omerdogan3/kitapp-crawler
Web crawler application for KitApp - gets data from booksellers and inserts it into a database.
book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler
Last synced: 04 May 2026
https://github.com/birkhofflee/blizzard_forum.js
An unofficial Node.js API for Blizzard Forums. (works in 2019)
Last synced: 26 Apr 2026
https://github.com/haxzie-xx/crode.js-node-web-crawler
Node.js Crawler built for open FTP sites for movie link collection.
Last synced: 01 May 2026
https://github.com/bakhirev/assayo-crawler
📈 Visualization and analysis of your git repository data.
audit commit crawler data-visualization git report statistics
Last synced: 05 Mar 2026
https://github.com/somnisomni/trawler-csharp
The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#
crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper
Last synced: 06 May 2026
https://github.com/0000xffff/webgrab
web page: crawler / file scanner / downloader
crawler download downloader scrape scraper webcrawler
Last synced: 17 Apr 2026
https://github.com/zhaotianff/crawler-line
C# command-line crawler
command-line command-line-tool crawler csharp dotnet-core
Last synced: 07 Jun 2026
https://github.com/zhaotianff/qzone
I remember running under the setting sun that day; that was my lost youth
crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf
Last synced: 24 Apr 2026
https://github.com/achannarasappa/locust-cli
Developer tools to accelerate development of Locust jobs
cli crawler headless-chrome puppeteer scraper
Last synced: 26 Apr 2026
https://github.com/wujunchuan/xiamen-housing-data-collection
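Uses (planned) GitHub Actions to periodically collect online contract-signing data for new and second-hand homes from the Xiamen Municipal Housing Security and Administration Bureau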
crawler docker docker-image github-actions nodejs ocr spider tesseract-ocr xiamen
Last synced: 04 Apr 2026
https://github.com/huzecong/film-spider
Spiders crawling for film listing websites.
Last synced: 09 Jun 2026
https://github.com/capturr/price-extract
Performant way to extract a price amount and its metadata (currency, decimal and thousands separators) from any string.
amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript
Last synced: 30 Apr 2026
https://github.com/tokenmill/crawling-framework-example
Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.
crawler crawling-framework elasticsearch storm-crawler
Last synced: 08 May 2026
https://github.com/idanhoro/nasa-heat-maps-prediction
In this project we study the correlations between different weather conditions and try to predict future scenarios using image processing and traditional machine learning algorithms.
beautifulsoup crawler machine-learning pillow prediction python sklearn
Last synced: 05 Apr 2026
https://github.com/crackcomm/go-google-search
Google search NSQ worker
crawler google google-search search
Last synced: 16 Feb 2026