Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-29 00:06:53 UTC
https://github.com/wenyalintw/job-scraper-bot
A fun Telegram bot made for a friend, deployed to Heroku
amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping
Last synced: 13 Sep 2025
https://github.com/baraja-core/webcrawler
Simple website crawling by following links.
bot crawler crawling-websites fast php robot speed
Last synced: 03 Sep 2025
https://github.com/vmarcosp/supervise-crawler
:male_detective: Supervise crawler
crawler esy ocaml reasonml webcrawler
Last synced: 13 May 2025
https://github.com/alexqi/webphantom
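An open-source crawler framework for web data collection tasks. It supports core features such as API calls, task scheduling, and session management, and is suited to building automated collection systems that can cope with some anti-crawling measures (Douyin | Xiaohongshu | Taobao | JD.com)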
crawler douyin qps scheduler taobao xiaohonghsu
Last synced: 22 Jun 2026
https://github.com/windfarer/biu
biubiubiu~~ I'm a tiny web crawler framework
crawler python spider spider-framework web-crawler
Last synced: 23 Mar 2025
https://github.com/appliedsoul/promise-crawler
Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)
crawler node-crawler nodejs promise-node-crawler spider
Last synced: 28 Feb 2026
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 14 Jan 2026
https://github.com/bfwg/node-tinycrawler
Tiny web crawler in a nutshell for Node.js
Last synced: 10 Nov 2025
https://github.com/chrisweb/universal-nodejs-scraper
Universal Node.js scraper: a simple tool to crawl web pages and extract content that can then be stored in CSV files (sheets) or directly in a database
crawler harvester javascript nodejs scraper typescript
Last synced: 13 Jul 2025
https://github.com/ntthanh2603/crawl-analysis-data-facebook
📊 Project: Analysis & Data Crawling for Two Football Pages – Manchester United & Liverpool FC ⚽🔍
ana crawler facebook-tools jupyter-notebook numpy pandas selenium
Last synced: 26 Jun 2025
https://github.com/alexmili/reachable
Check if a URL exists and is reachable
crawler health-check monitoring reachability webscraping
Last synced: 14 Aug 2025
https://github.com/markmelnic/mobile-de-crawler
A crawler for mobile.de to index all car listings on the website.
crawler requests scraper sqlite3
Last synced: 08 Oct 2025
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 04 Apr 2025
https://github.com/bugfishtm/bugfish-image-downloader
🖥️ Windows 🚀 Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage.
bugfish bugfish-windows bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop
Last synced: 26 Jan 2026
https://github.com/jacobsteves/crawlperl
A web crawler made with Perl. Great for grabbing or searching for data off the web, or ensuring that your own site files are secure and hidden.
crawler perl scripting web-crawler
Last synced: 14 Apr 2025
https://github.com/dori-dev/quotes-crawler
Quotes crawler using scrapy and python.
crawler crawling python scraping-python scraping-websites scrapy scrapy-crawler scrapy-spider web-scraper
Last synced: 08 Oct 2025
https://github.com/box-archived/vlive-py
VLIVE(vlive.tv) parser for python
api-wrapper crawler kpop parser python vlive
Last synced: 14 Jan 2026
https://github.com/rodyherrera/cdrake-se
✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and more using the engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 19 Apr 2026
https://github.com/the1812/bingwallpapers
A tool for downloading wallpapers from Bing.
Last synced: 03 Apr 2025
https://github.com/yerkopalma/bash-crawler
:computer: Get a site's links with Bash
Last synced: 05 Aug 2025
https://github.com/xvc323/omnidocs
Automated documentation crawler that generates LLM-friendly Markdown from any docs site. Export as single or multi-file, ready for AI ingestion.
crawler documentation llm markdown
Last synced: 27 Jun 2025
https://github.com/bernabe9/render-it
Render any JavaScript content to create static sites ready for SEO
crawler javascript prerender prerenderio puppeteer render seo seo-tools server-side-rendering static-site static-site-generator
Last synced: 12 Jun 2025
https://github.com/idealchain/dhtcrawler-cluster
BitTorrent DHT crawling cluster
cluster crawler dht docker-images torrent
Last synced: 27 Sep 2025
https://github.com/hypervapor/bilibili-crawler
Backend application for crawling Bilibili video information based on a list of keywords.
bilibili crawler express nodejs
Last synced: 14 Apr 2025
https://github.com/mrrfv/webarchive
Crawls websites and saves found URLs to a file.
archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping
Last synced: 18 Mar 2025
https://github.com/moehmeni/ezweb
Easy to use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 06 Apr 2025
https://github.com/ivangrana/minerador-noticias-labsc
Keyword-based news scraper built with the BeautifulSoup library in Python
Last synced: 17 Oct 2025
https://github.com/rvegas/dota_crawler
Crawler for dotapedia. Fills a Mongo and a PG database with game data.
crawler dota dota2 flask mongodb postgresql python3 regex scrapy
Last synced: 05 Sep 2025
https://github.com/lucasboscatti/mercado-livre-crawler
A beginner data engineering project that scrapes offers from https://www.mercadolivre.com.br/ofertas, stores them in a Postgres database, and analyzes the scraped data.
crawler docker docker-compose heroku mercado-livre postgresql python scrapy sqlalchemy
Last synced: 06 Mar 2025
https://github.com/brucewind/fear-and-greed-index-alarm
A notification reminder for indicating when the CNN Fear and Greed Index is out of range.
crawler fear-and-greed fear-greed-index investment sctock stock-market us-stock-market
Last synced: 21 Jul 2025
https://github.com/AmirAref/Torobot
An inline Telegram bot for easily accessing and searching torob.com products from Telegram.
crawler python python-telegram-bot scraper telegtam-bot
Last synced: 13 Jul 2025
https://github.com/akiosarkiz/manga-collector
The manga collector is a library designed to easily scrape manga content from various websites. This package is licensed under the MIT License and is fully test-covered
Last synced: 10 Jul 2025
https://github.com/xlisp/ai-auto-crawler
ai-auto-crawler: puppeteer + autogen
Last synced: 31 Aug 2025
https://github.com/systemfsoftware/youtube-autocomplete-scraper
YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.
actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api
Last synced: 25 Jun 2025
https://github.com/markelog/map
Simple site map generator that supports a couple of reporters, depth levels, etc.
Last synced: 11 Apr 2025
https://github.com/beingvirus/jobminer
JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.
automation beautifulsoup career crawler data-collection data-mining hacktoberfest hacktoberfest-accepted hacktoberfest2025 job-scraper jobs open-source python selenium web-scraping
Last synced: 10 Oct 2025
https://github.com/arshadkazmi42/blc
Broken link checker
blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python
Last synced: 30 Oct 2025
https://github.com/frectonz/rampilo
A telegram crawler
crawler rust telegram telegram-crawler
Last synced: 07 Sep 2025
https://github.com/btlmd/thuhole_crawler
A crawler to save holes from the now-defunct thuhole
Last synced: 16 Jun 2025
https://github.com/mashukui/dy_trans_tool
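A Douyin conversion GUI tool developed in Python. It supports converting between Douyin IDs and profile-link UIDs, converting post links from the app format to the PC format, and more. Douyin crawler | Douyin tool | Douyin collection tool | Douyin collection | Douyin collection software | Douyin productivity tool | Douyin data scraping | douyin | Douyin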
crawler douyin douyin-api gui gui-application python3
Last synced: 04 Apr 2026
https://github.com/hctilg/taaghche-dl
Save books purchased from taaghche.com !
crawler downloader pillow-library python3 selenium taaghche
Last synced: 12 May 2025
https://github.com/basemax/googleplaydatabasemirror
Repository for a PHP crawler script that updates a mirror database from Google Play.
crawl crawl-pages crawler crawlers crawling database database-schema google-play mysql php
Last synced: 24 Sep 2025
https://github.com/simsso/vision-based-page-rank-estimation
Student research project on pagerank estimation with deep graph networks
cnn crawler deep-learning graph-networks page-rank student-research-project
Last synced: 24 Apr 2025
https://github.com/twtrubiks/pttcrawlercontent
PTT article content crawler in Python
Last synced: 15 Apr 2025
https://github.com/doroudi/imdb-crawler
imdb.com movies crawler in scrapy
crawler data-mining python scrapy
Last synced: 22 Jun 2025
https://github.com/bitscoper/bitscoper_cyber_toolbox
A Flutter application consisting of TCP Port Scanner, Route Tracer, Pinger, File Hash Calculator, String Hash Calculator, Base Encoder, Morse Code Translator, Open Graph Protocol Data Extractor, Series URI Crawler, DNS Record Retriever, and WHOIS Retriever.
android calculator crawler cybersecurity dart decoder docker encoder extractor flutter github-action ios mac retriever scanner tracer translator web windows
Last synced: 31 Jul 2025
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 14 Jul 2025
https://github.com/prdx23/async-crawler
A recursive async crawler which creates a graph of connected webpages
Last synced: 17 Jan 2026
https://github.com/poyea/coronaflight-hkg
😷 Crawler and history manager for dangerous, coronavirus-infected flights to Hong Kong (VHHH)
corona coronaflight-hkg coronavirus coronavirus-analysis coronavirus-info coronavirus-tracker coronavirus-tracking crawl crawler crawlers crawling hacktoberfest hong-kong hongkong javascript json json-api node node-js nodejs
Last synced: 24 Mar 2025
https://github.com/jonasgeiler/Iconmonstr-API
An unofficial API to access icons from iconmonstr.com
api collection collections crawler eps font icon icon-font iconmonstr iconmonstr-api icons image images png psd scraper svg unofficial vector vector-graphics
Last synced: 10 Mar 2025
https://github.com/jean-baptiste-camps/iiif-crawler
Interrogate IIIF servers and get images of manuscripts
crawler iiif iiif-image manuscripts
Last synced: 29 Oct 2025
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 10 Apr 2025
https://github.com/amirzenoozi/persian-news-crawler
Simple script to crawl data from Persian news agencies, including Fars and Mehr.
cli crawler database fars-news farsi-datasets kaggle-dataset mehr-news news news-agencies newspaper python python3 script shargh-news sqlite3 tensorflow tensorflow2
Last synced: 13 Apr 2025
https://github.com/dotenorio/freeloader-of-data
A simple crawler or scraper to get open graph and other meta data from any website.
crawler graph hacktoberfest meta-data open-graph scraper
Last synced: 13 Mar 2025
https://github.com/AmirAref/DivarCrawler
A script to crawl divar.ir and extract phone numbers
Last synced: 13 Jul 2025
https://github.com/meysam81/scry
Your website has problems you can't see. Scry finds them. Crawl your entire website across SEO, security, performance, and accessibility. No browser, no subscription.
accessibility cli command-line-tool crawler devops golang hreflang lighthouse link-checker pagespeed sarif security-headers seo seo-tools site-audit structured-data technical-seo web-performance web-security website-audit
Last synced: 14 Jun 2026
https://github.com/oldkingcone/pbandj
PasteBin Crawler, crawls the url https://pastebin.com/archive
crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver
Last synced: 26 Sep 2025
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 09 Apr 2025
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 26 Oct 2025
https://github.com/surelle-ha/dogma
Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.
Last synced: 22 Jun 2025
https://github.com/chusiang/crawler-book-info
A crawler for quickly parsing book information
Last synced: 12 Apr 2025
https://github.com/neilblaze/smapviw
Sitemap Visualizer built upon D3.js
crawler sitemap sitemap-generator
Last synced: 06 Oct 2025
https://github.com/basemax/twitterbotcrawler
A bot that logs in to Twitter and processes pages with Selenium, using Python.
crawler crawler-twitter crawlers selenium-crawler selenium-example selenium-sample selenium-twitter twitter twitter-bot twitter-crawler twitter-py twitter-python twitter-selenium
Last synced: 05 May 2025
https://github.com/hybridx/webscraper
Web crawler built with Beautiful Soup
crawler flask google-dorks javascript python3 search-engine
Last synced: 07 May 2025
https://github.com/lgraubner/node-w3c-validator-cli
Crawls a given site and checks for W3C validity.
Last synced: 13 Apr 2025
https://github.com/nobodxbodon/chromecrawlerwildspider
Chrome extension to crawl web pages by loading them into browser tabs in parallel.
chrome-extension crawler localstorage spider
Last synced: 07 May 2025
https://github.com/holmofy/spring-spider
Spring Spider App Utility Library.
crawler java spider spring spring-spider
Last synced: 17 Mar 2025
https://github.com/aprilnea/xjtlu
This is how to get all the network resources of XJTLU.
crawler gateway http-auth python spider web-crawler xjtlu
Last synced: 01 Aug 2025
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Apr 2025
https://github.com/lonsty/zcooldl
ZCool picture crawler. Download ZCool (https://www.zcool.com.cn/) designer's or user's pictures, photos and illustrations.
Last synced: 18 Jan 2026
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Golang
Last synced: 24 Feb 2025
https://github.com/hxr16f/ss-grabber
Automation script for downloading user screenshots.
automation crawler downloader grabber lightshot screenshot script
Last synced: 20 Jul 2025
https://github.com/crwlrsoft/laravel-crawler
Laravel adapter for the crwlr/crawler package.
crawler crawling crawling-framework hacktoberfest laravel laravel-package php scraper scraping web-crawler web-crawling web-scraping
Last synced: 28 Feb 2025