Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-22 00:06:47 UTC
https://github.com/adbenitez/smd
Simple Manga Downloader, a tool to search and download manga
bs4 command-line-tool crawler crawling downloader manga manga-downloader python python3 urllib
Last synced: 14 Jan 2026
https://github.com/amirzenoozi/insta-downloader
A script for downloading Instagram posts
crawler crawling downloader instagram
Last synced: 20 Jul 2025
https://github.com/rodyherrera/codexdrake
An open source, privacy-first, self-hostable, and blazing fast search engine written in JavaScript. Browse anonymously and safely without needing to pay for third-party APIs. 👀
adblock books crawler google images javascript metasearch metasearch-engine news nodejs privacy-first search search-engine searchengine searx self-hosted videos webscraping websearch wikipedia
Last synced: 27 Mar 2026
https://github.com/wisdom-valley/planet-helper-release
A useful download helper and study assistant for 知识星球 (zsxq).
crawler knowledgebase spider zsxq
Last synced: 09 Jun 2026
https://github.com/orangmuda/SECTOOL
Search engine scraper tool (Bash)
crawler crawling scraper website-scraper
Last synced: 22 May 2026
https://github.com/krolow/marsvin
Structural Crawler framework written in PHP
Last synced: 13 Oct 2025
https://github.com/timschneeb/app-crawler
Python script that searches GitHub, F-Droid and IzzySoft's F-Droid repo for apps with Shizuku support. Updated daily.
crawler f-droid github shizuku
Last synced: 07 May 2025
https://github.com/ototot/judgegirl-scoreboard
A Fancy Scoreboard for JudgeGirl
crawler judgegirl judgegirl-scoreboard php scoreboard tocas-ui tocasui vuejs vuejs2
Last synced: 15 Apr 2025
https://github.com/redco/goose-starter-kit
This is a starter kit for redco/goose-parser
crawler docker goose goose-parser parser starter-kit
Last synced: 04 Apr 2025
https://github.com/begrossi/anp-price-collector
ANP Price Collector
crawler experiment not-maintained scrapy-crawler
Last synced: 07 May 2025
https://github.com/byt3n33dl3/crawler_v2
Remote access Trojan (client) that runs after the malware reaches the kernel.
compiler crawler exploit offensive-security pentesting rat
Last synced: 13 Apr 2025
https://github.com/scrapingant/scrapingant-client-js
ScrapingAnt API client for JavaScript / Node.js.
crawler scraper scraping scrapingant webscraping
Last synced: 15 Aug 2025
https://github.com/kodjunkie/node-raspar
🕷️ Easily scrape the web for torrent and media files.
api api-rest api-wrapper cli crawler crawling crawling-tool docker expressjs javascript movies mp3 music node-js nodejs scraper series torrent torrent-downloader video
Last synced: 13 Apr 2025
https://github.com/geminidsystems/googlenewsscraper
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper
Last synced: 13 Aug 2025
https://github.com/catalyst/moodle-tool_crawler
A Moodle link-crawling robot that finds broken, slow, and oversized links
Last synced: 28 Feb 2026
https://github.com/petrpatek/airbnb-scraper
Apify public actor for scraping Airbnb homes.
airbnb airbnb-api apify crawler data-extraction scrape
Last synced: 20 Mar 2025
https://github.com/theritikchoure/crawlyx
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping
Last synced: 30 Oct 2025
https://github.com/elektrostudios/google-search-url-crawler
Desktop app that crawls URLs from Google's search engine results
crawl crawler crawlers crawling dotnet google google-crawler google-search googlesearch hacking search search-engine searcher tool tools url url-crawler vbnet windows winforms
Last synced: 25 Jul 2025
https://github.com/aneesh-aparajit/reddit-crawler
Reddit Crawler API for collecting datasets from Reddit.
crawler nlp python reddit scraper web-crawler
Last synced: 16 Jan 2026
https://github.com/johansatge/psi-report
Crawls a website, gets PageSpeed Insights data for each page, and exports an HTML report.
cli crawler html-report pagespeed-insights
Last synced: 27 Mar 2025
https://github.com/qzcool/cpef
Data query API for private equity fund managers. Chinese Private Equity Funds APIs.
china crawler data finance fund funds hedge-funds private-equity python python3 scraper scraping-websites spider
Last synced: 26 Feb 2026
https://github.com/hfrost0/simple-baidu-image-download
A Baidu image crawler in only 30 lines, using only the simplest statements
Last synced: 11 Mar 2026
https://github.com/viclafouch/fetch-crawler
📌 A Node.js web crawler that uses the Fetch API to scrape static websites
cheerio crawler crawling-sites fetch-api nodejs promises scrapping
Last synced: 17 Mar 2026
https://github.com/thesoenke/news-crawler
Crawler that collects and extracts content of daily published news articles
Last synced: 07 May 2025
https://github.com/unistudents/saffron
A fairly intuitive & powerful framework that enables you to collect & save articles and news from all over the web.
aggregator announcements api-scraper articles crawler crawler-framework dynamic-scraping html-scraping javascript news parser rss rss-aggregator rss-feed rss-parser saffron scraping typescript wordpress-api
Last synced: 21 Feb 2026
https://github.com/siveci/javdb_magnet_spider
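A Python-based automated crawler for JavDB magnet links. It uses curl_cffi to mimic browser TLS fingerprints and bypass the Cloudflare firewall. It supports multi-page list crawling and, based on tags such as "uncensored / Chinese subtitles / HD" and on file size, automatically filters the best magnet links and exports them to a CSV file.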
crawler data-extraction javdb magnet-links python python3 scraper spider
Last synced: 06 Jun 2026
https://github.com/utkucanbykl/sofpythonbot
This Telegram bot answers Python questions using Stack Overflow topics.
beautifulsoup crawler machine-learning mongodb naive-bayes-algorithm python telegram-bot
Last synced: 14 Aug 2025
https://github.com/torhamdev/death-engine
A powerful recon tool
crawler death-engine directory-search google-dorks hacking-tool information-gathering pentesting pentesting-tools port-scanning python3 recon recon-tools scanner web-hacking web-penetration-testing webhacking webpentest whois
Last synced: 12 Apr 2025
https://github.com/jacraig/spidey
A multithreaded web crawler library that is generic enough to allow different engines to be swapped in.
Last synced: 12 Aug 2025
https://github.com/sayyid5416/links-extractor
Extract links from any file or website.
crawler extract-links extractor links-extraction scraper web-crawler web-scraper
Last synced: 21 Mar 2025
https://github.com/guillim/arachnida
App to scrape the web, for people without coding skills. Fully integrates web crawlers (headless Chrome) and an interface for managing them.
crawler crawling framework headless-chrome javascipt meteor scraper scrapping
Last synced: 15 Jun 2025
https://github.com/cristipufu/scrapy-net
Scrapy, the web scraping tool: a naive implementation in C#
Last synced: 28 Oct 2025
https://github.com/twtrubiks/google-play-store-spider-bs4-excel
Google-Play-Store-spider uses Beautiful Soup in Python and exports to Excel
beautifulsoup crawler google-play-store pyexcel python sql-database xlsx
Last synced: 15 Apr 2025
https://github.com/jpwahle/cs-insights-crawler
This repository implements the interaction with DBLP, information extraction and pre-processing of papers, and a client to store data to the cs-insights-backend.
crawler dblp dblp-dataset nlp semanticscholar
Last synced: 18 Apr 2026
https://github.com/ycrao/some-spider-code
Some spider code: crawlers for financial news and for fund, stock, and forex prices
ai crawler deep-seek economics fin-eco-news finance forex fund-value spider stock-price
Last synced: 29 Jun 2025
https://github.com/yggverse/yggo
YGGo! Distributed Web Search Engine
alt-web crawler curl distributed federative fts5 js-less mysql open-source parser pdo php privacy-oriented search-engine sphinx sphinxsearch spider web web-archive yggdrasil
Last synced: 08 Apr 2025
https://github.com/louis70109/pleaguebot
P+ League Chatbot (unofficial) (deprecated)
basketball chatbot crawler line
Last synced: 14 Apr 2025
https://github.com/fjcanyue/comic_downloader
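🚀 A lightweight command-line (CLI) comic downloader that supports popular platforms such as 摩锐漫画, 读漫屋, and 看漫画. Implemented in Python; minimal and efficient.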
comic-downloader comics crawler manga manga-downloader
Last synced: 26 Jan 2026
https://github.com/tca166/ck3-history-extractor
A program designed for creating an encyclopedia of sorts containing your CK3 history
ck3 crawler python3 rust save-file save-files
Last synced: 04 Jul 2025
https://github.com/whitejoce/Get_Weather
Crawls the local weather based on IP geolocation (no API required)
crawler python3 spider weather-forecast
Last synced: 14 Apr 2025
https://github.com/lablnet/pakweather_scraper
A multi-threaded Pakistan Weather crawler written in JavaScript
crawler data mit-license open-source pakistan scraping weather weather-channel
Last synced: 22 Aug 2025
https://github.com/misaka10843/copymanga-nasdownloader
A mini version of copymanga-downloader, designed for NAS; not limited to copymanga, it supports multiple platforms!
comic copymanga crawler downloader python
Last synced: 16 Jan 2026
https://github.com/houtini-ai/seo-crawler-mcp
Crawl and analyse your website for errors and issues that affect your site's SEO inside a self-contained MCP. Interact through your AI assistant or in the terminal for later AI SEO analysis in chat.
crawlee crawler librecrawl mcp seo seo-analysis sqlite technical-seo-audit
Last synced: 06 May 2026
https://github.com/piotrpdev/webuy-cex-price-tracker
A Python script that gets the prices of certain Cex products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 05 May 2025
https://github.com/root4loot/recrawl
A web crawler written in Go
bugbounty crawler discovery enumeration go golang recon reconnaissance web
Last synced: 11 Oct 2025
https://github.com/eugen1j/aioscrapy
Asynchronous Python library for web scraping
asyncio crawler python-crawler python37 webscraper
Last synced: 09 Oct 2025
https://github.com/nadar/crawler
A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory efficient.
crawler hacktoberfest html pdf php
Last synced: 13 Apr 2025
https://github.com/BroNils/GoogleSearch-CLI
Search anything on Google without captcha
captcha crawler google googlesearch googlesearch-cli recaptcha search-engine
Last synced: 26 Mar 2025
https://github.com/vinhlh/frontendmasters-crawler
A demo of a serverless crawler built on AWS Lambda (scheduled tasks) that stores results in S3
aws crawler lambda s3 serverless
Last synced: 11 Aug 2025
https://github.com/basemax/stackoverflowcrawler
A web crawler that crawls the Stack Overflow website.
crawler crawler-detector crawler-python crawler-testing crawlers crawling python-crawler stackoverflow stackoverflow-analyse stackoverflow-answer stackoverflow-api stackoverflow-crawler stackoverflow-get stackoverflow-questions stackoverflow-tags test-crawler text-processing text-processor web-crawler web-crawler-python
Last synced: 15 Sep 2025
https://github.com/doreanbyte/katswiri
A crawler to find job listings and aggregate them from multiple sources
assistant crawler employment-opportunities job-aggreg job-finder time-management
Last synced: 04 Sep 2025
https://github.com/ne-lexa/roach-php-bundle
Symfony bundle for roach-php/core
crawler php roach-php scrapy spider symfony symfony-bundle
Last synced: 10 Apr 2025
https://github.com/xfengyin/zhihu-salt-novel-downloader
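Zhihu Yanxuan (知乎盐选) novel downloader: multithreaded crawling of novels from Zhihu Yanxuan columns, with CLI and GUI modes, multiple export formats, proxy configuration, cookie login, and resumable downloads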
cli-tool crawler novel-downloader python zhihu
Last synced: 19 Jun 2026
https://github.com/basemax/stockexchangecrawler
A crawler program that extracts all of the data and prices for symbols in the global stock exchange.
crawl-pages crawler crawlers crawling crawling-framework crawling-sites crawlspider stock stock-analysis stock-data stock-exchange stock-exchange-crawler stock-exchange-platform stock-exchange-simulator stock-exchanges stock-market stock-prediction stock-price-prediction stock-prices stock-trading
Last synced: 02 Sep 2025
https://github.com/rational-kunal/netflix-hotkeys
A Chrome extension to enhance your Netflix binging experience!
chrome-extension crawler netflix
Last synced: 10 Mar 2026
https://github.com/lablnet/web-spider
Multithreaded web crawler
crawl crawler mit open-source package project python spider
Last synced: 27 Feb 2026
https://github.com/jtiala/wpdl
⬇️ Scrape pages, posts, images and other data from a WordPress instance.
crawler downloader scraper scraping wordpress
Last synced: 08 May 2025
https://github.com/gimnathperera/web-scraping-riyasewana.lk
Web scraping script written in Python, using the Scrapy library to scrape product data from popular Sri Lankan vehicle-selling websites.
crawler python scrapy spider webscraping
Last synced: 30 Apr 2025
https://github.com/mevljas/nepremicnine-discord-bot
A Discord bot that notifies about new listings on the nepremicnine.net website.
Last synced: 19 Jan 2026
https://github.com/maengsanha/instacrawler
KMU CS Capstone Design project: Instagram Meta Search Engine
crawler go instagram metasearch
Last synced: 14 Jan 2026
https://github.com/bringyourownideas/laravel-sitemap
Simple crawler and sitemap generator for Laravel. No headless browser - just a crawler.
crawler laravel laravel-sitemap sitemap-generator sitemap-xml
Last synced: 01 May 2025
https://github.com/lucasayres/linkedin-crawler-connections
LinkedIn crawler to search and collect details of my connections (profile picture, name, occupation, location, email and phone).
chromedriver connections crawler linkedin profile python scraper selenium
Last synced: 16 May 2025
https://github.com/lysandrejik/omegle-crawler-node
Node library to connect to and interact with the Omegle website.
Last synced: 06 Mar 2026
https://github.com/leonzucchini/Recipes
Project to get and analyse data on recipes from chefkoch.de
Last synced: 03 Apr 2025
https://github.com/confact/spider.cr
Spider.cr is a spider crawler in Crystal. It handles collecting, scraping, and parsing, so you can spend your time collecting the data you want at scale.
Last synced: 22 Apr 2025
https://github.com/mithro/fastsvncrawler
fast-svn-crawler / fastsvncrawler - A tool for listing SVN repository content
crawler export import subversion svn vcs
Last synced: 13 Apr 2025
https://github.com/bjoern-hempel/php-web-crawler
A PHP class that crawls a given URL and recursively collects data from it. The final representation is a JSON object.
crawler mit-license php recursive webcrawler webscraper xpath
Last synced: 11 Apr 2025
https://github.com/sobak/scrawler
Declarative, scriptable web robot (crawler) and scraper
crawler crawler-engine robots-txt scraper scraping-websites
Last synced: 25 Mar 2025
https://github.com/wetrycode/tegenaria
Tegenaria is a crawler framework based on golang
crawler crawler-engine crawler-framework framework go golang spider spiders
Last synced: 12 Jan 2026
https://github.com/proclnas/curl-rox
Just another curl wrapper for web crawling purposes
Last synced: 01 Apr 2026
https://github.com/exp-codes/bilibili-plugin
Bilibili plugin (哔哩哔哩插件姬)
bilibili crawler live programming
Last synced: 16 Aug 2025
https://github.com/matheuscas/pycnpj-crawler
Yet another module for extracting company data from a CNPJ
Last synced: 03 Sep 2025
https://github.com/yowenter/stackshare
A simple web crawler for stackshare.io using Scrapy.
Last synced: 30 Oct 2025
https://github.com/sanix-darker/ziim
Let your CLI search online for solutions to the errors and exceptions raised by the commands you run, with no need to open a browser and search yourself.
cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions
Last synced: 13 Apr 2025
https://github.com/hironsan/japanese-news-crawler
A complete, automated Japanese news crawler built on top of the Scrapy framework
Last synced: 01 Apr 2026
https://github.com/twtrubiks/google-play-store-spider-selenium
Google-Play-Store-spider uses Selenium + Beautiful Soup in Python
beautifulsoup chrome crawler firefox python selenium spider sqlite
Last synced: 15 Apr 2025
https://github.com/rango-tools/pexels-crawler-cli
A simple crawler for the Pexels website
cli crawler docopt downloader image pexels pexels-api pexels-downloader photo pip python scraper selenium
Last synced: 24 Jun 2025
https://github.com/logocomune/botdetector
BotDetector is a Go library that detects bots, spiders, and crawlers from the user agent
botdetector bots crawler go golang golang-library spider user-agent
Last synced: 29 Apr 2025
https://github.com/pi-2r/devoxxfr2025-tock-studio-ia-gen
Project from the Devoxx France 2025 codelab “À la recherche du RAG perdu” (“In Search of the Lost RAG”): a 3-hour workshop on building an autonomous, local, offline generative AI chatbot based solely on open source frameworks
ai chatbot crawler devoxx devoxx-fr-2025 docker generative-ai jailbreak kotlin langchain langfuse localai mistral ollama open-source rag scrapoxy scrapy
Last synced: 07 Oct 2025