Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-15 00:06:49 UTC
https://github.com/twtrubiks/line-bot-tutorial
A LINE bot tutorial using Python Flask
bot crawler heroku line ptt python-flask tutorial
Last synced: 16 May 2025
https://github.com/s0rg/crawley
The unix-way web crawler
cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider
Last synced: 16 May 2025
https://github.com/jairovadillo/pychromeless
Python Lambda Chrome Automation (naming pending)
automation aws-lambda chrome chromium crawler python selenium
Last synced: 12 Mar 2026
https://github.com/flairNLP/fundus
A very simple news crawler with a funny name
cc-news commoncrawl corpus crawler news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping
Last synced: 04 Mar 2025
https://github.com/GraySilver/wencai
This is a wencai crawler (a Pythonic toolkit for the iWencai strategy backtesting interface).
crawler finance pandas quant quantitative-finance tushare wencai
Last synced: 27 Mar 2025
https://github.com/oppsec/pinkerton
🕵️ JavaScript file crawler and secret finder tool developed with Python
crawl crawler hacktoberfest javascript pentest python python3 redteam secrets
Last synced: 31 Mar 2025
https://github.com/oxylabs/python-web-scraping-tutorial
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.
amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 16 May 2025
https://github.com/mustafadalga/instagram-bot
An Instagram bot developed using the Selenium Framework
automation automation-selenium bot bulk-comments bulk-unfollow crawler crawling download-stories instagram instagram-api instagram-bot instagram-downloader instagram-without-api mass-liking python python3 selenium selenium-framework selenium-python selenium-webdriver
Last synced: 02 Oct 2025
https://github.com/eight04/comiccrawler
An image crawler written in Python.
cli crawler gui image-crawler python tkinter
Last synced: 15 May 2025
https://github.com/viasite/site-audit-seo
Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv, xlsx
audit cli crawl-site crawler lighthouse puppeteer scraper seo seo-audit seo-site-audit site-audit xlsx
Last synced: 14 Mar 2026
https://github.com/BlessedRebuS/Krawl
Krawl is a customizable lightweight cloud native web deception server and anti-crawler that creates fake web applications with low-hanging vulnerabilities and realistic, randomly generated decoy data
anti-crawling blue-team cloud-native crawler cybersecurity deception honeypot kubernetes security self-hosted spider web
Last synced: 11 Feb 2026
https://github.com/devanshbatham/Gorecon
Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester or bug hunter might want to consider adding to their arsenal.
admin-panel-finder backups-finder cmsdetecter configurationfiles crawler directory-bruteforce dns dnsrecon email-hunter geo-ip nameserver recon reconaissance reverse-dns scanner subdomain-enumeration subdomain-scanner subnet-lookup whois-lookup wordpress-scanner
Last synced: 03 Apr 2025
https://github.com/eight04/ComicCrawler
An image crawler written in Python.
cli crawler gui image-crawler python tkinter
Last synced: 03 Aug 2025
https://github.com/Jasonnor/th-music-video-generator
Touhou Project random music video generator/player that crawls images and videos from websites to generate a music video.
crawler javascript music-video touhou web
Last synced: 27 Apr 2025
https://github.com/zhupingqi/RuiJi.Net
crawler framework, distributed crawler extractor
crawler extractor headless-chrome netcore owin scraper scrapy
Last synced: 04 May 2025
https://github.com/algolia/algoliasearch-netlify
Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler
algolia algolia-crawler algoliasearch crawler jamstack netlify netlify-plugin search
Last synced: 03 Oct 2025
https://github.com/glaucocustodio/tanakai
Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.
chrome-headless crawler kimurai scraper scrapy webscraping
Last synced: 28 Mar 2025
https://github.com/lucasjinreal/weibo_terminator_workflow
Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
crawler nlp scraper sentiment-analysis weibo-terminator
Last synced: 05 Mar 2026
https://github.com/antchfx/antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
crawler crawling framework golang scraping web-crawler web-spider
Last synced: 14 Mar 2025
https://github.com/zntfdr/Selenops
A Swift Web Crawler 🕷
command-line-tool crawler scripting swift web
Last synced: 18 Jul 2025
https://github.com/outpoot/vyntr
Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com
crawler duckduckgo engine google python rust search tantivy web
Last synced: 15 May 2025
https://github.com/xyntax/filesensor
Crawler-based dynamic detection tool for sensitive files
crawler fuzzing pentesting scrapy
Last synced: 03 Sep 2025
https://github.com/zrashwani/arachnid
Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
Last synced: 13 Jan 2026
https://github.com/zntfdr/selenops
A Swift Web Crawler 🕷
command-line-tool crawler scripting swift web
Last synced: 08 May 2025
https://github.com/6677-ai/tap4-ai-crawler
The crawler open-sourced by tap4.ai
aitoolkit aitools crawler crawler-engine crawler-python
Last synced: 16 May 2025
https://github.com/turnersoftware/infinitycrawler
A simple but powerful web crawler library for .NET
crawler robots-txt spider web-crawler web-crawling
Last synced: 21 Jun 2025
https://github.com/dwisiswant0/galer
A fast tool to fetch URLs from HTML attributes by crawl-in.
crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls
Last synced: 12 Apr 2025
https://github.com/amerkurev/scrapper
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
crawler crawler-python crawling headless readability scraper scraping web-parsers web-parsing web-scraping
Last synced: 08 May 2025
https://github.com/TurnerSoftware/InfinityCrawler
A simple but powerful web crawler library for .NET
crawler robots-txt spider web-crawler web-crawling
Last synced: 25 Mar 2025
https://github.com/sudheer-ranga/aliexpress-product-scraper
Get AliExpress product details as a JSON response, including feedback, variants, shipping info, description, images, etc.
aliexpress aliexpress-api aliexpress-crawler aliexpress-product-json aliexpress-product-scraper aliexpress-scraper aliexpress-spider crawler dropship dropshipping hacktoberfest hacktoberfest19 hacktoberfest2019 product-json product-reviews product-scraper scraper spider
Last synced: 06 Apr 2025
https://github.com/vitorfs/woid
Simple news aggregator displaying top stories in real time
Last synced: 09 Apr 2025
https://github.com/mohammedcha/gplay-scraper
GPlay Scraper is a powerful Python Google Play scraper library for extracting comprehensive app data from the Google Play Store. Scrape Google Play Store apps to get ratings, install counts, reviews, ASO metrics, developer information, and 65+ data fields
android app-analytics crawler google google-play play-store play-store-api playstore scarper scraper
Last synced: 02 Mar 2026
https://github.com/ovnrain/javbus-api
A self-hosted JavBus API service
adults api api-server crawler docker javbus magnet nodejs spider typescript vercel vercel-deployment
Last synced: 09 Apr 2025
https://github.com/ptt-alertor/ptt-alertor
📢 Ptt article notification bot! Get notified of Ptt articles in real time.
chatbot crawler linebot messenger-bot ptt telegram-bot
Last synced: 14 Jan 2026
https://github.com/dwisiswant0/gf-secrets
Secret and/or credential patterns used for gf.
alienvault-otx bugbounty crawler gau gf gitleaks infosec open-threat-exchange secrets-detection trufflehog trufflehog3 wayback wayback-machine waybackurl
Last synced: 20 Jul 2025
https://github.com/kong36088/ZhihuSpider
A multi-threaded Zhihu user crawler, based on Python 3
crawler multi-threading python python3 spider zhihu
Last synced: 19 Jul 2025
https://github.com/spatie/robots-txt
Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
Last synced: 14 May 2025
https://github.com/ScottSloan/Bili23-Downloader
Download Bilibili videos, anime series, movies, documentaries, and other resources
bilibili crawler linux macos python videodownloader windows wxpython
Last synced: 16 Mar 2025
https://github.com/lgh06/web-page-monitor
Website page change monitor, with alerts when pages are updated or changed.
change-alert change-detection change-monitor crawler monitor website-change-monitor website-monitoring
Last synced: 31 Oct 2025
https://github.com/R4yGM/dorkscout
DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets
bug-bounty crawler ghdb golang google-dorks osint scraper security
Last synced: 11 Jul 2025
https://github.com/zhaotianff/csharpcrawler
C# crawler sample programs for anyone who wants to learn the basics of web crawling. More crawler-related material will be added gradually.
Last synced: 09 Apr 2025
https://github.com/vormkracht10/laravel-seo-scanner
Scan your Laravel application routes for SEO improvement suggestions.
crawler laravel laravel-framework laravel-seo laravel-seo-scanner scanner seo seo-optimization seo-tools seotools
Last synced: 15 Apr 2025
https://github.com/tufayellus/linkedin-scraper
A LinkedIn scraper that scrapes up to 1k LinkedIn profiles (due to LinkedIn's limit) from company profile links and saves their e-mail addresses if available! (Actively maintained; if anything doesn't work, open an issue in the repo.)
crawler digital-marketing email-marketing email-scraper leads linkedin linkedin-bot linkedin-gui linkedin-scraper linkedin-scraper-gui scrape-email scrape-emails scraper scraper-engine
Last synced: 27 Oct 2025
https://github.com/crawlab-team/crawlab-lite
Lite version of Crawlab, a lightweight crawler management platform
crawlab crawler crawler-management crawling-tasks platform scrapy scrapy-ui scrapyd scrapyd-ui spider web-crawler
Last synced: 28 Jan 2026
https://github.com/zhaow-de/rotating-tor-http-proxy
A multi-arch image that provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.
amd64 arm64 armv6 armv7 crawler docker-image dockerhub-image haproxy multi-platform privoxy-tor proxy tor
Last synced: 13 Feb 2026
https://github.com/kirralabs/indonesian-NLP-resources
Data resources for Indonesian-language NLP
corpus corpus-linguistics crawler dataset dependency-parser indonesian indonesian-language named-entity-recognition nlp parallel-corpus pos-tagging sentiment-analysis
Last synced: 15 Apr 2025
https://github.com/linkedtales/scrapedin-linkedin-crawler
Crawler for LinkedIn full profiles 2019
crawler linkedin linkedin-crawler
Last synced: 08 Apr 2025
https://github.com/forcefledgling/proxyhub
An advanced [Finder | Checker | Server] tool for proxy servers, supporting both HTTP(S) and SOCKS protocols. 🎭
anonymity anonymous crawler free-proxy free-proxy-list http-proxy privacy proxies proxy proxy-checker proxy-grabber proxy-list proxy-scraper proxy-scrapper proxy-server proxy-tool proxypool socks socks4 socks5
Last synced: 05 Oct 2025
https://github.com/crypto-crawler/crypto-crawler-rs
A rock-solid cryptocurrency crawler library.
crawler cryptocurrency websocket
Last synced: 12 Dec 2025
https://github.com/songtianyi/laosj
golang light-weight image crawler
aiss crawler douban downloader girls image meizitu sexy spiders
Last synced: 16 Jan 2026
https://github.com/Norconex/crawler
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler
Last synced: 11 Jun 2026
https://github.com/macacajs/NoSmoke
A cross-platform UI crawler that scans view trees, then generates and executes UI test cases.
android crawler ios macaca smoke-tests test-automation webdriver
Last synced: 15 Apr 2025
https://github.com/mgleon08/instagram-crawler
Crawl instagram photos, posts and videos for download.
crawler gem instagram instagram-crawler instagram-scraper ruby rubygems scraper
Last synced: 06 Apr 2025
https://github.com/webysther/packagist-mirror
📦✂️📋📦 Create a mirror of packagist.org metadata for use locally with composer
composer composer-packages crawler mirror packagist packagist-mirror php
Last synced: 30 Dec 2025
https://github.com/subins2000/search
An Open Source Search Engine
crawler php search search-engine
Last synced: 09 Apr 2025
https://github.com/Josue87/MetaFinder
Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata
Last synced: 12 Jul 2025
https://github.com/0xsha/chainwalker
Rapid Smart Contract Crawler
blockchain crawler dataset evm-bytecode geth security smart-contracts web3
Last synced: 09 Mar 2026
https://github.com/0xsha/ChainWalker
Rapid Smart Contract Crawler
blockchain crawler dataset evm-bytecode geth security smart-contracts web3
Last synced: 11 Jul 2025
https://github.com/Webysther/packagist-mirror
📦✂️📋📦 Create a mirror of packagist.org metadata for use locally with composer
composer composer-packages crawler mirror packagist packagist-mirror php
Last synced: 02 Apr 2025
https://github.com/elliotxx/zhihu-crawler-people
A simple distributed crawler for zhihu && data analysis
crawler python python-crawler spider web-crawler web-spider
Last synced: 13 Apr 2025
https://github.com/cocrawler/cocrawler
CoCrawler is a versatile web crawler built using modern tools and concurrency.
aiohttp aiohttp-client async-python concurrency crawler pluggable-modules python3 screenshot warc
Last synced: 14 Dec 2025
https://github.com/gosom/scrapemate
Golang Crawling and scraping framework
crawler go go-framework golang scraper spider web-crawler web-scraping
Last synced: 31 Jan 2026
https://github.com/nfx/slrp
rotating open proxy multiplexer
crawler golang proxy proxy-checker proxy-list proxy-pool proxy-server
Last synced: 04 Apr 2025
https://github.com/mehmetozkaya/dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping
Last synced: 11 May 2025
https://github.com/saeeddhqan/evine
Interactive CLI Web Crawler
cli crawler data-mining fuzzing go golang osint scraper web-crawler
Last synced: 12 Jan 2026
https://github.com/mehmetozkaya/DotnetCrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping
Last synced: 18 Apr 2025
https://github.com/Jiramew/spoon
🥄 A package for building specific Proxy Pool for different Sites.
crawler distributed ip proxies proxy proxy-provider proxypool python redis spider spoon
Last synced: 07 Apr 2025
https://github.com/norconex/crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler
Last synced: 09 Jun 2026
https://github.com/N0taN3rd/Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving
Last synced: 06 Apr 2025
https://github.com/guilhermecgs/ir
A project that automatically calculates income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, imposto de renda, finance, yahoo finance, acao, fii, etf, python, crawler, webscraping, calculadora ir
acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping
Last synced: 26 Apr 2025
https://github.com/n0tan3rd/squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving
Last synced: 13 Sep 2025
https://github.com/karust/gogetcrawl
Extract web archive data using Wayback Machine and Common Crawl
commoncrawl concurrency crawler golang wayback-machine webarchive
Last synced: 15 Jan 2026
https://github.com/nuhmanpk/WebScrapper
Powerful Telegram bot for web scraping and crawling. Fast, easy, and loved by thousands!
beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python
Last synced: 22 Jul 2025
https://github.com/oscar-project/ungoliant
🕷 The pipeline for the OSCAR corpus
common-crawl commoncrawl corpus-linguistics crawler fasttext language-classification nlp oscar
Last synced: 03 Apr 2025
https://github.com/cytopia/urlbuster
Powerful mutable web directory fuzzer to bruteforce existing and/or hidden files or directories.
brute-force bruteforce bruteforce-attacks crawler cytopia-sec url-bruteforcer
Last synced: 09 Apr 2025
https://github.com/stulzq/HttpCode.Core
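Simple, easy to use, and efficient: an opinionated open-source .NET HTTP request framework! It can be used to build crawlers, make API requests, and more.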
crawler httpcode httpmock httprequest net-core net-standard
Last synced: 04 May 2025
https://github.com/beb7/gflare-tk
Open-Source Python Based SEO Web Crawler
crawler python robots-txt scraper seo seo-crawler tkinter
Last synced: 07 May 2025
https://github.com/nuhmanpk/webscrapper
Powerful Telegram bot for web scraping and crawling. Fast, easy, and loved by thousands!
beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python
Last synced: 12 Apr 2025
https://github.com/seart-group/ghs
GitHub Search: Platform used to crawl, store and present projects from GitHub, as well as any statistics related to them
bootstrap crawler csv-export dataset-generation docker-compose git github java-17 json-export mining-software-repositories msr mysql platform repository search-engine spring-boot spring-boot-application spring-boot-server sql-dump xml-export
Last synced: 04 Apr 2025