Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
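The definition above says a crawler "systematically browses" the Web. A minimal sketch of one common way to do that is a breadth-first traversal from a seed page with a depth limit. Everything here is an illustrative assumption: `bfs_crawl` and `FAKE_WEB` are hypothetical names, and an in-memory link graph stands in for real HTTP fetching and HTML parsing.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def bfs_crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_depth: int = 2) -> List[str]:
    """Visit pages breadth-first from `seed`, at most `max_depth` link hops away.

    `get_links` is a hypothetical stand-in for "fetch the page and extract
    its links". Returns URLs in the order they were visited.
    """
    visited = {seed}          # URLs already queued, so each page is visited once
    order: List[str] = []
    frontier = deque([(seed, 0)])  # FIFO queue of (url, depth) gives BFS order
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                frontier.append((link, depth + 1))
    return order


# Hypothetical in-memory "web" used instead of real network requests.
FAKE_WEB: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c", "/d"],
    "/c": ["/e"],
    "/d": [],
    "/e": [],
}

if __name__ == "__main__":
    print(bfs_crawl("/", lambda u: FAKE_WEB.get(u, []), max_depth=2))
```

With `max_depth=2`, the page `/e` (three hops from the seed) is never visited. A real crawler would also need politeness delays, `robots.txt` handling, and error handling, none of which are shown.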
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test a server's maximum capacity or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 05 Apr 2025
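The siege-crawler description hinges on finding "same-domain URLs in a web page". A minimal sketch of that step, using only Python's standard library, is shown below. It is not siege-crawler's code; `same_domain_links` and `LinkCollector` are hypothetical names, and "same domain" is assumed to mean an exact host (and port) match, so subdomains count as different sites.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # value may be None for bare attributes
                    self.hrefs.append(value)


def same_domain_links(page_url: str, html: str) -> List[str]:
    """Return absolute http(s) links in `html` whose host matches `page_url`."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    seen = set()
    result: List[str] = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Feeding the returned URLs back into a request queue, as the description says the tool does, turns this into a crawl; removing the fragment and de-duplicating keeps the same page from being requested repeatedly.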
https://github.com/turtiesocks/zendriver-rs
Async-first, undetectable browser automation in Rust via the Chrome DevTools Protocol. Stealth-by-default port of zendriver — no WebDriver, no JS shim.
anti-detection async automation bot browser-automation cdp chrome-devtools-protocol chromium cloudflare-bypass crawler headless-chrome playwright-alternative rust scraping stealth tokio undetectable-chromedriver web-scraping web-testing zendriver
Last synced: 13 Jun 2026
https://github.com/coghost/crawlers
crawlers in one
crawler python3 staticimg weibo
Last synced: 10 Jul 2025
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 10 Jul 2025
https://github.com/akashrajpurohit/node-crawler
Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain
crawler node-crawler nodejs url
Last synced: 27 Apr 2026
https://github.com/sudolife/shopify
An easy-to-use crawler to keep track of reviews of an app on Shopify.
Last synced: 16 May 2026
https://github.com/orsinium-labs/gpcc
Python library and CLI tool to fetch information from GPC Browser (https://gpc-browser.gs1.org/)
Last synced: 19 Jun 2025
https://github.com/purrproof/smartcrawl
An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.
blockchain cli crawler explorer framework go golang hacktoberfest
Last synced: 16 May 2026
https://github.com/pjullrich/link-crawler
Python crawler that reports broken links on a given website and its subpages
asyncio breadth-first-search broken-links crawler python
Last synced: 11 Jul 2025
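The link-crawler entry combines breadth-first search, asyncio, and broken-link reporting. A minimal sketch of how those pieces can fit together is shown below; it is not that project's implementation. `fetch`, `find_broken_links`, and `FAKE_SITE` are hypothetical, a dictionary replaces real HTTP requests, and any status of 400 or above is assumed to mean "broken". Each BFS level is fetched concurrently with `asyncio.gather`.

```python
import asyncio
from typing import Dict, List, Tuple

# Hypothetical site: URL -> (HTTP status, outgoing links). Replaces real HTTP.
FAKE_SITE: Dict[str, Tuple[int, List[str]]] = {
    "/": (200, ["/docs", "/old-page"]),
    "/docs": (200, ["/docs/api", "/"]),
    "/docs/api": (200, ["/missing"]),
}


async def fetch(url: str) -> Tuple[int, List[str]]:
    """Stand-in for an HTTP GET; unknown URLs return 404 with no links."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return FAKE_SITE.get(url, (404, []))


async def find_broken_links(start: str) -> List[str]:
    """Breadth-first crawl from `start`; return URLs whose status is >= 400."""
    visited = {start}
    level = [start]
    broken: List[str] = []
    while level:
        # Fetch every URL in the current BFS level concurrently; gather keeps order.
        results = await asyncio.gather(*(fetch(u) for u in level))
        next_level: List[str] = []
        for url, (status, links) in zip(level, results):
            if status >= 400:
                broken.append(url)
                continue  # a broken page has no links worth following
            for link in links:
                if link not in visited:
                    visited.add(link)
                    next_level.append(link)
        level = next_level
    return broken


if __name__ == "__main__":
    print(asyncio.run(find_broken_links("/")))
```

A production checker would typically cap concurrency (for example with an `asyncio.Semaphore`), restrict the crawl to one site, and record which page referred to each broken link.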
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 13 Jul 2025
https://github.com/tibiasolutions/sharp-parser
A C# parser for Tibia.com information
crawler nuget parsed-data tibia tibia-parser
Last synced: 17 May 2026
https://github.com/uranusx86/dcard-crawler-analyzer
Get Dcard and Meteor forum content and analyze it!
crawl crawler dcard nlp python word-cloud word-count word-frequency
Last synced: 14 Jul 2025
https://github.com/iamgideonidoko/web-crawler-with-php
Sample implementation of web crawler in PHP
Last synced: 21 Mar 2025
https://github.com/bytejoseph/osintgit
OSINT investigation tool for GitHub
crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp
Last synced: 23 Jul 2025
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 09 Mar 2026
https://github.com/adamfisher/scrapyrt.client
A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.
crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider
Last synced: 21 Mar 2025
https://github.com/gnaneshkunal/book-miner
Web crawler for book reviews (Goodreads)
Last synced: 03 Apr 2025
https://github.com/droiddevgeeks/nodelearning
This is a Node.js learning demo. It covers all the basics of Node.js.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 05 Apr 2026
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 01 Sep 2025
https://github.com/pavelsr/email-extractor
Fast email crawler
crawler email-crawler email-marketing perl telemarketing
Last synced: 18 Mar 2025
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 17 May 2026
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: 17 May 2026
https://github.com/rix4uni/pathcrawler
Discover new paths by scanning HTML.
bug-bounty bugbounty bugbountytips crawler hacking infosec osint osint-resources osint-tool pathcrawler penetration-testing pentest-tool pentesting recon reconnaissance scrape security security-tools threat-intelligence
Last synced: 17 Feb 2026
https://github.com/skylightqp/namu2csv
A NamuWiki crawler that converts headers to a CSV file for the KartRider wiki
Last synced: 24 Jun 2025
https://github.com/flavien-hugs/scrapy-test
Working with the Scrapy library. A mini script that extracts all the cartoon characters from Wikipedia.
crawler python scraping scrapy
Last synced: 29 Mar 2025
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 29 Mar 2025
https://github.com/richecr/pyhltv
Repository to extract information from the HLTV website.
crawler csgo hacktoberfest hltv hltv-api python3
Last synced: 21 May 2026
https://github.com/srx-2000/swaiter
A program that waits until a Selenium element has loaded (a Selenium simulator element-waiting program)
crawler selenium selenium-python
Last synced: 18 May 2026
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to get the latest forum posts from Bahamut boards, and use Gmail to send these messages to myself.
Last synced: 21 Mar 2025
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 18 May 2026
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 21 Mar 2025
https://github.com/zhs007/lottery-crawler
A crawler based on jarvis-task, mainly used to crawl lottery data.
Last synced: 30 Oct 2025
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 18 May 2026
https://github.com/morungos/github-issue-crawler
GitHub crawler for public repositories, issues, and comments
Last synced: 30 Apr 2026
https://github.com/basemax/kashan-university-phone-directory
This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.
crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan
Last synced: 18 May 2026
https://github.com/maxiroellplenty/gs-robot
Node.js tool to scrape gelbe-seiten
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 18 May 2026
https://github.com/hong539/acgbox_crawler
A web crawler for gamer.com.tw/acgbox
beautifulsoup4 crawler pandas python requests scrapy sqlalchemy web-crawler
Last synced: 05 Apr 2025
https://github.com/igorbrizack/web-scraper
HTML data scraping application, built in Python.
crawler pytest python3 scraper
Last synced: 08 May 2026
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 21 Mar 2025
https://github.com/khadkarajesh/aptoide
Aptoide app crawler using beautifulsoup
beautifulsoup4 crawler flask python3 web-application
Last synced: 19 May 2026
https://github.com/dineshsprabu/concurrent-web-crawler
Flexible, concurrent web crawler implemented in Go
concurrent-web-crawler crawler go-crawler spider web-crawler
Last synced: 12 Jan 2026
https://github.com/basemax/doostihaacrawler
A PHP-implemented crawler for Doostihaa.com. (Database of thousands of movies)
crawler crawler-example crawler-php crawler-testing crawlers database-movie database-movies doostihaa doostihaa-com movie movie-database movie-database-api movie-database-website movie-db movies movies-database php php-crawler
Last synced: 13 Sep 2025
https://github.com/yordadev/fenrisjs
A Node.js application that scrapes all links from a given input and writes the results into one of two files, external or internal, for further analysis.
analysis crawler link-collection link-crawler nodejs nodejs-application
Last synced: 10 May 2026
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 15 Jun 2026
https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
Last synced: 20 May 2026
https://github.com/ccrashzer0/web_crawler
A python based web crawler
crawler internet python python3 webcrawler
Last synced: 22 Mar 2025
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 31 Jul 2025
https://github.com/muhfalihr/pyxdtelebot
PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.
crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x
Last synced: 06 Apr 2025
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 11 Apr 2026
https://github.com/ryanking13/bellorin
Multi-threaded Social Media Crawler 🔍
crawler instagram social-media
Last synced: 29 Jun 2025
https://github.com/tasooshi/digslash
A site mapping and enumeration tool for web application analysis
crawler mapping sitemap spider
Last synced: 08 Apr 2026
https://github.com/shimech/pokemon-db-maker
Generate a Pokédex via web crawling
beautifulsoup crawler docker pokemon scraper
Last synced: 25 Jan 2026
https://github.com/rebrowser/autotrader-dataset
AutoTrader car listings database: new, used & CPO vehicles with make, model, trim, mileage, MSRP, KBB fair price range, deal rating, body style, fuel type, and seller state. Updated daily.
automotive autotrader car-listings car-prices crawler data-collection data-science dataset kbb open-data scraper used-cars vehicle-data web-scraping
Last synced: 03 May 2026
https://github.com/andrew-ld/wowroms-downloader
Download all ROMs from wowroms
aiohttp asyncio crawler python3
Last synced: 17 Jan 2026
https://github.com/toannd96/chromedp-example-login
chromedp crawler golang goquery
Last synced: 21 May 2026
https://github.com/lucasbotang/project_financial_markets_text_mining
Predict stock market movement based on news
crawler data-science natural-language-processing python
Last synced: 21 May 2026
https://github.com/pymarcus/webscrapingiii
A crawler that takes products from a list and goes through Mercado Livre pages, collecting their prices, names, and links.
crawler mercadolivre python webscraping
Last synced: 15 Sep 2025
https://github.com/vietdoo/sg-property-hub
SG Property Hub is a comprehensive platform for managing and analyzing property data.
airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping
Last synced: 08 Apr 2026
https://github.com/vlad1kudelko/2023.08.15-scraping
Crawler of cooking sites
cloudflare cloudflare-bypass crawler docker parsing python scraping selenium undetected-chromedriver
Last synced: 08 Apr 2026
https://github.com/nelcifranmagalhaes/web_crawler
A web crawler for all Naruto characters
anime beautifulsoup characters crawler naruto python
Last synced: 14 Jul 2025
https://github.com/khdxsohee/email-miner-pro
EMail Miner Pro is designed specifically for professionals scraping data from search engines like Google, ensuring that generic emails (e.g., Gmail, Yahoo) are correctly linked to their business websites found on the page.
chrome crawler crawling email email-extractor extension-chrome lead-generation miner scraper
Last synced: 03 Feb 2026
https://github.com/sonhm3029/crawl-data-bot
This project provides a base bot for crawling data from the web, including text and image data.
crawler google medical vietnamese
Last synced: 08 Mar 2026
https://github.com/somehowchris/swisslos-cralwer
(WIP) Crawler to access current and historical Swisslos numbers
crawler euromillions lotto rust swisslos
Last synced: 22 Mar 2025
https://github.com/deventerprisesoftware/scrapi-sdk-dotnet
The only web scraping service you'll ever need that offers advanced features that are simple to use for efficient data extraction.
browser-automation crawler scraper-api web-scraping webscraper
Last synced: 22 May 2026
https://github.com/thomashirtz/douban-crawler
A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.
Last synced: 14 May 2025
https://github.com/im-perativa/public_crawler
A collection of crawler projects for Indonesian datasets
crawler indonesia indonesia-api scrapy
Last synced: 20 Mar 2025
https://github.com/willi-dev/dtcapp
dtcapp : distributed twitter crawler.
crawler distributed-systems hazelcast java twitter twitter-api
Last synced: 18 Sep 2025
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Jul 2025
https://github.com/panakour/pkscraper
Extract structured data from the web
crawler crawling scraper scraping scraping-websites webcrawler
Last synced: 19 Feb 2026
https://github.com/leandrols/scliper
CLI tool for simple web scraping.
cli-scripts crawler golang scraping
Last synced: 01 Nov 2025
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 22 Sep 2025
https://github.com/beanwei/zmt-post-crawler
Crawls the ZMT platform site: enter an author ID to get the post list. This project was written for my friend.
Last synced: 08 Nov 2025
https://github.com/programming-with-love/skyeyesystem
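SkyEye system: every ten minutes, crawls trending-search data from various platforms and stores it in a database. Raw trending-search data is stored in MySQL; word-frequency statistics are stored in Redis.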
crawler mysql redis skyeye skyeyewall springboot
Last synced: 25 Sep 2025
https://github.com/kangoo13/textbroker-author-article-picker
Bot that automatically locks an order into a Textbroker author account.
author-textbroker automation bot colly crawler go gocolly golang scrapper spider textbroker textbroker-author textbroker-order-picker textbroker-orders textbroker-scrapper
Last synced: 02 Aug 2025
https://github.com/arihantbansal/cybersec-python
Cybersec/CTF practice problems solved in Python
crawler cryptography ctf cybersecurity sockets webscraping
Last synced: 02 Aug 2025
https://github.com/udaykiran2017/seo-reports
📊 Generate and analyze SEO reports effortlessly to enhance your website's visibility and performance across search engines.
audit broken-links cli crawler extraction google-lighthouse hreflang-checker hreflang-matrix puppeteer scan-website searchengineoptimization seo seo-macroscope seo-manager seo-meta seo-optimization web-scraping webmaster
Last synced: 16 May 2026
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT posts from the command line.
Last synced: 08 Aug 2025
https://github.com/jfcherng/wiki-cgroup-crawler
This script fetches the vocabulary of Wikipedia's public conversion groups (CGroup) and saves the results as external files.
crawler php-71 wiki-cgroup-crawler wikipedia
Last synced: 03 Oct 2025
https://github.com/mindfiredigital/deepscanbot
It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.
bot crawl crawler go golang google webcrawler
Last synced: 10 Aug 2025
https://github.com/ptthanh02/vietnam-news-crawler
crawler crawling-python newspaper text-data text-mining
Last synced: 11 Aug 2025
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 05 Oct 2025
https://github.com/win7user10/laraue.crawling
A set of tools for quickly writing crawlers on .NET
crawler csharp csharp-crawler parser
Last synced: 17 Aug 2025