Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/hackthedev/botnet
Tool to find IPs on the web, check SSH availability, and brute-force logins with a wordlist. For educational use only!
botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web
Last synced: 17 Mar 2025
https://github.com/jenting/compare-drugstore-price
Compare prices between cosmeceutical shops
cosmed crawler golang poya side-project watsons
Last synced: 27 Mar 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Apr 2026
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 21 Apr 2026
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 28 Feb 2025
https://github.com/brighteyekid/rendermw
Zero-dependency dynamic rendering middleware for Express. No Puppeteer. No external services. No cost. Bots get semantic HTML. Users get your SPA.
angular bots crawler dynamic-rendering express expressjs indexing middleware nodejs open-graph prerender react seo spa typescript vue
Last synced: 24 Jun 2026
https://github.com/pixlcrashr/stwhh-mensa
Better STWHH Mensa menu data / interface / notifier
api crawler data food studierendenwerk-hamburg university website
Last synced: 07 Aug 2025
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
Web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as the title, date, venue, and links of the events.
anglesharp crawler minhaentrada
Last synced: 19 Jul 2025
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 17 May 2026
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded, so a Twitter profile can then be reconstructed in the front end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 17 May 2026
https://github.com/marcosvbras/twitton
A simple Python library that makes the Twitter Search API easy to use
crawler crawling python spider twitter twitter-api
Last synced: 27 Mar 2025
https://github.com/tetreum/puppeteer-for-crawling
Everyday crawling methods for Puppeteer
Last synced: 12 Apr 2026
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 16 Mar 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 09 Jan 2026
https://github.com/yangxuhui/requests-google
A simple Google-related parsing package
Last synced: 14 Jan 2026
https://github.com/k0nxt3d/web-scrapers
Web scraping scripts in PHP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 31 Dec 2025
https://github.com/usethisname1419/connectioncrawler
Crawls a website and checks for connections
connection crawler http-headers reporting website-analyzer
Last synced: 06 Jul 2025
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 12 Apr 2025
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 14 May 2026
https://github.com/loko5ja/seed-gen
Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.
crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman
Last synced: 03 Apr 2025
https://github.com/tsaohucn/crawler_fb_user_group
A crawler that uses Selenium to crawl Facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 16 May 2026
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 17 May 2026
https://github.com/jiusanzhou/reaper
Distributed Elegant Scraper and Crawler Framework for Rust.
crawler data-scraping rust scraper spider
Last synced: 24 Jul 2025
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 01 Mar 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 28 Mar 2025
https://github.com/rowyio/llm-web-crawler
Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.
ai automation crawler llm lowcode nocode scraper web web-crawler workflow
Last synced: 15 Jul 2025
https://github.com/allancapistrano/anime-sheets
Crawler that fetches information about anime and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 16 Mar 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 15 Mar 2025
https://github.com/roc41d/http-web-crawler
HTTP web crawler with Node.js + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 13 Apr 2026
https://github.com/jplitza/urlsearch
Index typical webserver directory listings and then search for arbitrary terms
Last synced: 17 Mar 2025
https://github.com/moojing/coinmarketcap-crypto-crawler
A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.
Last synced: 01 Apr 2025
https://github.com/recepkizilarslan/console-tourist
Tourist is a simple tool that allows you to collect console messages, errors, unsuccessful requests of all your pages after the DOM loading with authentication support.
console-log crawler crawling crawling-tool error-monitoring error-reporting qa qa-automation qatools
Last synced: 24 Feb 2026
https://github.com/sanskar107/c-subject-predictor
Predicts the topic of a piece of code.
Last synced: 14 Mar 2025
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite YouTube video in PHP
Last synced: 16 May 2026
https://github.com/yggverse/pulsarss
RSS Aggregator for Gemini Protocol
aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust
Last synced: 13 Feb 2026
https://github.com/dinofizz/sitemapper
sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.
astradb cassandra concurrency crawler go golang kubernetes nats sitemap
Last synced: 16 Jan 2026
https://github.com/chamzzzzzz/supersimplesoup
A Go package that implements a super simple soup-like DOM API
beatifulsoup crawler crawler-go dom go golang html-parser
Last synced: 28 Jan 2026
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 01 Apr 2025
https://github.com/iamkushvanth/real-time-data-analysis-using-kafka
In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql
Last synced: 18 Jun 2026
https://github.com/shivamsaraswat/webxcrawler
WebXCrawler is a fast static crawler to crawl a website and get all the links.
crawler crawling python scraping webcrawler webxcrawler
Last synced: 13 Feb 2026
https://github.com/keizerzilla/ssh-hunter
Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).
Last synced: 10 Apr 2025
https://github.com/keizerzilla/search4dwango9
My attempt to help solve the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 10 Apr 2025
https://github.com/blarc/windsurf-crawler
A simple crawler that collects windsurf board offers from different sites.
Last synced: 10 Sep 2025
https://github.com/gn00678465/crawler
A Python CLI tool that uses the Firecrawl API to crawl web pages, with support for multiple output formats.
Last synced: 06 Feb 2026
https://github.com/waived/pastebin-ripper
Scrape all pastes from pastebin page + sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 24 Jun 2025
https://github.com/d7isme/pixiv-downloader-mod
Modded extension of the pixiv downloader from the Chrome Web Store with the premium feature unlocked.
chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader
Last synced: 14 May 2026
https://github.com/danielfillol/ab2l_crawler
Crawler for AB2L radar
brazil crawler lawtech legaltech
Last synced: 28 Jan 2026
https://github.com/mattmoony/webcrawler.py
A very simple Python web crawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python and web techniques. 🐍
beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler
Last synced: 29 Apr 2026
https://github.com/victorbaumgartner/electron_app
Testing electron app for macOS
crawl4ai crawler electron mac multithreading python3 scraping sitemaps
Last synced: 06 May 2026
https://github.com/dmarcosl/upshelf-technical-test
Technical test for Upshelf
crawler interview python scraping scrapy spider technical-test web-scraping
Last synced: 09 Apr 2025
https://github.com/is0383kk/ai-docs-sync-workflow
claude claude-code crawler github-actions python python-script workflow
Last synced: 16 May 2026
https://github.com/mnoalett/cscrawler
BSc degree thesis - crawler for www.couchsurfing.org
bsc-thesis couchsurfing crawler data-analysis database python
Last synced: 02 May 2026
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 01 Jul 2025
https://github.com/russellsteadman/netscrape
A Node.js framework for creating good bots
bot crawler crawling exclusion rfc9309 scraper scraping web-scraping
Last synced: 20 Jun 2026
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is a software development kit for the DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 11 Jul 2025
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Apr 2026
https://github.com/jul10l1r4/objetive
This software is a mini-crawler that aims to grab parts of the text from a website or IP address that responds over http*
bigdata crawler data-science security-tools web
Last synced: 12 Aug 2025
https://github.com/yosh1/mio-crawler
A crawler that retrieves IIJmio data usage.
Last synced: 10 May 2026
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MCQs Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 13 Apr 2026
https://github.com/casoon/astro-crawler-policy
Policy-first crawler control for Astro — generates robots.txt and llms.txt with presets, per-bot rules, AI crawler registry, and build-time audits.
ai-crawler astro astro-integration crawler llms-txt robots-txt seo typescript
Last synced: 24 May 2026
https://github.com/miiraak/scrapc
C# WinForms crawler and scraper for web content
crawler csharp html scraper url web windows-forms
Last synced: 29 Jan 2026
https://github.com/mehdieidi/offliner
Offliner is a tool to make a website offline viewable. It's a concurrent web crawler which saves all the pages and static files in a directory.
concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread
Last synced: 14 Jan 2026
https://github.com/precioux/pacman
AI Course Projects - Fall 2022
adversial artificial-intelligence bfs-search crawler csp dfs mdp pacman-agent pacman-game pacman-projects reinforcement-learning ucs
Last synced: 28 May 2026
https://github.com/heitor57/astronomy-news
🔭📰 Astronomy News
crawler data-science news text-mining
Last synced: 06 Oct 2025
https://github.com/boatraceventureproject/boatracescraper
The BVP Crawler package for Boatrace.
boatrace crawler php php-library php8
Last synced: 17 Mar 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 18 Jun 2026
https://github.com/b3j4y/unidisk
A Crawler to search for keywords and compare the score
comparison crawler nlp solr-client
Last synced: 17 Jan 2026
https://github.com/uinaf/lincrawl
Local-first Linear work-graph archive CLI
age-encryption archive cli crawler crawlkit linear sqlite
Last synced: 24 May 2026
https://github.com/burakkaygusuz/web-security-scanner
A Java-based web security browser that detects common web vulnerabilities such as SQL injection, XSS, and sensitive information disclosure.
crawler java vulnerability-scanner web-security xss
Last synced: 16 May 2026
https://github.com/engineer2b/cure_crawl
Crawler for the Cure Afvalbeheer waste-collection calendar
afval afvalwijzer browser crawler kalender
Last synced: 22 Oct 2025
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 23 Mar 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy-to-use Node.js web crawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 04 Apr 2025
https://github.com/semoal/pythoncrawler
Python crawler with XML-RPC & BeautifulSoup
beautifulsoup crawler python wordpress xmlrpc
Last synced: 15 Apr 2026