Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/ericc-ch/crawldown
Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.
Last synced: 05 Jul 2025
https://github.com/lucasromualdo/glassdoorcrawler
Python crawler that collects job listings from Glassdoor and exports them to Excel
cli crawler glassdoor openpyxl pandas python web-scraping
Last synced: 25 Feb 2026
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 22 May 2026
https://github.com/matheusfaustino/jazzmaster_crawler
A crawler that fetches the audio episodes of a specific radio program called Jazzmaster
Last synced: 14 Jun 2025
https://github.com/jayzhan211/python-crawler-startups
python crawler learning
Last synced: 20 Mar 2025
https://github.com/tetreum/xupopter_runner
Executes crawling recipes coming from Xupopter Chrome Extension.
crawler scrapper scrapping webscraper
Last synced: 08 Aug 2025
https://github.com/tetreum/xupopter_client
Simple interface to manage Xupopter recipes as well as its runners.
crawler scrapper scrapping webscraper
Last synced: 04 Apr 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links in a specific depth.
crawler flask-api flask-restful
Last synced: 02 Apr 2025
https://github.com/krishpranav/gozap
⚡️ Multiple target ZAP Scanning made in go
cli crawler go go-crawler golang zap
Last synced: 27 Mar 2025
https://github.com/jonesrussell/north-cloud
A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.
Last synced: 25 Jan 2026
https://github.com/g-ongenae/morphalou-crawler
A Crawler for CNRTL's Morphologie words
crawler french lexical-databases list-of-words words
Last synced: 25 Feb 2025
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 10 Jun 2026
https://github.com/iamkushvanth/real-time-data-analysis-using-kafka
In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql
Last synced: 18 Jun 2026
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 19 Apr 2026
https://github.com/boatraceventureproject/boatracescraper
The BVP Crawler package for Boatrace.
boatrace crawler php php-library php8
Last synced: 17 Mar 2025
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 13 Jun 2026
https://github.com/grayhat12/grawler
A web-based crawler that takes two inputs (search item, number of sites to search) and currently displays readable content in text format; the code can be modified to display the HTML instead.
crawler scraping scraping-websites scrapper scrapy-crawler
Last synced: 27 Mar 2025
https://github.com/hdevlinz/affiliate-chrome-extension
chrome-extension crawler tiktok
Last synced: 14 May 2025
https://github.com/lolyratul025/web-email-bundler
A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.
crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping
Last synced: 22 May 2026
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 09 Apr 2025
https://github.com/ismoreirakt/spyder
The web is changing. Spyder sees it.
alerts automation crawler monitor
Last synced: 01 Mar 2025
https://github.com/mnemocron/VPNNetworkShareCrawler
ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it
Last synced: 11 Mar 2025
https://github.com/jofaval/open-graph-visualizer
Web scraping showcase of how crawlers retrieve a site's details through the Open Graph Protocol
crawler javascript opengraph scraping web web-scraping
Last synced: 08 Sep 2025
https://github.com/onetail/crawler-with-kafka-docker
Homework project on crawling and analysis
Last synced: 18 Mar 2025
https://github.com/alphadev3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 26 Dec 2025
https://github.com/crosscutsaw/iscsicrawler
iscsicrawler is a Bash script that crawls files on iSCSI targets with ease.
crawler iscsi iscsi-target iscsiadm
Last synced: 16 Jan 2026
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshot of websites based on headless chrome (puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 21 Apr 2026
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 28 Feb 2025
https://github.com/xoraus/revieworacle
The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 07 May 2026
https://github.com/waived/google-drive-crawler
Proxy-based crawler to expose public (shared) Google Drive links
crawler crawler-python file-crawler google-drive-api shared-folders web-spider
Last synced: 27 Mar 2025
https://github.com/matheusfaustino/phrawl
Phrawl: A web crawling framework in PHP (or it seems so)
crawler crawling crawling-framework php scraper wip
Last synced: 08 Sep 2025
https://github.com/avsbharadwaj/web_crawler
A basic web crawler that recursively prints out the links and descriptions present on a website
Last synced: 21 Apr 2026
https://github.com/kofj/octopus
Octopus is open-source software for collecting data from web pages.
Last synced: 15 May 2026
https://github.com/joaooliveirapro/trawlergo
TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.
Last synced: 09 Jun 2026
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 16 Mar 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 09 Jan 2026
https://github.com/yangxuhui/requests-google
A simple Google-related parsing package
Last synced: 14 Jan 2026
https://github.com/k0nxt3d/web-scrapers
Web scraping scripts in PHP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 31 Dec 2025
https://github.com/usethisname1419/connectioncrawler
crawls a website and checks for connections
connection crawler http-headers reporting website-analyzer
Last synced: 06 Jul 2025
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 15 May 2026
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 14 May 2026
https://github.com/loko5ja/seed-gen
Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.
crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman
Last synced: 03 Apr 2025
https://github.com/jackfsuia/chats-crawler
Crawls Discourse forum chat data and parses it on the fly, ready for LLM instruction fine-tuning.
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 09 Jul 2025
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 17 May 2026
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 01 Mar 2025
https://github.com/moparisthebest/nginx-limit-crawlers
rate limit crawlers in nginx
Last synced: 14 Mar 2025
https://github.com/iamtonmoy0/sitemap-crawler
site map crawler with golang and goquery
Last synced: 23 Feb 2025
https://github.com/allancapistrano/anime-sheets
Crawler that fetches anime information and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 16 Mar 2025
https://github.com/truongdd03/searchengine
A search engine written in C++.
cpp crawler search search-engine
Last synced: 06 Apr 2025
https://github.com/pixlcrashr/stwhh-mensa
Better STWHH Mensa menu data / interface / notifier
api crawler data food studierendenwerk-hamburg university website
Last synced: 07 Aug 2025
https://github.com/roc41d/http-web-crawler
HTTP web crawler with Node.js + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 13 Apr 2026
https://github.com/jonesrussell/pipelinex
Firecrawl-style web intelligence pipeline powered by North Cloud
Last synced: 09 Mar 2026
https://github.com/moojing/coinmarketcap-crypto-crawler
A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.
Last synced: 01 Apr 2025
https://github.com/luminovrym/crawler-tools-js
Crawler Tools Js is an application used for scraping data from a website
crawler crawler-js data js web-scraping
Last synced: 08 Sep 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 09 Aug 2025
https://github.com/kenanbek/tutorial-python-crawler
Crawling website data using Python with requests and Beautiful Soup libraries
beautifulsoup crawler crawling miner parser python python-requests requests
Last synced: 30 Mar 2025
https://github.com/kestarumper/imagecrawler
Downloads images from given URL
Last synced: 28 Jun 2025
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 01 Apr 2025
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 08 May 2026
https://github.com/tsaohucn/crawler_fb_page
A crawler that uses Selenium to crawl Facebook pages
crawler facebook-page rails ruby selenium
Last synced: 09 May 2026
https://github.com/allotmentandy/socialmedialinkextractor
PHP Laravel package that extracts social media links from an array of links; used as part of a spider for checking londinium.com website links
crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube
Last synced: 09 May 2026
https://github.com/basemax/okala-product-ids
A PHP script to fetch and save product IDs from Okala's online store API across multiple categories and store branches.
crawler crawler-okala crawler-php crawlers data database ids ir iran json okala okala-crawler php php-crawler product
Last synced: 09 May 2026
https://github.com/catbraaain/search-crawl
Search the web and crawl content stealthily, with optional extraction using LLMs.
crawl crawler fastapi playwright scrape scraping searxng
Last synced: 09 May 2026
https://github.com/a-b-z-b/web-spider
A Humble Web Crawler
crawler docker-compose go mongodb web-crawler
Last synced: 09 May 2026
https://github.com/victorbaumgartner/electron-crawler-ui
Desktop app built with axios and Electron to crawl websites across multiple servers
app axios crawler desktop electronjs macos multiple-servers multithreading
Last synced: 09 May 2026
https://github.com/metehan777/http-header-link-graph
Publish a site's link graph & heading map in HTTP response headers. Crawl 65k pages in 99 seconds without parsing one byte of HTML. Companion code for the SEO Week 2026 NYC experiment.
aeo answer-engine-optimization cloudflare-workers crawler generative-engine-optimization geo http-headers link-graph python rust seo site-architecture technical-seo
Last synced: 03 Jun 2026
https://github.com/nsalvacao/cli-plugins
OpenAPI for CLIs — Crawl any CLI's --help output and generate structured Claude Code plugins with expert command knowledge
ai-agent claude-code cli cli-reference crawler developer-tools help-parser llm plugin python
Last synced: 04 Mar 2026
https://github.com/machinecyc/lotteryinsight
Uses a crawler to collect Taiwan Lotto data and save it to a local MySQL server.
crawler data docker lottery mysql-database python3 taiwan
Last synced: 09 May 2026
https://github.com/nabi-allenby/web-crawler
BFS web crawler
crawler docker k8s kubernetes reconnaissance rust rust-lang webcrawler
Last synced: 02 Mar 2026
https://github.com/khanof89/twitter_scraper
Scrape tweet details from user profile using selenium
crawler scraper selenium twitter twitter-bot
Last synced: 11 May 2026
https://github.com/woshiluo/bilibilicomic-download
bilibili crawler downloader manga
Last synced: 11 May 2026
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 12 May 2026
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 12 May 2026
https://github.com/fredcodee/pexel.com-image-scrapper
download images from pexel.com
Last synced: 13 May 2026
https://github.com/nextlevelshit/node-crawl
Webcrawler for nodejs
crawl crawler javascript nodejs
Last synced: 14 May 2026
https://github.com/manchittlab/TheCrawler
Open-source web scraper + LLM-powered structured extraction. PDF/DOCX, markdown, JSON-LD, microdata, commerce data, forms, 16 analytics-tracker detection. Structured errors with retryable flags. Adaptive Cheerio->Playwright. CLI, npm, REST API, and MCP server. AGPL-3.0.
agpl apify cheerio crawler llm markdown mcp mcp-server model-context-protocol nodejs playwright rag scraper typescript web-scraping
Last synced: 20 Jun 2026
https://github.com/scrape-do/dotnet-example
Best Rotating Proxy & Scraping API Alternative. C# Example.
captcha captcha-solver crawler crawlers crawling data-mining data-science data-scraping free free-proxy free-proxy-list proxy proxy-list proxylist rotating-proxy scraper scraping scraping-api scraping-tool
Last synced: 12 Jun 2026
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 13 Jun 2026
https://github.com/soenneker/soenneker.playwrights.crawler
A configurable Playwright crawler with rich stealth and control options.
browser chrome chromium crawl crawler csharp dotnet playwright playwrightcrawler playwrights scrape scraper stealth util
Last synced: 14 Jun 2026
https://github.com/vhdm/twitter-hashtag-crawler
Twitter hashtag crawler by selenium, without using the Twitter API ;)
Last synced: 14 Jun 2026
https://github.com/tri613/nespresso
A mobile version for nespresso coffee website :coffee:
Last synced: 15 Jun 2026