Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
- JSON Representation
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the Website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 07 Jun 2026
https://github.com/blarc/windsurf-crawler
A simple crawler that collects windsurf boards offers from different sites.
Last synced: 10 Sep 2025
https://github.com/claudio-code/nap-web-crawler
Created It crawler to find broken links in docs of framework and languages
Last synced: 07 Jul 2025
https://github.com/mattmoony/webcrawler.py
A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍
beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler
Last synced: 29 Apr 2026
https://github.com/berecat/selenium_facebook_scraper
A simple python3 script used to download a users's friend list from facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 24 Jul 2025
https://github.com/microlinkhq/cloudflare-bot-directory
CloudFlare Radar verified bots directory – 500+ web crawlers and user agents as JSON.
bot-detection bots cloudflare cloudflare-radar crawler crawlers dataset datasets googlebot user-agent user-agents user-agents- verified-bots web-crawler web-scraping
Last synced: 20 Apr 2026
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 06 May 2026
https://github.com/basemax/okala-database-crawler
A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.
crawler crawler-php curl data json okala okala-com okalacom php php-crawler scraper
Last synced: 01 May 2026
https://github.com/aminehsan/datamining-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scraping
Last synced: 25 Jul 2025
https://github.com/victorbaumgartner/electron_app
Testing electron app for macOS
crawl4ai crawler electron mac multithreading python3 scraping sitemaps
Last synced: 06 May 2026
https://github.com/prorobot-ai/worker
A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).
crawler golang grpc-server scraper
Last synced: 29 Jul 2025
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 15 Apr 2026
https://github.com/sauerbraten/monzter
Link crawler with configurable maximum depth and rate limiting
Last synced: 23 May 2026
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 12 Jun 2026
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 03 Jan 2026
https://github.com/izh318/genie-music-artist-album-crawler
지니뮤직에 등록 되어 있는 특정 아티스트의 앨범 정보를 한 번에 크롤링 하는 Python Script 입니다.
Last synced: 08 Nov 2025
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 30 Jul 2025
https://github.com/imrany/spindle
An open-source, lightweight web crawler and scraper. It can discover links on the web (crawler) and extract structured data from webpages (scraper).
Last synced: 24 Sep 2025
https://github.com/dappros/site_crawler
Site crawler used in Ethora platform as an option to import your specific business data into your AI agent chat bot.
crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing
Last synced: 20 Jan 2026
https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer
An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.
crawler crypto cryptocurrency data-mining datamining information-retrieval llm python
Last synced: 25 Sep 2025
https://github.com/zenoyang/webcrawler
一些爬虫代码
crawler scrapy spider web-crawler
Last synced: 02 Aug 2025
https://github.com/is0383kk/ai-docs-sync-workflow
claude claude-code crawler github-actions python python-script workflow
Last synced: 16 May 2026
https://github.com/tom-draper/wiki-crawl
A game of path finding through Wikipedia topics.
api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search
Last synced: 09 Mar 2026
https://github.com/seart-group/github-keyword-crawler
A simple and easy-to-deploy script for mining mentions of keywords across various :octocat: API endpoints
api-mining crawler dockerized github-api miner mongodb-database python-script
Last synced: 04 Aug 2025
https://github.com/marceloneppel/crawler
Simple web crawler developed in Go.
Last synced: 07 Aug 2025
https://github.com/pixlcrashr/stwhh-mensa
Better STWHH Mensa menu data / interface / notifier
api crawler data food studierendenwerk-hamburg university website
Last synced: 07 Aug 2025
https://github.com/iamkushvanth/real-time-data-analysis-using-kafka
In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql
Last synced: 18 Jun 2026
https://github.com/jul10l1r4/objetive
This software is a mini-crawler that aims to grab some text parts from some website or ip that responds http*
bigdata crawler data-science security-tools web
Last synced: 12 Aug 2025
https://github.com/casoon/astro-crawler-policy
Policy-first crawler control for Astro — generates robots.txt and llms.txt with presets, per-bot rules, AI crawler registry, and build-time audits.
ai-crawler astro astro-integration crawler llms-txt robots-txt seo typescript
Last synced: 24 May 2026
https://github.com/uinaf/lincrawl
Local-first Linear work-graph archive CLI
age-encryption archive cli crawler crawlkit linear sqlite
Last synced: 24 May 2026
https://github.com/billy0402/tibame-python-data-analysis
A learning project from TibaMe Python data analysis course.
ai course crawler jupyter-notebook matplotlib pandas python requests
Last synced: 10 Apr 2026
https://github.com/hong539/ip_lookup
For ip_lookup with some Public or Private API
Last synced: 19 Aug 2025
https://github.com/luickk/vulnerability-crawler
Small python program meant to analyze random sites found on google for any vulnerabilities!
Last synced: 20 Aug 2025
https://github.com/mohitk05/drstrange
A simple breadth-first search web crawler
Last synced: 22 Aug 2025
https://github.com/leegeunhyeok/python-gongucrawler
파이썬3 공유마당 이미지 및 상세정보 크롤러
Last synced: 24 Aug 2025
https://github.com/kahsolt/qzone_mood_dumper
Dump your qzone mood(说说) history to local SQL database storage
Last synced: 25 Aug 2025
https://github.com/orkan/tlc
Simple PHP/cURL/FlareSolverr framework with Logger, Cache and more!
crawler curl flaresolverr net scrap
Last synced: 27 Aug 2025
https://github.com/ekojs/web-crawler
Web Crawler untuk mengambil judul penelitian pada Google Scholar
Last synced: 12 Apr 2026
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 11 Nov 2025
https://github.com/tormol/zenphoto-dl
A script for recursively downloading all pictures from zenphoto-based photo albums.
Last synced: 30 Aug 2025
https://github.com/diegojromerolopez/relwrac
A basic crawler developed with python and asyncio
asyncio crawler page-rank python
Last synced: 11 Nov 2025
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 03 Feb 2026
https://github.com/fusetim/bitcrawler
Small experiments to learn a bit more about BitTorrent, DHT and etc. Might also be a BitTorrent DHT crawler one day?
Last synced: 30 Mar 2025
https://github.com/tech-espm/misc-webbot
This project is aimed on creating personal assistants for replying messages about specifics issues.
classification-model crawler nlp
Last synced: 12 Jun 2026
https://github.com/alphabs/navercafeclient
네이버 카페 글 목록 크롤링을 위한 닷넷 라이브러리
crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping
Last synced: 06 May 2026
https://github.com/jiusanzhou/reaper
Distributed Elegant Scraper and Crawler Framework for Rust.
crawler data-scraping rust scraper spider
Last synced: 24 Jul 2025
https://github.com/burakkaygusuz/web-security-scanner
A Java-based web security browser, it detects common web vulnerabilities such as SQL Injection, XSS and sensitive information disclosure.
crawler java vulnerability-scanner web-security xss
Last synced: 16 May 2026
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is software development kit for DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 11 Jul 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 18 Jun 2026
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 09 Aug 2025
https://github.com/jonesrussell/pipelinex
Firecrawl-style web intelligence pipeline powered by North Cloud
Last synced: 09 Mar 2026
https://github.com/moparisthebest/nginx-limit-crawlers
rate limit crawlers in nginx
Last synced: 14 Mar 2025
https://github.com/engineer2b/cure_crawl
Cure afvalbeheer kalender crawler
afval afvalwijzer browser crawler kalender
Last synced: 22 Oct 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 04 Apr 2025
https://github.com/crosscutsaw/iscsicrawler
iscsicrawler is a bash script that crawls files in the iscsi targets with ease.
crawler iscsi iscsi-target iscsiadm
Last synced: 16 Jan 2026
https://github.com/alphadev3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 26 Dec 2025
https://github.com/tetreum/xupopter_chrome_extension
Extension to easily create crawling recipes
crawler scrapper scrapping webscraper
Last synced: 04 Apr 2025
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 10 Jun 2026
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 22 May 2026
https://github.com/abx123/crawler
Simple lambda function to crawl daily web novel updates.
crawler firebase-database golang lambda-functions
Last synced: 28 Mar 2025
https://github.com/leshniak/robotstxt-debug
A tool for debugging robots.txt
crawler debugger indexing robots-txt seo seo-optimization seo-tools tester
Last synced: 25 Jun 2025
https://github.com/rabattkarte/free-domain-scanner
crawler dns domain domain-name domain-names go golang scanner whois
Last synced: 26 May 2026
https://github.com/sanskar107/c-subject-predictor
Predicts topic of a code.
Last synced: 14 Mar 2025
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 15 May 2025
https://github.com/agucova/needs-seeding
🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.
Last synced: 12 Oct 2025
https://github.com/bockstaller/europarl-crawler
Crawler for the documents published by the European Parliament
crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union
Last synced: 16 May 2026
https://github.com/mohammadreza-mohammadi94/python-webscraper-projects
A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.
bs4 crawler object-oriented-programming python requests scrapy webscraping
Last synced: 13 Jul 2025
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 23 Mar 2025
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Sep 2025
https://github.com/gnehs/twse-financial-ratios-crawler
透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均
Last synced: 29 Apr 2026
https://github.com/jayzhan211/python-crawler-startups
python crawler learning
Last synced: 20 Mar 2025
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 23 Mar 2025
https://github.com/dinofizz/sitemapper
sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.
astradb cassandra concurrency crawler go golang kubernetes nats sitemap
Last synced: 16 Jan 2026