Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
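The definition above describes a bot that browses the Web by following links from page to page. The sketch below shows one way to do that with a breadth-first traversal, using only the Python standard library. It is not taken from any project listed here. The names `crawl`, `fetch_url` and `LinkExtractor`, the same-host restriction, and the `max_depth` limit are illustrative assumptions. A production crawler would also need to honor robots.txt, apply rate limits, and handle many more edge cases.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Default fetcher: download a page with urllib (assumed HTML)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str, max_depth: int = 2,
          fetch: Callable[[str], str] = fetch_url) -> Dict[str, int]:
    """Breadth-first crawl restricted to start_url's host.

    Returns every discovered URL mapped to its link distance from
    start_url. Pages at max_depth are recorded but not fetched.
    """
    host = urlparse(start_url).netloc
    seen: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page
        try:
            html = fetch(url)
        except OSError:
            continue  # network errors (URLError is an OSError): skip page
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links against the current page, drop #fragments
            link, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https") or parsed.netloc != host:
                continue  # stay on the starting host, skip mailto: etc.
            if link in seen:
                continue  # already queued or visited
            seen[link] = depth + 1
            queue.append(link)
    return seen
```

Because the fetcher is a parameter, the traversal can be exercised without network access by passing a stub, e.g. `crawl("http://example.com/", max_depth=2, fetch=my_stub)`, where `my_stub` is a hypothetical function returning canned HTML.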
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-22 00:06:47 UTC
https://github.com/mindfiredigital/deepscanbot
It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.
bot crawl crawler go golang google webcrawler
Last synced: 10 Aug 2025
https://github.com/ptthanh02/vietnam-news-crawler
crawler crawling-python newspaper text-data text-mining
Last synced: 11 Aug 2025
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 05 Oct 2025
https://github.com/win7user10/laraue.crawling
A set of tools for quickly writing crawlers on .NET.
crawler csharp csharp-crawler parser
Last synced: 17 Aug 2025
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and the iTunes page.
Last synced: 31 Jul 2025
https://github.com/ccrashzer0/web_crawler
A Python-based web crawler.
crawler internet python python3 webcrawler
Last synced: 22 Mar 2025
https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
Last synced: 20 May 2026
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 15 Jun 2026
https://github.com/amirzenoozi/aparat-videos-dataset
Simple information about Aparat videos for data scientists.
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 17 May 2026
https://github.com/sinipelto/repo-license-crawler
Collects and summarizes license information on Python and NPM packages into output files.
crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3
Last synced: 09 May 2026
https://github.com/ipanalytics/crawlerscope
Interactive crawler IP intelligence dashboard for search, AI, and user-triggered fetchers.
ai-bots ai-crawlers bingbot bot-detection cidr crawler crawler-detection data-visualization github-pages googlebot gptbot ip-ranges nginx open-data osint robots-txt threat-intelligence waf web-security
Last synced: 09 Jun 2026
https://github.com/yordadev/fenrisjs
A Node.js application that scrapes all links from a given input and writes the results into one of two files (external or internal links) for further analysis.
analysis crawler link-collection link-crawler nodejs nodejs-application
Last synced: 10 May 2026
https://github.com/rogerluo410/gcrawler
Google search crawler, Ruby version. Crawls the text and URL of each link returned by a keyword search on Google.com.
Last synced: 22 Jun 2026
https://github.com/basemax/doostihaacrawler
A PHP crawler for Doostihaa.com, a database of thousands of movies.
crawler crawler-example crawler-php crawler-testing crawlers database-movie database-movies doostihaa doostihaa-com movie movie-database movie-database-api movie-database-website movie-db movies movies-database php php-crawler
Last synced: 13 Sep 2025
https://github.com/jovijovi/ether-crawler
A transaction crawler for the Ethereum ecosystem.
blockchain crawler ether ethereum transaction
Last synced: 08 May 2026
https://github.com/dineshsprabu/concurrent-web-crawler
Flexible, concurrent web crawler implemented in Go.
concurrent-web-crawler crawler go-crawler spider web-crawler
Last synced: 12 Jan 2026
https://github.com/khadkarajesh/aptoide
Aptoide app crawler using BeautifulSoup.
beautifulsoup4 crawler flask python3 web-application
Last synced: 19 May 2026
https://github.com/vitaee/laravelandcrawlers
PHP web crawler examples using OOP concepts, plus a Laravel project.
Last synced: 25 Apr 2026
https://github.com/zephyrpersonal/github-trending-crawler
Transforms GitHub trending repositories into JSON data.
cheerio crawler fetch github node repository spider trending
Last synced: 04 Jan 2026
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 21 Mar 2025
https://github.com/igorbrizack/web-scraper
An HTML data-scraping application built in Python.
crawler pytest python3 scraper
Last synced: 08 May 2026
https://github.com/rogerchappel/crawldeck
Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.
agent-tools cli crawler local-first queue typescript
Last synced: 26 May 2026
https://github.com/hong539/acgbox_crawler
A web crawler for gamer.com.tw/acgbox.
beautifulsoup4 crawler pandas python requests scrapy sqlalchemy web-crawler
Last synced: 05 Apr 2025
https://github.com/maxiroellplenty/gs-robot
Node.js tool to scrape gelbe-seiten.
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 18 May 2026
https://github.com/basemax/kashan-university-phone-directory
This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.
crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan
Last synced: 18 May 2026
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 11 Jun 2026
https://github.com/ilsonlasmar/inovamind
Inovamind challenge: a crawler in Ruby on Rails with Sidekiq + Redis.
Last synced: 12 Sep 2025
https://github.com/milouk/web-crawler
Phoneutria Crawler
crawler crawlers database internet jar java spider web web-crawler
Last synced: 21 Apr 2026
https://github.com/khoinguyen2k/web-crawler
About crawling data.
crawler jsoup-library scraper selenium-java
Last synced: 06 Mar 2025
https://github.com/morungos/github-issue-crawler
GitHub crawler for public repositories, issues, and comments.
Last synced: 30 Apr 2026
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 18 May 2026
https://github.com/zhs007/lottery-crawler
A crawler based on jarvis-task, mainly used to crawl lottery data.
Last synced: 30 Oct 2025
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 21 Mar 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 26 May 2026
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 18 May 2026
https://github.com/buttermiilk/sentakusha
Simple (and badly written) Express.js crawler for the washing-machine game.
api crawler imagegeneration maimai
Last synced: 07 Apr 2025
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to fetch the latest forum posts on Bahamut, and use Gmail to send these messages to myself.
Last synced: 21 Mar 2025
https://github.com/srx-2000/swaiter
A program that waits until a Selenium element has loaded (a Selenium element-wait utility).
crawler selenium selenium-python
Last synced: 18 May 2026
https://github.com/richecr/pyhltv
Repository to extract information from the HLTV website.
crawler csgo hacktoberfest hltv hltv-api python3
Last synced: 21 May 2026
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 29 Mar 2025
https://github.com/xcrypt0r/xcrawler
✂️ A multithreaded crawling example for MapleStory, written in various languages.
crawler crawling multithreading parsing regexp
Last synced: 14 Jun 2025
https://github.com/mazzasaverio/lean-jobs-crawler
(Let's build) A lean, high-performance web crawler specializing in extracting job postings directly from company websites. Uses an LLM for intelligent URL discovery and data extraction.
crawler docker llm logfire neon openai python uv
Last synced: 15 Mar 2025
https://github.com/konradlinkowski/mailcrawler
Crawler that finds email addresses on websites.
Last synced: 05 Jan 2026
https://github.com/wafflecomposite/yggdrasil-crawler-python
Small Yggdrasil network crawler with CLI, written in Python3
crawler mesh-networks no-dependencies python python3 yggdrasil yggdrasil-api yggdrasil-network
Last synced: 17 Nov 2025
https://github.com/flavien-hugs/scrapy-test
Working with the Scrapy library. A mini script that extracts all the cartoon characters listed on Wikipedia.
crawler python scraping scrapy
Last synced: 29 Mar 2025
https://github.com/danoctavian/proxy-master
Manages a set of HTTP proxies.
crawler http-proxy node-proxy-server
Last synced: 27 May 2026
https://github.com/naveenaidu/google-crawler
Google Crawler - Curates the search results
Last synced: 27 May 2026
https://github.com/skylightqp/namu2csv
A Namuwiki crawler that converts headers to CSV files for the KartRider wiki.
Last synced: 24 Jun 2025
https://github.com/taurusolson/jobscraper
I am looking for a developer position in France.
Last synced: 23 Jun 2025
https://github.com/rix4uni/pathcrawler
Discovers new paths by scanning HTML.
bug-bounty bugbounty bugbountytips crawler hacking infosec osint osint-resources osint-tool pathcrawler penetration-testing pentest-tool pentesting recon reconnaissance scrape security security-tools threat-intelligence
Last synced: 17 Feb 2026
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 07 Apr 2025
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: 17 May 2026
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 17 May 2026
https://github.com/pavelsr/email-extractor
Fast email crawler
crawler email-crawler email-marketing perl telemarketing
Last synced: 18 Mar 2025
https://github.com/deployment-helper/api-template-crawler
An API for crawling templates.
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 01 Sep 2025
https://github.com/bac0id/wayback-machine-auto-save
A crawler that submits a list of web pages to Save Page Now on the Internet Archive's Wayback Machine.
crawler internet-archive python save-page-now wayback-machine
Last synced: 28 May 2026
https://github.com/mkfsn/chronos
A lightweight cron-like container service for easily creating cron jobs.
Last synced: 20 Jul 2025
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 15 May 2026
https://github.com/droiddevgeeks/nodelearning
A Node.js learning demo covering the basics of Node.js.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 05 Apr 2026
https://github.com/gnaneshkunal/book-miner
Web crawler for book reviews (Goodreads).
Last synced: 03 Apr 2025
https://github.com/tanja-4732/od-get
A Rust tool for recursively crawling & downloading data from open directories
cli crawler open-directory open-directory-downloader rust
Last synced: 26 May 2026
https://github.com/adamfisher/scrapyrt.client
A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.
crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider
Last synced: 21 Mar 2025
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 20 Mar 2025
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 09 Mar 2026
https://github.com/bytejoseph/osintgit
OSINT investigation tool for GitHub.
crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp
Last synced: 23 Jul 2025
https://github.com/iamgideonidoko/web-crawler-with-php
Sample implementation of web crawler in PHP
Last synced: 21 Mar 2025
https://github.com/fenying/huaban-crawler
A board-pins crawler for huaban.com, based on Node.js.
Last synced: 02 Jul 2025
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as reporting 404 errors and validating structured data...
crawler seo structured-data web-spider website
Last synced: 19 Apr 2025
https://github.com/uranusx86/dcard-crawler-analyzer
Fetches Dcard and Meteor forum content and analyzes it.
crawl crawler dcard nlp python word-cloud word-count word-frequency
Last synced: 14 Jul 2025
https://github.com/tibiasolutions/sharp-parser
A C# parser for information from Tibia.com.
crawler nuget parsed-data tibia tibia-parser
Last synced: 17 May 2026
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 13 Jul 2025
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 20 Nov 2025
https://github.com/pjullrich/link-crawler
Python crawler that reports broken links on a given website and its subpages.
asyncio breadth-first-search broken-links crawler python
Last synced: 11 Jul 2025
https://github.com/purrproof/smartcrawl
An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.
blockchain cli crawler explorer framework go golang hacktoberfest
Last synced: 16 May 2026
https://github.com/henkman/crawlers
🐿️ Some crawlers and downloaders.
Last synced: 28 May 2026
https://github.com/homuchen/instagram-crawler
Instagram crawler
crawler instagram nodejs-crawler
Last synced: 24 Mar 2025
https://github.com/orsinium-labs/gpcc
Python library and CLI tool to fetch information from the GPC Browser (https://gpc-browser.gs1.org/).
Last synced: 19 Jun 2025
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.
Last synced: 27 Feb 2025
https://github.com/sudolife/shopify
An easy-to-use crawler to keep track of reviews of an app on Shopify.
Last synced: 16 May 2026
https://github.com/akashrajpurohit/node-crawler
Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs on that domain.
crawler node-crawler nodejs url
Last synced: 27 Apr 2026
https://github.com/vindecodex/automated-crawler-wget
Uses wget to crawl a site.
Last synced: 03 Sep 2025