Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-30 00:06:54 UTC
https://github.com/somehowchris/swisslos-cralwer
(WIP) Crawler to access the current and historical numbers of Swisslos
crawler euromillions lotto rust swisslos
Last synced: 22 Mar 2025
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 error reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 19 Apr 2025
https://github.com/soulyma/web_crawler
A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.
beautifulsoup4 crawler csv data json python structured-data
Last synced: 15 May 2026
https://github.com/elky84/stock-crawler
Naver Stock Crawler & Mock Invest
asp-net asp-net-core crawler csharp dotnet
Last synced: 18 Apr 2026
https://github.com/amirespahbodi/url_crawler
Async Web Crawler for Website Title and Favicon
crawler fastapi pydantic python3 sqlalchemy
Last synced: 15 Apr 2026
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 08 Apr 2025
https://github.com/dean9703111/humandesign_nodejs
Uses a Node.js crawler to scrape information from the Human Design web page, then saves it to a Google spreadsheet in the cloud
crawler googlesheetapi googlesheets nodejs
Last synced: 15 May 2026
https://github.com/tanja-4732/od-get
A Rust tool for recursively crawling & downloading data from open directories
cli crawler open-directory open-directory-downloader rust
Last synced: 26 May 2026
https://github.com/woorim960/nate.com-comments-crawler
nate.com-comments-crawler
chromedriver crawler python3 selenium
Last synced: 14 May 2026
https://github.com/sonhm3029/crawl-data-bot
This project builds a base bot for crawling data from the web, including text and image data
crawler google medical vietnamese
Last synced: 08 Mar 2026
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to fetch the latest forum posts from the Bahamut boards, and use Gmail to send these messages to myself.
Last synced: 21 Mar 2025
https://github.com/srx-2000/swaiter
A program that waits until a Selenium element has loaded (a Selenium element-wait utility)
crawler selenium selenium-python
Last synced: 18 May 2026
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in Rust
backend-server crawler data-analysis rust
Last synced: 05 Feb 2026
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches a base webpage, and all URLs appearing in it, for specific (regex) words.
crawler crawler-python crawling-python python3
Last synced: 15 Feb 2026
https://github.com/win7user10/laraue.crawling
A set of tools for quickly writing crawlers on .NET
crawler csharp csharp-crawler parser
Last synced: 17 Aug 2025
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 09 Feb 2026
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 14 May 2026
https://github.com/mc256/node-static-webpage-crawler
Downloads an entire website with its directory structure.
cache-server crawler nodejs static-site
Last synced: 16 Apr 2026
https://github.com/jorgeparavicini/medalytik-python
Python crawlers for a job mediation firm
Last synced: 07 Jul 2025
https://github.com/konradlinkowski/mailcrawler
Crawler to find emails on websites
Last synced: 05 Jan 2026
https://github.com/mazzasaverio/lean-jobs-crawler
(Let's build) A lean, high-performance web crawler specializing in job posting extraction directly from company websites. Uses LLM for intelligent URL discovery and data extraction.
crawler docker llm logfire neon openai python uv
Last synced: 15 Mar 2025
https://github.com/xcrypt0r/xcrawler
✂️ A crawling example for maplestory with various languages using multi-threading
crawler crawling multithreading parsing regexp
Last synced: 14 Jun 2025
https://github.com/khdxsohee/email-miner-pro
EMail Miner Pro is designed specifically for professionals scraping data from search engines like Google, ensuring that generic emails (e.g., Gmail, Yahoo) are correctly linked to their business websites found on the page.
chrome crawler crawling email email-extractor extension-chrome lead-generation miner scraper
Last synced: 03 Feb 2026
https://github.com/wafflecomposite/yggdrasil-crawler-python
Small Yggdrasil network crawler with CLI, written in Python3
crawler mesh-networks no-dependencies python python3 yggdrasil yggdrasil-api yggdrasil-network
Last synced: 17 Nov 2025
https://github.com/javokhirbek1999/tez-spider
Distributed music scraper built in Go
concurrent crawler distributed-systems music-scraper
Last synced: 17 Jan 2026
https://github.com/injectrl/xhspicextractor
Xiaohongshu original-image extraction tool
crawler dotnet7 minimalapi okteto xiaohongshu
Last synced: 20 Jun 2026
https://github.com/richecr/pyhltv
Repository to extract information from the HLTV website.
crawler csgo hacktoberfest hltv hltv-api python3
Last synced: 21 May 2026
https://github.com/buttermiilk/sentakusha
simple (and badly written express.js) crawler for the washing machine game.
api crawler imagegeneration maimai
Last synced: 07 Apr 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 26 May 2026
https://github.com/nelcifranmagalhaes/web_crawler
A web crawler for all Naruto characters
anime beautifulsoup characters crawler naruto python
Last synced: 14 Jul 2025
https://github.com/vlad1kudelko/2023.08.15-scraping
Crawler of cooking sites
cloudflare cloudflare-bypass crawler docker parsing python scraping selenium undetected-chromedriver
Last synced: 08 Apr 2026
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 20 Jul 2025
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 29 Mar 2025
https://github.com/programming-with-love/skyeyesystem
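SkyEye system: crawls trending-search data from various platforms every ten minutes and stores it in a database. Raw trending-search data is stored in MySQL; word-frequency statistics are stored in Redis.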
crawler mysql redis skyeye skyeyewall springboot
Last synced: 25 Sep 2025
https://github.com/danoctavian/proxy-master
Manage a set of HTTP proxies
crawler http-proxy node-proxy-server
Last synced: 27 May 2026
https://github.com/vietdoo/sg-property-hub
SG Property Hub is a comprehensive platform for managing and analyzing property data.
airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping
Last synced: 08 Apr 2026
https://github.com/wondervictor/spiderman
2017 Software Course Project
crawler distribute-crawler zhihu-crawler
Last synced: 21 Apr 2026
https://github.com/khoinguyen2k/web-crawler
About crawling data
crawler jsoup-library scraper selenium-java
Last synced: 06 Mar 2025
https://github.com/ipanalytics/crawlerscope
Interactive crawler IP intelligence dashboard for search, AI, and user-triggered fetchers.
ai-bots ai-crawlers bingbot bot-detection cidr crawler crawler-detection data-visualization github-pages googlebot gptbot ip-ranges nginx open-data osint robots-txt threat-intelligence waf web-security
Last synced: 09 Jun 2026
https://github.com/milouk/web-crawler
Phoneutria Crawler
crawler crawlers database internet jar java spider web web-crawler
Last synced: 21 Apr 2026
https://github.com/naveenaidu/google-crawler
Google Crawler - Curates the search results
Last synced: 27 May 2026
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 11 Jun 2026
https://github.com/flavien-hugs/scrapy-test
Working with the Scrapy library. A mini script that extracts all the cartoon characters listed on Wikipedia.
crawler python scraping scrapy
Last synced: 29 Mar 2025
https://github.com/buren/stupid_crawler
Stupid crawler that looks for URLs on a given site
Last synced: 09 Apr 2025
https://github.com/pymarcus/webscrapingiii
A crawler that takes products from a list and walks through Mercado Livre pages, collecting the prices, names, and links to access them.
crawler mercadolivre python webscraping
Last synced: 15 Sep 2025
https://github.com/anyparser/anyparser_core
Anyparser Python SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
cache-augmented-generation crawler crewai etl-framework etl-pipeline knowledge-graph knowledgebase langchain langgraph llamaindex ms-office n8n ocr openai pdf python rag retrieval-augmented-generation search-engine typescript
Last synced: 05 Oct 2025
https://github.com/rogerchappel/crawldeck
Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.
agent-tools cli crawler local-first queue typescript
Last synced: 26 May 2026
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 15 May 2026
https://github.com/citiususc/polypus
Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis
analytics bigdata crawler scraper sentiment-analysis twitter
Last synced: 09 Feb 2026
https://github.com/lucasbotang/project_financial_markets_text_mining
Predict stock market movement based on news
crawler data-science natural-language-processing python
Last synced: 21 May 2026
https://github.com/skylightqp/namu2csv
A Namuwiki crawler that converts headers to a CSV file for the KartRider wiki
Last synced: 24 Jun 2025
https://github.com/rix4uni/pathcrawler
Discover new paths by scanning HTML.
bug-bounty bugbounty bugbountytips crawler hacking infosec osint osint-resources osint-tool pathcrawler penetration-testing pentest-tool pentesting recon reconnaissance scrape security security-tools threat-intelligence
Last synced: 17 Feb 2026
https://github.com/vitaee/laravelandcrawlers
PHP web crawler examples with OOP concepts and a Laravel project
Last synced: 25 Apr 2026
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: 17 May 2026
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 17 May 2026
https://github.com/pavelsr/email-extractor
Fast email crawler
crawler email-crawler email-marketing perl telemarketing
Last synced: 18 Mar 2025
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 01 Sep 2025
https://github.com/beanwei/zmt-post-crawler
Crawls the ZMT platform site: provide an author ID, get the post list. This project was coded for my friend.
Last synced: 08 Nov 2025
https://github.com/khilnani/spidey.py
Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.
cli crawler python scaper web-spider
Last synced: 25 Mar 2025
https://github.com/droiddevgeeks/nodelearning
This is a Node learning demo. It covers all the basics of Node.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 05 Apr 2026
https://github.com/gnaneshkunal/book-miner
Web crawler for Book reviews (Goodreads)
Last synced: 03 Apr 2025
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 05 Oct 2025
https://github.com/toannd96/chromedp-example-login
chromedp crawler golang goquery
Last synced: 21 May 2026
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Jun 2026
https://github.com/taurusolson/jobscraper
I am looking for a developer position in France
Last synced: 23 Jun 2025
https://github.com/adamfisher/scrapyrt.client
A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.
crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider
Last synced: 21 Mar 2025
https://github.com/andrew-ld/wowroms-downloader
Download all ROMs from wowroms
aiohttp asyncio crawler python3
Last synced: 17 Jan 2026
https://github.com/pythoript/pgn-scraper
PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.
7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip
Last synced: 16 Mar 2025
https://github.com/mmqnym/pyppeteer-use-case
Shows how to crawl the web with pyppeteer
crawl crawler pyppeteer python
Last synced: 24 Dec 2025
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 09 Mar 2026
https://github.com/bytejoseph/osintgit
OSINT investigation tool for GitHub
crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp
Last synced: 23 Jul 2025
https://github.com/noarche/darknoisy
Same as my Noisy but on the Tor network. Logs links. Crawls onion sites.
crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks
Last synced: 08 Sep 2025
https://github.com/rebrowser/autotrader-dataset
AutoTrader car listings database: new, used & CPO vehicles with make, model, trim, mileage, MSRP, KBB fair price range, deal rating, body style, fuel type, and seller state. Updated daily.
automotive autotrader car-listings car-prices crawler data-collection data-science dataset kbb open-data scraper used-cars vehicle-data web-scraping
Last synced: 03 May 2026
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 29 Jun 2026
https://github.com/iamgideonidoko/web-crawler-with-php
Sample implementation of a web crawler in PHP
Last synced: 21 Mar 2025
https://github.com/shimech/pokemon-db-maker
Generates a Pokémon encyclopedia (Pokédex) via web crawling
beautifulsoup crawler docker pokemon scraper
Last synced: 25 Jan 2026
https://github.com/tasooshi/digslash
A site mapping and enumeration tool for web application analysis
crawler mapping sitemap spider
Last synced: 08 Apr 2026
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 22 Sep 2025
https://github.com/leandrols/scliper
CLI tool for simple web scraping.
cli-scripts crawler golang scraping
Last synced: 01 Nov 2025
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 20 Nov 2025
https://github.com/ptthanh02/vietnam-news-crawler
crawler crawling-python newspaper text-data text-mining
Last synced: 11 Aug 2025
https://github.com/ryanking13/bellorin
Multi-threaded Social Media Crawler 🔍
crawler instagram social-media
Last synced: 29 Jun 2025
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 20 Mar 2025
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 11 Apr 2026