Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/juan-kabbali/glassdoor-linkedin-web-scrapper
CLI application that acts as a web scraper to retrieve Glassdoor and LinkedIn information
Last synced: 29 Jan 2026
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 10 Jul 2025
https://github.com/coghost/crawlers
crawlers in one
crawler python3 staticimg weibo
Last synced: 10 Jul 2025
https://github.com/turtiesocks/zendriver-rs
Async-first, undetectable browser automation in Rust via the Chrome DevTools Protocol. Stealth-by-default port of zendriver — no WebDriver, no JS shim.
anti-detection async automation bot browser-automation cdp chrome-devtools-protocol chromium cloudflare-bypass crawler headless-chrome playwright-alternative rust scraping stealth tokio undetectable-chromedriver web-scraping web-testing zendriver
Last synced: 13 Jun 2026
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs until the server crashes (or the benchmark ends). It is used to test the maximum capacity of a server or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 05 Apr 2025
https://github.com/khilnani/spidey.py
Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.
cli crawler python scaper web-spider
Last synced: 25 Mar 2025
https://github.com/noarche/darknoisy
Same as my Noisy, but on the Tor network. Logs links and crawls onion sites.
crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks
Last synced: 08 Sep 2025
https://github.com/hamidrabedi/digikala-crawler
A crawler for Digikala built with the Django framework, Selenium, and a REST API. Also scrapes data from the gathered URLs.
crawler digikala digikala-crawler django python scraper
Last synced: 16 May 2026
https://github.com/arshadkazmi42/gh-crawl
Crawler for GitHub repositories. Finds all the broken links in the repositories.
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 20 Jan 2026
https://github.com/camilamaia/crawl4us
[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️
beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping
Last synced: 28 Mar 2025
https://github.com/devidw/google-untitled-spam-spider
A spam spider that targets 'Untitled' spam pages in Google search results.
crawler crawling crawling-algorithm crawling-python crawling-sites crawling-tool google-untitled python python3 spam spam-detection spammer untitled untitled-spam
Last synced: 28 Mar 2025
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 26 Dec 2025
https://github.com/sreejoy/crawlerfriend
A lightweight crawler that returns search results in HTML or dictionary form, given URLs and keywords.
crawler python-crawler python-scraper python27 scrapper
Last synced: 12 Jun 2025
https://github.com/loggerhead/dianping_crawler
A Dianping (大众点评) crawler based on Scrapy (Python 3.5)
Last synced: 14 Feb 2026
https://github.com/godbout/htmlpagedom
jQuery-inspired DOM manipulation extension for Symfony's Crawler
crawler dom html htmlpagedom php symfony
Last synced: 14 Jan 2026
https://github.com/shiritai/wallpaper_master
My first individual project!
crawler file-explorer javafx-application maven-shade mini-system wallpaper wallpaper-master
Last synced: 16 May 2026
https://github.com/greatdrake/contributecounter
Crawl Wikipedia for contributors
Last synced: 02 Apr 2025
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 09 Apr 2025
https://github.com/basemax/rondircrawler
A crawler for extracting a list of top sim cards and tel numbers from the Rond.ir website. (PHP)
crawle-php crawler crawler-testing crawlers crawlers-php php php-crawler rondir
Last synced: 03 Apr 2025
https://github.com/gesugao-san/pcgw-crawler
Digital assistant for working hard on PCGW.
bad-code bad-coding-style crawler javascript js nodejs pcgamingwiki pcgw shitty spaghetti-code
Last synced: 12 Apr 2026
https://github.com/abdus/scrape-web
A simple web scraper for Node.js
crawler web-scraping web-scrapper
Last synced: 25 Mar 2025
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 07 Aug 2025
https://github.com/thiagopanini/datadelivery
An open source Terraform module that provides a complete infrastructure toolkit so that users can start exploring Analytics services on AWS.
analytics athena aws catalog crawler data datamesh glue s3 terraform
Last synced: 29 Nov 2025
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 27 Mar 2025
https://github.com/baerwang/sec_craw
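A crawler that makes it easy for security researchers to collect daily security reports. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and lab blogs, and is updated continuously.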
crawler security security-tools threat threat-intelligence
Last synced: 04 Jul 2025
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawls Wikipedia with Python and BeautifulSoup4; supports both GUI and CUI.
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 05 Sep 2025
https://github.com/phanikmr/linkcrawler
LinkCrawler is a Python module that takes a URL (e.g. http://python.org), fetches the corresponding web page, and parses all the links on that page into a repository of links. It then fetches the content of a URL from that repository, parses the links in the new content into the repository, and repeats this for all links in the repository until it is stopped or a given number of links has been fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 07 Feb 2026
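The fetch-parse-repeat loop described for LinkCrawler can be sketched roughly as follows. This is a minimal standard-library illustration, not the linkcrawler project's actual code: `LinkParser`, `fetch_page`, `crawl`, and the `max_pages` and `fetch` parameters are hypothetical names chosen for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Hypothetical default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch=fetch_page):
    """Breadth-first crawl; returns the URLs fetched, in fetch order."""
    queue = deque([start_url])  # the "repository" of links still to visit
    seen = {start_url}          # every URL ever queued, to avoid repeats
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        fetched.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched
```

Calling `crawl("http://python.org", max_pages=5)` would fetch at most five pages. Passing a custom `fetch` function lets the loop run against canned pages without network access.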
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 11 Jul 2025
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 04 May 2026
https://github.com/nirjharlo/complete-google-seo-scan
WordPress Plugin with inbuilt SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 12 Oct 2025
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 24 Jul 2025
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 08 Apr 2025
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in rust
backend-server crawler data-analysis rust
Last synced: 05 Feb 2026
https://github.com/roswelly/solana-transaction-crawler
Crawl and parse Solana transactions
crawler parser rust solana transaction
Last synced: 15 May 2026
https://github.com/javokhirbek1999/tez-spider
Distributed music scraper built in Go
concurrent crawler distributed-systems music-scraper
Last synced: 17 Jan 2026
https://github.com/injectrl/xhspicextractor
Tool for extracting original images from Xiaohongshu (小红书)
crawler dotnet7 minimalapi okteto xiaohongshu
Last synced: 20 Jun 2026
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.
crawler crawler-python crawling-python python3
Last synced: 15 Feb 2026
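The multithreaded regex search that Zhi-Zhu describes might look something like the sketch below. It covers only the concurrent search over a known list of URLs, not the recursive link following, and it is not taken from the Zhi-Zhu source: `search_pages`, `fetch_page`, and the `workers` parameter are hypothetical names for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url):
    """Hypothetical default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def search_pages(urls, pattern, fetch=fetch_page, workers=4):
    """Search each page for a regex in parallel; returns {url: [matches]}."""
    regex = re.compile(pattern)

    def search_one(url):
        try:
            return url, regex.findall(fetch(url))
        except Exception:
            return url, []  # treat a failed download as "no matches"

    # Each URL is fetched and searched on a worker thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(search_one, urls))

    # Keep only the pages where the pattern matched at least once.
    return {url: hits for url, hits in results.items() if hits}
```

Threads suit this workload because most of the time is spent waiting on network responses rather than computing.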
https://github.com/wondervictor/spiderman
2017 Software Course Project
crawler distribute-crawler zhihu-crawler
Last synced: 21 Apr 2026
https://github.com/buren/stupid_crawler
Stupid crawler that looks for URLs on a given site
Last synced: 09 Apr 2025
https://github.com/anyparser/anyparser_core
Anyparser Python SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
cache-augmented-generation crawler crewai etl-framework etl-pipeline knowledge-graph knowledgebase langchain langgraph llamaindex ms-office n8n ocr openai pdf python rag retrieval-augmented-generation search-engine typescript
Last synced: 05 Oct 2025
https://github.com/thiiagoms/car-stealth
REST API to all cars that were stolen
Last synced: 16 Jun 2025
https://github.com/raphaelm22/crawling
A set of crawlers that look for something on the internet and, if they succeed, send a notification.
caesb crawler growth-suplements gsuplementos
Last synced: 06 Mar 2026
https://github.com/filsuin/linkedin-crawler
A Python tool for automating job searches on LinkedIn based on user-defined keywords.
crawler crawler-python linkedin offer
Last synced: 16 Jun 2025
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Jun 2026
https://github.com/pythoript/pgn-scraper
PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.
7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip
Last synced: 16 Mar 2025
https://github.com/mmqnym/pyppeteer-use-case
Shows how to crawl the web with pyppeteer
crawl crawler pyppeteer python
Last synced: 24 Dec 2025
https://github.com/supratikchatterjee16/serp_bot
A generic SERP bot, that can be used with just about any search engine.
bot crawler python requests scraping search serp user-agent-spoofer
Last synced: 14 Dec 2025
https://github.com/dimitar0528/crawlitics
An AI-powered Next.js and Python-based ecommerce web crawler, scraper and data-analyst platform that transforms scattered product data into clear market insights.
crawler nextjs product-analysis python scraper
Last synced: 08 Sep 2025
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 06 Jul 2025
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 12 Mar 2025
https://github.com/oglinuk/goccer
Go Concurrent Crawler Library
concurrency crawler go library
Last synced: 06 Jul 2025
https://github.com/ambersun1234/lotto_crawler
web crawler for fetching Taiwan lottery history data
Last synced: 15 Jun 2025
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when the quota changes.
Last synced: 15 May 2026
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 16 Apr 2026
https://github.com/andmerk93/scrapy_parser_pep
A learning project built on Scrapy; parses PEPs and outputs the results in 2 formats
Last synced: 17 Mar 2025
https://github.com/dangdungcntt/crawl-fb-v2
Simple script to detect email addresses and phone numbers in Facebook comments.
Last synced: 26 Apr 2026
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 22 May 2026
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 20 Apr 2026
https://github.com/greycloudss/greave
Greave is a fast, multi-mode scanner for locating sensitive information in both local filesystems and Confluence pages.
armourer confluence crawler python reconnaissance security
Last synced: 07 Oct 2025
https://github.com/yowenter/career-roadmap
Oh, how I hate this living death which has swallowed all my teens, if I am cursed with any, will be worn away!
career crawler findjob job-crawler roadmap search-engine
Last synced: 10 Apr 2025
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 08 Oct 2025
https://github.com/zabuzard/wslotter
WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.
Last synced: 10 Oct 2025
https://github.com/bitscoper/bitscoper_crawler
Crawls the titles of webpages in series by number and creates a list of the available links.
Last synced: 27 Mar 2025
https://github.com/rflcnunes/crawler_email_py
In this project I'm creating a web crawler to check email boxes and handle incoming messages.
aws-bucket aws-bucket-s3 aws-s3 crawler crawler-python email python rabbitmq
Last synced: 10 Aug 2025
https://github.com/mdazlaanzubair/amazon-scraper-api
A web scraper that crawls Amazon to extract product information and return it in JSON format.
amazon crawler expressjs json-api nodejs webscraping
Last synced: 14 Apr 2026
https://github.com/afuntw/misc-crawler
Some small crawlers for specific websites
Last synced: 14 Oct 2025
https://github.com/soulyma/web_crawler
A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.
beautifulsoup4 crawler csv data json python structured-data
Last synced: 15 May 2026
https://github.com/elky84/stock-crawler
Naver Stock Crawler & Mock Invest
asp-net asp-net-core crawler csharp dotnet
Last synced: 18 Apr 2026
https://github.com/dean9703111/humandesign_nodejs
Uses a Node.js crawler to scrape information from the Human Design website, then saves it to a cloud-hosted Google spreadsheet
crawler googlesheetapi googlesheets nodejs
Last synced: 15 May 2026
https://github.com/somehowchris/swisslos-cralwer
(WIP) Crawler to access the current and historical numbers of Swisslos
crawler euromillions lotto rust swisslos
Last synced: 22 Mar 2025
https://github.com/woorim960/nate.com-comments-crawler
nate.com-comments-crawler
chromedriver crawler python3 selenium
Last synced: 14 May 2026
https://github.com/birdroad1/server-pinger
Server pinger for Minecraft written in C++
cpp crawler make minecraft minecraft-scanner postgres scanner server
Last synced: 14 Apr 2026
https://github.com/bujosa/aldebaran
Example of using App Engine with Python 3, ThreadPool, and web scraping
appengine crawler flask gcp python3 thread-pool
Last synced: 19 Oct 2025
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 20 Oct 2025
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 23 Jan 2026
https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper
Crawls the exclusive video games released for the specified platforms on the GameFAQs website and generates a Markdown table of the crawled titles.
console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox
Last synced: 09 May 2026
https://github.com/kgruiz/stealth-crawler
Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.
asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler
Last synced: 25 Oct 2025
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 23 Feb 2026
https://github.com/bigmeech/mangaka
Crawl scanlation websites for manga pages
comic crawler manga scanlation webtoon
Last synced: 23 Jan 2026
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 14 May 2026