Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/amirzenoozi/aparat-videos-dataset
Some simple information about Aparat videos for data scientists
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 17 May 2026
https://github.com/sinipelto/repo-license-crawler
Collects and summarizes license information on Python and NPM packages into output files.
crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3
Last synced: 09 May 2026
https://github.com/ipanalytics/crawlerscope
Interactive crawler IP intelligence dashboard for search, AI, and user-triggered fetchers.
ai-bots ai-crawlers bingbot bot-detection cidr crawler crawler-detection data-visualization github-pages googlebot gptbot ip-ranges nginx open-data osint robots-txt threat-intelligence waf web-security
Last synced: 09 Jun 2026
https://github.com/rogerluo410/gcrawler
Google search crawler written in Ruby. Crawls the text and URL of each link returned for given keywords on Google.com.
Last synced: 22 Jun 2026
https://github.com/jovijovi/ether-crawler
A transaction crawler for the Ethereum ecosystem.
blockchain crawler ether ethereum transaction
Last synced: 08 May 2026
https://github.com/vitaee/laravelandcrawlers
PHP web crawler examples using OOP concepts, plus a Laravel project
Last synced: 25 Apr 2026
https://github.com/zephyrpersonal/github-trending-crawler
Transforms GitHub trending repos into JSON data
cheerio crawler fetch github node repository spider trending
Last synced: 04 Jan 2026
https://github.com/rogerchappel/crawldeck
Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.
agent-tools cli crawler local-first queue typescript
Last synced: 26 May 2026
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 11 Jun 2026
https://github.com/milouk/web-crawler
Phoneutria Crawler
crawler crawlers database internet jar java spider web web-crawler
Last synced: 21 Apr 2026
https://github.com/khoinguyen2k/web-crawler
about crawl data
crawler jsoup-library scraper selenium-java
Last synced: 06 Mar 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 26 May 2026
https://github.com/buttermiilk/sentakusha
simple (and badly written express.js) crawler for the washing machine game.
api crawler imagegeneration maimai
Last synced: 07 Apr 2025
https://github.com/xcrypt0r/xcrawler
✂️ A crawling example for maplestory with various languages using multi-threading
crawler crawling multithreading parsing regexp
Last synced: 14 Jun 2025
https://github.com/mazzasaverio/lean-jobs-crawler
(Let's build) A lean, high-performance web crawler specializing in job posting extraction directly from company websites. Uses LLM for intelligent URL discovery and data extraction.
crawler docker llm logfire neon openai python uv
Last synced: 15 Mar 2025
https://github.com/konradlinkowski/mailcrawler
Crawler that finds email addresses on websites
Last synced: 05 Jan 2026
https://github.com/wafflecomposite/yggdrasil-crawler-python
Small Yggdrasil network crawler with CLI, written in Python3
crawler mesh-networks no-dependencies python python3 yggdrasil yggdrasil-api yggdrasil-network
Last synced: 17 Nov 2025
https://github.com/danoctavian/proxy-master
manage a set of http proxies
crawler http-proxy node-proxy-server
Last synced: 27 May 2026
https://github.com/naveenaidu/google-crawler
Google Crawler - Curates the search results
Last synced: 27 May 2026
https://github.com/taurusolson/jobscraper
I am looking for a developer position in France
Last synced: 23 Jun 2025
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 07 Apr 2025
https://github.com/bac0id/wayback-machine-auto-save
A crawler that submits a list of web pages to Save Page Now on the Internet Archive's Wayback Machine.
crawler internet-archive python save-page-now wayback-machine
Last synced: 28 May 2026
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 20 Jul 2025
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 15 May 2026
https://github.com/tanja-4732/od-get
A Rust tool for recursively crawling & downloading data from open directories
cli crawler open-directory open-directory-downloader rust
Last synced: 26 May 2026
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 20 Mar 2025
https://github.com/fenying/huaban-crawler
A board-pins crawler for huaban.com, based on Node.js
Last synced: 02 Jul 2025
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 errors reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 19 Apr 2025
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 20 Nov 2025
https://github.com/henkman/crawlers
Some crawlers and downloaders
Last synced: 28 May 2026
https://github.com/homuchen/instagram-crawler
Instagram crawler
crawler instagram nodejs-crawler
Last synced: 24 Mar 2025
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.
Last synced: 27 Feb 2025
https://github.com/vindecodex/automated-crawler-wget
Uses wget to crawl a site
Last synced: 03 Sep 2025
https://github.com/juan-kabbali/glassdoor-linkedin-web-scrapper
CLI application that acts as a web scraper to retrieve Glassdoor and LinkedIn information
Last synced: 29 Jan 2026
https://github.com/khilnani/spidey.py
Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.
cli crawler python scaper web-spider
Last synced: 25 Mar 2025
https://github.com/noarche/darknoisy
Same as my Noisy but on TOR network. Logs links. Crawls onion sites.
crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks
Last synced: 08 Sep 2025
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 26 Dec 2025
https://github.com/sreejoy/crawlerfriend
A lightweight crawler that returns search results as HTML or as a dictionary, given URLs and keywords.
crawler python-crawler python-scraper python27 scrapper
Last synced: 12 Jun 2025
https://github.com/loggerhead/dianping_crawler
A Dianping (大众点评) crawler based on Scrapy (Python 3.5)
Last synced: 14 Feb 2026
https://github.com/godbout/htmlpagedom
jQuery-inspired DOM manipulation extension for Symfony's Crawler
crawler dom html htmlpagedom php symfony
Last synced: 14 Jan 2026
https://github.com/greatdrake/contributecounter
Crawls Wikipedia for contributors
Last synced: 02 Apr 2025
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 09 Apr 2025
https://github.com/basemax/rondircrawler
A crawler for extracting a list of top sim cards and tel numbers from the Rond.ir website. (PHP)
crawle-php crawler crawler-testing crawlers crawlers-php php php-crawler rondir
Last synced: 03 Apr 2025
https://github.com/gesugao-san/pcgw-crawler
Digital assistant for working hard on PCGW.
bad-code bad-coding-style crawler javascript js nodejs pcgamingwiki pcgw shitty spaghetti-code
Last synced: 12 Apr 2026
https://github.com/abdus/scrape-web
A simple web scraper for Node.js
crawler web-scraping web-scrapper
Last synced: 25 Mar 2025
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 07 Aug 2025
https://github.com/thiagopanini/datadelivery
An open source Terraform module that provides a complete infrastructure toolkit so that users can start exploring Analytics services on AWS.
analytics athena aws catalog crawler data datamesh glue s3 terraform
Last synced: 29 Nov 2025
https://github.com/baerwang/sec_craw
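A crawler that helps security researchers collect daily security reports. It currently covers 90sec, the Kanxue forum (看雪论坛), v2ex, the Jingyi forum (精易论坛), the 52pojie forum (52破解论坛), and other lab blogs, and is continuously updated.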
crawler security security-tools threat threat-intelligence
Last synced: 04 Jul 2025
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawls Wikipedia with Python, powered by BeautifulSoup4; supports both GUI and CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 05 Sep 2025
https://github.com/phanikmr/linkcrawler
LinkCrawler is a Python module that takes a URL (e.g. http://python.org), fetches the corresponding web page, and parses all the links on that page into a repository of links. It then fetches the content of any URL in that repository, parses the links from the new content into the repository, and repeats this process for all links in the repository until it is stopped or a given number of links have been fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 07 Feb 2026
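The fetch, parse, and enqueue loop described for linkcrawler can be illustrated with a minimal sketch. This is not the project's code: the names `LinkParser`, `crawl`, `fetch`, `max_pages`, and the in-memory `PAGES` table are all assumptions made for illustration, and the example uses only the Python standard library with a fake "web" so it runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; `fetch` maps a URL to HTML text, or None on failure."""
    repository = deque([start_url])  # links waiting to be fetched
    seen = {start_url}               # every link ever queued
    fetched = []                     # links fetched so far, in order
    while repository and len(fetched) < max_pages:
        url = repository.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it without counting it
        fetched.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                repository.append(link)
    return fetched


# Hypothetical in-memory pages standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c">C</a>',
}

order = crawl("http://example.test/", PAGES.get, max_pages=3)
print(order)
```

Passing `fetch` as a parameter keeps the loop independent of any HTTP library; a real crawler would supply a function that performs the request and would normally also add rate limiting and robots.txt handling, which this sketch omits.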
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 11 Jul 2025
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 24 Jul 2025
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 08 Apr 2025
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in rust
backend-server crawler data-analysis rust
Last synced: 05 Feb 2026
https://github.com/javokhirbek1999/tez-spider
Distributed music scraper built in Go
concurrent crawler distributed-systems music-scraper
Last synced: 17 Jan 2026
https://github.com/injectrl/xhspicextractor
A tool for extracting original images from Xiaohongshu (小红书)
crawler dotnet7 minimalapi okteto xiaohongshu
Last synced: 20 Jun 2026
https://github.com/wondervictor/spiderman
2017 Software Course Project
crawler distribute-crawler zhihu-crawler
Last synced: 21 Apr 2026
https://github.com/buren/stupid_crawler
Stupid crawler that looks for URLs on a given site
Last synced: 09 Apr 2025
https://github.com/anyparser/anyparser_core
Anyparser Python SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
cache-augmented-generation crawler crewai etl-framework etl-pipeline knowledge-graph knowledgebase langchain langgraph llamaindex ms-office n8n ocr openai pdf python rag retrieval-augmented-generation search-engine typescript
Last synced: 05 Oct 2025
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Jun 2026
https://github.com/pythoript/pgn-scraper
PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.
7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip
Last synced: 16 Mar 2025
https://github.com/mmqnym/pyppeteer-use-case
Shows how to crawl the web with pyppeteer
crawl crawler pyppeteer python
Last synced: 24 Dec 2025
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 06 Jul 2025
https://github.com/oglinuk/goccer
Go Concurrent Crawler Library
concurrency crawler go library
Last synced: 06 Jul 2025
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 16 Apr 2026
https://github.com/andmerk93/scrapy_parser_pep
A learning project built with Scrapy; it parses PEPs and outputs the results in 2 formats
Last synced: 17 Mar 2025
https://github.com/dangdungcntt/crawl-fb-v2
Simple script to detect email addresses and phone numbers in Facebook comments.
Last synced: 26 Apr 2026
https://github.com/greycloudss/greave
Greave is a fast, multi-mode scanner for locating sensitive information in both local filesystems and Confluence pages.
armourer confluence crawler python reconnaissance security
Last synced: 07 Oct 2025
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 08 Oct 2025
https://github.com/zabuzard/wslotter
WSlotter is a Selenium-driven tool for signing up for events on 'https://www.gruppe-w.de'.
Last synced: 10 Oct 2025
https://github.com/mdazlaanzubair/amazon-scraper-api
A web scraper that crawls Amazon to extract product information and return it in JSON format.
amazon crawler expressjs json-api nodejs webscraping
Last synced: 14 Apr 2026
https://github.com/afuntw/misc-crawler
Some small crawlers for specific websites
Last synced: 14 Oct 2025
https://github.com/basemax/my-site-url-finders
A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.
crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder
Last synced: 15 Oct 2025
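The filtering step described for my-site-url-finders (same domain only, unwanted paths and file types excluded, clean de-duplicated output) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `keep_url`, `clean_urls`, and the two exclusion tuples are hypothetical names, and the specific paths and extensions are made up because the description does not list them.

```python
from urllib.parse import urlparse

# Hypothetical exclusion rules; the project's real lists are not given in its description.
SKIP_PATH_PREFIXES = ("/admin", "/login")
SKIP_EXTENSIONS = (".jpg", ".png", ".pdf", ".zip")


def keep_url(url, site_host):
    """Return True if `url` is an http(s) URL on `site_host` that passes the exclusion rules."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False  # drops mailto:, javascript:, and similar schemes
    if parts.hostname != site_host:
        return False  # stay within the same domain
    path = parts.path.lower()
    if path.startswith(SKIP_PATH_PREFIXES):
        return False  # unwanted path
    if path.endswith(SKIP_EXTENSIONS):
        return False  # unwanted file type
    return True


def clean_urls(urls, site_host):
    """Filter and de-duplicate URLs, preserving first-seen order."""
    result = []
    for url in urls:
        if keep_url(url, site_host) and url not in result:
            result.append(url)
    return result


found = [
    "https://example.test/docs",
    "https://example.test/logo.PNG",
    "https://example.test/admin/users",
    "https://other.test/docs",
    "mailto:me@example.test",
    "https://example.test/blog/post-1",
    "https://example.test/docs",
]
print(clean_urls(found, "example.test"))
```

Note that the prefix check is deliberately simple, so `/admin` also excludes a path such as `/administrator`; a stricter rule would compare whole path segments.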
https://github.com/birdroad1/server-pinger
Server pinger for Minecraft written in C++
cpp crawler make minecraft minecraft-scanner postgres scanner server
Last synced: 14 Apr 2026
https://github.com/bujosa/aldebaran
Example of using App Engine with Python 3, ThreadPool, and web scraping
appengine crawler flask gcp python3 thread-pool
Last synced: 19 Oct 2025
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 20 Oct 2025
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 23 Jan 2026
https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper
Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.
console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox
Last synced: 09 May 2026
https://github.com/kgruiz/stealth-crawler
Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.
asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler
Last synced: 25 Oct 2025
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 23 Feb 2026