Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-30 00:06:54 UTC
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 06 Jul 2025
https://github.com/muhfalihr/pyxdtelebot
PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.
crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x
Last synced: 06 Apr 2025
https://github.com/uranusx86/dcard-crawler-analyzer
Get Dcard and Meteor forum content and analyze it!
crawl crawler dcard nlp python word-cloud word-count word-frequency
Last synced: 14 Jul 2025
https://github.com/tibiasolutions/sharp-parser
A C# parser for Tibia.com information.
crawler nuget parsed-data tibia tibia-parser
Last synced: 17 May 2026
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 13 Jul 2025
https://github.com/panakour/pkscraper
Extract structured data from the web
crawler crawling scraper scraping scraping-websites webcrawler
Last synced: 19 Feb 2026
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 26 Dec 2025
https://github.com/elektrostudios/bt4g-torrent-magnet-scraper
Scrapes BT4G magnet links using configurable search and filtering rules.
bt4g command-line console-applications crawler dotnet magnet magnet-link scraper scraping searchengine torrent torrents vbnet web-crawler web-spider webcrawler webspider windows windows-10 windows-app
Last synced: 24 Jun 2026
https://github.com/pjullrich/link-crawler
Python crawler that reports broken links on a given website and its subpages
asyncio breadth-first-search broken-links crawler python
Last synced: 11 Jul 2025
https://github.com/sreejoy/crawlerfriend
A lightweight crawler that, given URLs and keywords, returns search results as HTML or as a dictionary.
crawler python-crawler python-scraper python27 scrapper
Last synced: 12 Jun 2025
https://github.com/oglinuk/goccer
Go Concurrent Crawler Library
concurrency crawler go library
Last synced: 06 Jul 2025
https://github.com/purrproof/smartcrawl
An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.
blockchain cli crawler explorer framework go golang hacktoberfest
Last synced: 16 May 2026
https://github.com/henkman/crawlers
:squirrel: some crawlers and downloaders
Last synced: 28 May 2026
https://github.com/sinipelto/repo-license-crawler
Collects and summarizes license information on Python and NPM packages into output files.
crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3
Last synced: 09 May 2026
https://github.com/orsinium-labs/gpcc
Python library and CLI tool to fetch information from the GPC Browser (https://gpc-browser.gs1.org/)
Last synced: 19 Jun 2025
https://github.com/mindfiredigital/deepscanbot
Crawls websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.
bot crawl crawler go golang google webcrawler
Last synced: 10 Aug 2025
https://github.com/sudolife/shopify
An easy-to-use crawler to keep track of reviews of an app on Shopify.
Last synced: 16 May 2026
https://github.com/mlibre/clean-web-scraper
A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖
ai artificial-intelligence clean crawler data-preprocessing dataset fine-tuning llm recursive-crawling scraper training
Last synced: 17 Mar 2025
https://github.com/loggerhead/dianping_crawler
A Dianping (大众点评) crawler based on Scrapy (Python 3.5)
Last synced: 14 Feb 2026
https://github.com/akashrajpurohit/node-crawler
A Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs on that domain
crawler node-crawler nodejs url
Last synced: 27 Apr 2026
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 31 Jul 2025
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 16 Apr 2026
https://github.com/andmerk93/scrapy_parser_pep
A learning project built with Scrapy that parses PEPs and outputs them in two formats
Last synced: 17 Mar 2025
https://github.com/dangdungcntt/crawl-fb-v2
Simple script to detect emails and phone numbers in Facebook comments.
Last synced: 26 Apr 2026
https://github.com/jfcherng/wiki-cgroup-crawler
This script crawls Wikipedia's public conversion group (CGroup) word lists and saves the results as external files.
crawler php-71 wiki-cgroup-crawler wikipedia
Last synced: 03 Oct 2025
https://github.com/princed/specht
Checks links found in HTML or JS files by pattern
cli crawler html javascript streams
Last synced: 10 Jul 2025
https://github.com/godbout/htmlpagedom
jQuery-inspired DOM manipulation extension for Symfony's Crawler
crawler dom html htmlpagedom php symfony
Last synced: 14 Jan 2026
https://github.com/ccrashzer0/web_crawler
A Python-based web crawler
crawler internet python python3 webcrawler
Last synced: 22 Mar 2025
https://github.com/coghost/crawlers
crawlers in one
crawler python3 staticimg weibo
Last synced: 10 Jul 2025
https://github.com/turtiesocks/zendriver-rs
Async-first, undetectable browser automation in Rust via the Chrome DevTools Protocol. Stealth-by-default port of zendriver — no WebDriver, no JS shim.
anti-detection async automation bot browser-automation cdp chrome-devtools-protocol chromium cloudflare-bypass crawler headless-chrome playwright-alternative rust scraping stealth tokio undetectable-chromedriver web-scraping web-testing zendriver
Last synced: 13 Jun 2026
https://github.com/homuchen/instagram-crawler
Instagram crawler
crawler instagram nodejs-crawler
Last synced: 24 Mar 2025
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test a server's maximum capacity or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 05 Apr 2025
https://github.com/maxgio92/package-crawler
A package crawler for the most well-known Linux distros
Last synced: 20 Apr 2026
https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
Last synced: 20 May 2026
https://github.com/greatdrake/contributecounter
Crawls Wikipedia for contributors
Last synced: 02 Apr 2025
https://github.com/amirzenoozi/aparat-videos-dataset
Simple information about Aparat videos for data scientists
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 17 May 2026
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Jul 2025
https://github.com/hamidrabedi/digikala-crawler
A crawler for Digikala built with the Django framework, Selenium, and a REST API; it also scrapes data from the gathered URLs.
crawler digikala digikala-crawler django python scraper
Last synced: 16 May 2026
https://github.com/arshadkazmi42/gh-crawl
Crawler for GitHub repositories that finds all broken links in them
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 20 Jan 2026
https://github.com/camilamaia/crawl4us
[WIP] A Python web crawler looking wildly for tables 🕵️♀️
beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping
Last synced: 28 Mar 2025
https://github.com/greycloudss/greave
Greave is a fast, multi-mode scanner for locating sensitive information in both local filesystems and Confluence pages.
armourer confluence crawler python reconnaissance security
Last synced: 07 Oct 2025
https://github.com/devidw/google-untitled-spam-spider
A spam spider that targets 'Untitled' spam pages in Google search results.
crawler crawling crawling-algorithm crawling-python crawling-sites crawling-tool google-untitled python python3 spam spam-detection spammer untitled untitled-spam
Last synced: 28 Mar 2025
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 09 Apr 2025
https://github.com/shiritai/wallpaper_master
My first individual project!
crawler file-explorer javafx-application maven-shade mini-system wallpaper wallpaper-master
Last synced: 16 May 2026
https://github.com/dhsagaryt/multisearch
Search efficiently across different platforms with ease. Type your query and choose from multiple search engines, streamlining your experience.
browser crawler internet search search-algorithm search-engine searchbar searchengine webcrawler
Last synced: 14 Feb 2026
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 15 Jun 2026
https://github.com/willi-dev/dtcapp
dtcapp : distributed twitter crawler.
crawler distributed-systems hazelcast java twitter twitter-api
Last synced: 18 Sep 2025
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT posts from the command line.
Last synced: 08 Aug 2025
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 08 Oct 2025
https://github.com/basemax/rondircrawler
A crawler for extracting a list of top SIM cards and phone numbers from the Rond.ir website. (PHP)
crawle-php crawler crawler-testing crawlers crawlers-php php php-crawler rondir
Last synced: 03 Apr 2025
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.
Last synced: 27 Feb 2025
https://github.com/udaykiran2017/seo-reports
📊 Generate and analyze SEO reports effortlessly to enhance your website's visibility and performance across search engines.
audit broken-links cli crawler extraction google-lighthouse hreflang-checker hreflang-matrix puppeteer scan-website searchengineoptimization seo seo-macroscope seo-manager seo-meta seo-optimization web-scraping webmaster
Last synced: 16 May 2026
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 27 Mar 2025
https://github.com/gesugao-san/pcgw-crawler
Digital assistant for working hard on PCGW.
bad-code bad-coding-style crawler javascript js nodejs pcgamingwiki pcgw shitty spaghetti-code
Last synced: 12 Apr 2026
https://github.com/abdus/scrape-web
A simple web scraper for Node.js
crawler web-scraping web-scrapper
Last synced: 25 Mar 2025
https://github.com/developerjosh/gogo-crawler
A toolkit for building an anime website backed by a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 07 Aug 2025
https://github.com/thiagopanini/datadelivery
An open-source Terraform module that provides a complete infrastructure toolkit so users can begin exploring Analytics services on AWS.
analytics athena aws catalog crawler data datamesh glue s3 terraform
Last synced: 29 Nov 2025
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 04 May 2026
https://github.com/yordadev/fenrisjs
A Node.js application that scrapes all links from a given input and writes the results into one of two files, external or internal, for further analysis.
analysis crawler link-collection link-crawler nodejs nodejs-application
Last synced: 10 May 2026
https://github.com/baerwang/sec_craw
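A crawler that helps security researchers get a daily security digest. It currently crawls lab blogs and forums such as 90sec, 看雪论坛 (Kanxue Forum), v2ex, 精易论坛 (Jingyi Forum), and 52破解论坛 (52pojie Forum), and is continuously updated.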
crawler security security-tools threat threat-intelligence
Last synced: 04 Jul 2025
https://github.com/zabuzard/wslotter
WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.
Last synced: 10 Oct 2025
https://github.com/jovijovi/ether-crawler
A transaction crawler for the Ethereum ecosystem.
blockchain crawler ether ethereum transaction
Last synced: 08 May 2026
https://github.com/nirjharlo/complete-google-seo-scan
WordPress plugin with a built-in SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 12 Oct 2025
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 07 Apr 2025
https://github.com/basemax/doostihaacrawler
A PHP-implemented crawler for Doostihaa.com. (Database of thousands of movies)
crawler crawler-example crawler-php crawler-testing crawlers database-movie database-movies doostihaa doostihaa-com movie movie-database movie-database-api movie-database-website movie-db movies movies-database php php-crawler
Last synced: 13 Sep 2025
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawls Wikipedia with Python, powered by BeautifulSoup4; supports GUI and CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 05 Sep 2025
https://github.com/mdazlaanzubair/amazon-scraper-api
A web scraper that crawls Amazon to extract product information and returns it in JSON format.
amazon crawler expressjs json-api nodejs webscraping
Last synced: 14 Apr 2026
https://github.com/dineshsprabu/concurrent-web-crawler
Flexible and concurrent web crawler implemented in Go
concurrent-web-crawler crawler go-crawler spider web-crawler
Last synced: 12 Jan 2026
https://github.com/bac0id/wayback-machine-auto-save
A crawler that saves a list of web pages via Save Page Now, the Internet Archive Wayback Machine's archiving service.
crawler internet-archive python save-page-now wayback-machine
Last synced: 28 May 2026
https://github.com/roswelly/solana-transaction-crawler
Crawl and parse Solana transactions
crawler parser rust solana transaction
Last synced: 15 May 2026
https://github.com/im-perativa/public_crawler
A collection of crawler projects for Indonesian datasets
crawler indonesia indonesia-api scrapy
Last synced: 20 Mar 2025
https://github.com/phanikmr/linkcrawler
LinkCrawler is a Python module that takes a URL on the web (e.g. http://python.org), fetches the corresponding web page, and parses all the links on that page into a repository of links. It then fetches the content of the URLs in that repository, parses the links from the new content into the repository, and continues this process for all links in the repository until it is stopped or a given number of links have been fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 07 Feb 2026
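The fetch, parse, and repeat loop described for LinkCrawler is essentially a breadth-first crawl over a growing link repository. The following is a minimal standard-library sketch of that loop, not LinkCrawler's own code. The names `crawl`, `extract_links`, `fetch_url`, and the `max_fetches` parameter are hypothetical, and it assumes pages are UTF-8 HTML whose links live in `<a href>` tags. The fetcher is passed in as a function so the loop can be exercised without network access.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url: str, html: str) -> list[str]:
    """Return absolute http(s) links from a page, with #fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):  # skip mailto:, javascript:, etc.
            links.append(absolute)
    return links


def fetch_url(url: str) -> str:
    """Default fetcher (assumption): download a page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url: str, max_fetches: int = 10,
          fetch: Callable[[str], str] = fetch_url) -> list[str]:
    """Fetch pages breadth-first, adding newly seen links to the repository,
    until the queue is empty or max_fetches pages have been fetched."""
    repository = [start_url]      # every URL discovered, in discovery order
    seen = {start_url}
    queue = deque([start_url])    # discovered URLs not yet fetched
    fetched = 0
    while queue and fetched < max_fetches:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:           # network errors (URLError, timeouts) skip the page
            continue
        fetched += 1
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                repository.append(link)
                queue.append(link)
    return repository
```

Using a FIFO queue makes the crawl visit pages in the order their links were discovered, and the `seen` set keeps each URL in the repository only once, so pages that link back to each other do not loop forever.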
https://github.com/fenying/huaban-crawler
A board-pins crawler for huaban.com, based on Node.js
Last synced: 02 Jul 2025
https://github.com/thiiagoms/car-stealth
REST API listing all cars that were stolen
Last synced: 16 Jun 2025
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 11 Jul 2025
https://github.com/vindecodex/automated-crawler-wget
Uses wget to crawl a site
Last synced: 03 Sep 2025
https://github.com/afuntw/misc-crawler
Some small crawlers for specific websites
Last synced: 14 Oct 2025
https://github.com/raphaelm22/crawling
A set of crawlers that look for something on the internet and send a notification when they find it.
caesb crawler growth-suplements gsuplementos
Last synced: 06 Mar 2026
https://github.com/filsuin/linkedin-crawler
A Python tool for automating job searches on LinkedIn based on user-defined keywords.
crawler crawler-python linkedin offer
Last synced: 16 Jun 2025