Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-30 00:06:54 UTC
https://github.com/ma-pony/deepspider
Intelligent Web Scraping Platform - AI-powered Crawler Agent built on DeepAgents + Patchright
ai-agent anti-detect automation captcha crawler javascript reverse-engineering web-scraping
Last synced: 03 Apr 2026
https://github.com/949886/pixiv-crawler
Crawler that saves Pixiv illustration info to a local MySQL database.
Last synced: 17 Apr 2026
https://github.com/nb3n/sitemap-indexer
CLI tool for generating and managing sitemap index files for large websites
crawler indexing python search-engine-optimization seo sitemap sitemap-generator
Last synced: 26 Jun 2026
https://github.com/davideferre/covid19-data-crawler-ita
COVID-19 Italian data crawler
coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python
Last synced: 03 Jun 2026
https://github.com/sysadmindoc/stock-video-collector
Headless browser crawler with a PyQt6 GUI for discovering, cataloging, and downloading stock video clips from Artlist, Pexels, Pixabay, Storyblocks, and more.
crawler gui pyqt6 python stock-video video
Last synced: 28 Jun 2026
https://github.com/octcarp/sustech_cs209a-java2_f24_proj
(Spring Boot + Vue3) Stack Overflow data crawling and visualization: Our project of CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Dr. Yida Tao @yidatao .
crawler spring-boot stackexchange sustech visualization
Last synced: 10 May 2026
https://github.com/efishery/wpi-kkp-crawler
A crawler for fisheries prices on wpi.kkp.go.id
Last synced: 29 Jun 2026
https://github.com/muhfalihr/pycrawlconnect
Project to connect crawled data to Kafka and monitor it using Elasticsearch. Still under development, PLEASE UNDERSTAND. Haha:)
apache-kafka beginners books crawl crawler crawling crawling-python elasticsearch indonesian instagram movie news python-script python3 social-media twitter x
Last synced: 04 May 2026
https://github.com/basemax/buskool.com-crawler
This repository contains a PHP-based crawler and scraper designed to fetch and download all product data from the Buskool website (باسکول). The crawler is designed to handle large-scale data scraping efficiently and stores the collected data in JSON format.
buskool buskoolcom crawler crawler-php php php-crawler
Last synced: 03 May 2026
https://github.com/par7133/splash-bot-crawler
Splash Bot creates splash on the fly of your websites - GPL License 🔥
bot crawler gallery open-source opensource php splash
Last synced: 10 May 2026
https://github.com/puureya2/llm-powered-web-scraper
Big Data Web Scraper Framework, Internship Project
asyncio crawl4ai crawler crush-cli csv deepseek-r1 gemini-2-5-flash gemini-api gzip json llm pandas pydantic python selenium-webdriver seleniumwire web-scraper
Last synced: 29 Apr 2026
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 29 Apr 2026
https://github.com/ryanchao2012/okbot
A conversation retrieval engine based on PTT corpus
Last synced: 24 Apr 2026
https://github.com/rebrowser/seatgeek-dataset
SeatGeek ticket marketplace data: events with taxonomy and schedule status, listings with section/row and deal bucket, 15K+ performers, 12K+ venues with capacity and coordinates. Updated daily.
concerts crawler data-collection data-science dataset deal-score events open-data scraper seatgeek sports ticket-prices tickets web-scraping
Last synced: 03 May 2026
https://github.com/nagilum/focus
Simple CLI tool, written in C#, to crawl a site and log the responses.
cli crawl crawler csharp playwright
Last synced: 24 Apr 2026
https://github.com/luukalindgren/jobposts-utu
Web site for a database that holds job post data of IT jobs.
crawler docker fastapi mariadb react virtual-machine
Last synced: 29 Apr 2026
https://github.com/zukahai/formosa-views
View Formosa employee profile, salary, bonus year
bonus-year crawler css formosa html javascript nodejs python salary views
Last synced: 29 Apr 2026
https://github.com/sammwyy/craw
a website-crawler library for nodejs
crawler crawlers html javascript library node nodejs nodejs-module npm npm-module parser spider website
Last synced: 29 Apr 2026
https://github.com/tctien342/simple-doc-crawler
Crawl all subpages of a given URL to Markdown
Last synced: 03 Mar 2026
https://github.com/emarifer/search-engine
A mini Google. Custom web crawler & indexer written in Golang.
crawler dashboard deep-first-search fiber-framework full-text-search golang gorm-orm htmx htmx-go hyperscript indexer inverted-index response-caching search-engine templ worker-pool
Last synced: 16 Apr 2026
https://github.com/dizys/weibo-crawler
A nodejs weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 19 Apr 2026
https://github.com/leomaurodesenv/smm-maker-profile
A package for fetching maker profiles - Super Mario Maker
crawler javascript json mario-maker nodejs
Last synced: 08 May 2026
https://github.com/redco/goose-phantom-environment
Environment for the Goose parser that allows it to run in PhantomJS
crawler environment goose goose-parser nodejs parse parser phantomjs scraper
Last synced: 30 Apr 2026
https://github.com/manku27/webscrapping
Crawls and scrapes a website to get rental listings filtered to my own needs, which the website didn't provide, and directly scrapes necessary information such as the property owner's phone number for quick use.
beautifulsoup crawler python scraper
Last synced: 30 Apr 2026
https://github.com/priyakdey/github-api-crawler
A crawler to crawl and save the APIs found in the Public APIs github repo - https://github.com/public-apis/public-apis. Visit README for details.
Last synced: 04 May 2026
https://github.com/dean9703111/ithelp_total_count
Calculates the total number of views, likes, and comments on IT邦幫忙 (iT Help) articles
crawler ithelp total-likes total-responses total-views
Last synced: 07 May 2026
https://github.com/openpj/manifoldcf-sdk
Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions
apache crawler docker ecm extensions integrations manifoldcf migration sdk search
Last synced: 07 May 2026
https://github.com/amirsorouri00/dsl-se
This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project comes from the forked project, whose link is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 01 May 2026
https://github.com/ndoolan360/go-crawler
A simple web crawling program written in Go in an afternoon. 🕷️🕸️
afternoon-project crawler scraper
Last synced: 21 Apr 2026
https://github.com/bl4ck0w1/swmap
Service Worker security scanner that maps scope, caching, routes & Workbox behavior into actionable risk; static-first, with optional AST/headless analysis.
app-sec bug-bounty crawler dynamic-analysis penetration-testing playwright pwa recon security-tools service-worker static-analysis web-security work-box
Last synced: 21 Apr 2026
https://github.com/siddhantsharma24/stock-market-scraper-jsoup
A web crawler application made using Jsoup Library for scraping Stock Market data from a webpage.
crawler java jsoup jsoup-html jsoup-library web-scraping
Last synced: 13 Jun 2026
https://github.com/chenty2333/tiktok-youtube_commentscraper
This lightweight tool allows you to collect public comments from TikTok and YouTube videos, either via direct video URLs or keyword-based search. It's useful for data analysis, opinion mining, and building datasets for machine learning tasks.
comment crawler nlp scraper sentiment-analysis tiktok youtube
Last synced: 20 Apr 2026
https://github.com/siddhantsharma24/web-scraping-application-jsoup
A web crawler application made using Jsoup Library for scraping information from a webpage.
crawler java jsoup jsoup-crawler scraping
Last synced: 13 Jun 2026
https://github.com/scrape-do/python-sample
Best Rotating Proxy & Scraping API Alternative. Python Example.
capcha-solver captcha crawler crawlers data-mining data-science data-scraping free freeproxy freeproxylist proxy proxy-list rotating-proxy scraper scraping scraping-api scraping-tool web-scrapers web-scrapping
Last synced: 12 Jun 2026
https://github.com/christopher-besch/therapy_search
Compute Call Times from arztsuche-bw into a Calendar.
appointments calendar crawler gatsby therapy time-management typescript
Last synced: 01 May 2026
https://github.com/sarvarbekup/searchin_v1
Search system
backend crawler crawling crawling-go frontend go golang nextjs search-engine searching
Last synced: 16 Apr 2026
https://github.com/tsaohucn/crawler_fb_group
A crawler for Facebook groups that uses Selenium
crawler facebook-groups rails ruby
Last synced: 27 Apr 2026
https://github.com/poodle64/supacrawl
Zero-infrastructure web scraping for the terminal
cli crawler llm markdown playwright python scraper terminal web-scraping
Last synced: 04 Mar 2026
https://github.com/dan3002/imdb-crawler
A powerful Python-based web crawler that collects comprehensive movie information from IMDb using both GraphQL API and web scraping techniques. This tool can gather detailed movie data including basic information, reviews, and ratings for any type of movies based on customizable filters.
crawler imdb imdb-dataset selenium
Last synced: 27 Apr 2026
https://github.com/tsoliangwu0130/ex-dividend-date-notification
crawler email-notification python3 stock-market vanguard
Last synced: 11 Jun 2026
https://github.com/danielemoraschi/go-sitemap-common
Simple Go sitemap generator and crawler.
crawler golang sitemap sitemap-generator
Last synced: 17 Jun 2026
https://github.com/alizdavoodi/mcpdocsearch
This project provides a toolset to crawl websites, generate Markdown documentation, and make that documentation searchable via a Model Context Protocol (MCP) server, designed for integration with tools like Cursor.
Last synced: 13 May 2026
https://github.com/suddi/fundscraper
Collection of web crawlers to scrape fund data using Scrapy
Last synced: 06 Jun 2026
https://github.com/sinkaroid/webnovelcrawler
Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.
Last synced: 18 Jun 2026
https://github.com/m98/email-extractor-crawler
A minimal Node crawler that finds email addresses used in a website's content. It follows links within the website and looks for email addresses in the content of each page.
crawler email javascript lowdb node-crawler nodejs scraper
Last synced: 25 Apr 2026
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 19 Jun 2026
https://github.com/lukasherz/22fs-sc-twitter-crawler
used for a research project in social computing @ uzh (fs22)
crawler crawling database twitter twitter-api-v2
Last synced: 02 May 2026
https://github.com/chen0040/ios-stock-tracker
Stock tracker implemented using Objective-C for iOS
crawler ios-app objective-c stock-prices
Last synced: 20 Jun 2026
https://github.com/juangesino/gazette
A personal news aggregator application using Meteor.
crawler meteor meteorjs news news-aggregator news-feed scraper
Last synced: 17 Apr 2026
https://github.com/luthfan98/screenshoot-crawl-web-automation
Automated full-website screenshot capture and internal link crawler using Puppeteer. Organized output with full-page screenshots, link discovery, retries, and AEST timestamp logs.
automation crawler nodejs puppeteer puppeteer-screenshot web-scraping
Last synced: 27 Apr 2026
https://github.com/karantyagi/web-crawler
BFS and DFS implementations for a Wikipedia crawler
Last synced: 05 Jun 2026
https://github.com/kfess/cf_dashboard_crawler
Crawl Codeforces API and web site.
api codeforces competitive-programming crawler github-actions python
Last synced: 25 Apr 2026
https://github.com/igorbrizack/web-scraper
HTML data scraping application, built in Python.
crawler pytest python3 scraper
Last synced: 08 May 2026
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 20 Oct 2025
https://github.com/thomashirtz/douban-crawler
A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.
Last synced: 14 May 2025
https://github.com/hong539/acgbox_crawler
A web crawler for gamer.com.tw/acgbox
beautifulsoup4 crawler pandas python requests scrapy sqlalchemy web-crawler
Last synced: 05 Apr 2025
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 23 Jan 2026
https://github.com/ambersun1234/lotto_crawler
web crawler for fetching Taiwan lottery history data
Last synced: 15 Jun 2025
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.
Last synced: 15 May 2026
https://github.com/maxiroellplenty/gs-robot
Node.js tool to scrape gelbe-seiten
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 18 May 2026
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 22 May 2026
https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper
Crawls the GameFAQs website for video games released exclusively on the specified platforms and generates a Markdown table of the crawled titles.
console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox
Last synced: 09 May 2026
https://github.com/kgruiz/stealth-crawler
Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.
asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler
Last synced: 25 Oct 2025
https://github.com/rogerluo410/gcrawler
Google search crawler for Ruby. Crawls each link's text and URL by keyword on Google.com.
Last synced: 22 Jun 2026
https://github.com/arihantbansal/cybersec-python
Cybersec/CTF practice problems solved in Python
crawler cryptography ctf cybersecurity sockets webscraping
Last synced: 02 Aug 2025
https://github.com/deventerprisesoftware/scrapi-sdk-dotnet
The only web scraping service you'll ever need that offers advanced features that are simple to use for efficient data extraction.
browser-automation crawler scraper-api web-scraping webscraper
Last synced: 22 May 2026
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 23 Feb 2026
https://github.com/basemax/kashan-university-phone-directory
This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.
crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan
Last synced: 18 May 2026
https://github.com/bigmeech/mangaka
Crawl scanlation websites for manga pages
comic crawler manga scanlation webtoon
Last synced: 23 Jan 2026
https://github.com/zephyrpersonal/github-trending-crawler
transform github-trending repos to json data
cheerio crawler fetch github node repository spider trending
Last synced: 04 Jan 2026
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 24 Jul 2025
https://github.com/yowenter/career-roadmap
Oh, how I hate this living death which has swallowed all my teens, if I am cursed with any, will be worn away!
career crawler findjob job-crawler roadmap search-engine
Last synced: 10 Apr 2025
https://github.com/morungos/github-issue-crawler
GitHub crawler for public repositories, issues, and comments
Last synced: 30 Apr 2026
https://github.com/bkdev98/ebooks-crawler
Ebook crawler for personal use, built with ReactJS.
crawler material-ui nodejs reactjs
Last synced: 12 Apr 2026
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 18 May 2026
https://github.com/bitscoper/bitscoper_crawler
Crawls the titles of webpages in series by number and creates a list of the available links.
Last synced: 27 Mar 2025
https://github.com/zhs007/lottery-crawler
A crawler based on jarvis-task, mainly used to crawl lottery data.
Last synced: 30 Oct 2025
https://github.com/dhchenx/quick-crawler
A toolkit for quickly performing crawler functions
Last synced: 27 Oct 2025
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 21 Mar 2025
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 28 Oct 2025
https://github.com/rflcnunes/crawler_email_py
In this project I'm creating a web crawler to check email boxes and handle incoming messages.
aws-bucket aws-bucket-s3 aws-s3 crawler crawler-python email python rabbitmq
Last synced: 10 Aug 2025
https://github.com/teal33t/base_crawler
Simple scaffold for Selenium-based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 18 May 2026
https://github.com/kangoo13/textbroker-author-article-picker
Bot that automatically locks an order into a Textbroker author account.
author-textbroker automation bot colly crawler go gocolly golang scrapper spider textbroker textbroker-author textbroker-order-picker textbroker-orders textbroker-scrapper
Last synced: 02 Aug 2025