Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
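The definition above describes a crawler as a bot that browses the Web systematically. Below is a minimal Python sketch of that idea, built on a few assumptions: the frontier is breadth-first, the crawl stays on the seed URL's host, and pages decode as UTF-8. The names `crawl`, `fetch_html`, and `LinkExtractor` are hypothetical. A real crawler would also respect robots.txt, rate-limit its requests, and handle character encodings properly.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url, timeout=10.0):
    """Download a page; decoding as UTF-8 is a simplifying assumption."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl from `seed`, restricted to the seed's host.

    Returns the URLs that were fetched successfully, in visit order.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])   # FIFO frontier gives breadth-first order
    seen = {seed}           # every URL ever enqueued, to avoid revisits
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The fetch function is injectable, so you can test the traversal logic against an in-memory set of pages without touching the network. Call `crawl("https://example.com/")` to use the real `urllib` fetcher.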

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2026-06-23 00:06:44 UTC

https://github.com/hasdata/find-urls-from-any-domain

This repository provides practical examples of website link scraping using Python and Node.js.

ai-extraction crawler hasdata-api nodejs python sitemap-parser url-extraction web-crawling web-scraping

Last synced: 06 May 2026

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 07 May 2026

https://github.com/theshefer/web-crawler-http

Basic web crawler that maps the link structure of a website

crawler jest jest-tests js

Last synced: 24 Apr 2026

https://github.com/thc1006/nycu_timtable_crawler

🎓 NYCU Course Data Crawler & Timetable System | National Yang Ming Chiao Tung University course crawler and course-selection system - Python web scraper for course schedules, syllabi & educational data analysis. Crawls 18K+ courses with a 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, resume capability.

academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping

Last synced: 24 Apr 2026

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 07 May 2026

https://github.com/theognis1002/nimbus-crawler

Highly concurrent web crawler written in Go

crawler docker golang message-queue postgresql redis

Last synced: 23 Jun 2026

https://github.com/zhqiang1989/youtube-graph-collector

A Python demo showing how to collect YouTube video engagement graph data

crawler graph video youtube

Last synced: 07 May 2026

https://github.com/ireddragonicy/booruprompt

A simple web application built with NextJS to extract tags from booru websites. Just paste the URL of a booru post, and this tool will fetch and display the associated tags, ready for you to copy.

booru cleaning-data crawler nextjs noobai tags typescript web

Last synced: 07 May 2026

https://github.com/v-bible/crawler

A collection of web crawlers for Catholic resources in Vietnamese

catholic corpus-linguistics crawler nlp playwright

Last synced: 22 Apr 2026

https://github.com/rodrigorvsn/ace

🔥 Receive an email with the hottest promotions every day

crawler cronjob nextjs prisma puppeteer react-email resend

Last synced: 17 Apr 2026

https://github.com/illm4tic/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 21 Apr 2026

https://github.com/landrisek/contentbot

Creates simple content (discussion posts and product descriptions) from previously used data or by crawling public data.

content crawler golang php php72

Last synced: 17 Apr 2026

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 05 Jun 2026

https://github.com/flaribbit/pixiv-favorites-list

Crawls the Pixiv (P站) favorites list and saves it in JSON format

crawler pixiv python

Last synced: 21 Apr 2026

https://github.com/wcygan/crawler

web crawler

crawler crawling tokio tokio-rs web-crawler

Last synced: 08 May 2026

https://github.com/tsaohucn/crawler_fb_page

A crawler that uses Selenium to scrape Facebook pages

crawler facebook-page rails ruby selenium

Last synced: 09 May 2026

https://github.com/allotmentandy/socialmedialinkextractor

A PHP Laravel package that extracts social media links from an array of links; used as part of a spider that checks links on the londinium.com website

crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube

Last synced: 09 May 2026

https://github.com/basemax/okala-product-ids

A PHP script to fetch and save product IDs from Okala's online store API across multiple categories and store branches.

crawler crawler-okala crawler-php crawlers data database ids ir iran json okala okala-crawler php php-crawler product

Last synced: 09 May 2026

https://github.com/xenia101/coro.na

A Web Map Service for the coronavirus (COVID-19)

coronavirus covid-19 crawler crawling flask flask-api json python requests urllib

Last synced: 09 May 2026

https://github.com/catbraaain/search-crawl

Search the web and crawl content stealthily, with optional extraction using LLMs.

crawl crawler fastapi playwright scrape scraping searxng

Last synced: 09 May 2026

https://github.com/a-b-z-b/web-spider

A Humble Web Crawler

crawler docker-compose go mongodb web-crawler

Last synced: 09 May 2026

https://github.com/victorbaumgartner/electron-crawler-ui

Desktop app built with Electron and axios to crawl websites across multiple servers

app axios crawler desktop electronjs macos multiple-servers multithreading

Last synced: 09 May 2026

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 21 Apr 2026

https://github.com/ravenastar-js/ravpagelinks

🚀 RavPageLinks 🕷️ Basic tool for enumerating URLs in web pages

axios chalk crawler links playwright ravenastar scraping url-enumeration

Last synced: 20 Apr 2026

https://github.com/machinecyc/lotteryinsight

Uses a crawler to collect Taiwan Lotto data and saves it to a local MySQL server.

crawler data docker lottery mysql-database python3 taiwan

Last synced: 09 May 2026

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 20 Apr 2026

https://github.com/nsalvacao/cli-plugins

OpenAPI for CLIs — Crawl any CLI's --help output and generate structured Claude Code plugins with expert command knowledge

ai-agent claude-code cli cli-reference crawler developer-tools help-parser llm plugin python

Last synced: 04 Mar 2026

https://github.com/gesiscss/github_traffic_crawler

Retrieves data from GitHub repositories (insights, usage, commits)

crawler github traffic

Last synced: 20 Apr 2026

https://github.com/lopins/article-crawler

A simple tool for crawling web articles that lets you customize which fields to extract; simple and easy to get started with.

article crawler ftp mysql python sqlite3

Last synced: 10 May 2026

https://github.com/marshallvoid/affiliate-chrome-extension

chrome-extension crawler tiktok

Last synced: 29 Apr 2026

https://github.com/igorbrizack/crawler-web

Web data collection application with ReactJS and Python (REST API)

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 16 Apr 2026

https://github.com/olostep-api/olostep-cli

CLI for the Olostep API — scrape, map, crawl, answer, batch the web from your terminal. Pure JS rewrite of olostep-cli.

ai-agents cli crawler mcp nodejs npm olostep scraping typescript web-scraping

Last synced: 03 Jun 2026

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 23 Jun 2026

https://github.com/khanof89/twitter_scraper

Scrapes tweet details from a user's profile using Selenium

crawler scraper selenium twitter twitter-bot

Last synced: 11 May 2026

https://github.com/woshiluo/bilibilicomic-download

bilibili crawler downloader manga

Last synced: 11 May 2026

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 12 May 2026

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 10 Jun 2026

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 12 May 2026

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 13 May 2026

https://github.com/maxonary/simple-crawler

Streamlit Webscraper

crawler streamlit webscraping

Last synced: 20 Jun 2026

https://github.com/manchittlab/TheCrawler

Open-source web scraper + LLM-powered structured extraction. PDF/DOCX, markdown, JSON-LD, microdata, commerce data, forms, detection of 16 analytics trackers. Structured errors with retryable flags. Adaptive Cheerio->Playwright. CLI, npm, REST API, and MCP server. AGPL-3.0.

agpl apify cheerio crawler llm markdown mcp mcp-server model-context-protocol nodejs playwright rag scraper typescript web-scraping

Last synced: 20 Jun 2026

https://github.com/thamindur/ir-project

Search Engine for Sri Lankan MPs

crawler elasticsearch python scraping search-engine

Last synced: 19 Apr 2026

https://github.com/capturr/json-deep-equal

Checks whether JSON objects contain the same values (ignoring array order).

array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript

Last synced: 19 Apr 2026

https://github.com/nextlevelshit/node-crawl

Web crawler for Node.js

crawl crawler javascript nodejs

Last synced: 14 May 2026

https://github.com/theabbie/shopcrawler

Crawler for Discovering Product URLs on E-commerce Websites (assignment)

crawler

Last synced: 18 Apr 2026

https://github.com/triekai/review-radar

An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.

crawler firebase gemini google-maps nextjs openai pwa react

Last synced: 04 Apr 2026

https://github.com/raspi/scrapy-vgmusic

Crawler for the vgmusic website

crawler game midi music python scrapy spider

Last synced: 16 Apr 2026

https://github.com/lig8t555/ecommerce

MERN Stack Ecommerce Store | Running In Production | MVP

baidu-tieba baotu bootstrap crawler douban-music ecommerce-platform fofa mongoose quanjing redux shopping-cart shopping-cart-solution stripe taobao-spider

Last synced: 04 Apr 2026

https://github.com/scrape-do/dotnet-example

Best Rotating Proxy & Scraping API Alternative. C# Example.

captcha captcha-solver crawler crawlers crawling data-mining data-science data-scraping free free-proxy free-proxy-list proxy proxy-list proxylist rotating-proxy scraper scraping scraping-api scraping-tool

Last synced: 12 Jun 2026

https://github.com/mirusu400/berryz-dl

Batch download berryz webshare files recursively!

berryz berryz-webshare crawler downloader scraper

Last synced: 22 Jun 2026

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 30 Apr 2026

https://github.com/antash-mishra/huskyai

Democratizing News Feed

celery crawler flask llama news nextjs

Last synced: 29 Apr 2026

https://github.com/bandie91/extip

Fetches the external IP address from known external-IP providers

address cli crawler external ip ipv4-address parallel

Last synced: 08 Jun 2026

https://github.com/chunkingz/youtubelinks-scraper

A python script that scrapes Youtube links from a predefined website of choice.

crawler python scraper spider websitescraper youtube

Last synced: 29 Apr 2026

https://github.com/nabi-allenby/web-crawler

BFS web crawler

crawler docker k8s kubernetes reconnaissance rust rust-lang webcrawler

Last synced: 02 Mar 2026

https://github.com/metehan777/http-header-link-graph

Publish a site's link graph & heading map in HTTP response headers. Crawl 65k pages in 99 seconds without parsing one byte of HTML. Companion code for the SEO Week 2026 NYC experiment.

aeo answer-engine-optimization cloudflare-workers crawler generative-engine-optimization geo http-headers link-graph python rust seo site-architecture technical-seo

Last synced: 03 Jun 2026

https://github.com/josepedrodias/naivebot

An attempt to mimic Googlebot behaviour in Node.js with Nightmare.js

crawler googlebot nightmarejs nodejs robots

Last synced: 29 Apr 2026

https://github.com/antoniowd/crawly

A web crawler that explores the web in search of specific information (emails, phone numbers, etc.)

crawler got jsdom nodejs webcrawler webscraping

Last synced: 01 May 2026

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese Wikipedia using the MediaWiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 01 May 2026

https://github.com/jurooravec/knwldg

Datasets, scrapers, pipelines

companies crawler data dataset non-profit-organizations scraper scrapy

Last synced: 13 Jun 2026

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 08 Jun 2026

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 28 Apr 2026

https://github.com/kkuvam/web-scrape

Web Scraping Technology Evaluation - Evaluation of different web scraping technologies in Python, with a focus on Requests, BeautifulSoup, and Scrapy. Benchmarked each technology for ease of use, performance, scalability, and maintainability

beautifulsoup crawler requests scraping scrapy

Last synced: 28 Apr 2026

https://github.com/justserpapi/web-html

JustSerpAPI Crawl Webpage HTML API Python SDK examples, with related Google Search API, Google Lens API, Google Maps API, Google News API, Google Shopping API, Google Scholar API, Google Finance API, Google Trends API, Google Jobs API, Google Patents API, Google Hotels API, and Web APIs.

crawler google-finance-api google-hotels-api google-jobs-api google-lens-api google-maps-api google-news-api google-patents-api google-scholar-api google-search-api google-shopping-api google-trends-api html-api justserpapi python serp-api web-crawling web-html-api web-scraping

Last synced: 08 Jun 2026

https://github.com/soenneker/soenneker.playwrights.crawler

A configurable Playwright crawler with rich stealth and control options.

browser chrome chromium crawl crawler csharp dotnet playwright playwrightcrawler playwrights scrape scraper stealth util

Last synced: 14 Jun 2026

https://github.com/qqxs/usda_pomological_watercolors

Crawls the data of the U.S. Department of Agriculture (USDA) pomological watercolors

crawler koa2 nodejs watercolors

Last synced: 01 May 2026

https://github.com/vhdm/twitter-hashtag-crawler

Twitter hashtag crawler by selenium, without using the Twitter API ;)

crawler python tor twitter

Last synced: 14 Jun 2026

https://github.com/luciopaiva/dicio-crawler

Node.js crawler for dicio.com.br.

crawler nodejs scraper

Last synced: 02 May 2026

https://github.com/zzzzer91/crash

A general-purpose multithreaded crawler framework.

crawler framework python

Last synced: 28 Apr 2026

https://github.com/dearvn/crawl-mortgage-broker

A script to crawl data from website https://findamortgagebroker.com/

crawler findamortgagebroker mortgage-lenders mortgage-loans nmls php7 python3 seleniumbase

Last synced: 28 Apr 2026

https://github.com/cold-bin/jwzx-mail

Uses Go to build a cqupt-jwzx crawler application

crawler golang

Last synced: 09 Jun 2026

https://github.com/tri613/nespresso

A mobile version of the Nespresso coffee website ☕

crawler nespresso node-js

Last synced: 15 Jun 2026

https://github.com/moonyfringers/ladon

crawler data-pipeline ladon ladon-framework llm python training-data web-crawler web-scraping

Last synced: 17 Apr 2026

https://github.com/abdymm/abtelegrambot-sample

Sample project using a Telegram bot

crawler football php scheduler telegram-bot webhook

Last synced: 15 Jun 2026

https://github.com/aristotelesbr/api_quotes

Test project for a job application.

crawler mongodb rails5

Last synced: 02 May 2026

https://github.com/raspi/scrapy-corsair

Web crawler for Corsair (corsair.com)

crawler hardware memory scrapy spider

Last synced: 15 Jun 2026

https://github.com/martinkennelly/websitesearchcrawler

Website Crawler

crawler java website

Last synced: 27 Apr 2026

https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application

Last synced: 27 Apr 2026

https://github.com/alexnthnz/web-crawler

Scalable web crawler built with Python, Redis, and Cassandra, inspired by Alex Xu's design. Crawls, indexes, and stores web content with robots.txt compliance and duplicate detection.

crawler python

Last synced: 03 May 2026

https://github.com/soffits/oogc-resource-index

Spreadsheet-ready OOGC resource indexing with incremental crawl, authenticated download URLs, and Seafile export.

agpl-3 automation cli crawler python uv

Last synced: 03 May 2026

https://github.com/mg98/ipfs-replicate

Replicate IPFS' distributed data structure locally, based on network traces.

crawler dag ipfs redisgraph scraper

Last synced: 02 May 2026

https://github.com/rebrowser/iaai-dataset

IAAI salvage auction data: vehicle listings with loss types, damage codes, title brands, mileage, drivetrain, condition grades, and branch locations. Updated daily.

automotive-data crawler data-collection data-science dataset iaai insurance-auto-auction open-data parquet salvage-auction salvage-vehicles scraper total-loss vehicle-auction web-scraping

Last synced: 03 May 2026

Crawler Awesome Lists

awesome-crawler (101)
awesome-python-primer (68)
awesome-fingerprinting (74)
awesome-digital-preservation (96)
awesome-web-scraping (62)

Crawler Categories

2.6 Machine Learning (78)
Research (37)
1.1 Language Fundamentals (35)
Core Libraries (28)
Web Archiving (26)
2.4 Web Frontend (23)
Python (22)
Browser Automation (21)
2.1 Crawler Fundamentals (20)
Java (20)
3. Databases (18)
1.2 Advanced Language Topics (16)
Anti-Bot Solutions (16)
2.5 Data Analysis (15)
Fingerprinting Evasion (14)
2.2 Flask Framework (14)
JavaScript (13)
Specialized Tools (13)
Sites (13)
Go (12)