Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
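As a rough illustration of the "systematically browses" part of the definition above, here is a minimal breadth-first crawl sketch using only the Python standard library. The `crawl` function, the `fetch` callable, the `max_pages` limit, and the toy `PAGES` site are assumptions made for this example; they are not taken from any project listed on this page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: [absolute outgoing links]}.

    `fetch` is a hypothetical callable: url -> HTML string, or None on failure.
    """
    seen = {start_url}
    queue = deque([start_url])
    graph = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        links = []
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            links.append(absolute)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        graph[url] = links
    return graph


# Toy in-memory "web" (assumed data) so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b#top">B</a>',
    "http://example.test/b": "no links here",
}

site_graph = crawl("http://example.test/", PAGES.get)
```

Passing `PAGES.get` as `fetch` keeps the example offline; a real crawler would substitute an HTTP client and would normally also check each site's robots.txt before requesting pages.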
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/yggverse/pulsarss
RSS Aggregator for Gemini Protocol
aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust
Last synced: 13 Feb 2026
https://github.com/ashwantmanikoth/intellilsearch
This is an AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 15 Apr 2026
https://github.com/dinofizz/sitemapper
sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.
astradb cassandra concurrency crawler go golang kubernetes nats sitemap
Last synced: 16 Jan 2026
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 21 Apr 2026
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 28 Feb 2025
https://github.com/chamzzzzzz/supersimplesoup
A Go package that implements a super simple soup-like DOM API
beatifulsoup crawler crawler-go dom go golang html-parser
Last synced: 28 Jan 2026
https://github.com/shivamsaraswat/webxcrawler
WebXCrawler is a fast static crawler to crawl a website and get all the links.
crawler crawling python scraping webcrawler webxcrawler
Last synced: 13 Feb 2026
https://github.com/luminovrym/crawler-tools-js
Crawler Tools Js is an application used for scraping data from a website
crawler crawler-js data js web-scraping
Last synced: 08 Sep 2025
https://github.com/gn00678465/crawler
A Python CLI tool that uses the Firecrawl API to crawl web pages, with support for multiple output formats.
Last synced: 06 Feb 2026
https://github.com/rsheremeta/web-crawler
A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal
crawler go golang web-crawler webcrawler
Last synced: 12 Jun 2026
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 03 Feb 2026
https://github.com/danielfillol/ab2l_crawler
Crawler for AB2L radar
brazil crawler lawtech legaltech
Last synced: 28 Jan 2026
https://github.com/dubniczky/bad-robot
This is a Python crawler that disregards robots.txt rules and downloads disallowed resources
crawler osint-python osint-tool python robots-txt
Last synced: 31 Mar 2025
https://github.com/zenixls2/2chpreprocess
Dump messages from 2ch with some preprocessing for ML analysis
Last synced: 26 Mar 2025
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 16 Mar 2025
https://github.com/dubniczky/webmap
Website mapping crawler implemented in python
crawler mapping mapping-tools package python scraping security
Last synced: 31 Mar 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 09 Jan 2026
https://github.com/yangxuhui/requests-google
A simple google related Parsing Package
Last synced: 14 Jan 2026
https://github.com/k0nxt3d/web-scrapers
Web Scraping Scripts in PHP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 31 Dec 2025
https://github.com/usethisname1419/connectioncrawler
crawls a website and checks for connections
connection crawler http-headers reporting website-analyzer
Last synced: 06 Jul 2025
https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance
Crawler robot that follows a black line while avoiding obstacles in its way. Assignment for Mechatronics
arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics
Last synced: 28 Apr 2026
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 14 May 2026
https://github.com/loko5ja/seed-gen
Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.
crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman
Last synced: 03 Apr 2025
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 17 May 2026
https://github.com/russellsteadman/netscrape
A Node.js framework for creating good bots
bot crawler crawling exclusion rfc9309 scraper scraping web-scraping
Last synced: 20 Jun 2026
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 01 Mar 2025
https://github.com/sedrubal/webcrawler
Crawl sites and search for security issues.
crawler script security website-auditing
Last synced: 17 Mar 2025
https://github.com/tisfeng/bing-dict
A Bing command-line dictionary that obtains query results from Bing Dictionary with a crawler.
bing-dictionary command-line crawler nodejs
Last synced: 13 May 2026
https://github.com/basemax/okala-store-ids
A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.
crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala
Last synced: 10 Jun 2025
https://github.com/suconghou/sitemap
a simple sitemap generator and page crawler
crawler html-parser nim-lang scraper sitemap spiders
Last synced: 15 May 2026
https://github.com/tatamiya/gas-new-books-crawler
Crawls new book information from 版元ドットコム (https://www.hanmoto.com/)
Last synced: 30 Oct 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 18 Jun 2026
https://github.com/fusetim/bitcrawler
Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?
Last synced: 30 Mar 2025
https://github.com/surister/scrupy
Python library for creating web crawlers that aims to be powerful yet simple.
crawler crawling-framework crawling-python http library python scraping
Last synced: 15 May 2026
https://github.com/jplitza/urlsearch
Index typical webserver directory listings and then search for arbitrary terms
Last synced: 17 Mar 2025
https://github.com/allancapistrano/anime-sheets
Crawler that fetches anime information and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 16 Mar 2025
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 03 Feb 2026
https://github.com/roc41d/http-web-crawler
HTTP web crawler with Node.js + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 13 Apr 2026
https://github.com/moojing/coinmarketcap-crypto-crawler
A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.
Last synced: 01 Apr 2025
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Apr 2026
https://github.com/izh318/genie-music-artist-album-crawler
A Python script that crawls, in one pass, the album information of a specific artist registered on Genie Music.
Last synced: 08 Nov 2025
https://github.com/miiraak/scrapc
C# WinForms crawler and scraper for web content
crawler csharp html scraper url web windows-forms
Last synced: 29 Jan 2026
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 12 Apr 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 15 Mar 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Apr 2026
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 01 Apr 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 28 Mar 2025
https://github.com/xprnvd/makdi
Website crawler created for pentest exercises like HTB.
crawler htb htb-scripts pentest python
Last synced: 20 Jul 2025
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 14 Sep 2025
https://github.com/keizerzilla/ssh-hunter
Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).
Last synced: 10 Apr 2025
https://github.com/keizerzilla/search4dwango9
My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 10 Apr 2025
https://github.com/tech-espm/misc-webbot
This project aims to create personal assistants that reply to messages about specific issues.
classification-model crawler nlp
Last synced: 12 Jun 2026
https://github.com/waived/pastebin-ripper
Scrape all pastes from pastebin page + sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 24 Jun 2025
https://github.com/apurvsikka/mediaverse
MediaVerse is a versatile search engine for various media types such as anime, books and drama
anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv
Last synced: 29 Mar 2025
https://github.com/snwfdhmp/3gm-bot
Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 30 Oct 2025
https://github.com/mnoalett/cscrawler
BSc degree thesis - crawler for www.couchsurfing.org
bsc-thesis couchsurfing crawler data-analysis database python
Last synced: 02 May 2026
https://github.com/rafaelmoraes003/tech-news
Analysis and manipulation of news data from a technology website obtained through data scraping using Python.
crawler data-scraping https mongodb parsel pymongo python web-scraping
Last synced: 05 May 2026
https://github.com/laffrex/xiaolanben_crawler
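An efficient, stable data collection tool for the Xiaolanben (小蓝本) website. It automatically extracts information such as company and group products, media, and shareholders, supports intelligent pop-up handling and automated data classification and organization, and is ultimately intended to make SRC information gathering easier.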
Last synced: 23 Mar 2025
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 21 Mar 2025
https://github.com/andrepradika/scrape-medrecruit.medworld.com
🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.
Last synced: 17 Mar 2025
https://github.com/nextlevelshit/adonis-crawler
A free web crawler on top of the incredible AdonisJS Framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 22 May 2026
https://github.com/murilobsd/icrop-csv
Icrop-csv automates the process of downloading reports.
Last synced: 17 Nov 2025
https://github.com/andrepradika/scrape-xpel.com
📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.
Last synced: 17 Mar 2025
https://github.com/precioux/pacman
AI Course Projects - Fall 2022
adversial artificial-intelligence bfs-search crawler csp dfs mdp pacman-agent pacman-game pacman-projects reinforcement-learning ucs
Last synced: 28 May 2026
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MCQs Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 13 Apr 2026
https://github.com/javapuppteernodejs/bypass-cloudflare-turnstile-crawl4ai
Learn how to integrate Crawl4AI with CapSolver to automatically solve Cloudflare Turnstile challenges.
automation capsolver captcha captcha-solver cloudflare-turnstile cloudflare-turnstile-bypass cloudflare-turnstile-solver crawl4ai crawler data-extraction python turnstile web-scraping
Last synced: 17 May 2026
https://github.com/jauharibill/animeindo-crawler
This crawler is for research use only. The creator doesn't take any responsibility for any harmful usage.
Last synced: 08 Jul 2025
https://github.com/alphabs/navercafeclient
A .NET library for crawling Naver Cafe post lists
crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping
Last synced: 06 May 2026
https://github.com/mehdieidi/offliner
Offliner is a tool that makes a website viewable offline. It's a concurrent web crawler that saves all the pages and static files in a directory.
concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread
Last synced: 14 Jan 2026
https://github.com/tjdsneto/jcnet-crawler
Extract (scrape) movie schedule info from JCNet movies page
Last synced: 11 Apr 2026
https://github.com/heitor57/astronomy-news
🔭📰 Astronomy News
crawler data-science news text-mining
Last synced: 06 Oct 2025
https://github.com/pixlcrashr/stwhh-mensa
Better STWHH Mensa menu data / interface / notifier
api crawler data food studierendenwerk-hamburg university website
Last synced: 07 Aug 2025
https://github.com/bruce-lee-ly/crawler
Several fun crawler cases implemented in Python.
Last synced: 27 Jun 2025
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 10 May 2026
https://github.com/b3j4y/unidisk
A Crawler to search for keywords and compare the score
comparison crawler nlp solr-client
Last synced: 17 Jan 2026
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 19 Apr 2026
https://github.com/marceloneppel/crawler
Simple web crawler developed in Go.
Last synced: 07 Aug 2025
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 10 Jun 2026
https://github.com/gxjansen/website-to-pdf
Creates a PDF based on the content of a website/subdomain
claude-3-sonnet crawler python3
Last synced: 30 Mar 2025
https://github.com/semoal/pythoncrawler
Python crawler with XMLRPC & BeautifulSoup
beautifulsoup crawler python wordpress xmlrpc
Last synced: 15 Apr 2026
https://github.com/vaenow/crawler-chromeless
A chromeless crawler for coursera
chromeless coursera crawler puppeteer
Last synced: 18 May 2026