Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
- JSON Representation
https://github.com/mlibre/clean-web-scraper
A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖
ai artificial-intelligence clean crawler data-preprocessing dataset fine-tuning llm recursive-crawling scraper training
Last synced: 17 Mar 2025
https://github.com/jorgeparavicini/medalytik-python
Python crawlers for a job mediation firm
Last synced: 07 Jul 2025
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 18 May 2026
https://github.com/afuntw/misc-crawler
some small crawler for specific website
Last synced: 14 Oct 2025
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 24 Jul 2025
https://github.com/noarche/darknoisy
Same as my Noisy but on TOR network. Logs links. Crawls onion sites.
crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks
Last synced: 08 Sep 2025
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Jul 2025
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 14 May 2026
https://github.com/uranusx86/dcard-crawler-analyzer
get Dcard & Meteor forum content and analyze !
crawl crawler dcard nlp python word-cloud word-count word-frequency
Last synced: 14 Jul 2025
https://github.com/ambersun1234/lotto_crawler
web crawler for fetching Taiwan lottery history data
Last synced: 15 Jun 2025
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler to fetch remains quota of a list of courses in Providence University every 2 to 10 minutes, then send webhook when change.
Last synced: 15 May 2026
https://github.com/snwfdhmp/3gm-bot
Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 30 Oct 2025
https://github.com/javapuppteernodejs/bypass-cloudflare-turnstile-crawl4ai
Learn how to integrate Crawl4AI with CapSolver to automatically solve Cloudflare Turnstile challenges.
automation capsolver captcha captcha-solver cloudflare-turnstile cloudflare-turnstile-bypass cloudflare-turnstile-solver crawl4ai crawler data-extraction python turnstile web-scraping
Last synced: 17 May 2026
https://github.com/apurvsikka/mediaverse
MediaVerse is a versatile search engine for various media types such as anime, books and drama
anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv
Last synced: 29 Mar 2025
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 19 Apr 2026
https://github.com/vaenow/crawler-chromeless
A chromeless crawler for coursera
chromeless coursera crawler puppeteer
Last synced: 18 May 2026
https://github.com/thejoin95/free-proxies.info
API service for get anonymous and non proxy, filter by latency, country, updatetime and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 29 Oct 2025
https://github.com/moj124/web_crawler
The web_crawler is a asynchoronous gevent link crawler that maps all the associated local links constrained by the input webpage url.
crawler crawler-python links-spider
Last synced: 13 Mar 2025
https://github.com/knguyen780/web-crawler
about crawl data
crawler jsoup-library scraper selenium-java
Last synced: 25 Jun 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 11 May 2025
https://github.com/xyk2002/aqistudy-crawler
关于网站:https://www.aqistudy.cn/historydata/ 的空气质量数据的异步协议爬虫,可以快速的获取的数据将会保存至CSV文件
Last synced: 22 Aug 2025
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 18 May 2026
https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai
This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.
artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec
Last synced: 04 May 2026
https://github.com/abx123/coronachan
Simple lambda function to crawl MKN twitter account for daily Malaysia COVID-19 updates.
crawler lambda-functions python
Last synced: 28 Mar 2025
https://github.com/adham90/github_user_crawler
GeekHub: github username crawler
Last synced: 21 Mar 2025
https://github.com/xprnvd/makdi
Website crawler created for pentest exercises like HTB.
crawler htb htb-scripts pentest python
Last synced: 20 Jul 2025
https://github.com/jesseokeya/linkedin-scraper
Selenium webDriver used to get information from linkedIn
chromedriver crawler linkedin os python scraper selenium-webdriver
Last synced: 29 Apr 2026
https://github.com/tatamiya/gas-new-books-crawler
Crawling new book information from 版元ドットコム(https://www.hanmoto.com/)
Last synced: 30 Oct 2025
https://github.com/iomarmochtar/imagecrawler
Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+
Last synced: 14 May 2025
https://github.com/timzatko/fiit-vinf-1
School project - data crawling, storing using ElasticSearch and visualisation.
Last synced: 16 Jan 2026
https://github.com/azshurith/depth-crawler
A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.
Last synced: 20 Apr 2026
https://github.com/maddevsio/spiderwoman
"Vertical" crawler, which main target is to count links (resolved, e.g. from bit.ly) to external domains from all pages of given resources
big-data count-links crawler golang
Last synced: 19 May 2026
https://github.com/rayspock/go-web-crawler
A web crawler to fetch all the links from a given website via go routines.
concurrency crawler golang goroutine
Last synced: 10 Jun 2026
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 21 Mar 2025
https://github.com/jefftriplett/pholcidae-demo
:spider: A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 22 Jul 2025
https://github.com/hoan02/novel-crawler
Tool cào dữ liệu truyện để phục vụ cho doctruyen.io.vn
Last synced: 13 Mar 2025
https://github.com/n3d1117/sisop17
Esercizio per esame di Sistemi Operativi - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 06 Apr 2025
https://github.com/amazingcoderpro/pythonup
玩转Python!for improving python skills
Last synced: 19 May 2026
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 06 May 2026
https://github.com/andresayac/cuevana3
Cuevana3 scraper is a content provider of the latest in the world of movies and tv show in Latin Spanish dub or subtitled.
Last synced: 05 Apr 2025
https://github.com/kiranjisonawane143/blockchain-data-crawler
🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 06 May 2026
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 11 May 2026
https://github.com/iyowei/fs-deep-walk
专注于深度扫描指定磁盘位置。
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 20 May 2026
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
Web crawler em C# que usa a biblioteca AngleSharp para extrair detalhes de eventos do site "https://minhaentrada.com.br". Ele analisa o HTML da página e recupera informações como título, data, local e links dos eventos.
anglesharp crawler minhaentrada
Last synced: 19 Jul 2025
https://github.com/nextlevelshit/adonis-crawler
A free web crawler on top of the incredibile AdonisJS Framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 22 May 2026
https://github.com/humbertodias/go-nie-crawler
Simple crawler that extract some useful informations from sede.administracionespublicas.gob.es.
Last synced: 03 Mar 2025
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 17 May 2026
https://github.com/mstephen19/apify-click-events
Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to
apify apify-sdk crawler scraper web-automation
Last synced: 23 Aug 2025
https://github.com/licoy/win4000-images-crawler
基于scrapy爬取&下载win4000.com的图片壁纸
Last synced: 28 Mar 2025
https://github.com/bruce-lee-ly/crawler
Several fun crawler cases implemented in Python.
Last synced: 27 Jun 2025
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 17 May 2026
https://github.com/sgeisler/fishbones2epub
fetches the fishbones novel and outputs an epub
Last synced: 22 Mar 2025
https://github.com/tetreum/puppeteer-for-crawling
Daily use crawling methods for puppeteer
Last synced: 12 Apr 2026
https://github.com/gxjansen/website-to-pdf
Creates a PDF based on the content of a website/subomain
claude-3-sonnet crawler python3
Last synced: 30 Mar 2025
https://github.com/peterbencze/silene
Silene is an open source web crawler framework built upon Pyppeteer.
crawler framework pypp python scraper webcrawler
Last synced: 12 Jan 2026
https://github.com/patrik-fredon/python_wallpaper_crawler
Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.
crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv
Last synced: 14 Sep 2025
https://github.com/tsaohucn/crawler_fb_user_group
This is crawler use selenium for facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 16 May 2026
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 18 Mar 2025
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 22 Jun 2025
https://github.com/rowyio/llm-web-crawler
Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.
ai automation crawler llm lowcode nocode scraper web web-crawler workflow
Last synced: 15 Jul 2025
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 14 Sep 2025
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 03 Feb 2026
https://github.com/kettou/silentscraper
SilentScraper is a web scraping solution built with advanced stealth protocols. It operates undetectably in the background, bypassing anti-scraping mechanisms to collect structured data at scale. It's lightwight architecture mimics humans browsing patterns, rotating IP addresses, spoofing user agents, and more
beautifulsoup beautifulsoup4 crawler datastructures datastructures-algorithms python webautomation webscraper webscraping
Last synced: 23 Jul 2025
https://github.com/g-ongenae/morphalou-crawler
A Crawler for CNRTL's Morphologie words
crawler french lexical-databases list-of-words words
Last synced: 25 Feb 2025
https://github.com/hdevlinz/affiliate-chrome-extension
chrome-extension crawler tiktok
Last synced: 14 May 2025
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite youtube video in PHP
Last synced: 16 May 2026
https://github.com/lolyratul025/web-email-bundler
A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.
crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping
Last synced: 22 May 2026
https://github.com/xoraus/revieworacle
The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 07 May 2026
https://github.com/truongdd03/searchengine
A search engine written in c++.
cpp crawler search search-engine
Last synced: 06 Apr 2025
https://github.com/kenanbek/tutorial-python-crawler
Crawling website data using Python with requests and Beautiful Soup libraries
beautifulsoup crawler crawling miner parser python python-requests requests
Last synced: 30 Mar 2025
https://github.com/kestarumper/imagecrawler
Downloads images from given URL
Last synced: 28 Jun 2025
https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance
Crawler Robot following black line while avoiding obstacles found in the way. Assignment for Mehcatronics
arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics
Last synced: 28 Apr 2026
https://github.com/wilmsn/simple_deye_crawler
A simple crawler to get data from the Deye Inverter using the status webpage
crawler deye fhem inverter shell-script
Last synced: 27 May 2026
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 30 Mar 2025