Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
- JSON Representation
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 18 May 2026
https://github.com/xyk2002/aqistudy-crawler
关于网站:https://www.aqistudy.cn/historydata/ 的空气质量数据的异步协议爬虫,可以快速的获取的数据将会保存至CSV文件
Last synced: 22 Aug 2025
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 31 Mar 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 11 May 2025
https://github.com/kweonminsung/crawl2toast
Real-time toast notification of crawled data with CSS selectors(Windows Only)
beautifulsoup4 crawler selenium tkinter toast-notifications
Last synced: 18 May 2026
https://github.com/knguyen780/web-crawler
about crawl data
crawler jsoup-library scraper selenium-java
Last synced: 25 Jun 2025
https://github.com/lfsc09/crawl-this-go
Simple CLI tool for crawling pdf documents and html pages
Last synced: 18 Jun 2025
https://github.com/moj124/web_crawler
The web_crawler is a asynchoronous gevent link crawler that maps all the associated local links constrained by the input webpage url.
crawler crawler-python links-spider
Last synced: 13 Mar 2025
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 21 Mar 2025
https://github.com/jefftriplett/pholcidae-demo
:spider: A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 22 Jul 2025
https://github.com/thejoin95/free-proxies.info
API service for get anonymous and non proxy, filter by latency, country, updatetime and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 29 Oct 2025
https://github.com/vaenow/crawler-chromeless
A chromeless crawler for coursera
chromeless coursera crawler puppeteer
Last synced: 18 May 2026
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 28 May 2026
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 19 Apr 2026
https://github.com/javapuppteernodejs/bypass-cloudflare-turnstile-crawl4ai
Learn how to integrate Crawl4AI with CapSolver to automatically solve Cloudflare Turnstile challenges.
automation capsolver captcha captcha-solver cloudflare-turnstile cloudflare-turnstile-bypass cloudflare-turnstile-solver crawl4ai crawler data-extraction python turnstile web-scraping
Last synced: 17 May 2026
https://github.com/joyceannie/moviespider
This project is used to crawl movie data from IMDb. Scrapy framework is used to extract relevant information like movie title, datePublished, summary, genres, director etc.
crawler datascience python scrapy spider webscraper
Last synced: 24 Mar 2025
https://github.com/snwfdhmp/3gm-bot
Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 30 Oct 2025
https://github.com/balintpethe/laravel-universal-scraper
Universal Scraper for Laravel
crawler laravel scraper web-scraper
Last synced: 13 Jan 2026
https://github.com/apurvsikka/mediaverse
MediaVerse is a versatile search engine for various media types such as anime, books and drama
anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv
Last synced: 29 Mar 2025
https://github.com/yuchenq/comp90055-project
This is the lastest version of my project belong to Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 16 Jul 2025
https://github.com/n3d1117/sisop17
Esercizio per esame di Sistemi Operativi - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 06 Apr 2025
https://github.com/phatpham9/scraper.fun
Building, using & sharing HTML scraper are way funnier!
Last synced: 24 Mar 2025
https://github.com/xprnvd/makdi
Website crawler created for pentest exercises like HTB.
crawler htb htb-scripts pentest python
Last synced: 20 Jul 2025
https://github.com/longluo/spider
My Python Spider / Crawler
crawler python spider twitter weibo weibo-crawler weibo-spider
Last synced: 11 Jun 2025
https://github.com/tatamiya/gas-new-books-crawler
Crawling new book information from 版元ドットコム(https://www.hanmoto.com/)
Last synced: 30 Oct 2025
https://github.com/huyduc1602/uniapp-crawler
Crawl và Dịch tài liệu Uni-app
Last synced: 25 Jan 2026
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 16 Jun 2025
https://github.com/jimut123/leaderbehaviour
Scrapy project to get and extract the names of Leaders, their misdeed by scraping news website!
crawler leaderbehaviour newsscraper scrapy timesofindia
Last synced: 16 Jan 2026
https://github.com/billy0402/python-application
A learning project from the book 'Python 技術者們'.
course crawler matplotlib opencv pandas python requests selenium sklearn
Last synced: 12 Apr 2026
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 15 Mar 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Apr 2026
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 06 Jan 2026
https://github.com/azshurith/depth-crawler
A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.
Last synced: 20 Apr 2026
https://github.com/amazingcoderpro/pythonup
玩转Python!for improving python skills
Last synced: 19 May 2026
https://github.com/rafaelmoraes003/tech-news
Analysis and manipulation of news data from a technology website obtained through data scraping using Python.
crawler data-scraping https mongodb parsel pymongo python web-scraping
Last synced: 05 May 2026
https://github.com/laffrex/xiaolanben_crawler
一个高效、稳定的小蓝本网站数据采集工具,可自动提取公司和集团产品、媒体及股东等信息,支持智能处理弹窗和自动化数据分类整理,最终目的是为了方便进行SRC信息收集。
Last synced: 23 Mar 2025
https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler
StackOverFlow Tag Generator Using a WebCrawler.
Last synced: 08 Apr 2025
https://github.com/eklem/vinmonopolet-crawler
Crawling Vinmonopolet-data and indexing it to a norch search index
crawler dataset javascript norch search-engine
Last synced: 26 Mar 2025
https://github.com/recepkizilarslan/console-tourist
Tourist is a simple tool that allows you to collect console messages, errors, unsuccessful requests of all your pages after the DOM loading with authentication support.
console-log crawler crawling crawling-tool error-monitoring error-reporting qa qa-automation qatools
Last synced: 24 Feb 2026
https://github.com/murilobsd/icrop-csv
Icrop-csv para automatizar o processo do download dos relatórios.
Last synced: 17 Nov 2025
https://github.com/hoan02/novel-crawler
Tool cào dữ liệu truyện để phục vụ cho doctruyen.io.vn
Last synced: 13 Mar 2025
https://github.com/andresayac/cuevana3
Cuevana3 scraper is a content provider of the latest in the world of movies and tv show in Latin Spanish dub or subtitled.
Last synced: 05 Apr 2025
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 06 May 2026
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 11 May 2026
https://github.com/kiranjisonawane143/blockchain-data-crawler
🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 06 May 2026
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
Web crawler em C# que usa a biblioteca AngleSharp para extrair detalhes de eventos do site "https://minhaentrada.com.br". Ele analisa o HTML da página e recupera informações como título, data, local e links dos eventos.
anglesharp crawler minhaentrada
Last synced: 19 Jul 2025
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 17 May 2026
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 17 May 2026
https://github.com/shivamsaraswat/webxcrawler
WebXCrawler is a fast static crawler to crawl a website and get all the links.
crawler crawling python scraping webcrawler webxcrawler
Last synced: 13 Feb 2026
https://github.com/allancapistrano/steam.py
An API wrapper for Steam written in Python.
Last synced: 16 Mar 2025
https://github.com/tetreum/puppeteer-for-crawling
Daily use crawling methods for puppeteer
Last synced: 12 Apr 2026
https://github.com/iyowei/fs-deep-walk
专注于深度扫描指定磁盘位置。
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 20 May 2026
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 07 Jul 2025
https://github.com/bradsec/gomine
A Go CLI tool to quickly crawl and mine (download) specific file types from websites.
cli crawler golang terminal-based
Last synced: 09 Apr 2025
https://github.com/martincastroalvarez/web-to-pdf
Web crawlers using Python & Beautiful Soup
Last synced: 08 Apr 2025
https://github.com/tsaohucn/crawler_fb_user_group
This is crawler use selenium for facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 16 May 2026
https://github.com/chamzzzzzz/supersimplesoup
a go package implements a super simple soup like DOM API
beatifulsoup crawler crawler-go dom go golang html-parser
Last synced: 28 Jan 2026
https://github.com/rowyio/llm-web-crawler
Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.
ai automation crawler llm lowcode nocode scraper web web-crawler workflow
Last synced: 15 Jul 2025
https://github.com/montenegrodr/letmecrawl
Curated free proxies
crawler proxy proxy-server proxypool scraper
Last synced: 18 Jan 2026
https://github.com/gn00678465/crawler
使用 Firecrawl API 的 Python CLI 工具,支援多種輸出格式的網頁爬取。
Last synced: 06 Feb 2026
https://github.com/dinofizz/sitemapper
sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.
astradb cassandra concurrency crawler go golang kubernetes nats sitemap
Last synced: 16 Jan 2026
https://github.com/danielfillol/ab2l_crawler
Crawler for AB2L radar
brazil crawler lawtech legaltech
Last synced: 28 Jan 2026
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite youtube video in PHP
Last synced: 16 May 2026
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 23 Mar 2025
https://github.com/nextlevelshit/adonis-crawler
A free web crawler on top of the incredibile AdonisJS Framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 22 May 2026
https://github.com/davelongdev/link-report-crawler
A web crawler using Node.js that crawls a site and returns a report showing all internal links.
crawler crawling javascript seo seo-tools
Last synced: 16 Jun 2025
https://github.com/gnehs/twse-financial-ratios-crawler
透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均
Last synced: 29 Apr 2026
https://github.com/ahsouza/iquizz-api
API RESTfull developed in Node.Js with MongoDB
animations cluster crawler docker docker-compose ejs-templates es8 font-awesome grunt-task helmet-detection heroku javascript jquery material-design mongodb nodejs passport-strategy passportjs pusher token-authetication
Last synced: 12 Apr 2026
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Sep 2025