Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/splorg/sage
A scraper to get every quote from a book off of Goodreads.
books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping
Last synced: 12 Jun 2025
https://github.com/forattini-dev/crawlex
The stealth crawler that actually looks like Chrome.
Last synced: 14 May 2026
https://github.com/aminehsan/datamining-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scraping
Last synced: 25 Jul 2025
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 06 Apr 2025
https://github.com/rcmilan/ex-web-scraping
Web scraping with F#
crawler f-sharp fsharp fsharp-data scraper web-scraping xplot
Last synced: 27 May 2026
https://github.com/diegojromerolopez/relwrac
A basic crawler developed with python and asyncio
asyncio crawler page-rank python
Last synced: 11 Nov 2025
https://github.com/danielemoraschi/sitemap-app
Sitemap generator command line application using dmoraschi/sitemap-common library
crawler php php-library sitemap sitemap-generator
Last synced: 19 Oct 2025
https://github.com/longluo/spider
My Python Spider / Crawler
crawler python spider twitter weibo weibo-crawler weibo-spider
Last synced: 11 Jun 2025
https://github.com/phatpham9/scraper.fun
Building, using & sharing HTML scrapers is way more fun!
Last synced: 24 Mar 2025
https://github.com/eghuro/crawlcheck
Extensible web crawler
configuration crawler http plugin python robots-txt sitemap
Last synced: 12 Apr 2026
https://github.com/agucova/needs-seeding
🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.
Last synced: 12 Oct 2025
https://github.com/andrepradika/scrape-medrecruit.medworld.com
🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.
Last synced: 17 Mar 2025
https://github.com/jefftriplett/pholcidae-demo
🕷️ A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 22 Jul 2025
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 21 Mar 2025
https://github.com/andrepradika/scrape-xpel.com
📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.
Last synced: 17 Mar 2025
https://github.com/rayspock/go-web-crawler
A web crawler that fetches all the links from a given website using goroutines.
concurrency crawler golang goroutine
Last synced: 10 Jun 2026
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 10 Jun 2026
https://github.com/maddevsio/spiderwoman
"Vertical" crawler whose main goal is to count links (resolved, e.g. from bit.ly) to external domains from all pages of the given resources
big-data count-links crawler golang
Last synced: 19 May 2026
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for COMP90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 16 Jul 2025
https://github.com/balintpethe/laravel-universal-scraper
Universal Scraper for Laravel
crawler laravel scraper web-scraper
Last synced: 13 Jan 2026
https://github.com/ronierisonmaciel/crawler
A crawler built with BeautifulSoup that aims to extract information from websites in an efficient and structured way. BeautifulSoup is a Python library that simplifies parsing and extracting data from HTML and XML pages. The project collects and organizes relevant information.
beautifulsoup4 crawler crawling python python3
Last synced: 26 Mar 2025
https://github.com/casatrick/solana-transaction-crawler
Crawl & parse Solana transactions
crawler parser rust solana transaction
Last synced: 20 Jun 2026
https://github.com/bersegosx/exparic
Web parser via yaml config
crawler parser yaml-configuration
Last synced: 21 Oct 2025
https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper
Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.
codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider
Last synced: 01 Jun 2026
https://github.com/athulmurali/flickr-api-docs-crawler
A Python-based crawler that extracts API documentation and writes it to a file as JSON. A documentation page can then be built from the JSON file using Docusaurus.
api beautifulsoup4 crawler documentation python3
Last synced: 18 Jun 2026
https://github.com/seanghay/wpget
⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API
Last synced: 08 Feb 2026
https://github.com/prorobot-ai/worker
A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).
crawler golang grpc-server scraper
Last synced: 29 Jul 2025
https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb
Crawl Google Scholar profiles with Selenium, store the extracted data in MongoDB, and serve queries with FastAPI.
crawler fastapi google-scholar mongodb python selenium
Last synced: 16 Apr 2026
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 03 Feb 2026
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 15 Apr 2026
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio
Last synced: 15 Mar 2025
https://github.com/sauerbraten/monzter
Link crawler with configurable maximum depth and rate limiting
Last synced: 23 May 2026
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 28 May 2026
https://github.com/amirsorouri00/crawler
Public Python 2 PageRank projects that have been converted to Python 3.
Last synced: 05 Sep 2025
https://github.com/c17an/grade-tracer
👨‍💻 Korea Aerospace University grade-change tracking crawler 🏑
concurrently crawler es6 express nodejs nodemon puppeteer react
Last synced: 13 Apr 2026
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 03 Jan 2026
https://github.com/fusetim/bitcrawler
Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?
Last synced: 30 Mar 2025
https://github.com/gabrielolobo/crawley
This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.
crawler poetry python scrapping
Last synced: 22 Jun 2025
https://github.com/jauharibill/animeindo-crawler
This crawler is for research only. The creator takes no responsibility for any harmful usage.
Last synced: 08 Jul 2025
https://github.com/timzatko/fiit-vinf-1
School project - data crawling, storing using ElasticSearch and visualisation.
Last synced: 16 Jan 2026
https://github.com/ggteixeira/motorcycle-simulator
A toy project that fetches motorcycle prices from OLX and does some calculations for those who want to buy them.
crawler motorcycle olx scraper
Last synced: 28 Feb 2025
https://github.com/izh318/genie-music-artist-album-crawler
A Python script that crawls, in one go, the album information of a specific artist registered on Genie Music.
Last synced: 08 Nov 2025
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 10 May 2026
https://github.com/murilobsd/icrop-csv
Icrop-csv, for automating the report download process.
Last synced: 17 Nov 2025
https://github.com/iomarmochtar/imagecrawler
Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+
Last synced: 14 May 2025
https://github.com/tech-espm/misc-webbot
This project aims to create personal assistants that reply to messages about specific issues.
classification-model crawler nlp
Last synced: 12 Jun 2026
https://github.com/guillempuche/news_crawler
Scrape news from Olot town hall (https://www.olot.cat) with TypeScript and Crawlee. Collects summaries and full articles, stored in separate datasets.
biomejs crawlee crawler news-crawler olot townhall yarn-berry
Last synced: 23 Oct 2025
https://github.com/kweonminsung/crawl2toast
Real-time toast notifications of crawled data selected with CSS selectors (Windows only)
beautifulsoup4 crawler selenium tkinter toast-notifications
Last synced: 18 May 2026
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions / subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 31 Mar 2025
https://github.com/linjonh/videowebsidesparser
This project parses a video website to remove ads.
Last synced: 13 Jun 2025
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 30 Jul 2025
https://github.com/rebrowser/iheart-dataset
iHeart radio station database: 3,600+ stations with call letters, formats, markets, cume audience, stream URLs, and 185M+ daily airplay records. Updated daily.
airplay crawler data-collection data-science dataset datasets iheart music-data open-data radio radio-stations scraper web-scraping
Last synced: 03 May 2026
https://github.com/arman-aminian/divar-text-exploring
The first exercise of Dr. Asgari's NLP course - Data Exploration
crawler natural-language-processing nlp preprocessing scrapy
Last synced: 15 Jun 2026
https://github.com/zhanziyuan/webdownloader
Download elements from the specified website.
crawler downloader image image-downloader python python-crawler web
Last synced: 15 Jun 2026
https://github.com/oleksandr-moik/spring-boot-web-crawler
Web crawler app on Spring Boot that fetches categories and the relevant news for each category.
crawler gradle java spring-boot
Last synced: 03 May 2026
https://github.com/yann-github/webcrawler-http
Command line application to crawl a website and generate a report of internal linking structure
crawler csv-format javascript jest node report tdd
Last synced: 03 May 2026
https://github.com/taiizor/gocrawler
A high-performance web crawler with concurrent processing capabilities written in Go.
crawler csv go golang golang-application golang-library json storage url web
Last synced: 26 Apr 2026
https://github.com/bingxyz/btcethcrawler
Telegram broadcast channel for Bitcoin and Ethereum
bash bash-script crawler telegram-bot
Last synced: 26 Apr 2026
https://github.com/qeqqe/cog
An MCP-integrated intelligent RAG that gives relevant context to LLMs through crawled docs
backend-api claude-desktop crawl4ai crawler fastapi mcp python rag sementic-chunking
Last synced: 04 May 2026
https://github.com/palpitate-xus/sge_data_insert
Uses GitHub Actions to automatically fetch SGE data and store it in a database
Last synced: 26 Apr 2026
https://github.com/zzzzer91/match_spider
Crawler for a certain gambling website; the site has since shut down 😥
Last synced: 16 Jun 2026
https://github.com/kareemsasa3/arachne
A resilient, concurrent web scraper service built in Go, featuring a REST API, Redis-backed job queue, and circuit breaker for fault tolerance.
asynchronous circuit-breaker concurrency crawler docker docker-compose go golang job-queue rate-limiting redis rest-api web-scraper web-scraping
Last synced: 04 May 2026
https://github.com/bennettdams/vace-it-crawler
Python (Scrapy) crawler to access data from FACEIT.com
Last synced: 03 Jun 2026
https://github.com/liu233w/ojhunt-lite
A lightweight async Python tool for querying Online Judge (OJ) statistics across multiple platforms. Track your accepted problems (AC) and total submissions from 29+ competitive programming platforms.
acm-icpc codechef-api codeforces-api crawler spoj-api
Last synced: 05 May 2026
https://github.com/basemax/crawleryjc
This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.
crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news
Last synced: 05 May 2026
https://github.com/monumentality/ifiend
Check latest YouTube uploads without leaving the comfort of your terminal.
crawler headless-chrome terminal-based youtube yt-dlp
Last synced: 25 Apr 2026
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 05 May 2026
https://github.com/lanesun/one-link
"One Link to rule them all."
crawler curl http svelte web-service
Last synced: 05 May 2026
https://github.com/dnlzrgz/excursionist
Scrapy-powered flight price crawler.
crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling
Last synced: 24 Apr 2026
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 17 Jun 2026
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 06 May 2026