Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-01 00:06:39 UTC
https://github.com/hoan02/novel-crawler
A tool for scraping story data for doctruyen.io.vn
Last synced: 13 Mar 2025
https://github.com/tjdsneto/jcnet-crawler
Extract (scrape) movie schedule info from the JCNet movies page
Last synced: 11 Apr 2026
https://github.com/precioux/pacman
AI Course Projects - Fall 2022
adversial artificial-intelligence bfs-search crawler csp dfs mdp pacman-agent pacman-game pacman-projects reinforcement-learning ucs
Last synced: 28 May 2026
https://github.com/andresayac/cuevana3
Cuevana3 scraper is a content provider for the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.
Last synced: 05 Apr 2025
https://github.com/mnoalett/cscrawler
BSc degree thesis - crawler for www.couchsurfing.org
bsc-thesis couchsurfing crawler data-analysis database python
Last synced: 02 May 2026
https://github.com/jplitza/urlsearch
Index typical webserver directory listings and then search for arbitrary terms
Last synced: 17 Mar 2025
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 11 May 2026
https://github.com/waived/pastebin-ripper
Scrape all pastes from pastebin page + sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 24 Jun 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 28 Mar 2025
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 12 Apr 2025
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 22 May 2026
https://github.com/brighteyekid/rendermw
Zero-dependency dynamic rendering middleware for Express. No Puppeteer. No external services. No cost. Bots get semantic HTML. Users get your SPA.
angular bots crawler dynamic-rendering express expressjs indexing middleware nodejs open-graph prerender react seo spa typescript vue
Last synced: 24 Jun 2026
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
A web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as event title, date, venue, and links.
anglesharp crawler minhaentrada
Last synced: 19 Jul 2025
https://github.com/basemax/okala-store-ids
A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.
crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala
Last synced: 10 Jun 2025
https://github.com/sedrubal/webcrawler
Crawl sites and search for security issues.
crawler script security website-auditing
Last synced: 17 Mar 2025
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 17 May 2026
https://github.com/dubniczky/webmap
Website mapping crawler implemented in python
crawler mapping mapping-tools package python scraping security
Last synced: 31 Mar 2025
https://github.com/dubniczky/bad-robot
This is a python crawler that disregards robots.txt rules and downloads disallowed resources
crawler osint-python osint-tool python robots-txt
Last synced: 31 Mar 2025
https://github.com/rsheremeta/web-crawler
A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal.
crawler go golang web-crawler webcrawler
Last synced: 12 Jun 2026
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and author data. The main data is stored in an SQLite database and all media are downloaded. It can then reconstruct a Twitter profile in the front end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 17 May 2026
https://github.com/jlenon7/sef_automation
📑 Crawler that automatically enrols in open vacancies on the SEF website.
athenna crawler esm nodejs playwright portugal residence sef typescript
Last synced: 03 Mar 2026
https://github.com/tetreum/puppeteer-for-crawling
Daily use crawling methods for puppeteer
Last synced: 12 Apr 2026
https://github.com/maddevsio/spiderwoman
"Vertical" crawler whose main goal is to count links (resolved, e.g. from bit.ly) to external domains across all pages of the given resources
big-data count-links crawler golang
Last synced: 19 May 2026
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 10 Jun 2026
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 31 Mar 2025
https://github.com/tsaohucn/crawler_fb_user_group
A crawler that uses Selenium for Facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 16 May 2026
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the Website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 07 Jun 2026
https://github.com/rowyio/llm-web-crawler
Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.
ai automation crawler llm lowcode nocode scraper web web-crawler workflow
Last synced: 15 Jul 2025
https://github.com/rayspock/go-web-crawler
A web crawler to fetch all the links from a given website via goroutines.
concurrency crawler golang goroutine
Last synced: 10 Jun 2026
https://github.com/alphadev3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 26 Dec 2025
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 28 May 2026
https://github.com/crosscutsaw/iscsicrawler
iscsicrawler is a bash script that crawls files in the iscsi targets with ease.
crawler iscsi iscsi-target iscsiadm
Last synced: 16 Jan 2026
https://github.com/d7isme/pixiv-downloader-mod
Modded version of the pixiv downloader extension from the Chrome Web Store, with premium features unlocked.
chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader
Last synced: 14 May 2026
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite youtube video in PHP
Last synced: 16 May 2026
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 21 Mar 2025
https://github.com/seanghay/wpget
⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API
Last synced: 08 Feb 2026
https://github.com/jefftriplett/pholcidae-demo
🕷 A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 22 Jul 2025
https://github.com/balintpethe/laravel-universal-scraper
Universal Scraper for Laravel
crawler laravel scraper web-scraper
Last synced: 13 Jan 2026
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 16 Jul 2025
https://github.com/moparisthebest/nginx-limit-crawlers
rate limit crawlers in nginx
Last synced: 14 Mar 2025
https://github.com/phatpham9/scraper.fun
Building, using & sharing HTML scrapers is way more fun!
Last synced: 24 Mar 2025
https://github.com/longluo/spider
My Python Spider / Crawler
crawler python spider twitter weibo weibo-crawler weibo-spider
Last synced: 11 Jun 2025
https://github.com/blarc/windsurf-crawler
A simple crawler that collects windsurf boards offers from different sites.
Last synced: 10 Sep 2025
https://github.com/billy0402/python-application
A learning project from the book 'Python 技術者們'.
course crawler matplotlib opencv pandas python requests selenium sklearn
Last synced: 12 Apr 2026
https://github.com/mattmoony/webcrawler.py
A very simple Python web crawler. This is just a fun little side project, which I used to gain some valuable experience with advanced Python and web techniques. 🐍
beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler
Last synced: 29 Apr 2026
https://github.com/victorbaumgartner/electron_app
Testing electron app for macOS
crawl4ai crawler electron mac multithreading python3 scraping sitemaps
Last synced: 06 May 2026
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 06 Jan 2026
https://github.com/is0383kk/ai-docs-sync-workflow
claude claude-code crawler github-actions python python-script workflow
Last synced: 16 May 2026
https://github.com/jonesrussell/pipelinex
Firecrawl-style web intelligence pipeline powered by North Cloud
Last synced: 09 Mar 2026
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 09 Aug 2025
https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler
StackOverFlow Tag Generator Using a WebCrawler.
Last synced: 08 Apr 2025
https://github.com/eklem/vinmonopolet-crawler
Crawling Vinmonopolet-data and indexing it to a norch search index
crawler dataset javascript norch search-engine
Last synced: 26 Mar 2025
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 06 Apr 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 18 Jun 2026
https://github.com/ronierisonmaciel/crawler
A crawler built with BeautifulSoup that aims to extract information from websites in an efficient, structured way. BeautifulSoup is a Python library that makes it easier to parse and extract data from HTML and XML pages. The project lets you collect and organize relevant information.
beautifulsoup4 crawler crawling python python3
Last synced: 26 Mar 2025
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is a software development kit for the DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 11 Jul 2025
https://github.com/amazingcoderpro/pythonup
Have fun with Python! For improving Python skills
Last synced: 19 May 2026
https://github.com/jiusanzhou/reaper
Distributed Elegant Scraper and Crawler Framework for Rust.
crawler data-scraping rust scraper spider
Last synced: 24 Jul 2025
https://github.com/burakkaygusuz/web-security-scanner
A Java-based web security browser that detects common web vulnerabilities such as SQL Injection, XSS and sensitive information disclosure.
crawler java vulnerability-scanner web-security xss
Last synced: 16 May 2026
https://github.com/lilchen96/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 28 Dec 2025
https://github.com/alphabs/navercafeclient
A .NET library for crawling Naver Cafe post lists
crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping
Last synced: 06 May 2026
https://github.com/allancapistrano/steam.py
An API wrapper for Steam written in Python.
Last synced: 16 Mar 2025
https://github.com/engineer2b/cure_crawl
Cure Afvalbeheer calendar crawler
afval afvalwijzer browser crawler kalender
Last synced: 22 Oct 2025
https://github.com/tech-espm/misc-webbot
This project aims to create personal assistants that reply to messages about specific issues.
classification-model crawler nlp
Last synced: 12 Jun 2026
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 07 Jul 2025
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 06 May 2026
https://github.com/bradsec/gomine
A Go CLI tool to quickly crawl and mine (download) specific file types from websites.
cli crawler golang terminal-based
Last synced: 09 Apr 2025
https://github.com/fusetim/bitcrawler
Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?
Last synced: 30 Mar 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 04 Apr 2025
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 03 Feb 2026
https://github.com/diegojromerolopez/relwrac
A basic crawler developed with python and asyncio
asyncio crawler page-rank python
Last synced: 11 Nov 2025