Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
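The systematic browsing described above can be pictured as a breadth-first traversal of links. The following is only a minimal sketch under stated assumptions, not how any listed project works: `crawl`, `LinkExtractor`, and the caller-supplied `fetch` callable are hypothetical names, and real crawlers add politeness delays, robots.txt handling, and error recovery that are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url.

    `fetch` is an assumed callable: it takes a URL and returns the page's
    HTML as a string, or None if the page cannot be retrieved.
    Returns the URLs visited, in visit order.
    """
    seen = {start_url}
    frontier = deque([start_url])
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```

Passing `fetch` as a parameter keeps the traversal independent of any HTTP library, so the same loop can be exercised against an in-memory dictionary of pages before it is pointed at a real site.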
https://github.com/bkdev98/ebooks-crawler
Ebooks crawler for personal purpose using ReactJS.
crawler material-ui nodejs reactjs
Last synced: 12 Apr 2026
https://github.com/jorgeparavicini/medalytik-python
Python crawlers for a job mediation firm
Last synced: 07 Jul 2025
https://github.com/mlibre/clean-web-scraper
A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖
ai artificial-intelligence clean crawler data-preprocessing dataset fine-tuning llm recursive-crawling scraper training
Last synced: 17 Mar 2025
https://github.com/dhchenx/quick-crawler
A toolkit for quickly performing crawler functions
Last synced: 27 Oct 2025
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 28 Oct 2025
https://github.com/amirespahbodi/url_crawler
Async Web Crawler for Website Title and Favicon
crawler fastapi pydantic python3 sqlalchemy
Last synced: 15 Apr 2026
https://github.com/citiususc/polypus
Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis
analytics bigdata crawler scraper sentiment-analysis twitter
Last synced: 09 Feb 2026
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 09 Feb 2026
https://github.com/mc256/node-static-webpage-crawler
Download an entire website with its directory structure.
cache-server crawler nodejs static-site
Last synced: 16 Apr 2026
https://github.com/jongwony/boardgame_finder
A project that crawls the entire board game category of Namuwiki in order to apply specific filters.
asyncio crawler namuwiki python38
Last synced: 27 Feb 2026
https://github.com/khdxsohee/email-miner-pro
EMail Miner Pro is designed specifically for professionals scraping data from search engines like Google, ensuring that generic emails (e.g., Gmail, Yahoo) are correctly linked to their business websites found on the page.
chrome crawler crawling email email-extractor extension-chrome lead-generation miner scraper
Last synced: 03 Feb 2026
https://github.com/sonhm3029/crawl-data-bot
This project provides a base bot for crawling data from the web, including text and image data.
crawler google medical vietnamese
Last synced: 08 Mar 2026
https://github.com/phatpham9/scraper.fun
Building, using & sharing HTML scrapers is way more fun!
Last synced: 24 Mar 2025
https://github.com/linjonh/videowebsidesparser
This project parses a video website to remove ads.
Last synced: 13 Jun 2025
https://github.com/joaooliveirapro/trawlergo
TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.
Last synced: 09 Jun 2026
https://github.com/danielemoraschi/go-sitemap-app
crawler golang sitemap sitemap-generator
Last synced: 29 Apr 2026
https://github.com/danielemoraschi/sitemap-common
Simple PHP Sitemap generator and crawler library.
crawler php php-library php-sitemap-generator sitemap
Last synced: 11 Mar 2026
https://github.com/blarc/windsurf-crawler
A simple crawler that collects windsurf boards offers from different sites.
Last synced: 10 Sep 2025
https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen
Fetch Keskisuomalainen kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/raspi/scrapy-kuntavaalit2021-almamedia
Fetch Almamedia kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 16 Jul 2025
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 15 May 2026
https://github.com/jackfsuia/chats-crawler
Discourse chat data crawling and on-the-fly parsing, directly usable for LLM instruction fine-tuning.
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 09 Jul 2025
https://github.com/basemax/crawler-news-currency-gold-coins
PHP crawler that fetches Persian news related to currency, coins, and gold.
crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler
Last synced: 05 Jul 2025
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 03 Mar 2025
https://github.com/peterbencze/silene
Silene is an open source web crawler framework built upon Pyppeteer.
crawler framework pypp python scraper webcrawler
Last synced: 12 Jan 2026
https://github.com/shentengtu/cht-yp-crawler
Simple Crawler of www.iyp.com.tw.
crawler node-js nodejs yellow-pages yellowpages
Last synced: 09 May 2026
https://github.com/balintpethe/laravel-universal-scraper
Universal Scraper for Laravel
crawler laravel scraper web-scraper
Last synced: 13 Jan 2026
https://github.com/seanghay/wpget
⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API
Last synced: 08 Feb 2026
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite YouTube videos in PHP
Last synced: 16 May 2026
https://github.com/lolyratul025/web-email-bundler
A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.
crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping
Last synced: 22 May 2026
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 19 Apr 2026
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 28 May 2026
https://github.com/massongit/ibaraki-univ-circle-crawler
Crawls official circles in Ibaraki University from university's website
Last synced: 25 Mar 2025
https://github.com/w3labkr/ipynb-scraper
A collection of frequently used Jupyter notebook code.
crawler ipynb jupyter jupyter-notebook python scrapper
Last synced: 19 Apr 2026
https://github.com/hvtuananh/twitter_crawler
Daemon to call and get tweets from Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 11 Mar 2025
https://github.com/cls1991/gank.io-go
A simple crawler for fetching pictures from http://gank.io, implemented in golang.
crawler gankio goquery pictures
Last synced: 27 Feb 2025
https://github.com/patrik-fredon/python_wallpaper_crawler
Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.
crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv
Last synced: 14 Sep 2025
https://github.com/atasoglu/websense
A modular AI-powered web scraper for data pipelines.
ai automation crawler data-extraction llm parsing scraper structured-output web-scraping
Last synced: 31 Jan 2026
https://github.com/rowyio/llm-web-crawler
Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.
ai automation crawler llm lowcode nocode scraper web web-crawler workflow
Last synced: 15 Jul 2025
https://github.com/ericc-ch/crawldown
Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.
Last synced: 05 Jul 2025
https://github.com/zenoyang/webcrawler
Some crawler code.
crawler scrapy spider web-crawler
Last synced: 02 Aug 2025
https://github.com/matheusfaustino/jazzmaster_crawler
A crawler that fetches the audio programs of a radio show called Jazzmaster.
Last synced: 14 Jun 2025
https://github.com/iamtonmoy0/sitemap-crawler
site map crawler with golang and goquery
Last synced: 23 Feb 2025
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 31 Mar 2025
https://github.com/tsaohucn/crawler_fb_user_group
A crawler that uses Selenium to crawl Facebook user groups.
crawler facebook-user-groups rails ruby
Last synced: 16 May 2026
https://github.com/tetreum/puppeteer-for-crawling
Daily use crawling methods for puppeteer
Last synced: 12 Apr 2026
https://github.com/jlenon7/sef_automation
📑 Crawler that automatically enrols in open vacancies on the SEF website.
athenna crawler esm nodejs playwright portugal residence sef typescript
Last synced: 03 Mar 2026
https://github.com/intina47/ee_error
Implementation of a web crawler using C++
cpp crawler curl gumbo libcurl stanford-nlp web
Last synced: 31 Jan 2026
https://github.com/luminovrym/crawler-tools-js
Crawler Tools Js is an application used for scraping data from a website.
crawler crawler-js data js web-scraping
Last synced: 08 Sep 2025
https://github.com/rsheremeta/web-crawler
A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal.
crawler go golang web-crawler webcrawler
Last synced: 12 Jun 2026
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 17 May 2026
https://github.com/dubniczky/bad-robot
This is a python crawler that disregards robots.txt rules and downloads disallowed resources
crawler osint-python osint-tool python robots-txt
Last synced: 31 Mar 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links in a specific depth.
crawler flask-api flask-restful
Last synced: 02 Apr 2025
https://github.com/dubniczky/webmap
Website mapping crawler implemented in python
crawler mapping mapping-tools package python scraping security
Last synced: 31 Mar 2025
https://github.com/sedrubal/webcrawler
Crawl sites and search for security issues.
crawler script security website-auditing
Last synced: 17 Mar 2025
https://github.com/basemax/okala-store-ids
A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.
crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala
Last synced: 10 Jun 2025
https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer
An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.
crawler crypto cryptocurrency data-mining datamining information-retrieval llm python
Last synced: 25 Sep 2025
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 17 May 2026
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
Web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as the title, date, venue, and links of the events.
anglesharp crawler minhaentrada
Last synced: 19 Jul 2025
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 12 Apr 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 28 Mar 2025
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 13 Jun 2026
https://github.com/xiangronglin/novel2go
Android app that creates a PDF from a website and sends it to your Kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 31 Jan 2026
https://github.com/tisfeng/bing-dict
A command-line Bing dictionary that obtains Bing Dictionary query results with a crawler.
bing-dictionary command-line crawler nodejs
Last synced: 13 May 2026
https://github.com/waived/pastebin-ripper
Scrape all pastes from pastebin page + sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 24 Jun 2025
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 11 May 2026
https://github.com/mnoalett/cscrawler
BSc degree thesis - crawler for www.couchsurfing.org
bsc-thesis couchsurfing crawler data-analysis database python
Last synced: 02 May 2026
https://github.com/suconghou/sitemap
a simple sitemap generator and page crawler
crawler html-parser nim-lang scraper sitemap spiders
Last synced: 15 May 2026
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 09 Apr 2025
https://github.com/dappros/site_crawler
Site crawler used in Ethora platform as an option to import your specific business data into your AI agent chat bot.
crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing
Last synced: 20 Jan 2026
https://github.com/imrany/spindle
An open-source, lightweight web crawler and scraper. It can discover links on the web (crawler) and extract structured data from webpages (scraper).
Last synced: 24 Sep 2025
https://github.com/surister/scrupy
Python library to create web Crawlers which aims to be powerful yet simple.
crawler crawling-framework crawling-python http library python scraping
Last synced: 15 May 2026
https://github.com/andresayac/cuevana3
Cuevana3 scraper is a content provider for the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.
Last synced: 05 Apr 2025
https://github.com/ismoreirakt/spyder
The web is changing. Spyder sees it.
alerts automation crawler monitor
Last synced: 01 Mar 2025
https://github.com/mnemocron/VPNNetworkShareCrawler
ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it
Last synced: 11 Mar 2025
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 18 Mar 2025
https://github.com/precioux/pacman
AI Course Projects - Fall 2022
adversial artificial-intelligence bfs-search crawler csp dfs mdp pacman-agent pacman-game pacman-projects reinforcement-learning ucs
Last synced: 28 May 2026