Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-03 00:06:53 UTC
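As a rough illustration of the "systematically browses" part of the definition above, here is a minimal breadth-first crawl sketch in Python. It is not taken from any of the listed projects. The `crawl` function, the `fetch` callback, and the `example.test` pages are all hypothetical, and the network is replaced by an in-memory dictionary so that the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    fetch is a caller-supplied (hypothetical) function mapping a URL to
    its HTML text, or None if the page cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # FIFO frontier gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Stand-in for the network: a tiny in-memory "web" (made-up URLs).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a> <a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

Against real sites, `fetch` would wrap an HTTP client, and a considerate crawler would presumably also respect `robots.txt` and rate limits. Both are left out here to keep the sketch short.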
https://github.com/aminehsan/datamining-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scraping
Last synced: 25 Jul 2025
https://github.com/basemax/okala-database-crawler
A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.
crawler crawler-php curl data json okala okala-com okalacom php php-crawler scraper
Last synced: 01 May 2026
https://github.com/b3j4y/unidisk
A Crawler to search for keywords and compare the score
comparison crawler nlp solr-client
Last synced: 17 Jan 2026
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 06 May 2026
https://github.com/microlinkhq/cloudflare-bot-directory
Cloudflare Radar verified bots directory – 500+ web crawlers and user agents as JSON.
bot-detection bots cloudflare cloudflare-radar crawler crawlers dataset datasets googlebot user-agent user-agents user-agents- verified-bots web-crawler web-scraping
Last synced: 20 Apr 2026
https://github.com/berecat/selenium_facebook_scraper
A simple Python 3 script to download a user's friend list from Facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 24 Jul 2025
https://github.com/marcosvbras/twitton
A simple Python library that makes the Twitter Search API easy to use
crawler crawling python spider twitter twitter-api
Last synced: 27 Mar 2025
https://github.com/claudio-code/nap-web-crawler
A crawler to find broken links in the documentation of frameworks and languages
Last synced: 07 Jul 2025
https://github.com/semoal/pythoncrawler
Python crawler with XML-RPC & BeautifulSoup
beautifulsoup crawler python wordpress xmlrpc
Last synced: 15 Apr 2026
https://github.com/heyihuang826/ncku_course
Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).
Last synced: 01 Mar 2026
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the Website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 07 Jun 2026
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 30 Mar 2025
https://github.com/wilmsn/simple_deye_crawler
A simple crawler to get data from the Deye Inverter using the status webpage
crawler deye fhem inverter shell-script
Last synced: 27 May 2026
https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance
Crawler robot that follows a black line while avoiding obstacles in its way. Assignment for Mechatronics.
arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics
Last synced: 28 Apr 2026
https://github.com/kestarumper/imagecrawler
Downloads images from a given URL
Last synced: 28 Jun 2025
https://github.com/dmarcosl/upshelf-technical-test
Technical test for Upshelf
crawler interview python scraping scrapy spider technical-test web-scraping
Last synced: 09 Apr 2025
https://github.com/kenanbek/tutorial-python-crawler
Crawling website data using Python with requests and Beautiful Soup libraries
beautifulsoup crawler crawling miner parser python python-requests requests
Last synced: 30 Mar 2025
https://github.com/truongdd03/searchengine
A search engine written in C++.
cpp crawler search search-engine
Last synced: 06 Apr 2025
https://github.com/nyarla/net-paranoid-go
(WIP) Paranoid helpers for a crawler of untrusted web content
crawler filtering golang helper
Last synced: 14 Jan 2026
https://github.com/xoraus/revieworacle
The proposed system helps users decide which product to buy. It gathers reviews and product details from multiple websites that sell the product. The system is also trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 07 May 2026
https://github.com/lolyratul025/web-email-bundler
A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.
crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping
Last synced: 22 May 2026
https://github.com/hdevlinz/affiliate-chrome-extension
chrome-extension crawler tiktok
Last synced: 14 May 2025
https://github.com/g-ongenae/morphalou-crawler
A Crawler for CNRTL's Morphologie words
crawler french lexical-databases list-of-words words
Last synced: 25 Feb 2025
https://github.com/kettou/silentscraper
SilentScraper is a web scraping solution built with advanced stealth protocols. It operates undetectably in the background, bypassing anti-scraping mechanisms to collect structured data at scale. Its lightweight architecture mimics human browsing patterns by rotating IP addresses, spoofing user agents, and more.
beautifulsoup beautifulsoup4 crawler datastructures datastructures-algorithms python webautomation webscraper webscraping
Last synced: 23 Jul 2025
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 03 Feb 2026
https://github.com/viko16/hatcher
🐣[WIP] Provides APIs by simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 08 Oct 2025
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 14 Sep 2025
https://github.com/romangw/lukki
Completely free code for a webcrawling bot.
crawler python web-scraping web-scraping-python
Last synced: 08 Oct 2025
https://github.com/killianmeersman/wander
Convenient scraping library for Gophers
crawler data-mining golang scraper spider
Last synced: 14 Jan 2026
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 22 Jun 2025
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 12 Feb 2026
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 18 Mar 2025
https://github.com/patrik-fredon/python_wallpaper_crawler
Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.
crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv
Last synced: 14 Sep 2025
https://github.com/bernieyangmh/check-link
Checking through whole website, identifying broken links.
Last synced: 14 Jan 2026
https://github.com/peterbencze/silene
Silene is an open source web crawler framework built upon Pyppeteer.
crawler framework pypp python scraper webcrawler
Last synced: 12 Jan 2026
https://github.com/gxjansen/website-to-pdf
Creates a PDF based on the content of a website/subdomain
claude-3-sonnet crawler python3
Last synced: 30 Mar 2025
https://github.com/sgeisler/fishbones2epub
Fetches the fishbones novel and outputs an EPUB
Last synced: 22 Mar 2025
https://github.com/kyungw00k/stealth-wright
Silent browser automation CLI with stealth capabilities
crawler go playwright stealth-automation
Last synced: 31 May 2026
https://github.com/daitangio/find
Python + SQLite search engine
crawler indexer python search-engine
Last synced: 18 Jan 2026
https://github.com/bruce-lee-ly/crawler
Several fun crawler cases implemented in Python.
Last synced: 27 Jun 2025
https://github.com/panagiotisptr/codeforces-companion
A codeforces parser, code tester and testcase generator in Go
codeforces-parser competitions crawler go golang parser test-automation testing
Last synced: 14 Jan 2026
https://github.com/namchee/hackerbits
Web crawler and clustering on the HackerNews website.
Last synced: 09 Oct 2025
https://github.com/licoy/win4000-images-crawler
Crawls and downloads image wallpapers from win4000.com using Scrapy
Last synced: 28 Mar 2025
https://github.com/dappsar/ethglobal-crawler
A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.
Last synced: 09 Oct 2025
https://github.com/mstephen19/apify-click-events
Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to
apify apify-sdk crawler scraper web-automation
Last synced: 23 Aug 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 09 Oct 2025
https://github.com/humbertodias/go-nie-crawler
Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.
Last synced: 03 Jul 2026
https://github.com/nextlevelshit/adonis-crawler
A free web crawler built on top of the incredible AdonisJS framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 22 May 2026
https://github.com/iyowei/fs-deep-walk
Focused on deep scanning of a specified disk location.
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 20 May 2026
https://github.com/kiranjisonawane143/blockchain-data-crawler
🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 06 May 2026
https://github.com/yosh1/mio-crawler
A crawler that retrieves data usage from IIJmio.
Last synced: 10 May 2026
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 06 May 2026
https://github.com/amazingcoderpro/pythonup
Have fun with Python! For improving Python skills
Last synced: 19 May 2026
https://github.com/boatraceventureproject/boatracescraper
The BVP Crawler package for Boatrace.
boatrace crawler php php-library php8
Last synced: 17 Mar 2025
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 06 Apr 2025
https://github.com/zrquan/gatherer
Gatherer is a simple crawler tool
crawler infosec pentest security
Last synced: 14 Jan 2026
https://github.com/jefftriplett/pholcidae-demo
🕷️ A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 22 Jul 2025
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 21 Mar 2025
https://github.com/rayspock/go-web-crawler
A web crawler that fetches all the links from a given website using goroutines.
concurrency crawler golang goroutine
Last synced: 10 Jun 2026
https://github.com/ninja-yubaraj/lootbin
A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.
crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin
Last synced: 11 Oct 2025
https://github.com/andreposman/magic-number
A CLI Tool/API to calculate the passive income in FII's
Last synced: 14 Jan 2026
https://github.com/maddevsio/spiderwoman
"Vertical" crawler whose main goal is to count links (resolved, e.g. from bit.ly) to external domains from all pages of the given resources
big-data count-links crawler golang
Last synced: 19 May 2026
https://github.com/katronquillo/grimm
Simple search engine for the Brothers Grimm Fairy Tales
Last synced: 24 Apr 2026
https://github.com/timzatko/fiit-vinf-1
School project - data crawling, storage in Elasticsearch, and visualisation.
Last synced: 16 Jan 2026
https://github.com/iomarmochtar/imagecrawler
Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+
Last synced: 14 May 2025
https://github.com/jesseokeya/linkedin-scraper
Selenium WebDriver used to get information from LinkedIn
chromedriver crawler linkedin os python scraper selenium-webdriver
Last synced: 29 Apr 2026
https://github.com/ignmaro/new
The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.
bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3
Last synced: 13 Oct 2025
https://github.com/mevljas/gov.si-crawler-playwright
A standalone crawler that crawls only .gov.si web sites using Playwright.
crawler multithreading playwright sqlachemy
Last synced: 19 Jan 2026
https://github.com/adham90/github_user_crawler
GeekHub: github username crawler
Last synced: 21 Mar 2025
https://github.com/abx123/coronachan
Simple Lambda function that crawls the MKN Twitter account for daily Malaysia COVID-19 updates.
crawler lambda-functions python
Last synced: 28 Mar 2025
https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai
This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.
artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec
Last synced: 04 May 2026
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 18 May 2026
https://github.com/xyk2002/aqistudy-crawler
An asynchronous protocol-level crawler for the air quality data on https://www.aqistudy.cn/historydata/. It fetches the data quickly and saves it to CSV files.
Last synced: 22 Aug 2025
https://github.com/mt4110/postal_converter_ja
High-performance Japanese Postal Code Converter & API. Auto-updating, DB-agnostic (MySQL/PostgreSQL), written in Rust & Next.js. An asynchronous crawling system in Rust with automatic updating of Japan Post data. The latest postal code data is updated at top speed.
api crawler docker mysql nextjs nix postgresql react rust
Last synced: 13 Feb 2026
https://github.com/constaf79/pycn
🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.
cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web
Last synced: 14 May 2026
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 11 May 2025