An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
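As a rough illustration of "systematically browses", the following is a minimal breadth-first sketch, not taken from any project listed here. The `crawl` function, the `LinkParser` class, the `max_pages` limit, and the in-memory `PAGES` dictionary are all hypothetical names chosen for the example; a real crawler would fetch pages over HTTP and respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched; skip it
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and the queue gives the breadth-first visiting order.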

https://github.com/heyihuang826/ncku_course

Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).

captcha crawler onedrive

Last synced: 01 Mar 2026

https://github.com/sanhphanvan96/php-training-crawler

Simple PHP crawler for training purposes

crawler docker docker-compose nginx php php-fpm

Last synced: 13 Apr 2026

https://github.com/joyceannie/moviespider

This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, director, etc.

crawler datascience python scrapy spider webscraper

Last synced: 24 Mar 2025

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 11 Apr 2026

https://github.com/whateverzpy/douban_comments

Coursework for the HITSZ Fall 2025 Introduction to Big Data course

bigdata crawler scrapy

Last synced: 01 Oct 2025

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 15 Apr 2026

https://github.com/solracsf/perplexitybot-ips

Collected PerplexityBot IPs

bots crawler ip ipset perplexity

Last synced: 15 Feb 2026

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 23 Mar 2025

https://github.com/thejoin95/free-proxies.info

API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 29 Oct 2025

https://github.com/nyarla/net-paranoid-go

(WIP) Paranoid helpers for crawlers of untrusted web content

crawler filtering golang helper

Last synced: 14 Jan 2026

https://github.com/btlmd/asahi_nikkei_news_crawler

Crawler for Nikkei (Nihon Keizai Shimbun) and Asahi Shimbun news

crawler

Last synced: 07 Oct 2025

https://github.com/greytabby/grawl

Simple web crawler for learning.

crawler

Last synced: 14 Jan 2026

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 22 Jun 2025

https://github.com/mccranky83/aistudy-docs-crawler

Crawler for the Shanghai primary and secondary school digital teaching system

crawler hoarding puppeteer

Last synced: 07 Apr 2025

https://github.com/wilmsn/simple_deye_crawler

A simple crawler to get data from the Deye Inverter using the status webpage

crawler deye fhem inverter shell-script

Last synced: 27 May 2026

https://github.com/weizujie/python3-spider

A few small crawler scripts written in Python

crawler python3

Last synced: 18 May 2026

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 23 Mar 2025

https://github.com/seart-group/github-keyword-crawler

A simple and easy-to-deploy script for mining mentions of keywords across various GitHub API endpoints

api-mining crawler dockerized github-api miner mongodb-database python-script

Last synced: 04 Aug 2025

https://github.com/moj124/web_crawler

web_crawler is an asynchronous gevent-based link crawler that maps all the associated local links within the scope of the input webpage URL.

crawler crawler-python links-spider

Last synced: 13 Mar 2025

https://github.com/moparisthebest/nginx-limit-crawlers

Rate-limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 02 Feb 2026

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 08 Oct 2025

https://github.com/romangw/lukki

Completely free code for a webcrawling bot.

crawler python web-scraping web-scraping-python

Last synced: 08 Oct 2025

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 27 Mar 2025

https://github.com/killianmeersman/wander

Convenient scraping library for Gophers

crawler data-mining golang scraper spider

Last synced: 14 Jan 2026

https://github.com/yaoshanliang/linkedinspider

Crawl job information from LinkedIn for data analysis

big-data crawler python social-network-analysis

Last synced: 30 Mar 2025

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/xiangronglin/novel2go

Android app that creates a PDF from a website and sends it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 31 Jan 2026

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 07 Jun 2026

https://github.com/bernieyangmh/check-link

Checks a whole website and identifies broken links.

checkurl crawler golang

Last synced: 14 Jan 2026

https://github.com/taleblou/brokenlinkchecker_python

This Python web crawler traverses a website, verifies resource links (CSS, JS, images, videos, iframes), and identifies broken links with HTTP errors (400-599)

crawler http links python resources website

Last synced: 03 Apr 2025

https://github.com/ryoii/hook

A declarative Java crawler framework

crawler declarative java java-crawler-framework jdk11

Last synced: 18 Mar 2025

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 09 Apr 2025

https://github.com/kyungw00k/stealth-wright

Silent browser automation CLI with stealth capabilities

crawler go playwright stealth-automation

Last synced: 31 May 2026

https://github.com/daitangio/find

Python + SQLite search engine

crawler indexer python search-engine

Last synced: 18 Jan 2026

https://github.com/roele/roast

A JVM Data Crawler

cli crawler jvm

Last synced: 16 May 2025

https://github.com/panagiotisptr/codeforces-companion

A Codeforces parser, code tester, and test case generator in Go

codeforces-parser competitions crawler go golang parser test-automation testing

Last synced: 14 Jan 2026

https://github.com/namchee/hackerbits

Web crawler and clustering for the HackerNews website.

clustering crawler python3

Last synced: 09 Oct 2025

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 23 Mar 2025

https://github.com/dappsar/ethglobal-crawler

A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.

crawler ethglobal python

Last synced: 09 Oct 2025

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 09 Oct 2025

https://github.com/dalthviz/csapp

Crawler and scraper for the Play Store

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 13 May 2026

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 11 May 2025

https://github.com/jiusanzhou/reaper

Distributed Elegant Scraper and Crawler Framework for Rust.

crawler data-scraping rust scraper spider

Last synced: 24 Jul 2025

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio

aiohttp asyncio crawler

Last synced: 15 Mar 2025

https://github.com/sc0vu/gocrawl

Simple crawl for golang

crawler golang

Last synced: 23 Jul 2025

https://github.com/xyk2002/aqistudy-crawler

An asynchronous protocol-level crawler for the air quality data at https://www.aqistudy.cn/historydata/. It fetches data quickly and saves it to a CSV file.

aqistudy crawler python-3

Last synced: 22 Aug 2025

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 16 Jan 2026

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 18 May 2026

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 15 Jul 2025

https://github.com/luanpotter/series-api

A simple IMDB crawler feeding a Series API

api crawler imdb json rest series

Last synced: 15 Feb 2026

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 29 Mar 2025

https://github.com/slava-vishnyakov/grucrawler

Simple Ruby crawler

crawler ruby

Last synced: 25 Oct 2025

https://github.com/cafitac/ai-crawler

AI-driven network-first crawler compiler for authorized workflows

agents ai crawler http mcp python scraping

Last synced: 31 May 2026

https://github.com/zrquan/gatherer

Gatherer is a simple crawler tool

crawler infosec pentest security

Last synced: 14 Jan 2026

https://github.com/n3d1117/sisop17

Exercise for the Operating Systems exam, 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 06 Apr 2025

https://github.com/huakunshen/cron-crawler-template

Web crawler cron job template running on GitHub Actions. Capable of sending email notifications.

crawler github-actions python

Last synced: 15 May 2026

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler for Coursera video captions / subtitles

caption chromeless coursera crawler crx subtitle

Last synced: 31 Mar 2025

https://github.com/humbertodias/go-nie-crawler

Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.

crawler golang

Last synced: 03 Mar 2025

https://github.com/ninja-yubaraj/lootbin

A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.

crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin

Last synced: 11 Oct 2025

https://github.com/andreposman/magic-number

A CLI tool/API to calculate passive income from FIIs

crawler finance golang

Last synced: 14 Jan 2026

https://github.com/zenoyang/webcrawler

Some crawler code

crawler scrapy spider web-crawler

Last synced: 02 Aug 2025

https://github.com/radityaharya/sitesweeper

Sitesweeper is a Python package that helps you automate your web scraping process, outputting pages to a file

crawler pdf python website-crawler

Last synced: 27 Mar 2025

https://github.com/katronquillo/grimm

Simple search engine for the Brothers Grimm Fairy Tales

crawler elasticlunr react

Last synced: 24 Apr 2026

https://github.com/prorobot-ai/worker

A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).

crawler golang grpc-server scraper

Last synced: 29 Jul 2025

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/hackthedev/botnet

Tool that finds IPs on the web, checks SSH availability, and brute-forces logins with a wordlist. For educational use only!

botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web

Last synced: 17 Mar 2025

https://github.com/yanglr/csharp_spider

Crawler in C#

crawler csharp spider

Last synced: 12 Oct 2025

https://github.com/pjt3591oo/python-parse

These are modules for URL parsing

crawler

Last synced: 04 Aug 2025

https://github.com/jenting/compare-drugstore-price

Compare prices between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 27 Mar 2025

https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai

This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.

artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec

Last synced: 04 May 2026

https://github.com/ignmaro/new

The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.

bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3

Last synced: 13 Oct 2025

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 15 Aug 2025

https://github.com/daviddavo/blogspot-crawler

Crawler for blogspot and blogger with beautifulsoup

crawler hacktoberfest python

Last synced: 19 Apr 2026

https://github.com/kimseogyu/crawling-music-ranks

Music chart ranking crawling code

crawler jest typescript

Last synced: 07 Apr 2025

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 01 May 2026

https://github.com/marcosvbras/twitton

A simple Python library that makes the Twitter Search API easy to use

crawler crawling python spider twitter twitter-api

Last synced: 27 Mar 2025

https://github.com/xjchenhao/crawler-hangzhou

News crawler for Hangzhou Net (杭州网)

crawler hangzhou node

Last synced: 21 Feb 2026

https://github.com/hiscaler/fetch-one-page

Fetch one page by configs

crawler golang

Last synced: 06 Nov 2025

https://github.com/atasoglu/websense

A modular AI-powered web scraper for data pipelines.

ai automation crawler data-extraction llm parsing scraper structured-output web-scraping

Last synced: 31 Jan 2026

https://github.com/sxoxgxi/webcrawler

A multithreaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/claudio-code/nap-web-crawler

A crawler created to find broken links in the docs of frameworks and languages

crawler

Last synced: 07 Jul 2025

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 30 Mar 2025

https://github.com/kiranjisonawane143/blockchain-data-crawler

🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 06 May 2026

https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 25 Sep 2025

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/yosh1/mio-crawler

A crawler that retrieves data usage from IIJmio.

crawler iijmio mio ruby

Last synced: 10 May 2026