An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
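As a rough illustration of what "systematically browses" can mean in practice, the sketch below does a breadth-first crawl using only the Python standard library. It is a minimal sketch, not the method of any project listed here: the `crawl` function, the `LinkParser` class, the `fetch` callback, and the in-memory `PAGES` site are all hypothetical names introduced for the example, and the network is replaced by a dictionary lookup so the code runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; `fetch(url)` returns HTML or None."""
    seen = {seed}            # URLs already queued, to avoid revisiting
    queue = deque([seed])    # FIFO frontier gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:     # unreachable page: skip it
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b#top">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would swap `PAGES.get` for an HTTP client and add politeness controls such as rate limiting; those are left out here to keep the traversal logic visible.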

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 01 Apr 2025

https://github.com/keizerzilla/ssh-hunter

Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).

crawler raspberry-pi ssh

Last synced: 10 Apr 2025

https://github.com/keizerzilla/search4dwango9

My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 10 Apr 2025

https://github.com/allanbian1017/mbpprice

Second-hand MacBook Pro information

crawler python

Last synced: 14 Jan 2026

https://github.com/mehdieidi/offliner

Offliner is a tool that makes a website viewable offline. It is a concurrent web crawler that saves all pages and static files to a directory.

concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread

Last synced: 14 Jan 2026

https://github.com/heitor57/astronomy-news

🔭📰 Astronomy News

crawler data-science news text-mining

Last synced: 06 Oct 2025

https://github.com/b3j4y/unidisk

A crawler that searches for keywords and compares the scores

comparison crawler nlp solr-client

Last synced: 17 Jan 2026

https://github.com/semoal/pythoncrawler

Python crawler with XML-RPC & BeautifulSoup

beautifulsoup crawler python wordpress xmlrpc

Last synced: 15 Apr 2026

https://github.com/heyihuang826/ncku_course

Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).

captcha crawler onedrive

Last synced: 01 Mar 2026

https://github.com/nyarla/net-paranoid-go

(WIP) Paranoid helpers for crawling untrusted web content

crawler filtering golang helper

Last synced: 14 Jan 2026

https://github.com/btlmd/asahi_nikkei_news_crawler

Crawler for Nihon Keizai Shimbun (Nikkei) and Asahi Shimbun news

crawler

Last synced: 07 Oct 2025

https://github.com/greytabby/grawl

Simple web crawler for learning.

crawler

Last synced: 14 Jan 2026

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 08 Oct 2025

https://github.com/romangw/lukki

Completely free code for a webcrawling bot.

crawler python web-scraping web-scraping-python

Last synced: 08 Oct 2025

https://github.com/killianmeersman/wander

Convenient scraping library for Gophers

crawler data-mining golang scraper spider

Last synced: 14 Jan 2026

https://github.com/bernieyangmh/check-link

Checking through whole website, identifying broken links.

checkurl crawler golang

Last synced: 14 Jan 2026

https://github.com/kyungw00k/stealth-wright

Silent browser automation CLI with stealth capabilities

crawler go playwright stealth-automation

Last synced: 31 May 2026

https://github.com/daitangio/find

Python + SQLite search engine

crawler indexer python search-engine

Last synced: 18 Jan 2026

https://github.com/panagiotisptr/codeforces-companion

A codeforces parser, code tester and testcase generator in Go

codeforces-parser competitions crawler go golang parser test-automation testing

Last synced: 14 Jan 2026

https://github.com/namchee/hackerbits

Web crawler and clustering for the Hacker News website.

clustering crawler python3

Last synced: 09 Oct 2025

https://github.com/dappsar/ethglobal-crawler

A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.

crawler ethglobal python

Last synced: 09 Oct 2025

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 09 Oct 2025

https://github.com/slava-vishnyakov/grucrawler

Simple Ruby crawler

crawler ruby

Last synced: 25 Oct 2025

https://github.com/cafitac/ai-crawler

AI-driven network-first crawler compiler for authorized workflows

agents ai crawler http mcp python scraping

Last synced: 31 May 2026

https://github.com/zrquan/gatherer

Gatherer is a simple crawler tool

crawler infosec pentest security

Last synced: 14 Jan 2026

https://github.com/ninja-yubaraj/lootbin

A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.

crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin

Last synced: 11 Oct 2025

https://github.com/andreposman/magic-number

A CLI tool/API to calculate passive income from FIIs

crawler finance golang

Last synced: 14 Jan 2026

https://github.com/katronquillo/grimm

Simple search engine for the Brothers Grimm Fairy Tales

crawler elasticlunr react

Last synced: 24 Apr 2026

https://github.com/yanglr/csharp_spider

Crawler in C#

crawler csharp spider

Last synced: 12 Oct 2025

https://github.com/ignmaro/new

The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.

bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3

Last synced: 13 Oct 2025

https://github.com/hiscaler/fetch-one-page

Fetch one page by configs

crawler golang

Last synced: 06 Nov 2025

https://github.com/mizcausevic-dev/procurement-pulse-engine

The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the 11 Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset. Issue #1: the zero baseline.

ai-governance ai-procurement-pulse crawler data-journalism javascript kinetic-gain-protocol-suite procurement research well-known

Last synced: 01 Jun 2026

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: Crawler for bills of the 20th National Assembly

crawler python scraper

Last synced: 15 Oct 2025

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 15 Oct 2025

https://github.com/mizcausevic-dev/aeo-crawler

BFS crawler for AEO Protocol v0.1 declaration graphs. Seed an origin, follow primary_source URIs, emit JSON Lines records of every fetch. Built on aeo-sdk-go. Concurrent, depth-limited, budget-capped, stdlib-only HTTP.

aeo aeo-protocol ai-governance answer-engine-optimization crawler entity-graph go-cli golang kinetic-gain-protocol-suite protocol-implementation well-known

Last synced: 01 Jun 2026

https://github.com/marshalw/crawler

Crawler project

crawler javascript nodejs

Last synced: 22 Jan 2026

https://github.com/stephanebruckert/gocrawl

Crawl every page and asset of a web domain

crawler python

Last synced: 16 Oct 2025

https://github.com/foolishway/blog-crawler

blog-crawler crawls blogs according to your configuration file.

blogs config crawler

Last synced: 22 Jan 2026

https://github.com/asmrcodez-yt/google-extensions-scraper

🚀 Download free and open-source Chrome extensions for web scraping! Extract data from various websites effortlessly with our latest .crx releases.

chrom codez crawler extension free linkedin omid opensource scraper thecodez web-scraper

Last synced: 17 Oct 2025

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 19 Oct 2025

https://github.com/bersegosx/exparic

Web parser via yaml config

crawler parser yaml-configuration

Last synced: 21 Oct 2025

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web scraper that scrapes the entire Codeforces problem set into a CSV or JSON file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 01 Jun 2026

https://github.com/guillempuche/news_crawler

Scrape news from Olot town hall (https://www.olot.cat) with TypeScript and Crawlee. Collects summaries and full articles, stored in separate datasets.

biomejs crawlee crawler news-crawler olot townhall yarn-berry

Last synced: 23 Oct 2025

https://github.com/obsidianplusplus/tensorrt-python-api-crawler

Python crawler for scraping NVIDIA TensorRT Python API documentation and converting it to Markdown format.

api base converter crawler deep docs documentation gpt knowledge learning llm markdown nvidia offline python scraper scraping tensorrt web

Last synced: 14 May 2026

https://github.com/rutopio/crawler-2020-taiwanese-election-results

Crawler for the 2020 Taiwanese election results, using the at-large party-list vote as an example

crawler python

Last synced: 24 Oct 2025

https://github.com/xatier/metart-streamlit

Metart network viewer with streamlit 💦🍑💡

crawler streamlit streamlit-webapp

Last synced: 23 Jan 2026

https://github.com/abhijeetps/noddler

Web crawler built using Node.js

cheerio crawler csv nodejs

Last synced: 24 Feb 2026

https://github.com/ashwantmanikoth/IntellilSearch

This is an AI-powered crawler that can search the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 25 Oct 2025

https://github.com/0xh3xa/benign-crawler

Crawler for downloading benign files from FileHippo and other sources

benign crawler datasets downloader malware-research

Last synced: 26 Oct 2025

https://github.com/recepkizilarslan/console-tourist

Tourist is a simple tool that collects console messages, errors, and failed requests from all your pages after the DOM has loaded, with authentication support.

console-log crawler crawling crawling-tool error-monitoring error-reporting qa qa-automation qatools

Last synced: 24 Feb 2026

https://github.com/chamzzzzzz/supersimplesoup

A Go package that implements a super simple soup-like DOM API

beatifulsoup crawler crawler-go dom go golang html-parser

Last synced: 28 Jan 2026

https://github.com/gn00678465/crawler

A Python CLI tool that uses the Firecrawl API to crawl web pages, with support for multiple output formats.

crawler pythone

Last synced: 06 Feb 2026

https://github.com/danielfillol/ab2l_crawler

Crawler for AB2L radar

brazil crawler lawtech legaltech

Last synced: 28 Jan 2026

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 20 Jun 2026
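The `exclusion` and `rfc9309` tags above refer to the Robots Exclusion Protocol, which a well-behaved bot consults before fetching a page. The sketch below shows the idea with Python's standard `urllib.robotparser` module; it does not show netscrape's own Node.js API. The `ROBOTS_TXT` content, the `allowed` helper, and the `example-bot` user agent are assumptions made up for the example, and the rules are parsed from a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real bot would download it from
# the site's /robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="example-bot"):
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.test/index.html"))
print(allowed("http://example.test/private/report.html"))
```

Checking `allowed(url)` before each request is one simple way to keep a crawler within the rules a site publishes.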

https://github.com/madret/selenium_crawler

Selenium web crawler based on ChromeDriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Apr 2026

https://github.com/miiraak/scrapc

C# WinForms crawler and scraper for web content

crawler csharp html scraper url web windows-forms

Last synced: 29 Jan 2026

https://github.com/atasoglu/websense

A modular AI-powered web scraper for data pipelines.

ai automation crawler data-extraction llm parsing scraper structured-output web-scraping

Last synced: 31 Jan 2026

https://github.com/gustavooferreira/wcrawler

Simple Web Crawler CLI tool with "minimal" dependencies

cli crawler golang graph html links web

Last synced: 31 Jan 2026

https://github.com/intina47/ee_error

Implementation of a web crawler in C++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 31 Jan 2026

https://github.com/xiangronglin/novel2go

Android app that creates a PDF from a website and sends it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 31 Jan 2026

https://github.com/ashwantmanikoth/intellilsearch

This is an AI-powered crawler that can search the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 15 Apr 2026

https://github.com/lucasromualdo/glassdoorcrawler

Python crawler that collects job listings from Glassdoor and exports them to Excel

cli crawler glassdoor openpyxl pandas python web-scraping

Last synced: 25 Feb 2026

https://github.com/guanbinrui/img-crawler

An image crawler.

crawler

Last synced: 10 Feb 2026

https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb

Crawl Google Scholar profiles with Selenium, store the extracted data in MongoDB, and serve queries with FastAPI.

crawler fastapi google-scholar mongodb python selenium

Last synced: 16 Apr 2026

https://github.com/ilovebacteria/digikala-api

This Python package sends requests to the Digikala API and retrieves product details.

crawler digikala pypi

Last synced: 11 Feb 2026

https://github.com/basemax/github-repos-report-generator

A Python CLI tool to fetch all public repositories of a GitHub user, extracting repository details such as name, URL, description, top language, and tags. Outputs data in CSV, JSON, and HTML formats.

api api-github crawler csv export extract github github-api github-export github-exporter github-info html json py python

Last synced: 16 Apr 2026

https://github.com/webdevcave/directory-crawler-php

Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.

crawler crawling directory path php php-library

Last synced: 12 Feb 2026

https://github.com/mt4110/postal_converter_ja

High-performance Japanese Postal Code Converter & API. Auto-updating, DB-agnostic (MySQL/PostgreSQL), written in Rust & Next.js. An asynchronous crawling system in Rust that automatically updates Japan Post data, so the latest postal code data is refreshed as quickly as possible.

api crawler docker mysql nextjs nix postgresql react rust

Last synced: 13 Feb 2026

https://github.com/yggverse/pulsarss

RSS Aggregator for Gemini Protocol

aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust

Last synced: 13 Feb 2026

https://github.com/shivamsaraswat/webxcrawler

WebXCrawler is a fast static crawler to crawl a website and get all the links.

crawler crawling python scraping webcrawler webxcrawler

Last synced: 13 Feb 2026

https://github.com/solracsf/perplexitybot-ips

Collected PerplexityBot IPs

bots crawler ip ipset perplexity

Last synced: 15 Feb 2026

https://github.com/luanpotter/series-api

A simple IMDB crawler feeding a Series API

api crawler imdb json rest series

Last synced: 15 Feb 2026

https://github.com/igorbrizack/crawler-web

Web data collection application with ReactJS and Python - REST API

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 16 Apr 2026

https://github.com/nsalvacao/cli-plugins

OpenAPI for CLIs — Crawl any CLI's --help output and generate structured Claude Code plugins with expert command knowledge

ai-agent claude-code cli cli-reference crawler developer-tools help-parser llm plugin python

Last synced: 04 Mar 2026

https://github.com/metehan777/http-header-link-graph

Publish a site's link graph & heading map in HTTP response headers. Crawl 65k pages in 99 seconds without parsing one byte of HTML. Companion code for the SEO Week 2026 NYC experiment.

aeo answer-engine-optimization cloudflare-workers crawler generative-engine-optimization geo http-headers link-graph python rust seo site-architecture technical-seo

Last synced: 03 Jun 2026

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 16 Apr 2026

https://github.com/olostep-api/olostep-cli

CLI for the Olostep API — scrape, map, crawl, answer, batch the web from your terminal. Pure JS rewrite of olostep-cli.

ai-agents cli crawler mcp nodejs npm olostep scraping typescript web-scraping

Last synced: 03 Jun 2026

https://github.com/landrisek/contentbot

Creates simple content (discussion posts and product descriptions) from previously used data or by crawling public data.

content crawler golang php php72

Last synced: 17 Apr 2026

https://github.com/rodrigorvsn/ace

🔥 Receive an email with the hottest promotions every day

crawler cronjob nextjs prisma puppeteer react-email resend

Last synced: 17 Apr 2026

https://github.com/bennettdams/vace-it-crawler

Python (Scrapy) crawler to access data of FACEIT.com

crawler python scrapy

Last synced: 03 Jun 2026

https://github.com/triekai/review-radar

An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.

crawler firebase gemini google-maps nextjs openai pwa react

Last synced: 04 Apr 2026

https://github.com/theabbie/shopcrawler

Crawler for Discovering Product URLs on E-commerce Websites (assignment)

crawler

Last synced: 18 Apr 2026

https://github.com/thamindur/ir-project

Search Engine for Sri Lankan MPs

crawler elasticsearch python scraping search-engine

Last synced: 19 Apr 2026

https://github.com/gesiscss/github_traffic_crawler

Retrieves data from repositories (insights, usage, commits)

crawler github traffic

Last synced: 20 Apr 2026

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 20 Apr 2026

https://github.com/ravenastar-js/ravpagelinks

🚀 RavPageLinks 🕷️ Basic tool for enumerating URLs in web pages

axios chalk crawler links playwright ravenastar scraping url-enumeration

Last synced: 20 Apr 2026

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 21 Apr 2026