An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
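
As an illustration of "systematically browses", here is a minimal sketch of a breadth-first crawl loop in Python. It is not taken from any project listed below. The `crawl` function, the `LinkParser` class, the `PAGES` dictionary, and the `example.test` URLs are all hypothetical names introduced for this example, and an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch(url) is a caller-supplied function (an assumption of this
    sketch) that returns the page's HTML text, or None if unavailable.
    Returns the URLs of fetched pages in visit order.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # FIFO frontier gives breadth-first order
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "no links here",
}

visited = crawl("http://example.test/", PAGES.get)
print(visited)
```

A real crawler would replace `PAGES.get` with an HTTP client and would typically also respect robots.txt, rate limits, and a domain allow-list, none of which this sketch attempts.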

https://github.com/tjdsneto/jcnet-crawler

Extract (scrape) movie schedule info from the JCNet movies page

crawler scraping

Last synced: 11 Apr 2026

https://github.com/heyihuang826/ncku_course

Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).

captcha crawler onedrive

Last synced: 01 Mar 2026

https://github.com/tetreum/xupopter_chrome_extension

Extension to easily create crawling recipes

crawler scrapper scrapping webscraper

Last synced: 04 Apr 2025

https://github.com/thesurlydev/surly-spider

A command line interface for the spider library

crawl crawler rust spider surly surly-spider

Last synced: 16 Feb 2026

https://github.com/billy0402/tibame-python-data-analysis

A learning project from TibaMe Python data analysis course.

ai course crawler jupyter-notebook matplotlib pandas python requests

Last synced: 10 Apr 2026

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 16 May 2025

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/gnehs/twse-financial-ratios-crawler

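Automatically fetches financial ratio data from Taiwan's Market Observation Post System (公開資訊觀測站) for a specified list of stock codes and automatically computes the averages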

crawler nodejs

Last synced: 29 Apr 2026

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with minimal code.

crawl crawler markdown scrape scraper web

Last synced: 25 Jan 2026

https://github.com/hong539/ip_lookup

IP lookup using public or private APIs

crawler ipv4 ipwhois python

Last synced: 19 Aug 2025

https://github.com/sanhphanvan96/php-training-crawler

Simple PHP crawler for training purposes

crawler docker docker-compose nginx php php-fpm

Last synced: 13 Apr 2026

https://github.com/nyarla/net-paranoid-go

(WIP) Paranoid helpers for crawlers of untrusted web content

crawler filtering golang helper

Last synced: 14 Jan 2026

https://github.com/btlmd/asahi_nikkei_news_crawler

Crawler for the Nikkei (Nihon Keizai Shimbun) and Asahi Shimbun newspapers

crawler

Last synced: 07 Oct 2025

https://github.com/greytabby/grawl

Simple web crawler for learning.

crawler

Last synced: 14 Jan 2026

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/abx123/crawler

Simple lambda function to crawl daily web novel updates.

crawler firebase-database golang lambda-functions

Last synced: 28 Mar 2025

https://github.com/luickk/vulnerability-crawler

Small Python program meant to analyze random sites found on Google for vulnerabilities.

crawler xss

Last synced: 20 Aug 2025

https://github.com/atasoglu/websense

A modular AI-powered web scraper for data pipelines.

ai automation crawler data-extraction llm parsing scraper structured-output web-scraping

Last synced: 31 Jan 2026

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 16 May 2026

https://github.com/mohammadreza-mohammadi94/python-webscraper-projects

A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.

bs4 crawler object-oriented-programming python requests scrapy webscraping

Last synced: 13 Jul 2025

https://github.com/gustavooferreira/wcrawler

Simple Web Crawler CLI tool with "minimal" dependencies

cli crawler golang graph html links web

Last synced: 31 Jan 2026

https://github.com/joyceannie/moviespider

This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.

crawler datascience python scrapy spider webscraper

Last synced: 24 Mar 2025

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 08 Oct 2025

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 20 Mar 2025

https://github.com/romangw/lukki

Completely free code for a webcrawling bot.

crawler python web-scraping web-scraping-python

Last synced: 08 Oct 2025

https://github.com/jayzhan211/python-crawler-startups

python crawler learning

crawler python

Last synced: 20 Mar 2025

https://github.com/killianmeersman/wander

Convenient scraping library for Gophers

crawler data-mining golang scraper spider

Last synced: 14 Jan 2026

https://github.com/kkamara/nodejs-scraper

(2022) Use JavaScript technologies to crawl and click buttons on websites with GUI.

bot crawler nodejs scraper spider

Last synced: 25 Jan 2026

https://github.com/intina47/ee_error

Implementation of a web crawler using C++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 31 Jan 2026

https://github.com/tetreum/xupopter_runner

Executes crawling recipes coming from Xupopter Chrome Extension.

crawler scrapper scrapping webscraper

Last synced: 08 Aug 2025

https://github.com/tetreum/xupopter_client

Simple interface to manage Xupopter recipes as well as its runners.

crawler scrapper scrapping webscraper

Last synced: 04 Apr 2025

https://github.com/bernieyangmh/check-link

Checks an entire website and identifies broken links.

checkurl crawler golang

Last synced: 14 Jan 2026

https://github.com/moparisthebest/nginx-limit-crawlers

rate limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/krishpranav/gozap

⚡️ Multi-target ZAP scanning written in Go

cli crawler go go-crawler golang zap

Last synced: 27 Mar 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words to a MongoDB database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 10 Apr 2026

https://github.com/kyungw00k/stealth-wright

Silent browser automation CLI with stealth capabilities

crawler go playwright stealth-automation

Last synced: 31 May 2026

https://github.com/daitangio/find

Python + SQLite search engine

crawler indexer python search-engine

Last synced: 18 Jan 2026

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 11 Apr 2026

https://github.com/panagiotisptr/codeforces-companion

A codeforces parser, code tester and testcase generator in Go

codeforces-parser competitions crawler go golang parser test-automation testing

Last synced: 14 Jan 2026

https://github.com/namchee/hackerbits

Web crawler and clustering for the Hacker News website.

clustering crawler python3

Last synced: 09 Oct 2025

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 30 Apr 2026

https://github.com/dappsar/ethglobal-crawler

A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.

crawler ethglobal python

Last synced: 09 Oct 2025

https://github.com/grayhat12/grawler

A web-based crawler that takes two inputs (search item, number of sites to search) and currently displays readable content in text format; the code can be modified to display the HTML instead.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 27 Mar 2025

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 09 Oct 2025

https://github.com/guanbinrui/img-crawler

An image crawler.

crawler

Last synced: 10 Feb 2026

https://github.com/xiangronglin/novel2go

Android app to create a PDF from a website and send it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 31 Jan 2026

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 22 Aug 2025

https://github.com/ark930/douban-movie-crawler

Douban movie review crawler

crawler douban movie python

Last synced: 18 Mar 2025

https://github.com/jofaval/open-graph-visualizer

Web scraping showcase of how crawlers retrieve a site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 08 Sep 2025

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 13 Sep 2025

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 07 May 2026

https://github.com/pengkobe/my-web-crawler

Automatically pulls blog updates from bloggers. Developed with Angular 2.

crawler nodejs

Last synced: 18 May 2026

https://github.com/leegeunhyeok/python-gongucrawler

Python 3 crawler for images and detail information from Gongu Madang (공유마당)

crawler python

Last synced: 24 Aug 2025

https://github.com/slava-vishnyakov/grucrawler

Simple Ruby crawler

crawler ruby

Last synced: 25 Oct 2025

https://github.com/cafitac/ai-crawler

AI-driven network-first crawler compiler for authorized workflows

agents ai crawler http mcp python scraping

Last synced: 31 May 2026

https://github.com/zrquan/gatherer

Gatherer is a simple crawler tool

crawler infosec pentest security

Last synced: 14 Jan 2026

https://github.com/onetail/crawler-with-kafka-docker

Homework on crawling and analysis

analysis crawler kafka-docker

Last synced: 18 Mar 2025

https://github.com/hoosnick/olx-parser

OLX Real Estate Parser

crawler olx

Last synced: 25 Aug 2025

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 23 Mar 2025

https://github.com/jimut123/leaderbehaviour

Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website.

crawler leaderbehaviour newsscraper scrapy timesofindia

Last synced: 16 Jan 2026

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 27 Mar 2025

https://github.com/ninja-yubaraj/lootbin

A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.

crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin

Last synced: 11 Oct 2025

https://github.com/andreposman/magic-number

A CLI tool/API to calculate passive income from FIIs

crawler finance golang

Last synced: 14 Jan 2026

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 08 Sep 2025

https://github.com/avsbharadwaj/web_crawler

A basic web crawler that recursively prints the links and descriptions present on a website

crawler web

Last synced: 21 Apr 2026

https://github.com/katronquillo/grimm

Simple search engine for the Brothers Grimm Fairy Tales

crawler elasticlunr react

Last synced: 24 Apr 2026

https://github.com/kofj/octopus

Octopus is open source software for collecting data from web pages.

crawler

Last synced: 15 May 2026

https://github.com/joaooliveirapro/trawlergo

TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.

crawler go golang http

Last synced: 09 Jun 2026

https://github.com/ashwantmanikoth/intellilsearch

This is an AI-powered crawler that can search the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 15 Apr 2026

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 15 May 2026

https://github.com/yanglr/csharp_spider

Crawler in C#

crawler csharp spider

Last synced: 12 Oct 2025

https://github.com/jackfsuia/chats-crawler

Crawls Discourse forum chat data and parses it on the fly, ready to use for LLM instruction fine-tuning.

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 09 Jul 2025

https://github.com/lucasromualdo/glassdoorcrawler

Python crawler that collects job listings from Glassdoor and exports them to Excel

cli crawler glassdoor openpyxl pandas python web-scraping

Last synced: 25 Feb 2026

https://github.com/timchen10001/crawler-711-taiwan

Python crawler for scraping up-to-date information about 711

711 crawler python python3 taiwan

Last synced: 27 Mar 2025

https://github.com/ignmaro/new

The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.

bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3

Last synced: 13 Oct 2025

https://github.com/lillyschramm/spiegel.de-miner

A bot that automatically saves any posts created at Spiegel.de

crawler spiegel-online

Last synced: 01 Sep 2025

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说) history to local SQL database storage

crawler dumper qzone-mood

Last synced: 25 Aug 2025

https://github.com/xoraus/revieworacle

The proposed system assists users in deciding which product to buy. It gathers reviews and product details from multiple websites that sell the product. In addition, the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 07 May 2026

https://github.com/ferru97/jsketchfabcrawler

jSketchfabCrawler is a Java tool for automatically crawling model information from sketchfab.com

crawler data database java sketchfab sql

Last synced: 03 Jan 2026

https://github.com/iamtonmoy0/sitemap-crawler

Sitemap crawler built with Go and goquery

crawler

Last synced: 23 Feb 2025

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 02 May 2026

https://github.com/lolyratul025/web-email-bundler

A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.

crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping

Last synced: 22 May 2026

https://github.com/pwcong/zhihuhook

Zhihu hook: those who are willing take the bait.

crawler zhihu

Last synced: 08 Dec 2025

https://github.com/mevljas/gov.si-crawler-playwright

A standalone crawler that crawls only .gov.si web sites using Playwright.

crawler multithreading playwright sqlachemy

Last synced: 19 Jan 2026

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 25 Feb 2025

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Jul 2025

https://github.com/hiscaler/fetch-one-page

Fetch one page by configs

crawler golang

Last synced: 06 Nov 2025

https://github.com/jonesrussell/north-cloud

A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.

content crawler publisher

Last synced: 25 Jan 2026

https://github.com/truongdd03/searchengine

A search engine written in C++.

cpp crawler search search-engine

Last synced: 06 Apr 2025

https://github.com/luminovrym/crawler-tools-js

Crawler Tools Js is an application used for scraping data from a website

crawler crawler-js data js web-scraping

Last synced: 08 Sep 2025

https://github.com/crosscutsaw/iscsicrawler

iscsicrawler is a bash script that crawls files in the iscsi targets with ease.

crawler iscsi iscsi-target iscsiadm

Last synced: 16 Jan 2026

https://github.com/liuzhuan/simple-spider

A simple python web spider.

crawler python python-3

Last synced: 30 Mar 2025

https://github.com/constaf79/pycn

🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.

cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web

Last synced: 14 May 2026