An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/jongwony/boardgame_finder

A project that crawls the entire board game category on Namuwiki in order to apply specific filters.

asyncio crawler namuwiki python38

Last synced: 27 Feb 2026

https://github.com/basemax/my-site-url-finders

A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.

crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder

Last synced: 15 Oct 2025
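
The filtering described for my-site-url-finders above (same-domain links only, skipping unwanted paths and file types) can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function name `filter_urls`, the excluded extensions, and the excluded path prefixes are all assumptions chosen for the example.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical defaults for illustration; a real crawler may use different ones.
SKIP_EXTENSIONS = (".jpg", ".png", ".pdf", ".zip")
SKIP_PATH_PREFIXES = ("/admin", "/login")


def filter_urls(base_url, hrefs):
    """Return sorted, de-duplicated same-domain URLs, skipping unwanted paths and file types."""
    base_host = urlparse(base_url).netloc
    kept = set()
    for href in hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # e.g. mailto: or javascript: links
        if parsed.netloc != base_host:
            continue  # stay within the same domain
        path = parsed.path.lower()
        if path.endswith(SKIP_EXTENSIONS) or path.startswith(SKIP_PATH_PREFIXES):
            continue  # unwanted file type or path
        kept.add(parsed._replace(fragment="").geturl())  # drop #fragments
    return sorted(kept)
```

A recursive crawler would call a function like this on the links of each fetched page and only follow the URLs it returns.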

https://github.com/tungct/golangcrawler

Crawler goroutine Golang

crawler go

Last synced: 07 Jun 2026

https://github.com/supratikchatterjee16/serp_bot

A generic SERP bot that can be used with just about any search engine.

bot crawler python requests scraping search serp user-agent-spoofer

Last synced: 14 Dec 2025

https://github.com/curegit/nominium

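A crawler system that sends notifications, by email and other means, about newly listed items on consumer-to-consumer marketplace sites.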

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 12 Mar 2025

https://github.com/dimitar0528/crawlitics

An AI-powered Next.js and Python-based ecommerce web crawler, scraper and data-analyst platform that transforms scattered product data into clear market insights.

crawler nextjs product-analysis python scraper

Last synced: 08 Sep 2025

https://github.com/juan-kabbali/glassdoor-linkedin-web-scrapper

CLI application that acts as a web scraper to retrieve Glassdoor and LinkedIn information

crawler webscraping

Last synced: 29 Jan 2026

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 14 Apr 2026

https://github.com/bujosa/aldebaran

Example of using App Engine with Python 3, a thread pool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 19 Oct 2025

https://github.com/fa7ad/aiub-notes-dl

Download all notes from AIUB's portal

aiub beautifulsoup4 crawler

Last synced: 12 Mar 2025

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 21 Mar 2025

https://github.com/igorbrizack/web-scraper

An HTML data scraping application built in Python.

crawler pytest python3 scraper

Last synced: 08 May 2026

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 20 Jun 2026

https://github.com/madret/selenium_crawler

Selenium Webcrawler based on the chromedriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Apr 2026

https://github.com/yosh1/mio-crawler

A crawler that retrieves IIJmio data usage.

crawler iijmio mio ruby

Last synced: 10 May 2026

https://github.com/miiraak/scrapc

C# WinForms crawler and scraper for web content

crawler csharp html scraper url web windows-forms

Last synced: 29 Jan 2026

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with minimal code.

crawl crawler markdown scrape scraper web

Last synced: 25 Jan 2026

https://github.com/atasoglu/websense

A modular AI-powered web scraper for data pipelines.

ai automation crawler data-extraction llm parsing scraper structured-output web-scraping

Last synced: 31 Jan 2026

https://github.com/gustavooferreira/wcrawler

Simple Web Crawler CLI tool with "minimal" dependencies

cli crawler golang graph html links web

Last synced: 31 Jan 2026

https://github.com/intina47/ee_error

An implementation of a web crawler in C++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 31 Jan 2026

https://github.com/xiangronglin/novel2go

Android app that creates a PDF from a website and sends it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 31 Jan 2026

https://github.com/ashwantmanikoth/intellilsearch

This is an AI-powered crawler that can search the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 15 Apr 2026

https://github.com/lucasromualdo/glassdoorcrawler

A Python crawler that collects job listings from Glassdoor and exports them to Excel

cli crawler glassdoor openpyxl pandas python web-scraping

Last synced: 25 Feb 2026

https://github.com/mevljas/gov.si-crawler-playwright

A standalone crawler that crawls only .gov.si web sites using Playwright.

crawler multithreading playwright sqlachemy

Last synced: 19 Jan 2026

https://github.com/constaf79/pycn

🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.

cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web

Last synced: 14 May 2026

https://github.com/dasantonym/node-cesspoll

:poop: Turd Miner Node Module

crawler news poopetry potty-humour

Last synced: 28 Oct 2025

https://github.com/huyduc1602/uniapp-crawler

Crawl and translate the Uni-app documentation

crawler docker python

Last synced: 25 Jan 2026

https://github.com/martincastroalvarez/web-to-pdf

Web crawlers using Python & Beautiful Soup

crawler python3 webcrawler

Last synced: 08 Apr 2025

https://github.com/zenixls2/2chpreprocess

Dump messages from 2ch with some preprocessing for ML analysis

2ch crawler python

Last synced: 26 Mar 2025

https://github.com/forattini-dev/crawlex

The stealth crawler that actually looks like Chrome.

crawler stealth

Last synced: 14 May 2026

https://github.com/murilobsd/icrop-csv

Icrop-csv automates the process of downloading reports.

crawler csv-export python3

Last synced: 17 Nov 2025

https://github.com/xjchenhao/crawler-hangzhou

A news crawler for Hangzhou Net (杭州网)

crawler hangzhou node

Last synced: 21 Feb 2026

https://github.com/laffrex/xiaolanben_crawler

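An efficient, stable data collection tool for the Xiaolanben website. It automatically extracts company and group information such as products, media, and shareholders, handles pop-ups intelligently, and classifies and organizes the data automatically. Its ultimate purpose is to make SRC information gathering easier.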

crawler pandas selenium src

Last synced: 23 Mar 2025

https://github.com/rafaelmoraes003/tech-news

Analysis and manipulation of news data from a technology website obtained through data scraping using Python.

crawler data-scraping https mongodb parsel pymongo python web-scraping

Last synced: 05 May 2026

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 30 Mar 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Apr 2026

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 15 Mar 2025

https://github.com/dinofizz/sitemapper

sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.

astradb cassandra concurrency crawler go golang kubernetes nats sitemap

Last synced: 16 Jan 2026
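
The "concurrent limited" mode mentioned in the sitemapper description above suggests a crawl in which only a bounded number of requests run at once. sitemapper itself is tagged as a Go project, so the following Python sketch only illustrates the general idea of bounded concurrency and is not its implementation. The names `crawl_limited`, `fetch`, and `limit` are assumptions made for this example.

```python
import asyncio


async def crawl_limited(urls, fetch, limit=2):
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        # Each worker waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))
```

Here `fetch` is any coroutine function that takes a URL, for example a wrapper around an HTTP client. A "synchronous" mode would correspond to `limit=1`, and an unbounded "concurrent" mode to a limit at least as large as the number of URLs.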

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 23 Mar 2025

https://github.com/gnehs/twse-financial-ratios-crawler

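Automatically fetches financial ratio information from the Market Observation Post System (公開資訊觀測站) for a specified list of stock symbols and automatically computes the averages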

crawler nodejs

Last synced: 29 Apr 2026

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 16 May 2025

https://github.com/lillyschramm/spiegel.de-miner

A bot that automatically saves any posts created at Spiegel.de

crawler spiegel-online

Last synced: 01 Sep 2025

https://github.com/jimut123/leaderbehaviour

Scrapy project that extracts the names of leaders and their misdeeds by scraping news websites!

crawler leaderbehaviour newsscraper scrapy timesofindia

Last synced: 16 Jan 2026

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 23 Mar 2025

https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez

Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.

beautifulsoup crawler immigration web

Last synced: 16 Jun 2025

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 12 Oct 2025

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 15 Jul 2025

https://github.com/pmuens/crawler

Multi-threaded Web crawler with support for custom fetching and persisting logic

crawler crawler-engine rust rust-lang web-crawler web-crawling

Last synced: 15 May 2025

https://github.com/sanskar107/c-subject-predictor

Predicts topic of a code.

crawler nlp rnn

Last synced: 14 Mar 2025

https://github.com/ghsaboias/alpha-agent

An intelligent web research assistant that combines web crawling, search functionality, and AI-powered analysis using Anthropic's Claude API.

ai claude crawler search web

Last synced: 14 Mar 2025

https://github.com/insectmk/douban-crawler

Douban Movie Top 250 crawler and data visualization

analysis crawler django echarts mysql python3 website

Last synced: 10 Mar 2026

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 29 May 2026

https://github.com/lfsc09/crawl-this-go

Simple CLI tool for crawling pdf documents and html pages

crawler go

Last synced: 18 Jun 2025

https://github.com/kweonminsung/crawl2toast

Real-time toast notification of crawled data with CSS selectors (Windows only)

beautifulsoup4 crawler selenium tkinter toast-notifications

Last synced: 18 May 2026

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/crosscutsaw/iscsicrawler

iscsicrawler is a bash script that crawls files in the iscsi targets with ease.

crawler iscsi iscsi-target iscsiadm

Last synced: 16 Jan 2026

https://github.com/pengkobe/my-web-crawler

Automatically pulls blog updates from bloggers. Developed with Angular 2.

crawler nodejs

Last synced: 18 May 2026

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 13 Sep 2025

https://github.com/moparisthebest/nginx-limit-crawlers

rate limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 18 Jun 2026

https://github.com/fengzixu/crawlinganything

If you are interested in data, you should take action right away

crawler python

Last synced: 15 Jun 2026

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is a software development kit for the DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 11 Jul 2025

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 01 May 2026

https://github.com/jiusanzhou/reaper

Distributed Elegant Scraper and Crawler Framework for Rust.

crawler data-scraping rust scraper spider

Last synced: 24 Jul 2025

https://github.com/kimseogyu/crawling-music-ranks

Code for crawling music chart rankings

crawler jest typescript

Last synced: 07 Apr 2025

https://github.com/alphabs/navercafeclient

A .NET library for crawling Naver Cafe post lists

crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping

Last synced: 06 May 2026

https://github.com/tech-espm/misc-webbot

This project aims to create personal assistants that reply to messages about specific issues.

classification-model crawler nlp

Last synced: 12 Jun 2026

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler for Coursera video captions / subtitles

caption chromeless coursera crawler crx subtitle

Last synced: 31 Mar 2025

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 15 Mar 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Aug 2025

https://github.com/jmousqueton/check-broken-link

Multi-threaded Python tool for crawling and checking all internal links on a website, with live Rich dashboard, broken link export (CSV), and detailed source tracking.

check crawler error400 error404 error500 links

Last synced: 29 Aug 2025

https://github.com/roele/roast

A JVM Data Crawler

cli crawler jvm

Last synced: 16 May 2025

https://github.com/ekojs/web-crawler

A web crawler for retrieving research titles from Google Scholar

crawler nodejs web-crawler

Last synced: 12 Apr 2026

https://github.com/orkan/tlc

Simple PHP/cURL/FlareSolverr framework with Logger, Cache and more!

crawler curl flaresolverr net scrap

Last synced: 27 Aug 2025

https://github.com/ferru97/jsketchfabcrawler

jSketchfabCrawler is a Java tool for automatically crawling model information from sketchfab.com

crawler data database java sketchfab sql

Last synced: 03 Jan 2026

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说) history to a local SQL database

crawler dumper qzone-mood

Last synced: 25 Aug 2025

https://github.com/hoosnick/olx-parser

OLX Real Estate Parser

crawler olx

Last synced: 25 Aug 2025

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 23 Mar 2025

https://github.com/leegeunhyeok/python-gongucrawler

A Python 3 crawler for images and detail information from Gongu Madang (공유마당)

crawler python

Last synced: 24 Aug 2025

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 07 May 2026

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 22 Aug 2025
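
A breadth-first crawl, as named in the drstrange description above, visits every page one link away from the start before moving on to pages two links away. The sketch below shows the general technique only and is not taken from that repository. The function `bfs_crawl`, the `get_links` callback, and the `TOY_SITE` link graph (a stand-in for real HTTP fetches) are all hypothetical.

```python
from collections import deque


def bfs_crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first and return them in the order visited."""
    seen = {start}          # pages already queued, to avoid revisiting
    order = []              # pages in visit order
    queue = deque([start])  # FIFO queue gives breadth-first order
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph standing in for a real site (hypothetical data).
TOY_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c"],
    "/c": [],
}

print(bfs_crawl("/", lambda page: TOY_SITE.get(page, [])))
# prints ['/', '/a', '/b', '/c']
```

In a real crawler, `get_links` would fetch the page and extract its links; the `seen` set is what keeps cycles such as `/a` linking back to `/` from looping forever.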