An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/vshulcz/youtube_crawler

A simple YouTube crawler that lets you quickly collect data from channels, view and sort it in a table, and run SQL queries and advanced searches by various parameters.

crawler database gui osint parser python requests reverse-engineering sql tkinter youtube

Last synced: 16 Jan 2026

https://github.com/wangshouh/sdufelib_seat_crawler

SDUFE Library Reservation Seat Monitoring Crawler

crawler python

Last synced: 12 Feb 2026

https://github.com/karambir/ugc-colleges

Python script to extract college names from the UGC (India) website.

college crawler extract html-parser python python-script ugc

Last synced: 19 Apr 2025

https://github.com/juliandavidmr/raptor

Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages for websites to visit, with automatic indexing.

crawler kotlin mysql spider

Last synced: 20 Oct 2025

https://github.com/supadata-ai/js

Official TypeScript/JavaScript SDK for the Supadata API.

ai crawler llm markdown scraper transcript web-crawler youtube

Last synced: 22 Mar 2025

https://github.com/typingmonk/mnd_adiz_news_crawler

Web crawler that targets mnd.gov.tw posts related to ADIZ (Air Defense Identification Zone, 防空識別區) reports.

crawler

Last synced: 10 Jul 2025

https://github.com/robmch/mindfactory_crawling

A Python 3 Crawler for Mindfactory.de

crawler crawling data webcrawler webcrawling

Last synced: 07 May 2025

https://github.com/xcrypt0r/hyacinth

🌸 Dcinside image crawler with a dead-simple structure

beautifulsoup4 crawler dcinside parsing pyqt5 pyside2

Last synced: 28 Apr 2025

https://github.com/floscha/genius-lyrics-crawler

A concurrent crawler to retrieve song lyrics from Genius

celery crawler fluentd genius lyrics mongodb python

Last synced: 30 Apr 2025

https://github.com/pyaesoneaungrgn/2d-crawler

2D crawler for set.or.th

2d 2d-crawler crawler myanmar php

Last synced: 28 Apr 2025

https://github.com/archan937/webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing

Last synced: 25 Apr 2025

https://github.com/leomaurodesenv/smm-course-search

A package for searching courses in Super Mario Maker

bookmark-site crawler javascript json mario-game mario-maker nodejs

Last synced: 01 Apr 2025

https://github.com/lon9/arxiv

For scraping arxiv.org

arxiv crawler golang

Last synced: 24 Mar 2025

https://github.com/cr0hn/feed-to-exporter

Get an RSS feed and export it as a WordPress post

crawler feed rss wordpress

Last synced: 30 Oct 2025

https://github.com/choi-jiwoo/naver-place-scraper

Scrape reviews from Naver Place

crawler python scraper

Last synced: 14 Jan 2026

https://github.com/roccomuso/is-bing

Verify that a request is from Bing crawlers using Bing's DNS verification steps

bing bot check crawler dns ip js nodejs verify

Last synced: 27 Aug 2025

https://github.com/sayakie/pixiv-crawler

Crawls images from Pixiv 🚀

crawler nodejs pixiv typescript

Last synced: 21 Mar 2025

https://github.com/spencerlepine/readme-crawler

A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL

crawler javascript node nodejs readme scraper web-crawler webcrawer

Last synced: 03 May 2025

https://github.com/hktalent/scrapysite

ScrapySite: a Go web crawler (spider) for scraping and intelligence gathering

crawler elasticsearch go scraping site spider web

Last synced: 14 May 2025

https://github.com/aminehsan/crawler-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scarping

Last synced: 14 Oct 2025

https://github.com/leelow/nightmare-screenshot-selector

👻 📷 A Nightmare plugin to easily take screenshots.

crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler

Last synced: 12 Apr 2025

https://github.com/zebbern/dezcrwl

🕷️ | dezcrwl is a website history crawler that gathers hidden information, checks extracted .js endpoints for vulnerabilities, and much more!

crawl crawler crawler-python crawlers ctf-tools hacking historical-data information information-gathering information-retrieval information-security infosec osint osint-tool pentesting-tools python reconnaissance tool web website

Last synced: 14 Apr 2025

https://github.com/aprilnea/xjtlu

This is how to get all the network resources of XJTLU.

crawler gateway http-auth python spider web-crawler xjtlu

Last synced: 01 Aug 2025

https://github.com/farkaskid/webcrawler

Simple and fast web crawler.

crawler go golang goroutines web webcrawler

Last synced: 14 Jan 2026

https://github.com/holmofy/spring-spider

Spring Spider App Utility Library.

crawler java spider spring spring-spider

Last synced: 17 Mar 2025

https://github.com/filipefilardi/wpp-broadcaster

Crawler made with Selenium and Python to constantly receive video/audio from a target and broadcast it to a list of contacts.

broadcast crawler python selenium

Last synced: 30 May 2026

https://github.com/xdk78/grabbi

grabbi: a simple web scraper/crawler

crawler html scraper web-scraper

Last synced: 14 Apr 2025

https://github.com/kstrassheim/datawarehouse-crawler

A content and schema crawler tool to receive, update, and import various kinds of data into an on-premises or cloud-based SQL Server or Azure Synapse Analytics (Azure Data Warehouse SQL Server). As sources it supports SQL Server tables, OData endpoints, CSV files, and Excel files. For multiple sources it can run in parallel mode, creating a thread for each connection. The distinguishing feature of this crawler is that it creates the target tables itself, using the additional information in source.json. For Azure Synapse Analytics it estimates the distribution type and keys. Syncing works entirely without SQL transactions by using a consistency correction algorithm for frequently updated fact tables. There are 5 syncing algorithms (see Manual/Insert) to choose from, as well as one update algorithm.

azure-data-warehouse azure-synapse-analytics business-intelligence crawler csv data-import data-science datawarehouse datawarehousing docker dotnet-core-2 excel integration-testing odata parallel-computing sql

Last synced: 28 Apr 2026

https://github.com/oxylabs/web-crawler

Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.

api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping

Last synced: 01 Aug 2025

https://github.com/mrmarble/mineseek

Minecraft server scanner

crawler minecraft minecraft-server scanner slp

Last synced: 07 Apr 2026

https://github.com/giscafer/airlevel-crawler

A demo crawler for air-level.com

crawler java nodejs

Last synced: 28 Apr 2025

https://github.com/vitorebatista/horoscopefree

Astrology REST API for daily horoscopes

crawler horoscope horoscope-crawler horoscopes-api

Last synced: 14 Oct 2025

https://github.com/basemax/instagramseleniumhashtagimagepython

Instagram Selenium Python: A Selenium-based crawler to extract images from specific hashtags on Instagram.

crawler crawler-python crawlers instagram python python-selenium selenium selenium-python

Last synced: 15 May 2026

https://github.com/hctilg/pinterest-crawler

Downloads all images suitable for search

crawler pinterest

Last synced: 12 Apr 2025

https://github.com/moqsien/scrapx

A customized and enhanced version of Scrapy for managing hundreds or even thousands of spiders.

crawler framework pymongo scrapy spider

Last synced: 10 Jul 2025

https://github.com/coghost/iparse

Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support

crawler parser parser-library python xkcd yaml

Last synced: 12 Oct 2025

https://github.com/fzdwx/go-pachong

Go web crawler that continuously crawls data starting from an entry URL

crawler go golang

Last synced: 28 Apr 2025

https://github.com/yanliu1111/dashboard-flask-echarts

📊 Pandemic Monitoring Realtime Dashboard

crawler echarts flask mysql selenium-webdriver

Last synced: 20 Jun 2025

https://github.com/x-tropy/docroll

Turn complex programming knowledge 📚 into engaging, AI-powered video 📺 lessons.

ai animation course crawler documentation slides tutorial video

Last synced: 11 Oct 2025

https://github.com/developerdavi/meli-crawler

Basic web crawler API for getting products from MercadoLibre (BRL | MLB)

api crawler meli-crawler mercadolibre mercadolibre-sdk mercadolivre mercadolivre-sdk nextjs now products react zeit

Last synced: 12 Apr 2025

https://github.com/danielmorell/se_bot_checker

Validate search engine user agents and IP addresses.

crawler googlebot python search-engine spider

Last synced: 15 Apr 2025

https://github.com/sujinleeme/koreamarathonapi

APIs of Marathon Events in Korea

crawler korea marathon-events python3

Last synced: 23 Jun 2025

https://github.com/wentsingnee/covid-19_crawler

COVID-19 outbreak updates crawler

cplusplus crawler

Last synced: 23 Apr 2025

https://github.com/hxr16f/ss-grabber

Automation script for downloading user screenshots.

automation crawler downloader grabber lightshot screenshot script

Last synced: 20 Jul 2025

https://github.com/giant-stone/gmq

A simple message queue with a customizable consumption rate. Simple, reliable, lightweight and efficient task queue in Go

crawler message-queue redis task-manager

Last synced: 12 Jan 2026

https://github.com/feedeo/youtube-channel-crawler

YouTube Channel 📺 Crawler

crawler youtube youtube-channel

Last synced: 06 Feb 2026

https://github.com/mirocow/yii2-crawler

Concurrent HTTP crawler for Yii2

concurrency crawler guzzle yii2-extension

Last synced: 27 May 2026

https://github.com/iml1111/toonkor_collector

Toonkor comics collector

crawler python

Last synced: 13 Jul 2025

https://github.com/spire-rs/spire

🗼 A flexible async framework for building high-performance crawlers and scrapers, designed for developers who need extensible pipelines, strong concurrency, and robust middleware support.

crawler framework scraper webdriver

Last synced: 21 Jan 2026

https://github.com/hangyan/generate-cs-word-dict

Generate a word dict for CS from stackoverflow/github tags

crawler dict github python word

Last synced: 29 Oct 2025

https://github.com/gatenlp/wpextract

Create datasets from WordPress sites for research or archiving

corpus crawler nlp text-extraction text-mining web-scraping wordpress

Last synced: 25 Jun 2025

https://github.com/tikazyq/github-crawler

Github repositories crawler

crawler scrapy

Last synced: 04 Apr 2025

https://github.com/synacktraa/crawl

Web crawler designed to efficiently retrieve unique href, script and form links from a web application.

bash crawler regex shell web-spidering

Last synced: 06 Apr 2026

https://github.com/kernelerr/pixivsync

Pixiv image download and sync tool

crawler pixiv pixiv-crawler python

Last synced: 14 May 2025

https://github.com/eished/tujigu_crawler

Multithreaded Node.js crawler for tujigu.com (图集谷)

crawler node nodejs

Last synced: 28 Apr 2026

https://github.com/vmdang/historycrawler

An OOP project that collects Vietnamese historical data and displays it

crawler gson java javafx jsoup

Last synced: 17 Jun 2025

https://github.com/firesjoeng/bfo

Bilibili Followers Observer | Real-time Bilibili follower count monitor

bilibili crawler python

Last synced: 13 Apr 2025

https://github.com/elliotxx/readnewspaper

Automatically fetches electronic editions of newspapers for convenient daily reading

crawler lxml newspaper pypdf2 python requests

Last synced: 12 Apr 2025

https://github.com/samnoh/cliboards

⌨️ Surf your online communities on CLI

cli-application crawler javascript

Last synced: 17 Jan 2026

https://github.com/code-inside/sloader

Worker that loads and retrieves data from "slow" endpoints.

crawler drop json yml

Last synced: 03 Sep 2025

https://github.com/serkan-ozal/driflyte-mcp-server

The Driflyte MCP Server exposes tools that allow AI assistants to query and retrieve topic-specific knowledge from recursively crawled and indexed web pages.

ai crawler mcp model-context-protocol opentelemetry rag

Last synced: 05 Oct 2025

https://github.com/liyifeng1994/go-crawler

A distributed crawler project based on Golang

crawler elastic elasticsearch golang

Last synced: 01 May 2025

https://github.com/foolin/scrago

A simple, fast, extensible page-crawling framework for Golang

crawler go scrago scrapy

Last synced: 24 Feb 2025

https://github.com/pjt3591oo/rust-exchange-crawler

A crawler built as a way to learn Rust

crawler rust

Last synced: 16 May 2025

https://github.com/cuerz/douban-top

Golang crawler that scrapes Douban ranking lists

crawler douban golang goquery

Last synced: 08 Feb 2026

https://github.com/feliz-szk/berserk

Berserk: Crawler to increase web traffic (based on Tor and Privoxy)

anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser

Last synced: 21 Apr 2026

https://github.com/omerdogan3/kitapp-crawler

Web crawler application for KitApp. Gets data from booksellers and inserts it into a database.

book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler

Last synced: 04 May 2026

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums. (works in 2019)

api crawler web

Last synced: 26 Apr 2026

https://github.com/haxzie-xx/crode.js-node-web-crawler

Node.js Crawler built for open FTP sites for movie link collection.

crawler nodejs

Last synced: 01 May 2026

https://github.com/bakhirev/assayo-crawler

📈 Visualization and analysis of your git repository data.

audit commit crawler data-visualization git report statistics

Last synced: 05 Mar 2026

https://github.com/laurybueno/monibus

Bus monitoring API for São Paulo

api crawler django docker mapping sptrans

Last synced: 08 May 2026

https://github.com/somnisomni/trawler-csharp

The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#

crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper

Last synced: 06 May 2026

https://github.com/ayusharma/rss-parser

A simple crawler in ReactJS

crawler reactjs rss-parser

Last synced: 16 Apr 2026

https://github.com/techguy-bhushan/web-spider

Multi-threaded web crawler

crawler python web-spider

Last synced: 27 Mar 2026

https://github.com/0000xffff/webgrab

web page: crawler / file scanner / downloader

crawler download downloader scrape scraper webcrawler

Last synced: 17 Apr 2026

https://github.com/kkamara/php-scraper

🏢 (Live Link) (2022) Use PHP technologies to crawl and click buttons on websites with a GUI. I highly recommend working with Linux (including virtual machines) or macOS. Laravel 11.

bot crawler laravel scraper spider

Last synced: 01 Apr 2026

https://github.com/zhaotianff/qzone

Remembering that run under the setting sun; that was my lost youth

crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf

Last synced: 24 Apr 2026

https://github.com/maicss/1024img

1024 image nodejs crawler

1024 crawler nodejs

Last synced: 03 May 2026

https://github.com/achannarasappa/locust-cli

Developer tools to accelerate development of Locust jobs

cli crawler headless-chrome puppeteer scraper

Last synced: 26 Apr 2026

https://github.com/wujunchuan/xiamen-housing-data-collection

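Uses (planned) GitHub Actions to periodically collect online contract-signing data for new and second-hand homes from the Xiamen Municipal Housing Security and Real Estate Administration Bureau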

crawler docker docker-image github-actions nodejs ocr spider tesseract-ocr xiamen

Last synced: 04 Apr 2026

https://github.com/bitlytwiser/tormonger

Recursive Tor network crawler

crawler go golang tor

Last synced: 18 Jun 2026

https://github.com/huzecong/film-spider

Spiders crawling for film listing websites.

crawler

Last synced: 09 Jun 2026

https://github.com/capturr/price-extract

A performant way to extract the price amount and metadata (currency, decimal and thousands separators) from any string.

amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript

Last synced: 30 Apr 2026

https://github.com/tokenmill/crawling-framework-example

Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.

crawler crawling-framework elasticsearch storm-crawler

Last synced: 08 May 2026

https://github.com/idanhoro/nasa-heat-maps-prediction

In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms

beautifulsoup crawler machine-learning pillow prediction python sklearn

Last synced: 05 Apr 2026

https://github.com/crackcomm/go-google-search

Google search NSQ worker

crawler google google-search search

Last synced: 16 Feb 2026