An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
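As a rough illustration of the "systematically browses" part of that definition, here is a minimal breadth-first crawl sketch in Python. It is not taken from any project listed here. The `crawl` function, the `LinkExtractor` class, and the in-memory `PAGES` site are all hypothetical names made up for this example, and the `fetch` callable stands in for a real HTTP client. A production crawler would also need to honor robots.txt, rate limits, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` must return HTML text or None."""
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []                # URLs fetched successfully, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Assumption: a tiny in-memory "site" replaces real network requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the same loop can be tried against a dictionary, as here, or against a real HTTP client.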

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 19 May 2026

https://github.com/highbreed/web-crawler

A web crawler script that crawls the target website and lists its links

crawler crawling python3

Last synced: 07 Jun 2026

https://github.com/tbarnes94/fortnite-weapons-bot

A bot that returns Fortnite weapon statistics based on input from Discord users. Written in TypeScript.

crawler discord discord-bot discord-js typescript2

Last synced: 06 Jan 2026

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search results with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 14 Dec 2025

https://github.com/thiiagoms/dict-crawler

Simple crawler on UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 26 May 2026

https://github.com/qin2dim/istockphoto-go

📸 Gracefully download dataset from iStockPhoto.

colly crawler istockphoto

Last synced: 05 Apr 2025

https://github.com/rdil/crawley

My attempt at a web crawler.

bs4 crawler python python3 web

Last synced: 11 Jun 2025

https://github.com/denrydu/baiduimagecrawler

Two scripts I wrote for crawling images from Baidu, a useful tool for CV researchers building datasets.

baidu crawler dynamic python3

Last synced: 04 Nov 2025

https://github.com/erikmueller/jazmax

Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator

crawler data-science efficiency green heatpump jaz

Last synced: 24 Mar 2025

https://github.com/lucassmacedo/tvtime-scrapy

A simple crawler to get the movies, series, and anime I have watched from my user profile

crawler phython scrapy tvtime

Last synced: 10 Oct 2025

https://github.com/btheu/estivate

Mapping from DOM to POJO with CSS Query Syntax and annotations.

annotations crawler csquery css-selector html java jsoup jsoup-operation pojo

Last synced: 12 Jan 2026

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 19 Jan 2026

https://github.com/byt3n33dl3/thc-katanax

The next generation of Samurai blades: a crawling and spidering framework.

cli crawler domain framework golang hacking http pentesting subdomain subfinder tls

Last synced: 16 Apr 2025

https://github.com/stevieflyer/quokka

An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.

crawler parallel web-automation

Last synced: 12 Jul 2025

https://github.com/jemaf/stackoverflow-jobs

A wrapper for crawling data from the Stack Overflow Jobs portal

crawler jobs python stack-overflow

Last synced: 14 Jan 2026

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Jul 2025

https://github.com/liebki/githubnet

This library allows you to retrieve several things from GitHub, things like trending repositories, profiles of users, the repositories of users and related information.

crawler crawling github github-trending htmlagilitypack microsoft

Last synced: 20 May 2026

https://github.com/truethari/fcrawler

Python application that can be used to copy files of a given file type from a folder directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 25 Feb 2025

https://github.com/arif98741/deadlink-checker-python

A Python tool to crawl websites and check for broken/dead links with detailed reporting in both text and PDF formats.

crawler crawling python python3 website-scraper

Last synced: 18 Apr 2026

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in TypeScript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 21 Jan 2026

https://github.com/gabrielrf/bsbdf

Telegram Public Channel

crawler python telegram telegram-channel telegraph

Last synced: 05 Feb 2026

https://github.com/thecloer/crawler-himym

How I Met Your Mother script PDF generator for learning English

crawler pdf pdf-generation typescript web-scraping webscraping

Last synced: 21 Jun 2025

https://github.com/roccomuso/is-twitter

Verify that a request is from Twitter crawlers using DNS verification steps

bot crawler dns ip js nodejs twitter verification

Last synced: 14 Sep 2025

https://github.com/ian-lin8239/javdb_magnet

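A dedicated tool for JavDB magnet links: automatically fetches the magnet links for the top 30 videos on the monthly censored chart, with smart filtering and multi-format export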

cli-tool crawler javdb magnet-link python web-scraping

Last synced: 12 Feb 2026

https://github.com/epigos/newsbot

A news bot written in Go for Dialogflow and Facebook messenger

autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed

Last synced: 22 Mar 2025

https://github.com/r3c0ger/liscaps

A LSTM-based intelligent stock crawl, analysis and prediction system.

crawler lstm python pytorch stock streamlit

Last synced: 27 Jan 2026

https://github.com/merrier/overwatch-spider

:beetle: Overwatch Spider with NodeJS + node-crawler

crawler javascript jquery nodejs overwatch spider

Last synced: 29 Mar 2025

https://github.com/mzazakeith/puppetmaster

Puppeteer & Crawl4AI microservice for web automation, scraping, and AI processing with Bull queues

agent ai automation bull bullmq chrome crawl4ai crawler data data-extraction extraction gemini llm llms openai playwright puppeteer web-automation

Last synced: 13 May 2025

https://github.com/stangirard/crawlycolly

Website Crawler to extract all urls

colly crawler discover golang sitemap

Last synced: 04 Mar 2025

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 06 Apr 2025

https://github.com/qianbinbin/moebooru-crawler

Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.

crawler moebooru shell

Last synced: 22 Oct 2025

https://github.com/carloocchiena/python_url_crawler

A script that, starting from a webpage, iterates through all its links and appends them to a list. A sort of proxy for getting all the pages of a website

beautifulsoup crawler python python3

Last synced: 12 Feb 2026

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 09 Jun 2026

https://github.com/dean9703111/shopee_find_mac

Quickly find a cheap Mac that meets your required specs

argparse crawler mac pip python python2 xlsxwriter

Last synced: 14 Apr 2026

https://github.com/telanflow/scrago

A micro crawler framework, implemented in Go.

crawler go micro-framework spider

Last synced: 25 Jun 2025

https://github.com/tvrcgo/collect

Data collection

crawler scraper

Last synced: 06 Apr 2025

https://github.com/airtoxin/stackable-crawler

Middleware-based lightweight crawler framework

crawler javascript lightweight

Last synced: 13 Apr 2025

https://github.com/markoczy/crawler

A Web Crawler based on Go and Chromedp

cli crawler golang

Last synced: 17 Jan 2026

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 03 Feb 2026

https://github.com/sntran/gen_spider

An Erlang/Elixir behaviour to define Spiders

behaviour crawler generic interface spider

Last synced: 28 Feb 2026

https://github.com/nueip/curl

NUEiP Curl Lib

crawler php

Last synced: 11 Jun 2025

https://github.com/xcrypt0r/watchdog

🐶 Dcinside image crawler that includes NSFW detection (Enhanced version of Hyacinth)

crawler crawling dc dcinside nodejs nsfw parsing tensorflow

Last synced: 18 May 2026

https://github.com/liuzl/newsmth

A Go crawler for newsmth.net

bigdata crawler newsmth nlp

Last synced: 14 May 2025

https://github.com/sangupta/shopify-burst-crawler

Simple crawler to download meta information for all stock pics from Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 18 Feb 2026

https://github.com/liinen/vocalist-backend

vloom backend implemented on a cloud service, with a dataset crawled from a karaoke website

connection-pool crawler express mysql ncloud-server pagination python3 selenium

Last synced: 13 Apr 2026

https://github.com/woojubb/link-collector

A program that crawls web page URLs and RSS feeds

crawler crawling rss

Last synced: 28 May 2026

https://github.com/h-alice/piday-crawler

A little script that gets pi digits and pretty-prints them.

crawler

Last synced: 25 Jun 2025

https://github.com/imthaghost/gocloneold

Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 05 Apr 2025

https://github.com/foufou-exe/yspeed

Yspeed is a library that scrapes the Speedtest site

crawler python rich scraper scraping selenium selenium-python speedtest

Last synced: 28 Feb 2026

https://github.com/jiannei/github-trending

GitHub trending crawler based on Lumen.

crawler github-trending lumen php

Last synced: 30 Apr 2025

https://github.com/spaceemotion/goodreads-browser

Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍

books crawler goodreads

Last synced: 22 Jan 2026

https://github.com/nzrsky/useragent-generator

High-performance User-Agent generator for Go. Zero-alloc bots, auto-updated browser versions from real usage data.

bot browser crawler go golang http scraping user-agent useragent

Last synced: 14 Apr 2026

https://github.com/superreal/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 14 May 2026

https://github.com/sharmadhiraj/free-json-datasets

Collection of free JSON data that are scraped and parsed from different websites.

collection crawler data data-scraping datasets json sports statistics web-scraping

Last synced: 28 Mar 2025

https://github.com/buaadreamer/buaastar

Beihang Planet website: major assignment for the summer 2021 Python course taught in English at Beihang University (BUAA)

crawler css flask html javascript python

Last synced: 28 Apr 2026

https://github.com/fbielejec/nagger

nag reviewers of PRs

bot crawler github slack

Last synced: 04 May 2026

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 23 Jun 2026

https://github.com/arshamroshannejad/scrapify

Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.

403-bypass arkose cloudflare crawler golang http-client scraper

Last synced: 18 Apr 2026

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in Go that looks for exposed .git and .env URIs.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Apr 2026

https://github.com/YGGverse/pulsarss

RSS Aggregator for Gemini Protocol

aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust

Last synced: 15 Jun 2026

https://github.com/maraf/staticsitecrawler

A simple utility for crawling links from a root URL and saving HTML documents.

crawler static-site-generator

Last synced: 21 Apr 2026

https://github.com/eduardosbcabral/desafio-tecnico-mp

Challenge - A file generator in C# that uses a web crawler and buffers to write the file to disk.

crawler csharp dotnet

Last synced: 08 May 2026

https://github.com/mashukui/xhs_pic_tool

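Xiaohongshu image collection software developed in Python. Supports downloading watermark-free images from Xiaohongshu notes and collecting note data, comment data, and more. Xiaohongshu crawler | Xiaohongshu watermark-free images | Xiaohongshu watermark-free download | Xiaohongshu comment crawler | Xiaohongshu collection tool | Xiaohongshu comment collection | Xiaohongshu collection software | Xiaohongshu data scraping | xiaohongshu | xhs | XHS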

crawler gui gui-application python-spider spider xhs xhs-downloader xhs-spider xiaohongshu xiaohongshu-downloader

Last synced: 04 Apr 2026

https://github.com/cyberdolfi/serverrawler

ServerRawler is a Minecraft Server Crawler, written in Rust

crawler minecraft ratatui-rs rust seeker servercrawler serverseeker

Last synced: 04 Mar 2026

https://github.com/devkoriel/teslalarm-kr

🚀 Teslalarm KR Real-time, AI-powered Tesla news & price alerts tailored for the Korean market. Stay updated on price changes, new model releases, and more – delivered directly to your Telegram. 🔔 Join us and help revolutionize Tesla news in Korea!

crawler telegram-bot tesla

Last synced: 04 Apr 2026

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED] Crawl 4 kinds of finance newspaper corpus (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 28 Apr 2026

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 17 Jun 2026

https://github.com/elky84/lol-crawler

Notifications when a LoL friend's game starts and ends.

crawler csharp docker dotnet web-crawler

Last synced: 07 May 2026

https://github.com/kahsolt/allchan

An image crawler for xChan (4chan/8ch/...) image boards.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 23 Jun 2026

https://github.com/marabesi/social-crawler

Easy way to find emails from social networks

crawler emails php social-crawler social-network

Last synced: 02 Mar 2026

https://github.com/gitzhiqing/netprogcode

Network programming lab code

crawler network socket

Last synced: 24 Apr 2026

https://github.com/rebrowser/stubhub-dataset

StubHub secondary ticket market data: event listings with section, row, quantity, delivery type, ticket class, and 500+ venues across US, Canada, and Europe. Updated daily.

concert-tickets crawler data-collection data-science dataset event-tickets live-events open-data resale-tickets scraper secondary-market sports-tickets stubhub tickets web-scraping

Last synced: 03 May 2026

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific subreddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 30 Apr 2026

https://github.com/nava45/simplempcrawler

Simple multiprocessing crawler in Python

crawler multiprocessing python

Last synced: 22 Jun 2026

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 30 Apr 2026

https://github.com/gnujoow/crawl-repo

Crawls basic info about GitHub repositories

crawler github github-api python3

Last synced: 03 May 2026

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 30 Apr 2026

https://github.com/zanmato/shouting-robin

SEO Crawler focused on E-commerce

crawler developer-tools seo seo-tools

Last synced: 21 Jun 2026

https://github.com/natshah/natshah-crawler

Natshah Crawler crawls a selected domain along with all its internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 29 Apr 2026

https://github.com/yuminn-k/crawling-tabelog

Crawling store information from tabelog

crawler python3

Last synced: 08 Jun 2026

https://github.com/viclafouch/pe-crawler

📌 An automated system that serves data extracted from the Google Help Center

crawler javascript nodejs postgresql sequelize

Last synced: 17 Apr 2026

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 29 Apr 2026

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 19 Apr 2026

https://github.com/restuwahyu13/node-scraper-content

Example Node scraper for all programming content, using Puppeteer

crawler nodejs puppeter scrapper

Last synced: 14 May 2026

https://github.com/mauricelambert/cr0wl3r

Full and discreet web crawler for pentest, red-teaming or hacking discovery using simple HTTP requests or Selenium.

crawler discovery links pentest scan scraper security selenium uri url web web-links

Last synced: 11 Jun 2026

https://github.com/ewertoncodes/mind-crawler

A simple API written in Rails to extract quotations from the Quotes to Scrape site.

crawler ruby ruby-on-rails

Last synced: 14 May 2026

https://github.com/manojahi/is-there-any-song-reference-in-article

Tells whether an article from a website contains any song references.

crawler lyrics-search python webscraping

Last synced: 28 Mar 2026

https://github.com/manku27/webscrapping

Crawls and scrapes a website to get rental listings that match my custom needs, which the website wasn't providing, and directly scrapes necessary information such as the property owner's phone number for quick use.

beautifulsoup crawler python scraper

Last synced: 30 Apr 2026

https://github.com/tctien342/simple-doc-crawler

Crawl all subpages from a given URL to Markdown

cli crawler llm markdown

Last synced: 03 Mar 2026

https://github.com/poodle64/supacrawl

Zero-infrastructure web scraping for the terminal

cli crawler llm markdown playwright python scraper terminal web-scraping

Last synced: 04 Mar 2026