An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/adbenitez/smd

Simple Manga Downloader, a tool to search and download manga

bs4 command-line-tool crawler crawling downloader manga manga-downloader python python3 urllib

Last synced: 14 Jan 2026

https://github.com/amirzenoozi/insta-downloader

Download Instagram posts with this script

crawler crawling downloader instagram

Last synced: 20 Jul 2025

https://github.com/rodyherrera/codexdrake

An open source, privacy-first, self-hosting capable and blazing fast search engine written in JavaScript. Browse anonymously and safely without the need to pay third-party APIs. 👀

adblock books crawler google images javascript metasearch metasearch-engine news nodejs privacy-first search search-engine searchengine searx self-hosted videos webscraping websearch wikipedia

Last synced: 27 Mar 2026

https://github.com/codingcrush/aiocrawler

Async crawler framework based on aiohttp and asyncio for running fast.

aiofiles aiohttp asyncio crawler uvloop

Last synced: 19 Sep 2025

https://github.com/ctf-archives/live-photo-crawler

Image-crawling script for real-time photo hosting sites

crawler pailixiang photoplus

Last synced: 03 Jul 2025

https://github.com/96bearli/biliup_record

Archives the feed updates of bilibili uploaders

bili crawler python

Last synced: 16 Mar 2025

https://github.com/wisdom-valley/planet-helper-release

A useful download helper and study assistant for 知识星球 (Knowledge Planet, zsxq).

crawler knowledgebase spider zsxq

Last synced: 09 Jun 2026

https://github.com/orangmuda/SECTOOL

Search engine scraper tool (Bash)

crawler crawling scraper website-scraper

Last synced: 22 May 2026

https://github.com/krolow/marsvin

Structural Crawler framework written in PHP

crawler framework parser php

Last synced: 13 Oct 2025

https://github.com/timschneeb/app-crawler

Python script that searches GitHub, F-Droid and IzzySoft's F-Droid repo for apps with Shizuku support. Updated daily.

crawler f-droid github shizuku

Last synced: 07 May 2025

https://github.com/redco/goose-starter-kit

This is a starter kit for redco/goose-parser

crawler docker goose goose-parser parser starter-kit

Last synced: 04 Apr 2025

https://github.com/byt3n33dl3/crawler_v2

Remote access Trojan (client) that runs after the malware reaches the kernel.

compiler crawler exploit offensive-security pentesting rat

Last synced: 13 Apr 2025

https://github.com/frostming/renren-dumps

Renren data backup tool

crawler renren spider

Last synced: 11 Apr 2025

https://github.com/scrapingant/scrapingant-client-js

ScrapingAnt API client for JavaScript / Node.js.

crawler scraper scraping scrapingant webscraping

Last synced: 15 Aug 2025

https://github.com/randark-jmt/live-photo-crawler

Image-crawling script for real-time photo hosting sites

crawler pailixiang photoplus

Last synced: 09 Oct 2025

https://github.com/geminidsystems/googlenewsscraper

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper

Last synced: 13 Aug 2025

https://github.com/catalyst/moodle-tool_crawler

A Moodle link-crawling robot that finds broken, slow, and oversized links

crawler moodle plugin-moodle

Last synced: 28 Feb 2026

https://github.com/petrpatek/airbnb-scraper

Apify public actor for scraping Airbnb homes.

airbnb airbnb-api apify crawler data-extraction scrape

Last synced: 20 Mar 2025

https://github.com/theritikchoure/crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping

Last synced: 30 Oct 2025

https://github.com/aneesh-aparajit/reddit-crawler

Reddit Crawler API for collecting datasets from Reddit.

crawler nlp python reddit scraper web-crawler

Last synced: 16 Jan 2026

https://github.com/johansatge/psi-report

Crawls a website, gets PageSpeed Insights data for each page, and exports an HTML report.

cli crawler html-report pagespeed-insights

Last synced: 27 Mar 2025

https://github.com/zhifengle/js-hook

Parses the JavaScript AST and adds custom hooks

crawler js-reverse

Last synced: 10 Apr 2025

https://github.com/qzcool/cpef

Data query API for private equity fund managers. Chinese Private Equity Funds APIs.

china crawler data finance fund funds hedge-funds private-equity python python3 scraper scraping-websites spider

Last synced: 26 Feb 2026

https://github.com/hfrost0/simple-baidu-image-download

A Baidu image crawler in only 30 lines, using only the simplest statements

crawler image

Last synced: 11 Mar 2026

https://github.com/viclafouch/fetch-crawler

📌 A Node.js web crawler using the Fetch API to scrape static websites

cheerio crawler crawling-sites fetch-api nodejs promises scrapping

Last synced: 17 Mar 2026

https://github.com/thesoenke/news-crawler

Crawler that collects and extracts content of daily published news articles

crawler news

Last synced: 07 May 2025

https://github.com/yifan123/arxiv_spider

An arxiv spider

arxiv crawler spider

Last synced: 26 Jun 2025

https://github.com/willin/beian-domain

Crawler that fetches the latest list of domains eligible for ICP filing (beian)

beian crawler domain node

Last synced: 26 Jun 2025

https://github.com/unistudents/saffron

A fairly intuitive & powerful framework that enables you to collect & save articles and news from all over the web.

aggregator announcements api-scraper articles crawler crawler-framework dynamic-scraping html-scraping javascript news parser rss rss-aggregator rss-feed rss-parser saffron scraping typescript wordpress-api

Last synced: 21 Feb 2026

https://github.com/siveci/javdb_magnet_spider

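Python-based automated crawler for JavDB magnet links. Uses curl_cffi to emulate browser TLS fingerprints and bypass the Cloudflare firewall. Supports multi-page list scraping and, based on tags such as "uncensored / Chinese subtitles / HD" and on file size, automatically filters and exports the best magnet links to a CSV file.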

crawler data-extraction javdb magnet-links python python3 scraper spider

Last synced: 06 Jun 2026

https://github.com/lightzhu/node_crawler

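Node.js project: a small koa and cheerio crawler that scrapes movies and free proxy nodes for bypassing internet restrictions, and sends scheduled DingTalk messages.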

crawler freevpn mongoose node ss ssr v2ray vmess vpn

Last synced: 23 Oct 2025

https://github.com/utkucanbykl/sofpythonbot

This Telegram bot answers Python questions using Stack Overflow topics.

beautifulsoup crawler machine-learning mongodb naive-bayes-algorithm python telegram-bot

Last synced: 14 Aug 2025

https://github.com/jacraig/spidey

A multi-threaded web crawler library that is generic enough to allow different engines to be swapped in.

crawler webcrawler

Last synced: 12 Aug 2025

https://github.com/guillim/arachnida

App to scrape the web, for people without coding skills. Fully integrates web crawlers (headless Chrome) and an interface to manage them.

crawler crawling framework headless-chrome javascipt meteor scraper scrapping

Last synced: 15 Jun 2025

https://github.com/cristipufu/scrapy-net

Scrapy the web scraping tool - a naive implementation in C#

crawler scraper scrapy

Last synced: 28 Oct 2025

https://github.com/twtrubiks/google-play-store-spider-bs4-excel

Google Play Store spider that uses Beautiful Soup in Python and exports to Excel

beautifulsoup crawler google-play-store pyexcel python sql-database xlsx

Last synced: 15 Apr 2025

https://github.com/jpwahle/cs-insights-crawler

This repository implements the interaction with DBLP, information extraction and pre-processing of papers, and a client to store data to the cs-insights-backend.

crawler dblp dblp-dataset nlp semanticscholar

Last synced: 18 Apr 2026

https://github.com/myconsciousness/atproto-pds-search

This project automatically crawls and visualizes the atproto PDS endpoints indexed in the PLC directory.

atproto bluesky crawler dart flutter indexer pds search search-engine searching

Last synced: 22 Apr 2025

https://github.com/ycrao/some-spider-code

Some spider code: crawlers for financial news and for fund, stock, and forex prices

ai crawler deep-seek economics fin-eco-news finance forex fund-value spider stock-price

Last synced: 29 Jun 2025

https://github.com/odanieldcs/bot-webscraper

Source code of the web scraper

cheerio crawler request scraper spider tutorial

Last synced: 02 Aug 2025

https://github.com/louis70109/pleaguebot

P+ League Chatbot (unofficial) (deprecated)

basketball chatbot crawler line

Last synced: 14 Apr 2025

https://github.com/fjcanyue/comic_downloader

🚀 Lightweight command-line (CLI) comic downloader supporting popular platforms such as 摩锐漫画, 读漫屋, and 看漫画. Implemented in Python; minimal and efficient.

comic-downloader comics crawler manga manga-downloader

Last synced: 26 Jan 2026

https://github.com/tca166/ck3-history-extractor

A program designed for creating an encyclopedia of sorts containing your ck3 history

ck3 crawler python3 rust save-file save-files

Last synced: 04 Jul 2025

https://github.com/hiyali/node-crawler-on-mongodb

🕷 NodeJS + Puppeteer crawler on MongoDB

crawler example mongob nodejs puppeteer

Last synced: 13 Jul 2025

https://github.com/whitejoce/Get_Weather

Determines your location from your IP address and crawls the local weather (no API required)

crawler python3 spider weather-forecast

Last synced: 14 Apr 2025

https://github.com/beomi/data_camp_wcr_3

Source code for the 3rd cohort of the "Practical Web Crawling with Python" CAMP

crawler python

Last synced: 27 Aug 2025

https://github.com/lablnet/pakweather_scraper

A multi-threaded Pakistan Weather crawler written in JavaScript

crawler data mit-license open-source pakistan scraping weather weather-channel

Last synced: 22 Aug 2025

https://github.com/misaka10843/copymanga-nasdownloader

A mini version of copymanga-downloader, designed for NAS; not limited to copymanga, it supports multiple platforms!

comic copymanga crawler downloader python

Last synced: 16 Jan 2026

https://github.com/houtini-ai/seo-crawler-mcp

Crawl and analyse your website for errors and issues that affect its SEO, inside a self-contained MCP. Interact in your AI assistant or in the terminal for later AI SEO analysis in chat.

crawlee crawler librecrawl mcp seo seo-analysis sqlite technical-seo-audit

Last synced: 06 May 2026

https://github.com/piotrpdev/webuy-cex-price-tracker

A Python script that gets the prices of certain Cex products and uploads them to Google Sheets

cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex

Last synced: 05 May 2025

https://github.com/eugen1j/aioscrapy

Python asynchronous library for web scraping

asyncio crawler python-crawler python37 webscraper

Last synced: 09 Oct 2025

https://github.com/nadar/crawler

A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory efficient.

crawler hacktoberfest html pdf php

Last synced: 13 Apr 2025

https://github.com/jayin/goods-crawling

Crawls product details from amazon/bestbuy/costco/6pm

amazon crawler node

Last synced: 15 Mar 2025

https://github.com/bunseokbot/darklight

Engine for collecting onion domains and crawling web pages on the Tor network

celery crawler crawling darkweb engine python redis tor

Last synced: 11 May 2025

https://github.com/vinhlh/frontendmasters-crawler

A demo of a serverless crawler built on AWS Lambda (scheduled tasks) that stores results in S3

aws crawler lambda s3 serverless

Last synced: 11 Aug 2025

https://github.com/mythkiven/python

Python scripts, Python crawlers, and Python tools

crawler python script spider

Last synced: 05 Aug 2025

https://github.com/doreanbyte/katswiri

A crawler to find job listings and aggregate them from multiple sources

assistant crawler employment-opportunities job-aggreg job-finder time-management

Last synced: 04 Sep 2025

https://github.com/ne-lexa/roach-php-bundle

Symfony bundle for roach-php/core

crawler php roach-php scrapy spider symfony symfony-bundle

Last synced: 10 Apr 2025

https://github.com/xfengyin/zhihu-salt-novel-downloader

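Zhihu Yanxuan (知乎盐选) novel downloader - multi-threaded crawling of novels from Zhihu Yanxuan columns, with CLI and GUI modes, multiple export formats, proxy configuration, cookie login, and resumable downloads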

cli-tool crawler novel-downloader python zhihu

Last synced: 19 Jun 2026

https://github.com/rational-kunal/netflix-hotkeys

A Chrome extension to enhance your Netflix binging experience!

chrome-extension crawler netflix

Last synced: 10 Mar 2026

https://github.com/lablnet/web-spider

Multi-threaded web crawler

crawl crawler mit open-source package project python spider

Last synced: 27 Feb 2026

https://github.com/jtiala/wpdl

⬇️ Scrape pages, posts, images and other data from a WordPress instance.

crawler downloader scraper scraping wordpress

Last synced: 08 May 2025

https://github.com/bluurr/quora-loader

A realtime read-only locator and extraction library for Quora questions and answers.

answers api bluurr client crawler crawling java questions quora scraper scraping selenium

Last synced: 26 Jul 2025

https://github.com/gimnathperera/web-scraping-riyasewana.lk

Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan vehicle-selling websites.

crawler python scrapy spider webscraping

Last synced: 30 Apr 2025

https://github.com/cutecutecat/knightreport

Guild battle statistics tool for 坎公骑冠剑 (Guardian Tales)

crawler csv-export game-tool

Last synced: 18 Mar 2025

https://github.com/mevljas/nepremicnine-discord-bot

A Discord bot that sends notifications about new listings on the nepremicnine.net website.

crawler discord-bot scraper

Last synced: 19 Jan 2026

https://github.com/maengsanha/instacrawler

KMU CS Capstone Design project: Instagram Meta Search Engine

crawler go instagram metasearch

Last synced: 14 Jan 2026

https://github.com/bringyourownideas/laravel-sitemap

Simple crawler and sitemap generator for Laravel. No headless browser - just a crawler.

crawler laravel laravel-sitemap sitemap-generator sitemap-xml

Last synced: 01 May 2025

https://github.com/lucasayres/linkedin-crawler-connections

LinkedIn crawler to search and collect my connections (profile picture, name, occupation, location, email and phone).

chromedriver connections crawler linkedin profile python scraper selenium

Last synced: 16 May 2025

https://github.com/lysandrejik/omegle-crawler-node

Node library to connect to and interact with the Omegle website.

crawler omegle puppeteer

Last synced: 06 Mar 2026

https://github.com/leonzucchini/Recipes

Project to get and analyse data on recipes from chefkoch.de

cooking crawler python recipe

Last synced: 03 Apr 2025

https://github.com/confact/spider.cr

Spider.cr is a spider crawler in Crystal. It handles collecting, scraping, and parsing. So you can spend your time collecting the data you want on a big scale.

crawler spider

Last synced: 22 Apr 2025

https://github.com/sunhailin-leo/12306-go

Use Go-resty to crawl 12306

12306 crawler go-resty golang

Last synced: 10 Jul 2025

https://github.com/mithro/fastsvncrawler

fast-svn-crawler / fastsvncrawler - A tool for listing SVN repository content

crawler export import subversion svn vcs

Last synced: 13 Apr 2025

https://github.com/bjoern-hempel/php-web-crawler

A PHP class that crawls a given URL and recursively collects some data from it. The final representation is a JSON object.

crawler mit-license php recursive webcrawler webscraper xpath

Last synced: 11 Apr 2025

https://github.com/sobak/scrawler

Declarative, scriptable web robot (crawler) and scraper

crawler crawler-engine robots-txt scraper scraping-websites

Last synced: 25 Mar 2025

https://github.com/wetrycode/tegenaria

Tegenaria is a crawler framework based on golang

crawler crawler-engine crawler-framework framework go golang spider spiders

Last synced: 12 Jan 2026

https://github.com/proclnas/curl-rox

Just another curl wrapper for web crawling purposes

crawler curl curlphp php

Last synced: 01 Apr 2026

https://github.com/exp-codes/bilibili-plugin

Bilibili plugin helper

bilibili crawler live programming

Last synced: 16 Aug 2025

https://github.com/matheuscas/pycnpj-crawler

Yet another module for extracting company data from a CNPJ

cnpj crawler python python3

Last synced: 03 Sep 2025

https://github.com/yowenter/stackshare

A simple web crawler for stackshare.io using Scrapy.

crawler python stackshare

Last synced: 30 Oct 2025

https://github.com/sanix-darker/ziim

Let your CLI find available solutions online for errors and exceptions raised by the commands you run, with no need to open a browser and search yourself.

cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions

Last synced: 13 Apr 2025

https://github.com/crispy-computing-machine/phpcrawl

PHPCrawl Web Crawler PHP 8

crawl crawler php php74 sphider

Last synced: 03 Oct 2025

https://github.com/hironsan/japanese-news-crawler

A completely automated Japanese news crawler built on top of the Scrapy framework

crawler

Last synced: 01 Apr 2026

https://github.com/dvf/bitcoin-node-crawler

A node crawler for discovering nodes on the Bitcoin network

bitcoin btc crawler explorer p2p python

Last synced: 28 Oct 2025

https://github.com/twtrubiks/google-play-store-spider-selenium

Google Play Store spider that uses Selenium and Beautiful Soup in Python

beautifulsoup chrome crawler firefox python selenium spider sqlite

Last synced: 15 Apr 2025

https://github.com/logocomune/botdetector

BotDetector is a Go library that detects bots, spiders, and crawlers from the user agent

botdetector bots crawler go golang golang-library spider user-agent

Last synced: 29 Apr 2025

https://github.com/pi-2r/devoxxfr2025-tock-studio-ia-gen

Project from the Devoxx France 2025 codelab “À la recherche du RAG perdu” (In Search of the Lost RAG): a 3-hour workshop on building an autonomous, local, offline generative AI chatbot based solely on open source frameworks

ai chatbot crawler devoxx devoxx-fr-2025 docker generative-ai jailbreak kotlin langchain langfuse localai mistral ollama open-source rag scrapoxy scrapy

Last synced: 07 Oct 2025