An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
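
To illustrate the definition above, here is a minimal sketch of the "systematically browses" part: a breadth-first traversal that fetches a page, extracts its links, and queues any it has not seen yet. It is only an illustration and is not taken from any project listed here. The `crawl` function, the `LinkExtractor` class, the `PAGES` dictionary, and the `example.test` URLs are all hypothetical, and the "web" is an in-memory dictionary so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch(url) is assumed to return the page's HTML as a string,
    or None if the page cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []                # URLs fetched successfully, in visit order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "no links here",
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would replace `PAGES.get` with an HTTP client and would normally also honor robots.txt, rate-limit its requests, and restrict itself to allowed hosts; those concerns are left out of this sketch.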

https://github.com/ronin-rb/ronin-web

ronin-web is a collection of useful web helper methods and commands.

cli crawler hacktoberfest helpers html proxy-server ronin-rb ruby server spider web xml

Last synced: 03 Oct 2025

https://github.com/xiantang/spider

web crawler

crawler python3

Last synced: 14 Apr 2025

https://github.com/kant2002/ncrawler

Web Crawler written in C#

crawler scrapper

Last synced: 17 Jul 2025

https://github.com/threekiii/awesome-scrapy

A Scrapy-based code library of data-collection crawlers

appium crawler fiddler python python3 scrapy selenuim spider

Last synced: 21 Aug 2025

https://github.com/himself65/luogucrawler

A Python crawler for scraping various information from Luogu

crawler python python3

Last synced: 09 Oct 2025

https://github.com/zenrows/scaling-to-distributed-crawling

Repository for the Mastering Web Scraping in Python: Scaling to Distributed Crawling blogpost with the final code.

crawler crawling distributed python python3 scraping spider

Last synced: 18 Mar 2026

https://github.com/moskrc/crawlerdetect

🕷CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.

bot crawler detect python spider user-agent

Last synced: 16 Jan 2026

https://github.com/Maicius/UniversityRecruitment-sSurvey

Using serious data to answer the question: what kinds of companies recruit at what kinds of universities?

analysis beautifulsoup crawler data redis university

Last synced: 06 Mar 2025

https://github.com/mrxujiang/crawel

A rather interesting crawler platform built on Apify + Node + React

apify crawler node puppeteer react react-hooks umi umi3

Last synced: 13 Apr 2025

https://github.com/jonaslejon/lolcrawler

Headless web crawler for bugbounty and penetration-testing/redteaming

bugbounty crawler docker penetration-testing penetration-testing-tools redteam redteam-tools redteaming

Last synced: 12 Jul 2025

https://github.com/elboletaire/php-crawler

:spider: A simple crawler (spider) written in PHP just for fun, with zero dependencies

crawler php spider

Last synced: 10 Jan 2026

https://github.com/p0dalirius/robotstester

This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.

bugbounty crawler pentesting python robots tool

Last synced: 21 Aug 2025

https://github.com/axetroy/crawler

A crawler framework for Node.js

crawler nodejs

Last synced: 18 Jun 2025

https://github.com/maicius/universityrecruitment-ssurvey

Using serious data to answer the question: what kinds of companies recruit at what kinds of universities?

analysis beautifulsoup crawler data redis university

Last synced: 28 Apr 2025

https://github.com/scrapfly/python-scrapfly

Scrapfly Python SDK for headless browsers and proxy rotation

crawler headless-browser python scraper scraping scraping-api sdk web-scraper web-scraping

Last synced: 14 Apr 2025

https://github.com/rix4uni/uforall

uforall is a fast URL crawler that collects URLs from a number of different sources: AlienVault, Wayback Machine, urlscan, and Common Crawl.

alienvault bugbounty commoncrawl crawler osint recon reconnaissance urlscan wayback

Last synced: 15 Apr 2025

https://github.com/veliovgroup/spiderable-middleware

Pre-rendering for JavaScript websites that delivers SSR-level SEO, enhanced link previews, and performance via effortless middleware integration — ideal for PWAs, SPAs, and modern JS-driven apps, websites, and webpages

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 12 Apr 2025

https://github.com/charlespikachu/seleniumlogin

Log in to some websites using Selenium.

crawler selenium selenium-webdriver spider taobao

Last synced: 23 Oct 2025

https://github.com/VeliovGroup/spiderable-middleware

Pre-rendering for JavaScript websites that delivers SSR-level SEO, enhanced link previews, and performance via effortless middleware integration — ideal for PWAs, SPAs, and modern JS-driven apps, websites, and webpages

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 13 May 2025

https://github.com/taseikyo/crawler

:snake: A collection of simple Python crawlers.

baidu-tieba bilibili bing crawler douban pixiv python-crawler python3 youku

Last synced: 19 Oct 2025

https://github.com/kkomelin/insecres

A console tool that finds insecure resources on HTTPS sites

crawler finder https security

Last synced: 22 Jun 2025

https://github.com/VAllens/CrawlerSamples

Sample crawler console apps using Puppeteer and AngleSharp, written in C# 7.1 and built with .NET Core.

anglesharp chsarp crawler dotnetcore headless headless-browsers headless-chrome headless-chromium puppeteer

Last synced: 04 May 2025

https://github.com/ryuchen/deadpool

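A crawler application built on Celery as its core framework. Crawler tasks can be added flexibly, and crawls of multiple sites can run at the same time. All components natively support concurrency at scale and distributed operation, which, combined with Celery's native distributed task invocation, enables large-scale concurrency.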

celery crawler deadpool python3 spider taobao taobao-spider tmall tmall-spider

Last synced: 21 Mar 2025

https://github.com/armand1m/papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

cache crawler jsdom nodejs scraper scraping typescript web-scraping

Last synced: 28 Jan 2026

https://github.com/kylemocode/medium-stat-box

Practical pinned gist that shows your latest Medium status 📌

awesome-pinned-gists crawler github-action github-gists medium-stats

Last synced: 17 Apr 2025

https://github.com/NatsuFox/Tapestry

Tapestry - a lightweight bookmark knowledge base built on Agent Skill Bundle https://natsufox.github.io/Tapestry

agent-skills claude-code codex crawler knowledge-base openclaw workflow

Last synced: 27 Apr 2026

https://github.com/m-ahmadi/tse-client

A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.

browser caching cli cli-app compression crawler data dataset downloader iran node-module stock stock-data stock-market stock-prices tehran ticker tsetmc universal

Last synced: 18 Feb 2026

https://github.com/m-haisham/novelsave_sources

A collection of webnovel sources offering varying amounts of scraping capability.

crawler lightnovel scraper

Last synced: 22 Jan 2026

https://github.com/migalabs/armiarma

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network

crawler ethereum libp2p monitoring

Last synced: 21 Aug 2025

https://github.com/bin-huang/nodespider

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

async crawl crawler node pipeline promise spider web

Last synced: 19 Sep 2025

https://github.com/hengxin666/bilibili_danmu_crawling

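Crawls Bilibili historical danmaku / full danmaku, with support for advanced danmaku and BAS danmaku. Working as of 2025. A built-in algorithm reduces the number of requests, with almost no danmaku lost, to speed up crawling. Includes a GUI and supports resuming a crawl. Uses binary search to find the earliest date that has danmaku before crawling. Built-in deduplication and merging of danmaku files.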

bilibili-danmaku crawler danmaku python

Last synced: 24 Jul 2025

https://github.com/iljan/narr

Download audio tracks from Netflix to sample your favorite shows

chrome-devtools-protocol cli crawler downloader music

Last synced: 27 Jul 2025

https://github.com/scrapy-plugins/scrapy-zyte-api

Zyte API integration for Scrapy

crawler plugin proxy scraping scrapy

Last synced: 04 Apr 2025

https://github.com/crawlerclub/crawler

Crawler4U, a general purpose focused crawler

crawler information-extraction spider

Last synced: 17 Jan 2026

https://github.com/twtrubiks/auto_crawler_ptt_beauty_image

Automatically crawls PTT Beauty images using Python schedule

beauty crawler heroku image ptt python schedule tutorial

Last synced: 26 Jun 2025

https://github.com/hackfengJam/ArticleSpider

Crawls Zhihu, Jobbole, and Lagou with Scrapy, and uses Elasticsearch + Django to build a search engine website. See README_zh.md (covers the implementation roadmap, distributed crawling, and coping with anti-crawling strategies).

crawler distributed-systems django elasticsearch scrapy

Last synced: 28 Mar 2025

https://github.com/flulemon/sneakpeek

Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It’s the best choice for scrapers that have some specific complex scraping logic that needs to be run on a constant basis.

crawler crawler-python crawlers crawling crawling-engine crawling-framework python python3 scraper scraper-api scraper-engine scrapers scraping scraping-framework vue website-crawler

Last synced: 14 Jan 2026

https://github.com/jfreegman/toxcrawler

A Tox DHT network crawler

crawler dht dht-network tox toxcore

Last synced: 14 Apr 2025

https://github.com/safonovpro/node-html-crawler

Simple-to-use Node.js HTML crawler (spider) for a site's web pages

crawler es6 node spider

Last synced: 12 Mar 2026

https://github.com/gamemann/bestbuy-parser

A personal tool using Python's Scrapy framework to scrape Best Buy's product pages for RTX 3080 TIs and notify if available/not sold out.

3080 automation best bestbuy bot buy crawler parser python python3 rtx scrapy ti

Last synced: 11 Mar 2026

https://github.com/heyingcai/cetty

A crawler framework based on event dispatching

crawler event-dispatcher gather spider

Last synced: 04 May 2025

https://github.com/xfgryujk/taobaoanalysis

A project for practicing NLP by analyzing Taobao reviews

crawler nlp taobao

Last synced: 16 Apr 2025

https://github.com/andreaskoch/gargantua

The fast website crawler

command-line crawler golang xml-sitemap

Last synced: 14 Apr 2025

https://github.com/haxzie-xx/instagram-downloader

Node.js/Express app to retrieve Instagram video/image download URLs

crawler downloader express instagram instagram-scraper nodejs

Last synced: 18 Mar 2025

https://github.com/proxzima/darkspider

Anatomy and visualization of the network structure of the dark web using a multi-threaded crawler

collaborate crawler dark-web extractor github github-pages hacktoberfest networkx onion osint python scraper tor

Last synced: 14 Mar 2026

https://github.com/wolverinn/igxe-c5-buff-csgo-skins-sale-data-catch

Automatically fetches CS:GO skin sale data from igxe.cn, buff, and c5game.com. You can choose specific skins to fetch data for.

crawler csgo-skin

Last synced: 25 Mar 2025

https://github.com/apocelipes/schannel-qt5

A GUI client of schannel powered by therecipe/qt and golang

client-side crawler go golang goqt linux qcharts qt5

Last synced: 07 May 2025

https://github.com/ph-7/crawling-emails

Very simple bash script to crawl email addresses from a specific website.

bash crawler email email-scraper scrape scrape-email scraper scraping shell wget

Last synced: 22 Aug 2025

https://github.com/miry/medup

Download all content from Medium and Dev.to to local folder

cli crawler devto json markdown medium sync tool

Last synced: 08 Apr 2025

https://github.com/helviojunior/filecrawler

File Crawler indexes files and searches for hard-coded credentials

crawler crawling-python elasticsearch leaks leaks-scanner

Last synced: 08 Apr 2025

https://github.com/subins2000/phpwebcrawler

A Web Crawler Created in PHP

crawler php

Last synced: 05 May 2025

https://github.com/juzeon/advanced-php-crawler

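A crawler for Sina Blog articles and the wenku8 light-novel library. It can fetch and save images and build an e-book in one click. A must-have for Kindle readers!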

calibre crawler gitbook kindle php sina

Last synced: 20 Feb 2026

https://github.com/pykong/pypergrabber

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper

Last synced: 15 Apr 2025

https://github.com/code4everything/visual-spider

Try our brand-new desktop productivity tool RunFlow: https://myrest.top/myflow

crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization

Last synced: 04 Oct 2025

https://github.com/a252937166/toutiaocrawler

A crawler example for Toutiao accounts (Toutiaohao)

crawler toutiao

Last synced: 06 Jul 2025

https://github.com/dept/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 04 Mar 2026

https://github.com/fanhuaandluomu/sina_spider

Sina Weibo crawler: login, keyword-based post search, and post monitoring

crawler python-2 sina-spider

Last synced: 09 Apr 2025

https://github.com/mamal72/iranian-calendar-events

Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website

crawler events iranian jalali jalali-calendar persian

Last synced: 07 May 2025

https://github.com/debugtalk/webcrawler

A web crawler based on requests-html, mainly intended for URL validation testing.

crawler requests-html web-crawler weblink

Last synced: 15 Apr 2025

https://github.com/deptagency/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 08 Jul 2025

https://github.com/gomjellie/pysaint

[deprecated] Python client for u-SAINT

crawler sap soongsil unofficial

Last synced: 30 Apr 2025

https://github.com/howie6879/php-google

Google search results crawler: get the Google search results you need (PHP)

crawler google-search php-google

Last synced: 16 May 2025

https://github.com/gimnathperera/abans-lk-webscraping

🌐 Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan websites

crawler python scrapy spider

Last synced: 30 Jun 2025

https://github.com/mjavadhpour/telegram-member-inviter

Crawling client's groups and channels to invite their members to a target group.

crawler python python3 robot telegram telegram-client telethon

Last synced: 19 Apr 2025

https://github.com/koallen/google-image-downloader

A script to download images from images.google.com

crawler google-images selenium

Last synced: 18 Jan 2026

https://github.com/tychozzz/article_crawler

✨ Article Crawler is a package used to crawl articles with Markdown format from a specific webpage and store them locally in HTML / Markdown formats.

article crawler html markdown pypi python

Last synced: 30 Apr 2025

https://github.com/k1low/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 16 Apr 2025

https://github.com/mattwang44/uspto-patft-web-crawler

Crawler for fetching US patent information and bulk-downloading PDFs

crawler patent patent-crawler pyqt5 python3 uspto

Last synced: 11 Oct 2025

https://github.com/k1LoW/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 08 Jul 2025

https://github.com/jurooravec/crawlee-one

Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.

actor apify crawlee crawler framework scraper scraping web

Last synced: 09 Feb 2026

https://github.com/simionrobert/bitinsight

:earth_africa: Bittorrent Network Overview through Infohash Indexing, Metadata and IP visualisations of the DHT network

bep51 bittorrent crawler dht elasticsearch infohash javascript nodejs torrent

Last synced: 13 Apr 2025

https://github.com/endermanch/ddom

A simple, open-source, easy to use, and free download manager for malware samples.

crawler downloader malware manager samples

Last synced: 06 Sep 2025

https://github.com/codelibs/fess-crawler

Web/FileSystem Crawler Library

crawler java

Last synced: 07 Apr 2025

https://github.com/italia/publiccode-crawler

publiccode.yml crawler for the Open Source software catalog of Developers Italia

crawler developers-italia hacktoberfest publiccode publiccodeyml

Last synced: 10 Feb 2026

https://github.com/bigsk1/supa-crawl-chat

Integrates Supabase with Crawl4AI and AI Chat to create a powerful web crawling and semantic search solution. Streamlit supabase data visualization. Run all in Docker. API and more!

crawl4ai crawler docker embeddings fastapi gpt-4o openai-api pgvector postgresql scraping streamlit supabase

Last synced: 15 May 2026

https://github.com/riptl/ytpriv

YT metadata exporter

big-data crawler csv datascience json video youtube

Last synced: 10 May 2025

https://github.com/alehkot/job-funnel-ts

Automated tool for scraping job postings into .xlsx files, inspired by Job Funnel.

crawler hacktoberfest jobs typescript

Last synced: 03 Aug 2025

https://github.com/o8e/soccer-scrape

:page_with_curl: Scrape football data from Bet365

bet365 betting crawler es6 football javascript puppeteer scraper soccer

Last synced: 10 Mar 2026

https://github.com/hunterhug/marmot

💐 Marmot: a Golang HTTP downloader

crawler gohttp gospider marmot spider

Last synced: 21 Jan 2026

https://github.com/alex-page/get-site-urls

🔗 Get all of the URLs from a website.

crawler sitemap-generator urls

Last synced: 16 Mar 2025

https://github.com/ERap320/CrowLeer

Powerful C++ web crawler based on libcurl

cli crawler crawling download

Last synced: 10 May 2025

https://github.com/ysh329/douban-crawler

Scrapes Douban group information (groups, users, posts).

crawler douban douban-crawler

Last synced: 13 Jun 2025