An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
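
The definition above comes down to a loop: take a URL from a frontier queue, fetch it, extract its links, and enqueue the ones not yet seen. Below is a minimal breadth-first sketch in Python. It is not taken from any project listed here. It assumes pages are fetched through a caller-supplied `fetch(url) -> str` function (hypothetical, injected so the example runs offline), and it only follows links on the start URL's host. A real crawler would also respect robots.txt, rate-limit its requests, and handle non-HTML responses.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> List[str]:
    """Breadth-first crawl restricted to the start URL's host.

    `fetch` is a hypothetical injected function; in practice it would perform
    an HTTP GET and return the response body as text.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])  # FIFO frontier => breadth-first order
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    # A tiny fake site so the sketch runs without network access.
    pages = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">ext</a>',
        "https://example.com/a": '<a href="/b#top">B</a> <a href="/">home</a>',
        "https://example.com/b": "no links",
    }
    print(crawl("https://example.com/", pages.__getitem__))
```

The FIFO queue visits pages closest to the start URL first; replacing `popleft()` with `pop()` would turn the same loop into a depth-first crawl.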

https://github.com/twtrubiks/line-bot-tutorial

A LINE bot tutorial using Python and Flask.

bot crawler heroku line ptt python-flask tutorial

Last synced: 16 May 2025

https://github.com/jairovadillo/pychromeless

Python Lambda Chrome Automation (naming pending)

automation aws-lambda chrome chromium crawler python selenium

Last synced: 12 Mar 2026

https://github.com/GraySilver/wencai

This is a Wencai crawler (a Pythonic toolkit for the iWencai strategy backtesting interface).

crawler finance pandas quant quantitative-finance tushare wencai

Last synced: 27 Mar 2025

https://github.com/oppsec/pinkerton

🕵️ JavaScript file crawler and secret finder tool developed with Python

crawl crawler hacktoberfest javascript pentest python python3 redteam secrets

Last synced: 31 Mar 2025

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 16 May 2025

https://github.com/eight04/comiccrawler

An image crawler written in Python.

cli crawler gui image-crawler python tkinter

Last synced: 15 May 2025

https://github.com/viasite/site-audit-seo

Web service and CLI tool for SEO site audits: crawl a site, run Lighthouse on all pages, and view public reports in the browser. Also outputs to the console, JSON, CSV, and XLSX.

audit cli crawl-site crawler lighthouse puppeteer scraper seo seo-audit seo-site-audit site-audit xlsx

Last synced: 14 Mar 2026

https://github.com/BlessedRebuS/Krawl

Krawl is a customizable, lightweight, cloud-native web deception server and anti-crawler that creates fake web applications with low-hanging vulnerabilities and realistic, randomly generated decoy data.

anti-crawling blue-team cloud-native crawler cybersecurity deception honeypot kubernetes security self-hosted spider web

Last synced: 11 Feb 2026

https://github.com/devanshbatham/Gorecon

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester or bug hunter might want to add to their arsenal.

admin-panel-finder backups-finder cmsdetecter configurationfiles crawler directory-bruteforce dns dnsrecon email-hunter geo-ip nameserver recon reconaissance reverse-dns scanner subdomain-enumeration subdomain-scanner subnet-lookup whois-lookup wordpress-scanner

Last synced: 03 Apr 2025

https://github.com/chenjiandongx/github-spider

GitHub repository and user analysis crawler

crawler github scrapy

Last synced: 13 Apr 2025

https://github.com/Jasonnor/th-music-video-generator

Touhou Project random music video generator/player, crawling image and video from websites to generate MV.

crawler javascript music-video touhou web

Last synced: 27 Apr 2025

https://github.com/zhupingqi/RuiJi.Net

Crawler framework: a distributed crawler and extractor

crawler extractor headless-chrome netcore owin scraper scrapy

Last synced: 04 May 2025

https://github.com/algolia/algoliasearch-netlify

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler

algolia algolia-crawler algoliasearch crawler jamstack netlify netlify-plugin search

Last synced: 03 Oct 2025

https://github.com/hezhizheng/go-movies

Golang spider/crawler for movies

colly crawler docker fasthttp go gocolly golang movies redis spider

Last synced: 16 May 2025

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 28 Mar 2025

https://github.com/lucasjinreal/weibo_terminator_workflow

An updated version of weibo_terminator: this workflow version aims to get the job done!

crawler nlp scraper sentiment-analysis weibo-terminator

Last synced: 05 Mar 2026

https://github.com/antchfx/antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

crawler crawling framework golang scraping web-crawler web-spider

Last synced: 14 Mar 2025

https://github.com/zntfdr/Selenops

A Swift Web Crawler 🕷

command-line-tool crawler scripting swift web

Last synced: 18 Jul 2025

https://github.com/outpoot/vyntr

Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com

crawler duckduckgo engine google python rust search tantivy web

Last synced: 15 May 2025

https://github.com/xyntax/filesensor

Crawler-based tool for dynamically detecting sensitive files

crawler fuzzing pentesting scrapy

Last synced: 03 Sep 2025

https://github.com/zrashwani/arachnid

Crawls all unique internal links found on a given website and extracts SEO-related information; supports JavaScript-based sites.

crawler php scraping seo

Last synced: 13 Jan 2026

https://github.com/6677-ai/tap4-ai-crawler

The crawler open-sourced by tap4.ai

aitoolkit aitools crawler crawler-engine crawler-python

Last synced: 16 May 2025

https://github.com/turnersoftware/infinitycrawler

A simple but powerful web crawler library for .NET

crawler robots-txt spider web-crawler web-crawling

Last synced: 21 Jun 2025

https://github.com/dwisiswant0/galer

A fast tool to fetch URLs from HTML attributes by crawling.

crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls

Last synced: 12 Apr 2025
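
As a rough illustration of what fetching URLs from HTML attributes involves, here is a small Python sketch built on the standard library's `html.parser`. It is not galer's implementation (galer is written in Go), and the set of URL-bearing attributes in `URL_ATTRS` is an assumption chosen for the example.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Attributes assumed to carry URLs; an illustrative choice, not galer's list.
URL_ATTRS = {"href", "src", "action", "data-src"}


class AttrURLExtractor(HTMLParser):
    """Collects absolute URLs from URL-bearing attributes, in document order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing tags such as <img ... /> are routed here as well.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                url = urljoin(self.base_url, value.strip())  # resolve relative URLs
                if url not in self.urls:  # keep each URL once
                    self.urls.append(url)


def extract_urls(html: str, base_url: str) -> List[str]:
    parser = AttrURLExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Resolving every value against the page's base URL is what turns relative references like `logo.png` into URLs a crawler can actually request.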

https://github.com/myvyang/chromium_for_spider

Dynamic crawler for web vulnerability scanners

chromium crawler puppeteer security spider

Last synced: 11 Jul 2025

https://github.com/amerkurev/scrapper

Web scraper with a simple REST API, running in Docker and using a headless browser and Readability.js for parsing.

crawler crawler-python crawling headless readability scraper scraping web-parsers web-parsing web-scraping

Last synced: 08 May 2025

https://github.com/cwjokaka/ok_ip_proxy_pool

🍿 Crawler proxy IP pool in Python 🍟 a pretty OK IP proxy pool

aiohttp async beautifulsoup4 crawler flask http ip pool proxy proxypool py python python3 spider sqlite

Last synced: 13 Apr 2025

https://github.com/vitorfs/woid

Simple news aggregator displaying top stories in real time

crawler django news

Last synced: 09 Apr 2025

https://github.com/mohammedcha/gplay-scraper

GPlay Scraper is a powerful Python Google Play scraper library for extracting comprehensive app data from the Google Play Store. Scrape Google Play Store apps to get ratings, install counts, reviews, ASO metrics, developer information, and 65+ data fields

android app-analytics crawler google google-play play-store play-store-api playstore scarper scraper

Last synced: 02 Mar 2026

https://github.com/ptt-alertor/ptt-alertor

:loudspeaker: Ptt article notification bot! Get notified of Ptt articles in real time

chatbot crawler linebot messenger-bot ptt telegram-bot

Last synced: 14 Jan 2026

https://github.com/kong36088/ZhihuSpider

Multi-threaded Zhihu user crawler based on Python 3

crawler multi-threading python python3 spider zhihu

Last synced: 19 Jul 2025

https://github.com/spatie/robots-txt

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

crawler php robots-txt

Last synced: 14 May 2025
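
spatie/robots-txt is a PHP library. For comparison, Python's standard library ships `urllib.robotparser`, which answers the robots.txt part of the same question (not robots meta tags or `X-Robots-Tag` headers). The sketch below parses an inline, made-up robots.txt instead of downloading one so that it runs offline. Note that the stdlib parser applies the first matching rule in file order, which is why the more specific `Allow` line is listed before the broader `Disallow`.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; it also marks the data as loaded,
    # so can_fetch() does not fall back to denying everything.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("MyCrawler/1.0", "https://example.com/private/secret.html"),
        ("MyCrawler/1.0", "https://example.com/private/public-page.html"),
        ("BadBot", "https://example.com/index.html"),
    ]:
        print(agent, url, rp.can_fetch(agent, url))
```

Against a live site you would instead call `set_url("https://example.com/robots.txt")` followed by `read()`, then query `can_fetch()` in the same way.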

https://github.com/ScottSloan/Bili23-Downloader

Download Bilibili videos, anime series, movies, documentaries, and other resources

bilibili crawler linux macos python videodownloader windows wxpython

Last synced: 16 Mar 2025

https://github.com/lgh06/web-page-monitor

Website page change monitor: get alerts when website pages are updated or changed.

change-alert change-detection change-monitor crawler monitor website-change-monitor website-monitoring

Last synced: 31 Oct 2025
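
One simple way for a page-change monitor to decide whether a page changed is to compare a hash of its content with the hash from the previous check. The Python sketch below is an assumption-based illustration, not how web-page-monitor works: it collapses whitespace before hashing with SHA-256 and keeps fingerprints in memory, whereas a real service would persist them and would probably hash only the relevant part of the page.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(content: str) -> str:
    # Collapse whitespace so trivial reformatting does not count as a change.
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChangeMonitor:
    """Remembers the last fingerprint seen for each URL (in memory only)."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def check(self, url: str, content: str) -> bool:
        """Return True if `content` differs from what was last seen for `url`.

        The first observation of a URL only records a baseline and returns False.
        """
        digest = fingerprint(content)
        previous: Optional[str] = self._last.get(url)
        self._last[url] = digest
        return previous is not None and previous != digest
```

Storing only a digest keeps memory use constant per URL, at the cost of not being able to show what changed.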

https://github.com/R4yGM/dorkscout

DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets

bug-bounty crawler ghdb golang google-dorks osint scraper security

Last synced: 11 Jul 2025

https://github.com/zhaotianff/csharpcrawler

C# crawler sample programs for anyone who wants to learn the basics of web crawling. More crawler-related material will be added gradually.

crawler csharp wpf

Last synced: 09 Apr 2025

https://github.com/vormkracht10/laravel-seo-scanner

Scan your Laravel application routes for SEO improvement suggestions.

crawler laravel laravel-framework laravel-seo laravel-seo-scanner scanner seo seo-optimization seo-tools seotools

Last synced: 15 Apr 2025

https://github.com/redco/goose-parser

Universal scraping tool, which allows you to extract data using multiple environments

browser crawler docker goose jsdom nodejs parser parsing phantomjs scraper scraping

Last synced: 09 Apr 2025

https://github.com/tufayellus/linkedin-scraper

A LinkedIn scraper that scrapes up to 1k LinkedIn profiles (due to LinkedIn's limit) from company profile links and saves their e-mail addresses if available! (Actively maintained; if anything doesn't work, open an issue in the repo.)

crawler digital-marketing email-marketing email-scraper leads linkedin linkedin-bot linkedin-gui linkedin-scraper linkedin-scraper-gui scrape-email scrape-emails scraper scraper-engine

Last synced: 27 Oct 2025

https://github.com/crawlab-team/crawlab-lite

Lite version of Crawlab: a lightweight Crawlab crawler management platform.

crawlab crawler crawler-management crawling-tasks platform scrapy scrapy-ui scrapyd scrapyd-ui spider web-crawler

Last synced: 28 Jan 2026

https://github.com/zhaow-de/rotating-tor-http-proxy

A multi-arch image that provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.

amd64 arm64 armv6 armv7 crawler docker-image dockerhub-image haproxy multi-platform privoxy-tor proxy tor

Last synced: 13 Feb 2026

https://github.com/gaussic/weibo_wordcloud

Scrape Weibo data by keyword, then generate a word cloud

crawler keyword search weibo wordcloud

Last synced: 27 Jun 2025

https://github.com/icy/google-group-crawler

[Deprecated] Get (almost) original messages from google group archives. Your data is yours.

bash cookie crawler curl google ownership wget

Last synced: 10 Apr 2025

https://github.com/linkedtales/scrapedin-linkedin-crawler

Crawler for LinkedIn full profiles 2019

crawler linkedin linkedin-crawler

Last synced: 08 Apr 2025

https://github.com/forcefledgling/proxyhub

An advanced [Finder | Checker | Server] tool for proxy servers, supporting both HTTP(S) and SOCKS protocols. 🎭

anonymity anonymous crawler free-proxy free-proxy-list http-proxy privacy proxies proxy proxy-checker proxy-grabber proxy-list proxy-scraper proxy-scrapper proxy-server proxy-tool proxypool socks socks4 socks5

Last synced: 05 Oct 2025

https://github.com/crypto-crawler/crypto-crawler-rs

A rock-solid cryptocurrency crawler library.

crawler cryptocurrency websocket

Last synced: 12 Dec 2025

https://github.com/songtianyi/laosj

Lightweight Golang image crawler

aiss crawler douban downloader girls image meizitu sexy spiders

Last synced: 16 Jan 2026

https://github.com/Norconex/crawler

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.

collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler

Last synced: 11 Jun 2026

https://github.com/macacajs/NoSmoke

A cross-platform UI crawler that scans view trees, then generates and executes UI test cases.

android crawler ios macaca smoke-tests test-automation webdriver

Last synced: 15 Apr 2025

https://github.com/mgleon08/instagram-crawler

Crawl instagram photos, posts and videos for download.

crawler gem instagram instagram-crawler instagram-scraper ruby rubygems scraper

Last synced: 06 Apr 2025

https://github.com/webysther/packagist-mirror

📦✂️📋📦 Create a mirror of packagist.org metadata for use locally with composer

composer composer-packages crawler mirror packagist packagist-mirror php

Last synced: 30 Dec 2025

https://github.com/subins2000/search

An Open Source Search Engine

crawler php search search-engine

Last synced: 09 Apr 2025

https://github.com/Josue87/MetaFinder

Search for documents in a domain through search engines (Google, Bing, and Baidu) in order to extract their metadata.

crawler metadata osint

Last synced: 12 Jul 2025

https://github.com/elliotxx/zhihu-crawler-people

A simple distributed crawler for Zhihu, plus data analysis

crawler python python-crawler spider web-crawler web-spider

Last synced: 13 Apr 2025

https://github.com/cocrawler/cocrawler

CoCrawler is a versatile web crawler built using modern tools and concurrency.

aiohttp aiohttp-client async-python concurrency crawler pluggable-modules python3 screenshot warc

Last synced: 14 Dec 2025

https://github.com/codesofun/web-bee

🐝 Web vertical crawler framework for fun

crawler framework java java-8 webbee

Last synced: 13 Apr 2025

https://github.com/gosom/scrapemate

Golang Crawling and scraping framework

crawler go go-framework golang scraper spider web-crawler web-scraping

Last synced: 31 Jan 2026

https://github.com/AnyISalIn/zhihu_fun

Selenium-based Zhihu keyword crawler

crawler python python3 selenium zhihu

Last synced: 27 Mar 2025

https://github.com/ma63d/leetcode-spider

Use Node.js to crawl the source code of your own LeetCode solutions

algorithm co crawler leetcode nodejs

Last synced: 09 Apr 2025

https://github.com/nfx/slrp

rotating open proxy multiplexer

crawler golang proxy proxy-checker proxy-list proxy-pool proxy-server

Last synced: 04 Apr 2025
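
To illustrate what rotating proxies means in practice, here is a minimal round-robin rotator in Python. It is a hypothetical sketch rather than slrp's design (slrp is written in Go): the `ProxyRotator` class, its `max_failures` threshold, and the proxy URLs are all made up for the example, and it only chooses proxies; it does not send any requests.

```python
from typing import Dict, List, Optional


class ProxyRotator:
    """Round-robin over a pool of proxy URLs, dropping ones that keep failing."""

    def __init__(self, proxies: List[str], max_failures: int = 3) -> None:
        self._proxies = list(dict.fromkeys(proxies))  # de-duplicate, keep order
        self._failures: Dict[str, int] = {p: 0 for p in self._proxies}
        self._max_failures = max_failures
        self._index = 0

    def next_proxy(self) -> Optional[str]:
        """Return the next proxy in rotation, or None if the pool is empty."""
        if not self._proxies:
            return None
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def report_failure(self, proxy: str) -> None:
        """Count a failure; evict the proxy after `max_failures` in a row."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._proxies.remove(proxy)
            del self._failures[proxy]

    def report_success(self, proxy: str) -> None:
        """Reset the consecutive-failure count after a successful request."""
        if proxy in self._failures:
            self._failures[proxy] = 0
```

Resetting the counter on success means only consecutive failures evict a proxy, which tolerates the occasional timeout that free proxies tend to produce.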

https://github.com/bytebuff/scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code, aiming to add one new crawler per week.

appium crawler docker requests scrapy spider

Last synced: 07 Apr 2026

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that outputs to Entity Framework Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is built to be extended for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/jarryshaw/darc

Darkweb Crawler Project

crawler darkweb

Last synced: 18 Jun 2025

https://github.com/Jiramew/spoon

🥄 A package for building specific Proxy Pool for different Sites.

crawler distributed ip proxies proxy proxy-provider proxypool python redis spider spoon

Last synced: 07 Apr 2025

https://github.com/norconex/crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.

collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler

Last synced: 09 Jun 2026

https://github.com/N0taN3rd/Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving

Last synced: 06 Apr 2025

https://github.com/guilhermecgs/ir

Project that automatically calculates income tax (Imposto de Renda) on Bovespa trades. Tags: electronic investor channel (canal eletrônico do investidor), CEI, selenium, bovespa, IRPF, IR, income tax, finance, yahoo finance, stocks, FII, ETF, python, crawler, webscraping, IR calculator

acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping

Last synced: 26 Apr 2025

https://github.com/karust/gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

commoncrawl concurrency crawler golang wayback-machine webarchive

Last synced: 15 Jan 2026

https://github.com/fanhuaandluomu/pkulaw_spider

Crawl the PKULaw website: http://www.pkulaw.cn/Case/

ai crawler law python-2 spider

Last synced: 12 Sep 2025

https://github.com/zhangbohan/fun_crawler

Crawl some pictures for fun

crawler meizitu python spider

Last synced: 20 Aug 2025

https://github.com/cytopia/urlbuster

Powerful mutable web directory fuzzer to bruteforce existing and/or hidden files or directories.

brute-force bruteforce bruteforce-attacks crawler cytopia-sec url-bruteforcer

Last synced: 09 Apr 2025

https://github.com/stulzq/HttpCode.Core

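Simple, easy to use, and efficient: an opinionated open-source .NET HTTP request framework! It can be used to build crawlers, make API requests, and more.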

crawler httpcode httpmock httprequest net-core net-standard

Last synced: 04 May 2025

https://github.com/chenjiandongx/soksaccounts

🔥 Shadowsocks account crawler

crawler shadowsocks

Last synced: 22 Apr 2025

https://github.com/beb7/gflare-tk

Open-Source Python Based SEO Web Crawler

crawler python robots-txt scraper seo seo-crawler tkinter

Last synced: 07 May 2025

https://github.com/vinaygopinath/ngmeta

Dynamic meta tags in your AngularJS single page application

angularjs crawler meta-tags opengraph seo ui-router

Last synced: 22 Oct 2025

https://github.com/clarketm/s3recon

Amazon S3 bucket finder and crawler.

crawler finder python recon s3 s3-bucket

Last synced: 07 Apr 2025