Crawler | Ecosyste.ms: Awesome

https://github.com/twtrubiks/line-bot-tutorial

line-bot-tutorial using Python Flask

bot crawler heroku line ptt python-flask tutorial

Last synced: 16 May 2025

https://github.com/s0rg/crawley

The unix-way web crawler

cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider

Last synced: 16 May 2025

https://github.com/jairovadillo/pychromeless

Python Lambda Chrome Automation (naming pending)

automation aws-lambda chrome chromium crawler python selenium

Last synced: 12 Mar 2026

https://github.com/flairNLP/fundus

A very simple news crawler with a funny name

cc-news commoncrawl corpus crawler news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping

Last synced: 04 Mar 2025

https://github.com/GraySilver/wencai

This is a wencai crawler (a Pythonic toolkit for iWencai's strategy-backtesting interface).

crawler finance pandas quant quantitative-finance tushare wencai

Last synced: 27 Mar 2025

https://github.com/oppsec/pinkerton

🕵️ JavaScript file crawler and secret finder tool developed with Python

crawl crawler hacktoberfest javascript pentest python python3 redteam secrets

Last synced: 31 Mar 2025

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 16 May 2025

https://github.com/mustafadalga/instagram-bot

An Instagram bot developed using the Selenium Framework

automation automation-selenium bot bulk-comments bulk-unfollow crawler crawling download-stories instagram instagram-api instagram-bot instagram-downloader instagram-without-api mass-liking python python3 selenium selenium-framework selenium-python selenium-webdriver

Last synced: 02 Oct 2025

https://github.com/eight04/comiccrawler

An image crawler written in Python.

cli crawler gui image-crawler python tkinter

Last synced: 15 May 2025

https://github.com/viasite/site-audit-seo

Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv, xlsx

audit cli crawl-site crawler lighthouse puppeteer scraper seo seo-audit seo-site-audit site-audit xlsx

Last synced: 14 Mar 2026

https://github.com/BlessedRebuS/Krawl

Krawl is a customizable, lightweight, cloud-native web deception server and anti-crawler that creates fake web applications with low-hanging vulnerabilities and realistic, randomly generated decoy data

anti-crawling blue-team cloud-native crawler cybersecurity deception honeypot kubernetes security self-hosted spider web

Last synced: 11 Feb 2026

https://github.com/devanshbatham/Gorecon

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester or bug hunter might want to consider adding to their arsenal

admin-panel-finder backups-finder cmsdetecter configurationfiles crawler directory-bruteforce dns dnsrecon email-hunter geo-ip nameserver recon reconaissance reverse-dns scanner subdomain-enumeration subdomain-scanner subnet-lookup whois-lookup wordpress-scanner

Last synced: 03 Apr 2025

https://github.com/chenjiandongx/github-spider

GitHub repository and user analysis crawler

crawler github scrapy

Last synced: 13 Apr 2025

https://github.com/Jasonnor/th-music-video-generator

Touhou Project random music video generator/player that crawls images and videos from websites to generate MVs.

crawler javascript music-video touhou web

Last synced: 27 Apr 2025

https://github.com/zhupingqi/RuiJi.Net

Crawler framework: distributed crawler and extractor

crawler extractor headless-chrome netcore owin scraper scrapy

Last synced: 04 May 2025

https://github.com/algolia/algoliasearch-netlify

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler

algolia algolia-crawler algoliasearch crawler jamstack netlify netlify-plugin search

Last synced: 03 Oct 2025

https://github.com/hezhizheng/go-movies

Golang movie spider/crawler

colly crawler docker fasthttp go gocolly golang movies redis spider

Last synced: 16 May 2025

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 28 Mar 2025

https://github.com/lucasjinreal/weibo_terminator_workflow

Updated version of weibo_terminator; this workflow version aims to get the job done!

crawler nlp scraper sentiment-analysis weibo-terminator

Last synced: 05 Mar 2026

https://github.com/antchfx/antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

crawler crawling framework golang scraping web-crawler web-spider

Last synced: 14 Mar 2025

https://github.com/zntfdr/Selenops

A Swift Web Crawler 🕷

command-line-tool crawler scripting swift web

Last synced: 18 Jul 2025

https://github.com/rodrigogs/xvideos

xvideos API library

api crawler library nodejs npm porn scrapper xvideos

Last synced: 26 Apr 2026

https://github.com/outpoot/vyntr

Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com

crawler duckduckgo engine google python rust search tantivy web

Last synced: 15 May 2025

https://github.com/xyntax/filesensor

Crawler-based dynamic detection tool for sensitive files

crawler fuzzing pentesting scrapy

Last synced: 03 Sep 2025

https://github.com/zrashwani/arachnid

Crawl all unique internal links found on a given website and extract SEO-related information; supports JavaScript-based sites

crawler php scraping seo

Last synced: 13 Jan 2026
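
arachnid is a PHP library, and its code is not reproduced here. As a rough illustration of its core idea, the Python sketch below collects the unique internal links on a single page that has already been downloaded. It makes three assumptions: "internal" means "same host as the page URL", #fragments are ignored, and no JavaScript is executed. Because of the last point, it would miss links that JavaScript-based sites build at runtime. LinkCollector and internal_links are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url: str, html: str) -> List[str]:
    """Return unique same-host http(s) links found in html, in first-seen order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen: Set[str] = set()
    result: List[str] = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop #fragments,
        # so "/about" and "/about#team" count as the same page.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    sample = (
        '<a href="/about">About</a> <a href="/about#team">Team</a> '
        '<a href="https://other.org/">Elsewhere</a> '
        '<a href="mailto:hi@example.com">Mail</a> <a href="contact.html">Contact</a>'
    )
    print(internal_links("https://example.com/blog/", sample))
    # ['https://example.com/about', 'https://example.com/blog/contact.html']
```

A full site crawl would repeat this for every newly found URL and keep track of the pages it has already fetched.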

https://github.com/6677-ai/tap4-ai-crawler

The crawler open-sourced by tap4.ai

aitoolkit aitools crawler crawler-engine crawler-python

Last synced: 16 May 2025

https://github.com/turnersoftware/infinitycrawler

A simple but powerful web crawler library for .NET

crawler robots-txt spider web-crawler web-crawling

Last synced: 21 Jun 2025

https://github.com/dwisiswant0/galer

A fast tool to fetch URLs from HTML attributes by crawling.

crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls

Last synced: 12 Apr 2025

https://github.com/myvyang/chromium_for_spider

Dynamic crawler for web vulnerability scanners

chromium crawler puppeteer security spider

Last synced: 11 Jul 2025

https://github.com/amerkurev/scrapper

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

crawler crawler-python crawling headless readability scraper scraping web-parsers web-parsing web-scraping

Last synced: 08 May 2025

https://github.com/cwjokaka/ok_ip_proxy_pool

🍿 Crawler proxy IP pool in Python 🍟 A pretty OK IP proxy pool

aiohttp async beautifulsoup4 crawler flask http ip pool proxy proxypool py python python3 spider sqlite

Last synced: 13 Apr 2025

https://github.com/sudheer-ranga/aliexpress-product-scraper

Get AliExpress product details as a JSON response, including feedback, variants, shipping info, description, images, etc.

aliexpress aliexpress-api aliexpress-crawler aliexpress-product-json aliexpress-product-scraper aliexpress-scraper aliexpress-spider crawler dropship dropshipping hacktoberfest hacktoberfest19 hacktoberfest2019 product-json product-reviews product-scraper scraper spider

Last synced: 06 Apr 2025

https://github.com/vitorfs/woid

Simple news aggregator displaying top stories in real time

crawler django news

Last synced: 09 Apr 2025

https://github.com/mohammedcha/gplay-scraper

GPlay Scraper is a powerful Python Google Play scraper library for extracting comprehensive app data from the Google Play Store. Scrape Google Play Store apps to get ratings, install counts, reviews, ASO metrics, developer information, and 65+ data fields

android app-analytics crawler google google-play play-store play-store-api playstore scarper scraper

Last synced: 02 Mar 2026

https://github.com/ovnrain/javbus-api

A self-hosted JavBus API service

adults api api-server crawler docker javbus magnet nodejs spider typescript vercel vercel-deployment

Last synced: 09 Apr 2025

https://github.com/ptt-alertor/ptt-alertor

:loudspeaker: Ptt article notification bot! Get notified of Ptt articles in real time

chatbot crawler linebot messenger-bot ptt telegram-bot

Last synced: 14 Jan 2026

https://github.com/dwisiswant0/gf-secrets

Secret and/or credential patterns used for gf.

alienvault-otx bugbounty crawler gau gf gitleaks infosec open-threat-exchange secrets-detection trufflehog trufflehog3 wayback wayback-machine waybackurl

Last synced: 20 Jul 2025
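
gf-secrets provides pattern files for the gf grep wrapper. The sketch below does not use those files. It shows the same general idea in Python: run a set of named regular expressions over crawled text and report every match. The two patterns are illustrative assumptions: the AWS access key ID format and a PEM private-key header. A real pattern set is much larger. PATTERNS and find_secrets are hypothetical names.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; these are NOT the gf-secrets pattern files.
PATTERNS = {
    # AWS access key IDs: AKIA (long-term) or ASIA (temporary) + 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Header line of a PEM-encoded private key.
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, pattern name, matched text) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, name, match.group(0)))
    return hits


if __name__ == "__main__":
    crawled_js = 'const key = "AKIAIOSFODNN7EXAMPLE";\n// -----BEGIN RSA PRIVATE KEY-----\nconst ok = 1;'
    for hit in find_secrets(crawled_js):
        print(hit)
```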

https://github.com/kong36088/ZhihuSpider

Multithreaded Zhihu user crawler, based on Python 3

crawler multi-threading python python3 spider zhihu

Last synced: 19 Jul 2025

https://github.com/spatie/robots-txt

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

crawler php robots-txt

Last synced: 14 May 2025
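
spatie/robots-txt is a PHP package, and it also checks robots meta tags and headers. Python's standard library covers the robots.txt part on its own through urllib.robotparser. The sketch below parses a made-up, inline robots.txt instead of downloading one, so it runs offline. build_parser, may_crawl, and the user-agent names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt body; a real crawler would download it from
# https://<host>/robots.txt (RobotFileParser.set_url() + read()).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(may_crawl(rules, "MyCrawler", "https://example.com/blog/post"))  # True
    print(may_crawl(rules, "MyCrawler", "https://example.com/private/x"))  # False
    print(may_crawl(rules, "BadBot", "https://example.com/blog/post"))     # False
```

The wildcard group applies to any agent without its own group, which is why "MyCrawler" is only kept out of /private/ while "BadBot" is blocked everywhere.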

https://github.com/ScottSloan/Bili23-Downloader

Download Bilibili videos, anime series, movies, documentaries, and other resources

bilibili crawler linux macos python videodownloader windows wxpython

Last synced: 16 Mar 2025

https://github.com/lgh06/web-page-monitor

Website page change monitor: monitors web pages for updates and changes and sends alerts.

change-alert change-detection change-monitor crawler monitor website-change-monitor website-monitoring

Last synced: 31 Oct 2025
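
One common way to detect page changes is to fingerprint each fetched page and compare the result with the fingerprint stored from the previous check. This approach is an assumption here and is not taken from web-page-monitor's code. The sketch collapses whitespace before hashing, so reflowed markup on its own is not reported as a change. It leaves out fetching, scheduling, and alerting. fingerprint and check_for_change are hypothetical names.

```python
import hashlib
import re
from typing import Optional, Tuple


def fingerprint(page_text: str) -> str:
    """SHA-256 of the page text after collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", page_text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_for_change(previous: Optional[str], page_text: str) -> Tuple[bool, str]:
    """Return (changed, new_fingerprint); the first check only records a baseline."""
    current = fingerprint(page_text)
    changed = previous is not None and previous != current
    return changed, current


if __name__ == "__main__":
    _, stored = check_for_change(None, "<p>Price: 10</p>")  # baseline
    changed, stored = check_for_change(stored, "<p>Price:   10</p>")
    print(changed)  # False: only whitespace differs
    changed, stored = check_for_change(stored, "<p>Price: 12</p>")
    print(changed)  # True
```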

https://github.com/R4yGM/dorkscout

DorkScout: a Golang tool to automate Google dork scans against the entire internet or specific targets

bug-bounty crawler ghdb golang google-dorks osint scraper security

Last synced: 11 Jul 2025

https://github.com/zhaotianff/csharpcrawler

C# crawler sample programs. If you want to learn the basics of crawling, take a look. More crawler-related material will be added gradually.

crawler csharp wpf

Last synced: 09 Apr 2025

https://github.com/vormkracht10/laravel-seo-scanner

Scan your Laravel application routes for SEO improvement suggestions.

crawler laravel laravel-framework laravel-seo laravel-seo-scanner scanner seo seo-optimization seo-tools seotools

Last synced: 15 Apr 2025

https://github.com/redco/goose-parser

Universal scraping tool, which allows you to extract data using multiple environments

browser crawler docker goose jsdom nodejs parser parsing phantomjs scraper scraping

Last synced: 09 Apr 2025

https://github.com/tufayellus/linkedin-scraper

A LinkedIn scraper that scrapes up to 1k LinkedIn profiles (due to LinkedIn's limit) from company profile links and saves their e-mail addresses if available! (Actively maintained; if anything doesn't work, open an issue in the repo.)

crawler digital-marketing email-marketing email-scraper leads linkedin linkedin-bot linkedin-gui linkedin-scraper linkedin-scraper-gui scrape-email scrape-emails scraper scraper-engine

Last synced: 27 Oct 2025

https://github.com/crawlab-team/crawlab-lite

Lite version of Crawlab: a lightweight Crawlab crawler management platform

crawlab crawler crawler-management crawling-tasks platform scrapy scrapy-ui scrapyd scrapyd-ui spider web-crawler

Last synced: 28 Jan 2026

https://github.com/zhaow-de/rotating-tor-http-proxy

A multi-arch image provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.

amd64 arm64 armv6 armv7 crawler docker-image dockerhub-image haproxy multi-platform privoxy-tor proxy tor

Last synced: 13 Feb 2026

https://github.com/kirralabs/indonesian-NLP-resources

Data resources for Indonesian-language NLP

corpus corpus-linguistics crawler dataset dependency-parser indonesian indonesian-language named-entity-recognition nlp parallel-corpus pos-tagging sentiment-analysis

Last synced: 15 Apr 2025

https://github.com/gaussic/weibo_wordcloud

Scrape Weibo data by keyword, then generate a word cloud

crawler keyword search weibo wordcloud

Last synced: 27 Jun 2025

https://github.com/icy/google-group-crawler

[Deprecated] Get (almost) original messages from google group archives. Your data is yours.

bash cookie crawler curl google ownership wget

Last synced: 10 Apr 2025

https://github.com/linkedtales/scrapedin-linkedin-crawler

Crawler for LinkedIn full profiles 2019

crawler linkedin linkedin-crawler

Last synced: 08 Apr 2025

https://github.com/forcefledgling/proxyhub

An advanced [Finder | Checker | Server] tool for proxy servers, supporting both HTTP(S) and SOCKS protocols. 🎭

anonymity anonymous crawler free-proxy free-proxy-list http-proxy privacy proxies proxy proxy-checker proxy-grabber proxy-list proxy-scraper proxy-scrapper proxy-server proxy-tool proxypool socks socks4 socks5

Last synced: 05 Oct 2025

https://github.com/crypto-crawler/crypto-crawler-rs

A rock-solid cryptocurrency crawler library.

crawler cryptocurrency websocket

Last synced: 12 Dec 2025

https://github.com/songtianyi/laosj

Lightweight Golang image crawler

aiss crawler douban downloader girls image meizitu sexy spiders

Last synced: 16 Jan 2026

https://github.com/Norconex/crawler

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.

collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler

Last synced: 11 Jun 2026

https://github.com/macacajs/NoSmoke

A cross-platform UI crawler that scans view trees, then generates and executes UI test cases.

android crawler ios macaca smoke-tests test-automation webdriver

Last synced: 15 Apr 2025

https://github.com/mgleon08/instagram-crawler

Crawl instagram photos, posts and videos for download.

crawler gem instagram instagram-crawler instagram-scraper ruby rubygems scraper

Last synced: 06 Apr 2025

https://github.com/webysther/packagist-mirror

📦✂️📋📦 Create a mirror of packagist.org metadata for use locally with composer

composer composer-packages crawler mirror packagist packagist-mirror php

Last synced: 30 Dec 2025

https://github.com/subins2000/search

An Open Source Search Engine

crawler php search search-engine

Last synced: 09 Apr 2025

https://github.com/Josue87/MetaFinder

Search for documents in a domain through search engines (Google, Bing, and Baidu). The objective is to extract metadata.

crawler metadata osint

Last synced: 12 Jul 2025

https://github.com/0xsha/chainwalker

Rapid Smart Contract Crawler

blockchain crawler dataset evm-bytecode geth security smart-contracts web3

Last synced: 09 Mar 2026

https://github.com/elliotxx/zhihu-crawler-people

A simple distributed crawler for Zhihu, plus data analysis

crawler python python-crawler spider web-crawler web-spider

Last synced: 13 Apr 2025

https://github.com/cocrawler/cocrawler

CoCrawler is a versatile web crawler built using modern tools and concurrency.

aiohttp aiohttp-client async-python concurrency crawler pluggable-modules python3 screenshot warc

Last synced: 14 Dec 2025
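
CoCrawler's own code is not reproduced here. The sketch below only illustrates the general pattern of a concurrent crawler built on Python's asyncio. A shared queue serves as the URL frontier, a "seen" set prevents revisiting pages, and a fixed number of worker tasks bounds concurrency. The in-memory FAKE_WEB dictionary and the fetch_links function are hypothetical stand-ins for real HTTP fetching (for example with aiohttp) and link extraction.

```python
import asyncio
from typing import Dict, List, Set

# Hypothetical in-memory "web" (page -> outgoing links) standing in for HTTP.
FAKE_WEB: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


async def fetch_links(url: str) -> List[str]:
    """Stand-in for downloading a page and extracting its links."""
    await asyncio.sleep(0.01)  # simulated network latency
    return FAKE_WEB.get(url, [])


async def crawl(start: str, max_concurrency: int = 2) -> Set[str]:
    """Visit every page reachable from start using max_concurrency workers."""
    seen: Set[str] = {start}
    frontier = asyncio.Queue()
    frontier.put_nowait(start)

    async def worker() -> None:
        while True:
            url = await frontier.get()
            try:
                for link in await fetch_links(url):
                    if link not in seen:  # dedupe before enqueueing
                        seen.add(link)
                        frontier.put_nowait(link)
            finally:
                frontier.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
    await frontier.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/"))))  # ['/', '/a', '/b', '/c']
```

Each worker enqueues new links before it marks its own item done. As a result, the queue's unfinished-task count never reaches zero while work remains, and join() does not return early.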

https://github.com/codesofun/web-bee

🐝 Web vertical crawler framework for fun

crawler framework java java-8 webbee

Last synced: 13 Apr 2025

https://github.com/gosom/scrapemate

Golang crawling and scraping framework

crawler go go-framework golang scraper spider web-crawler web-scraping

Last synced: 31 Jan 2026

https://github.com/AnyISalIn/zhihu_fun

Selenium-based Zhihu keyword crawler

crawler python python3 selenium zhihu

Last synced: 27 Mar 2025

https://github.com/ma63d/leetcode-spider

Use Node.js to crawl the source code of your own LeetCode solutions

algorithm co crawler leetcode nodejs

Last synced: 09 Apr 2025

https://github.com/nfx/slrp

rotating open proxy multiplexer

crawler golang proxy proxy-checker proxy-list proxy-pool proxy-server

Last synced: 04 Apr 2025
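
Several entries in this list maintain pools of proxies: slrp, ok_ip_proxy_pool, spoon, and proxyhub. The sketch below covers only the rotation step and is not based on any of those projects' code. It hands out proxies round-robin and evicts a proxy after a configurable number of consecutive failures. The ProxyRotator class and the proxy addresses are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class ProxyRotator:
    """Round-robin proxy pool that evicts a proxy after max_failures consecutive failures."""

    def __init__(self, proxies: Iterable[str], max_failures: int = 3) -> None:
        self._pool: Deque[str] = deque(proxies)
        self._failures: Dict[str, int] = {p: 0 for p in self._pool}
        self._max_failures = max_failures

    def next_proxy(self) -> str:
        if not self._pool:
            raise RuntimeError("proxy pool is empty")
        proxy = self._pool[0]
        self._pool.rotate(-1)  # move the proxy just handed out to the back
        return proxy

    def report_success(self, proxy: str) -> None:
        self._failures[proxy] = 0  # only consecutive failures count

    def report_failure(self, proxy: str) -> None:
        self._failures[proxy] = self._failures.get(proxy, 0) + 1
        if self._failures[proxy] >= self._max_failures and proxy in self._pool:
            self._pool.remove(proxy)

    def __len__(self) -> int:
        return len(self._pool)


if __name__ == "__main__":
    rotator = ProxyRotator(
        ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"],
        max_failures=2,
    )
    print([rotator.next_proxy() for _ in range(4)])  # .1, .2, .3, .1
    rotator.report_failure("http://10.0.0.2:8080")
    rotator.report_failure("http://10.0.0.2:8080")  # second failure evicts it
    print([rotator.next_proxy() for _ in range(2)])  # .3, .1
```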

https://github.com/bytebuff/scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code and aims to add one new example each week

appium crawler docker requests scrapy spider

Last synced: 07 Apr 2026

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is meant to be extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/jarryshaw/darc

Darkweb Crawler Project

crawler darkweb

Last synced: 18 Jun 2025

https://github.com/saeeddhqan/evine

Interactive CLI Web Crawler

cli crawler data-mining fuzzing go golang osint scraper web-crawler

Last synced: 12 Jan 2026

https://github.com/Jiramew/spoon

🥄 A package for building site-specific proxy pools for different sites.

crawler distributed ip proxies proxy proxy-provider proxypool python redis spider spoon

Last synced: 07 Apr 2025

https://github.com/norconex/crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.

collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler

Last synced: 09 Jun 2026

https://github.com/N0taN3rd/Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving

Last synced: 06 Apr 2025

https://github.com/guilhermecgs/ir

Project that automatically calculates income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, imposto de renda, finance, yahoo finance, acao, fii, etf, python, crawler, webscraping, calculadora ir

acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping