Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with spider
A curated list of projects in awesome lists tagged with spider.
https://github.com/wycm/zhihu-crawler
zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for a free HTTP proxy pool.
Last synced: 02 Aug 2024
https://github.com/oltarasenko/crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider
Last synced: 01 Aug 2024
https://github.com/spider-rs/spider
The fastest web crawler written in Rust. Maintained by @a11ywatch.
ai-scraping crawler headless-chrome indexer llm-crawler rust scraping spider web-crawler
Last synced: 31 Jul 2024
https://github.com/elixir-crawly/crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider
Last synced: 29 Sep 2024
https://github.com/madeindjs/spider
The fastest web crawler written in Rust. Maintained by @a11ywatch.
ai-scraping crawler headless-chrome indexer llm-crawler rust scraping spider web-crawler
Last synced: 03 Aug 2024
https://github.com/coder-hxl/x-crawl
x-crawl is a flexible Node.js AI-assisted crawler library that makes crawler work more efficient, intelligent and convenient. (v10 has been released.)
ai ai-crawl chromium crawl crawler fingerprint flexible javascript multifunction nodejs promise puppeteer spider typescript web
Last synced: 27 Sep 2024
https://github.com/kingschan1204/istock
👉 A Java stock crawler built on Spring Boot (A-shares only). If you ❤️ it, please ⭐ it. An upgraded V2 is under development!
bootstrap echarts java jqgrid mongodb spider spring-boot stock vue2
Last synced: 27 Sep 2024
https://github.com/postmodern/spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider
Last synced: 31 Jul 2024
https://github.com/wspl/creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
crawler cross-platform framework golang language script spider
Last synced: 31 Jul 2024
https://github.com/lb2281075105/python-spider
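Douban Movie Top 250, scraping JSON data from Douyu and crawling photos of beautiful women, Taobao, Youyuan, using CrawlSpider to scrape basic profile information of dating users on Hongniang plus distributed crawling of Hongniang with Redis storage, small crawler demos, Selenium, crawling Dmall, building APIs with Django, scraping Youyuan profiles, simulated logins to Zhihu, GitHub and Tuchong, crawling the entire Dmall store site, scraping historical articles of WeChat official accounts, scraping articles shared in WeChat groups or by WeChat friends, and using itchat to monitor articles shared by specified WeChat official accounts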
beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath
Last synced: 28 Sep 2024
https://github.com/lb2281075105/Python-Spider
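Douban Movie Top 250, scraping JSON data from Douyu and crawling photos of beautiful women, Taobao, Youyuan, using CrawlSpider to scrape basic profile information of dating users on Hongniang plus distributed crawling of Hongniang with Redis storage, small crawler demos, Selenium, crawling Dmall, building APIs with Django, scraping Youyuan profiles, simulated logins to Zhihu, GitHub and Tuchong, crawling the entire Dmall store site, scraping historical articles of WeChat official accounts, scraping articles shared in WeChat groups or by WeChat friends, and using itchat to monitor articles shared by specified WeChat official accounts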
beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath
Last synced: 30 Jul 2024
https://github.com/yutto-dev/yutto
🧊 A cute and willful Bilibili video downloader (bilili V2)
aiohttp asyncio bangumi bilibili coroutines cross-platform danmaku downloader spider video
Last synced: 31 Jul 2024
https://github.com/bookstairs/bookhunter
A download tool for crawling ebooks from the internet.
Last synced: 31 Jul 2024
https://github.com/xnx3/templatespider
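A website-ripping tool: pick a site you like, specify its URL, and it is automatically downloaded and turned into a template. Any site you see can be put to your own use!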
Last synced: 04 Aug 2024
https://github.com/zachleat/glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
font glyphs spider subset subsetting unicode web-fonts webfonts
Last synced: 31 Jul 2024
https://github.com/xuxueli/xxl-crawler
A distributed web crawler framework (distributed crawler framework XXL-CRAWLER).
crawler distributed flexible java object-oriented spider web xxl-crawler
Last synced: 02 Oct 2024
https://github.com/socialsisteryi/cxkitty
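An auto-answering helper for Chaoxing Xuexitong (watches videos and documents and simulates quiz answering; no browser or Tampermonkey required; runs fine in a container or on the host!)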
beautifulsoup4 chaoxing chaoxingmooc python3 spider terminal-ui xuexitong
Last synced: 03 Oct 2024
https://github.com/polyrabbit/hacker-news-digest
📰 Let ChatGPT Summarize Hacker News for You
chatgpt chatgpt-api crawler data-extraction extract-summaries hacker-news hacker-news-digest hacker-news-reader machine-learning news-aggregator openai openai-api python rss spider
Last synced: 01 Aug 2024
https://github.com/bit4woo/domain_hunter
A Burp Suite extension that tries to automatically find all subdomains, similar domains and related domains of an entire company or organization, based on observed traffic.
burp-extensions burp-plugin burpsuite-extender certificate certification domain-discovery domain-hunter domains https-certificate organization-domain related-domain similar-domain sitemap spider subdomain subject-alternative-name subject-name subjectaltname
Last synced: 01 Aug 2024
https://github.com/fengzhizi715/netdiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.
coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3
Last synced: 28 Sep 2024
https://github.com/fengzhizi715/NetDiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.
coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3
Last synced: 02 Aug 2024
https://github.com/signal18/replication-manager
Signal 18 repman - Replication Manager for MySQL / MariaDB / Percona Server
alerting backups configuration-management failover group-replication gtid haproxy kubernetes leader-election mariadb monitoring mysql opensvc proxysql replication semi-sync slave spider vip
Last synced: 27 Sep 2024
https://github.com/bahaabdelwahed/killshot
A Penetration Testing Framework, Information gathering tool & Website Vulnerability Scanner
auto-scanner cms exploit information-gathering joomla spider vulnerability vulnerability-detection vulnerability-scanner webapp-vul-scanner website-vulnerability-scanner wordpress wp-admin
Last synced: 31 Jul 2024
https://github.com/1061700625/WeChat_Article
Crawls articles from WeChat official accounts.
pyqt5 python3 spider wechat wechat-article
Last synced: 31 Jul 2024
https://github.com/3nock/spidersuite
Advanced web security spider/crawler
bugbounty cplusplus crawler gui information-gathering osint-tool pentest qt5 recon security-tools spider web-spider webcrawler
Last synced: 28 Sep 2024
https://github.com/speed/newcrawler
Free Web Scraping Tool with Java
crawler docker scraping spider
Last synced: 01 Aug 2024
https://github.com/josephlimtech/linkedin-profile-scraper-api
🕵️♂️ LinkedIn profile scraper returning structured profile data in JSON.
crawler crawling expressjs json linkedin linkedin-bot linkedin-crawler linkedin-profile linkedin-profile-scraper linkedin-scraper linkedin-scraping nodejs profile-data puppeteer scraper scrapers scraping scraping-websites spider website-scraper
Last synced: 26 Sep 2024
https://github.com/alltheplaces/alltheplaces
A set of spiders and scrapers to extract location information from places that post their location on the internet.
geojson hacktoberfest python scrapers scrapy spider
Last synced: 12 Aug 2024
https://github.com/wongzeon/ICP-Checker
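ICP filing lookup: queries the ICP filing information of a company or domain, automatically completes the slider CAPTCHA, and saves the results to an Excel spreadsheet. Works with the new MIIT filing management system website: no more repeated slider dragging, and no more pitfalls like a certain site's* tool that requires a VIP subscription just to view filing information.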
beian icp information-gathering information-security osint-tool python python3 spider
Last synced: 04 Aug 2024
https://github.com/zhuyingda/webster
A reliable, high-level web crawling & scraping framework for Node.js.
automation-test automation-ui chromium crawler crawling headless-chrome javascript javascript-framework nodejs nodejs-framework puppeteer scraping-framework spider
Last synced: 27 Sep 2024
https://github.com/spiritLHLS/Hang-up-items
A collection of survey projects, cloud server recommendations, idle "hang-up" projects (apps that earn passive income while left running), free proxies and assorted scripts. Click the bell icon in the upper right to receive updates promptly. (Please don't fork; keep a low profile.)
bitping cash earnapp earnfm honeygain income iproyal money myst p2pclient packetstream passive passiveincome pawns proxyrack shared spider traffmonetizer vps
Last synced: 01 Aug 2024
https://github.com/erma0/douyin
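Douyin crawler: collects public data such as account profiles, likes, favorites, original sounds, topics, search results, collections, posts, followings and followers.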
Last synced: 31 Jul 2024
https://github.com/dirtyfilthy/freshonions-torscraper
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
crawler darknet hidden-services onion scraper spider tor
Last synced: 01 Aug 2024
https://github.com/ChenZixinn/spider_reverse
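Crawler reverse-engineering case studies. Completed so far: TLS fingerprinting | Ruishu | Zkh | NetEase Yidun | WeChat mini-program decompilation and reversing (Baida Xingxi) | 10jqka (Tonghuashun) | RPC decryption | Jiasule | GeeTest slider CAPTCHA | Juliang Suanshu | Boss Zhipin | Qichacha | China Minmetals | QQ Music | Industrial Policy Big Data Platform | Qizhidao | Xueqiu (acw_sc__v2) | 1688 | Qimai Data | whggzy | Qiming Technology | mohurd | EntGroup Data | OKLink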
crawler python requests spider
Last synced: 31 Jul 2024
https://github.com/gadfly0x/signature_algorithm
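Request-signing and encryption algorithms for various apps, mini-programs and websites. Currently available: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, bangkokair (Bangkok Airways)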
crawler reverse-engineering spider
Last synced: 02 Aug 2024
https://github.com/asaotomo/FofaMap
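FofaMap is a cross-platform FOFA API data collector written in Python 3. It supports regular queries, website liveness checks, statistical aggregation queries, host aggregation queries, favicon queries, batch queries and more. FofaMap can also run custom FOFA queries, automatically deduplicate the results and filter them by keyword, and generate a corresponding Excel spreadsheet. In addition, the Spring Festival special edition can invoke Nuclei to run vulnerability scans against the targets FofaMap finds, putting you one step ahead in bug hunting.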
api bat excel fofa nuclei python3 scan spider
Last synced: 04 Aug 2024
https://github.com/ccforward/zhihu
✨ zhihu daily with Node.js, Vue.js ...
node-vue nodejs spider vue vue2-vuex-webpack zhihu-daily
Last synced: 01 Aug 2024
https://github.com/barretlee/kindleBookMaker
Kindle Book Maker with KindleGen: make books from RSS feeds, a single URL, a directory, and so on.
book-generator kindle kindlegen rss spider
Last synced: 01 Aug 2024
https://github.com/sethsec/celerystalk
An asynchronous enumeration & vulnerability scanner. Run all the tools on all the hosts.
celery enumeration gobuster nessus nikto nmap scanning screenshot spider subdomain virtual-hosts vulnerability-assessment vulnerability-scanners
Last synced: 01 Aug 2024
https://github.com/cyubuchen/free_proxy_website
A collection of websites that provide free SOCKS/HTTPS/HTTP proxies.
crawler free-proxy-list ip proxy proxy-checker spider
Last synced: 03 Aug 2024
https://github.com/Malwarize/webpalm
🕸️ Crawl across the web
crawler crawling data data-science datamining go golang hack mining osint redteam spider tool
Last synced: 01 Aug 2024
https://github.com/hemin1003/java-spider
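A hands-on Java crawler framework built on top of the webmagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (as a separately integrated feature) and others, implements automated crawling together with Elasticsearch, and is already in use in online production.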
elasticsearch scraper spider webmagic
Last synced: 04 Aug 2024
https://github.com/xiyuan-fengyu/ppspider
A web spider framework built on Puppeteer, offering flexible task-queue management and task scheduling via decorators, convenient data storage (NeDB/MongoDB), and data visualization and user interaction.
angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript
Last synced: 26 Sep 2024
https://github.com/IAmStoxe/urlgrab
A golang utility to spider through a website searching for additional links.
Last synced: 01 Aug 2024
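Link discovery of the kind urlgrab performs starts with one step: parse a page and collect every absolute URL it points to. The sketch below is not urlgrab's code (urlgrab is written in Go); it is a minimal Python illustration using only the standard library (`html.parser`, `urllib.parse`). The names `LinkExtractor` and `extract_links` are hypothetical, and the sketch only reads `href` and `src` attributes and keeps http(s) links.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from href/src attributes of one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []      # unique links, in document order
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attribute values may be None (e.g. a bare attribute); skip those.
            if name not in ("href", "src") or not value:
                continue
            # Resolve relative links against the page URL and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(self.base_url, value))
            # Ignore mailto:, javascript:, etc., and links already recorded.
            if absolute.startswith(("http://", "https://")) and absolute not in self._seen:
                self._seen.add(absolute)
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the unique absolute links found in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about#team">About</a> <img src="logo.png"> <a href="mailto:x@y.z">Mail</a>'
    for link in extract_links(page, "https://example.com/index.html"):
        print(link)
```

A full spider repeats this for each newly found page; deduplication by the fragment-free URL keeps `/about` and `/about#team` from being treated as two pages.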
https://github.com/xdavidhu/portspider
🕷 A lightning fast multithreaded network scanner framework with modules.
multi-threading networking portscan python scanner spider
Last synced: 01 Aug 2024
https://github.com/xdavidhu/portSpider
🕷 A lightning fast multithreaded network scanner framework with modules.
multi-threading networking portscan python scanner spider
Last synced: 30 Jul 2024
https://github.com/TRHX/Python3-Spider-Practice
Hands-on Python 3 practice with various spiders: JS reverse engineering, anti-anti-scraping, CAPTCHA handling, automated login/check-in/lottery entry, and data visualization.
jsreverse python python3-spider-practice spider spiders
Last synced: 03 Aug 2024
https://github.com/Symbo1/wsltools
Web Scan Lazy Tools - Python Package
crawling-framework package python-package scanner-web security security-audit security-automation security-scanner security-tool security-tools spider spider-framework web-vulnerability-scanner
Last synced: 04 Aug 2024
https://github.com/f111fei/article_spider
A crawler for WeChat official accounts.
javascript spider typescript wechat
Last synced: 08 Aug 2024
https://github.com/DoiiarX/NLCISBNPlugin
A calibre source/metadata plugin based on ISBN lookup at the National Library of China.
calibre-plugin isbn metadata spider
Last synced: 31 Jul 2024
https://github.com/whxaxes/node-test
Nodejs demo
bigpipe nodejs pjax spider websockets
Last synced: 01 Oct 2024
https://github.com/infinilabs/crawler
🕷️ An easy-to-use spider written in Golang (previously named GOPA).
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 04 Aug 2024
https://github.com/yields/ant
A web crawler for Go
go golang scraper spider web-crawler
Last synced: 03 Oct 2024
https://github.com/cxapython/mybackup-IT
Backups of technical articles on Android, JS and assembly, and the corresponding reverse engineering.
android frida javascript spider
Last synced: 31 Jul 2024
https://github.com/kong36088/ZhihuSpider
A multithreaded Zhihu user crawler based on Python 3.
crawler multi-threading python python3 spider zhihu
Last synced: 07 Aug 2024
https://github.com/hss01248/ImageLoader
A wrapper for Glide v4: a solution for image loading and large-image preview, a debugging tool for ImageView, and image spiders on Android.
fresco glide glidev4 imageloader spider
Last synced: 02 Aug 2024
https://github.com/manimozaffar/linkedin-scraper
A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel.
bot browser-fingerprint browser-fingerprinting chatgpt chatgpt-api cralwer fastapi linkedin linkedin-bot playwright python scraper scraping spider sqlalchemy
Last synced: 30 Sep 2024
https://github.com/fnk0c/cangibrina
A fast and powerful dashboard (admin) finder
Last synced: 04 Aug 2024
https://github.com/mryuan0428/house-price-prediction
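A complete house-price prediction project: 1. crawl data from Lianjia; 2. after processing, build models to predict house prices with several logistic regression machine-learning models from sklearn and a Keras neural network. In the end the neural network performed better, with an R^2 of about 0.75.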
house-price-prediction keras machine-learning sklearn spider
Last synced: 26 Sep 2024
https://github.com/dwisiswant0/galer
A fast tool to fetch URLs from HTML attributes by crawling.
crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls
Last synced: 01 Aug 2024
https://github.com/ManiMozaffar/linkedIn-scraper
A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel.
bot browser-fingerprint browser-fingerprinting chatgpt chatgpt-api cralwer fastapi linkedin linkedin-bot playwright python scraper scraping spider sqlalchemy
Last synced: 01 Aug 2024
https://github.com/QIN2DIM/sspanel-mining
🥤 Collect, clean, classify, and store exposed SSPanel-Uim sites on the Internet.
python search-engine selenium spider sspanel sspanel-mining sspanel-uim
Last synced: 01 Aug 2024
https://github.com/qin2dim/sspanel-mining
🥤 Collect, clean, classify, and store exposed SSPanel-Uim sites on the Internet.
python search-engine selenium spider sspanel sspanel-mining sspanel-uim
Last synced: 01 Oct 2024
https://github.com/ovnrain/javbus-api
A self-hosted JavBus API service.
adults api api-server crawler docker javbus magnet nodejs spider typescript vercel vercel-deployment
Last synced: 04 Aug 2024
https://github.com/elliotxx/zhihu-crawler-people
A simple distributed crawler for Zhihu, with data analysis
crawler python python-crawler spider web-crawler web-spider
Last synced: 31 Jul 2024
https://github.com/wangshibiaoflytiger/apiproject
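[https://www.sofineday.com] A Go project scaffold that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + the zap JSON logging library (supports shipping logs to Kafka or Mongo) + Kafka message queue + WeChat Pay/Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + chromedp headless crawler + makefile + binary compression + livereload hot reloading)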
alipay api-server compress cors gin-framework golang gomodule gorm headless jwt kafka livereload makefile mongo redis reverseproxy spider wxpay zap
Last synced: 26 Sep 2024
https://github.com/akynazh/tg-search-bot
A telegram bot for searching.
bot dmm jav pikpak python python3 redis-cache spider telegram telegram-bot wiki wikipedia
Last synced: 02 Aug 2024
https://github.com/wangshibiaoFlytiger/apiproject
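[https://www.sofineday.com] A Go project scaffold that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + the zap JSON logging library (supports shipping logs to Kafka or Mongo) + Kafka message queue + WeChat Pay/Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + chromedp headless crawler + makefile + binary compression + livereload hot reloading)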
alipay api-server compress cors gin-framework golang gomodule gorm headless jwt kafka livereload makefile mongo redis reverseproxy spider wxpay zap
Last synced: 01 Aug 2024
https://github.com/Jiramew/spoon
🥄 A package for building proxy pools tailored to specific sites.
crawler distributed ip proxies proxy proxy-provider proxypool python redis spider spoon
Last synced: 01 Aug 2024
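Spoon's own API is not reproduced here. As a hypothetical illustration of what a proxy pool does at its simplest, the sketch below hands out proxies round-robin and evicts one after a configurable number of consecutive failures. `ProxyPool`, its methods and the `max_failures` threshold are all assumptions; real pools (Spoon's tags mention Redis, for instance) typically also fetch, validate and persist proxies.

```python
from collections import deque
from typing import Iterable


class ProxyPool:
    """Hypothetical round-robin proxy pool with failure-based eviction."""

    def __init__(self, proxies: Iterable[str], max_failures: int = 3) -> None:
        unique = list(dict.fromkeys(proxies))  # dedupe while keeping order
        self._queue = deque(unique)
        self._failures = {p: 0 for p in unique}  # consecutive failures per proxy
        self.max_failures = max_failures

    def get(self) -> str:
        """Return the next proxy in rotation."""
        if not self._queue:
            raise LookupError("proxy pool is empty")
        proxy = self._queue[0]
        self._queue.rotate(-1)  # move the chosen proxy to the back of the line
        return proxy

    def report_success(self, proxy: str) -> None:
        """A successful request resets the proxy's failure streak."""
        if proxy in self._failures:
            self._failures[proxy] = 0

    def report_failure(self, proxy: str) -> None:
        """Count a failure; evict the proxy once it reaches max_failures in a row."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            self._queue.remove(proxy)
            del self._failures[proxy]

    def __len__(self) -> int:
        return len(self._queue)
```

Counting only consecutive failures lets a proxy survive the occasional timeout while still dropping one that has gone dead; per-site pools would simply keep one such pool per target site.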
https://github.com/toobigdata/papa
A browser-side data crawler that aims to be everyone's data assistant.
chrome data-analysis kickstarter spider
Last synced: 01 Aug 2024
https://github.com/tijme/not-your-average-web-crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability
Last synced: 04 Aug 2024
https://github.com/luohaha/jlitespider
A lite distributed Java spider framework :-)
crawler distributed distributed-systems rabbitmq spider
Last synced: 03 Aug 2024
https://github.com/twiny/spidy
Domain names collector - Crawl websites and collect domain names along with their availability status.
backlinks crawler domain expired-domain golang scraper seotools spider
Last synced: 01 Aug 2024
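spidy's core loop is crawling pages and harvesting the domains they link to. A minimal Python sketch of just the harvesting step, assuming the crawled URLs are already available as a list. `collect_domains` is a hypothetical helper; it works on plain hostnames (no public-suffix handling, so `www.a.com` and `a.com` count separately), and the availability check spidy also performs is not covered.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def collect_domains(urls: Iterable[str], exclude_host: Optional[str] = None) -> list[str]:
    """Return the sorted, unique hostnames found in `urls`.

    `exclude_host` drops the site being crawled so that only outbound
    domains remain. URLs without a host (e.g. mailto:) are ignored.
    """
    domains: set[str] = set()
    for url in urls:
        # .hostname is lower-cased and has any port and credentials stripped.
        host = urlsplit(url).hostname
        if host and host != exclude_host:
            domains.add(host)
    return sorted(domains)
```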
https://github.com/rxgirlz/openyspider
An image and video crawler at the scale of tens of millions of items [open-source edition] Image Spider
image java mzsock rosi selenium selenium-webdriver spider spring-boot tangyun tujidao yalayi yande
Last synced: 28 Sep 2024
https://github.com/hominee/dyer
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
crawler rust rust-programming-language spider web-crawler web-framework web-scraping
Last synced: 01 Aug 2024
https://github.com/sjdirect/abotx
Cross-platform C# web crawler framework, headless browser, parallel crawler. Please star this project! +1.
abotx abotx-website cross-platform csharp csharp-library framework headless headless-br headless-browser javascript-renderer netcore netcore3 netstan netstandard netstandard-libraries netstandard20 spider spiders spiders- web-crawler
Last synced: 28 Sep 2024
https://github.com/luckylittle/blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider
Last synced: 31 Jul 2024
https://github.com/Escape-Technologies/graphinder
🕸️ Blazing fast GraphQL endpoints finder using subdomain enumeration, scripts analysis and bruteforce. 🕸️
bugbounty finder graphql osint reconnaissance security spider subdomain-enumeration subdomain-scanner
Last synced: 01 Aug 2024
https://github.com/adamdehaven/fetchurls
A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.
bash-scripting crawl shell-script spider urls website wget
Last synced: 01 Aug 2024
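fetchurls does this in bash with wget. To illustrate the same idea (follow links, stay on one site, filter, collect URLs), here is a minimal Python breadth-first crawl; it is not a port of the script. To keep it runnable offline, downloading and parsing are abstracted behind a hypothetical `get_links(url)` callable, and the hypothetical `exclude` predicate stands in for the script's filtering.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100,
          exclude: Callable[[str], bool] = lambda url: False) -> list[str]:
    """Breadth-first crawl restricted to start_url's host.

    get_links(url) must return the raw href values found on that page; a real
    crawler would download the page (e.g. with urllib.request) and parse it.
    Returns the visited URLs in visiting order, at most max_pages of them.
    """
    host = urlsplit(start_url).hostname
    queue = deque([start_url])
    seen = {start_url}           # every URL ever enqueued, to avoid revisits
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Make the link absolute and drop the #fragment before comparing.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(link).hostname != host or link in seen or exclude(link):
                continue
            seen.add(link)
            queue.append(link)
    return visited
```

Writing the result to a text file, as fetchurls does, could then be as simple as `pathlib.Path("urls.txt").write_text("\n".join(urls))`. Breadth-first order visits pages closest to the start URL first, which makes `max_pages` cut off the least relevant pages.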
https://github.com/leishufei/JS-Crack-Records
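Reverse-engineering demos for major websites: Qimingpian, Zkh Industrial Supermarket, China Telecom Tianyi Cloud login, Wuchaosuozhi, Guazi Used Cars, Mafengwo, Chinese Poetry Database, Macau Lottery, Yaozh, Fujian Province Online Bidding Supervision Platform, National Public Resources Trading Platform, Wenjuanxing, Department of Treaty and Law of the People's Bank of China, Ministry of Public Security of the PRC, AqiStudy, Juliang Xingtu, HeyTap, Zhangshang Gaokao, Shipxy, Baidu Index, Toutiao, Zhihu, Qimai Data, Tuniu, Qimao Novels, Qichacha, 10jqka (Tonghuashun), NetEase Cloud Music, Lagou, Wanwudezhi, Fang.com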
Last synced: 02 Aug 2024