Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lb2281075105/Python-Spider
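Douban Movie Top 250; scraping JSON data and photos of beautiful women from Douyu; Taobao; Youyuan; using CrawlSpider to scrape basic profile information of people on the Hongniang matchmaking site, plus distributed Hongniang crawling with Redis storage; small crawler demos; Selenium; scraping Duodian (Dmall); building APIs with Django; scraping Youyuan profile information; simulated Zhihu login; simulated GitHub login; simulated Tuchong login; scraping the entire Duodian (Dmall) store; scraping WeChat official account article history; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts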
beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath
Last synced: 30 Jun 2024
https://github.com/mtianyan/FunpySpiderSearchEngine
Word2vec-based per-user personalized search + Scrapy 2.3.0 (data crawling) + Elasticsearch 7.9.1 (data storage and a public RESTful API) + Django 3.1.1 search
django elasticsearch elasticsearch-analysis-ik lagou mysql python redis scrapy search-engine spider zhihu
Last synced: 30 Jun 2024
https://github.com/my8100/scrapydweb
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO 👉
dashboard log-analysis log-parsing scrapy scrapy-log-analysis scrapy-visualization scrapyd scrapyd-admin scrapyd-api scrapyd-cluster-management scrapyd-control scrapyd-keeper scrapyd-log-analysis scrapyd-manage scrapyd-monitor scrapyd-ui scrapyd-visualization spider
Last synced: 30 Jun 2024
https://github.com/QianyanTech/Image-Downloader
Download images from Google, Bing, and Baidu.
baidu bing google google-images image-downloader pyqt scrapy spider
Last synced: 27 Jun 2024
https://github.com/hemin1003/java-spider
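A Java crawler framework built on top of the WebMagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (integrated as a separate feature), and others; combined with Elasticsearch, it provides automated crawling and is already in production use.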
elasticsearch scraper spider webmagic
Last synced: 27 Jun 2024
https://github.com/JAVClub/core
🔞 JAVClub - never let your "big sisters" get lost again
adult adult-content google-drive japanese jav javbus javiewer magnet porn spider video-streaming
Last synced: 26 Jun 2024
https://github.com/xuxueli/xxl-crawler
A distributed web crawler framework (XXL-CRAWLER).
crawler distributed flexible java object-oriented spider web xxl-crawler
Last synced: 26 Jun 2024
https://github.com/medcl/gopa-abandoned
GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)
crawler golang lightweight spider
Last synced: 26 Jun 2024
https://github.com/luohaha/jlitespider
A lite distributed Java spider framework :-)
crawler distributed distributed-systems rabbitmq spider
Last synced: 23 Jun 2024
https://github.com/clindet/bget
Portable command-line tool to query bioinformatics APIs, data, databases and files.
bioinformatics database spider
Last synced: 21 Jun 2024
https://github.com/wycm/zhihu-crawler
zhihu-crawler is a high-performance, horizontally scalable, distributed Java crawler project with support for a free HTTP proxy pool.
Last synced: 20 Jun 2024
https://github.com/fengzhizi715/NetDiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.
coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3
Last synced: 20 Jun 2024
https://github.com/keenwon/antcolony
A magnet-link crawler written in Node.js: https://findit.keenwon.com (original domain: http://findit.so)
antcolony bencode bittorrent dht javascript nodejs spider torrent
Last synced: 19 Jun 2024
https://github.com/leishufei/JS-Crack-Records
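Reverse-engineering demos for major websites: Qimingpian, ZKH Industrial Supermarket, Tianyi Cloud login, Wuchaosuozhi, Guazi Used Cars, Mafengwo, Chinese Poetry Library, Macau Lottery, Yaozh, Fujian Province Bidding and Tendering Online Supervision Platform, National Public Resources Trading Platform, Wenjuanxing, Department of Treaty and Law of the People's Bank of China, Ministry of Public Security of the People's Republic of China, AqiStudy, Juliang Xingtu, HeyTap, Zhangshang Gaokao, Shipxy, Baidu Index, Toutiao, Zhihu, Qimai Data, Tuniu, Qimao Novels, Qichacha, Tonghuashun, NetEase Cloud Music, Lagou, Wanwudezhi, Fang.com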
Last synced: 19 Jun 2024
https://github.com/gadfly0x/signature_algorithm
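Request-signing and encryption algorithms for various apps, mini programs, and websites. Currently included: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, bangkokair (Bangkok Airways)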
crawler reverse-engineering spider
Last synced: 17 Jun 2024
https://github.com/JefferyHus/es6-crawler-detect
🕷️ This is an ES6 adaptation of the original PHP library CrawlerDetect; it helps you detect bots/crawlers/spiders via the user agent.
bots crawler detection es6-javascript spider
Last synced: 16 Jun 2024
https://github.com/hss01248/ImageLoader
A wrapper for Glide v4: a solution for image loading and large-image preview, plus a debug tool for ImageView. Image spiders on Android.
fresco glide glidev4 imageloader spider
Last synced: 15 Jun 2024
https://github.com/l4rm4nd/XingDumper
Python 3 script to dump/scrape/extract company employees from XING API
crawling employees osint profile python reconnaissance spider xing xing-api
Last synced: 14 Jun 2024
https://github.com/IAmStoxe/urlgrab
A golang utility to spider through a website searching for additional links.
Last synced: 14 Jun 2024
https://github.com/Nemo2011/bilibili-api
Common Bilibili API calls. Supports videos, bangumi (anime series), users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api
api bilibili bilibili-api python spider
Last synced: 13 Jun 2024
https://github.com/Randark-JMT/Bilibili_manga_download
A Bilibili Manga download tool with a graphical interface
bilibili crawler downloader pyside6 python python3 qt spider
Last synced: 13 Jun 2024
https://github.com/lihe07/bilibili_comics_downloader
A Bilibili Manga downloader written in Rust: no runtime dependencies, high performance, and export to PDF, EPUB, and ZIP
bilibili bilibili-download downloader epub pdf rust rust-lang spider
Last synced: 13 Jun 2024
https://github.com/niyuancheng/bilibili-service
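Provides Bilibili danmaku (bullet comment) and video download services: just enter a Bilibili video's bvid to download the video in super-HD or higher quality and get its danmaku pool information!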
bilibili-api bilibili-download nodejs python3 spider video-streaming
Last synced: 13 Jun 2024
https://github.com/Montaro2017/bili_novel_packer
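Light novel packer: fetches content from the Bilibili Light Novel site and packages it as EPUB, with support for covers, illustrations, a table of contents, and merging volumes.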
dart epub epub-generation novel spider
Last synced: 13 Jun 2024
https://github.com/polyrabbit/hacker-news-digest
📰 Let ChatGPT Summarize Hacker News for You
chatgpt chatgpt-api crawler data-extraction extract-summaries hacker-news hacker-news-digest hacker-news-reader machine-learning news-aggregator openai openai-api python rss spider
Last synced: 11 Jun 2024
https://github.com/Henryyy-Hung/Web-Crawler-of-Chinese-Fiction
A Python-based Chinese web novel crawler/downloader that can crawl and proofread web novels and output them as .txt files
Last synced: 11 Jun 2024
https://github.com/wangshibiaoFlytiger/apiproject
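[https://www.sofineday.com], a Go project scaffold integrating best practices (gin + gorm + go-redis + mongo + cors + jwt + the zap JSON logging library (supports shipping logs to Kafka or Mongo) + Kafka message queue + WeChat Pay/Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + chromedp headless crawler + makefile + binary compression + livereload hot reloading)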
alipay api-server compress cors gin-framework golang gomodule gorm headless jwt kafka livereload makefile mongo redis reverseproxy spider wxpay zap
Last synced: 09 Jun 2024
https://github.com/zachleat/glyphhanger
Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.
font glyphs spider subset subsetting unicode web-fonts webfonts
Last synced: 09 Jun 2024
https://github.com/Malwarize/webpalm
🕸️ Crawl in the web network
crawler crawling data data-science datamining go golang hack mining osint redteam spider tool
Last synced: 09 Jun 2024
https://github.com/coder-hxl/x-crawl
x-crawl is a flexible Node.js AI-assisted crawler library that makes crawling more efficient, intelligent, and convenient. (v10 has been released.)
ai ai-crawl chromium crawl crawler fingerprint flexible javascript multifunction nodejs promise puppeteer spider typescript web
Last synced: 09 Jun 2024
https://github.com/tophubs/TopList
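TopList (Today's Hot List): an aggregator site that collects trending headlines from popular websites, written in Go and using many goroutines for fast asynchronous scraping. Preview: https://mo.fish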
golang hot hotlist spider today-s-hot-list
Last synced: 08 Jun 2024
https://github.com/guyueyingmu/avbook
AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV film library and AV magnet-link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
adult adult-video avmoo crawler database guzzlehttp javbus javlibrary laravel magnet magnet-link scraper spider
Last synced: 06 Jun 2024
https://github.com/spiritLHLS/Hang-up-items
Survey projects, cloud server recommendations, passive-income ("hang-up") projects, free proxies, and a collection of various scripts. Click the bell icon at the top right to receive timely updates. (Please don't fork; keep a low profile.)
bitping cash earnapp earnfm honeygain income iproyal money myst p2pclient packetstream passive passiveincome pawns proxyrack shared spider traffmonetizer vps
Last synced: 06 Jun 2024
https://github.com/omarhashem123/venom
A tool designed to quickly crawl sites and extract endpoints
Last synced: 05 Jun 2024
https://github.com/wongzeon/ICP-Checker
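ICP filing lookup: queries the ICP filing information of a company or domain, solves the slider verification automatically, and saves the results to an Excel spreadsheet. Works with the new MIIT filing management system website, so there is no more repeated slider dragging and no more running into a certain site's * tool that requires a VIP membership to view filing information.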
beian icp information-gathering information-security osint-tool python python3 spider
Last synced: 05 Jun 2024
https://github.com/asaotomo/FofaMap
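FofaMap is a cross-platform FOFA API data collector written in Python 3. It supports regular queries, website liveness detection, statistical aggregation queries, host aggregation queries, website favicon queries, batch queries, and more. FofaMap can also run custom FOFA queries, automatically deduplicate the results and filter them by keyword, and generate a corresponding Excel spreadsheet. In addition, the Spring Festival special edition can call Nuclei to scan the targets returned by FofaMap for vulnerabilities, putting you a step ahead in bug hunting.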
api bat excel fofa nuclei python3 scan spider
Last synced: 05 Jun 2024
https://github.com/kingschan1204/istock
👉 A Java stock crawler built on Spring Boot (A-shares only). If you ❤️ it, please ⭐ it. A V2 upgrade is under development!
bootstrap echarts java jqgrid mongodb spider spring-boot stock vue2
Last synced: 04 Jun 2024
https://github.com/ManiMozaffar/linkedIn-scraper
A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel
bot browser-fingerprint browser-fingerprinting chatgpt chatgpt-api cralwer fastapi linkedin linkedin-bot playwright python scraper scraping spider sqlalchemy
Last synced: 03 Jun 2024
https://github.com/socketry/benchmark-http
async async-http benchmark concurrency latency spider
Last synced: 02 Jun 2024
https://github.com/ccforward/zhihu
✨ Zhihu Daily: Node.js, Vue.js ...
node-vue nodejs spider vue vue2-vuex-webpack zhihu-daily
Last synced: 02 Jun 2024
https://github.com/bahaabdelwahed/killshot
A Penetration Testing Framework, Information gathering tool & Website Vulnerability Scanner
auto-scanner cms exploit information-gathering joomla spider vulnerability vulnerability-detection vulnerability-scanner webapp-vul-scanner website-vulnerability-scanner wordpress wp-admin
Last synced: 02 Jun 2024
https://github.com/1N3/BlackWidow
A Python based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website.
active application automated bugbounty csrf fuzzer lfi osint owasp passive python rce rfi scan scanner spider sqli vulnerability web xss
Last synced: 02 Jun 2024
https://github.com/5ime/video_spider
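Short-video watermark removal: Douyin, Pipixia, Huoshan, Weishi, Weibo, Oasis, Zuiyou, Qing Shipin, Kuaishou, Quanmin Xiaoshipin, Basai Movies, Momo, Before Bifeng, Eyepetizer, Vue Vlog, Xiaokaxiu, Pipi Gaoxiao, WeSing, Xigua Video, Doupai, Huya, 6.cn, Pear Video, Xinpianchang, AcFun, Meipai...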
Last synced: 31 May 2024
https://github.com/Evil0ctal/Douyin_TikTok_Download_API
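🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili that supports API calls and online batch parsing and downloading.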
api asgi async asyncio crawler douyin douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi httpx no-watermark online-parsing python pywebio scraper spider tiktok tiktok-scraper web-scraping
Last synced: 31 May 2024
https://github.com/crawlab-team/crawlab
A distributed web crawler admin platform for managing spiders, regardless of language or framework.
crawlab crawler crawling-tasks docker go platform scrapy scrapyd-ui spider spiders-management web-crawler webcrawler webspider
Last synced: 31 May 2024
https://github.com/bookstairs/bookhunter
A download tool for crawling ebooks from the internet.
Last synced: 31 May 2024
https://github.com/thundernet8/AlipayOrdersSupervisor
✨ Uses Node to monitor Alipay orders and notify the server instantly, enabling a payment interface that does not require a merchant contract
Last synced: 31 May 2024
https://github.com/wechatsync/Wechatsync
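Sync articles to multiple content platforms in one click, including Toutiao, WordPress, Zhihu, Jianshu, Juejin, CSDN, Typecho, and other major platforms: publish once and the article appears on every platform. Frees up your personal productivity.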
blog chrome chrome-extension markdown multiplatform spider vue wechat-official-account writer
Last synced: 31 May 2024
https://github.com/ShunCai/QZoneExport
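QQ Zone export assistant: backs up QQ Zone status posts, blog posts, private diaries, photo albums, videos, message board, QQ friends, favorites, shares, and recent visitors to files for easy migration and preservation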
backup chrome chrome-extension chromium crx export qq qqzone qzone qzone-spider spider
Last synced: 31 May 2024
https://github.com/wnma3mz/wechat_articles_spider
A crawler for WeChat official account articles
officialaccounts python36 spider wechat wechat-official-account
Last synced: 31 May 2024
https://github.com/1061700625/WeChat_Article
Scrapes WeChat official account articles
pyqt5 python3 spider wechat wechat-article
Last synced: 31 May 2024
https://github.com/xboxeer/NScrapy
NScrapy is a .NET Core cross-platform distributed spider framework that provides an easy way to write your own spiders
distributed dotnet scrapy spider
Last synced: 31 May 2024
https://github.com/kangvcar/InfoSpider
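INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, designed to help users retrieve their own data safely and quickly; the code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments photo album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.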
automation chrome crawl csdn hotmail outlook python3 selenium spider tkinter wxpython
Last synced: 30 May 2024
https://github.com/yutto-dev/yutto
🧊 A cute and willful Bilibili video downloader (bilili V2)
aiohttp asyncio bangumi bilibili coroutines cross-platform danmaku downloader spider video
Last synced: 30 May 2024
https://github.com/yutto-dev/bilili
🍻 Bilibili video (including bangumi) and danmaku downloader
bilibili crawler danmaku download downloader multithread python3 requests spider subtitle video
Last synced: 30 May 2024
https://github.com/librauee/Reptile
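🏀 Hands-on Python 3 web crawlers (some with detailed tutorials): Maoyan, Tencent Video, Douban, Yanzhao (graduate admissions site), Weibo, Biquge novels, Baidu trending topics, Bilibili, CSDN, NetEase Cloud Reading, Alibaba Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, Unsplash, Shixiseng, Autohome, League of Legends Box, Dianping, Lianjia, LPL match schedule, typhoons, Fantasy Westward Journey and Onmyoji Treasure Pavilion, weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish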
python3 requests scrapy spider
Last synced: 30 May 2024
https://github.com/Boris-code/feapder
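🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It ships with four spider types, AirSpider, Spider, TaskSpider, and BatchSpider, to cover different scenarios, and supports resuming interrupted crawls, monitoring and alerting, browser rendering, and deduplication of massive datasets. The companion crawler management system feaplat adds convenient deployment and scheduling.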
crawler feapder feaplat python scrapy spider
Last synced: 30 May 2024
https://github.com/tijme/not-your-average-web-crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability
Last synced: 30 May 2024
https://github.com/fnk0c/cangibrina
A fast and powerful dashboard (admin) finder
Last synced: 30 May 2024
https://github.com/niespodd/browser-fingerprinting
Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️♂️ when scraping the web?
automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping
Last synced: 30 May 2024
https://github.com/shengqiangzhang/examples-of-web-crawlers
Some interesting, beginner-friendly Python crawler examples, mainly scraping Taobao, Tmall, WeChat, WeChat Read, Douban, QQ, and other sites.
agent-pool crawler example fund multithreading pyquery python selenium spider stock taobao tmall wechat wechat-report wereader
Last synced: 27 May 2024
https://github.com/DoiiarX/NLCISBNPlugin
A calibre source/metadata plugin based on ISBN lookup at the National Library of China
calibre-plugin isbn metadata spider
Last synced: 27 May 2024
https://github.com/speed/newcrawler
Free Web Scraping Tool with Java
crawler docker scraping spider
Last synced: 26 May 2024
https://github.com/gsh199449/spider
A configurable web spider with an easy-to-use web console
cralwer gatherplatform spider text-mining web-console
Last synced: 26 May 2024
https://github.com/adamdehaven/fetchurls
A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.
bash-scripting crawl shell-script spider urls website wget
Last synced: 26 May 2024
https://github.com/f111fei/article_spider
WeChat official account crawler
javascript spider typescript wechat
Last synced: 23 May 2024
https://github.com/akynazh/tg-search-bot
A telegram bot for searching.
bot dmm jav pikpak python python3 redis-cache spider telegram telegram-bot wiki wikipedia
Last synced: 23 May 2024
https://github.com/twiny/spidy
Domain names collector - Crawl websites and collect domain names along with their availability status.
backlinks crawler domain expired-domain golang scraper seotools spider
Last synced: 22 May 2024
https://github.com/liameno/librengine
Privacy Web Search Engine (not meta, own crawler)
cpp crawler encryption frontend privacy robots-txt rsa search-engine self-hosted spider websearch websearchengine
Last synced: 21 May 2024
https://github.com/kiddyuchina/Beanbun
Beanbun is a multi-process web crawler framework written in PHP and based on Workerman, with good openness and high extensibility.
Last synced: 19 May 2024
https://github.com/lorien/awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
captcha-bypass captcha-recaptcha crawler crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python scraping-tool spider web-scraping webscraping
Last synced: 17 May 2024
https://github.com/okfn-brasil/querido-diario
📰 Brazilian government gazettes, accessible to everyone.
artificial-intelligence civic-tech data-science governments-gazettes govtech hacktoberfest hacktoberfest2023 machine-learning open-data politics scraping spider
Last synced: 17 May 2024
https://github.com/DedSecInside/TorBot
Dark Web OSINT Tool
algorithm crawler dark-web dedsec-inside deepweb go hacking hacktoberfest osint projects psnappz python python-web-crawler python3 security security-tools spider tor tor-network torbot
Last synced: 15 May 2024
https://github.com/NaiboWang/EasySpider
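A visual no-code/code-free web crawler/spider. EasySpider is visual browser-automation testing, data collection, and crawler software that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.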
batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www
Last synced: 14 May 2024
https://github.com/Symbo1/wsltools
Web Scan Lazy Tools - Python Package
crawling-framework package python-package scanner-web security security-audit security-automation security-scanner security-tool security-tools spider spider-framework web-vulnerability-scanner
Last synced: 12 May 2024
https://github.com/ArchiveTeam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd
Last synced: 11 May 2024
https://github.com/spider-rs/spider
The fastest web crawler written in Rust. Maintained by @a11ywatch.
crawler headless-chrome indexer rust scraping spider
Last synced: 11 May 2024
https://github.com/Conso1eCowb0y/Deepminer
Deep web crawler and search engine
crawler crawling dark-web data-mining deepminer deepweb github hacking onion osint python-web-scraper python3 search-engine security security-tools spider the-onion-router tor tor-network webcrawler
Last synced: 10 May 2024