Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with spider

A curated list of projects in awesome lists tagged with spider.

https://github.com/wycm/zhihu-crawler

zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for a free HTTP proxy pool.

crawler java spider zhihu

Last synced: 02 Aug 2024

https://github.com/oltarasenko/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 01 Aug 2024

https://github.com/spider-rs/spider

The fastest web crawler written in Rust. Maintained by @a11ywatch.

ai-scraping crawler headless-chrome indexer llm-crawler rust scraping spider web-crawler

Last synced: 31 Jul 2024

https://github.com/elixir-crawly/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 29 Sep 2024

https://github.com/madeindjs/spider

The fastest web crawler written in Rust. Maintained by @a11ywatch.

ai-scraping crawler headless-chrome indexer llm-crawler rust scraping spider web-crawler

Last synced: 03 Aug 2024

https://github.com/coder-hxl/x-crawl

x-crawl is a flexible, AI-assisted Node.js crawler library that makes crawling more efficient, intelligent, and convenient. (v10 has been released.)

ai ai-crawl chromium crawl crawler fingerprint flexible javascript multifunction nodejs promise puppeteer spider typescript web

Last synced: 27 Sep 2024

https://github.com/arry-lee/wereader

wereader: a full-featured note-taking assistant for WeChat Read (微信读书)

notes python spider weread

Last synced: 31 Jul 2024

https://github.com/kingschan1204/istock

:point_right: A Java stock crawler built on Spring Boot (supports China A-shares only). If you :heart: it, please :star: it. An upgraded V2 is under development!

bootstrap echarts java jqgrid mongodb spider spring-boot stock vue2

Last synced: 27 Sep 2024

https://github.com/postmodern/spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider

Last synced: 31 Jul 2024

https://github.com/wspl/creeper

:paw_prints: Creeper - The Next Generation Crawler Framework (Go)

crawler cross-platform framework golang language script spider

Last synced: 31 Jul 2024

https://github.com/lb2281075105/python-spider

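Douban Movie Top 250; scraping JSON data from Douyu and scraping photos of beautiful women; Taobao; Youyuan; using CrawlSpider to scrape basic profile information of dating users on Hongniang (红娘网), plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; scraping Duodian (多点); Django API development; scraping Youyuan (有缘网) information; simulated Zhihu login; simulated GitHub login; simulated Tuchong (图虫网) login; scraping the full site data of the Duodian mall; scraping historical articles from WeChat Official Accounts; scraping articles shared in WeChat groups or by WeChat friends; and using itchat to monitor articles shared by specified WeChat Official Accounts.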

beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath

Last synced: 28 Sep 2024

https://github.com/lb2281075105/Python-Spider

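Douban Movie Top 250; scraping JSON data from Douyu and scraping photos of beautiful women; Taobao; Youyuan; using CrawlSpider to scrape basic profile information of dating users on Hongniang (红娘网), plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; scraping Duodian (多点); Django API development; scraping Youyuan (有缘网) information; simulated Zhihu login; simulated GitHub login; simulated Tuchong (图虫网) login; scraping the full site data of the Duodian mall; scraping historical articles from WeChat Official Accounts; scraping articles shared in WeChat groups or by WeChat friends; and using itchat to monitor articles shared by specified WeChat Official Accounts.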

beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath

Last synced: 30 Jul 2024

https://github.com/yutto-dev/yutto

:ice_cube: A cute and willful Bilibili video downloader (bilili V2)

aiohttp asyncio bangumi bilibili coroutines cross-platform danmaku downloader spider video

Last synced: 31 Jul 2024

https://github.com/bookstairs/bookhunter

A download tool for crawling ebooks from the internet.

epub golang spider

Last synced: 31 Jul 2024

https://github.com/xnx3/templatespider

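A website-ripping tool: pick a website you like, specify its URL, and it is automatically scraped and turned into a template. Any site you see can be put to your own use!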

cms spider

Last synced: 04 Aug 2024

https://github.com/zachleat/glyphhanger

Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.

font glyphs spider subset subsetting unicode web-fonts webfonts

Last synced: 31 Jul 2024

https://github.com/xuxueli/xxl-crawler

A distributed web crawler framework (XXL-CRAWLER).

crawler distributed flexible java object-oriented spider web xxl-crawler

Last synced: 02 Oct 2024

https://github.com/socialsisteryi/cxkitty

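An auto-answering bot for Chaoxing Xuexitong (超星学习通): watches videos and documents and simulates answering quizzes, with no browser and no Tampermonkey required. Runs fine in a container or on the host!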

beautifulsoup4 chaoxing chaoxingmooc python3 spider terminal-ui xuexitong

Last synced: 03 Oct 2024

https://github.com/zaxtyson/AnimeSearcher

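Aggregates video and danmaku (bullet comment) resources from third-party websites to give freeloaders the best experience for watching anime and TV series.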

bilibili cctv danmaku movies player spider

Last synced: 02 Aug 2024

https://github.com/bit4woo/domain_hunter

A Burp Suite extension that tries to automatically find all subdomains, similar domains, and related domains of an entire enterprise or organization, based on traffic!

burp-extensions burp-plugin burpsuite-extender certificate certification domain-discovery domain-hunter domains https-certificate organization-domain related-domain similar-domain sitemap spider subdomain subject-alternative-name subject-name subjectaltname

Last synced: 01 Aug 2024

https://github.com/fengzhizi715/netdiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.

coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3

Last synced: 28 Sep 2024

https://github.com/fengzhizi715/NetDiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.

coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3

Last synced: 02 Aug 2024

https://github.com/1061700625/WeChat_Article

Scrapes articles from WeChat Official Accounts.

pyqt5 python3 spider wechat wechat-article

Last synced: 31 Jul 2024

https://github.com/speed/newcrawler

Free Web Scraping Tool with Java

crawler docker scraping spider

Last synced: 01 Aug 2024

https://github.com/chengyumeng/spider163

Scrapes popular comments from NetEase Cloud Music.

163 python spider

Last synced: 30 Jul 2024

https://github.com/alltheplaces/alltheplaces

A set of spiders and scrapers to extract location information from places that post their location on the internet.

geojson hacktoberfest python scrapers scrapy spider

Last synced: 12 Aug 2024

https://github.com/wongzeon/ICP-Checker

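ICP filing lookup: queries the ICP filing information of a company or domain, automatically completes the slider verification, and saves the results to an Excel spreadsheet. Works with the new version of the MIIT filing management system website, so you can say goodbye to constant slider dragging and to the pitfall of a certain "站*工具" site requiring a VIP subscription to view filing information.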

beian icp information-gathering information-security osint-tool python python3 spider

Last synced: 04 Aug 2024

https://github.com/spiritLHLS/Hang-up-items

A collection of survey projects, cloud server recommendations, idle (hang-up) earning projects, free proxies, and assorted scripts. Click the bell icon in the upper-right corner to receive updates promptly. (Please don't fork; keep a low profile.)

bitping cash earnapp earnfm honeygain income iproyal money myst p2pclient packetstream passive passiveincome pawns proxyrack shared spider traffmonetizer vps

Last synced: 01 Aug 2024

https://github.com/erma0/douyin

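Douyin crawler: collects public data such as account home pages, likes, favorites, original sounds, topics, search results, collections, posts, followings, and followers.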

crawler douyin python spider

Last synced: 31 Jul 2024

https://github.com/dirtyfilthy/freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

crawler darknet hidden-services onion scraper spider tor

Last synced: 01 Aug 2024

https://github.com/stanzhai/Html2Article

Extracts the main body text from HTML web pages.

article content crawler html spider topic

Last synced: 04 Aug 2024

https://github.com/ChenZixinn/spider_reverse

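Crawler reverse-engineering case studies. Completed so far: TLS fingerprinting | Ruishu (瑞数) | Zkh (震坤行) | NetEase Yidun (网易易盾) | WeChat Mini Program decompilation and reverse engineering (百达星系) | Tonghuashun (同花顺) | RPC decryption | Jiasule (加速乐) | GeeTest (极验) slider CAPTCHA | Juliang Suanshu (巨量算数) | Boss Zhipin (Boss直聘) | Qichacha (企查查) | China Minmetals (中国五矿) | QQ Music | Industrial Policy Big Data Platform (产业政策大数据平台) | Qizhidao (企知道) | Xueqiu (雪球网, acw_sc__v2) | 1688 | Qimai Data (七麦数据) | whggzy | Qiming Technology (企名科技) | mohurd | EntGroup Data (艺恩数据) | OKLink (欧科云链)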

crawler python requests spider

Last synced: 31 Jul 2024

https://github.com/facert/tumblr_spider

A multithreaded Python crawler for Tumblr (汤不热)

python spider tumblr

Last synced: 31 Jul 2024

https://github.com/gadfly0x/signature_algorithm

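Request-signing and encryption algorithms for various apps, mini programs, and websites. Currently includes: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee (瑞幸咖啡), and bangkokair (Bangkok Airways).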

crawler reverse-engineering spider

Last synced: 02 Aug 2024

https://github.com/asaotomo/FofaMap

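FofaMap is a cross-platform FOFA API data collector written in Python 3. It supports regular queries, website liveness checks, statistical aggregation queries, host aggregation queries, website favicon queries, batch queries, and more. FofaMap can also run custom FOFA queries, automatically deduplicate the results and filter them by keyword, and generate a corresponding Excel spreadsheet. In addition, the Spring Festival special edition can invoke Nuclei to scan the targets found by FofaMap for vulnerabilities, putting you a step ahead in bug hunting.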

api bat excel fofa nuclei python3 scan spider

Last synced: 04 Aug 2024

https://github.com/ccforward/zhihu

✨ Zhihu Daily with Node.js, Vue.js ...

node-vue nodejs spider vue vue2-vuex-webpack zhihu-daily

Last synced: 01 Aug 2024

https://github.com/chenjiandongx/51job-spider

🔎 Scraping and analysis of job postings from 51job (前程无忧) with Python

51job python spider

Last synced: 07 Aug 2024

https://github.com/barretlee/kindleBookMaker

Kindle book maker using KindleGen: makes books from RSS feeds, a single URL, a directory, and so on.

book-generator kindle kindlegen rss spider

Last synced: 01 Aug 2024

https://github.com/sethsec/celerystalk

An asynchronous enumeration & vulnerability scanner. Run all the tools on all the hosts.

celery enumeration gobuster nessus nikto nmap scanning screenshot spider subdomain virtual-hosts vulnerability-assessment vulnerability-scanners

Last synced: 01 Aug 2024

https://github.com/cyubuchen/free_proxy_website

A collection of websites that provide free SOCKS/HTTPS/HTTP proxies.

crawler free-proxy-list ip proxy proxy-checker spider

Last synced: 03 Aug 2024

https://github.com/knownsec/LSpider

LSpider: a front-end crawler tailored for passive scanners.

python3 security spider

Last synced: 04 Aug 2024

https://github.com/hemin1003/java-spider

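A hands-on Java crawler framework built on top of the webmagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (今日头条, integrated as a separate feature), and others; combined with Elasticsearch, it implements automated crawling and is already in production use.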

elasticsearch scraper spider webmagic

Last synced: 04 Aug 2024

https://github.com/xiyuan-fengyu/ppspider

A web spider framework built on Puppeteer. It provides flexible task-queue management and task scheduling via decorators, convenient data storage (NeDB/MongoDB), and support for data visualization and user interaction.

angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript

Last synced: 26 Sep 2024

https://github.com/IAmStoxe/urlgrab

A golang utility to spider through a website searching for additional links.

spider

Last synced: 01 Aug 2024

https://github.com/xdavidhu/portspider

🕷 A lightning fast multithreaded network scanner framework with modules.

multi-threading networking portscan python scanner spider

Last synced: 01 Aug 2024

https://github.com/xdavidhu/portSpider

🕷 A lightning fast multithreaded network scanner framework with modules.

multi-threading networking portscan python scanner spider

Last synced: 30 Jul 2024

https://github.com/TRHX/Python3-Spider-Practice

Hands-on Python 3 practice with various spiders: JS reverse engineering, anti-anti-scraping, CAPTCHA handling, login / daily check-in / prize draws, and data visualization.

jsreverse python python3-spider-practice spider spiders

Last synced: 03 Aug 2024

https://github.com/alanyang/dhtspider

BitTorrent DHT network spider

bittorrent dht spider

Last synced: 03 Aug 2024

https://github.com/f111fei/article_spider

A crawler for WeChat Official Accounts.

javascript spider typescript wechat

Last synced: 08 Aug 2024

https://github.com/DoiiarX/NLCISBNPlugin

A calibre source/metadata plugin based on ISBN lookup from the National Library of China.

calibre-plugin isbn metadata spider

Last synced: 31 Jul 2024

https://github.com/infinilabs/crawler

🕷️ An easy-to-use spider written in Golang (previously named GOPA).

crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider

Last synced: 04 Aug 2024

https://github.com/yields/ant

A web crawler for Go

go golang scraper spider web-crawler

Last synced: 03 Oct 2024

https://github.com/wolfbolin/BiliUtil

A toolkit for batch-downloading videos from Bilibili.com

bilibili pip python spider video

Last synced: 31 Jul 2024

https://github.com/cxapython/mybackup-IT

Backups of technical articles on Android, JS, and assembly, along with the corresponding reverse engineering.

android frida javascript spider

Last synced: 31 Jul 2024

https://github.com/myvyang/chromium_for_spider

A dynamic crawler for web vulnerability scanners.

chromium crawler puppeteer security spider

Last synced: 04 Aug 2024

https://github.com/cwjokaka/ok_ip_proxy_pool

🍿 A crawler proxy IP pool (proxy pool) in Python 🍟 A pretty OK IP proxy pool

aiohttp async beautifulsoup4 crawler flask http ip pool proxy proxypool py python python3 spider sqlite

Last synced: 01 Oct 2024

https://github.com/kong36088/ZhihuSpider

A multithreaded Zhihu user crawler based on Python 3

crawler multi-threading python python3 spider zhihu

Last synced: 07 Aug 2024

https://github.com/hss01248/ImageLoader

A wrapper for Glide v4: a solution for image loading and large-image preview, a debugging tool for ImageView, and image spiders on Android.

fresco glide glidev4 imageloader spider

Last synced: 02 Aug 2024

https://github.com/Denon/syncPlaylist

Syncs playlists between music platforms.

music python spider

Last synced: 07 Aug 2024

https://github.com/manimozaffar/linkedin-scraper

A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel.

bot browser-fingerprint browser-fingerprinting chatgpt chatgpt-api cralwer fastapi linkedin linkedin-bot playwright python scraper scraping spider sqlalchemy

Last synced: 30 Sep 2024

https://github.com/fnk0c/cangibrina

A fast and powerful dashboard (admin) finder

admin-finder python spider

Last synced: 04 Aug 2024

https://github.com/mryuan0428/house-price-prediction

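A complete house-price prediction project: 1. scrape data from Lianjia (链家网); 2. after processing, build models with several logistic regression machine-learning models from sklearn and a Keras neural network to predict house prices. In the end, the neural network performed better, with an R^2 of about 0.75.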

house-price-prediction keras machine-learning sklearn spider

Last synced: 26 Sep 2024

https://github.com/dwisiswant0/galer

A fast tool to fetch URLs from HTML attributes by crawling.

crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls

Last synced: 01 Aug 2024

https://github.com/ManiMozaffar/linkedIn-scraper

A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel.

bot browser-fingerprint browser-fingerprinting chatgpt chatgpt-api cralwer fastapi linkedin linkedin-bot playwright python scraper scraping spider sqlalchemy

Last synced: 01 Aug 2024

https://github.com/QIN2DIM/sspanel-mining

🥤 Collect, clean, classify, and store exposed SSPanel-Uim sites on the Internet.

python search-engine selenium spider sspanel sspanel-mining sspanel-uim

Last synced: 01 Aug 2024

https://github.com/qin2dim/sspanel-mining

🥤 Collect, clean, classify, and store exposed SSPanel-Uim sites on the Internet.

python search-engine selenium spider sspanel sspanel-mining sspanel-uim

Last synced: 01 Oct 2024

https://github.com/dantleech/fink

PHP Link Checker

link-checker php spider

Last synced: 04 Aug 2024

https://github.com/elliotxx/zhihu-crawler-people

A simple distributed crawler for Zhihu, with data analysis

crawler python python-crawler spider web-crawler web-spider

Last synced: 31 Jul 2024

https://github.com/wangshibiaoflytiger/apiproject

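[https://www.sofineday.com] A Go project scaffold that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + the JSON logging library zap (with support for shipping logs to Kafka or Mongo) + Kafka message queue + WeChat Pay/Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + headless crawling with chromedp + makefile + binary compression + livereload hot reloading)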

alipay api-server compress cors gin-framework golang gomodule gorm headless jwt kafka livereload makefile mongo redis reverseproxy spider wxpay zap

Last synced: 26 Sep 2024

https://github.com/wangshibiaoFlytiger/apiproject

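[https://www.sofineday.com] A Go project scaffold that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + the JSON logging library zap (with support for shipping logs to Kafka or Mongo) + Kafka message queue + WeChat Pay/Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + headless crawling with chromedp + makefile + binary compression + livereload hot reloading)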

alipay api-server compress cors gin-framework golang gomodule gorm headless jwt kafka livereload makefile mongo redis reverseproxy spider wxpay zap

Last synced: 01 Aug 2024

https://github.com/Jiramew/spoon

🥄 A package for building site-specific proxy pools for different sites.

crawler distributed ip proxies proxy proxy-provider proxypool python redis spider spoon

Last synced: 01 Aug 2024

https://github.com/Karmenzind/fp-server

A free proxy server, based on Tornado and Scrapy, that continuously crawls and provides proxies so you can build your own proxy pool locally.

proxy proxypool python scrapy spider tornado

Last synced: 31 Jul 2024

https://github.com/fanhuaandluomu/pkulaw_spider

Scrapes PKULaw (北大法宝网): http://www.pkulaw.cn/Case/

ai crawler law python-2 spider

Last synced: 01 Oct 2024

https://github.com/toobigdata/papa

A browser-side data crawler that aims to be everyone's data assistant.

chrome data-analysis kickstarter spider

Last synced: 01 Aug 2024

https://github.com/tijme/not-your-average-web-crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability

Last synced: 04 Aug 2024

https://github.com/luohaha/jlitespider

A lite distributed Java spider framework :-)

crawler distributed distributed-systems rabbitmq spider

Last synced: 03 Aug 2024

https://github.com/twiny/spidy

Domain names collector - Crawl websites and collect domain names along with their availability status.

backlinks crawler domain expired-domain golang scraper seotools spider

Last synced: 01 Aug 2024

https://github.com/rxgirlz/openyspider

Image and video crawler at the scale of tens of millions [open-source edition] Image Spider

image java mzsock rosi selenium selenium-webdriver spider spring-boot tangyun tujidao yalayi yande

Last synced: 28 Sep 2024

https://github.com/hominee/dyer

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

crawler rust rust-programming-language spider web-crawler web-framework web-scraping

Last synced: 01 Aug 2024

https://github.com/Escape-Technologies/graphinder

🕸️ Blazing fast GraphQL endpoints finder using subdomain enumeration, scripts analysis and bruteforce. 🕸️

bugbounty finder graphql osint reconnaissance security spider subdomain-enumeration subdomain-scanner

Last synced: 01 Aug 2024

https://github.com/adamdehaven/fetchurls

A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.

bash-scripting crawl shell-script spider urls website wget

Last synced: 01 Aug 2024

https://github.com/leishufei/JS-Crack-Records

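Reverse-engineering demos for major websites: Qimingpian (企名片), Zkh Industrial Supermarket (震坤行工业超市), China Telecom Cloud (天翼云) login, 物超所值, Guazi used cars (瓜子二手车), Mafengwo (马蜂窝), Chinese Poetry Database (中华诗词库), Macau lottery (澳门彩票), Yaozh (药智网), Fujian Province Online Tendering Supervision Platform (福建省招标投标在线监管平台), National Public Resources Trading Platform (全国公共资源交易平台), Wenjuanxing (问卷星), Department of Treaty and Law of the People's Bank of China, Ministry of Public Security of the People's Republic of China, AqiStudy, Juliang Xingtu (巨量星图), HeyTap, Zhangshang Gaokao (掌上高考), Shipxy (船讯网), Baidu Index, Toutiao (今日头条), Zhihu, Qimai Data (七麦数据), Tuniu (途牛), Qimao Novels (七猫小说), Qichacha (企查查), Tonghuashun (同花顺), NetEase Cloud Music, Lagou Recruitment (拉勾招聘), 玩物得志, and Fang.com (房天下)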

js-reverse python spider

Last synced: 02 Aug 2024