Ecosyste.ms: Awesome

https://github.com/lb2281075105/Python-Spider

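Douban Movie Top 250; scraping JSON data and photos of women from Douyu; Taobao; Youyuan; using CrawlSpider to scrape basic profile information of dating users on Hongniang, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; scraping Duodian; building APIs with Django; scraping Youyuan profile data; simulated login to Zhihu, GitHub, and Tuchong; scraping the entire Duodian store site; scraping historical articles from WeChat official accounts; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts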

beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath

Last synced: 30 Jun 2024

https://github.com/chengyumeng/spider163

Scrapes popular comments from NetEase Cloud Music

163 python spider

Last synced: 30 Jun 2024

https://github.com/mtianyan/FunpySpiderSearchEngine

Word2vec personalized search (tailored to each user) + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and external RESTful API) + Django 3.1.1 search

django elasticsearch elasticsearch-analysis-ik lagou mysql python redis scrapy search-engine spider zhihu

Last synced: 30 Jun 2024

https://github.com/my8100/scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

dashboard log-analysis log-parsing scrapy scrapy-log-analysis scrapy-visualization scrapyd scrapyd-admin scrapyd-api scrapyd-cluster-management scrapyd-control scrapyd-keeper scrapyd-log-analysis scrapyd-manage scrapyd-monitor scrapyd-ui scrapyd-visualization spider

Last synced: 30 Jun 2024

https://github.com/QianyanTech/Image-Downloader

Download images from Google, Bing, and Baidu.

baidu bing google google-images image-downloader pyqt scrapy spider

Last synced: 27 Jun 2024

https://github.com/hemin1003/java-spider

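A hands-on Java crawler framework built on top of the webmagic framework. It can crawl news content from Tencent, Sohu, Toutiao (integrated as a separate feature), and others; combined with elasticsearch, it implements automated crawling and is already in production use.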

elasticsearch scraper spider webmagic

Last synced: 27 Jun 2024

https://github.com/OreosLab/checkinpanel

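A check-in project that mainly runs on scheduled-task panels such as elecV2P or qinglong, and also supports running directly in a system environment (environment: Python 3.8+ / Node.js 10+ / Bash 4+ / OpenJDK8 / Perl5)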

alpine bash elecv2p java javascript json nodejs notify-module perl5 python3 qinglong requests selenium spider toml

Last synced: 26 Jun 2024

https://github.com/JAVClub/core

🔞 JAVClub - so your "big sisters" never get lost again

adult adult-content google-drive japanese jav javbus javiewer magnet porn spider video-streaming

Last synced: 26 Jun 2024

https://github.com/xuxueli/xxl-crawler

A distributed web crawler framework (XXL-CRAWLER).

crawler distributed flexible java object-oriented spider web xxl-crawler

Last synced: 26 Jun 2024

https://github.com/medcl/gopa-abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa )

crawler golang lightweight spider

Last synced: 26 Jun 2024

https://github.com/luohaha/jlitespider

A lite distributed Java spider framework :-)

crawler distributed distributed-systems rabbitmq spider

Last synced: 23 Jun 2024

https://github.com/wuomzfx/zhihu-spider

A Zhihu crawler that periodically tracks question data and pushes trending topics on a schedule

spider zhihu

Last synced: 22 Jun 2024

https://github.com/clindet/bget

Portable command-line tool to query bioinformatics APIs, data, databases and files.

bioinformatics database spider

Last synced: 21 Jun 2024

https://github.com/wycm/zhihu-crawler

zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for free HTTP proxy pools

crawler java spider zhihu

Last synced: 20 Jun 2024

https://github.com/fengzhizi715/NetDiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3

Last synced: 20 Jun 2024

https://github.com/entrepreneur-interet-general/OpenScraper

An open source webapp for scraping: towards a public service for webscraping

bulma entrepreneur-interet-general html mongodb python python2 scraper scrapy spider tornado xpath

Last synced: 20 Jun 2024

https://github.com/jumper2014/lianjia-beike-spider

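A housing-price crawler for Lianjia and Beike that collects housing-price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports storage in CSV, MySQL, MongoDB, Excel, and JSON; supports Python 2 and 3; charts the data; richly commented. Stars are appreciated. For learning and reference only; do not use for commercial purposes, and use at your own risk.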

beike crawler house lianjia spider

Last synced: 20 Jun 2024

https://github.com/keenwon/antcolony

A magnet-link crawler implemented in Node.js: https://findit.keenwon.com (formerly http://findit.so )

antcolony bencode bittorrent dht javascript nodejs spider torrent

Last synced: 19 Jun 2024

https://github.com/leishufei/JS-Crack-Records

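Reverse-engineering demos for major websites: Qimingpian, ZKH Industrial Supply, Tianyi Cloud login, Wuchaosuozhi, Guazi Used Cars, Mafengwo, Chinese Poetry Library, Macau Lottery, Yaozhi, Fujian Provincial Online Supervision Platform for Tendering and Bidding, National Public Resources Trading Platform, Wenjuanxing, the Legal Affairs Department of the People's Bank of China, the Ministry of Public Security of the People's Republic of China, AqiStudy, Juliang Xingtu, HeyTap, Zhangshang Gaokao, Chuanxunwang, Baidu Index, Toutiao, Zhihu, Qimai Data, Tuniu, Qimao Novels, Qichacha, Tonghuashun, NetEase Cloud Music, Lagou Recruitment, Wanwu Dezhi, Fangtianxia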

js-reverse python spider

Last synced: 19 Jun 2024

https://github.com/gadfly0x/signature_algorithm

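Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, and bangkokair (Bangkok Airways)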

crawler reverse-engineering spider

Last synced: 17 Jun 2024

https://github.com/JefferyHus/es6-crawler-detect

:spider: This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.

bots crawler detection es6-javascript spider

Last synced: 16 Jun 2024

https://github.com/hss01248/ImageLoader

A wrapper for Glide v4: a solution for image loading and large-image preview, plus a debug tool for ImageView. Image spiders on Android.

fresco glide glidev4 imageloader spider

Last synced: 15 Jun 2024

https://github.com/l4rm4nd/XingDumper

Python 3 script to dump/scrape/extract company employees from XING API

crawling employees osint profile python reconnaissance spider xing xing-api

Last synced: 14 Jun 2024

https://github.com/IAmStoxe/urlgrab

A golang utility to spider through a website searching for additional links.

spider

Last synced: 14 Jun 2024

https://github.com/Nemo2011/bilibili-api

Common Bilibili API calls. Supports videos, bangumi, users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api

api bilibili bilibili-api python spider

Last synced: 13 Jun 2024

https://github.com/Randark-JMT/Bilibili_manga_download

A Bilibili Manga download tool with a graphical interface

bilibili crawler downloader pyside6 python python3 qt spider

Last synced: 13 Jun 2024

https://github.com/lihe07/bilibili_comics_downloader

A BiliBili manga downloader written in Rust: no runtime dependencies, high performance, and export to PDF, EPUB, and ZIP

bilibili bilibili-download downloader epub pdf rust rust-lang spider

Last synced: 13 Jun 2024

https://github.com/niyuancheng/bilibili-service

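Provides danmaku and video download services for Bilibili: just enter a Bilibili video's bvid to download the video in super-HD or better quality along with its danmaku pool data.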

bilibili-api bilibili-download nodejs python3 spider video-streaming

Last synced: 13 Jun 2024

https://github.com/Montaro2017/bili_novel_packer

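A light-novel packer that fetches content from the Bili Light Novel (哔哩轻小说) site and packages it as EPUB, with support for covers, illustrations, tables of contents, and merging volumes.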

dart epub epub-generation novel spider

Last synced: 13 Jun 2024

https://github.com/polyrabbit/hacker-news-digest

:newspaper: Let ChatGPT Summarize Hacker News for You

chatgpt chatgpt-api crawler data-extraction extract-summaries hacker-news hacker-news-digest hacker-news-reader machine-learning news-aggregator openai openai-api python rss spider

Last synced: 11 Jun 2024

https://github.com/CatVodTVOfficial/CatVodTVSpider

catvod crawler maotv player spider tv

Last synced: 11 Jun 2024

https://github.com/Henryyy-Hung/Web-Crawler-of-Chinese-Fiction

A Python-based crawler/downloader for Chinese web novels that can crawl and proofread web novels and output TXT files

chinese-fiction spider

Last synced: 11 Jun 2024

https://github.com/wangshibiaoFlytiger/apiproject

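[https://www.sofineday.com], a scaffold for Go project development that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + the JSON logging library zap (with log collection to kafka or mongo) + the kafka message queue + gopay for WeChat Pay and Alipay payments + API encryption + API reverse proxy + Go modules dependency management + the headless crawler chromedp + makefile + binary compression + livereload hot reloading)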

alipay api-server compress cors gin-framework golang gomodule gorm headless jwt kafka livereload makefile mongo redis reverseproxy spider wxpay zap

Last synced: 09 Jun 2024

https://github.com/pibigstar/go-demo

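Go tutorials by example, from beginner to advanced, covering basic library usage, design patterns, common interview pitfalls, utility classes, third-party integrations, and more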

blockchain design go go-demo go-design go-utils goutils interview kafaka leetcode oss pprof qq redis spider

Last synced: 09 Jun 2024

https://github.com/jaeles-project/gospider

Gospider - Fast web spider written in Go

bugbounty crawler go gospider spider

Last synced: 09 Jun 2024

https://github.com/zachleat/glyphhanger

Your web font utility belt. It can subset web fonts. It can find unicode-ranges for you automatically. It makes julienne fries.

font glyphs spider subset subsetting unicode web-fonts webfonts

Last synced: 09 Jun 2024

https://github.com/Malwarize/webpalm

🕸️ Crawl in the web network

crawler crawling data data-science datamining go golang hack mining osint redteam spider tool

Last synced: 09 Jun 2024

https://github.com/coder-hxl/x-crawl

x-crawl is a flexible Node.js AI-assisted crawler library that makes crawler work more efficient, intelligent, and convenient. (v10 has been released.)

ai ai-crawl chromium crawl crawler fingerprint flexible javascript multifunction nodejs promise puppeteer spider typescript web

Last synced: 09 Jun 2024

https://github.com/tophubs/TopList

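Today's Hot List (今日热榜): an aggregator site that collects the top trending headlines from major popular websites. Written in Go, it uses multiple goroutines to fetch information quickly and asynchronously. Preview: https://mo.fish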

golang hot hotlist spider today-s-hot-list

Last synced: 08 Jun 2024

https://github.com/201206030/novel-plus

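novel-plus is a full-featured novel CMS with multi-platform (PC, WAP) reading. Features include novel recommendations, search, rankings, reading, bookshelves, comments, a novel crawler, a member center, an author area, top-up and subscriptions, and news publishing.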

book crawl novel read spider

Last synced: 07 Jun 2024

https://github.com/guyueyingmu/avbook

Adult video (AV) management system; crawlers for avmoo, javbus, and javlibrary; online AV library; AV magnet-link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

adult adult-video avmoo crawler database guzzlehttp javbus javlibrary laravel magnet magnet-link scraper spider

Last synced: 06 Jun 2024

https://github.com/spiritLHLS/Hang-up-items

Questionnaire projects, cloud server recommendations, passive-income ("hang-up") projects, free proxies, and a collection of assorted scripts. Click the bell in the upper-right corner to receive updates promptly. (Please do not fork; keep it low profile.)

bitping cash earnapp earnfm honeygain income iproyal money myst p2pclient packetstream passive passiveincome pawns proxyrack shared spider traffmonetizer vps

Last synced: 06 Jun 2024

https://github.com/omarhashem123/venom

Tool designed to quickly crawl and extract endpoints

crawler python python3 spider

Last synced: 05 Jun 2024

https://github.com/knownsec/LSpider

LSpider: a front-end crawler built for passive scanners

python3 security spider

Last synced: 05 Jun 2024

https://github.com/wongzeon/ICP-Checker

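ICP filing lookup: queries the ICP filing information of a company or domain, completes the slider verification automatically, and saves the results to an Excel spreadsheet. Works with the new MIIT filing management system website, so you can say goodbye to repeatedly dragging verification sliders and to the trap of a certain site's tools requiring a VIP subscription to view filing information.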

beian icp information-gathering information-security osint-tool python python3 spider

Last synced: 05 Jun 2024

https://github.com/asaotomo/FofaMap

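FofaMap is a cross-platform FOFA API data collector developed in Python 3. It supports standard queries, website liveness checks, statistical aggregation queries, host aggregation queries, website icon queries, batch queries, and more. FofaMap can also run custom FOFA queries, automatically deduplicate the results and filter them by keyword, and generate the corresponding Excel spreadsheet. In addition, the Spring Festival special edition can invoke Nuclei to run vulnerability scans against the targets FofaMap finds, putting you a step ahead in bug hunting.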

api bat excel fofa nuclei python3 scan spider

Last synced: 05 Jun 2024

https://github.com/geziyor/geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

crawler go scraper scraping spider

Last synced: 05 Jun 2024

https://github.com/kingschan1204/istock

:point_right: A Java stock crawler built on Spring Boot (A-shares only). If you :heart: it, please :star: it. The upgraded V2 is under development!

bootstrap echarts java jqgrid mongodb spider spring-boot stock vue2

Last synced: 04 Jun 2024

https://github.com/lrlna/puppeteer-walker

a puppeteer walker 🕷 🕸

chrome crawler headless puppeteer spider walker

Last synced: 03 Jun 2024

https://github.com/ManiMozaffar/linkedIn-scraper

A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel

bot browser-fingerprint browser-fingerprinting chatgpt chatgpt-api cralwer fastapi linkedin linkedin-bot playwright python scraper scraping spider sqlalchemy

Last synced: 03 Jun 2024

https://github.com/socketry/benchmark-http

async async-http benchmark concurrency latency spider

Last synced: 02 Jun 2024

https://github.com/ccforward/zhihu

✨ zhihu daily Node.js, Vue.js ...

node-vue nodejs spider vue vue2-vuex-webpack zhihu-daily

Last synced: 02 Jun 2024

https://github.com/bahaabdelwahed/killshot

A Penetration Testing Framework, Information gathering tool & Website Vulnerability Scanner

auto-scanner cms exploit information-gathering joomla spider vulnerability vulnerability-detection vulnerability-scanner webapp-vul-scanner website-vulnerability-scanner wordpress wp-admin

Last synced: 02 Jun 2024

https://github.com/1N3/BlackWidow

A Python based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website.

active application automated bugbounty csrf fuzzer lfi osint owasp passive python rce rfi scan scanner spider sqli vulnerability web xss

Last synced: 02 Jun 2024

https://github.com/darbra/sperm

A roundup of excellent reverse-engineering articles I have read; worth a look

crawl crawler frida spider unidbg

Last synced: 01 Jun 2024

https://github.com/5ime/video_spider

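Short-video watermark removal: Douyin, Pipixia, Huoshan, Weishi, Weibo, Lvzhou (绿洲), Zuiyou, Qing Video (轻视频), Kuaishou, Quanmin Xiaoshipin, Basai Dianying (巴塞电影), Momo, Before Bifeng (Before避风), Kaiyan, Vue Vlog, Xiaokaxiu, Pipi Gaoxiao, Quanmin K Ge (全民K歌), Xigua Video, Doupai, Huya, 6 Jian Fang (6间房), Pear Video, Xinpianchang, acfun, Meipai...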

php spider video

Last synced: 31 May 2024

https://github.com/Evil0ctal/Douyin_TikTok_Download_API

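🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous data-scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, with support for API calls and online batch parsing and downloading.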

api asgi async asyncio crawler douyin douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi httpx no-watermark online-parsing python pywebio scraper spider tiktok tiktok-scraper web-scraping

Last synced: 31 May 2024

https://github.com/crawlab-team/crawlab

Distributed web crawler admin platform for managing spiders regardless of language or framework.

crawlab crawler crawling-tasks docker go platform scrapy scrapyd-ui spider spiders-management web-crawler webcrawler webspider

Last synced: 31 May 2024

https://github.com/bookstairs/bookhunter

A download tool for crawling ebooks from the internet.

epub golang spider

Last synced: 31 May 2024

https://github.com/thundernet8/AlipayOrdersSupervisor

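:sparkles: Monitors Alipay orders with Node and notifies the server in real time to provide a payment interface that requires no merchant contract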

alipay nodejs spider

Last synced: 31 May 2024

https://github.com/wechatsync/Wechatsync

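Sync articles to multiple content platforms with one click. Supports major platforms including Toutiao, WordPress, Zhihu, Jianshu, Juejin, CSDN, and typecho: publish once, and the article goes out to every platform at the same time. Frees up personal productivity.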

blog chrome chrome-extension markdown multiplatform spider vue wechat-official-account writer

Last synced: 31 May 2024

https://github.com/ShunCai/QZoneExport

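QZone export assistant: backs up QZone posts (shuoshuo), blog entries, private diaries, photo albums, videos, message boards, QQ friends, favorites, shares, and recent visitors to files for easy migration and archiving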

backup chrome chrome-extension chromium crx export qq qqzone qzone qzone-spider spider

Last synced: 31 May 2024

https://github.com/wnma3mz/wechat_articles_spider

A crawler for WeChat official account articles

officialaccounts python36 spider wechat wechat-official-account

Last synced: 31 May 2024

https://github.com/1061700625/WeChat_Article

Scrapes WeChat official account articles

pyqt5 python3 spider wechat wechat-article

Last synced: 31 May 2024

https://github.com/xboxeer/NScrapy

NScrapy is a cross-platform distributed spider framework for .NET Core that provides an easy way to write your own spider

distributed dotnet scrapy spider

Last synced: 31 May 2024

https://github.com/jinzhongjia/movie-getter

A film and TV resource collector written in Go

golang gorm movies spider sqlite3

Last synced: 30 May 2024

https://github.com/kangvcar/InfoSpider

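INFO-SPIDER is a crawler toolbox 🧰 that brings together many data sources, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.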

automation chrome crawl csdn hotmail outlook python3 selenium spider tkinter wxpython

Last synced: 30 May 2024

https://github.com/yutto-dev/yutto

:ice_cube: A cute and willful Bilibili video downloader (bilili V2)

aiohttp asyncio bangumi bilibili coroutines cross-platform danmaku downloader spider video

Last synced: 30 May 2024

https://github.com/yutto-dev/bilili

:beers: bilibili video (including bangumi) and danmaku downloader

bilibili crawler danmaku download downloader multithread python3 requests spider subtitle video

Last synced: 30 May 2024

https://github.com/librauee/Reptile

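🏀 Hands-on Python 3 web crawlers (some with detailed tutorials): Maoyan, Tencent Video, Douban, Yanzhao (postgraduate admissions site), Weibo, Biquge novels, Baidu Hot Topics, Bilibili, CSDN, NetEase Cloud Reading, Ali Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, unsplash, Shixiseng, Autohome, League of Legends Box, Dianping, Lianjia, LPL schedule, typhoons, the Fantasy Westward Journey and Onmyoji Treasure Pavilion (藏宝阁), weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish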

python3 requests scrapy spider

Last synced: 30 May 2024

https://github.com/Boris-code/feapder

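🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It ships with four built-in spider types (AirSpider, Spider, TaskSpider, and BatchSpider) to cover different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and deduplication of massive data sets. The crawler management system feaplat provides convenient deployment and scheduling for it.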

crawler feapder feaplat python scrapy spider

Last synced: 30 May 2024

https://github.com/tijme/not-your-average-web-crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability

Last synced: 30 May 2024

https://github.com/fnk0c/cangibrina

A fast and powerful dashboard (admin) finder

admin-finder python spider

Last synced: 30 May 2024

https://github.com/niespodd/browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping

Last synced: 30 May 2024

https://github.com/stanzhai/Html2Article

Extracts the main body text from HTML web pages

article content crawler html spider topic

Last synced: 28 May 2024

https://github.com/wkunzhi/Python3-Spider

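Hands-on Python crawlers: simulated login to major websites, including but not limited to slider verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️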

crawl crawler dianping geek meituan pyppeteer python scrapy scrapy-crawler selenium spider splash taobao

Last synced: 27 May 2024

https://github.com/shengqiangzhang/examples-of-web-crawlers

Some interesting, beginner-friendly examples of Python crawlers, mainly scraping sites such as Taobao, Tmall, WeChat, WeChat Reading, Douban, and QQ.

agent-pool crawler example fund multithreading pyquery python selenium spider stock taobao tmall wechat wechat-report wereader

Last synced: 27 May 2024

https://github.com/DoiiarX/NLCISBNPlugin

A calibre source/metadata plugin based on ISBN lookup at the National Library of China

calibre-plugin isbn metadata spider

Last synced: 27 May 2024

https://github.com/howie6879/owllook

owllook: a novel search engine

asyncio asyncio-spider biquge book crawler novel novels owllook python python3 qidian ruia sanic spider

Last synced: 26 May 2024

https://github.com/speed/newcrawler

Free Web Scraping Tool with Java

crawler docker scraping spider

Last synced: 26 May 2024

https://github.com/gsh199449/spider

A configurable web spider with an easy-to-use web console

cralwer gatherplatform spider text-mining web-console

Last synced: 26 May 2024

https://github.com/adamdehaven/fetchurls

A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.

bash-scripting crawl shell-script spider urls website wget

Last synced: 26 May 2024

https://github.com/rugantio/fbcrawl

A Facebook crawler

crawl crawler facebook python scraper scrapy spider

Last synced: 26 May 2024

https://github.com/f111fei/article_spider

微信公众号爬虫

javascript spider typescript wechat

Last synced: 23 May 2024

https://github.com/zaxtyson/AnimeSearcher

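Integrates video and danmaku resources from third-party websites to give freeloaders the best experience for watching anime and following TV series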

bilibili cctv danmaku movies player spider

Last synced: 23 May 2024

https://github.com/akynazh/tg-search-bot

A telegram bot for searching.

bot dmm jav pikpak python python3 redis-cache spider telegram telegram-bot wiki wikipedia

Last synced: 23 May 2024

https://github.com/twiny/spidy

Domain names collector - Crawl websites and collect domain names along with their availability status.

backlinks crawler domain expired-domain golang scraper seotools spider

Last synced: 22 May 2024

https://github.com/liameno/librengine

Privacy Web Search Engine (not meta, own crawler)

cpp crawler encryption frontend privacy robots-txt rsa search-engine self-hosted spider websearch websearchengine

Last synced: 21 May 2024

https://github.com/kiddyuchina/Beanbun

Beanbun is a multi-process web crawler framework written in PHP and built on Workerman, offering good openness and high extensibility.

beanbun crawler php spider

Last synced: 19 May 2024

https://github.com/lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

captcha-bypass captcha-recaptcha crawler crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python scraping-tool spider web-scraping webscraping

Last synced: 17 May 2024

https://github.com/okfn-brasil/querido-diario

📰 Brazilian government gazettes, accessible to everyone.

artificial-intelligence civic-tech data-science governments-gazettes govtech hacktoberfest hacktoberfest2023 machine-learning open-data politics scraping spider

Last synced: 17 May 2024

https://github.com/jhao104/proxy_pool

Python ProxyPool for web spider

crawler http proxy redis spider

Last synced: 15 May 2024

https://github.com/DedSecInside/TorBot

Dark Web OSINT Tool

algorithm crawler dark-web dedsec-inside deepweb go hacking hacktoberfest osint projects psnappz python python-web-crawler python3 security security-tools spider tor tor-network torbot

Last synced: 15 May 2024

https://github.com/zkqiang/awesome-python-primer

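An index of high-quality Chinese-language resources for self-taught Python beginners, including books / docs / videos, suited to crawling / web / data analysis / machine learning tracks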

awesome awesome-list book crawler django flask learn learning mechine-learning prime primer python python3 scraping spider spiders web

Last synced: 14 May 2024

https://github.com/NaiboWang/EasySpider

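A visual no-code/code-free web crawler/spider. EasySpider (易采集): visual software for browser automation testing, data collection, and crawling that lets you design and run crawler tasks graphically without code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.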

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

Last synced: 14 May 2024

https://github.com/Symbo1/wsltools

Web Scan Lazy Tools - Python Package

crawling-framework package python-package scanner-web security security-audit security-automation security-scanner security-tool security-tools spider spider-framework web-vulnerability-scanner

Last synced: 12 May 2024

https://github.com/ArchiveTeam/wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd

Last synced: 11 May 2024

https://github.com/spider-rs/spider

The fastest web crawler written in Rust. Maintained by @a11ywatch.

crawler headless-chrome indexer rust scraping spider

Last synced: 11 May 2024

https://github.com/Conso1eCowb0y/Deepminer

Deep web crawler and search engine

crawler crawling dark-web data-mining deepminer deepweb github hacking onion osint python-web-scraper python3 search-engine security security-tools spider the-onion-router tor tor-network webcrawler

Last synced: 10 May 2024

https://github.com/elliotgao2/gain

Web crawling framework based on asyncio.

aiohttp asyncio crawler python spider uvloop

Last synced: 09 May 2024
