https://github.com/dev-chenxing/jjwxc-crawler
A simple tool, built with Python and Scrapy, that downloads the non-V chapters of any novel on jjwxc.net (晋江) by book ID and saves them as editable Word (.docx) documents.
chinese cli crawler docx download jjwxc open-source python scraping scrapy terminal word
- Host: GitHub
- URL: https://github.com/dev-chenxing/jjwxc-crawler
- Owner: dev-chenxing
- Created: 2024-03-11T01:34:06.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-27T00:41:06.000Z (over 1 year ago)
- Last Synced: 2025-03-23T21:46:01.525Z (about 1 year ago)
- Topics: chinese, cli, crawler, docx, download, jjwxc, open-source, python, scraping, scrapy, terminal, word
- Language: Python
- Homepage:
- Size: 7.06 MB
- Stars: 13
- Watchers: 1
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
### Features
- Command-line interface
- DOCX and TXT output formats
- Customizable output path
- ...................
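Suggestions and bug reports are welcome; please open an issue.
The command-line interface is built with the terminal UI library [Rich](https://github.com/Textualize/rich).
Interface sample: [screenshot]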
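# Installation Guide
### Download the files
Click Code > Download ZIP, then unzip the archive. Renaming the extracted folder to `jjwxc-crawler` is recommended.
### Environment setup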
- Python 3.9.15
- Windows
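After installing Python, first open a terminal in the project directory and run the following commands to create and activate a virtual environment:
```powershell
python -m venv venv # create a Python virtual environment named venv
venv\Scripts\activate # activate the virtual environment venv on Windows
```
On Linux, activate it with: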
```bash
chmod +x venv/bin/activate
source venv/bin/activate
```
The prompt should now be prefixed with `(venv)`, indicating that the virtual environment `venv` is active.
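If the `(venv)` prefix is hard to spot, the same check can be made from Python itself. This is a minimal sketch that relies on standard-library behavior: inside an environment created with `python -m venv`, `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtualenv` is made up for illustration and is not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```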
Second, install Scrapy and the other dependencies inside the virtual environment:
```powershell
pip install -r requirements.txt
```
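### Running the program
```powershell
# Enter the program directory
cd jjcrawler
# Run the crawl command, where ID is the book ID
scrapy crawl novel -a id=ID
# For example, to download the test novel with book ID 2, run:
scrapy crawl novel -a id=2
```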
Downloaded chapters are saved to the `novels` folder in the project root.
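To fetch several books in one go, the crawl command can be wrapped in a short script. The sketch below is not part of the project: `crawl_command` and `download_all` are hypothetical helpers, and it assumes that `scrapy` is on the PATH (the virtual environment is active) and that `jjcrawler` is the directory entered with `cd` in the previous step.

```python
import subprocess
from typing import List


def crawl_command(book_id: int) -> List[str]:
    """Build the argv list for `scrapy crawl novel -a id=<book_id>`."""
    return ["scrapy", "crawl", "novel", "-a", f"id={book_id}"]


def download_all(book_ids: List[int], project_dir: str = "jjcrawler") -> None:
    """Run the crawl once per book ID, stopping at the first failure."""
    for book_id in book_ids:
        # Run from the program directory, mirroring the manual `cd jjcrawler`.
        # check=True raises CalledProcessError if scrapy exits with an error.
        subprocess.run(crawl_command(book_id), cwd=project_dir, check=True)
```

Calling `download_all([2])` from the repository root would then be equivalent to the manual example with book ID 2.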
The default output format is .docx. To switch to .txt output, edit the parameter in `\jjcrawler\jjcrawler\spiders\config.py`:
```python
# docx | txt
format = "txt"
```
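The setting is expected to be one of the two strings named in the comment. As a rough illustration only (this is not how the project reads its config), a hypothetical helper `output_extension` could reject any other value before a crawl starts:

```python
SUPPORTED_FORMATS = ("docx", "txt")  # the two values named in the config comment


def output_extension(fmt: str) -> str:
    """Map a format setting to a file extension, rejecting unknown values."""
    normalized = fmt.strip().lower()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return "." + normalized
```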
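Download a full page of novels:
```bash
# No CP (no romantic pairing), female-lead POV, xianxia/cultivation tag
scrapy crawl novellist -a xx=5 -a mainview=2 -a bq=68
# No CP, female-lead POV, historical setting, xianxia genre
scrapy crawl novellist -a xx=5 -a mainview=2 -a sd=2 -a lx=4
# Derivative (fan fiction), yuri (百合), wuxia
scrapy crawl novellist -a yc=2 -a xx=3 -a bq=11
# Yuri novels whose title contains "神雕" (experimental option, still in development)
scrapy crawl novellist -a title=神雕 -a xx=3
```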
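The filter arguments can also be assembled programmatically. The sketch below is hypothetical and not part of the project: the helper `novellist_command` simply turns keyword filters into `-a key=value` pairs. The meanings given in its comments are inferred only from the examples above, not from the spider's source.

```python
from typing import List


def novellist_command(**filters: object) -> List[str]:
    """Build the argv list for `scrapy crawl novellist` with -a filters."""
    argv = ["scrapy", "crawl", "novellist"]
    for key, value in filters.items():
        # Each filter becomes one spider argument, e.g. -a xx=5
        argv += ["-a", f"{key}={value}"]
    return argv


# Inferred from the examples: xx=5 is "no CP", mainview=2 is "female-lead POV",
# and bq=68 is the xianxia/cultivation tag.
print(" ".join(novellist_command(xx=5, mainview=2, bq=68)))
```

Keyword arguments keep their order, so the printed command matches the first example line above.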
