Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
A simple tool, built with Python and Scrapy, that scrapes and downloads the non-VIP chapters of any novel on jjwxc.net by book ID and saves them as editable Word (.docx) documents.
https://github.com/dev-chenxing/jjwxc-crawler
chinese cli crawler docx download jjwxc open-source python scraping scrapy terminal word
Last synced: about 1 month ago
JSON representation
- Host: GitHub
- URL: https://github.com/dev-chenxing/jjwxc-crawler
- Owner: dev-chenxing
- Created: 2024-03-11T01:34:06.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-11-08T10:23:15.000Z (about 1 month ago)
- Last Synced: 2024-11-08T11:25:35.890Z (about 1 month ago)
- Topics: chinese, cli, crawler, docx, download, jjwxc, open-source, python, scraping, scrapy, terminal, word
- Language: Python
- Homepage:
- Size: 7.05 MB
- Stars: 12
- Watchers: 1
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
### Features
- Command-line interface
- Outputs both DOCX and TXT formats
- Customizable output path
- Have a suggestion or found a bug? Feel free to open an issue.
The command-line interface is built with the terminal UI library [Rich](https://github.com/Textualize/rich).

Sample interface: (screenshot not reproduced in this index)
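Since the screenshot is not reproduced here, the following minimal Rich snippet (illustrative only, not the project's actual UI code) shows the kind of styled console output and progress display the library provides:

```python
# Illustrative only: a minimal Rich console with a progress bar,
# not the project's actual interface code.
from rich.console import Console
from rich.progress import track

console = Console()
console.print("[bold green]jjwxc-crawler[/bold green] starting download...")
for chapter in track(range(1, 11), description="Downloading chapters"):
    pass  # fetch and save each chapter here
```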
# Installation
### Download the files
Click Code → Download ZIP, extract the archive, and rename the resulting folder to `jjwxc-crawler` (recommended).
### Environment setup
- Python 3.9.15
- Windows

Step one: after installing Python, open a terminal in the project directory and run the following to create and activate a virtual environment:
```powershell
python -m venv venv    # create a Python virtual environment named venv
venv\Scripts\activate  # activate the venv on Windows
```

On Linux:
```bash
python3 -m venv venv        # create the virtual environment, as on Windows above
chmod +x venv/bin/activate  # optional; `source` does not require execute permission
source venv/bin/activate    # activate the venv on Linux
```

Your prompt should now start with `(venv)`, indicating that the virtual environment `venv` is active.
Step two: inside the virtual environment, install Scrapy and the other dependencies:
```powershell
pip install -r requirements.txt
```

### Run the program
```powershell
# enter the program directory
cd jjcrawler

# run the crawler, where ID is the book number
scrapy crawl novel -a id=ID

# for example, to download the test novel with book ID 2, run:
scrapy crawl novel -a id=2
```

Downloaded chapters are saved to the `novels` folder in the root directory.
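For context, Scrapy forwards each `-a NAME=VALUE` option to the spider's constructor as a keyword argument, which is how the book ID reaches the crawler. A minimal sketch of that mechanism follows; the URL pattern and parse stub are assumptions for illustration, not the project's actual spider code.

```python
# Sketch of how `scrapy crawl novel -a id=ID` delivers the book ID.
# The URL pattern and parse stub below are illustrative assumptions.
import scrapy

class NovelSpider(scrapy.Spider):
    name = "novel"  # matches `scrapy crawl novel`

    def __init__(self, id=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.novel_id = id  # `-a id=2` arrives here as the string "2"

    def start_requests(self):
        # book pages on jjwxc are keyed by this ID (URL shape assumed)
        url = f"https://www.jjwxc.net/onebook.php?novelid={self.novel_id}"
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        ...  # extract chapter links and text, then write them out
```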
The default output format is .docx. To output .txt instead, edit the parameter in `\jjcrawler\jjcrawler\spiders\config.py`:
```python
# docx | txt
format = "txt"
```
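The README does not show how this `format` flag is consumed; as a rough illustration, a chapter writer might branch on it as in the sketch below. The `save_chapter` helper and its arguments are hypothetical, while `docx` is the real python-docx package that DOCX output would rely on.

```python
# Hypothetical sketch of a writer that honors the docx/txt switch.
# `save_chapter` and its arguments are illustrative, not project code.
from docx import Document  # pip install python-docx

format = "docx"  # docx | txt, mirroring config.py

def save_chapter(title, paragraphs, path):
    if format == "docx":
        doc = Document()
        doc.add_heading(title, level=1)
        for p in paragraphs:
            doc.add_paragraph(p)
        doc.save(path + ".docx")
    else:
        with open(path + ".txt", "w", encoding="utf-8") as f:
            f.write(title + "\n\n" + "\n".join(paragraphs))
```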
To download a whole page of novels at once:
```bash
scrapy crawl novellist -a xx=3 -a sd=4 -a bq=39,45,124,313,314
```

**[⬆ Back to top](#features)**