https://github.com/mccranky83/aistudy-docs-crawler
Crawler for the Shanghai Primary and Secondary School Digital Teaching System (上海市中小学数字教学系统)
- Host: GitHub
- URL: https://github.com/mccranky83/aistudy-docs-crawler
- Owner: Mccranky83
- Archived: true
- Created: 2024-07-23T14:04:36.000Z
- Default Branch: main
- Last Pushed: 2024-10-23T12:57:49.000Z
- Last Synced: 2025-02-16T00:23:10.282Z
- Topics: crawler, hoarding, puppeteer
- Language: JavaScript
- Homepage:
- Size: 1.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AiStudy Documents Crawler

> Due to the official notice that `本旧版过渡使用期至2025年7月31日` ("the transition period for this legacy version runs until July 31, 2025"), I'm archiving this project.

This project is a web crawler that collects educational resources from the [上海市中小学数字教学系统](https://sz-api.ai-study.net/) (Shanghai Primary and Secondary School Digital Teaching System). It uses Puppeteer, a Node.js library, to simulate user interactions in a Chromium browser and extract download links. It then invokes the `curl` command-line utility to recursively download those resources to the local system.
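
The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `collectLinks`, `flattenLinkMap`, `buildCurlArgs`, and `download` are hypothetical names, the page URL and CSS selector are placeholders, and the nested link-map shape (folder names mapping to either sub-maps or URL strings) is an assumption.

```javascript
const { spawn } = require("node:child_process");
const path = require("node:path");

// Stage 1: open a page in Chromium and collect the hrefs matching a selector.
// `pageUrl` and `selector` are placeholders for whatever the real site needs.
async function collectLinks(pageUrl, selector, headless = true) {
  const puppeteer = require("puppeteer"); // loaded lazily; a project dependency
  const browser = await puppeteer.launch({ headless });
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl, { waitUntil: "networkidle2" });
    return await page.$$eval(selector, (els) => els.map((el) => el.href));
  } finally {
    await browser.close();
  }
}

// Stage 2a: walk an (assumed) nested link map and emit one job per file.
function flattenLinkMap(node, dir) {
  const jobs = [];
  for (const [name, value] of Object.entries(node)) {
    const target = path.join(dir, name);
    if (typeof value === "string") {
      jobs.push({ url: value, dest: target });
    } else {
      jobs.push(...flattenLinkMap(value, target));
    }
  }
  return jobs;
}

// Stage 2b: curl arguments for a single job.
// --fail: non-zero exit on HTTP errors; --location: follow redirects;
// --create-dirs: create missing parent directories for --output.
function buildCurlArgs({ url, dest }) {
  return ["--fail", "--location", "--create-dirs", "--output", dest, url];
}

// Run curl for one job and resolve once it exits successfully.
function download(job) {
  return new Promise((resolve, reject) => {
    const child = spawn("curl", buildCurlArgs(job), { stdio: "inherit" });
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`curl exited with ${code}`))
    );
  });
}
```

Since `curl` has no recursive mode of its own, the recursion in this sketch lives in `flattenLinkMap`, which turns the nested map into a flat list of jobs that can be passed to `download` one at a time.
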
## Installation
```bash
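# Clone the repository and enter it
git clone https://github.com/mccranky83/aistudy-docs-crawler.git
cd aistudy-docs-crawler
npm i # Installs project dependencies, including a compatible Chrome
npm run start # Executes the start script, which runs `app/start.js`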
```
## Examples
**Crawl first**
prompt> npm run start
Directly download or crawl first? (d/C)
Run in headless mode? (Y/n)
subjectIndex [1-17]: 1
Crawl documents or answer sheets? (D/a)
subjectIndex [1-2]: 1
Startup grade [Default: 0]:
Offset [Default: 100]:
Startup semester [Default: 0]:
Offset [Default: 100]:
Startup unit [Default: 0]:
Offset [Default: 100]:
Startup course [Default: 0]:
Offset [Default: 100]:
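
Judging by the transcript above, the capitalized letter in prompts such as `(d/C)` and `(Y/n)` appears to mark the default, and each `Startup ...` / `Offset` pair appears to select a window of entries at one level (grade, semester, unit, course). The sketch below shows one way such answers could be interpreted. The helper names are hypothetical, and the reading of `Offset` as "maximum number of entries to take from the startup index" is an assumption, not something the README states.

```javascript
// Resolve a two-choice prompt: an empty or unrecognized answer falls back
// to the default option, otherwise the first letter of the answer decides.
function resolveChoice(answer, options, defaultOption) {
  const key = answer.trim().toLowerCase();
  if (key === "") return defaultOption;
  const match = options.find((opt) => opt.toLowerCase() === key[0]);
  return match === undefined ? defaultOption : match;
}

// Parse a non-negative integer answer, falling back to the bracketed default.
function parseNumber(answer, fallback) {
  const text = answer.trim();
  if (text === "") return fallback;
  const n = Number.parseInt(text, 10);
  return Number.isNaN(n) || n < 0 ? fallback : n;
}

// Assumed semantics: start at index `start` and take at most `offset` entries.
function selectWindow(items, start = 0, offset = 100) {
  return items.slice(start, start + offset);
}
```

Under these assumptions, pressing Enter at every prompt (start `0`, offset `100`) would cover up to the first hundred entries at each level, which in practice likely means everything.
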
**Direct download**
prompt> npm run start
Directly download or crawl first? (d/C) d
劳动 - 6.json
sitemapName:
**No available linkmaps**
prompt> npm run start
Directly download or crawl first? (d/C) d
No linkmaps available!
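
The direct-download mode seems to list previously saved link maps (for example `劳动 - 6.json`) and to report `No linkmaps available!` when there are none. A minimal sketch of that check follows. The function names and the idea that link maps are `.json` files in a single directory are assumptions made for illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// List saved link maps; `dir` is a hypothetical storage location.
function listLinkmaps(dir) {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => path.extname(name) === ".json")
    .sort();
}

// Produce the text shown before asking for a link-map name.
function describeLinkmaps(dir) {
  const names = listLinkmaps(dir);
  return names.length === 0 ? "No linkmaps available!" : names.join("\n");
}
```
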